Bug 489862 – Basic URI operations

Summary:	Basic URI operations


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Duplicates:	550110
Depends on:
Blocks:	746993

Reported:	2007-10-24 16:41 UTC by Owen Taylor
Modified:	2018-05-24 11:08 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
guri: new URI parsing and generating functions (143.17 KB, patch) 2015-03-27 15:54 UTC, Marc-Andre Lureau	none
guri: new URI parsing and generating functions (143.17 KB, patch) 2015-03-27 16:04 UTC, Marc-Andre Lureau	none
guri: new URI parsing and generating functions (142.15 KB, patch) 2015-03-27 16:07 UTC, Marc-Andre Lureau	none
guri: new URI parsing and generating functions (142.34 KB, patch) 2015-03-30 15:46 UTC, Marc-Andre Lureau	needs-work

Description Owen Taylor 2007-10-24 16:41:11 UTC

URIs are very frequently manipulated these days. It would be convenient,
and would prevent a lot of hacky-not-quite-right code, if GLib had functions
to do things like:

 * Check if an URI is absolute
 * Resolve an URI relative to a base URI
 * Check if an URI has a particular scheme
 * Parse an URI into components

Comment 1 Havoc Pennington 2007-10-25 18:51:36 UTC

Don't forget escaping.

Comment 2 Matthias Clasen 2008-01-07 05:04:20 UTC

Escaping is at least partially covered by Alex's gurifuncs.h now; the rest could conceivably be added there.

Comment 3 Christian Neumair 2009-05-17 21:18:28 UTC

>  * Check if an URI has a particular scheme

Works with g_uri_parse_scheme () [since glib 2.16]

>  * Parse an URI into components

I'd like to request this as well. Nautilus-Open-Terminal currently uses GnomeVFS [1] just for decomposing an URI into host name, port, user name and path.

Additional helpers named like

g_uri_parse_user_info ()
g_uri_parse_host_name ()
g_uri_parse_host_port ()
g_uri_parse_path ()
g_uri_parse_query ()
g_uri_parse_fragment () 

would be nice. Is anyone interested in writing those? If not, I'd volunteer for doing so.

[1] http://git.gnome.org/cgit/nautilus-open-terminal/tree/src/nautilus-open-terminal.c?id=a883bd21b62065c54e22bb5e400b2aa01306a68f#n133

Comment 4 Dan Winship 2009-05-17 23:41:59 UTC

(In reply to comment #3)
> >  * Check if an URI has a particular scheme
> 
> Works with g_uri_parse_scheme () [since glib 2.16]

If you are not planning to parse the URI into components, it would be more useful to have

    gboolean g_uri_has_scheme (const char *uri, const char *scheme);

instead, because (a) it lets you save a malloc/free, (b) it doesn't require the caller to remember to use g_ascii_strcasecmp(), and (c) it gives us a little room to fudge around future URI syntax modifications. (Eg, http://tools.ietf.org/html/draft-wood-tae-specifying-uri-transports suggests "http++sctp://example.org/" for HTTP-over-SCTP. g_uri_has_scheme() could potentially recognize this as matching "http", while g_uri_parse_scheme() would require the app itself to gain new smarts. Of course, it's possible that this new syntax idea will be rejected.)
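
A minimal sketch of such a check, using only the C standard library. The name sketch_uri_has_scheme() is hypothetical and this is not GLib code; it only illustrates points (a) and (b): no allocation, and the ASCII case-insensitive comparison is done for the caller. The "http++sctp" fudging from point (c) is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Lowercases ASCII letters only, so the result does not depend on the locale. */
static char
ascii_lower (char c)
{
  return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/* Hypothetical helper (not GLib API): true if uri begins with scheme,
 * compared ASCII case-insensitively, immediately followed by ':'. */
bool
sketch_uri_has_scheme (const char *uri, const char *scheme)
{
  size_t i;

  if (uri == NULL || scheme == NULL || scheme[0] == '\0')
    return false;

  for (i = 0; scheme[i] != '\0'; i++)
    {
      /* If uri ends early, its '\0' mismatches and we return before
       * reading past the terminator. */
      if (ascii_lower (uri[i]) != ascii_lower (scheme[i]))
        return false;
    }

  /* Requiring ':' right here keeps "http" from matching "https://...". */
  return uri[i] == ':';
}
```

With this, sketch_uri_has_scheme ("HTTP://example.org/", "http") is true, while asking for "http" on "https://example.org/" is false.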

> >  * Parse an URI into components
...
> Is anyone interested in writing those?

I'd suggest SoupURI (and its regression test) as a starting point. It also does the "Resolve an URI relative to a base URI" part.

It is a tiny bit specialized for http URIs. To make it fully generic you'd want to:

    1. Possibly don't split the "userinfo" into username and password.
       (An older version of the RFC had username and password, but this is
       now deprecated, and at the generic syntax level you're not supposed
       to assume that it's split into those two subfields.) OTOH, doing
       this would mean you couldn't hide passwords when converting back
       to string form...

    2. Remove the default port stuff and soup_uri_equal(). (Both are scheme
       specific, and glib isn't going to know about every scheme, and it
       would be confusing IMHO to have them work correctly for some schemes
       but not others.)

    3. Remove soup_uri_set_query_from_form(),
       soup_uri_set_query_from_fields(), and the just_path_and_query argument
       to soup_uri_to_string(), which are all about doing HTTP, not about
       parsing URIs.

Comment 5 Christian Neumair 2009-05-18 16:15:04 UTC

>> Works with g_uri_parse_scheme () [since glib 2.16]

> If you are not planning to parse the URI into components, it would be more
> useful to have

>    gboolean g_uri_has_scheme (const char *uri, const char *scheme);

> instead, because (a) it lets you save a malloc/free, (b) it doesn't require the
> caller to remember to use g_ascii_strcasecmp(), and (c) it gives us a little
> room to fudge around future URI syntax modifications.

Thanks for your feedback, you really seem to be into URI handling. When we discuss "future URI syntax modifications", we should discuss strictness of parsing. For instance, the current g_uri_parse_scheme() implementation just demands that the very beginning of the passed-in string is a valid scheme specifier, not that the whole string is a valid URI as such. However, if we actually demand that the passed-in URI is RFC 3986-compliant, we'd have to parse it as a whole, and not just its beginning. My actual idea was to add an internally used function

G_GNUC_WARN_UNUSED_RESULT gboolean
g_uri_parse (const char *uri,
             char **scheme,
             char **user_info,
             char **host,
             guint *port,
             char **path,
             char **query,
             char **fragment);

which would parse the entire URI, optionally decompose it (of course only the passed-in valid pointers would be malloced) and have a wrapper for the _parse_foo() variants.
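
To make the "only the passed-in valid pointers would be malloced" idea concrete, here is a rough sketch in plain C. Everything in it is an assumption for illustration: the name sketch_uri_split() is made up, the authority is returned as one string instead of being split into user info, host and port, and the splitting follows the lenient regular expression from RFC 3986 Appendix B, so it does not validate anything.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Copies [start, end) into *out, but only if the caller asked for it. */
static bool
dup_range (char **out, const char *start, const char *end)
{
  size_t len = (size_t) (end - start);

  if (out == NULL)
    return true;
  *out = malloc (len + 1);
  if (*out == NULL)
    return false;
  memcpy (*out, start, len);
  (*out)[len] = '\0';
  return true;
}

/* Hypothetical function: splits uri without validating it. Absent
 * components are left as NULL; on failure the caller frees whatever
 * outputs are already non-NULL. */
bool
sketch_uri_split (const char *uri, char **scheme, char **authority,
                  char **path, char **query, char **fragment)
{
  const char *p = uri, *end;

  if (scheme) *scheme = NULL;
  if (authority) *authority = NULL;
  if (path) *path = NULL;
  if (query) *query = NULL;
  if (fragment) *fragment = NULL;

  /* scheme: non-empty text before a ':' that precedes any '/', '?' or '#' */
  end = p + strcspn (p, ":/?#");
  if (*end == ':' && end > p)
    {
      if (!dup_range (scheme, p, end))
        return false;
      p = end + 1;
    }

  /* authority: introduced by "//", ends at '/', '?' or '#' */
  if (p[0] == '/' && p[1] == '/')
    {
      p += 2;
      end = p + strcspn (p, "/?#");
      if (!dup_range (authority, p, end))
        return false;
      p = end;
    }

  /* path: always present, possibly empty */
  end = p + strcspn (p, "?#");
  if (!dup_range (path, p, end))
    return false;
  p = end;

  if (*p == '?')
    {
      p++;
      end = p + strcspn (p, "#");
      if (!dup_range (query, p, end))
        return false;
      p = end;
    }

  if (*p == '#')
    return dup_range (fragment, p + 1, p + 1 + strlen (p + 1));

  return true;
}
```

Passing NULL for every output turns this into a pure walk over the string, but because nothing is validated it would not be a sound basis for an is-valid check; a strict parser would have to reject malformed input at each step.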

Talking about g_uri_has_scheme (), wouldn't

#define g_uri_is_valid(uri) g_uri_parse(uri, NULL, NULL, NULL, NULL, NULL, NULL, NULL)?

and the user-written code

  g_uri_is_valid (uri) && (strncmp (uri, "scheme", strlen(scheme)) == 0)

be equivalent?

However, what about GVFS URIs? Are they all RFC 3986-compliant (i.e. a syntactical subset)? I remember Alex saying that they are not really comparable to classical URIs.

Best regards,
 Christian Neumair

Comment 6 Dan Winship 2009-05-18 16:45:45 UTC

(In reply to comment #5)
> My actual idea was to add an internally used function
> 
> G_GNUC_WARN_UNUSED_RESULT gboolean
> g_uri_parse (const char *uri,
>              char **scheme,
>              char **user_info,
>              char **host,
>              guint *port,
>              char **path,
>              char **query,
>              char **fragment);

Yeah, you definitely want to parse the URI fully. Not sure if the right way to do that is to parse it into multiple variables like that, or to have a struct for the decomposed form like SoupURI (and gvfs's internal GDecodedUri).

> Talking about g_uri_has_scheme (), wouldn't
> 
> #define g_uri_is_valid(uri) g_uri_parse(uri, NULL, NULL, NULL, NULL, NULL,
> NULL, NULL)?
> 
> and the user-written code
> 
>   g_uri_is_valid (uri) && (strncmp (uri, "scheme", strlen(scheme)) == 0)
> 
> be equivalent?

No, because you forgot to use g_ascii_strncasecmp() :), and because the scheme name might have "scheme" as a prefix but have additional letters after that (if you ask if the URI has scheme "http", you don't want "https" to match).
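
Both failure modes are easy to reproduce. The function below is just the quoted prefix comparison written out as a helper (the name naive_has_scheme() is made up for this illustration):

```c
#include <stdbool.h>
#include <string.h>

/* The quoted check, spelled out: a plain, case-sensitive prefix comparison. */
bool
naive_has_scheme (const char *uri, const char *scheme)
{
  return strncmp (uri, scheme, strlen (scheme)) == 0;
}
```

naive_has_scheme ("https://example.org/", "http") returns true even though the scheme is "https", and naive_has_scheme ("HTTP://example.org/", "http") returns false even though schemes are case-insensitive.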

> However, what about GVFS URIs? Are they all RFC 3986-compliant (i.e. a
> syntactical subset)? I remember Alex saying that they are not really comparable
> to classical URIs.

IIRC, the problem is primarily with semantics, not syntax. Eg, with a "real" ftp URI, ftp://foo.com/bar.txt means "bar.txt in whatever the current directory is after you connect to the server", whereas in gvfs, it means "bar.txt in the root directory of the ftp server". (IIRC)

Also, I forget how gvfs deals with character encoding. I think it assumes/requires that everything is UTF-8. But that also reminds me that SoupURI doesn't deal with "IRI"s (Internationalized URIs), and gurifuncs might want to deal with that.

Comment 7 Alexander Larsson 2010-09-14 14:42:03 UTC

GVfs does not "deal" with character encoding. It assumes uris decode/encode into raw bytes, sidestepping the character encoding (although it does some charset encoding handling in the display name attribute handling, but that is an i/o function beside the raw uri handling).

Comment 8 Mikkel Kamstrup Erlandsen 2011-02-01 08:52:00 UTC

Please can we have a real GObject GURI or something so we can have a ref counted object. It also makes a tonne of sense from a coherency POV now that we have a GFile.

A proper GObject also lets us add (overridable) convenience functions analogous to the g_file_() ones; in particular:

  GInputStream* g_uri_read (...);

Comment 9 Dan Winship 2011-02-01 13:34:10 UTC

No, that is so entirely a different bug. This is about parsing and reassembling URIs. This API would be used by the API you talk about, but there is no reason to make them part of the same API, any more than we want g_basename and g_build_path to be part of GFile.

Comment 10 Mikkel Kamstrup Erlandsen 2011-02-01 14:48:57 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > My actual idea was to add an internally used function
> > 
> > G_GNUC_WARN_UNUSED_RESULT gboolean
> > g_uri_parse (const char *uri,
> >              char **scheme,
> >              char **user_info,
> >              char **host,
> >              guint *port,
> >              char **path,
> >              char **query,
> >              char **fragment);
> 
> Yeah, you definitely want to parse the URI fully. Not sure if the right way to
> do that is to parse it into multiple variables like that, or to have a
> struct for the decomposed form like SoupURI (and gvfs's internal GDecodedUri).

Dan, sorry for being unclear, my comment was meant mostly as a reaction to the above. What I was trying to root for was something similar to:

 GURI*        g_uri_new (const gchar *uri, GError **error);
 const gchar* g_uri_get_scheme (GURI *uri);
 const gchar* g_uri_get_user (GURI *uri);
 ... etc ...

Focussing on API coherency I don't think it makes sense to regard "parsing and reassembling URIs" as completely disjoint from what else you might want to do on a URI.

Comment 11 Antono Vasiljev 2012-01-06 19:24:26 UTC

Anyone working on this?

Dan, what stops glib devs from merging this branch?

https://github.com/danwinship/glib/commits/guri


I found one more glib based library for uri parsing (maybe not as idiomatic as Dan's version)

https://github.com/toffaletti/libguri

Comment 12 Dan Winship 2012-01-06 19:49:34 UTC

(In reply to comment #11)
> Anyone working on this?

I'm not currently actively working on this, and I don't know of anyone else who is.

> Dan, what stops glib devs from merging this branch?

AFAIR, that branch does not actually compile. And I don't claim that the API that's currently there is in any sense "right".

One thing that slowed me down is that it turns out it's actually really hard to make this fully generic. You either have to not automatically handle %-decoding for the user and then make them parse certain subfields themselves (which is lame) or else you need a zillion flags to indicate particular special parsing behaviors for different URI schemes (which is lame). Or keep track of both encoded and unencoded versions of each component so the caller can automatically get the decoded ones for "simple" fields but is still able to reparse the annoying fields themselves... or something...

Comment 13 Antono Vasiljev 2012-01-15 17:51:42 UTC

I think we should use the Pareto principle[1] here and make GUri usable in 80% of cases. For the other 20% there should be some flags or additional method calls from developers.

I believe our 80% consists of http(s)://, file://, ftp://, mailto: and maybe some others.

I've rebased your branch on top of current master and made it buildable. Also I've added simple unit tests for HTML5 parser. I think we should define this 80% of cases and define a nice parser API via unit tests.

As for me this 80% of cases is enough to add GUri to GLib.

What do you think, guys?

Let's solve this bug.

[1]: http://en.wikipedia.org/wiki/Pareto_principle

Comment 14 Antono Vasiljev 2012-01-15 17:52:21 UTC

Oh. And link to my branch: https://github.com/antono/glib/tree/guri2

Comment 15 Marc-Andre Lureau 2012-04-18 16:08:57 UTC

hey, I would love to get that, along perhaps with simple ipv4/ipv6 parsing check

Comment 16 Marc-Andre Lureau 2012-08-24 11:03:45 UTC

is someone still working on this?

Comment 17 Antono Vasiljev 2012-08-27 18:10:52 UTC

Probably the most up-to-date version is here:

https://github.com/chergert/mongo-glib/blob/master/cut-n-paste/guri.c

But no one is preparing this for merge.

Comment 18 Antono Vasiljev 2012-08-27 18:12:40 UTC

Also, Christian Hergert has some ideas:

https://github.com/chergert/guri

Comment 19 Marc-Andre Lureau 2013-08-09 08:26:43 UTC

Also the one from qemu, which inherits from libxml2 and libvirt:

http://git.qemu.org/?p=qemu.git;a=blob;f=util/uri.c

Comment 20 Dan Winship 2014-11-22 22:48:37 UTC

Someone on another bug mentioned GUri and I realized I should probably dump my work-in-progress since I seem unlikely to ever finish it... It is now rebased and pushed to wip/danw/guri (on git.gnome.org).

As compared to the earlier version, this has more extensive API, to support both "I want a GUri structure" use cases and "I just want to split it into other strings" use cases (and likewise the "I want to assemble a valid URI string from these pieces" case, which seems to be pretty common, and which is not supported well by the older API, or SoupURI).

It is, at least theoretically, working and ready to land (well, except that you should "git rm glib/guri-notes.txt" first). But it seemed like it wouldn't make sense to land it until someone had done test ports of some of the existing URI-using code in GNOME (eg, libsoup, gvfs, multiple places in evolution) to make sure it really is what we want, API-wise. (The libsoup and evolution uses involve public APIs, so porting them to use GUri is likely to be messy, since it actually has to map between GUri and their existing APIs. Porting gvfs ought to be a little cleaner...)

Comment 21 Sebastian Dröge (slomo) 2014-11-23 21:06:36 UTC

FWIW, we have something like this in GStreamer too now: http://cgit.freedesktop.org/gstreamer/gstreamer/tree/gst/gsturi.h#n191

Comment 22 Marc-Andre Lureau 2015-02-19 13:00:30 UTC

Some of the differences between GstUri and GUri:

- guri has gerrors
- it seems guri implicitly normalizes, gsturi not
- gsturi can "join" a reference URI onto a base URI (vs only g_uri_parse_relative)
- gsturi allows to compare, copy and modify
- gsturi has more path and query functions

Sebastian, gst_uri_set_path() == gst_uri_set_path_string()

I'd consider adding more functions to copy and modify GUri. I'd leave compare out.

In GstUri, I doubt the path and query manipulation functions are so useful, for example GHashTable API is enough for query.

Comment 23 Marc-Andre Lureau 2015-02-19 13:30:17 UTC

more differences:

- it seems gsturi implicitly unescapes, guri not
- guri has more unescape functions (string, segment, bytes), and gsturi only relies on g_uri_unescape

Comment 24 Dan Winship 2015-02-19 13:53:28 UTC

(In reply to Marc-Andre Lureau from comment #22)
> - gsturi can "join" a reference URI onto a base URI (vs only
> g_uri_parse_relative)

A GUri always represents an absolute URI, so as it is now you couldn't have a version of g_uri_parse_relative() that took two GUris rather than a GUri and a string.

Does that API actually get used in GstUri?

> - gsturi allows to compare, copy and modify

Comparing can't happen in a generic URI API, because comparison rules are scheme-specific. (default ports, default path, default parameters, case sensitivity, etc)

SoupURI is modifiable, although generally URIs only ever get modified as part of initially building them. Eg:

    port = soup_server_get_port (test_server);
    test_uri = soup_uri_new ("http://localhost");
    soup_uri_set_port (test_uri, port);

But even though no one ever *actually* modifies URIs after building them, we still end up having to make copies all the time, just in case someone did modify one. So it has always seemed to me that having immutable refcounted URIs would be better, memory-management-wise, as long as you also had enough good URI-building functions that you didn't need to build them in multiple steps. Eg:

    port = soup_server_get_port (test_server);
    test_uri = g_uri_build (G_URI_FLAGS_NONE,
                            "http", NULL, "localhost", port,
                            NULL, NULL, NULL);

Maybe not actually an improvement... I haven't tried writing much code with GUri, so maybe it would turn out that this idea was wrong.
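
For what it's worth, the immutable-and-refcounted idea can be sketched in a few lines of plain C. The SketchUri type and the sketch_uri_*() names below are invented for illustration and say nothing about how GUri is actually laid out; the reference count is not atomic and allocation failures inside the build function are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: every field is set once at build time, never after. */
typedef struct
{
  unsigned int ref_count;
  char *scheme;
  char *host;
  int port;                     /* -1 means "no port given" */
  char *path;
} SketchUri;

static char *
dup_or_null (const char *s)
{
  size_t n;
  char *copy;

  if (s == NULL)
    return NULL;
  n = strlen (s) + 1;
  copy = malloc (n);
  if (copy != NULL)
    memcpy (copy, s, n);
  return copy;
}

/* Builds the whole URI in one step, so no setters are needed afterwards. */
SketchUri *
sketch_uri_build (const char *scheme, const char *host, int port,
                  const char *path)
{
  SketchUri *uri = calloc (1, sizeof *uri);

  if (uri == NULL)
    return NULL;
  uri->ref_count = 1;
  uri->scheme = dup_or_null (scheme);
  uri->host = dup_or_null (host);
  uri->port = port;
  uri->path = dup_or_null (path);
  return uri;
}

/* Sharing is a reference bump instead of a deep copy. */
SketchUri *
sketch_uri_ref (SketchUri *uri)
{
  uri->ref_count++;
  return uri;
}

void
sketch_uri_unref (SketchUri *uri)
{
  if (--uri->ref_count > 0)
    return;
  free (uri->scheme);
  free (uri->host);
  free (uri->path);
  free (uri);
}

const char *
sketch_uri_get_host (const SketchUri *uri)
{
  return uri->host;
}
```

Because there are no setters, a holder of a reference can rely on the fields never changing, which is what removes the need for defensive copies.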


(In reply to Marc-Andre Lureau from comment #23)
> - it seems gsturi implicitly unescapes, guri not

It depends on whether you pass G_URI_ENCODED in the flags. There are situations where unescaping will change the meaning of the URI, so it has to be avoided.

> - guri has more unescape functions (string, segment, bytes), and gsturi only
> relies on g_uri_unescape

The string and segment functions already exist in glib and were just moved to guri.h from gurifuncs.h. The bytes function is to address bug 620417.

Comment 25 Tim-Philipp Müller 2015-02-19 13:54:48 UTC

Just a random comment: if I remember correctly in GNet it was always a bit painful to deal with URIs because while it provided functions to escape/unescape them, it was never clear what the 'current state' was, one would have to track that externally, and also it was/is not always clear what one may get as input from certain places (even if it should be of course).

Comment 26 Marc-Andre Lureau 2015-02-19 14:01:49 UTC

I was about to make a similar comment as Tim:

Dan, in the documentation, could you describe what GUri does implicitly wrt "normalize" and "unescape"? (normalize in gsturi also deals with path resolution for ex)

It's a bit unfortunate if GUri and GstUri end up with different implicit rules, it's already confusing enough :)

Comment 27 Sebastian Dröge (slomo) 2015-02-19 14:05:18 UTC

(In reply to Marc-Andre Lureau from comment #26)

> It's a bit unfortunate if GUri and GstUri end up with different implicit
> rules, it's already confusing enough :)

See my comment here :) https://bugzilla.gnome.org/show_bug.cgi?id=725221#c28
We didn't have GstUri in a public release yet, so can still change it in any way.

Comment 28 Marc-Andre Lureau 2015-02-19 14:08:56 UTC

(In reply to Dan Winship from comment #24)
> (In reply to Marc-Andre Lureau from comment #22)
> > - gsturi allows to compare, copy and modify
> 
> Comparing can't happen in a generic URI API, because comparison rules are
> scheme-specific. (default ports, default path, default parameters, case
> sensitivity, etc)

Perhaps have a basic 1-1 compare function with extra flags? This could be considered as a separate later bug imho.

>     port = soup_server_get_port (test_server);
>     test_uri = g_uri_build (G_URI_FLAGS_NONE,
>                             "http", NULL, "localhost", port,
>                             NULL, NULL, NULL);
> 
> Maybe not actually an improvement... I haven't tried writing much code with
> GUri, so maybe it would turn out that this idea was wrong.
> 

I agree with the rationale for immutable, however I would consider a function to build from an existing URI, similar to gst_uri_new_with_base(uri, scheme, userinfo, host, port...).

Then, it would probably be worth adding a function to build back a query string from a HashTable.

Comment 29 Dan Winship 2015-02-19 14:16:59 UTC

"normalization" in guri just means unescaping characters where it's guaranteed that the escaping is unnecessary. eg, "%41" can always be replaced with "A", regardless of the scheme. (But "%2F" can't always be replaced with "/", because that might change the meaning in some cases.)

Not sure what you mean about path resolution. g_uri_parse_relative() / g_uri_resolve_relative() do the relative path handling stuff, but nothing else ever modifies path.

I think the docs are pretty clear about when strings are and aren't %-encoded... Eg:

 * If @flags contains %G_URI_ENCODED, then `%`-encoded characters in
 * @uri_string will remain encoded in the output strings. (If not,
 * then all such characters will be decoded.)
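
As an illustration of the normalization rule at the top of this comment (decode an escape only when that can never change the meaning), here is a standalone sketch. It assumes the "always safe" set is exactly the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~'); whether guri.c draws the line in the same place is not stated here, and the function name is made up.

```c
/* Returns the value of a hex digit, or -1 if c is not one. */
static int
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* RFC 3986 "unreserved" characters: escaping these is never required. */
static int
is_unreserved (int c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: rewrites s in place, decoding %XX only when it
 * encodes an unreserved character. Everything else is copied unchanged. */
char *
sketch_normalize_unreserved (char *s)
{
  char *in = s, *out = s;
  int hi, lo;

  while (*in != '\0')
    {
      if (in[0] == '%' &&
          (hi = hex_value (in[1])) >= 0 &&
          (lo = hex_value (in[2])) >= 0)
        {
          int c = hi * 16 + lo;

          if (is_unreserved (c))
            *out++ = (char) c;          /* "%41" becomes "A" */
          else
            {
              *out++ = in[0];           /* "%2F" stays "%2F" */
              *out++ = in[1];
              *out++ = in[2];
            }
          in += 3;
        }
      else
        *out++ = *in++;
    }

  *out = '\0';
  return s;
}
```

Applied to the string "%41%2Fb%7e" this yields "A%2Fb~": the two unreserved escapes are decoded and the escaped slash is left alone.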

Comment 30 Dan Winship 2015-02-19 14:25:43 UTC

(In reply to Marc-Andre Lureau from comment #28)
> I agree with the rationale for immutable, however I would consider a
> function to build from an existing URI, similar to
> gst_uri_new_with_base(uri, scheme, userinfo, host, port...).

That seems entirely plausible

> Then, it would probably be worth adding a function to build back a query
> string from a HashTable.

Yes. Although in libsoup I ended up adding a GData**->query-string function too (https://developer.gnome.org/libsoup/stable/libsoup-2.4-HTML-Form-Support.html#soup-form-encode-datalist) because some web APIs care about the order the parameters get serialized in.
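
A sketch of the order-preserving variant, independent of libsoup and GLib: the fields come in as a plain array, so they are serialized in exactly the order given, which a hash table cannot promise. SketchField and sketch_query_from_fields() are made-up names, and spaces are written as "%20" here rather than the "+" that HTML form encoding uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair; the array order is the output order. */
typedef struct
{
  const char *name;
  const char *value;
} SketchField;

/* Appends s to out, %-encoding everything except unreserved characters.
 * Returns the new end of the output. */
static char *
append_encoded (char *out, const char *s)
{
  static const char hex[] = "0123456789ABCDEF";

  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;

      if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
          (c >= '0' && c <= '9') ||
          c == '-' || c == '.' || c == '_' || c == '~')
        *out++ = (char) c;
      else
        {
          *out++ = '%';
          *out++ = hex[c >> 4];
          *out++ = hex[c & 15];
        }
    }
  return out;
}

/* Returns a newly allocated "a=1&b=2" style string, or NULL on
 * allocation failure. The caller frees the result. */
char *
sketch_query_from_fields (const SketchField *fields, size_t n_fields)
{
  size_t i, max = 1;            /* 1 for the trailing '\0' */
  char *query, *out;

  /* Worst case: every byte becomes "%XX", plus '=' and '&' per field. */
  for (i = 0; i < n_fields; i++)
    max += 3 * (strlen (fields[i].name) + strlen (fields[i].value)) + 2;

  query = out = malloc (max);
  if (query == NULL)
    return NULL;

  for (i = 0; i < n_fields; i++)
    {
      if (i > 0)
        *out++ = '&';
      out = append_encoded (out, fields[i].name);
      *out++ = '=';
      out = append_encoded (out, fields[i].value);
    }

  *out = '\0';
  return query;
}
```

For the fields { "z", "1" } and { "a", "x y&z" }, in that order, the result is "z=1&a=x%20y%26z".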

Comment 31 Marc-Andre Lureau 2015-02-19 14:30:50 UTC

(In reply to Dan Winship from comment #29)
> "normalization" in guri just means unescaping characters where it's
> guaranteed that the escaping is unnecessary. eg, "%41" can always be
> replaced with "A", regardless of the scheme. (But "%2F" can't always be
> replaced with "/", because that might change the meaning in some cases.)
> 
> Not sure what you mean about path resolution. g_uri_parse_relative() /
> g_uri_resolve_relative() do the relative path handling stuff, but nothing
> else ever modifies path.

Ok, why not "normalize paths" too (implicitly or not), that is remove unnecessary "." and ".." ?

> I think the docs are pretty clear about when strings are and aren't
> %-encoded... Eg:
> 
>  * If @flags contains %G_URI_ENCODED, then `%`-encoded characters in
>  * @uri_string will remain encoded in the output strings. (If not,
>  * then all such characters will be decoded.)

Sorry, I was grepping for "unescape".. that's clear enough. thanks

Comment 32 Marc-Andre Lureau 2015-02-19 14:42:40 UTC

(In reply to Tim-Philipp Müller from comment #25)
> Just a random comment: if I remember correctly in GNet it was always a bit
> painful to deal with URIs because while it provided functions to
> escape/unescape them, it was never clear what the 'current state' was, one
> would have to track that externally, and also it was/is not always clear
> what one may get as input from certain places (even if it should be of
> course).

It seems GUri could also use a g_uri_get_flags() to check encoding status, so you can have preconditions on !G_URI_ENCODED for ex.

Comment 33 Dan Winship 2015-02-19 15:03:43 UTC

(In reply to Marc-Andre Lureau from comment #31)
> Ok, why not "normalize paths" too (implicitly or not), that is remove
> unnecessary "." and ".." ?

RFC 3986 only says that this should be done as part of the process of resolving a relative URI against a base URI. Maybe it's implied that you can/should do this when parsing as well? What do other URL libraries do?
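
For reference, the step in question is the "remove_dot_segments" routine of RFC 3986 section 5.2.4. Below is a standalone in-place sketch of that algorithm using only the C standard library; it is written from the RFC text, not copied from guri.c, and the function name is made up.

```c
#include <string.h>

/* Drops the last segment already written to the output, together with
 * its leading '/', if there is one. */
static char *
pop_segment (char *start, char *out)
{
  while (out > start)
    {
      out--;
      if (*out == '/')
        break;
    }
  return out;
}

/* Hypothetical helper following RFC 3986 section 5.2.4, rewriting path
 * in place. The output never overtakes the input, so this is safe. An
 * empty path skips the loop entirely. */
char *
sketch_remove_dot_segments (char *path)
{
  char *in = path, *out = path;

  while (*in != '\0')
    {
      if (strncmp (in, "../", 3) == 0)          /* A: leading "../" */
        in += 3;
      else if (strncmp (in, "./", 2) == 0)      /* A: leading "./" */
        in += 2;
      else if (strncmp (in, "/./", 3) == 0)     /* B: "/./" becomes "/" */
        in += 2;
      else if (strcmp (in, "/.") == 0)          /* B: final "/." becomes "/" */
        {
          in += 1;
          *in = '/';
        }
      else if (strncmp (in, "/../", 4) == 0)    /* C: "/../" pops a segment */
        {
          in += 3;
          out = pop_segment (path, out);
        }
      else if (strcmp (in, "/..") == 0)         /* C: final "/.." pops too */
        {
          in += 2;
          *in = '/';
          out = pop_segment (path, out);
        }
      else if (strcmp (in, ".") == 0 || strcmp (in, "..") == 0)
        in += strlen (in);                      /* D: lone "." or ".." */
      else
        {
          /* E: move one segment, with its leading '/' if any, to the output */
          if (*in == '/')
            *out++ = *in++;
          while (*in != '\0' && *in != '/')
            *out++ = *in++;
        }
    }

  *out = '\0';
  return path;
}
```

The two worked examples from the RFC come out as expected: "/a/b/c/./../../g" becomes "/a/g" and "mid/content=5/../6" becomes "mid/6".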

Comment 34 Marc-Andre Lureau 2015-02-19 16:04:27 UTC

(In reply to Dan Winship from comment #33)
> (In reply to Marc-Andre Lureau from comment #31)
> > Ok, why not "normalize paths" too (implicitly or not), that is remove
> > unnecessary "." and ".." ?
> 
> RFC 3986 only says that this should be done as part of the process of
> resolving a relative URI against a base URI. Maybe it's implied that you
> can/should do this when parsing as well? What do other URL libraries do?

using repl.it, I checked:

node url.parse: keep path
python urlparse: keep path
go net/url: keep path
java net URL: keep path
ruby uri: keep path

Btw, I just found https://url.spec.whatwg.org/ which seems to be a more recent attempt to standardize URL. Would be worth checking how it aligns with this API

Comment 35 Dan Winship 2015-02-19 16:21:15 UTC

The WHATWG spec is specifically about URLs in a web context (and is referenced from the HTML5 spec, IIRC). I had thought about having a GUriFlags value to specify using that spec rather than RFC 3986, but never implemented it.

There has also been talk about revising/updating 3986 in the IETF, but I'm not sure if that actually started yet.

Comment 36 Marc-Andre Lureau 2015-02-19 16:41:35 UTC

valgrind complains about:

/uri/parsing/relative: ==4258== Invalid read of size 1
==4258==    at 0x4F4B293: remove_dot_segments (guri.c:1041)

Trivial fix with:
 
+++ b/glib/guri.c
@@ -1037,6 +1037,9 @@ remove_dot_segments (gchar *path)
 {
   gchar *p, *q;
 
+  if (!*path)
+    return;
+

Comment 37 Marc-Andre Lureau 2015-03-26 15:32:58 UTC

*** Bug 550110 has been marked as a duplicate of this bug. ***

Comment 38 Marc-Andre Lureau 2015-03-27 15:54:16 UTC

Created attachment 300470
guri: new URI parsing and generating functions

Add a set of new URI parsing and generating functions, including a new
parsed-URI type GUri. Move all the code from gurifuncs.c into guri.c,
reimplementing some of those functions (and
g_string_append_uri_encoded()) in terms of the new code.

Comment 39 Marc-Andre Lureau 2015-03-27 16:00:47 UTC

I just attached an updated version of Dan GUri for easy review, I fixed a few things:
- added tests, coverage at 98%
- fixed bug mentioned above
- fixed some misc bugs found during testing
- small leak in tests
- added preconditions
- added g_uri_get_flags()
- added autoptr and boxed type
- renamed GUriFlags G_URI_FLAGS_..
- added _NONE for 0 flags
- updated to 2.46 macros

I have a wip patch for spice-gtk and I am planning to look at gvfs during the weekend.

Comment 40 Marc-Andre Lureau 2015-03-27 16:04:14 UTC

Created attachment 300472
guri: new URI parsing and generating functions

Add a set of new URI parsing and generating functions, including a new
parsed-URI type GUri. Move all the code from gurifuncs.c into guri.c,
reimplementing some of those functions (and
g_string_append_uri_encoded()) in terms of the new code.

Comment 41 Marc-Andre Lureau 2015-03-27 16:07:47 UTC

Created attachment 300473
guri: new URI parsing and generating functions

Add a set of new URI parsing and generating functions, including a new
parsed-URI type GUri. Move all the code from gurifuncs.c into guri.c,
reimplementing some of those functions (and
g_string_append_uri_encoded()) in terms of the new code.

Comment 42 Marc-Andre Lureau 2015-03-30 15:46:14 UTC

Created attachment 300605
guri: new URI parsing and generating functions

Add a set of new URI parsing and generating functions, including a new
parsed-URI type GUri. Move all the code from gurifuncs.c into guri.c,
reimplementing some of those functions (and
g_string_append_uri_encoded()) in terms of the new code.

Comment 43 Marc-Andre Lureau 2015-09-21 21:50:04 UTC

ping

just wanted to give interesting figures from rust cargo (https://crates.io/crates?sort=downloads):

~ 500k downloads of libc (#1)
~ 123k downloads of url (#15)

It certainly says something about how useful URL parsing is to devs.

Comment 44 Marc-Andre Lureau 2016-03-11 18:04:44 UTC

ping? Can we imagine landing GUri next cycle? or what is left?

Comment 45 Dan Winship 2016-03-13 15:17:54 UTC

(In reply to Marc-Andre Lureau from comment #44)
> ping? Can we imagine landing GUri next cycle? or what is left?

As far as I know, no one has tried porting much existing code to use this API, so we don't really have any idea if it's well-designed for the various use cases or not.

Comment 46 Kalev Lember 2016-06-22 09:40:53 UTC

Is this too late for the 3.21 cycle now?

Comment 47 Colin Walters 2017-03-10 19:24:48 UTC

One thing that'd make me more confident in this is copying it into libsoup, and rebasing libsoup's URI parsing on it.  Does that make sense?

Comment 48 Dan Winship 2017-03-13 13:37:20 UTC

FWIW, wip/danw/guri in libsoup replaces libsoup's URI-parsing/stringifying code with calls to GUri instead, but I'm not sure that should really make you more confident, since I obviously had libsoup in mind when I wrote this code :). A better test would be whether it can replace *other people's* URI-handling code.

Comment 49 Ondrej Holy 2017-03-13 15:31:36 UTC

Some patches for GVfs are already proposed, see Bug 746993.

Comment 50 Ignacio Casal Quinteiro (nacho) 2017-04-19 22:03:47 UTC

Review of attachment 300605:

Some minor comments:
 - we need to update the Since tags
 - we need to remove guri-notes

See also the minor comments inline

::: glib/guri.c
@@ +310,3 @@
+  else
+    g_free (decoded);
+  return d - (guchar *)decoded;

you free decoded and then you use it? seems bad

@@ +795,3 @@
+  return TRUE;
+
+ fail:

if you use g_clear_pointer you don't need the ifs

@@ +1246,3 @@
+
+ fail:
+  if (uri)

just do g_clear_pointer (&uri, g_uri_unref) ?

@@ +1737,3 @@
+                                   hide_fragment ? NULL : uri->fragment);
+    }
+  else

if you are returning inside the if you don't really need an else block

Comment 51 GNOME Infrastructure Team 2018-05-24 11:08:41 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/110.