GNOME Bugzilla – Bug 743195
uri: Add parsing unit test based on GNet
Last modified: 2015-01-22 21:10:03 UTC
There are a few failing tests for now which have to be fixed before we can release this API.
Created attachment 294908 [details] [review] uri: Add paring unit test based on GNet Plus some new URIs to parse. https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c
This also includes a test for a case Marc Leeman reported on IRC today: rtp://123.456.789.012:345?query (without a / to terminate authority). Which would be equivalent to: scheme://hostname:123?query for the parser.
(09:18:12 AM) den_erpel: slomo: started thinking about the uri (09:18:18 AM) den_erpel: I probably have a fix (09:18:53 AM) den_erpel: we need to look for '/' and '?'; if '?' if before '/', the eoa is the '?', otherwise the '/'
Note that the first unit test failure currently is that > scheme://[ipv6address]/path does not drop the brackets around the address when asking for the host.
I'm having a look at this. Yes, the parsing code assumes that a path follows an authority part, that's easily modified to terminate the authority part at a query or fragment too, so scheme://user@host:123?query and scheme://user@host:123#fragment would also work. The default was to leave the square brackets in the host string, I can change that to strip them. Although it's debatable how to handle "scheme://[01:23:45:67:89:ab:cd:ef:123/path" as the RFC says there must be a closing square bracket, and that the brackets only contain unreserved, sub-delim and ":". This means that the authority part would end at the "/". I'd say in that case it would be host="[01:23:45:67:89:ab:cd:ef", port=123 (since we're back to <host>:<port>) and path="/path" as the only sensible way to interpret this. I'll work my way through the test cases and post a patch to fix any problems I find.
Review of attachment 294908 [details] [review]: I found some discrepancies between the tested URI's and their expected output. ::: tests/check/gst/gsturi.c @@ -167,0 +167,103 @@ +/* Taken from the GNet unit test and extended with other URIs: + * https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c + */ ... 100 more ... The previous 4 test cases all have a scheme but are looking for a NULL scheme entry in the output @@ -167,0 +167,126 @@ +/* Taken from the GNet unit test and extended with other URIs: + * https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c + */ ... 123 more ... Paths after an authority must start with a single "/" according to RFC3986. So in this case the hostname is "hostname" with no port and a path of ":123path". @@ -167,0 +167,139 @@ +/* Taken from the GNet unit test and extended with other URIs: + * https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c + */ ... 136 more ... Wouldn't the port be 23? @@ -167,0 +167,144 @@ +/* Taken from the GNet unit test and extended with other URIs: + * https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c + */ ... 141 more ... This should probably cause a parsing error since [ is not allowed without a ] and [ is not allowed in the other host part forms in RFC3986.
Created attachment 295102 [details] [review] Fix parsing and amend tests to correct expected results. This should fix the URI parsing issues and corrects some of the expected test results from Sebastian's patch. It also adds some tests to check the string presentation of URI's, including that square brackets are added to IPv6 addresses on conversion to string to match the new stripping of brackets behaviour. This patch includes Sebastian's patch too.
I think "scheme://hostname:123path?query#fragment" is a valid URI, nothing says that the path must start with a / if I'm not mistaken. https://tools.ietf.org/html/rfc3986#section-3.3 path-abempty, path-noscheme, path-rootless and path-empty can start with almost anything The other changes you did look reasonable to me :) Thanks for working on this.
RFC 3986, Section 3, second paragraph (https://tools.ietf.org/html/rfc3986#section-3) "When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//")." So if the authority part is present ("//" [userinfo "@"] host [":" port]), then the path must be absolute or not present at all. If there's no authority part then a path may start with almost any character. Since host cannot contain ":" unless it is surrounded by square brackets, then I believe that means that "//hostname:123path" is not a valid authority part. If there's no authority part, the path cannot start with "//", so then the URI is invalid.
Ah, missed that part. Thanks :) commit d20fa132c117240ad386a5a3b472dc629407b445 Author: David Waring <david.waring@rd.bbc.co.uk> Date: Wed Jan 21 14:10:02 2015 +0000 uri: Fix new URI parsing tests based on GNet's https://bugzilla.gnome.org/show_bug.cgi?id=743195 commit ab9b66e6becfdb944b2229a14e0ead7308a7bf92 Author: David Waring <david.waring@rd.bbc.co.uk> Date: Wed Jan 21 14:09:45 2015 +0000 uri: Fix parsing issues Make host IPs in square brackets store only the IP, i.e. strip the brackets. Strip leading whitespace characters in URIs. Fail parsing when host part does not match any valid formats from RFC3986. https://bugzilla.gnome.org/show_bug.cgi?id=743195 commit 0637703fe78a6be09166cc3bc392f5421c80055d Author: Sebastian Dröge <sebastian@centricular.com> Date: Mon Jan 19 19:15:32 2015 +0100 uri: Add parsing unit test based on GNet's Plus some new URIs to parse. https://git.gnome.org/browse/archive/gnet/plain/tests/check/gnet/gneturi.c https://bugzilla.gnome.org/show_bug.cgi?id=743195