Bug 667637 – SoupURI can end up with NULL path, which is documented as not allowed

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 667637 - SoupURI can end up with NULL path, which is documented as not allowed


Summary:	SoupURI can end up with NULL path, which is documented as not allowed


Status:	RESOLVED FIXED

Product:	libsoup
Classification:	Core
Component:	Misc
Version:	2.37.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	libsoup-maint@gnome.bugs
QA Contact:	libsoup-maint@gnome.bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-01-10 16:42 UTC by Simon McVittie
Modified:	2012-02-10 14:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
SoupURI: give "foo:" a non-NULL path member (1.06 KB, patch) 2012-01-10 16:43 UTC, Simon McVittie	accepted-commit_now	Details \| Review
soup_uri_new: return a valid URI for a NULL parameter (1.58 KB, patch) 2012-01-10 16:44 UTC, Simon McVittie	rejected	Details \| Review
SoupURI: add soup_uri_is_valid() and use it for precondition checks (11.21 KB, patch) 2012-01-10 16:45 UTC, Simon McVittie	reviewed	Details \| Review
soup_uri_new: do not allow invalid URIs to be returned at all (1.17 KB, patch) 2012-01-10 16:45 UTC, Simon McVittie	accepted-commit_now	Details \| Review
soup_uri_uses_default_port: correct documentation (909 bytes, patch) 2012-01-10 16:45 UTC, Simon McVittie	accepted-commit_now	Details \| Review
Header parsing test: verify that a full URI can appear after the "GET" (1.30 KB, patch) 2012-01-10 16:46 UTC, Simon McVittie	accepted-commit_now	Details \| Review
uri-parsing test: verify that URIs are split correctly (30.31 KB, patch) 2012-01-10 16:46 UTC, Simon McVittie	reviewed	Details \| Review
uri-parsing test: add a case that was exercised by a protocol fuzzer (1.10 KB, patch) 2012-01-10 16:46 UTC, Simon McVittie	accepted-commit_now	Details \| Review
SoupServer: reject non-HTTP URIs and URIs with no host (1.24 KB, patch) 2012-01-10 17:06 UTC, Simon McVittie	accepted-commit_now	Details \| Review
[1/4] SoupProxyResolver: construct a valid SoupURI (988 bytes, patch) 2012-02-06 13:54 UTC, Simon McVittie	committed	Details \| Review
[2/4] soup_uri_copy_host: always set the path to something non-NULL (974 bytes, patch) 2012-02-06 13:55 UTC, Simon McVittie	committed	Details \| Review
[3/4] SoupURI: add SOUP_URI_IS_VALID() and use it for basic precondition checks (11.40 KB, patch) 2012-02-06 13:55 UTC, Simon McVittie	committed	Details \| Review
[4/4] soup_uri_new: do not allow invalid URIs to be returned, except via soup_uri_new (NULL) (1.26 KB, patch) 2012-02-06 13:56 UTC, Simon McVittie	committed	Details \| Review
[extra 1/2] soup_uri_new: return a URI with all required fields set, even with NULL argument (1.04 KB, patch) 2012-02-06 14:05 UTC, Simon McVittie	rejected	Details \| Review
[extra 2/2] SoupURI: insist that the URI is valid, even in setters (3.08 KB, patch) 2012-02-06 14:07 UTC, Simon McVittie	rejected	Details \| Review
soup-uri: revert some of the previously-added return-if-fails (10.58 KB, patch) 2012-02-09 15:05 UTC, Dan Winship	none	Details \| Review

Description Simon McVittie 2012-01-10 16:42:05 UTC

Similar to Bug #666316, I'm still trying to get a libsoup-based app past a HTTP protocol fuzzer without errors.

SoupURI is documented as having a mandatory scheme and path (so "foo:" is { scheme => "foo", path => "" } and "about:mozilla" is { scheme => "about", path => "mozilla" }). However, if you actually parse "foo:", you get { scheme => "foo" } with path left NULL.

This violates assumptions elsewhere: for instance, this means that if you send "GET http: HTTP/1.0" to tests/simple-httpd, you get a segfault in the code in SoupServer that avoids Bug #653258, which calls soup_uri_decode on uri->path.

The patches I'm about to attach aren't the full solution - in particular, "GET http: HTTP/1.0" still gives me a critical elsewhere - but they're a start...

Comment 1 Simon McVittie 2012-01-10 16:43:32 UTC

Created attachment 204957 [details] [review]
SoupURI: give "foo:" a non-NULL path member

SoupURI documents uri->path as being required, and got_headers in
SoupServer assumes that it's non-NULL, but in fact parsing a URI
consisting solely of a scheme ("foo:") would leave path = NULL.

Bug-NB: NB#294977
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 2 Simon McVittie 2012-01-10 16:44:34 UTC

Created attachment 204958 [details] [review]
soup_uri_new: return a valid URI for a NULL parameter

A URI without a scheme or path is considered to be invalid, but
soup_uri_new (NULL) returned a zero-filled URI. Since it's incorrect
to use a zero-filled URI without setting at least a scheme and path,
filling those in initially shouldn't cause any behaviour changes.
        
This does mean that callers assigning directly to uri->path (without
freeing any previous value) will leak, but that seems better than
crashing, and Google Code Search indicates that it's rare to do so
(the only direct assignment I found was using uri->scheme incorrectly
anyway, so they already leaked a string).

Also change the gobject-introspection annotation to say that a NULL
parameter is allowed.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 3 Simon McVittie 2012-01-10 16:45:24 UTC

Created attachment 204959 [details] [review]
SoupURI: add soup_uri_is_valid() and use it for  precondition checks

Also annotate nullable setter parameters as (allow-none), to justify
why they don't have a precondition check.

soup_uri_equal() and soup_uri_host_equal() just warn and carry on if
invalid, because invalid URIs can still be compared for equality.
Similarly, if soup_uri_uses_default_port() is used on an incomplete URI,
it still returns the same thing it used to.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 4 Simon McVittie 2012-01-10 16:45:41 UTC

Created attachment 204960 [details] [review]
soup_uri_new: do not allow invalid URIs to be returned  at all

Also document the possible NULL return.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 5 Simon McVittie 2012-01-10 16:45:56 UTC

Created attachment 204961 [details] [review]
soup_uri_uses_default_port: correct documentation

We now know the default for ftp, too.

Comment 6 Simon McVittie 2012-01-10 16:46:11 UTC

Created attachment 204962 [details] [review]
Header parsing test: verify that a full URI can appear  after the "GET"

Comment 7 Simon McVittie 2012-01-10 16:46:29 UTC

Created attachment 204963 [details] [review]
uri-parsing test: verify that URIs are split correctly

Testing the round-trip (split + reassemble) is necessary, but not
sufficient: a large proportion of SoupURI's API is in terms of the
separate parts, and this was never explicitly tested before.
 
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

Comment 8 Simon McVittie 2012-01-10 16:46:46 UTC

Created attachment 204964 [details] [review]
uri-parsing test: add a case that was exercised by a  protocol fuzzer

This was suspected to cause a crash via a NULL path; while it seems it
doesn't actually have that problem, it still seems worth testing.

Comment 9 Simon McVittie 2012-01-10 16:51:00 UTC

(In reply to comment #0)
> The patches I'm about to attach aren't the full solution - in particular, "GET
> http: HTTP/1.0" still gives me a critical elsewhere - but they're a start...

More specifically:

* "GET http:" tries to send a redirect to
  soup_uri_new_with_base ("http:", "./") but a http URI with no host is
  considered to be invalid, so it criticals

* "GET foo:" tries to send a redirect to "foo:/", which is syntactically OK
  but probably not what we want

I'm open to suggestions...

(Tested via: run simple-httpd; type HTTP into a telnet session into its listening port.)

Comment 10 Simon McVittie 2012-01-10 16:54:42 UTC

(In reply to comment #6)
> Header parsing test: verify that a full URI can appear  after the "GET"

RFC 2616 says: "To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies." - so libsoup is right to accept these on the server side, even though the RFC also says that clients MUST use an abs_path ("GET /blah HTTP/1.1") when not talking to a proxy.

Comment 11 Simon McVittie 2012-01-10 17:06:38 UTC

Created attachment 204966 [details] [review]
SoupServer: reject non-HTTP URIs and URIs with no host

This prevents a critical warning and other misbehaviour in the
simple-httpd test, when using requests like "GET http: HTTP/1.0"
or "GET foo: HTTP/1.0".

Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637
Bug-NB: NB#294977
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>

---

Having looked at RFC 2616 again, I think this is a reasonable solution to Comment #9.

Comment 12 Dan Winship 2012-01-16 23:16:57 UTC

Comment on attachment 204958 [details] [review]
soup_uri_new: return a valid URI for a NULL parameter

Scheme is required too though, and this patch doesn't initialize that. And besides, the docs already say:

 * (You will need to
 * call at least soup_uri_set_scheme() and soup_uri_set_path(), since
 * those fields are required.)

So as long as we have a return-if-fail anywhere someone would be able to shoot themselves in the foot, I don't think we should change the behavior here.

Comment 13 Dan Winship 2012-01-16 23:26:28 UTC

Comment on attachment 204959 [details] [review]
SoupURI: add soup_uri_is_valid() and use it for  precondition checks

> soup_uri_equal() and soup_uri_host_equal() just warn and carry on if
> invalid, because invalid URIs can still be compared for equality.
> Similarly, if soup_uri_uses_default_port() is used on an incomplete URI,
> it still returns the same thing it used to.

Hm... no, that doesn't seem right. If it's allowed then it shouldn't warn, and if it's not allowed, then you shouldn't get a meaningful return value back. I'd just change them to be return-if-fail as well.

>+ * soup_uri_is_valid:

I think I'd make this a macro like SOUP_URI_VALID_FOR_HTTP(). (And then that also gets rid of a problem with this patch, where if you pass in a NULL URI to a method, you'll get CRITICALs from both soup_uri_is_valid() and the method that called it.

Comment 14 Dan Winship 2012-01-16 23:29:28 UTC

Comment on attachment 204962 [details] [review]
Header parsing test: verify that a full URI can appear  after the "GET"

>+	/* It's better for this to be passed through: this means a SoupServer
>+	 * could implement ftp-over-http proxying, for instance
>+	 */
>+	{ "GET with full URI of unrecognised scheme",
>+	  "GET AbOuT: HTTP/1.1\r\n", -1,

though we should make sure somewhere that SoupServer rejects those URIs by default...

Comment 15 Dan Winship 2012-01-16 23:32:25 UTC

Comment on attachment 204963 [details] [review]
uri-parsing test: verify that URIs are split correctly

This is great, but...

>+	{ "ftp://user@host/path", "ftp://user@host/path",
>+	  { /* scheme = */ "ftp",
>+	    /* user = */ "user",
>+	    /* password = */ NULL,
>+	    /* host = */ "host",
>+	    /* port = */ 21,
>+	    /* path = */ "/path",
>+	    /* query = */ NULL,
>+	    /* fragment = */ NULL } },

the examples are gigantic now. I think I'd prefer

    { "ftp://user@host/path", "ftp://user@host/path",
      { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } },

Comment 16 Dan Winship 2012-01-16 23:33:12 UTC

Comment on attachment 204964 [details] [review]
uri-parsing test: add a case that was exercised by a  protocol fuzzer

ok, with the same modification as the other test cases

Comment 17 Dan Winship 2012-01-16 23:39:02 UTC

Comment on attachment 204966 [details] [review]
SoupServer: reject non-HTTP URIs and URIs with no host

ok, although this wrecks your "ftp proxy" use case... but it wouldn't actually be possible to implement that with SoupServer currently anyway, so I guess this makes sense for now. (We should definitely be checking at least soup_uri_is_valid() there.)

Comment 18 Simon McVittie 2012-01-17 11:31:10 UTC

(In reply to comment #12)
> Scheme is required too though, and this patch doesn't initialize that.

I'm pretty sure it returns { scheme => "about", path => "" }? ...

> So as long as we have a return-if-fail anywhere someone would be able to shoot
> themselves in the foot, I don't think we should change the behavior here.

... but OK, I'll omit this one.

(In reply to comment #13)
> I'd just change [equals/hash] to be return-if-fail as well.

OK. I'm probably not going to be able to do this immediately (working on several things in parallel), but I'll do that when I return to libsoup.

> >+ * soup_uri_is_valid:
> 
> I think I'd make this a macro like SOUP_URI_VALID_FOR_HTTP().

OK. I even wondered about imitating how GObject would look (SOUP_IS_URI()) but I think SOUP_IS_VALID is probably better.

(In reply to comment #15)
> the examples are gigantic now. I think I'd prefer
> 
>     { "ftp://user@host/path", "ftp://user@host/path",
>       { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } },

OK. 

(In reply to comment #17)
> ok, although this wrecks your "ftp proxy" use case... but it wouldn't actually
> be possible to implement that with SoupServer currently anyway

Yeah, I think SoupServer is the right layer to be rejecting non-http at the moment - as I see it, the header-parsing code implements the full HTTP RFC (including e.g. being a proxy), whereas SoupServer implements a small-enough-to-be-useful subset of the RFC.

(In reply to comment #0)
> I'm still trying to get a libsoup-based app past a HTTP
> protocol fuzzer without errors.

The people who can run the fuzzer (sadly, it's proprietary and I don't have access to just run it myself, so I get error reports a bit at a time) report that there are more error cases, so expect more patches when I return to this... :-(

Comment 19 Dan Winship 2012-01-17 13:12:12 UTC

(In reply to comment #18)
> (In reply to comment #12)
> > Scheme is required too though, and this patch doesn't initialize that.
> 
> I'm pretty sure it returns { scheme => "about", path => "" }? ...

No, the comment was about the soup_uri_new(NULL) case, not the soup_uri_new("about:") case. We do want an empty path in the latter case, but in the former, the returned URI is all NULL.

(Oh, the addition of (allow-none) to soup_uri_new() in that patch was right though... you can slip that into one of the other patches...)

Comment 20 Simon McVittie 2012-02-03 14:28:44 UTC

(In reply to comment #1)
> SoupURI: give "foo:" a non-NULL path member

Pushed

(In reply to comment #5)
> soup_uri_uses_default_port: correct documentation

Pushed

(In reply to comment #6)
> Header parsing test: verify that a full URI can appear  after the "GET"

Pushed

(In reply to comment #11)
> SoupServer: reject non-HTTP URIs and URIs with no host

Pushed

(In reply to comment #15)
> (From update of attachment 204963 [details] [review])
> This is great, but...
> 
> >+	{ "ftp://user@host/path", "ftp://user@host/path",
> >+	  { /* scheme = */ "ftp",
> >+	    /* user = */ "user",
> >+	    /* password = */ NULL,
> >+	    /* host = */ "host",
> >+	    /* port = */ 21,
> >+	    /* path = */ "/path",
> >+	    /* query = */ NULL,
> >+	    /* fragment = */ NULL } },
> 
> the examples are gigantic now. I think I'd prefer
> 
>     { "ftp://user@host/path", "ftp://user@host/path",
>       { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } },

Pushed with that change, I hope that's OK.

(In reply to comment #16)
> (From update of attachment 204964 [details] [review])
> ok, with the same modification as the other test cases

Likewise

Comment 21 Simon McVittie 2012-02-06 13:54:51 UTC

Created attachment 206883 [details] [review]
[1/4] SoupProxyResolver: construct a valid SoupURI

path = NULL is not considered valid.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637
Bug-NB: NB#294977

Comment 22 Simon McVittie 2012-02-06 13:55:15 UTC

Created attachment 206884 [details] [review]
[2/4] soup_uri_copy_host: always set the path to something non-NULL

Not doing so is considered to be invalid.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637
Bug-NB: NB#294977

Comment 23 Simon McVittie 2012-02-06 13:55:49 UTC

Created attachment 206885 [details] [review]
[3/4] SoupURI: add SOUP_URI_IS_VALID() and use it for basic  precondition checks

In this patch, field setters don't have precondition checks for the
validity of the URI object itself, only a non-NULL check, to avoid
breaking existing code if it calls soup_uri_new (NULL) and then sets
fields in an unexpected order:

    uri = soup_uri_new (NULL);                    /* uri is invalid */
    soup_uri_set_host (uri, "www.google.com");
    soup_uri_set_query (uri, "q=badgers");
    soup_uri_set_scheme (uri, "http");            /* still invalid... */
    soup_uri_set_path (uri, "/search");           /* finally valid */

Also annotate nullable setter parameters as (allow-none), to justify
why they don't have a precondition check.

Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637
Bug-NB: NB#294977

Comment 24 Simon McVittie 2012-02-06 13:56:06 UTC

Created attachment 206886 [details] [review]
[4/4] soup_uri_new: do not allow invalid URIs to be returned,  except via soup_uri_new (NULL)

Also document the possible NULL return.
    
Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637
Bug-NB: NB#294977

Comment 25 Simon McVittie 2012-02-06 14:05:20 UTC

Created attachment 206887 [details] [review]
[extra 1/2] soup_uri_new: return a URI with all required fields set,  even with NULL argument

This means that every URI returned by any public function is "valid",
at the cost that if a caller of soup_uri_new() assigns directly to
uri->path (as opposed to using soup_uri_set_path()), the empty string
will be leaked.

---

This and the patch that will follow are optional; if you don't want them, fine, but I think it'd be clearer what "valid" means for a SoupURI if there isn't a subset of the API that can't insist on it.

(In reply to comment #19)
> No, the comment was about the soup_uri_new(NULL) case, not the
> soup_uri_new("about:") case. We do want an empty path in the latter case, but
> in the former, the returned URI is all NULL.

The patch in Comment #12 did make soup_uri_new(NULL) exactly equivalent to soup_uri_new("about:"), but here's a version which is a bit more explicit.

Comment 26 Simon McVittie 2012-02-06 14:07:03 UTC

Created attachment 206888 [details] [review]
[extra 2/2] SoupURI: insist that the URI is valid, even in setters

Now that all functions return a valid URI, we can be more strict.

---

Requires [extra 1/2], otherwise you can't convert what soup_uri_new(NULL) returns into something valid (or at least not without poking around in the struct fields).

Comment 27 Dan Winship 2012-02-06 22:16:21 UTC

OK, I just pushed most of these myself because I'm about to do a
release. I'm leaving soup_uri_new(NULL) the way it is, so I'm not
committing the last two.

Thanks for the patches

Comment 28 Dan Winship 2012-02-09 15:05:24 UTC

(In reply to comment #23)
> Created an attachment (id=206885) [details] [review]
> [3/4] SoupURI: add SOUP_URI_IS_VALID() and use it for basic  precondition
> checks

this is causing problems with code that was technically-broken-but-still-working-fine before. (https://bugs.launchpad.net/ubuntu/+source/libsoup2.4/+bug/928820). Attaching a proposed patch.

(The soup_uri_new_with_base() change is a bit weird, but the idea is to allow the caller to call it with base->path==NULL, while still letting the rest of the function not have to worry about that.)

Comment 29 Dan Winship 2012-02-09 15:05:37 UTC

Created attachment 207195 [details] [review]
soup-uri: revert some of the previously-added return-if-fails

Although it has always been documented that a SoupURI must have a
non-NULL path, nothing ever enforced this, and most methods checked
whether it was NULL before looking at it anyway. So lots of existing
code was getting this wrong, and is now breaking because of the
"g_return_if_fail (SOUP_URI_IS_VALID (uri))" checks.

So, change most of those to just g_warn_if_fail() (while adding back
the old return-if-fail !NULL checks), but also fix soup_uri_set_path()
and soup_uri_new_with_base() to handle NULL paths more sanely (after
warning). Also, allow calling the getters on invalid URIs.

Add a new test to uri-testing to make sure that URIs created with
soup_uri_new(NULL) behave as expected at each step of the way...

https://bugzilla.gnome.org/show_bug.cgi?id=669479

Comment 30 Simon McVittie 2012-02-09 16:53:13 UTC

Review of attachment 207195 [details] [review]:

Looks fine, except:

::: libsoup/soup-uri.c
@@ +557,3 @@
 {
+	g_warn_if_fail (SOUP_URI_IS_VALID (uri1));
+	g_warn_if_fail (SOUP_URI_IS_VALID (uri2));

Not that I'm a reviewer or anything, but this will crash if uri1 or uri2 is NULL. Perhaps this?


	g_warn_if_fail (SOUP_URI_IS_VALID (uri1));
	g_warn_if_fail (SOUP_URI_IS_VALID (uri2));
	if (!uri1 || !uri2)
		return (uri1 == uri2); /* already warned */

Comment 31 Dan Winship 2012-02-10 14:11:08 UTC

ah, good catch. I added return-if-fails for != NULL. (Since it would have crashed in that case before, no one could have been doing it.)