GNOME Bugzilla – Bug 667637
SoupURI can end up with NULL path, which is documented as not allowed
Last modified: 2012-02-10 14:11:08 UTC
Similar to Bug #666316, I'm still trying to get a libsoup-based app past a HTTP protocol fuzzer without errors. SoupURI is documented as having a mandatory scheme and path (so "foo:" is { scheme => "foo", path => "" } and "about:mozilla" is { scheme => "about", path => "mozilla" }). However, if you actually parse "foo:", you get { scheme => "foo" } with path left NULL. This violates assumptions elsewhere: for instance, this means that if you send "GET http: HTTP/1.0" to tests/simple-httpd, you get a segfault in the code in SoupServer that avoids Bug #653258, which calls soup_uri_decode on uri->path. The patches I'm about to attach aren't the full solution - in particular, "GET http: HTTP/1.0" still gives me a critical elsewhere - but they're a start...
Created attachment 204957 [details] [review] SoupURI: give "foo:" a non-NULL path member SoupURI documents uri->path as being required, and got_headers in SoupServer assumes that it's non-NULL, but in fact parsing a URI consisting solely of a scheme ("foo:") would leave path = NULL. Bug-NB: NB#294977 Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Created attachment 204958 [details] [review] soup_uri_new: return a valid URI for a NULL parameter A URI without a scheme or path is considered to be invalid, but soup_uri_new (NULL) returned a zero-filled URI. Since it's incorrect to use a zero-filled URI without setting at least a scheme and path, filling those in initially shouldn't cause any behaviour changes. This does mean that callers assigning directly to uri->path (without freeing any previous value) will leak, but that seems better than crashing, and Google Code Search indicates that it's rare to do so (the only direct assignment I found was using uri->scheme incorrectly anyway, so they already leaked a string). Also change the gobject-introspection annotation to say that a NULL parameter is allowed. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Created attachment 204959 [details] [review] SoupURI: add soup_uri_is_valid() and use it for precondition checks Also annotate nullable setter parameters as (allow-none), to justify why they don't have a precondition check. soup_uri_equal() and soup_uri_host_equal() just warn and carry on if invalid, because invalid URIs can still be compared for equality. Similarly, if soup_uri_uses_default_port() is used on an incomplete URI, it still returns the same thing it used to. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Created attachment 204960 [details] [review] soup_uri_new: do not allow invalid URIs to be returned at all Also document the possible NULL return. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Created attachment 204961 [details] [review] soup_uri_uses_default_port: correct documentation We now know the default for ftp, too.
Created attachment 204962 [details] [review] Header parsing test: verify that a full URI can appear after the "GET"
Created attachment 204963 [details] [review] uri-parsing test: verify that URIs are split correctly Testing the round-trip (split + reassemble) is necessary, but not sufficient: a large proportion of SoupURI's API is in terms of the separate parts, and this was never explicitly tested before. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk>
Created attachment 204964 [details] [review] uri-parsing test: add a case that was exercised by a protocol fuzzer This was suspected to cause a crash via a NULL path; while it seems it doesn't actually have that problem, it still seems worth testing.
(In reply to comment #0) > The patches I'm about to attach aren't the full solution - in particular, "GET > http: HTTP/1.0" still gives me a critical elsewhere - but they're a start... More specifically: * "GET http:" tries to send a redirect to soup_uri_new_with_base ("http:", "./") but a http URI with no host is considered to be invalid, so it criticals * "GET foo:" tries to send a redirect to "foo:/", which is syntactically OK but probably not what we want I'm open to suggestions... (Tested via: run simple-httpd; type HTTP into a telnet session into its listening port.)
(In reply to comment #6) > Header parsing test: verify that a full URI can appear after the "GET" RFC 2616 says: "To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies." - so libsoup is right to accept these on the server side, even though the RFC also says that clients MUST use an abs_path ("GET /blah HTTP/1.1") when not talking to a proxy.
Created attachment 204966 [details] [review] SoupServer: reject non-HTTP URIs and URIs with no host This prevents a critical warning and other misbehaviour in the simple-httpd test, when using requests like "GET http: HTTP/1.0" or "GET foo: HTTP/1.0". Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637 Bug-NB: NB#294977 Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk> --- Having looked at RFC 2616 again, I think this is a reasonable solution to Comment #9.
Comment on attachment 204958 [details] [review] soup_uri_new: return a valid URI for a NULL parameter Scheme is required too though, and this patch doesn't initialize that. And besides, the docs already say: * (You will need to * call at least soup_uri_set_scheme() and soup_uri_set_path(), since * those fields are required.) So as long as we have a return-if-fail anywhere someone would be able to shoot themselves in the foot, I don't think we should change the behavior here.
Comment on attachment 204959 [details] [review] SoupURI: add soup_uri_is_valid() and use it for precondition checks > soup_uri_equal() and soup_uri_host_equal() just warn and carry on if > invalid, because invalid URIs can still be compared for equality. > Similarly, if soup_uri_uses_default_port() is used on an incomplete URI, > it still returns the same thing it used to. Hm... no, that doesn't seem right. If it's allowed then it shouldn't warn, and if it's not allowed, then you shouldn't get a meaningful return value back. I'd just change them to be return-if-fail as well. >+ * soup_uri_is_valid: I think I'd make this a macro like SOUP_URI_VALID_FOR_HTTP(). (And then that also gets rid of a problem with this patch, where if you pass in a NULL URI to a method, you'll get CRITICALs from both soup_uri_is_valid() and the method that called it.
Comment on attachment 204962 [details] [review] Header parsing test: verify that a full URI can appear after the "GET" >+ /* It's better for this to be passed through: this means a SoupServer >+ * could implement ftp-over-http proxying, for instance >+ */ >+ { "GET with full URI of unrecognised scheme", >+ "GET AbOuT: HTTP/1.1\r\n", -1, though we should make sure somewhere that SoupServer rejects those URIs by default...
Comment on attachment 204963 [details] [review] uri-parsing test: verify that URIs are split correctly This is great, but... >+ { "ftp://user@host/path", "ftp://user@host/path", >+ { /* scheme = */ "ftp", >+ /* user = */ "user", >+ /* password = */ NULL, >+ /* host = */ "host", >+ /* port = */ 21, >+ /* path = */ "/path", >+ /* query = */ NULL, >+ /* fragment = */ NULL } }, the examples are gigantic now. I think I'd prefer { "ftp://user@host/path", "ftp://user@host/path", { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } },
Comment on attachment 204964 [details] [review] uri-parsing test: add a case that was exercised by a protocol fuzzer ok, with the same modification as the other test cases
Comment on attachment 204966 [details] [review] SoupServer: reject non-HTTP URIs and URIs with no host ok, although this wrecks your "ftp proxy" use case... but it wouldn't actually be possible to implement that with SoupServer currently anyway, so I guess this makes sense for now. (We should definitely be checking at least soup_uri_is_valid() there.)
(In reply to comment #12) > Scheme is required too though, and this patch doesn't initialize that. I'm pretty sure it returns { scheme => "about", path => "" }? ... > So as long as we have a return-if-fail anywhere someone would be able to shoot > themselves in the foot, I don't think we should change the behavior here. ... but OK, I'll omit this one. (In reply to comment #13) > I'd just change [equals/hash] to be return-if-fail as well. OK. I'm probably not going to be able to do this immediately (working on several things in parallel), but I'll do that when I return to libsoup. > >+ * soup_uri_is_valid: > > I think I'd make this a macro like SOUP_URI_VALID_FOR_HTTP(). OK. I even wondered about imitating how GObject would look (SOUP_IS_URI()) but I think SOUP_IS_VALID is probably better. (In reply to comment #15) > the examples are gigantic now. I think I'd prefer > > { "ftp://user@host/path", "ftp://user@host/path", > { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } }, OK. (In reply to comment #17) > ok, although this wrecks your "ftp proxy" use case... but it wouldn't actually > be possible to implement that with SoupServer currently anyway Yeah, I think SoupServer is the right layer to be rejecting non-http at the moment - as I see it, the header-parsing code implements the full HTTP RFC (including e.g. being a proxy), whereas SoupServer implements a small-enough-to-be-useful subset of the RFC. (In reply to comment #0) > I'm still trying to get a libsoup-based app past a HTTP > protocol fuzzer without errors. The people who can run the fuzzer (sadly, it's proprietary and I don't have access to just run it myself, so I get error reports a bit at a time) report that there are more error cases, so expect more patches when I return to this... :-(
(In reply to comment #18) > (In reply to comment #12) > > Scheme is required too though, and this patch doesn't initialize that. > > I'm pretty sure it returns { scheme => "about", path => "" }? ... No, the comment was about the soup_uri_new(NULL) case, not the soup_uri_new("about:") case. We do want an empty path in the latter case, but in the former, the returned URI is all NULL. (Oh, the addition of (allow-none) to soup_uri_new() in that patch was right though... you can slip that into one of the other patches...)
(In reply to comment #1) > SoupURI: give "foo:" a non-NULL path member Pushed (In reply to comment #5) > soup_uri_uses_default_port: correct documentation Pushed (In reply to comment #6) > Header parsing test: verify that a full URI can appear after the "GET" Pushed (In reply to comment #11) > SoupServer: reject non-HTTP URIs and URIs with no host Pushed (In reply to comment #15) > (From update of attachment 204963 [details] [review]) > This is great, but... > > >+ { "ftp://user@host/path", "ftp://user@host/path", > >+ { /* scheme = */ "ftp", > >+ /* user = */ "user", > >+ /* password = */ NULL, > >+ /* host = */ "host", > >+ /* port = */ 21, > >+ /* path = */ "/path", > >+ /* query = */ NULL, > >+ /* fragment = */ NULL } }, > > the examples are gigantic now. I think I'd prefer > > { "ftp://user@host/path", "ftp://user@host/path", > { "ftp", "user", NULL, "host", 21, "/path", NULL, NULL } }, Pushed with that change, I hope that's OK. (In reply to comment #16) > (From update of attachment 204964 [details] [review]) > ok, with the same modification as the other test cases Likewise
Created attachment 206883 [details] [review] [1/4] SoupProxyResolver: construct a valid SoupURI path = NULL is not considered valid. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk> Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637 Bug-NB: NB#294977
Created attachment 206884 [details] [review] [2/4] soup_uri_copy_host: always set the path to something non-NULL Not doing so is considered to be invalid. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk> Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637 Bug-NB: NB#294977
Created attachment 206885 [details] [review] [3/4] SoupURI: add SOUP_URI_IS_VALID() and use it for basic precondition checks In this patch, field setters don't have precondition checks for the validity of the URI object itself, only a non-NULL check, to avoid breaking existing code if it calls soup_uri_new (NULL) and then sets fields in an unexpected order: uri = soup_uri_new (NULL); /* uri is invalid */ soup_uri_set_host (uri, "www.google.com"); soup_uri_set_query (uri, "q=badgers"); soup_uri_set_scheme (uri, "http"); /* still invalid... */ soup_uri_set_path (uri, "/search"); /* finally valid */ Also annotate nullable setter parameters as (allow-none), to justify why they don't have a precondition check. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk> Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637 Bug-NB: NB#294977
Created attachment 206886 [details] [review] [4/4] soup_uri_new: do not allow invalid URIs to be returned, except via soup_uri_new (NULL) Also document the possible NULL return. Signed-off-by: Simon McVittie <simon.mcvittie@collabora.co.uk> Bug: https://bugzilla.gnome.org/show_bug.cgi?id=667637 Bug-NB: NB#294977
Created attachment 206887 [details] [review] [extra 1/2] soup_uri_new: return a URI with all required fields set, even with NULL argument This means that every URI returned by any public function is "valid", at the cost that if a caller of soup_uri_new() assigns directly to uri->path (as opposed to using soup_uri_set_path()), the empty string will be leaked. --- This and the patch that will follow are optional; if you don't want them, fine, but I think it'd be clearer what "valid" means for a SoupURI if there isn't a subset of the API that can't insist on it. (In reply to comment #19) > No, the comment was about the soup_uri_new(NULL) case, not the > soup_uri_new("about:") case. We do want an empty path in the latter case, but > in the former, the returned URI is all NULL. The patch in Comment #12 did make soup_uri_new(NULL) exactly equivalent to soup_uri_new("about:"), but here's a version which is a bit more explicit.
Created attachment 206888 [details] [review] [extra 2/2] SoupURI: insist that the URI is valid, even in setters Now that all functions return a valid URI, we can be more strict. --- Requires [extra 1/2], otherwise you can't convert what soup_uri_new(NULL) returns into something valid (or at least not without poking around in the struct fields).
OK, I just pushed most of these myself because I'm about to do a release. I'm leaving soup_uri_new(NULL) the way it is, so I'm not committing the last two. Thanks for the patches
(In reply to comment #23) > Created an attachment (id=206885) [details] [review] > [3/4] SoupURI: add SOUP_URI_IS_VALID() and use it for basic precondition > checks this is causing problems with code that was technically-broken-but-still-working-fine before. (https://bugs.launchpad.net/ubuntu/+source/libsoup2.4/+bug/928820). Attaching a proposed patch. (The soup_uri_new_with_base() change is a bit weird, but the idea is to allow the caller to call it with base->path==NULL, while still letting the rest of the function not have to worry about that.)
Created attachment 207195 [details] [review] soup-uri: revert some of the previously-added return-if-fails Although it has always been documented that a SoupURI must have a non-NULL path, nothing ever enforced this, and most methods checked whether it was NULL before looking at it anyway. So lots of existing code was getting this wrong, and is now breaking because of the "g_return_if_fail (SOUP_URI_IS_VALID (uri))" checks. So, change most of those to just g_warn_if_fail() (while adding back the old return-if-fail !NULL checks), but also fix soup_uri_set_path() and soup_uri_new_with_base() to handle NULL paths more sanely (after warning). Also, allow calling the getters on invalid URIs. Add a new test to uri-testing to make sure that URIs created with soup_uri_new(NULL) behave as expected at each step of the way... https://bugzilla.gnome.org/show_bug.cgi?id=669479
Review of attachment 207195 [details] [review]: Looks fine, except: ::: libsoup/soup-uri.c @@ +557,3 @@ { + g_warn_if_fail (SOUP_URI_IS_VALID (uri1)); + g_warn_if_fail (SOUP_URI_IS_VALID (uri2)); Not that I'm a reviewer or anything, but this will crash if uri1 or uri2 is NULL. Perhaps this? g_warn_if_fail (SOUP_URI_IS_VALID (uri1)); g_warn_if_fail (SOUP_URI_IS_VALID (uri2)); if (!uri1 || !uri2) return (uri1 == uri2); /* already warned */
ah, good catch. I added return-if-fails for != NULL. (Since it would have crashed in that case before, no one could have been doing it.)