GNOME Bugzilla – Bug 590436
need to be able to disable Content-Sniffing on a request
Last modified: 2009-08-02 15:37:05 UTC
One of the tests in WebKit Bug #27143 (https://bugs.webkit.org/show_bug.cgi?id=27143) fails because got-chunk isn't being emitted when it should be. Failing WebKit test: http/tests/xmlhttprequest/small-chunks-response-text.html Here's the output of the diff of expected against actual: --- /home/jmalonzo/OpenSource/WebKit/TestResults/http/tests/xmlhttprequest/small-chunks-response-text-expected.txt 2009-08-01 19:07:20.605723802 +1000 +++ /home/jmalonzo/OpenSource/WebKit/TestResults/http/tests/xmlhttprequest/small-chunks-response-text-actual.txt 2009-08-01 19:07:20.605723802 +1000 @@ -1,6 +1,2 @@ This tests that an open XMLHttpRequest connection will call onreadystatechange correctly when data is sent over an open connection. -0 responseText: This is chunk number 0 -1 responseText: This is chunk number 0This is chunk number 1 -2 responseText: This is chunk number 0This is chunk number 1This is chunk number 2 -3 responseText: This is chunk number 0This is chunk number 1This is chunk number 2This is chunk number 3 -4 responseText: This is chunk number 0This is chunk number 1This is chunk number 2This is chunk number 3This is chunk number 4 +0 responseText: This is chunk number 0This is chunk number 1This is chunk number 2This is chunk number 3This is chunk number 4
Created attachment 139674 [details] [review] patch In read_body_chunk, io_handle_sniffing's done_reading should be TRUE since the soup_socket_read was OK. No regression in the sniffing tests and the WebKit test mentioned in this bug now passes.
done_reading means that the entire response body has been read, so setting it here would be wrong. The problem here is that libsoup believes WebKit wants this response to be Content-Type sniffed, and so it is waiting until it gets 512 bytes of data before emitting any chunks. (Since the total response body is < 512 bytes, that means it ends up waiting until the entire body has been read, and then emitting it all at once.) So the question becomes, what do the other platforms do differently? If they're not doing content-sniffing, then we need to figure out why, and then make WebKitGTK tell SoupContentSniffer to not sniff in this case as well. (Maybe XMLHttpRequest requests are never sniffed? That seems quite plausible.) Alternatively, maybe the other platforms have lame sniffers that only sniff the first chunk even if it's too small to be useful, in which case this is a bug in the other platforms. :) Or, maybe the other platforms do queue up multiple chunks for sniffing like libsoup does, but then they re-emit the chunks as they had been received, rather than emitting a single merged chunk as libsoup does. Note though FWIW that the documentation for SoupMessage::got-chunk has always warned that libsoup may split or merge chunks for its own purposes, and even if it didn't, in any real-world scenario there's the possibility of a proxy between the client and server, so a client script cannot assume it will see precisely the same set of chunks that the server sent. For that reason, the test is IMHO buggy. A better test would be to use larger chunks, and just check that onreadystatechange is called multiple times, with a subset of the full response each time, but not worry about it being called a *specific* number of times, at specific points in the response. (Leaving this open and NEEDINFO though, as I suspect that the immediate source of the test failure may be that we're not supposed to be sniffing in this case.)
(In reply to comment #2) > The problem here is that libsoup believes WebKit wants this response to be > Content-Type sniffed, and so it is waiting until it gets 512 bytes of data > before emitting any chunks. (Since the total response body is < 512 bytes, that > means it ends up waiting until the entire body has been read, and then emitting > it all at once.) Is 512 bytes an absolute requirement? The spec for "feed or html" says "The user agent may wait for 512 or more bytes of the resource to be available." which isn't a MUST. So I guess setting an absolute requirement of 512 bytes before sniffing the data may be an issue here in terms of interoperability with other implementations. FWIW, the test is sending a Content-Type of text/html that's why I referred to the "feed or html" section. Nevertheless, sniffing "text or binary" starts at 4 bytes so setting this 512 bytes may be an issue. > So the question becomes, what do the other platforms do differently? If they're > not doing content-sniffing, then we need to figure out why, and then make > WebKitGTK tell SoupContentSniffer to not sniff in this case as well. (Maybe > XMLHttpRequest requests are never sniffed? That seems quite plausible.) > Looking at WebCore/xml/XMLHttpRequest.cpp, WebKit creates an _async_ loader with "Do not content sniff" in of its parameters, so sniffing is optional since I don't see this when it creates a _sync_ loader. > Alternatively, maybe the other platforms have lame sniffers that only sniff the > first chunk even if it's too small to be useful, in which case this is a bug in > the other platforms. :) > > Or, maybe the other platforms do queue up multiple chunks for sniffing like > libsoup does, but then they re-emit the chunks as they had been received, > rather than emitting a single merged chunk as libsoup does. Note though FWIW > that the documentation for SoupMessage::got-chunk has always warned that > libsoup may split or merge chunks for its own purposes, and even if it didn't, > in any real-world scenario there's the possibility of a proxy between the > client and server, so a client script cannot assume it will see precisely the > same set of chunks that the server sent. > > For that reason, the test is IMHO buggy. A better test would be to use larger > chunks, and just check that onreadystatechange is called multiple times, with a > subset of the full response each time, but not worry about it being called a > *specific* number of times, at specific points in the response. > Like I said above on the requirement to wait for 512 bytes before sniffing =)
(In reply to comment #3) > Is 512 bytes an absolute requirement? It's not, but if you just sniff based on the first packet, then you'd end up with situations where a resource may or may not get sniffed correctly based on the MTU of the network or something. > Looking at WebCore/xml/XMLHttpRequest.cpp, WebKit creates an _async_ loader > with "Do not content sniff" in of its parameters, so sniffing is optional since > I don't see this when it creates a _sync_ loader. That seems like it's probably a mistake? At any rate, digging through WebKit, it looks like ResourceHandleSoup doesn't check d->m_shouldContentSniff, so it's doing sniffing here even though the caller didn't want it. On the libsoup side, there is no way to tell SoupContentSniffer to not sniff a particular message though, so we need to do something about that. Possibly bug 574773.
patch attached to 574773 *** This bug has been marked as a duplicate of 574773 ***