Bug 661682 – Add support for deflate Content-Encoding

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 661682 - Add support for deflate Content-Encoding


Summary:	Add support for deflate Content-Encoding


Status:	RESOLVED FIXED

Product:	libsoup
Classification:	Core
Component:	HTTP Transport
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	libsoup-maint@gnome.bugs
QA Contact:	libsoup-maint@gnome.bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-10-13 16:36 UTC by Sergio Villar
Modified:	2011-11-17 18:11 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch (1.58 KB, patch) 2011-10-13 16:41 UTC, Sergio Villar	needs-work	Details \| Review
Patch (3.71 KB, patch) 2011-10-14 17:22 UTC, Sergio Villar	needs-work	Details \| Review
Added tests to coding-test.c (17.43 KB, patch) 2011-10-21 17:29 UTC, Sergio Villar	needs-work	Details \| Review
Add support for deflate (3.66 KB, patch) 2011-11-15 18:54 UTC, Sergio Villar	accepted-commit_now	Details \| Review
Added tests to coding-test.c (17.07 KB, patch) 2011-11-15 18:56 UTC, Sergio Villar	accepted-commit_now	Details \| Review

Description Sergio Villar 2011-10-13 16:36:20 UTC

Websites like http://roland.antoniovillena.es/ are not working fine because we only claim to support gzip.

Comment 1 Dan Winship 2011-10-13 16:41:03 UTC

See bug 632372. I had started trying to implement it at one point, but it's not pleasant.

Note that libsoup explicitly sends "Accept-Encoding: gzip", meaning any server that returns deflate-encoded data is broken. (Oh my god! There are broken web servers on the internet!)

Comment 2 Sergio Villar 2011-10-13 16:41:06 UTC

Created attachment 198952 [details] [review]
Patch

Comment 3 Dan Winship 2011-10-13 16:43:05 UTC

Comment on attachment 198952 [details] [review]
Patch

oops, replied before you attached this. anyway, this handles RAW, but not ZLIB, and if we start saying we support deflate in our Accept-Encoding, then we're going to receive both kinds, so this patch would cause some sites to stop working

Comment 4 Sergio Villar 2011-10-14 17:22:57 UTC

Created attachment 199021 [details] [review]
Patch

(Adding the following JFTR) 

So the issue is that deflate Content-Encoding should mean data compressed with deflate algorithm and zlib headers (RFC 1950). The problem is that some servers decided to return the compressed data (as defined in RFC 1951) without any header if the clients claimed to support deflate.

Taking that into account I added some fallback code to soup-message-io.c that tries to detect if the server is returning us raw data when we expect a zlib header. If so, we give the zlib decompressor a dummy zlib header (got it from the Mozilla network code) and try to resume the data decoding.

Tried it with this test page http://carsten.codimi.de/gzip.yaws/ and it seems to work fine for gzip, zlib and raw compressed data.

Comment 5 Priit Laes (IRC: plaes) 2011-10-14 19:58:26 UTC

Review of attachment 199021 [details] [review]:

Patch works, but has some edge cases...

::: libsoup/soup-message-io.c
@@ -360,0 +361,9 @@
+		} else if (input_cur == 0 &&
+			   !dummy_header_used &&
+			   G_IS_ZLIB_DECOMPRESSOR (converter) &&
... 6 more ...

g_clear_error (error); // moved here from below...

@@ -360,0 +361,17 @@
+		} else if (input_cur == 0 &&
+			   !dummy_header_used &&
+			   G_IS_ZLIB_DECOMPRESSOR (converter) &&
... 14 more ...

Why not: static const char dummy_header[2] = { 0x78, 0x9c }; ?

@@ -360,0 +361,26 @@
+		} else if (input_cur == 0 &&
+			   !dummy_header_used &&
+			   G_IS_ZLIB_DECOMPRESSOR (converter) &&
... 23 more ...

Here you are overwriting non-null error/GError

@@ -360,0 +361,29 @@
+		} else if (input_cur == 0 &&
+			   !dummy_header_used &&
+			   G_IS_ZLIB_DECOMPRESSOR (converter) &&
... 26 more ...

drop the g_clear_error from here.

Comment 6 Sergio Villar 2011-10-17 07:39:37 UTC

(In reply to comment #5)
> Review of attachment 199021 [details] [review]:
> 
> Patch works, but has some edge cases...

Sorry Priit but it's not possible to understand your comments if you always refer to the same lines in my patch.

Comment 7 Sergio Villar 2011-10-17 07:49:13 UTC

(In reply to comment #5)
> Review of attachment 199021 [details] [review]:
> 
> Patch works, but has some edge cases...

Trying to figure out what your comments refer to...

> g_clear_error (error); // moved here from below...

You cannot do that. The error must be set if it was produced by for example a GZIP decompressor. In that case the fallback will not make any sense and the function will return NULL with the error set. 
 
> Why not: static const char dummy_header[2] = { 0x78, 0x9c }; ?

I prefer to keep it with the original value it has in the Mozilla sources. It's kind of a way of keeping the authorship...

> Here you are overwriting non-null error/GError

True, will change that

> drop the g_clear_error from here.

See first comment

Comment 8 Priit Laes (IRC: plaes) 2011-10-17 08:17:32 UTC

OK, it was the first time I tried the "Review" thingy here. If you click on the review link in comment #5 then you'll probably get better view. Yeah, the textual "comments" provided by the splinter tool are next to useless...

If the Review page doesn't work, I'll just add my patch...

Comment 9 Priit Laes (IRC: plaes) 2011-10-17 08:28:07 UTC

> +			if (format == G_ZLIB_COMPRESSOR_FORMAT_ZLIB) {

We need to clear error here in case the format is not headerless deflate. If a new error occurs we set the new error on-top of the old one.

> +				/* Some servers (especially Apache with mod_deflate)
> +				 * return RAW compressed data without the zlib headers
> +				 * when the client claims to support deflate. For
> +				 * those cases use a dummy header (stolen from
> +				 * Mozilla's nsHTTPCompressConv.cpp) and try to
> +				 * continue uncompressing data.
> +				 */
> +				static char dummy_header[2] = {
> +					0x8 + 0x7 * 0x10,
> +					(((0x8 + 0x7 * 0x10) * 0x100 + 30) / 31 * 31) & 0xFF,
> +				};

static const char dummy_header[2] = { 0x78, 0x9c};

I looked at these values and wondered why? Then I saw on the Carsten's page example streams and header was always 0x78 and 0x9c.

> +
> +				g_converter_reset (converter);
> +				result = g_converter_convert (converter,
> +							      dummy_header, sizeof(dummy_header),
> +							      outbuf + outbuf_cur, outbuf_length - outbuf_cur,
> +							      0, &input_used, &outbuf_used, error);

Here we overwrite previously set error (which is always non-NULL)

> +				dummy_header_used = TRUE;
> +				if (result == G_CONVERTER_CONVERTED) {
> +					g_clear_error (error);

So clearing the error here makes no sense...

> +					continue;
> +				}
> +			}
> +
> +			g_free (outbuf);
> +			return NULL;
> +

Comment 10 Dan Winship 2011-10-20 20:27:10 UTC

needs additions to coding-test too. Just add deflate and raw-compressed versions of mbox (there's already a gzip version) to tests/resources, and then make sure that they come out right.

"But how do I get deflate and raw encoded versions of that file?" I dunno. Write a little program using GZlibCompressor to compress them I guess.

Comment 11 Sergio Villar 2011-10-21 17:29:50 UTC

Created attachment 199674 [details] [review]
Added tests to coding-test.c

Added tests for deflate'd content to coding-test.c.

I have refactored the code so all the checks are now performed by two utility functions.

Comment 12 Sergio Villar 2011-10-21 17:30:46 UTC

The above code does not include the mbox.zlib and mbox.raw files used by the test but I have both of them in my machine.

Comment 13 Dan Winship 2011-11-12 13:59:20 UTC

Comment on attachment 199021 [details] [review]
Patch

> > Here you are overwriting non-null error/GError
> 
> True, will change that

(You never updated this patch for that.)

> > Why not: static const char dummy_header[2] = { 0x78, 0x9c }; ?
> 
> I prefer to keep it with the original value it has in the Mozilla sources. It's
> kind of a way of keeping the authorship...

Nah, you already mention in the comment that it was stolen from there. I think the simpler version is better.

Also, "header" in the context of soup-message-io suggests "HTTP header" to me; when I first saw the "dummy_header_used" variable, I thought you were going to be inserting a dummy "Content-Encoding" header into the message... How about "dummy_zlib_header" and "dummy_zlib_header_used"?

Comment 14 Dan Winship 2011-11-12 14:11:45 UTC

Comment on attachment 199674 [details] [review]
Added tests to coding-test.c

>Refactored the original code. A couple of utility functions perform now all
>the checks.

it would be nice if the refactoring was a separate patch (but I'll still take the patch even if you don't do that)

The set of new tests should be:

  GET /mbox, Accept-Encoding: deflate
  GET /mbox, Accept-Encoding: deflate, plus trailing junk
  GET /mbox, Accept-Encoding: deflate, with server error (*)

  GET /mbox, Accept-Encoding: deflate, prefer-deflate-raw
  GET /mbox, Accept-Encoding: deflate, prefer-deflate-raw, plus trailing junk (*)
  GET /mbox, Accept-Encoding: deflate, prefer-deflate-raw, with server error (*)

This patch doesn't implement the (*)ed ones. ("with server error" is the case where the server adds the Content-Encoding header but then sends the unencoded data, and we need to make sure we recognize that and fall back to not decoding.)

You don't need the "GET /mbox.zlib" and "GET /mbox.raw" tests, because deflate and raw-zlib aren't file types the way gzip is, so there'd never be a resource where the deflate/raw encoding was part of the Content-Type rather than part of the Content-Encoding.

Comment 15 Sergio Villar 2011-11-14 11:42:31 UTC

(In reply to comment #13)
> (From update of attachment 199021 [details] [review])
> > > Here you are overwriting non-null error/GError
> > 
> > True, will change that
> 
> (You never updated this patch for that.)

Yeah forgot to do that.

> > > Why not: static const char dummy_header[2] = { 0x78, 0x9c }; ?
> > 
> > I prefer to keep it with the original value it has in the Mozilla sources. It's
> > kind of a way of keeping the authorship...
> 
> Nah, you already mention in the comment that it was stolen from there. I think
> the simpler version is better.

Fair enough.
 
> Also, "header" in the context of soup-message-io suggests "HTTP header" to me;
> when I first saw the "dummy_header_used" variable, I thought you were going to
> be inserting a dummy "Content-Encoding" header into the message... How about
> "dummy_zlib_header" and "dummy_zlib_header_used"?

Makes perfect sense.

Comment 16 Sergio Villar 2011-11-14 11:46:28 UTC

(In reply to comment #14)
> The set of new tests should be:

[...]

>   GET /mbox, Accept-Encoding: deflate, prefer-deflate-raw
>   GET /mbox, Accept-Encoding: deflate, prefer-deflate-raw, plus trailing junk
> (*)

I'm not sure about this one. I don't think zlib's inflate would be able to successfully deal with that knowing that the raw data does not have any header nor crc check. At least in the tests I run I was always getting data errors when trying to uncompress raw data+junk.

Comment 17 Sergio Villar 2011-11-15 18:54:34 UTC

Created attachment 201473 [details] [review]
Add support for deflate

Comment 18 Sergio Villar 2011-11-15 18:56:41 UTC

Created attachment 201474 [details] [review]
Added tests to coding-test.c

Note: this does not include mbox.raw and mbox.zlib files to be uploaded to the repos.

Comment 19 Dan Winship 2011-11-17 13:14:38 UTC

Comment on attachment 201474 [details] [review]
Added tests to coding-test.c

make sure to add the data files before committing

Comment 20 Sergio Villar 2011-11-17 18:11:10 UTC

Committed:

6d8ff9731eae9bf57982de273c8a45673996bcb4: actual patch
e44809a82be8dcaa97ffdc9788e00e08f874dce9: mbox compressed files
dbcd20621c274712460401af58f8aede0027d804: coding-test.c