Bug 522772 - Content-Encoding support
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: HTTP Transport
Version: 2.4.x
Hardware: Other Linux
Importance: Normal normal
: GNOME 2.24
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Duplicates: 587912 598285 603269 603616 610684
Depends on: 591739
Blocks:
 
 
Reported: 2008-03-16 15:14 UTC by Dan Winship
Modified: 2010-03-04 00:34 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
webkit patch to use SoupContentDecoder (1.59 KB, patch)
2009-10-27 17:51 UTC, Dan Winship

Description Dan Winship 2008-03-16 15:14:24 UTC
libsoup should support "Content-Encoding: deflate" and "Content-Encoding: gzip".

Client-side:

  - Have a flag like SOUP_SESSION_DO_CONTENT_ENCODING, that tells the
    session to automatically decode encoded responses. (Can't do this
    automatically; it would mess up apps that are currently doing it by
    hand.)

  - How does this interact with soup_message_set_chunk_allocator()?
    (We need to return the data to the caller in the buffers it has
    allocated. So this means one of: (a) we allocate our own buffers for
    reading and then decode into the app-provided buffers, (b) we read
    into the app-provided buffers and then try to decode in place, using
    as little temporary memory as possible, or (c) we ask the app for extra
    buffers, and read into some and decode into others.)

  - Maybe expose the encoding/decoding code in the public API so that
    apps can easily do it by hand in cases where our API doesn't work for
    them?

Server-side:

  - Allow server to either *declare* a Content-Encoding ("this content is
    already compressed") or *request* it ("i'd like this content to be
    compressed before sending"). The latter is sometimes used in web
    services.


Someday (maybe not right away), it might be nice to be able to plug in additional content-coding types. It might also be useful to support the corresponding Transfer-Encodings, although since they are not widely used (not at all used?) this isn't as important.
Comment 1 Dan Winship 2008-03-16 15:37:31 UTC
Some work on this has been done on the content-coding branch of
http://gnome.org/~danw/libsoup.git (though it's also a little bit mixed in
with some thoughts about "I/O filters" as discussed in http://live.gnome.org/LibSoup/ToDo)
Comment 2 Dan Winship 2009-03-02 19:04:04 UTC
(In reply to comment #0)
>   - Have a flag like SOUP_SESSION_DO_CONTENT_ENCODING, that tells the
>     session to automatically decode encoded responses. (Can't do this
>     automatically; it would mess up apps that are currently doing it by
>     hand.)

Well, it should be a feature, not a flag, and it should be included in the SOUP_TYPE_GNOME_FEATURES_2_28 metafeature.

I'm thinking that individual encodings should also be features, and the overall encoding feature would use soup_session_get_features() to find all the other SOUP_TYPE_ENCODING features attached to the session. So then if someone starts standardizing a bzip2 or 7zip or whatever encoding some day, people can implement it themselves and attach it as a feature, without having to wait for a new libsoup release. (Or maybe it would be something that would never even make it into libsoup.)
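
To make the "individual encodings as features" idea concrete, here is a rough sketch in plain C of how the overall encoding feature might map a Content-Encoding header onto the per-encoding features it finds. Everything here is hypothetical: CodingFeature and resolve_codings() are invented for illustration and are not libsoup API. The only outside fact it relies on is that HTTP lists content-codings in the order they were applied, so they have to be undone in reverse.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-encoding session feature. */
typedef struct {
    const char *name;   /* content-coding token, e.g. "gzip" */
} CodingFeature;

/* Case-insensitive match of a token of length len against a name. */
static int token_matches(const char *tok, size_t len, const char *name)
{
    if (strlen(name) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/*
 * Resolve a Content-Encoding header value (comma-separated codings, in the
 * order they were applied) into the features needed to undo it.
 * out[] is filled in decoding order, i.e. last-applied coding first.
 * Returns the number of codings, or -1 if a coding has no matching feature
 * or the list does not fit in out[].
 */
int resolve_codings(const char *header,
                    const CodingFeature *features, size_t n_features,
                    const CodingFeature **out, size_t max_out)
{
    size_t count = 0;
    const char *p = header;

    assert(header != NULL);

    while (*p) {
        /* Skip separators and optional whitespace. */
        while (*p == ',' || *p == ' ' || *p == '\t')
            p++;
        if (!*p)
            break;

        const char *start = p;
        while (*p && *p != ',' && *p != ' ' && *p != '\t')
            p++;
        size_t len = (size_t)(p - start);

        const CodingFeature *found = NULL;
        for (size_t i = 0; i < n_features; i++) {
            if (token_matches(start, len, features[i].name)) {
                found = &features[i];
                break;
            }
        }
        if (!found || count == max_out)
            return -1;   /* unknown coding, or too many layers */
        out[count++] = found;
    }

    /* Reverse so that the last-applied coding is decoded first. */
    for (size_t i = 0; i < count / 2; i++) {
        const CodingFeature *tmp = out[i];
        out[i] = out[count - 1 - i];
        out[count - 1 - i] = tmp;
    }
    return (int)count;
}
```

With features for "gzip" and "deflate" attached, a header of "deflate, gzip" would resolve to gzip first and deflate second, while "bzip2" would fail until someone attaches a feature with that name.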

>   - How does this interact with soup_message_set_chunk_allocator()?

(a) is probably the correct answer. (Read into our own buffers and decode into app-provided buffers.)
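
A minimal sketch of what (a) means for buffer ownership, in plain C. The "decoder" below is a toy run-length scheme standing in for gzip/deflate, purely so the example has no dependencies; ToyDecoder and toy_decode() are made-up names, not libsoup or zlib API. The point is only the shape: encoded bytes stay in a buffer we own, decoder state persists between calls, and decoded bytes are written straight into whatever fixed-size buffer the app hands us.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy streaming decoder (stand-in for a real gzip/deflate decoder).
 * Input format: pairs of (count, byte); each pair expands to count copies.
 */
typedef struct {
    const unsigned char *in;   /* our own read buffer (encoded data) */
    size_t in_len;
    size_t in_pos;
    unsigned char run_byte;    /* state carried across output buffers */
    size_t run_left;
} ToyDecoder;

/*
 * Decode as much as fits into the app-provided buffer out[0..out_cap).
 * Returns the number of bytes written; 0 means the input is exhausted.
 */
size_t toy_decode(ToyDecoder *d, unsigned char *out, size_t out_cap)
{
    size_t written = 0;

    assert(d != NULL && out != NULL);

    while (written < out_cap) {
        if (d->run_left == 0) {
            if (d->in_len - d->in_pos < 2)
                break;                       /* no complete pair left */
            d->run_left = d->in[d->in_pos++];
            d->run_byte = d->in[d->in_pos++];
            continue;                        /* also handles zero-length runs */
        }
        out[written++] = d->run_byte;
        d->run_left--;
    }
    return written;
}
```

The caller would loop, asking the app for a fresh buffer each time one fills up, and emit one chunk per filled buffer. A run that straddles two app buffers is handled by the state kept in the decoder, which is the part that (b), decoding in place, makes awkward.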

> Server-side:
> 
>   - Allow server to either *declare* a Content-Encoding ("this content is
>     already compressed") or *request* it ("i'd like this content to be
>     compressed before sending"). The latter is sometimes used in web
>     services.

The former could possibly just be:
  soup_message_headers_append (msg->response_headers, "Content-Encoding", "gzip");

We don't currently have "features" on SoupServer, though there's a server-features branch in git.
Comment 3 Alexander V. Butenko 2009-03-09 01:19:07 UTC
I googled a bit and found that back in 2004 somebody already made a patch for libsoup. 

As the author says, "It includes both encoding and decoding messages with the gzip, deflate, and bzip2 algorithms, and can support layered Content-Encodings if called upon to do so."

I checked the patch; the API of the old soup is different, but a lot of it can probably be reused.

URL: http://mail.gnome.org/archives/evolution-hackers/2004-October/msg00159.html
Comment 4 Dan Winship 2009-03-09 14:05:16 UTC
(In reply to comment #3)
> I googled a bit and found that back in 2004 somebody already made a patch
> for libsoup. 

Yeah, I'd seen that before. I forget the details and what it was that I didn't like about it. At any rate, the code already in the content-coding branch does everything that patch had done.
Comment 5 Alexander V. Butenko 2009-05-17 23:51:56 UTC
Is there any way to try out the current patch on a recent libsoup? 
I was trying to pull the content-coding branch but it seems broken. Any tips on how to make it work? 

Also, could you write a little breakdown of what is missing in the patch in order to merge it into the head? I will try to code something then.
Comment 6 Dan Winship 2009-05-18 12:51:17 UTC
(In reply to comment #5)
> Is there any way to try out the current patch on a recent libsoup? 
> I was trying to pull the content-coding branch but it seems broken. Any tips
> on how to make it work?

The code was never finished. It probably never compiled.

> Also, could you write a little breakdown of what is missing in the patch in
> order to merge it into the head? I will try to code something then.

First off, ignore soup-io-filter.[ch] completely.

soup-session.c:encoding_handler() is incomplete and needs to be finished. Basically, it needs to do *something* such that if there's a Content-Encoding or Transfer-Encoding header, it will undo the effects of that before the SoupMessage signals (got-chunk, got-body) get emitted. Actually, most of the work will end up in some mix of soup-message, soup-message-body, and/or soup-message-io, but I don't know where exactly. SoupSession gets involved though because it's the one that has the list of available encoding types, so it has to tell the message/body/iodata what class to use to do the decoding.

It's also possible that the SoupCoding API will turn out to be badly designed for this purpose and you'll want to change it to something different. It's also possible that one or all of the coding implementations is buggy. I never got as far as testing any of them.
Comment 7 Martin Sourada 2009-05-23 12:17:47 UTC
Just FYI: as of now, due to this missing feature, I am unable to access Wikipedia's Four-force article:

http://en.wikipedia.org/wiki/Four-force

Webkit's web inspector tells me that the content encoding is gzip. This left me wondering why this exact page fails, as I've accessed loads of Wikipedia pages and this is the first one that does not work... 

So... It would be very nice to have this implemented if people want to go with webkit for epiphany 2.28 as it seems the pages that need this might not be as rare as I thought... 

Another page that needs this is (also gzip)

http://anidb.net/
Comment 8 Dan Winship 2009-05-28 19:46:16 UTC
(In reply to comment #7)
> So... It would be very nice to have this implemented if people want to go with
> webkit for epiphany 2.28 as it seems the pages that need this might not be as
> rare as I thought... 

The current version of webkitgtk explicitly sends

    Accept-Encoding: identity

which tells the server that it doesn't support gzip. (HTTP actually assumes that if you don't say otherwise, then that means you support gzip, deflate, and compress encoding.) So this should work with the latest webkit.
Comment 9 Dan Winship 2009-07-07 12:50:26 UTC
*** Bug 587912 has been marked as a duplicate of this bug. ***
Comment 10 Christophe Gillette 2009-07-23 22:55:43 UTC
Hi,
Any update on this feature? 
While it is working with the current webkit (comment #8), lots of development is currently done on small devices which have bandwidth constraints, and where supporting deflate would help.
Thanks,
Comment 11 Dan Winship 2009-07-24 02:37:36 UTC
Well, there are a handful of sites (notably archive.org) that use "Content-Encoding: gzip" even when webkit sends "Accept-Encoding: identity". Hrmph.

Anyway, no real update. No one has been working on it, and it is *highly* unlikely it will be finished for 2.28 at this point.

Note that the addition of SoupContentSniffer now adds another wrinkle; content-decoding needs to happen before sniffing, or else the sniffer will always just return "application/gzip".

This gets back into the idea that the soup-io-filter in the content-coding branch was moving towards; we could have a series of filters on the input data, so you'd get something like:

   soup_socket_read -> Transfer-Encoding filter -> Content-Encoding filter -> SoupContentSniffer -> got-chunk

where the two encoding filters would actually output different data than they input, but the sniffing "filter" would just be buffering up data until it got enough to sniff, and then it would spit it all out at once, and then after that behave as just an identity filter.
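
For what it's worth, the sniffing stage of that chain could look roughly like the following plain-C sketch. None of these names exist in libsoup: SniffFilter, the ChunkFn callback type, and the tiny SNIFF_THRESHOLD are all assumptions made up for the example, and Collector just stands in for whatever would emit got-chunk. No actual sniffing happens here; the sketch only shows the "buffer, then flush, then identity" behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_THRESHOLD 8   /* assumed; deliberately tiny for the example */

/* Downstream stage, e.g. whatever eventually emits got-chunk. */
typedef void (*ChunkFn)(const unsigned char *data, size_t len, void *user_data);

typedef struct {
    unsigned char pending[SNIFF_THRESHOLD];
    size_t pending_len;
    int sniffed;            /* nonzero once we are a pure identity filter */
    ChunkFn next;
    void *user_data;
} SniffFilter;

void sniff_filter_init(SniffFilter *f, ChunkFn next, void *user_data)
{
    f->pending_len = 0;
    f->sniffed = 0;
    f->next = next;
    f->user_data = user_data;
}

/* Feed already-decoded data into the filter. */
void sniff_filter_write(SniffFilter *f, const unsigned char *data, size_t len)
{
    if (f->sniffed) {                   /* identity mode: pass straight through */
        if (len)
            f->next(data, len, f->user_data);
        return;
    }

    size_t room = SNIFF_THRESHOLD - f->pending_len;
    size_t take = len < room ? len : room;
    if (take)
        memcpy(f->pending + f->pending_len, data, take);
    f->pending_len += take;

    if (f->pending_len < SNIFF_THRESHOLD)
        return;                         /* not enough data to sniff yet */

    /* A real sniffer would inspect f->pending here to pick a type. */
    f->sniffed = 1;
    f->next(f->pending, f->pending_len, f->user_data);
    if (len > take)                     /* rest of this write, unbuffered */
        f->next(data + take, len - take, f->user_data);
}

/* End of body: flush anything still held back (short responses). */
void sniff_filter_finish(SniffFilter *f)
{
    if (!f->sniffed) {
        f->sniffed = 1;
        if (f->pending_len)
            f->next(f->pending, f->pending_len, f->user_data);
    }
}

/* Example sink that records what reached the end of the chain. */
typedef struct {
    unsigned char data[64];
    size_t len;
    int chunks;
} Collector;

void collect_chunk(const unsigned char *data, size_t len, void *user_data)
{
    Collector *c = user_data;
    assert(c->len + len <= sizeof c->data);
    memcpy(c->data + c->len, data, len);
    c->len += len;
    c->chunks++;
}
```

The encoding filters would sit in front of this and feed sniff_filter_write() with decoded bytes, so the sniffer never sees the compressed stream. Note that this sketch forwards the buffered bytes and the remainder of the triggering write as two separate chunks rather than merging them.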
Comment 12 Bastien Nocera 2009-08-10 11:03:51 UTC
FYI, it makes pages like this one unviewable:
http://www.spinics.net/lists/hotplug/msg02404.html
Comment 13 Reinout van Schouwen 2009-10-13 15:44:47 UTC
*** Bug 598285 has been marked as a duplicate of this bug. ***
Comment 14 Dan Winship 2009-10-27 17:51:18 UTC
Pushed my current work to the "coding" branch on git.gnome.org. The "deflate" decoding is slightly broken (it works for some sites but not others), so it's disabled at the moment. (We send "Accept-Encoding: gzip".)

Once I fix deflate I'll push to master. This should be in libsoup 2.28.2.
Comment 15 Dan Winship 2009-10-27 17:51:46 UTC
Created attachment 146361
webkit patch to use SoupContentDecoder
Comment 16 Reinout van Schouwen 2009-11-29 21:08:51 UTC
*** Bug 603269 has been marked as a duplicate of this bug. ***
Comment 17 Gustavo Noronha (kov) 2009-12-02 19:40:45 UTC
*** Bug 603616 has been marked as a duplicate of this bug. ***
Comment 18 Dan Winship 2009-12-16 13:49:30 UTC
Pushed to git with support for "gzip" only; this will end up in 2.28.2.
Comment 19 Gustavo Noronha (kov) 2010-02-23 12:27:23 UTC
*** Bug 610684 has been marked as a duplicate of this bug. ***