GNOME Bugzilla – Bug 522772
Content-Encoding support
Last modified: 2010-03-04 00:34:29 UTC
libsoup should support "Content-Encoding: deflate" and "Content-Encoding: gzip".
Client-side:
- Have a flag like SOUP_SESSION_DO_CONTENT_ENCODING, that tells the session to automatically decode encoded responses. (Can't do this automatically; it would mess up apps that are currently doing it by hand.)
- How does this interact with soup_message_set_chunk_allocator()? (We need to return the data to the caller in the buffers it has allocated. So this means one of:
  (a) we allocate our own buffers for reading and then decode into the app-provided buffers,
  (b) we read into the app-provided buffers and then try to decode in place, using as little temporary memory as possible, or
  (c) we ask the app for extra buffers, and read into some and decode into others.)
- Maybe expose the encoding/decoding code in the public API so that apps can easily do it by hand in cases where our API doesn't work for them?
Server-side:
- Allow server to either *declare* a Content-Encoding ("this content is already compressed") or *request* it ("i'd like this content to be compressed before sending"). The latter is sometimes used in web services.
Someday (maybe not right away), it might be nice to be able to plug in additional content-coding types. It might also be useful to support the corresponding Transfer-Encodings, although since they are not widely used (not at all used?) this isn't as important.
Some work on this has been done on the content-coding branch of http://gnome.org/~danw/libsoup.git (though it's also a little bit mixed in with some thoughts about "I/O filters" as discussed in http://live.gnome.org/LibSoup/ToDo).
(In reply to comment #0)
> - Have a flag like SOUP_SESSION_DO_CONTENT_ENCODING, that tells the
> session to automatically decode encoded responses. (Can't do this
> automatically; it would mess up apps that are currently doing it by
> hand.)
Well, it should be a feature, not a flag, and it should be included in the SOUP_TYPE_GNOME_FEATURES_2_28 metafeature. I'm thinking that individual encodings should also be features, and the overall encoding feature would use soup_session_get_features() to find all the other SOUP_TYPE_ENCODING features attached to the session. So then if someone starts standardizing a bzip2 or 7zip or whatever encoding some day, people can implement it themselves and attach it as a feature, without having to wait for a new libsoup release. (Or maybe it would be something that would never even make it into libsoup.)
> - How does this interact with soup_message_set_chunk_allocator()?
(a) is probably the correct answer. (Read into our own buffers and decode into app-provided buffers.)
> Server-side:
>
> - Allow server to either *declare* a Content-Encoding ("this content is
> already compressed") or *request* it ("i'd like this content to be
> compressed before sending"). The latter is sometimes used in web
> services.
The former could possibly just be:
    soup_message_headers_append (msg->response_headers, "Content-Encoding", "gzip");
We don't currently have "features" on SoupServer, though there's a server-features branch in git.
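To make the "each encoding is its own pluggable feature" idea concrete, here is a minimal sketch in plain C. It does not use libsoup at all: CodingRegistry, registry_add(), registry_find(), registry_accept_encoding() and placeholder_decode() are hypothetical names invented for this illustration, and the decoder is a stand-in that just copies bytes. The point is only that an overall "encoding" component can discover whatever codings were attached, including ones libsoup never shipped.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CODINGS 8   /* arbitrary limit for the sketch */

/* Hypothetical decoder signature: returns the number of bytes written. */
typedef size_t (*DecodeFunc)(const char *in, size_t in_len,
                             char *out, size_t out_cap);

typedef struct { const char *name; DecodeFunc decode; } Coding;
typedef struct { Coding codings[MAX_CODINGS]; size_t count; } CodingRegistry;

/* Stand-in decoder that only copies; a real one would call zlib etc. */
size_t placeholder_decode(const char *in, size_t in_len,
                          char *out, size_t out_cap)
{
    size_t n = in_len < out_cap ? in_len : out_cap;
    memcpy(out, in, n);
    return n;
}

/* Content-coding names are case-insensitive. */
static bool names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

void registry_init(CodingRegistry *reg) { reg->count = 0; }

/* Look up the coding named in a Content-Encoding header, or NULL. */
const Coding *registry_find(const CodingRegistry *reg, const char *name)
{
    for (size_t i = 0; i < reg->count; i++)
        if (names_equal(reg->codings[i].name, name))
            return &reg->codings[i];
    return NULL;
}

/* Attach a coding; fails on duplicates or when the table is full. */
bool registry_add(CodingRegistry *reg, const char *name, DecodeFunc decode)
{
    if (reg->count == MAX_CODINGS || registry_find(reg, name) != NULL)
        return false;
    reg->codings[reg->count].name = name;
    reg->codings[reg->count].decode = decode;
    reg->count++;
    return true;
}

/* Build an Accept-Encoding value listing every attached coding. */
size_t registry_accept_encoding(const CodingRegistry *reg,
                                char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < reg->count; i++) {
        int n = snprintf(buf + used, cap - used, "%s%s",
                         i ? ", " : "", reg->codings[i].name);
        if (n < 0 || (size_t)n >= cap - used) {
            buf[used] = '\0';   /* drop the entry that did not fit */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

With "gzip" and "deflate" attached, a third party could later call registry_add() with its own "bzip2" decoder, and both the lookup and the advertised Accept-Encoding value would pick it up without any change to the registry code.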
I googled a bit and found that back in 2004 somebody already made a patch for libsoup. As the author says, "It includes both encoding and decoding messages with the gzip, deflate, and bzip2 algorithms, and can support layered Content-Encodings if called upon to do so." I checked the patch; the API of the old soup is different, but probably a lot of it can be reused. URL: http://mail.gnome.org/archives/evolution-hackers/2004-October/msg00159.html
(In reply to comment #3)
> I googled a bit and found that back in 2004 somebody already made a patch
> for libsoup.
Yeah, I'd seen that before. I forget the details and what it was that I didn't like about it. At any rate, the code already in the content-coding branch does everything that patch had done.
Is there any way to try out the current patch on a recent libsoup? I tried to pull the content-coding branch, but it seems broken. Any tips on how to make it work? Also, could you write a short breakdown of what is missing from the patch in order to merge it into head? I will try to code something then.
(In reply to comment #5)
> Is there any way to try out the current patch on a recent libsoup?
> I tried to pull the content-coding branch, but it seems broken. Any
> tips on how to make it work?
The code was never finished. It probably never compiled.
> Also, could you write a short breakdown of what is missing from the patch in
> order to merge it into head? I will try to code something then.
First off, ignore soup-io-filter.[ch] completely. soup-session.c:encoding_handler() is incomplete and needs to be finished. Basically, it needs to do *something* such that if there's a Content-Encoding or Transfer-Encoding header, it will undo the effects of that before the SoupMessage signals (got-chunk, got-body) get emitted. Actually, most of the work will end up in some mix of soup-message, soup-message-body, and/or soup-message-io, but I don't know where exactly. SoupSession gets involved though because it's the one that has the list of available encoding types, so it has to tell the message/body/iodata what class to use to do the decoding.
It's also possible that the SoupCoding API will turn out to be badly designed for this purpose and you'll want to change it to something different. It's also possible that one or all of the coding implementations are buggy. I never got as far as testing any of them.
Just FYI: as of now, due to this missing feature, I am unable to access Wikipedia's Four-force article: http://en.wikipedia.org/wiki/Four-force Webkit's web inspector tells me that the content encoding is gzip. This left me wondering why this exact page fails, as I've accessed loads of Wikipedia pages and this is the first one that does not work... So... It would be very nice to have this implemented if people want to go with webkit for epiphany 2.28 as it seems the pages that need this might not be as rare as I thought... Another page that needs this (also gzip) is http://anidb.net/
(In reply to comment #7)
> So... It would be very nice to have this implemented if people want to go with
> webkit for epiphany 2.28 as it seems the pages that need this might not be as
> rare as I thought...
The current version of webkitgtk explicitly sends "Accept-Encoding: identity", which tells the server that it doesn't support gzip. (HTTP actually assumes that if you don't say otherwise, then that means you support gzip, deflate, and compress encoding.) So this should work with the latest webkit.
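For anyone wondering how a server is supposed to interpret that request header, here is a rough sketch of the Accept-Encoding rules as I read them in RFC 2616 section 14.3. coding_acceptable() and token_equals() are names made up for this example and are not part of libsoup; the parsing is simplified (it only looks at a leading "q=" parameter) and assumes a well-formed header value.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison of the n-byte token at s with name. */
static bool token_equals(const char *s, size_t n, const char *name)
{
    if (strlen(name) != n)
        return false;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)name[i]))
            return false;
    return true;
}

/* May a server use `coding` for the response, given the request's
 * Accept-Encoding value (NULL when the header is absent)? */
bool coding_acceptable(const char *accept_encoding, const char *coding)
{
    bool star_seen = false, star_ok = false;
    const char *p = accept_encoding;

    if (p == NULL)
        return true;    /* no header: the server may assume any coding */

    while (*p != '\0') {
        const char *elem = p;
        const char *end = p + strcspn(p, ",");
        const char *name_end, *semi;
        double q = 1.0;

        p = (*end == ',') ? end + 1 : end;

        while (elem < end && isspace((unsigned char)*elem))
            elem++;
        name_end = elem;
        while (name_end < end && *name_end != ';' &&
               !isspace((unsigned char)*name_end))
            name_end++;
        if (name_end == elem)
            continue;   /* empty list element */

        /* Optional ";q=..." weight; q=0 means "not acceptable". */
        semi = memchr(name_end, ';', (size_t)(end - name_end));
        if (semi != NULL) {
            const char *s = semi + 1;
            while (s < end && isspace((unsigned char)*s))
                s++;
            if (end - s >= 2 && (s[0] == 'q' || s[0] == 'Q') && s[1] == '=')
                q = strtod(s + 2, NULL);
        }

        if (token_equals(elem, (size_t)(name_end - elem), coding))
            return q > 0.0;
        if (token_equals(elem, (size_t)(name_end - elem), "*")) {
            star_seen = true;
            star_ok = q > 0.0;
        }
    }

    if (star_seen)
        return star_ok;
    /* Not listed: only "identity" stays acceptable by default. */
    return token_equals("identity", 8, coding);
}
```

So coding_acceptable("identity", "gzip") is false, which is why a well-behaved server should send an unencoded body to a client that advertises only identity.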
*** Bug 587912 has been marked as a duplicate of this bug. ***
Hi, Any update on this feature? While it is working with the current webkit (comment #8), lots of development is currently done on small devices which have bandwidth constraints, and where supporting deflate would help. Thanks,
Well, there are a handful of sites (notably archive.org) that use "Content-Encoding: gzip" even when webkit sends "Accept-Encoding: identity". Hrmph. Anyway, no real update. No one has been working on it, and it is *highly* unlikely it will be finished for 2.28 at this point.
Note that the addition of SoupContentSniffer now adds another wrinkle; content-decoding needs to happen before sniffing, or else the sniffer will always just return "application/gzip". This gets back into the idea that the soup-io-filter in the content-coding branch was moving towards; we could have a series of filters on the input data, so you'd get something like:
    soup_socket_read -> Transfer-Encoding filter -> Content-Encoding filter -> SoupContentSniffer -> got-chunk
where the two encoding filters would actually output different data than they input, but the sniffing "filter" would just be buffering up data until it got enough to sniff, and then it would spit it all out at once, and then after that behave as just an identity filter.
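As a rough illustration of that filter chain, here is a small self-contained sketch in plain C. None of this is libsoup code: Filter, SniffFilter, Sink, Pipeline and pipeline_init() are hypothetical names, the "decoder" is a toy stand-in (every byte was stored as value + 1) rather than real gzip, only one decoding stage is modelled instead of two, and the sniffer's 8-byte threshold and "<html>" check are arbitrary choices for the demo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_BYTES 8      /* hypothetical threshold, for the demo only */
#define SINK_CAPACITY 256

typedef struct Filter Filter;
struct Filter {
    void (*push)(Filter *self, const char *data, size_t len);
    Filter *next;          /* downstream stage, or NULL at the end */
};

/* Hand data to the next stage; empty chunks are dropped. */
static void filter_emit(Filter *self, const char *data, size_t len)
{
    if (self->next != NULL && len > 0)
        self->next->push(self->next, data, len);
}

/* Toy stand-in for a Content-Encoding decoder: output differs from input. */
static void toy_decoder_push(Filter *self, const char *data, size_t len)
{
    char out[64];
    while (len > 0) {
        size_t n = len < sizeof out ? len : sizeof out;
        for (size_t i = 0; i < n; i++)
            out[i] = (char)(data[i] - 1);
        filter_emit(self, out, n);
        data += n;
        len -= n;
    }
}

typedef struct {
    Filter base;           /* first member, so Filter* can be cast back */
    char pending[SNIFF_BYTES];
    size_t pending_len;
    int sniffed;           /* nonzero once the prefix has been inspected */
    int is_html;           /* toy sniffing result */
} SniffFilter;

static void sniff_push(Filter *self, const char *data, size_t len)
{
    SniffFilter *sf = (SniffFilter *)self;
    size_t take;

    if (sf->sniffed) {     /* after sniffing: plain identity filter */
        filter_emit(self, data, len);
        return;
    }
    take = SNIFF_BYTES - sf->pending_len;
    if (take > len)
        take = len;
    memcpy(sf->pending + sf->pending_len, data, take);
    sf->pending_len += take;
    if (sf->pending_len < SNIFF_BYTES)
        return;            /* not enough data yet, keep buffering */
    sf->sniffed = 1;
    sf->is_html = memcmp(sf->pending, "<html>", 6) == 0;
    filter_emit(self, sf->pending, sf->pending_len);  /* flush the prefix */
    filter_emit(self, data + take, len - take);       /* then the rest */
}

typedef struct {
    Filter base;
    char data[SINK_CAPACITY];
    size_t len;
    int chunks;            /* number of "got-chunk" style deliveries */
} Sink;

static void sink_push(Filter *self, const char *data, size_t len)
{
    Sink *s = (Sink *)self;
    assert(s->len + len <= sizeof s->data);
    memcpy(s->data + s->len, data, len);
    s->len += len;
    s->chunks++;
}

typedef struct { Filter decoder; SniffFilter sniff; Sink sink; } Pipeline;

/* Wire decoder -> sniffer -> sink. */
void pipeline_init(Pipeline *p)
{
    memset(p, 0, sizeof *p);
    p->decoder.push = toy_decoder_push;
    p->decoder.next = &p->sniff.base;
    p->sniff.base.push = sniff_push;
    p->sniff.base.next = &p->sink.base;
    p->sink.base.push = sink_push;
    p->sink.base.next = NULL;
}
```

Data is fed in with p.decoder.push(&p.decoder, bytes, len). Because decoding runs ahead of the sniffer, the sniffer sees the decoded "<html>" prefix rather than the encoded bytes. A real version would also need an end-of-body hook so that a response shorter than the sniffing threshold still gets flushed.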
FYI, it makes pages like this one unviewable: http://www.spinics.net/lists/hotplug/msg02404.html
*** Bug 598285 has been marked as a duplicate of this bug. ***
Pushed my current work to the "coding" branch on git.gnome.org. The "deflate" decoding is slightly broken (it works for some sites but not others), so it's disabled at the moment. (We send "Accept-Encoding: gzip".) Once I fix deflate I'll push to master. This should be in libsoup 2.28.2.
Created attachment 146361: webkit patch to use SoupContentDecoder
*** Bug 603269 has been marked as a duplicate of this bug. ***
*** Bug 603616 has been marked as a duplicate of this bug. ***
Pushed to git with support for "gzip" only; this will end up in 2.28.2.
*** Bug 610684 has been marked as a duplicate of this bug. ***