GNOME Bugzilla – Bug 577630
libsoup should try to fix up broken Content-Type headers
Last modified: 2009-04-03 00:46:27 UTC
We hit a problem with WebKit: the following page sends a broken Content-Type header, with the header name repeated inside the value:

http://www.gnome.org/~shaunm/pulse/web/

kov@abacate ~> wget -S -O /dev/null http://www.gnome.org/~shaunm/pulse/web/
--2009-04-01 14:42:59--  http://www.gnome.org/~shaunm/pulse/web/
Resolving www.gnome.org... 209.132.176.176
Connecting to www.gnome.org|209.132.176.176|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Wed, 01 Apr 2009 17:43:19 GMT
  Server: Apache/2.2.3 (Red Hat)
  Connection: close
  Content-Type: content-type: text/html; charset=utf-8
Length: unspecified [content-type: text/html]
Saving to: `/dev/null'

    [ <=> ] 6,457  20.5K/s  in 0.3s

2009-04-01 14:43:02 (20.5 KB/s) - `/dev/null' saved [6457]

The problem is quite simple to work around, but it would be good to have the fix in one central place.
I forgot to say that this causes WebKit to try to download the page instead of rendering it. We have a bug report on WebKit to track this: https://bugs.webkit.org/show_bug.cgi?id=24843.
Probably soup_message_headers_get_content_type() should ignore the header if it's syntactically incorrect like this. But why isn't this already handled by your existing content-type-sniffing code?
It's not sniffed, because the content type is not empty (the only case our current content sniffing code handles). The code currently living in WebKitGTK+ for content sniffing is really just a simple work-around while we get a more complete implementation into libsoup.
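To illustrate why the broken header slips through, here is a minimal sketch of an "only sniff when empty" check. The function name should_sniff_if_empty() is hypothetical and this is not the actual WebKitGTK+ code; it only assumes, as described above, that sniffing is triggered solely by a missing or empty content type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sniff only when no content type was reported.
 * This mirrors the behaviour described in the comment above; it is
 * not the real WebKitGTK+ implementation. */
bool should_sniff_if_empty(const char *content_type)
{
    /* A missing header or an empty value is the only trigger. */
    return content_type == NULL || content_type[0] == '\0';
}
```

The value from this bug, "content-type: text/html; charset=utf-8", is neither NULL nor empty, so a check like this never gets as far as sniffing and the bogus type is passed on as-is.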
Fixed in trunk; soup_message_headers_get_content_type() will now return NULL if the header is syntactically incorrect, which will then cause you to try to sniff it and win.
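For reference, a rough sketch of what "syntactically incorrect" could mean for a Content-Type value. This is not the code that went into libsoup; the names content_type_is_well_formed() and usable_content_type() are made up for illustration, and the check assumes the RFC 2616 rule that a media type is token "/" token, optionally followed by ";" and parameters (the parameters themselves are not validated here).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c may appear in an HTTP token (RFC 2616, section 2.2). */
static bool is_token_char(unsigned char c)
{
    if (c <= 32 || c >= 127)
        return false;   /* control characters, space, DEL, non-ASCII */
    return strchr("()<>@,;:\\\"/[]?={}", c) == NULL;   /* separators */
}

/* Skip a run of token characters; return the first non-token position. */
static const char *skip_token(const char *p)
{
    while (is_token_char((unsigned char)*p))
        p++;
    return p;
}

/* Hypothetical check: does value start with a well-formed type "/" subtype,
 * followed by either the end of the string or a ';' that starts parameters? */
bool content_type_is_well_formed(const char *value)
{
    const char *p, *end;

    if (value == NULL)
        return false;

    p = value;
    while (*p == ' ' || *p == '\t')   /* tolerate leading whitespace */
        p++;

    end = skip_token(p);              /* type */
    if (end == p || *end != '/')
        return false;

    p = end + 1;
    end = skip_token(p);              /* subtype */
    if (end == p)
        return false;

    while (*end == ' ' || *end == '\t')
        end++;
    return *end == '\0' || *end == ';';
}

/* Hypothetical wrapper: hand back the value only if it is usable, so that
 * a caller treats a malformed header the same way as a missing one. */
const char *usable_content_type(const char *value)
{
    return content_type_is_well_formed(value) ? value : NULL;
}
```

With the header from this bug, the first token is "content-type" and it is followed by ':' rather than '/', so the value is rejected and the caller sees NULL, which is the case the existing sniffing code already handles.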