GNOME Bugzilla – Bug 90067
body not displayed when Mime-Headers are missing
Last modified: 2006-06-18 04:57:22 UTC
Some versions of MS OE produce corrupt postings that Pan fails to display. These postings contain 8-bit characters but no MIME headers at all [1].

When trying to convert such a posting to UTF-8, Pan complains:

WARNING **: Unable to convert article to UTF-8: Invalid byte sequence in conversion input
WARNING **: Some parts of the article might not be displayed.

(Actually, nothing of the body is displayed.)

Example article:

Message-ID: <aipjbi$15vfq0$1@ID-13499.news.dfncis.de>
From: "Sebastian Göring" <0o.o0@gmx.de>
Newsgroups: de.comp.hardware.laufwerke.brenner
Subject: Re: Meine aelteste selbstgebrannte
Date: Tue, 6 Aug 2002 22:34:30 +0200

To reproduce: either try the article above or look into a group that fulfills the following requirements:
-> special characters in the corresponding language (e.g. a German group)
-> a large share of MS users (avoid Linux/Unix-specific groups)
You will soon 'get a hit'.

[1] Pan "invents" a MIME header when "showing all headers" and sets it to
MIME-Version: 1.0
Content-Type: text/plain
(but that's another bug)
This fix works for me, please try it out and confirm:

http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan&command=DIFF_FRAMESET&file=text.c&rev1=1.314&rev2=1.315&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/base&command=DIFF_FRAMESET&file=pan-glib-extensions.c&rev1=1.30&rev2=1.31&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/base&command=DIFF_FRAMESET&file=pan-glib-extensions.h&rev1=1.18&rev2=1.19&root=/cvs/gnome
Tried it out and it works :-) Confirmed. ciao Christian
Unfortunately, I have to reopen this bug. In pan 0.12.94 (no version entry in bugzilla yet) this fix doesn't work as expected. The article is only displayed up to the first non-ASCII character (the rest gets truncated). I get the following messages:

WARNING **: g_mime_iconv_strndup: Invalid or incomplete multibyte or wide character
WARNING **: Some parts of the article are corrupt and can't be displayed.
Chris, backing out the header_to_utf8 call in articlelist was your idea... fix this. :)
Well, AFAICS, this isn't an articlelist-related bug, but I'll have a look ...
<sheepishly> You're right; I was confusing this with the header pane bug. Sorry about that Chris ... but feel free to be a second pair of eyes anyway. ;) </sheepishly> Looking through GMime's source, it looks like the error is generated in gmime-iconv-utils.c:140, where the retval is set to NULL. Since Christian only reports that error showing up once, probably there's not a mime header in the message, and the one in the default charset for the group is insufficient for decoding it. So the fallback of "Some parts of this article are corrupt and can't be displayed" kicks in. Christian: (1) could you attach the group + message-id of the article doing this? (2) what have you set as the default charset for that group?
> (1) could you attach the group + message-id of the article doing this?

No problem:
Group: de.comp.office-pakete.staroffice
Message-IDs (postings from "M.Mathias" <mongjoe@deunet.co.za>):
<aj0ufu$28d$2@ctb-nnrp2.saix.net>
<aj3gp4$c1l$1@ctb-nnrp2.saix.net>
<ahubo1$p53$1@ctb-nnrp1.saix.net>
<afvet2$me8$1@ctb-nnrp2.saix.net>
<agkgs6$a0d$2@ctb-nnrp2.saix.net>
...

> (2) what have you set as the default charset for that group?

You've got a good nose ;-) I set the default charset for all groups to utf-8 because of the encoding bug. When I set it to iso-8859-1, the whole message is displayed.

BUT: it worked (and still works :) even with utf-8 in the CVS version I got after you reported this bug as fixed and asked for confirmation. I don't know what else you changed afterwards, but those changes broke the fix ;-)

I'd like to keep utf-8 as the default charset for outgoing postings (and I like the ability to set this on a per-group basis; before I became aware of it I always looked in the posting-profile settings, but the per-group solution is superior).

May I suggest the following order (I don't know if this is possible)? Since utf-8 fails, one doesn't have to test ASCII.
0) the group's charset, when != utf-8
1) the locale's charset
2) latin1
3) give up trying to decode, truncate the message
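The order suggested above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not Pan or GMime code; the function name decode_with_fallbacks and its parameters are hypothetical, and the locale's charset is passed in explicitly so the behaviour does not depend on the machine running it.

```python
import codecs
from typing import List, Optional, Tuple


def _is_utf8(charset: str) -> bool:
    """True if the name is an alias of UTF-8; unknown names count as not UTF-8."""
    try:
        return codecs.lookup(charset).name == "utf-8"
    except LookupError:
        return False


def decode_with_fallbacks(body: bytes, group_charset: str,
                          locale_charset: str) -> Tuple[str, Optional[str]]:
    """Return (text, charset_used); charset_used is None if the text was truncated."""
    candidates: List[str] = ["utf-8"]          # the attempt that fails in this bug
    if not _is_utf8(group_charset):
        candidates.append(group_charset)       # 0) the group's charset, when != utf-8
    candidates.append(locale_charset)          # 1) the locale's charset
    candidates.append("iso-8859-1")            # 2) latin1
    for charset in candidates:
        try:
            return body.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            continue                           # undecodable or unknown charset: try next
    # 3) give up: keep only the part before the first non-ASCII byte
    cut = next((i for i, byte in enumerate(body) if byte >= 0x80), len(body))
    return body[:cut].decode("ascii"), None
```

One thing the sketch makes visible: Python's ISO-8859-1 codec accepts every byte value, so with this candidate list step 3 is never reached, and a body in some other 8-bit charset would be displayed with wrong characters rather than rejected.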
I don't agree with that heuristic approach. It would probably be better to just decouple the sending charset from the reading charset (e.g. like Mozilla, Evolution, ...) Bumping to 0.13.1.
Bumped remaining bugs to 0.13.2.
what do you guys use to convert the body text to UTF-8? g_mime_iconv_strndup? naughty naughty!!

You should probably use GMime's charset filter for this... it has some more robustness added for when it finds illegal sequences as well.

It might be desirable for the charset filter to hold info about how many bytes were not converted, so that you could tell whether an article body converted "flawlessly" in case you wanted to try a few different charsets.

actually, imho there should be a nice GMimeStream wrapper for the text widget so you can just

g_mime_data_wrapper_write_to_stream (dw, text_widget_stream_iface);

maybe I'll hack on one at some point... :-)
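The idea of tracking how many bytes were not converted, and using that to compare charsets, might look something like the sketch below. It is written in Python with the standard codecs only and is not the GMime charset filter; decode_counting_errors and pick_cleanest are hypothetical names introduced for illustration.

```python
from typing import List, Tuple


def decode_counting_errors(data: bytes, charset: str) -> Tuple[str, int]:
    """Decode data, skipping invalid sequences; return (text, bytes_dropped)."""
    pieces: List[str] = []
    dropped = 0
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode(charset))
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are offsets into the slice that was being decoded
            pieces.append(data[pos:pos + err.start].decode(charset))
            bad = max(1, err.end - err.start)  # always make progress
            dropped += bad
            pos += err.start + bad
    return "".join(pieces), dropped


def pick_cleanest(data: bytes, charsets: List[str]) -> Tuple[str, str, int]:
    """Return (charset, text, bytes_dropped) for the candidate that drops the fewest bytes."""
    results = [(charset,) + decode_counting_errors(data, charset) for charset in charsets]
    return min(results, key=lambda result: result[2])
```

A dropped count of 0 corresponds to a "flawless" conversion; on a tie, the earlier charset in the list wins, so the caller's ordering still expresses a preference.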
Bumped remaining bugs to 0.13.3.
Charles: assigning back to you, since you know the gmime API much better than I do. You'll probably be able to adapt pan_body_to_utf8() much faster than me.
bumping for 0.13.3 freeze.
> what do you guys use to convert the body text to UTF-8? > g_mime_iconv_strndup? naughty naughty!! fejj: grr, what's the point of having g_mime_iconv_strndup() in GMime's API if it's not supposed to be used? chris: attached is a patch that follows fejj's suggestion to change the three lines needed for g_mime_iconv_strdup() to a thirty-five line function needed to line up gmime's input mem streams, stream-filters, charset filters, and output mem streams with byte arrays. I'll be happy to make gmime fixes here but don't know how to test the meat of it, the charsets. Could you apply the patch locally and test it? This is one of our oldest active bugs. fejj: this is an insanely complex process to just convert from one charset to another (see patch). Am I doing it wrong? If I'm doing it right, should this be moved into GMime?
Created attachment 15578: first attempt at using gmime filter-charset & streams instead of g_mime_iconv_strndup
heh. I have them around for converting headers, so I needed them to return NULL on failure (any bad bytes) so that I don't lose info in the headers. The charset filter is what was meant to be used for writing content (note: if it comes across any invalid sequences, it will strip them out, so be warned). Fortunately, this isn't a problem since it doesn't corrupt the original content (which would have been a problem for header decoding at the time... might be okay now tho, I should check that).
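The difference between the two behaviours described here (fail on any bad byte for headers, strip bad bytes for body content) can be illustrated with a short Python sketch. It only mirrors the described semantics with the standard codecs; strict_convert and lenient_convert are hypothetical names, not GMime functions.

```python
from typing import Optional


def strict_convert(raw: bytes, charset: str) -> Optional[str]:
    """Header-style conversion: any invalid byte makes the whole conversion fail."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None  # caller keeps the raw header, so no information is lost


def lenient_convert(raw: bytes, charset: str) -> str:
    """Body-style conversion: invalid sequences are silently stripped from the output."""
    return raw.decode(charset, errors="ignore")


raw_body = b"G\xf6ring"  # Latin-1 bytes, wrongly assumed to be UTF-8
print(strict_convert(raw_body, "utf-8"))   # no result at all
print(lenient_convert(raw_body, "utf-8"))  # text with the bad byte removed
```

In both cases the original bytes are left untouched, so a caller can still retry with another charset.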
chris: ping
Tried out the patch and it works fine. Good place to test it: fr.test Sample article: <3e9438ee$0$20027$7a628cd7@news.club-internet.fr>
chris: cool, thanks... committed the patch to CVS and marking closed based on Chris' testing.

http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan/base&command=DIFF_FRAMESET&file=pan-glib-extensions.c&rev1=1.45&rev2=1.46&root=/cvs/gnome
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan&command=DIFF_FRAMESET&file=ANNOUNCE.html&rev1=1.117&rev2=1.118&root=/cvs/gnome