Bug 90067 - body not displayed when Mime-Headers are missing
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: pre-0.13.0 betas
Hardware: Other Linux
Importance: Normal major
Target Milestone: 0.14.0
Assigned To: Charles Kerr
QA Contact: Pan QA Team
Depends on: 99247
Blocks:
Reported: 2002-08-06 23:32 UTC by Christian Lohmaier
Modified: 2006-06-18 04:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
first attempt at using gmime filter-charset & streams instead of g_mime_iconv_strndup (3.17 KB, patch)
2003-04-08 22:10 UTC, Charles Kerr

Description Christian Lohmaier 2002-08-06 23:32:08 UTC
Some versions of MS OE produce corrupt postings that pan fails to display.
These postings contain 8-bit characters but no MIME headers at all [1].
When trying to convert such a posting to UTF-8, pan complains:

WARNING **: Unable to convert article to UTF-8: Invalid byte sequence in
conversion input
WARNING **: Some parts of the article might not be displayed.

(nothing of the body is displayed actually)

Example-Article:
Message-ID: <aipjbi$15vfq0$1@ID-13499.news.dfncis.de>
From: "Sebastian Göring" <0o.o0@gmx.de>
Newsgroups: de.comp.hardware.laufwerke.brenner
Subject: Re: Meine aelteste selbstgebrannte
Date: Tue, 6 Aug 2002 22:34:30 +0200

To reproduce:
Either try the article above or have a look into a group that fulfills the
following requirements:
-> special characters in the corresponding language (e.g. german group)
-> big part of MS-Users (avoid linux/unix specific groups)
you will soon 'get a hit'

[1] Pan "invents" a Mime-Header when "showing all headers" and sets it to
MIME-Version: 1.0
Content-Type: text/plain
(but that's another bug)
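
For illustration, the conversion failure can be reproduced outside Pan. The following is a minimal Python sketch, not Pan or GMime code; the helper name to_utf8_strict is invented for this example. It assumes the posting's 8-bit characters are Latin-1 (as the "ö" in the From header suggests) and that they are being interpreted as UTF-8.

```python
from typing import Optional


def to_utf8_strict(raw: bytes, charset: str) -> Optional[str]:
    """Decode raw bytes strictly; return None on any invalid byte sequence."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# "Goering" with an o-umlaut, as Latin-1 bytes: the umlaut is the single byte 0xF6.
body = b"G\xf6ring"

# 0xF6 is not a valid UTF-8 start byte, so strict UTF-8 decoding fails.
print(to_utf8_strict(body, "utf-8") is None)

# The same bytes decode cleanly as ISO-8859-1.
print(to_utf8_strict(body, "iso-8859-1") == "G\u00f6ring")
```

A strict conversion like this yields nothing at all when a single byte is invalid, which matches the symptom of an empty body.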
Comment 2 Christian Lohmaier 2002-08-08 13:07:38 UTC
Tried it out and it works :-)
Confirmed.
ciao
Christian
Comment 3 Christian Lohmaier 2002-08-11 21:25:33 UTC
Unfortunately, I have to reopen this bug. In pan 0.12.94 (no version
entry in bugzilla yet) this fix doesn't work as expected.
The article is only displayed until the first non ASCII character
appears (the rest gets truncated). I get the following message:

WARNING **: g_mime_iconv_strndup: Invalid or incomplete multibyte or
wide character
WARNING **: Some parts of the article are corrupt and can't be displayed.
Comment 4 Charles Kerr 2002-08-11 22:09:41 UTC
Chris, backing out the header_to_utf8 call in articlelist was your
idea... fix this. :)
Comment 5 Christophe Lambin 2002-08-12 06:49:19 UTC
Well, AFAICS, this isn't an articlelist-related bug, but I'll have a look ...

Comment 6 Charles Kerr 2002-08-12 17:52:29 UTC
<sheepishly>
You're right; I was confusing this with the header pane bug.
Sorry about that Chris ... but feel free to be a second pair
of eyes anyway. ;)
</sheepishly>

Looking through GMime's source,
it looks like the error is generated in gmime-iconv-utils.c:140,
where the retval is set to NULL.

Since Christian only reports that error showing up once,
probably there's not a mime header in the message, and the
one in the default charset for the group is insufficient for
decoding it.  So the fallback of "Some parts of this article
are corrupt and can't be displayed" kicks in.

Christian:
(1) could you attach the group + message-id of the article doing this?
(2) what have you set as the default charset for that group?
Comment 7 Christian Lohmaier 2002-08-12 18:55:20 UTC
(1) could you attach the group + message-id of the article doing this?
no Problem:
Group: de.comp.office-pakete.staroffice
MsgID (postings from: "M.Mathias" <mongjoe@deunet.co.za>)
<aj0ufu$28d$2@ctb-nnrp2.saix.net>
<aj3gp4$c1l$1@ctb-nnrp2.saix.net>
<ahubo1$p53$1@ctb-nnrp1.saix.net>
<afvet2$me8$1@ctb-nnrp2.saix.net>
<agkgs6$a0d$2@ctb-nnrp2.saix.net>
...
(2) what have you set as the default charset for that group?
You got the right nose ;-)
I set the default charset for all groups to utf-8 because of the
encoding bug. When I set it to iso-8859-1 the whole message is displayed. 
BUT: It worked (and still works :) with the cvs-Version I got after
you reported this bug as fixed and asked for confirmation even with
utf-8. I don't know what else you changed afterwards, but these
changes broke the fix ;-)
I'd like to keep utf-8 as default charset for outgoing postings (and
like the ability to set this on a per-group basis - before I got aware
of this I always looked in the posting-profile-settings, but the
per-group solution is superior)
May I suggest the following order (don't know if this is possible).
Since utf-8 fails, one doesn't have to test ascii:
0) the group's charset when != utf-8
1) the locale's charset
2) latin1
3) give up trying to decode, truncate message
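
The order suggested above could be sketched as follows. This is a hypothetical Python illustration of the proposal in this comment, not Pan's behaviour; decode_with_fallback and its parameters are names invented here. UTF-8 is tried first because the list presupposes that it has already failed.

```python
import locale
from typing import Optional, Tuple


def decode_with_fallback(raw: bytes, group_charset: str,
                         locale_charset: Optional[str] = None
                         ) -> Tuple[str, Optional[str]]:
    """Return (text, charset_used); charset_used is None if decoding gave up."""
    if locale_charset is None:
        locale_charset = locale.getpreferredencoding(False)

    candidates = ["utf-8"]
    if group_charset.lower().replace("-", "") != "utf8":
        candidates.append(group_charset)   # 0) the group's charset when != utf-8
    candidates.append(locale_charset)      # 1) the locale's charset
    candidates.append("iso-8859-1")        # 2) latin1

    for charset in candidates:
        try:
            return raw.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            continue  # invalid bytes or unknown charset name: try the next one

    # 3) give up and truncate at the first non-ASCII byte.
    # Kept for completeness: latin1 maps every byte, so this is not reached
    # as long as latin1 stays in the candidate list.
    cut = next((i for i, b in enumerate(raw) if b >= 0x80), len(raw))
    return raw[:cut].decode("ascii"), None
```

Because latin1 accepts any byte sequence, a chain like this never reports failure; it can only guess wrong, which is one reason such a heuristic is debatable.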
Comment 8 Christophe Lambin 2002-08-23 22:49:27 UTC
I don't agree with that heuristic approach. It would probably be
better to just decouple the sending charset from the reading charset
(e.g. like  Mozilla, Evolution, ...)

Bumping to 0.13.1.
Comment 9 Christophe Lambin 2002-10-11 00:00:35 UTC
Bumped remaining bugs to 0.13.2.
Comment 10 Jeffrey Stedfast 2002-10-15 02:25:59 UTC
what do you guys use to convert the body text to UTF-8?
g_mime_iconv_strndup? naughty naughty!!

You should probably use GMime's charset filter for this... it has some
more robustness added for when it finds illegal sequences as well.

It might be desirable for the charset filter to hold info about how
many bytes were not converted so that you could tell if an article
body converted "flawlessly" in the case that you wanted to try a few
different charsets.

actually, imho there should be a nice GMimeStream wrapper for the text
widget so you can just g_mime_data_wrapper_write_to_stream (dw,
text_widget_stream_iface);

maybe I'll hack on one at some point... :-)
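
The idea of a tolerant conversion that also reports how many bytes it could not convert might look roughly like this. It is a Python sketch of the concept only, not GMime's charset filter; convert_lossy and pick_charset are hypothetical names, and dropping invalid bytes is an assumption about the desired behaviour.

```python
from typing import List, Tuple


def convert_lossy(raw: bytes, charset: str) -> Tuple[str, int]:
    """Decode raw, dropping invalid sequences; return (text, dropped_bytes)."""
    parts: List[str] = []
    dropped = 0
    pos = 0
    while pos < len(raw):
        try:
            parts.append(raw[pos:].decode(charset))
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are offsets into the slice raw[pos:].
            parts.append(raw[pos:pos + err.start].decode(charset))
            dropped += err.end - err.start
            pos += err.end  # skip the invalid bytes and continue after them
    return "".join(parts), dropped


def pick_charset(raw: bytes, candidates: List[str]) -> str:
    """Return the candidate that drops the fewest bytes (first one wins ties)."""
    return min(candidates, key=lambda cs: convert_lossy(raw, cs)[1])


text, dropped = convert_lossy(b"G\xf6ring", "utf-8")
print(text, dropped)  # Gring 1
print(pick_charset(b"G\xf6ring", ["utf-8", "iso-8859-1"]))  # iso-8859-1
```

A dropped-byte count of zero would correspond to a "flawless" conversion in the sense described above. The repeated slicing keeps the sketch short but is not efficient for large bodies.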
Comment 11 Christophe Lambin 2002-11-13 20:53:43 UTC
Bumped remaining bugs to 0.13.3.
Comment 12 Christophe Lambin 2002-12-15 00:24:31 UTC
Charles: assigning back to you, since you know the gmime API much
better than I do. You'll probably be able to adapt pan_body_to_utf8()
much faster than me.
Comment 13 Charles Kerr 2002-12-18 05:22:31 UTC
bumping for 0.13.3 freeze.
Comment 14 Charles Kerr 2003-04-08 22:08:15 UTC
> what do you guys use to convert the body text to UTF-8?
> g_mime_iconv_strndup? naughty naughty!!

fejj: grr, what's the point of having g_mime_iconv_strndup()
in GMime's API if it's not supposed to be used?

chris: attached is a patch that follows fejj's suggestion
to change the three lines needed for g_mime_iconv_strdup()
to a thirty-five line function needed to line up gmime's
input mem streams, stream-filters, charset filters, and
output mem streams with byte arrays.  I'll be happy to make
gmime fixes here but don't know how to test the meat of it,
the charsets.  Could you apply the patch locally and test it?
This is one of our oldest active bugs.

fejj: this is an insanely complex process to just convert
from one charset to another (see patch).  Am I doing it wrong?
If I'm doing it right, should this be moved into GMime?
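
For readers unfamiliar with the stream approach, the shape of the pipeline described here (input memory stream, charset-converting filter, output memory stream) can be sketched in Python. This is an analogue under stated assumptions, not the attached patch and not GMime API; convert_stream is a made-up name, and errors="ignore" is assumed to stand in for a filter that strips invalid sequences.

```python
import codecs
import io
from typing import BinaryIO


def convert_stream(src: BinaryIO, dst: BinaryIO, from_charset: str,
                   chunk_size: int = 4096) -> None:
    """Read src in chunks, convert from from_charset to UTF-8, write to dst."""
    # The incremental decoder keeps state between chunks, so a multibyte
    # sequence split across a chunk boundary is still decoded correctly.
    decoder = codecs.getincrementaldecoder(from_charset)(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush whatever the decoder is still holding at end of input.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


source = io.BytesIO(b"G\xf6ring")   # Latin-1 input in a memory stream
output = io.BytesIO()               # UTF-8 output collected in memory
convert_stream(source, output, "iso-8859-1")
print(output.getvalue())  # b'G\xc3\xb6ring'
```

The chunked loop is what makes this longer than a one-shot conversion call, but it avoids holding a second full copy of the input during decoding.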
Comment 15 Charles Kerr 2003-04-08 22:10:16 UTC
Created attachment 15578
first attempt at using gmime filter-charset & streams instead of g_mime_iconv_strndup
Comment 16 Jeffrey Stedfast 2003-04-09 00:05:01 UTC
heh. I have them around for converting headers, and so I needed it to
return NULL on fail (any bad bytes) so that I don't lose info in the
headers.

the charset-filter is what was meant to be used for writing content 
(note: if it comes across any invalid sequences, it will strip them
out - so be warned). Fortunately, this isn't a problem since it
doesn't corrupt the original content (which would have been a problem
for header decoding at the time... might be okay now tho - I should
check that).
Comment 17 Charles Kerr 2003-04-09 19:52:53 UTC
chris: ping
Comment 18 Christophe Lambin 2003-04-09 19:54:21 UTC
Tried out the patch and it works fine. 

Good place to test it: fr.test
Sample article: <3e9438ee$0$20027$7a628cd7@news.club-internet.fr>