GNOME Bugzilla – Bug 104479
Replying to an article which contains an 8-bit character in the headers
Last modified: 2006-06-18 05:25:30 UTC
When replying to an article that contains an 8-bit character in its headers (e.g., "From: Mahaléo"), Pan posts in UTF-8 while declaring ISO-8859-1.
I did a logical analysis of what can cause this bug, which leads to the following conclusion:
- Pan does not expect that some people will send raw 8-bit data in headers.
- Pan copies the content of the "From" header into an internal UTF-8 buffer without checking whether it contains raw 8-bit data. This happens when Pan builds the "%s has written in message:" line in front of the reply.
- As a result, this UTF-8 buffer contains ISO-8859-1 bytes that are not valid UTF-8.
- Later, a conversion function is called to convert the data to ISO-8859-1, and it stops at the first invalid character (this is how iconv behaves, for example).
- This error is not detected, and Pan goes on to send most of the buffer content as UTF-8 instead of the expected ISO-8859-1.

This analysis is confirmed by the fact that raw UTF-8 data in headers does not trigger the bug: everything in the output is correct, and the name in the "%s wrote" line is correct and encoded in ISO-8859-1.

Recommended solution:
- Test whether the header is valid UTF-8 (iconv from/to UTF-8 is a crude but simple way to do it).
- If not, convert it from the encoding used for the body to UTF-8.
- If this fails too, maybe remove all non-7-bit data?
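The recommended three-step fix can be sketched in C, Pan's own language. This is a minimal sketch under stated assumptions, not Pan's actual patch: the names `is_valid_utf8`, `convert_to_utf8`, `strip_to_ascii` and `sanitize_header` are hypothetical, the caller is assumed to already know the body's charset, and POSIX `iconv` does the conversion. Unlike the failing path in the analysis, every `iconv()` return value is checked instead of being ignored.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Simplified UTF-8 check: rejects stray continuation
 * bytes, truncated sequences, 0xC0/0xC1 and lead bytes above 0xF4. It does
 * not catch every overlong 3/4-byte form or surrogate (GLib's
 * g_utf8_validate() is stricter). */
static int is_valid_utf8(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    while (*s) {
        int extra;
        if (*s < 0x80)                     extra = 0;
        else if (*s >= 0xC2 && *s <= 0xDF) extra = 1;
        else if (*s >= 0xE0 && *s <= 0xEF) extra = 2;
        else if (*s >= 0xF0 && *s <= 0xF4) extra = 3;
        else                               return 0;
        s++;
        for (int i = 0; i < extra; i++, s++)
            if ((*s & 0xC0) != 0x80)  /* also stops at the terminating NUL */
                return 0;
    }
    return 1;
}

/* Hypothetical helper: heap copy of a string (avoids relying on strdup). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out != NULL)
        memcpy(out, s, n);
    return out;
}

/* Hypothetical helper, last resort: keep only 7-bit bytes. */
static char *strip_to_ascii(const char *in)
{
    char *out = malloc(strlen(in) + 1);
    char *p = out;
    if (out == NULL)
        return NULL;
    for (; *in; in++)
        if (!((unsigned char)*in & 0x80))
            *p++ = *in;
    *p = '\0';
    return out;
}

/* Hypothetical helper: convert `in` from `charset` to UTF-8.
 * Returns NULL if the charset is unknown or the input is invalid. */
static char *convert_to_utf8(const char *in, const char *charset)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return NULL;                  /* unknown charset */

    size_t inleft = strlen(in);
    size_t outsize = inleft * 4 + 1;  /* ample for single-byte charsets */
    char *out = malloc(outsize);
    char *inp = (char *)in;
    char *outp = out;
    size_t outleft = outsize - 1;     /* keep one byte for the NUL */

    /* (size_t)-1 signals an invalid or truncated input sequence; the
     * second call flushes any shift state of stateful encodings. */
    if (out == NULL
        || iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
        || iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}

/* Hypothetical entry point following the recommended steps:
 * 1. keep the header if it is already valid UTF-8;
 * 2. otherwise convert it from the body's charset to UTF-8;
 * 3. if that fails too, drop every non-7-bit byte. */
static char *sanitize_header(const char *raw, const char *body_charset)
{
    if (is_valid_utf8(raw))
        return copy_string(raw);
    char *utf8 = convert_to_utf8(raw, body_charset);
    return utf8 != NULL ? utf8 : strip_to_ascii(raw);
}
```

For the raw ISO-8859-1 header in this report, `sanitize_header("Mahal\xe9o", "ISO-8859-1")` yields the UTF-8 bytes for "Mahaléo", which can then be safely converted back to the outgoing charset. Since every byte sequence is valid ISO-8859-1, step 3 is only reached for an unknown charset or for a multi-byte body charset whose rules the header bytes violate.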
Jean-Marc: thanks for that excellent analysis! Note that this only happens for %a (which doesn't convert to UTF-8), not for %n (which does).
Fixed in CVS for 0.13.4: http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&subdir=pan/pan&command=DIFF_FRAMESET&file=message-window.c&rev1=1.343&rev2=1.344&root=/cvs/gnome
The bug seems to persist in the latest beta, version 0.13.3.91. For example, an original post with these headers:
=================================================
[...]
Reply-To: "Mahaléo" <mahaleo@wanadoo.fr>
From: "Mahaléo" <quidam@pour_le_bot.org.antispam>
Newsgroups: fr.test
Subject: zzz test ignore
Date: Sat, 1 Feb 2003 01:18:01 +0100
X-Newsreader: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Message-ID: <3e3b11e9$0$247$626a54ce@news.free.fr>
NNTP-Posting-Date: 01 Feb 2003 01:16:41 MET
MIME-Version: 1.0
Content-Type: text/plain

Etre ou ne pas être, c'est ça la question.
=================================================
And the follow-up message:
=================================================
[...]
From: "Mahaleo" <quidam@pour_le_bot.org.antispam>
Subject: Re: zzz test ignore
Date: Sat, 01 Feb 2003 01:26:27 +0100
User-Agent: Pan/0.13.3.91 (How did the starling get into the bar?)
Message-ID: <pan.2003.02.01.00.26.24.214810@mahaleowanadoofr>
Newsgroups: fr.test
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 8bit

Sat, 01 Feb 2003 01:18:01 +0100, dans <3e3b11e9$0$247$626a54ce@news.free.fr>, Mahaléo a écrit:
[...]
> Reply-To: "Mahaléo" <mahaleo@wanadoo.fr>
> From: "Mahaléo" <quidam@pour_le_bot.org.antispam>
> Newsgroups: fr.test
> Subject: zzz test ignore
> Date: Sat, 1 Feb 2003 01:18:01 +0100
> X-Newsreader: Microsoft Outlook Express 6.00.2800.1106
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
> Message-ID: <3e3b11e9$0$247$626a54ce@news.free.fr>
> [...]
> MIME-Version: 1.0
> Content-Type: text/plain
>
> Etre ou ne pas être, c'est ça la question.

Bouh!
=================================================
Mahaleo
Punting for Christophe's return
/me scratches his head ... Mahaleo: could you let me know the charset of the group you're posting to (group properties) and the locale you're using ('locale' from the command line)? Could you also attach the full message Pan creates to this bug report (http://bugzilla.gnome.org/createattachment.cgi?id=104479)?
Created attachment 14880
This is the full message Pan creates, attached to bug report 104479.
The charset of the group is ISO-8859-1 (but the same bug appears with ISO-8859-15). My 'locale':
[mahaleo@localhost mahaleo]$ locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"
LC_ALL=
*** Bug 104459 has been marked as a duplicate of this bug. ***
Hmmm, I can only explain this if either your group's charset is UTF-8, or you're switching between profiles that have '%n' in the attribution (there's definitely a bug there, and it shows up when your locale is UTF-8, which is the case for you). Looking further ...
The bug does not appear if I change my locale by editing '/etc/sysconfig/i18n' and replacing the line LANG="fr_FR.UTF-8" with LANG="fr_FR@euro". My 'locale' then becomes:
[mahaleo@localhost mahaleo]$ locale
LANG=fr_FR@euro
LC_CTYPE="fr_FR@euro"
LC_NUMERIC="fr_FR@euro"
LC_TIME="fr_FR@euro"
LC_COLLATE="fr_FR@euro"
LC_MONETARY="fr_FR@euro"
LC_MESSAGES="fr_FR@euro"
LC_PAPER="fr_FR@euro"
LC_NAME="fr_FR@euro"
LC_ADDRESS="fr_FR@euro"
LC_TELEPHONE="fr_FR@euro"
LC_MEASUREMENT="fr_FR@euro"
LC_IDENTIFICATION="fr_FR@euro"
LC_ALL=
But I am not sure it is a good idea to move away from UTF-8...
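Because the bug only shows up under a UTF-8 locale, it can help to check which charset the C library derives from the locale environment. A minimal sketch, assuming a POSIX system: `nl_langinfo(CODESET)` is standard, but the helper name `locale_charset` is hypothetical and not part of Pan (GLib's `g_get_charset()` serves a similar purpose). On glibc, `fr_FR.UTF-8` should report `UTF-8`, while `fr_FR@euro` is normally an ISO-8859-15 locale.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the charset name the C library associates
 * with the LC_CTYPE category taken from the environment
 * (LC_ALL, then LC_CTYPE, then LANG). */
static const char *locale_charset(void)
{
    /* An empty string means "use the locale from the environment". */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

Calling it from a small test program under each of the two LANG settings above would show whether Pan is running with a UTF-8 or an 8-bit locale charset.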
I have just installed 0.13.91 (on Red Hat 8, with 'locale': LANG=fr_FR.UTF-8), using the same attribution that triggered the bug (%n a écrit), and the bug no longer appears. Thanks for the good work.
Marking as 'fixed' based on user feedback. Thanks, Mahaleo!