Bug 363268 - mixed-charset messages get garbled
Status: RESOLVED FIXED
Product: Pan
Classification: Other
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 1.0
Assigned To: Charles Kerr
QA Contact: Pan QA Team
Reported: 2006-10-18 22:47 UTC by Michael Rasmussen
Modified: 2006-11-02 17:58 UTC


Attachments
cvs head patch (876 bytes, patch)
2006-10-31 19:18 UTC, Charles Kerr
committed

Description Michael Rasmussen 2006-10-18 22:47:01 UTC
Messages written in UTF-8 are displayed as rubbish.

The same applies to reading messages from others written in UTF-8.

I suspect it is the same bug as in the old Pan, which could not handle messages written in UTF-8 if the news server or the poster added a footer in a different charset.

Example:
ÊÞå ÃÃà <------- UTF-8 (æøå ÆØÅ)

<----- part below written in ISO-8859-1
-- 
Hilsen/Regards
Michael Rasmussen
http://keyserver.veridis.com:11371/pks/lookup?op=get&search=0xE3E80917

--------------------------------------------------------
Denne postliste er til test af din email i forhold til
SSLUGs postlister. Vær sød ikke at misbruge denne
Comment 1 Charles Kerr 2006-10-19 03:42:59 UTC
Chris, could you take a look at this ticket?
Comment 2 Christophe Lambin 2006-10-19 05:32:45 UTC
Michael: can you attach the following evidence:
- an article you've READ where UTF-8 is broken
- an article you've POSTED where UTF-8 is broken
- your sig

Comment 3 Michael Rasmussen 2006-10-19 21:48:58 UTC
Copy of read article:
Return-Path: <sslug-novice-return-38839-mail2news=sslug.dk@sslug.dk>
Delivered-To: mail2news@sslug.dk
Mailing-List: contact sslug-novice-help@sslug.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
list-help: <mailto:sslug-novice-help@sslug.dk>
list-unsubscribe: <mailto:sslug-novice-unsubscribe@sslug.dk>
errors-to: sslug-error@sslug.dk
Delivered-To: mailing list sslug-novice@sslug.dk
From: "Michael Schmidt" <michael.zmit@gmail.com>
Date: Thu, 19 Oct 2006 00:10:32 +0200
Organization: SSLUG
Lines: 25
Message-ID: <op.thm07utzdi3geh@news.sslug.dk>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8
Content-Transfer-Encoding: 8bit
NNTP-Posting-Date: Wed, 18 Oct 2006 22:12:09 +0000 (UTC)
User-Agent: Opera Mail/9.02 (Linux)
Subject: Re: [NOVICE] Ubuntu 6.06 LTS
Newsgroups: sslug.novice
References: <eh4j24$38l$1@www.sslug.dk> <eh5hua$agt$1@shrek.krogh.cc> <eh5tao$87d$1@www.sslug.dk> <eh5tme$aaa$1@www.sslug.dk> <eh5ua7$g2p$1@www.sslug.dk> <eh5ugo$979$1@shrek.krogh.cc> <eh62br$bhk$1@www.sslug.dk>
Approved: news@sslug.dk
Path: news.sslug.dk!sslug.dk!not-for-mail
Xref: news.sslug.dk sslug.novice:38341

Wed, 18 Oct 2006 22:21:15 +0200, JÞrgen Heesche <heesche@webspeed.dk>
skrev:

> Jesper Krogh wrote:
>> I sslug.novice, skrev Claus:
>>>  Atte André Jensen wrote:
>>>> Claus wrote:
>>>>> OK, den er sat til at downloade.
>>>>> Men hvordan skal den brÊndes for at det bliver gjort rigtigt?
>>>> hvad med "cdrecord ubuntu-6.06.1-alternate-i386.iso"?
>>>  Hmmm, den fylder 713550 KB.
>>>  Kan det vÊre på en CD?
>>  Det er den designet til.. så det vil jeg tro.
>
> Jeg har altid forstået at maximum er 650 MB på en CD.
>
Det var det også tidligere. Idag er 700MB/80min nÊrmest blevet standard,
men der findes også 800MB/90min og sågar også 900MB/100min, men de to
sidstnÊvnte krÊver at drev og brÊndersoftware kan håndtere dem.


-- 
Med venlig hilsen
/Zmit/
RLU # 314205

sslug-novice: Listen for begynder-relaterede spørgsmål

Copy of posted article:
Return-Path: <sslug-test-return-5727-mail2news=sslug.dk@sslug.dk>
Delivered-To: mail2news@sslug.dk
Mailing-List: contact sslug-test-help@sslug.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
errors-to: sslug-error@sslug.dk
Delivered-To: mailing list sslug-test@sslug.dk
From: "Michael Rasmussen" <mir@miras.org>
Date: Thu, 19 Oct 2006 21:46:37 +0000 (UTC)
Organization: SSLUG
Lines: 6
Message-ID: <eh8rnt$52u$1@www.sslug.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
NNTP-Posting-Date: Thu, 19 Oct 2006 21:46:37 +0000 (UTC)
User-Agent: pan 0.117 (We'll fly and we'll fall and we'll burn)
Subject: [TEST] Test with UTF-8
Newsgroups: sslug.test
Approved: news@sslug.dk
Path: news.sslug.dk!sslug.dk!not-for-mail
Xref: news.sslug.dk sslug.test:5193

ÊÞå ÃÃÃ

-- 
Hilsen/Regards
Michael Rasmussen
http://keyserver.veridis.com:11371/pks/lookup?op=get&search=0xE3E80917

--------------------------------------------------------
Denne postliste er til test af din email i forhold til
SSLUGs postlister. Vær sød ikke at misbruge denne 

My signature:
-- 
Hilsen/Regards
Michael Rasmussen
http://keyserver.veridis.com:11371/pks/lookup?op=get&search=0xE3E80917

Signature added by list:
--------------------------------------------------------
Denne postliste er til test af din email i forhold til
SSLUGs postlister. Vær sød ikke at misbruge denne 
Comment 4 Michael Rasmussen 2006-10-19 21:57:31 UTC
BTW, the old bug I am referring to is this one: http://bugzilla.gnome.org/show_bug.cgi?id=317156
Comment 5 Christophe Lambin 2006-10-20 19:27:54 UTC
Oh, I remember that bug. Thanks for the reference: that saved me some time.

Charles: this is what's happening: pan::content_to_utf8() tries to pass the article through g_convert for each candidate charset in turn (the article's charset, the group's charset, and the hardcoded CURRENT and ISO-8859-15). The article contains both UTF-8 and ISO-8859-1 characters. If you use g_convert to convert from ISO-8859-1 to UTF-8, g_convert will treat each byte of the two-byte UTF-8 characters as a separate ISO-8859-1 character and succeed! The UTF-8 characters will of course be garbled.
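
The failure mode described above can be reproduced outside Pan. What follows is a minimal Python sketch, not Pan's code: it builds a body with a UTF-8 line followed by an ISO-8859-1 footer and tries candidate charsets in order. The helper name first_successful_decode and the candidate list are hypothetical, and Python's bytes.decode stands in for g_convert.

```python
def first_successful_decode(raw, charsets):
    """Return (charset, text) for the first charset that decodes without error.

    Hypothetical helper that mimics a "try each charset in turn" fallback.
    """
    for name in charsets:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


# A message body in UTF-8 with a footer appended in ISO-8859-1.
utf8_part = "æøå ÆØÅ".encode("utf-8")
latin1_footer = "Vær sød".encode("iso-8859-1")
raw = utf8_part + b"\n" + latin1_footer

# Strict UTF-8 fails on the footer bytes, so the fallback moves on.
# ISO-8859-1 assigns a character to every byte value, so it cannot fail,
# and the two-byte UTF-8 sequences come out as two characters each.
charset, text = first_successful_decode(raw, ["utf-8", "iso-8859-1"])
first_line = text.split("\n")[0]
print(charset)
print(repr(first_line))
```

Because a single-byte charset accepts any input, a successful conversion says nothing about whether that charset was the right one, which is why the body line ends up garbled while the footer looks fine.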

You can't simply remove ISO-8859-1 from the hardcoded fallback charsets, since the group's charset may still be ISO-8859-1 and you'll have the same problem.

The old Pan did not suffer from this problem, since it did not use g_convert() for this purpose. It used an internal function (g_mime_charset_strndup), which converted through gmime streams.

Reassigning to you, since I have no idea how to fix that without major surgery. :)
Comment 6 Charles Kerr 2006-10-31 18:33:08 UTC
Chris: so if I reimplement content_to_utf8() to use the old
g_mime_charset_strndup() code, that would fix this?
Comment 7 Christophe Lambin 2006-10-31 18:43:21 UTC
Assuming the underlying gmime still behaves the same way, I'd guess so.
Comment 8 Charles Kerr 2006-10-31 19:18:17 UTC
Created attachment 75733
cvs head patch

Replacing g_convert with g_mime_charset_strndup is a simple one-liner.
Does this fix it?
Comment 9 Christophe Lambin 2006-10-31 20:05:26 UTC
I tested it, and it indeed no longer garbles the message (i.e. the part of the message that's in the content-type's charset).

With this patch, the invalid characters are simply removed, whereas the old Pan would display '?' for each non-utf8 character. It looks like this is a difference in the underlying gmime behaviour: the non-utf8 characters are already removed by the time g_mime_charset_strndup() returns.
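
The two behaviours compared above (dropping invalid bytes versus showing a '?' for each) can be illustrated with a short Python sketch. It is only an illustration of the difference, under the assumption that the body is decoded as UTF-8; it says nothing about how gmime implements either behaviour.

```python
# UTF-8 body followed by an ISO-8859-1 footer, as in the reported article.
raw = "æøå".encode("utf-8") + b"\n" + "Vær sød".encode("iso-8859-1")

# Behaviour seen with the patch: bytes that are not valid UTF-8 vanish.
dropped = raw.decode("utf-8", errors="ignore")

# Behaviour attributed to the old Pan: each invalid byte shows up as '?'.
# Python marks invalid input with U+FFFD, mapped to '?' here for display.
marked = raw.decode("utf-8", errors="replace").replace("\ufffd", "?")

print(repr(dropped))
print(repr(marked))
```

In both cases the UTF-8 part of the message survives intact; only the footer bytes that are not valid UTF-8 are lost or marked.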
Comment 10 Charles Kerr 2006-10-31 20:53:40 UTC
So although we're still losing those invalid characters,
old-pan did that too and we're better off than before by not
garbling the message, is that right?

So should this change be checked in and the bug marked closed?
Comment 11 Christophe Lambin 2006-10-31 21:18:39 UTC
Yes, works for me.