Bug 666896 - Workaround Improper GB-2312 Mail Encodings
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Mailer
Version: 2.32.x (obsolete)
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: evolution-mail-maintainers
QA Contact: Evolution QA team
Duplicates: 446783
Depends on:
Blocks:
 
 
Reported: 2011-12-27 06:10 UTC by Palmer Dabbelt
Modified: 2012-04-11 16:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Mail with bad GB-2312 encodings (2.63 KB, application/mbox)
2011-12-27 06:10 UTC, Palmer Dabbelt
Use GB-18030 instead of GB-2312 for all messages (930 bytes, patch)
2011-12-27 06:17 UTC, Palmer Dabbelt
committed

Description Palmer Dabbelt 2011-12-27 06:10:36 UTC
Created attachment 204254
Mail with bad GB-2312 encodings

Some mail clients (apparently both Gmail on Windows and Outlook, though not on all machines) send messages that declare the GB-2312 charset while including characters that are only valid in the GB-18030 charset.

The mail headers contain the following, so I think Evolution is correct to use the GB-2312 charset; the problem is that some mail clients set the headers improperly:
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: base64

As the messages come from poorly configured Windows systems, I'm not entirely sure how to reproduce the problem at its source. I have, however, simulated the bug by forcing Evolution's composer to use the GB-2312 charset, pasting in some GB-18030-only characters, sending the mail to myself, and saving the resulting mail as mbox. It's attached.
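The mismatch described above can also be illustrated outside Evolution. The following is a minimal sketch using Python's built-in codecs, not Evolution's or Camel's code; the sample character is an assumption (it is, to my knowledge, encodable in GB-18030 but absent from GB-2312), and `try_decode` is a hypothetical helper introduced only for this illustration.

```python
# Sketch: bytes produced with GB-18030 but labelled as GB-2312.
# Assumption: the sample character below is not part of GB-2312.
sample = "镕"

# What a misconfigured client would put on the wire (before base64).
wire_bytes = sample.encode("gb18030")


def try_decode(data: bytes, charset: str):
    """Return the decoded text, or None if the bytes are invalid for charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Decoding with the declared charset fails; the superset charset succeeds.
as_declared = try_decode(wire_bytes, "gb2312")
as_superset = try_decode(wire_bytes, "gb18030")

# Text that really is GB-2312 should still decode correctly as GB-18030.
plain_bytes = "中文".encode("gb2312")
plain_as_superset = try_decode(plain_bytes, "gb18030")

print("declared charset (gb2312): ", as_declared)
print("superset charset (gb18030):", as_superset)
print("genuine gb2312 via gb18030:", plain_as_superset)
```

The last check is the property a workaround would rely on: genuine GB-2312 text should be unaffected when it is decoded with the larger charset.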

There is already a bug report about this (linked below), but the submitter seems confused and doesn't provide a patch, and the report is 4 years old and still unconfirmed, so I created a new one. Sorry if that was wrong.
https://bugzilla.gnome.org/show_bug.cgi?id=446783
Comment 1 Palmer Dabbelt 2011-12-27 06:17:59 UTC
Created attachment 204255
Use GB-18030 instead of GB-2312 for all messages

Attached is a patch that makes every message declared as GB-2312 decode as GB-18030 instead. According to Wikipedia, GB-18030 is a superset of GB-2312, so I think it's reasonable to decode all GB-2312 messages as GB-18030.

I have only tested the patch against the already-patched sources Gentoo uses to build 2.32.3, where it applies and works properly. Since it's only a five-line change to a lookup table, I think it should be OK elsewhere too.
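The idea behind the patch, a lookup table that substitutes GB-18030 whenever GB-2312 is declared, can be sketched as follows. This is not the committed patch and does not use Camel's API; it is a Python illustration built on the standard `email` module, and the names `DECODE_AS` and `decode_text_part` are hypothetical.

```python
import base64
from email import message_from_bytes

# Hypothetical override table: declared charset -> charset used for decoding.
DECODE_AS = {"gb2312": "gb18030"}


def decode_text_part(raw_message: bytes) -> str:
    """Decode a single-part text message, applying the charset override."""
    msg = message_from_bytes(raw_message)
    declared = (msg.get_content_charset() or "us-ascii").lower()
    actual = DECODE_AS.get(declared, declared)
    payload = msg.get_payload(decode=True)  # undoes the base64 transfer encoding
    return payload.decode(actual)


# Build a message like the one in the report: the header claims GB2312,
# but the body was encoded with GB-18030 (sample text is an assumption).
text = "朱镕基"
body = base64.b64encode(text.encode("gb18030"))
raw = (
    b"Content-Type: text/plain; charset=GB2312\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + body + b"\r\n"
)

decoded = decode_text_part(raw)
print(decoded)
```

Keeping the substitution in a table means other charsets pass through unchanged, and further mislabelled charsets could be added without touching the decoding logic.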
Comment 2 André Klapper 2012-01-03 16:15:03 UTC
*** Bug 446783 has been marked as a duplicate of this bug. ***
Comment 3 André Klapper 2012-01-03 16:15:25 UTC
Patch reviewer: Please also check the discussions in bug 446783
Comment 4 Milan Crha 2012-04-11 16:12:14 UTC
Thanks for the patch. I would have attached it to the original bug report, had I known about it, though this way works too. The patch itself works fine (I do not read Chinese, but more "letters" are shown with your patch). I'll commit it.
Comment 5 Milan Crha 2012-04-11 16:15:02 UTC
Created commit 241edbd in eds master (3.5.1+)
Created commit 6d2ac17 in eds gnome-3-4 (3.4.1+)