Bug 340717 – please stop using ISO-8859-5 Cyrillic encoding for "Subject/To/Cc/Bcc/" headers



Summary:	please stop using ISO-8859-5 Cyrillic encoding for "Subject/To/Cc/Bcc/" headers


Status:	RESOLVED FIXED

Product:	evolution
Classification:	Applications
Component:	Mailer
Version:	2.6.x (obsolete)
Hardware:	Other other

Importance:	Normal normal
Target Milestone:	---
Assigned To:	evolution-mail-maintainers
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-05-05 09:02 UTC by Andrey Utkin
Modified:	2006-05-05 15:31 UTC

See Also:
GNOME target:	---
GNOME version:	2.13/2.14

Attachments
sample messages (1.36 KB, application/x-compressed-tar) 2006-05-05 13:43 UTC, Andrey Utkin

Description Andrey Utkin 2006-05-05 09:02:53 UTC

Distribution: Debian testing/unstable
Package: Evolution
Severity: Normal
Version: GNOME2.14.1 2.6.x
Gnome-Distributor: Debian
Synopsis: please stop using ISO-8859-5 Cyrillic encoding for "Subject/To/Cc/Bcc/" headers
Bugzilla-Product: Evolution
Bugzilla-Component: Mailer
Bugzilla-Version: 2.6.x
Description:
Evolution uses iso-8859-5 encoding for Cyrillic in the Subject and related
headers, and it is not possible to change this behavior. But this encoding
is not widely used in Russia (nor, I believe, in the rest of the Cyrillic-using world).
Many popular email clients don't understand this encoding
(The Bat!, for example).
So it would be nice to use utf-8 or koi8-r for headers. Or just use the
message body encoding (the character set used by the composer).




------- Bug created by bug-buddy at 2006-05-05 09:02 -------

Comment 1 Jeffrey Stedfast 2006-05-05 09:55:45 UTC

the order of preference for charsets to use in message headers is as follows:

	"iso-8859-2"
	"iso-8859-4"
	"koi8-r"
	"koi8-u"
	"iso-8859-5"
	"iso-8859-7"
	"iso-8859-8"
	"iso-8859-9"
	"iso-8859-13"
	"iso-8859-15"
	"windows-1251"
	"UTF-8"


if the headers are ending up in iso-8859-5 for you, then that means you are using characters that are not found in koi8-r but are found in iso-8859-5.
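To make the first-fit rule concrete, here is a minimal C sketch. It assumes each charset in the list above is given one bit, and that the text's mask has a bit set for every charset able to encode all of its characters. The `charset_table`, the bit values, and the `bit_for` and `best_charset` helpers are hypothetical illustrations, not Evolution's actual camel code, and the sketch leaves out any locale-language check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: names and order follow the preference list above;
 * the one-bit-per-charset values are invented for this sketch. */
struct charinfo {
    const char *name;
    unsigned int bit;
};

static const struct charinfo charset_table[] = {
    { "iso-8859-2",   1u << 0 },
    { "iso-8859-4",   1u << 1 },
    { "koi8-r",       1u << 2 },
    { "koi8-u",       1u << 3 },
    { "iso-8859-5",   1u << 4 },
    { "iso-8859-7",   1u << 5 },
    { "iso-8859-8",   1u << 6 },
    { "iso-8859-9",   1u << 7 },
    { "iso-8859-13",  1u << 8 },
    { "iso-8859-15",  1u << 9 },
    { "windows-1251", 1u << 10 },
    { "UTF-8",        1u << 11 },
};

#define N_CHARSETS (sizeof charset_table / sizeof charset_table[0])

/* Return the (invented) bit for a charset name; asserts on unknown names. */
static unsigned int bit_for(const char *name)
{
    for (size_t i = 0; i < N_CHARSETS; i++) {
        if (strcmp(charset_table[i].name, name) == 0)
            return charset_table[i].bit;
    }
    assert(!"unknown charset name");
    return 0;
}

/* `mask` has a bit set for every charset that can encode the whole text.
 * Scan in preference order and return the first charset that fits;
 * fall back to UTF-8 if the mask is empty. */
static const char *best_charset(unsigned int mask)
{
    for (size_t i = 0; i < N_CHARSETS; i++) {
        if (charset_table[i].bit & mask)
            return charset_table[i].name;
    }
    return "UTF-8";
}
```

A mask that includes koi8-r (for example koi8-r | koi8-u | iso-8859-5 | windows-1251 | UTF-8) yields "koi8-r"; drop the koi8-r and koi8-u bits, as for text containing a character KOI8-R cannot represent, and the same scan returns "iso-8859-5".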

Comment 2 Andrey Utkin 2006-05-05 10:18:13 UTC

I have never seen koi8-r encoded headers for Cyrillic.
Are you sure koi8-r is checked before 8859-5, and correctly? Can I run some tests for you?

Comment 3 Jeffrey Stedfast 2006-05-05 12:20:08 UTC

yes, I'm sure (at least, it did the last time I tested it).

Comment 4 Andrey Utkin 2006-05-05 12:42:33 UTC

Yes, you are right. Sorry.
My colleague has RH4 with Evolution 2.0.2, and his messages contain headers in koi8-r... But my messages with the same Subject, To... contain headers in 8859-5.

We have different locale settings.
My locale is:
======
LANG=
LC_CTYPE=ru_RU.UTF-8
LC_NUMERIC="POSIX"
LC_TIME=ru_RU.UTF-8
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
======

His locale:
======
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
======

Could the locale settings be the reason for this?

Comment 5 Jeffrey Stedfast 2006-05-05 13:01:36 UTC

I don't think so...

If you and your friend both type the same subject (or whatever), they get encoded differently? yours in 8859-5 and his in koi8-r? if so, maybe someone broke the code.

Comment 6 Andrey Utkin 2006-05-05 13:15:58 UTC

Yes, we get different encodings for the same text... I can attach sample messages if that helps.

I am using the Debian evolution package; I don't build it myself. Is it possible that the Debian maintainers broke the Evolution code?...

But my friend and I have different versions of Evolution: 2.6.1 vs 2.0.2.
P.S. I remember 8859-5 headers in my Evolution 2.4.x too.

Comment 7 Jeffrey Stedfast 2006-05-05 13:36:47 UTC

yea, if you could attach a message composed with your evolution and maybe a message composed with your friend's evolution with the same text, that would be great. Maybe we can figure out what changed in Evo to cause that...

Comment 8 Andrey Utkin 2006-05-05 13:43:30 UTC

Created attachment 64859
sample messages

Sample messages attached.

Comment 9 Jeffrey Stedfast 2006-05-05 14:08:16 UTC

well, I can definitely confirm this...

Comment 10 Jeffrey Stedfast 2006-05-05 14:45:24 UTC

aha... you might have been right about the environment variables.

try setting your LANG to ru_RU.UTF-8 and see if that solves it...

Comment 11 Andrey Utkin 2006-05-05 14:54:52 UTC

Yes! I'm running evolution with

    $ LANG=ru_RU.UTF-8 evolution

and now my headers are encoded with koi8-r.

Comment 12 Jeffrey Stedfast 2006-05-05 15:07:38 UTC

ok, I found the bug :)

The code that chooses the charset from the list I pasted above found a match, koi8-r, but since koi8-r has a 'lang' of "ru" associated with it and your environment was not set, it kept scanning through the charset table. The next match was iso-8859-5 (which has no 'lang' associated with it), so iso-8859-5 got chosen.

simple boolean logic mistake:

locale_lang = e_iconv_locale_language ();
for (i = 0; i < G_N_ELEMENTS (camel_charinfo); i++) {
	if (camel_charinfo[i].bit & mask) {
		lang = e_iconv_charset_language (camel_charinfo[i].name);
		
-		if (!lang || (locale_lang && !strncmp (locale_lang, lang, 2)))
			return camel_charinfo[i].name;
	}
}


the code should have been:

locale_lang = e_iconv_locale_language ();
for (i = 0; i < G_N_ELEMENTS (camel_charinfo); i++) {
	if (camel_charinfo[i].bit & mask) {
		lang = e_iconv_charset_language (camel_charinfo[i].name);
		
+		if (!locale_lang || (lang && !strncmp (locale_lang, lang, 2)))
			return camel_charinfo[i].name;
	}
}


since your LANG environment variable is unset, the first charset that fits your text should be the one that gets chosen (that's why the list is ordered the way it is: so that koi8-r takes precedence over iso-8859-5)
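The old and new conditions can be compared side by side in a small C sketch. It assumes a trimmed, hypothetical table in which every entry can encode the text, with koi8-r tagged "ru" and iso-8859-5 untagged as described above (leaving windows-1251 and UTF-8 untagged is an assumption). `buggy_accepts`, `fixed_accepts` and `pick` are illustrative names, and returning "UTF-8" when nothing is accepted is a choice of this sketch, not taken from Evolution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed, hypothetical stand-in for the charset table: every entry is
 * assumed to fit the text, so only the 'lang' tag matters here. */
struct charinfo {
    const char *name;
    const char *lang; /* language the charset is tied to, or NULL */
};

static const struct charinfo table[] = {
    { "koi8-r",       "ru" },
    { "iso-8859-5",   NULL },
    { "windows-1251", NULL },
    { "UTF-8",        NULL },
};

#define N_ENTRIES (sizeof table / sizeof table[0])

/* Condition from the original code: untagged charsets always win, and a
 * tagged one only wins when the locale language is set and matches. */
static int buggy_accepts(const char *locale_lang, const char *lang)
{
    return !lang || (locale_lang && !strncmp(locale_lang, lang, 2));
}

/* Condition from the fix: with no locale language, take the first fit;
 * otherwise only accept a charset tagged with the locale's language. */
static int fixed_accepts(const char *locale_lang, const char *lang)
{
    return !locale_lang || (lang && !strncmp(locale_lang, lang, 2));
}

/* Scan the table in preference order and return the first entry the
 * predicate accepts, falling back to UTF-8 if none is accepted. */
static const char *pick(int (*accepts)(const char *locale_lang, const char *lang),
                        const char *locale_lang)
{
    for (size_t i = 0; i < N_ENTRIES; i++) {
        if (accepts(locale_lang, table[i].lang))
            return table[i].name;
    }
    return "UTF-8";
}
```

With `locale_lang` NULL (LANG unset), `pick(buggy_accepts, NULL)` returns "iso-8859-5" while `pick(fixed_accepts, NULL)` returns "koi8-r". With "ru" both return "koi8-r", which fits the workaround of setting LANG=ru_RU.UTF-8, assuming the locale language then starts with "ru".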

I'll be committing a fix shortly (after the fix, you shouldn't need to set your LANG environment variable)

Comment 13 Jeffrey Stedfast 2006-05-05 15:12:26 UTC

ok, fixed in CVS

Comment 14 Andrey Utkin 2006-05-05 15:21:58 UTC

Thanks a lot.
I hope the fix will be in 2.6.2 (or maybe in 2.6.1.1 ;) )

Comment 15 Jeffrey Stedfast 2006-05-05 15:31:51 UTC

yea, it should be in evolution-data-server-1.6.2 packages whenever that happens (evolution-data-server-1.6.2 will be released alongside evolution-2.6.2)