Bug 438438 – Evolution does not use selected encoding in message headers


Summary:	Evolution does not use selected encoding in message headers


Status:	RESOLVED WONTFIX

Product:	evolution
Classification:	Applications
Component:	Mailer
Version:	3.2.x (obsolete)
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	evolution-mail-maintainers
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-05-14 20:58 UTC by Krzysztof Lubański
Modified:	2018-03-22 11:17 UTC

See Also:
GNOME target:	---
GNOME version:	2.91/3.0

Attachments
Sample message composed with encoding set to ISO-8859-2 (1.45 KB, message/rfc822) 2012-01-29 22:04 UTC, Krzysztof Lubański
source and headers of emails from different sources (5.03 KB, application/zip) 2018-01-25 23:43 UTC, René Genz
VBScript and PowerShell scripts used to create emails (2.75 KB, application/zip) 2018-01-25 23:45 UTC, René Genz

Description Krzysztof Lubański 2007-05-14 20:58:50 UTC

Distribution: Debian lenny/sid
Package: Evolution
Severity: Normal
Version: GNOME2.14.3 2.6.x
Gnome-Distributor: Debian
Synopsis: Evolution does not use selected encoding in message headers
Bugzilla-Product: Evolution
Bugzilla-Component: Mailer
Bugzilla-Version: 2.6.x
Description:
Description of Problem:

I am using a UTF-8 Polish locale, but send e-mails in ISO-8859-2
encoding, which is still the most popular in Poland. Though the message
body gets encoded in ISO-8859-2, as selected in preferences, the
headers are still in UTF-8, which is somewhat inconsistent and causes
problems with some clients.

Additionally (maybe I should file it as another bug), when I mix words
containing language-specific characters and "pure-ASCII" words in the
subject, sometimes Evolution puts a tab between the words, which is
ignored by some clients (e.g. Evolution itself), but interpreted by
others. The subject then looks bad, with a wide blank space.

I include an example message - some relevant headers (that Evolution
encoded in system-wide UTF-8, *not* in ISO-8859-2 selected in the
preferences) and the body containing language-specific characters (they
are properly encoded in ISO-8859-2).

Steps to reproduce the problem:
1. Choose UTF-8-based system locale.
2. Configure Evolution composer to use ISO-8859-2 (or probably any other
non-UTF encoding).
3. Use language-specific characters in headers, e.g. To: and Subject:.

Actual Results:

1. Message headers use system-wide locale encoding.
2. Unnecessary tab characters are inserted into some subject lines.

Expected Results:

Message headers using the encoding selected in Evolution preferences.
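
As a rough illustration of the difference between the actual and the expected result, the sketch below encodes one subject as an RFC 2047 encoded-word in both charsets using Python's standard `email.header` module. The sample subject is made up for the example, and this is not Evolution's code; it only shows what "headers in ISO-8859-2" versus "headers in UTF-8" means on the wire.

```python
from email.header import Header, decode_header, make_header

# Sample subject with Polish letters; every one of them exists in ISO-8859-2.
subject = "Zażółć gęślą jaźń"

# The same text as an RFC 2047 encoded-word in two different charsets.
expected = Header(subject, charset="iso-8859-2").encode()  # what the report asks for
actual = Header(subject, charset="utf-8").encode()         # what was observed

print(expected)
print(actual)


def decode(value: str) -> str:
    """Decode RFC 2047 encoded-words back to text."""
    return str(make_header(decode_header(value)))


# Both forms carry the same text; only the declared charset differs.
print(decode(expected) == decode(actual) == subject)
```

A client that understands both charsets shows the same subject either way; the difference only matters for clients that cannot decode UTF-8.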

How often does this happen?

When using language-specific characters in headers: the headers'
encoding is *always* set as the system-wide one; unnecessary tabs are
*sometimes* added, depending on the composition of words.

Additional Information:

Sample message attached.





------- Bug created by bug-buddy at 2007-05-14 20:58 -------


Bugreport had an attachment. This cannot be imported to Bugzilla.
Contact bugmaster@gnome.org if you are willing to write a patch for this.

Comment 1 André Klapper 2007-05-15 20:37:18 UTC

duplicate of bug 224026?

Comment 2 André Klapper 2012-01-27 13:08:33 UTC

Hi Krzysztof,
If you have time, could you please check again whether this issue still happens
in Evolution 3.2.2 or 3.0.3 and update this report by adding a comment and
changing the "Version" field, provide information about your distribution, and
provide exact steps (click by click) to reproduce?
Also, the value for "Edit > Preferences > Mail Preferences > General > Message
Display > Default character encoding" would be interesting.
Thanks a lot!

Comment 3 Krzysztof Lubański 2012-01-29 22:04:24 UTC

Created attachment 206380 [details]
Sample message composed with encoding set to ISO-8859-2

Comment 4 Krzysztof Lubański 2012-01-29 22:17:39 UTC

Hello, André!

Wow, I surely didn't expect a comment on this bug after about four years since reporting it. But really, I'm glad that you asked!

So, I've just installed and run Evolution 3.2.2-1 on my current Debian/testing system (haven't used Evolution for a long time, actually).

In Composer Preferences -> General -> Character set, I have chosen Central European (ISO-8859-2), as when I had reported the bug in the first place. Then I just created and sent a simple message with some local Polish characters, all of them covered by ISO-8859-2, in both the headers and the body. I've attached the received message here.

Still, no luck -- the message has the body encoded (properly) in ISO-8859-2, but all the headers are in UTF-8 (I use only UTF-8-based locales). The headers are in proper UTF-8 and I can see the characters, they're just not in ISO-8859-2 as the body, which I would expect.

This also applies to the draft of the message. Setting Mail Preferences -> ... Default character encoding to ISO-8859-2 or UTF-8 doesn't seem to really matter here.

Actually, this doesn't bother me much right now as UTF-8 has been picked up virtually everywhere, as far as my email goes, over these years -- I've been using only UTF-8 in all messaging for a long time. It would still be useful to fix the bug, though, in case anyone needs to send mail to someone still using a non-UTF-8 mail client.

Please let me know if I can help any more here!

Comment 5 René Genz 2017-07-31 17:27:49 UTC

In Evolution 3.24.4 on Fedora 26 I have a similar problem with encoding.

I am using "UTF-8" as default encoding.
The email uses "Western European (ISO-8859-1)".
The subject and body of the email contain umlauts (Ä, ä, ü, ö, ß etc.).

In mail mode window in the email list the umlauts in the subject are displayed correctly.


The problems are in the message preview window.
a) In the "Subject:" line the umlauts are replaced with a "?" character.
It is a literal question mark.
It is not the "question mark in a square" replacement character (�).

b) In the mail body the umlauts are dropped without replacement, e.g. "möchten" will be displayed as "mchten".


You can manually switch to the appropriate encoding for a specific email via:
Evolution -- View -- Character Encoding -- |x| Western European (ISO-8859-1)

Problem a persists.
Problem b is gone. The umlauts are displayed correctly.
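
The body symptom in (b) has the same shape as decoding ISO-8859-1 bytes as UTF-8 while skipping invalid bytes. The sketch below reproduces that effect with plain Python string handling. It is an assumption that the preview behaves this way internally; the code only demonstrates the mismatch, not Evolution's or WebKitGTK+'s implementation.

```python
# The word from the report, as it would be stored in an ISO-8859-1 message.
raw = "möchten".encode("iso-8859-1")  # the umlaut becomes the single byte 0xF6

# 0xF6 is not valid UTF-8. What happens next depends on the error handling.
dropped = raw.decode("utf-8", errors="ignore")    # invalid byte silently removed
replaced = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD
correct = raw.decode("iso-8859-1")                # decoded with the right charset

print(dropped)
print(replaced)
print(correct)
```

Switching the view to the matching charset corresponds to the last line, which is why problem (b) disappears once ISO-8859-1 is selected.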



For reference:
I set default encoding via:
Evolution -- Edit -- Preferences -- Mail Preferences -- General: Message Display: Default character encoding: |...|

You can switch to mail mode via:
Evolution -- View -- Windows -- Mail

You can toggle display of message preview window via:
Evolution -- View -- Preview -- [x] Show Message Preview

Comment 6 André Klapper 2017-08-01 09:20:51 UTC

Feel free to attach a complete message source including headers to reproduce, after obfuscating any private / confidential data (usernames, email addresses, hostnames, IP numbers) so others can try to reproduce the problem.

Comment 7 René Genz 2017-08-02 19:05:04 UTC

Are the following steps sufficient to save the complete message source including headers?
1. in Evolution open affected email
2. View -- Message Source
3. new window opens; from it copy-paste text to text file
4. obfuscate text file

Comment 8 André Klapper 2017-08-04 13:59:54 UTC

I'd hope so, however "File > Save as mbox" and using a text editor might be easier. :)

Comment 9 René Genz 2018-01-25 23:43:44 UTC

Created attachment 367448 [details]
source and headers of emails from different sources

(In reply to André Klapper from comment #6)

The files have been obfuscated and are attached.
The *.mbox files have been created with:
File -- Save as mbox...

The *.txt files have been created with:
View -- Message Source ; then manually copy-paste the content into a text file



The files for comment 5 are:
text-plain from apache.*

The situation is:
- Umlaut in subject line of preview window garbled to a literal "?"
- Umlauts in body are dropped without replacement with default encoding (UTF-8); after changing encoding to "Western European, New (ISO-8859-15)" Umlauts in body of preview window are displayed properly  


I guess the subject line problem must be fixed on the server sending the email.
I guess the body problem must be fixed in Evolution.

Opening the same email in "Microsoft Outlook 2010" the Umlauts in subject and body are displayed properly. I guess it makes a leap of faith to "Western European, New (ISO-8859-15)" and is just lucky that the encoding matches.

Can you confirm any of my guesses?




I tried to create those emails.
I could not create them on Fedora with:
$ echo "Hänsel\nund\nder Wald" | mailx -S "from=ME@WORK.de" -S smtp=smtp://MAIL.WORK.de -S "sendcharsets=iso-8859-15" -s "mailx - testing Umlauts: Hänsel" ME@WORK.de

The umlauts are always displayed properly, no matter which encoding is set in the command or in Evolution.



I could create those emails with self-written scripts in Microsoft Windows.
The scripts have been obfuscated and are attached.
Emails from CMD-PowerShell (CMD-PowerShell-embedded.cmd) and PowerShell (PowerShell.ps1) behave the same in Evolution as far as I can see. Hence I include only one of them in source (see "text-plain from PowerShell.*" files).

Emails from VBScript (VBScript.vbs) do not have the problem with the subject line, but reproduce the problem with the body, see "text-plain from VBScript.*" files.

* CMD-PowerShell
- Umlauts are garbled in all positions to literal "?"

* PowerShell
- Umlauts are garbled in all positions to literal "?"

* VBScript
- Umlaut in subject is displayed properly in the email list and in the preview window
- Umlaut is not displayed in the body of the preview window with default encoding (UTF-8)
- after changing encoding to "Western European, New (ISO-8859-15)" the Umlaut is displayed properly in the body of the preview window

Comment 10 René Genz 2018-01-25 23:45:36 UTC

Created attachment 367449 [details]
VBScript and PowerShell scripts used to create emails

(In reply to André Klapper from comment #8)
With the VBScript-emails:
After changing the encoding to ISO-8859-15 the Umlaut is displayed in the body of the preview window, but the text in the "View -- Message Source" window is not updated. It is still missing the Umlaut.
Is this a separate bug?

Comment 11 André Klapper 2018-01-26 00:02:42 UTC

René: I imported "text-plain from apache.mbox" from the attachment in comment 9. I can confirm that no umlauts are shown in the message body in 3.26. And that the subject includes a ?.

The email has no charset encoding defined which is against RFC 2047.

In my humble opinion this makes it an enhancement / feature request.

Comment 12 René Genz 2018-01-27 15:15:31 UTC

(In reply to André Klapper from comment #11)

Thank you for confirmation.
I will ask the maintainer of the server that sends emails of type "text-plain from apache.mbox" to fix the application/server.
This should fix both of the problems.

From my point of view the enhancement is not necessary because it is against RFC 2047.

Comment 13 Milan Crha 2018-03-22 11:17:48 UTC

I see a couple of semi-related issues here. Detecting the right encoding when the message contains no clue about which one was used to compose it is pretty hard. You are right that it was only a matter of luck that Outlook 2010 showed everything properly: the luck of using the same encoding in the UI as the sender.
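
To see why detection is hard without a declared charset, consider the minimal sketch below. The three bytes are arbitrary values picked for the example. They decode without error in each of the single-byte charsets mentioned in this report, so a decoding failure cannot serve as a hint.

```python
# Three bytes with no charset label attached.
raw = bytes([0xA4, 0xB1, 0xE6])

for charset in ("iso-8859-1", "iso-8859-2", "iso-8859-15"):
    # Every candidate decodes successfully, just to different characters.
    print(charset, raw.decode(charset))

# German umlauts sit at the same code points in ISO-8859-1 and ISO-8859-15,
# which is why a guess between those two can look right by luck.
text = "Hänsel möchten"
same = text.encode("iso-8859-1") == text.encode("iso-8859-15")
print(same)
```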

Extra tab character in headers: I think this is still the case, and the cause is header folding. When text is encoded in a header, it can make the header longer than some limit (something around 72 letters or so), in which case the header value is split into multiple lines, which is called folding. The receiving software is supposed to unfold the value to get the same string back.
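
A minimal sketch of unfolding, assuming a hand-written header value in which the continuation line starts with a tab (the value is made up for the example, not taken from the attachments). Unfolding removes only the line break, so the tab survives between the encoded word and the plain ASCII words. A client that collapses whitespace shows a normal space, while one that renders the tab literally shows the wide gap described in the report.

```python
import re
from email.header import decode_header

# Folded Subject value: an encoded-word, a line break, then a tab-indented
# continuation line with plain ASCII words.
folded = "=?utf-8?q?Za=C5=BC=C3=B3=C5=82=C4=87?=\r\n\tplain ASCII words"

# Unfolding: drop each line break that is followed by whitespace,
# but keep the whitespace itself.
unfolded = re.sub(r"\r?\n(?=[ \t])", "", folded)


def decode_value(value: str) -> str:
    """Decode RFC 2047 encoded-words, leaving plain chunks untouched."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


decoded = decode_value(unfolded)          # still contains the tab
normalized = " ".join(decoded.split())    # whitespace runs collapsed to one space

print(repr(decoded))
print(repr(normalized))
```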

I think that in the GtkHTML days users could see those "question marks" when a character could not be shown in the view, or even rectangle "characters" containing the hex code of the character, but since Evolution moved to WebKitGTK+ this no longer happens. WebKitGTK+ simply does not show letters it cannot properly render for some reason (I do not know how it actually transforms a stream of characters into a set of glyphs), so I believe that could be filed against WebKitGTK+ itself.

That leads us to the message source view. It is also provided through WebKitGTK+, so the same issue with letters that cannot be shown applies there as in the message preview. To avoid these issues, the safest approach is to save the message as mbox in Evolution rather than copy it from the view.

Ideally, headers should not contain 8-bit letters. That is basically what RFC 2047 is about. Using 8-bit letters in headers or in the message body can lead to many issues; both should be properly encoded, and the part's Content-Type header should contain the "charset" parameter, which states what character set the text is written in. Then every client will show the message body properly straight after opening it, because no auto-detection is needed (provided the client is set to obey the encoding given by the message).
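
As an illustration of a message that needs no auto-detection, the sketch below builds one with Python's standard `email` package. The subject and body text are sample values. The header is emitted as an RFC 2047 encoded-word, the body is quoted-printable, and the Content-Type header names the charset, so the serialized message contains only 7-bit bytes.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Hänsel"
# Declare the body charset explicitly and use a 7-bit-safe transfer encoding.
msg.set_content("Hänsel und der Wald\n",
                charset="iso-8859-1", cte="quoted-printable")

wire = msg.as_bytes()
print(wire.decode("ascii"))        # succeeds because nothing 8-bit is left
print(msg.get_content_charset())   # charset taken from the Content-Type header

# A receiver can decode headers and body from the declared information alone.
parsed = message_from_bytes(wire, policy=policy.default)
print(parsed["Subject"])
print(parsed.get_content())
```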

Finally, the message body using a different encoding than the headers: I looked into the code and there is currently no way to force the encoding used for headers. Either ISO-8859-1 or UTF-8 is chosen, depending on the characters in the header value. As you said, UTF-8 is preferred over single-byte encodings these days and, more importantly, it is widely supported by clients, so I would not change this.
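
The selection described above can be sketched roughly as follows. The function name `pick_header_charset` is hypothetical and this is not Evolution's actual code; in particular, the pure-ASCII shortcut is an assumption added for the example. It only mirrors the stated rule: ISO-8859-1 when the value fits, UTF-8 otherwise.

```python
def pick_header_charset(value: str) -> str:
    """Pick the narrowest charset that can represent a header value.

    Hypothetical helper illustrating the rule from the comment above.
    """
    for charset in ("us-ascii", "iso-8859-1"):
        try:
            value.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    # Anything outside ISO-8859-1 falls back to UTF-8.
    return "utf-8"


print(pick_header_charset("plain subject"))
print(pick_header_charset("Hänsel"))
print(pick_header_charset("Zażółć"))
```

Under such a rule a Polish subject always ends up in UTF-8, regardless of the charset chosen for the body, which matches the behaviour reported in this bug.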

I'm closing this bug report.