GNOME Bugzilla – Bug 302991
RFC2047 subject decoding of outlook emails (?iso-8859-...)
Last modified: 2012-01-31 12:49:01 UTC
This bug has been opened here: https://bugzilla.ubuntu.com/9078

"evolution does not display € the right way in the listview, they are displayed the right way in the mail display part.
...
from line looks like this:
From: =?iso-8859-1?Q?=80?= <emal@example.com>
"

from IRC:
<kjetilho> the correct name is CP1252, I think.
<kjetilho> yep, http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
...
<seb128> kjetilho: is evolution supposed to handle that correctly? ie: is that a bug?
<kjetilho> seb128: it's not a bug in Evolution
<seb128> hum, there is http://bugzilla.gnome.org/show_bug.cgi?id=259292 about this
<kjetilho> but of course Evolution could incorporate a hack to support it
<seb128> kjetilho: apparently it has a hack somewhere, since the mail is correct, only the mail list displays it bugged
...
<kjetilho> seb128: the mail is _not_ correct
<kjetilho> it claims to be iso-8859-1, but it's not.
<seb128> kjetilho: how come than it's displayed correctly by the preview pane?
<NotZed> there is a hack in the display code to check for windows charsets and remap them to the correct one
<seb128> k
<seb128> that explains it
<kjetilho> NotZed: heh. inconsistent handling _is_ a bug ;)
ask the sender to fix their mailer
Asking Microsoft to fix their mailers will take some time, and some users still run outdated versions. For the moment it is Evolution that seems broken to users, since it handles the same subject differently in two different places, while Mozilla, for example, has no issue with these mails. As described in the original comment, it would be nice to work around it.
*** Bug 325290 has been marked as a duplicate of this bug. ***
rephrasing subject
*** Bug 317083 has been marked as a duplicate of this bug. ***
the IRC transcript quotes rather damning comments from me, but I don't actually think a workaround for this would be a horrible thing. I copy a little from my comments to bug 317083:

"the bug is about Evolution displaying the octet value 0x80 as the glyph |00| |80| rather than the euro sign (€), even though the charset is declared to be ISO 8859-1. in other words, the broken client sends out characters from the CP-1252 coded character set, but claims the characters are in ISO 8859-1. the RFCs don't have anything to say about that... this bug doesn't really touch on RFC 2047 decoding at all. it's just a request to be a bit lenient and special case the octet value 0x80, so that it maps to the Unicode U+20AC. personally, I don't see a great harm in that."

I don't know if there are other characters in CP-1252 which could do with similar remapping. I'm pretty sure the euro character is the most noticeable one.

BTW, the subject for this bug should be changed back so that it is less misleading.
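To illustrate the remapping requested above, here is a minimal sketch in Python (not Evolution's actual code). It assumes the header bytes have already been Q- or base64-decoded, and the function name `decode_lenient` is hypothetical. The idea is that octets 0x80 to 0x9F are control codes in ISO 8859-1 but mostly printable characters in CP-1252, so their presence hints at a mislabelled Windows charset.

```python
# Octets 0x80-0x9F: C1 control codes in ISO 8859-1, but mostly printable
# characters (euro sign, curly quotes, ...) in CP-1252.
C1_RANGE = range(0x80, 0xA0)

LATIN1_NAMES = {"iso-8859-1", "latin-1", "latin1"}


def decode_lenient(raw: bytes, declared_charset: str) -> str:
    """Decode raw header bytes, treating mislabelled CP-1252 as CP-1252.

    Hypothetical helper for illustration only.
    """
    charset = declared_charset.lower()
    if charset in LATIN1_NAMES and any(b in C1_RANGE for b in raw):
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few octets (e.g. 0x81) are undefined in CP-1252;
            # fall back to the declared charset below.
            pass
    return raw.decode(charset, errors="replace")


print(decode_lenient(b"\x80", "iso-8859-1"))         # euro sign
print(decode_lenient(b"Kyr\xf6l\xe4", "ISO-8859-1"))  # unaffected text
```

Text that only uses octets shared by both charsets decodes identically either way, so the special case only changes the result for input that would otherwise render as control characters.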
At least bug 325290 is about a slightly different topic: what is the proper delimitation of the =??Q??= encoding. RFC 2047 states that it should always be separated by white space from other header words, but in practice a lot of clients, Outlook in particular, don't respect this when encoding the headers. It's a case of "be strict when you encode, and lenient when you decode". My interpretation of bug 317083 is that it's not about the charset either (at least primarily), but the same "what's an encoded-word" decision.

Quote from the RFC:

IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's by an RFC 822 parser. As a consequence, unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an 'encoded-word'. For example, the character sequence

=?iso-8859-1?q?this is some text?=

would be parsed as four 'atom's, rather than as a single 'atom' (by an RFC 822 parser) or 'encoded-word' (by a parser which understands 'encoded-words'). The correct way to encode the string "this is some text" is to encode the SPACE characters as well, e.g.

=?iso-8859-1?q?this=20is=20some=20text?=

The characters which may appear in 'encoded-text' are further restricted by the rules in section 5.

I can't find a good quote about the "whitespace must precede =? and trail ?=" part, but it's implicit in how whitespace should be treated when two encoded-words follow each other. Anyhow - in most of the cases quoted in the linked bugs, the encoded-words violate the quoted paragraphs above, but as they are commonly occurring cases in real-world email, Evolution should treat them "as they were meant".
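A rough sketch of what "lenient when you decode" could look like, in Python rather than Evolution's own code: encoded-words are matched anywhere in the header, whether or not they are delimited by white space. The names `ENCODED_WORD` and `decode_header_lenient` are hypothetical. The sketch still requires the encoded-word itself to contain no white space, and it does not implement the RFC 2047 rule of dropping white space between adjacent encoded-words.

```python
import base64
import re

# An encoded-word, matched anywhere in the text (no surrounding
# white space required): =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")
HEX_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def _decode_match(match):
    charset, encoding, text = match.groups()
    try:
        data = text.encode("ascii")
        if encoding.lower() == "q":
            # Q encoding: "_" stands for SPACE, "=XX" for one octet.
            data = data.replace(b"_", b" ")
            data = HEX_ESCAPE.sub(
                lambda m: bytes([int(m.group(1), 16)]), data)
        else:
            data = base64.b64decode(data, validate=True)
        return data.decode(charset)
    except (ValueError, LookupError):
        # Malformed payload or unknown charset: leave the word as-is.
        return match.group(0)


def decode_header_lenient(header: str) -> str:
    """Decode every encoded-word found in header (hypothetical helper)."""
    return ENCODED_WORD.sub(_decode_match, header)


print(decode_header_lenient("Re:=?iso-8859-1?Q?Kyr=F6l=E4?=!"))
```

Leaving undecodable words untouched keeps the raw header visible to the user instead of raising an error on a single bad message.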
I really think this should be a WONTFIX. Hmm... users would *not* really like that though.
WONTFIX would be the simple choice, yes, but what is most important, standards compliance, security, or interoperability? I would say that without the latter, the former two are meaningless.
oa, which bug are you talking about? anyway, interoperability is defined by standard compliance. only when the standard is incompletely specified, leading to ambiguities, should you consider mimicking other implementations to enhance interoperability.
My apologies - my comment is relevant for Bug 325290 and perhaps Bug 318083, but not this one, as those two are NOT duplicates of this.

A comment relevant to this bug: it seems that for some encodings, Evolution does not display the properly decoded form in the listview although it does in the message display. Two From: headers:

From: Aapo =?iso-8859-1?Q?Kyr=F6l=E4?= <email.deleted>
From: =?ISO-8859-1?Q?Aapo_Kyr=F6l=E4?= <email.deleted>

Both shown correctly in the message view, but the first one is not decoded in the listview. In this case, there certainly is no problem with the character sets. The one displaying the error is much older, first read with an older version of Evo. Should I clear some cache for this test? Which one and how?
yes, you'll prob need to clear your cache. what you need to do is delete .ev-summary file for the mailbox in question.
I deleted all cache directories and summary files under .evolution/mail/imap4, but the problem persists. Messages containing the first From header display incorrectly (in listview only), messages with the second header are OK. Only my four precreated (empty) local folders have .ev-summary.
(In reply to comment #13)
> I deleted all cache directories and summary files under .evolution/mail/imap4,
> but the problem persists. Messages containing the first From header display
> incorrectly (in listview only), messages with the second header are OK. Only my
> four precreated (empty) local folders have .ev-summary.

both headers work perfectly here (2.4.1), both in listview and preview. I sent the messages by telnet to my SMTP server, with the values:

From: Aapo =?iso-8859-1?Q?Kyr=F6l=E4?= <email@deleted.com>
From: =?ISO-8859-1?Q?Aapo_Kyr=F6l=E4?= <email@deleted.com>
you probably deleted the wrong cache, imap4 is experimental - you're probably using imap instead.
Been using imap4 since Evo 2.4's release. I verified, there were no caches left anywhere in the evo directory tree (which is why I wiped all the caches, and not just those of the folder in question).
imap backend doesn't suffer from the same problem, based on a test of adding the same account a second time. I was under the impression that imap4 was considered the preferred backend type, though. Perhaps I should switch back...
no, imap4 is experimental and is not the preferred backend. probably gonna be removed from the tree since I left the project nearly a year ago (why am I even looking at bugzilla? I have no idea... :p)
fixed in svn
*** Bug 519323 has been marked as a duplicate of this bug. ***
Not fixed yet in Evo 2.22
Created attachment 108722 [details] Screenshot of the "From" field in the message list.
Same here.
it's fixed in 2.23
Created attachment 118591 [details] evolution bug i message list
it isn't fixed in 2.23.92
Mikolaj, you really need to include the raw header (copied from View Source) for developers to make a judgement of whether Evolution should add another exception to the code. I think it is important to stress that the bug is *not* in Evolution, although Evolution tries to undo some forms of braindamage in other e-mail clients.
his examples are also not of the subject header, they are address headers. anyways, Mikolaj is running into the "gmail doesn't properly encode address headers" bug which is filed as bug #536457
*** Bug 372986 has been marked as a duplicate of this bug. ***