GNOME Bugzilla – Bug 325290
RFC2047 header decoding Outlook compatibility
Last modified: 2006-01-04 18:29:13 UTC
Please describe the problem: Outlook abuses the =?charset?word?= encoding in mail headers by not respecting word boundaries correctly, and Evo currently can't decode these headers. The fix is fairly simple - don't require the encoding to be whitespace-separated from the rest of the header text. For example, Thunderbird is able to decode the headers sent by Outlook, and in fact sometimes generates such a header itself. Here is an example where Evo fails to decode correctly because a quote character immediately follows the end-of-encoding markup:
Content-Type: application/msword; name="Vaatimuksia =?ISO-8859-1?Q?kehitt=E4jien_ty=F6tiloille=2Edoc?="
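To make the proposed fix concrete, here is a minimal Python sketch of lax decoding, in which an encoded-word is recognised anywhere in the header text rather than only between whitespace. It is not Evolution's code: the names `ENCODED_WORD` and `decode_lax` are hypothetical, and the pattern is a simplification of the RFC 2047 grammar.

```python
import base64
import binascii
import re

# Lax pattern (an assumption for illustration): an encoded-word may start
# and end anywhere, not only at whitespace boundaries.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_lax(text):
    """Decode every =?charset?enc?payload?= run found anywhere in text."""

    def _decode(match):
        charset, encoding, payload = match.groups()
        try:
            data = payload.encode("ascii")
            if encoding.lower() == "b":
                raw = base64.b64decode(data)
            else:
                # header=True makes "_" decode to a space, as the "Q" encoding requires
                raw = binascii.a2b_qp(data, header=True)
            return raw.decode(charset)
        except (LookupError, ValueError):
            # Unknown charset or malformed payload: leave the text untouched
            return match.group(0)

    return ENCODED_WORD.sub(_decode, text)


header = ('application/msword; '
          'name="Vaatimuksia =?ISO-8859-1?Q?kehitt=E4jien_ty=F6tiloille=2Edoc?="')
print(decode_lax(header))
# application/msword; name="Vaatimuksia kehittäjien työtiloille.doc"
```

The closing quote directly after `?=` is what defeats a whitespace-delimited matcher; the lax pattern does not care what follows the terminator.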
Thanks for the bug report. This particular bug has already been reported into our bug tracking system, but please feel free to report any further bugs you find. I copied your comment to bug 302991. *** This bug has been marked as a duplicate of 302991 ***
I'm afraid this is _another_ bug which shouldn't have been marked as a duplicate, really. INVALID would be more appropriate. you should NOT use RFC 2047 encoding for MIME parameter values. this is covered by a separate RFC, RFC 2231. the correct coding for the above example would look like this: Content-Type: application/msword; name*=ISO-8859-1''Vaatimuksia%20kehitt%E4jien%20ty%F6tiloille.doc Evolution should be very careful about following standards when decoding filenames, since it has security implications if Evolution is more accepting than the malware scanner. this is a huge problem for antivirus software which has to try to protect the broken MIME parsers in Outlook and others, they need to implement bug-for-bug compatibility rather than the RFCs. (for Subject, From, etc., the security risk isn't as great, but it is always nice to be standards compliant, IMHO.)
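For comparison, the RFC 2231 form of the parameter value shown above can be decoded with a few lines of Python. This is only a sketch for the single-part `charset'language'value` case; the helper name `decode_extended_value` is hypothetical, and parameter continuations (`name*0*`, `name*1*`, ...) are not handled.

```python
from urllib.parse import unquote


def decode_extended_value(value):
    """Decode an RFC 2231 extended value of the form charset'language'text."""
    charset, _language, encoded = value.split("'", 2)
    # Percent-escapes carry raw bytes in the declared charset
    return unquote(encoded, encoding=charset or "us-ascii", errors="strict")


param = "ISO-8859-1''Vaatimuksia%20kehitt%E4jien%20ty%F6tiloille.doc"
print(decode_extended_value(param))
# Vaatimuksia kehittäjien työtiloille.doc
```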
This bug and bug 302991 are issues because of the non-standard behaviour of Outlook. As Kjetil pointed out in comment #2, I would consider this a WONTFIX. Thanks for the input, Kjetil.
This isn't a dup of bug 302991. That one is about the treatment of glyphs that are not part of the declared character sets, in particular in the message listview. This one is about the treatment of 2047 encoding which is not delimited at word boundaries. I acknowledge the security implications pointed out by Kjetil, but fail to understand exactly how NOT attempting to make a sensible decoding is more secure. Wouldn't it make more sense to decode, and then strip out "dangerous" characters? Another example: Microsoft Entourage likes to encode words in the headers beginning from the first character needing encoding. A contrived version of this would be the word Entou=?ISO-8859-1?B?cg==?=age on a Subject line of a message (I'm intentionally avoiding actual non-ascii characters here). It's highly unlikely that a message would contain the string sequence =?(charset)?(B|Q)?(encoded-word)?=, even in the middle of a non-whitespace-delimited word, and NOT mean it for decoding. This is not purely about MS agents, either. The Content-Type line I quoted earlier came from Thunderbird.
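The Entourage case can be illustrated with a small Python comparison of a whitespace-delimited matcher and a lax one. Both patterns and the helper `decode_b` are assumptions made for this sketch (only the "B" encoding is handled, and the strict pattern is a simplification of the RFC 2047 rules), not what Evolution actually does.

```python
import base64
import re

# Hypothetical patterns: "B" encoding only, to keep the sketch short.
WORD = r"=\?([^?\s]+)\?[bB]\?([A-Za-z0-9+/=]*)\?="
STRICT = re.compile(r"(?<!\S)" + WORD + r"(?!\S)")  # must be delimited by whitespace
LAX = re.compile(WORD)                              # may appear mid-word


def decode_b(text, pattern):
    """Replace every encoded-word matched by pattern with its decoded text."""

    def _decode(match):
        charset, payload = match.groups()
        try:
            return base64.b64decode(payload).decode(charset)
        except (LookupError, ValueError):
            # Unknown charset or bad base64: keep the original text
            return match.group(0)

    return pattern.sub(_decode, text)


subject = "Entou=?ISO-8859-1?B?cg==?=age"
print(decode_b(subject, STRICT))  # Entou=?ISO-8859-1?B?cg==?=age
print(decode_b(subject, LAX))     # Entourage
```

The strict matcher leaves the word untouched because the encoded-word has letters on both sides; the lax one decodes the payload `cg==` to `r`.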
I don't think it is safe to change the file extension of the attachment, i.e. from ".doc?=" to ".doc", they're not the same thing to a naive application. the rest of the filename can have similar issues. who decides what is a dangerous character? NUL, slash, backslash, ampersand, quote marks? in RISC OS, what we know as ".." is "^", and "." is the directory separator. in MacOS 9, some applications use the colon as the directory separator. just removing "dangerous" characters has security implications, too. (consider "foo.e\\xe")
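The last point can be shown with a short Python sketch. The character set `DANGEROUS` and the function `strip_dangerous` are hypothetical; the point is only that deleting characters can turn a harmless-looking name into one with an executable extension.

```python
# Hypothetical blacklist: NUL, slash, backslash, colon, ampersand, double quote
DANGEROUS = set('\x00/\\:&"')


def strip_dangerous(name):
    """Naive sanitiser: drop every blacklisted character."""
    return "".join(ch for ch in name if ch not in DANGEROUS)


original = "foo.e\\xe"  # a backslash hidden inside the extension
cleaned = strip_dangerous(original)

print(original.endswith(".exe"))  # False
print(cleaned)                    # foo.exe
print(cleaned.endswith(".exe"))   # True
```

A scanner that inspects the original name and a client that opens the cleaned one would disagree about the file type.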
Right. Well, if you want to WONTFIX Thunderbird compatibility, what about the laxer decoding of 2047 in From and Subject fields for Outlook/Entourage compatibility? I figured out how to build evolution-data-server and its unit tests, so I might be inclined to look at creating a patch for it, but not if nothing's going to be accepted anyway.
I wrote a patch months ago (a year ago?) that made the code more lax, but with the patch applied, it could "accidentally" decode stuff that wasn't meant to be decoded. So it is a WONTFIX as well.