Bug 325290 – RFC2047 header decoding Outlook compatibility



Summary:	RFC2047 header decoding Outlook compatibility


Status:	RESOLVED DUPLICATE of bug 302991

Product:	evolution
Classification:	Applications
Component:	Mailer
Version:	2.4.x (obsolete)
Hardware:	Other All

Importance:	Normal minor
Target Milestone:	---
Assigned To:	evolution-mail-maintainers
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2005-12-30 11:08 UTC by oa
Modified:	2006-01-04 18:29 UTC

See Also:
GNOME target:	---
GNOME version:	2.11/2.12

Description oa 2005-12-30 11:08:36 UTC

Please describe the problem:
Outlook abuses the =?charset?encoding?encoded-text?= syntax in mail headers by
not respecting word boundaries, and Evo currently can't decode these headers.
The fix is fairly simple: don't require the encoded-word to be separated by
whitespace from the rest of the header text. Thunderbird, for example, is able
to decode the headers sent by Outlook, and in fact sometimes generates such a
header itself. Here is an example where Evo fails to decode correctly because a
quote character immediately follows the end-of-encoding markup:

Content-Type: application/msword; name="Vaatimuksia
=?ISO-8859-1?Q?kehitt=E4jien_ty=F6tiloille=2Edoc?="
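
A minimal Python sketch of the difference described above, not Evolution's actual code. The names `decode_strict` and `decode_lax` are hypothetical. The strict variant only decodes tokens that are delimited by whitespace (and, as a simplification, does not drop whitespace between adjacent encoded-words); the lax variant decodes an encoded-word wherever it appears.

```python
import base64
import re

# One RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode_match(match):
    """Decode one encoded-word; return it unchanged if it is malformed."""
    charset, encoding, payload = match.groups()
    try:
        if encoding.lower() == "b":
            raw = base64.b64decode(payload, validate=True)
        else:
            # Q encoding: "_" means space, "=XX" is a hex-escaped byte.
            data = payload.replace("_", " ").encode("ascii")
            raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                         lambda m: bytes([int(m.group(1), 16)]), data)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # Bad base64, undecodable bytes, or unknown charset: leave as-is.
        return match.group(0)


def decode_strict(value):
    """Decode only tokens that are whole whitespace-delimited words."""
    out = []
    for token in re.split(r"(\s+)", value):
        m = ENCODED_WORD.fullmatch(token)
        out.append(_decode_match(m) if m else token)
    return "".join(out)


def decode_lax(value):
    """Decode every encoded-word, even in the middle of other text."""
    return ENCODED_WORD.sub(_decode_match, value)


# The parameter value from the report, with the line fold shown as a space.
header = 'Vaatimuksia =?ISO-8859-1?Q?kehitt=E4jien_ty=F6tiloille=2Edoc?="'
print(decode_strict(header))  # unchanged: the trailing quote breaks the word
print(decode_lax(header))     # Vaatimuksia kehittäjien työtiloille.doc"
```

The trailing quote survives in the lax output because it belongs to the surrounding parameter syntax, not to the encoded-word.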

Comment 1 André Klapper 2006-01-01 14:36:35 UTC

Thanks for the bug report. This particular bug has already been reported into our bug tracking system, but please feel free to report any further bugs you find.
I copied your comment to bug 302991.

*** This bug has been marked as a duplicate of 302991 ***

Comment 2 Kjetil Torgrim Homme 2006-01-03 15:55:14 UTC

I'm afraid this is _another_ bug which shouldn't have been marked as a duplicate, really.  INVALID would be more appropriate.  you should NOT use RFC 2047 encoding for MIME parameter values.  this is covered by a separate RFC, RFC 2231.

the correct coding for the above example would look like this:

Content-Type: application/msword;
 name*=ISO-8859-1''Vaatimuksia%20kehitt%E4jien%20ty%F6tiloille.doc
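
As a rough illustration of the RFC 2231 form shown above, here is a small Python sketch that builds and parses such an extended parameter value. The helper names `encode_rfc2231` and `decode_rfc2231` are made up for this example, and continuations (`name*0*=`, `name*1*=`) are not handled.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_rfc2231(value, charset="ISO-8859-1", language=""):
    """Build the part after `name*=`: charset'language'percent-encoded."""
    # Encode to bytes in the declared charset, then percent-encode every
    # byte that is not an unreserved ASCII character.
    encoded = quote(value.encode(charset), safe="")
    return f"{charset}'{language}'{encoded}"


def decode_rfc2231(ext_value):
    """Parse charset'language'percent-encoded back into a string."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset)


filename = "Vaatimuksia kehitt\u00e4jien ty\u00f6tiloille.doc"
ext_value = encode_rfc2231(filename)
print("name*=" + ext_value)
print(decode_rfc2231(ext_value))
```

Unlike an RFC 2047 encoded-word, this form has no closing delimiter that can collide with the quotes around a parameter value.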

Evolution should be very careful about following standards when decoding filenames, since it has security implications if Evolution is more accepting than the malware scanner.  this is a huge problem for antivirus software, which has to try to protect the broken MIME parsers in Outlook and others: it needs to implement bug-for-bug compatibility rather than the RFCs.

(for Subject, From, etc., the security risk isn't as great, but it is always nice to be standards compliant, IMHO.)

Comment 3 parthasarathi susarla 2006-01-03 16:04:25 UTC

This bug and bug 302991 are issues because of the non-standard behaviour of Outlook. As Kjetil pointed out in comment #2, I would consider this to be a WONTFIX. Thanks for the input, Kjetil.

Comment 4 oa 2006-01-04 08:49:37 UTC

This isn't a dup of bug 302991. That one is about the treatment of glyphs that are not part of the declared character set, in particular in the message list view. This one is about the treatment of 2047 encoding that is not delimited at word boundaries.

I acknowledge the security implications pointed out by Kjetil, but fail to understand exactly how NOT attempting to make a sensible decoding is more secure. Wouldn't it make more sense to decode, and then strip out "dangerous" characters?

Another example: Microsoft Entourage likes to encode words in the headers beginning from the first character needing encoding. A contrived version of this would be the word Entou=?ISO-8859-1?B?cg==?=age on a Subject line of a message (I'm intentionally avoiding actual non-ascii characters here). It's highly unlikely that a message would contain the sequence =?(charset)?(B|Q)?(encoded-text)?=, even in the middle of a word, and NOT mean it to be decoded.

This is not purely about MS agents, either. The Content-Type line I quoted earlier came from Thunderbird.

Comment 5 Kjetil Torgrim Homme 2006-01-04 11:17:26 UTC

I don't think it is safe to change the file extension of the attachment, i.e. from ".doc?=" to ".doc", they're not the same thing to a naive application.  the rest of the filename can have similar issues.  who decides what is a dangerous character?  NUL, slash, backslash, ampersand, quote marks?  in RISC OS, what we know as ".." is "^", and "." is the directory separator.  in MacOS 9, some applications use the colon as the directory separator.  just removing "dangerous" characters has security implications, too.  (consider "foo.e\\xe")
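
The last point can be shown with a small Python sketch. The `strip_dangerous` helper and its choice of "dangerous" characters are hypothetical, not taken from any real mailer or scanner.

```python
def strip_dangerous(name, dangerous="\\/"):
    """Naive sanitizer: drop every backslash and slash from a filename."""
    return "".join(ch for ch in name if ch not in dangerous)


# The example string from the comment above, taken literally.
name = r"foo.e\\xe"

# A scanner that checks the name as received sees no ".exe" suffix...
print(name.endswith(".exe"))
# ...but the "sanitized" name handed to the user is an executable.
print(strip_dangerous(name))
```

A filter that only looks at the raw name and a client that strips characters afterwards therefore disagree about what the attachment is.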

Comment 6 oa 2006-01-04 13:38:36 UTC

Right. Well, if you want to WONTFIX Thunderbird compatibility, what about the laxer decoding of 2047 in From and Subject fields for Outlook/Entourage compatibility?

I figured out how to build evolution-data-server and its unit tests, so I might be inclined to look at creating a patch for it, but not if nothing's going to be accepted anyway.

Comment 7 Jeffrey Stedfast 2006-01-04 18:29:13 UTC

I wrote a patch months ago (a year ago?) that made the code more lax, but by applying the patch, it could "accidentally" decode stuff that wasn't meant to be decoded. So it is a WONTFIX as well.