Bug 417000 – Generated email subject shows without space in Microsoft Outlook




Summary:	Generated email subject shows without space in Microsoft Outlook


Status:	RESOLVED FIXED

Product:	evolution
Classification:	Applications
Component:	Mailer
Version:	2.8.x (obsolete)
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	evolution-mail-maintainers
QA Contact:	Evolution QA team

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-03-11 05:16 UTC by Santiago Erquicia
Modified:	2008-04-04 15:12 UTC

See Also:
GNOME target:	---
GNOME version:	2.15/2.16

Attachments
Patch against version *1.4.5* to use spaces to fold long headers instead of tabs. (1.19 KB, patch) 2008-03-08 03:27 UTC, madcap

Description Santiago Erquicia 2007-03-11 05:16:43 UTC

Please describe the problem:
I wrote an email with the following subject: "Proyecto USAID Paraguay - Análisis de situación". When the email is read in Microsoft Outlook, the subject shows up as "Proyecto USAID Paraguay - Análisis desituación" (there is a missing space before "situación").

Looking at the source code of the email I found that just after the word "de", there is a line break which might cause this problem. Here is the exact text:

Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de
        =?ISO-8859-1?Q?situaci=F3n?=

The email subject shows up perfectly if you use Evolution or Gmail, but not under Microsoft Outlook.

Thunderbird generates an email with the following subject:

Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis_de_situa?=
 =?ISO-8859-1?Q?ci=F3n?=

Steps to reproduce:
1. Write an email with the specified subject
2. Check the result in Microsoft Outlook


Actual results:
There is a missing space in the subject line in Microsoft Outlook

Expected results:
The subject shows up as in Evolution

Does this happen every time?
Yes

Other information:
I guess it's an Outlook bug, but it forced me to switch to Thunderbird given that most of my customers use Outlook and it doesn't look professional to "write" emails with orthography mistakes (even if they are not mine).

Comment 1 André Klapper 2007-03-11 14:55:40 UTC

hmm... yes, please file a bug against outlook. :-/

Comment 2 Santiago Erquicia 2007-03-11 15:56:15 UTC

From what I'm seeing in my emails, Mozilla Thunderbird converts to that format (is "encode" the right word for this?) everything after the first word that contains an accented letter. Microsoft Outlook and Yahoo Mail, on the other hand, encode the whole subject line if it contains an accented word. Evolution seems to "encode" only the words that have an accented letter.

Comment 3 C de-Avillez 2007-07-12 19:12:01 UTC

I am not sure this one is Outlook-related. I looked at RFC 2045, and it says the following (section 6.7, "Quoted-Printable Content-Transfer-Encoding"):

(...)
    (3)   (White Space) Octets with values of 9 and 32 MAY be
          represented as US-ASCII TAB (HT) and SPACE characters,
          respectively, but MUST NOT be so represented at the end
          of an encoded line.  Any TAB (HT) or SPACE characters
          on an encoded line MUST thus be followed on that line
          by a printable character.  In particular, an "=" at the
          end of an encoded line, indicating a soft line break
          (see rule #5) may follow one or more TAB (HT) or SPACE
          characters.  It follows that an octet with decimal
          value 9 or 32 appearing at the end of an encoded line
          must be represented according to Rule #1.  This rule is
          necessary because some MTAs (Message Transport Agents,
          programs which transport messages from one user to
          another, or perform a portion of such transfers) are
          known to pad lines of text with SPACEs, and others are
          known to remove "white space" characters from the end
          of a line.  Therefore, when decoding a Quoted-Printable
          body, any trailing white space on a line must be
          deleted, as it will necessarily have been added by
          intermediate transport agents.
(...)
    (5)   (Soft Line Breaks) The Quoted-Printable encoding
          REQUIRES that encoded lines be no more than 76
          characters long.  If longer lines are to be encoded
          with the Quoted-Printable encoding, "soft" line breaks
          must be used.  An equal sign as the last character on a
          encoded line indicates such a non-significant ("soft")
          line break in the encoded text.

Additionally, RFC 2047 states (section 2 "Syntax of encoded-words"):

(...)
   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=

   The characters which may appear in 'encoded-text' are further
   restricted by the rules in section 5.

and, later in RFC 2047 (section 4.2, "The "Q" encoding"):

(...)
   (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
       represented as "_" (underscore, ASCII 95.).  (This character may
       not pass through some internetwork mail gateways, but its use
       will greatly enhance readability of "Q" encoded data with mail
       readers that do not support this encoding.)  Note that the "_"
       always represents hexadecimal 20, even if the SPACE character
       occupies a different code position in the character set in use.

   (3) 8-bit values which correspond to printable ASCII characters other
       than "=", "?", and "_" (underscore), MAY be represented as those
       characters.  (But see section 5 for restrictions.)  In
       particular, SPACE and TAB MUST NOT be represented as themselves
       within encoded words.
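
To make the "Q" rules quoted above concrete, here is a minimal Python sketch of how a single word could be turned into an encoded-word. It is only an illustration, not Evolution's encoder: the function name `q_encode_word` and the `safe` set are made up for this example, the "=XX" fallback is rule (1) of the same section, and the length limit on encoded-words is not enforced.

```python
def q_encode_word(text: str, charset: str = "ISO-8859-1") -> str:
    """Return `text` as one RFC 2047 "Q" encoded-word (illustrative sketch)."""
    # Conservative set of characters that are left as-is (letters, digits, "!*+-/").
    safe = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "abcdefghijklmnopqrstuvwxyz"
               "0123456789!*+-/")
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")              # rule (2): SPACE may be written as "_"
        elif ch in safe:
            out.append(ch)               # rule (3): safe printable ASCII stays literal
        else:
            out.append(f"={byte:02X}")   # rule (1): "=" followed by two hex digits
    return f"=?{charset}?Q?{''.join(out)}?="


print(q_encode_word("Análisis"))            # =?ISO-8859-1?Q?An=E1lisis?=
print(q_encode_word("situación"))           # =?ISO-8859-1?Q?situaci=F3n?=
print(q_encode_word("this is some text"))   # =?ISO-8859-1?Q?this_is_some_text?=
```

Note that no SPACE or TAB ever appears inside the generated encoded-word, which is what the IMPORTANT paragraph in section 2 requires.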


So it seems Evo is not fully respecting the RFCs, at least regarding the Subject header.

Comments, as always, are welcome.

I have confirmed this behaviour on Evo 2.11.5 and e-d-s 1.11.5.

Comment 4 Jeffrey Stedfast 2007-07-13 03:07:56 UTC

(Note: rfc2045 section 6.7 is completely irrelevant to this case, that applies only to MIME part bodies, not headers)

you mean that Outlook isn't following the RFCs, correct?

because if you read what you just posted, you'd see evolution is following the rules precisely.


Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de
        =?ISO-8859-1?Q?situaci=F3n?=

atom: Proyecto
atom: USAID
atom: -
atom: =?ISO-8859-1?Q?An=E1lisis?=
atom: de
atom: =?ISO-8859-1?Q?situaci=F3n?=

each encoded word looks like a legal atom token to me... 


anyways, your assumption that evo encodes on a word-by-word basis is also wrong (unless someone changed it since I left the team), it gathers words into like-encodings but not to exceed the remainder of the line length (up to 78 chars or some such).

Thus, since combining the encoding of 'situación' with 'Análisis' + ' de ' would have exceeded 78 chars, they were split - since they were split, ' de ' was not added to the encoding of 'Análisis' because it was us-ascii and didn't need to be.

If I was to hazard a guess, I would say that Outlook assumes it is supposed to ignore linear white space between all atom tokens when 1-or-more are encoded... this is not true. The RFC states in section 6.2:

   When displaying a particular header field that contains multiple
   'encoded-word's, any 'linear-white-space' that separates a pair of
   adjacent 'encoded-word's is ignored.  (This is to allow the use of
   multiple 'encoded-word's to represent long strings of unencoded text,
   without having to separate 'encoded-word's where spaces occur in the
   unencoded text.)


notice that it says "a pair of adjacent encoded-words"...

since:

atom: de
atom: =?ISO-8859-1?Q?situaci=F3n?=

'de' is not an encoded-word token, any lwsp between 'de' and '=?ISO-8859-1?Q?situaci=F3n?=' MUST be preserved in the display.

hence... Outlook bug.
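
The section 6.2 rule can be sketched in a few lines of Python. This is an illustration only, not Evolution's or Outlook's code: the names `ENCODED_WORD`, `decode_q` and `display` are made up for this example, only the "Q" encoding is handled, and every whitespace run that survives is shown as a single space.

```python
import re

# One RFC 2047 encoded-word using the "Q" encoding: =?charset?Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Qq]\?([^?\s]*)\?=")


def decode_q(charset: str, payload: str) -> str:
    """Decode the payload of a "Q" encoded-word into text."""
    raw = bytearray()
    i = 0
    while i < len(payload):
        if payload[i] == "_":            # "_" always stands for SPACE (0x20)
            raw.append(0x20)
            i += 1
        elif payload[i] == "=":          # "=XX" is one octet in hex
            raw.append(int(payload[i + 1:i + 3], 16))
            i += 3
        else:                            # literal printable ASCII
            raw.append(ord(payload[i]))
            i += 1
    return raw.decode(charset)


def display(header_value: str) -> str:
    """Render a header value, dropping whitespace ONLY between two
    adjacent encoded-words (RFC 2047 section 6.2)."""
    tokens = re.findall(r"\s+|\S+", header_value)
    out = []
    for idx, tok in enumerate(tokens):
        if tok.isspace():
            prev_enc = idx > 0 and bool(ENCODED_WORD.fullmatch(tokens[idx - 1]))
            next_enc = (idx + 1 < len(tokens)
                        and bool(ENCODED_WORD.fullmatch(tokens[idx + 1])))
            if not (prev_enc and next_enc):
                out.append(" ")          # simplification: show the run as one space
            continue
        m = ENCODED_WORD.fullmatch(tok)
        out.append(decode_q(m.group(1), m.group(2)) if m else tok)
    return "".join(out)


evolution = ("Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de\n"
             "\t=?ISO-8859-1?Q?situaci=F3n?=")
thunderbird = ("Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis_de_situa?=\n"
               " =?ISO-8859-1?Q?ci=F3n?=")
print(display(evolution))
print(display(thunderbird))
```

Under this rule both the Evolution header and the Thunderbird header from the description come out as "Proyecto USAID Paraguay - Análisis de situación": the whitespace after 'de' is kept because 'de' is a plain atom, while the whitespace between Thunderbird's two encoded-words is dropped.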

Comment 5 C de-Avillez 2007-07-13 16:39:12 UTC

Thank you. I was indeed led on a wild goose chase. Went back to look at it and...

Ah, RFC2047 is the one I should have looked at. Sorry. 

My question came up exactly because I could not make sense out of 2045.

But, anyway: the original reporter (at https://bugs.edge.launchpad.net/evolution/+bug/115844) stated:

"I guess it's an Outlook bug, but it forced me to switch to Thunderbird given that most of my customers use Outlook and it doesn't look professional to "write" emails with orthography mistakes (even if they are not mine).

I hope that Ubuntu, since it's very focussed on making things work, can push for a fix to this interoperability problem, even if Evolution actually follows the spec."

It is now clear to me that Evo is indeed following the specs. But, in the interest of usability, would it be possible to accept this as (at least) a wishlist item? This would mean, I guess, encoding the lwsp that begins a continuation line. Of course, I am REALLY not sure of the potential ramifications.

Meanwhile, I am leaving this as new.

Comment 6 Jeffrey Stedfast 2007-07-13 17:31:08 UTC

that's up to the current evo mail maintainers... 

I personally don't mind, so long as whatever patch goes in to work around this Outlook bug doesn't break evo's rfc2047 compliance wrt output :)

Comment 7 Santiago Erquicia 2007-12-27 02:04:48 UTC

(In reply to comment #4)
> 
> atom: de
> atom: =?ISO-8859-1?Q?situaci=F3n?=
> 
> 'de' is not an encoded-word token, any lwsp between 'de' and
> '=?ISO-8859-1?Q?situaci=F3n?=' MUST be preserved in the display.
> 
> hence... Outlook bug.
> 

I just made some tests and when the following subject is set "Proyecto USAID Paraguay - Análisis de situación", the "Message Source" is the following:

Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de
        =?ISO-8859-1?Q?situaci=F3n?=

(Note that there is no space after "de")

If I use the following subject: "Proyecto USAID Paraguay - Análisis de   situación" (three spaces after "de"), the following is generated:

Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de  
        =?ISO-8859-1?Q?situaci=F3n?=

(Note that there are two spaces after "de")

I assume that the spaces before "=?ISO-8859-1?Q?situaci=F3n?=" are actually a representation of a TAB because they are 8 characters. 

Where does the RFC specify that a TAB should be represented as a space in a mail client while unfolding? I have searched the internet and found that many MUAs replace CR/LF and TABs while unfolding.

There is a message on the Mailman users mailing list that makes a comment like that: http://mail.python.org/pipermail/mailman-users/2007-June/057499.html

If that is the case, wouldn't it be better to replace the TAB with a SPACE so that no MUA misrepresents the space of the original subject?
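
For reference, RFC 2822 (section 2.2.3) describes unfolding as simply removing any CRLF that is immediately followed by whitespace, so the whitespace character itself stays in the header value. The Python sketch below compares that with a lossy variant that also throws away the character after the CRLF. The function names are made up for this example, the TAB after the fold is my assumption from the message source above, and `unfold_dropping_wsp` is purely hypothetical: nothing here establishes what Outlook actually does.

```python
import re


def unfold_rfc2822(raw: str) -> str:
    """Remove each CRLF that is immediately followed by SPACE or TAB,
    leaving that whitespace character in place."""
    return re.sub(r"\r\n(?=[ \t])", "", raw)


def unfold_dropping_wsp(raw: str) -> str:
    """HYPOTHETICAL lossy unfolding: removes the CRLF *and* the
    whitespace character that follows it."""
    return re.sub(r"\r\n[ \t]", "", raw)


folded = ("Subject: Proyecto USAID Paraguay - =?ISO-8859-1?Q?An=E1lisis?= de\r\n"
          "\t=?ISO-8859-1?Q?situaci=F3n?=")

print(repr(unfold_rfc2822(folded)))       # ends with 'de\t=?ISO-8859-1?Q?situaci=F3n?='
print(repr(unfold_dropping_wsp(folded)))  # ends with 'de=?ISO-8859-1?Q?situaci=F3n?='
```

With the first function the TAB survives between "de" and the encoded-word; with the second the two run together, which is the kind of result that would display as "desituación".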

(I'm not an expert in this field, but I am trying to investigate this problem as much as possible because it is really annoying in my day-to-day work)

Comment 8 Jeffrey Stedfast 2007-12-27 03:38:13 UTC

It's not specified, but many (most?) mail clients do it to make the raw message header formatting look nicer.

I would accept a patch which makes Evolution use the WSP char from the pre-folded text instead of always using a TAB.
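
As a rough idea of what such a change could look like, here is a minimal Python sketch (not the Camel code and not an actual patch; `fold_header` and the 78-character default are just for illustration). It folds at the last whitespace character that fits on the line and lets that same character start the continuation line, so nothing is substituted.

```python
def fold_header(line: str, limit: int = 78) -> str:
    """Fold `line` at whitespace, reusing the whitespace character found at
    the break point as the first character of the continuation line."""
    out = []
    while len(line) > limit:
        # Last SPACE or TAB that still fits on this line (index 0 is excluded
        # so that a continuation line is never folded on its leading WSP).
        cut = max(line.rfind(" ", 1, limit + 1), line.rfind("\t", 1, limit + 1))
        if cut <= 0:
            break                      # nothing to fold on; leave the line long
        out.append(line[:cut])
        line = line[cut:]              # starts with the original WSP character
    out.append(line)
    return "\r\n".join(out)


subject = "Subject: " + "word " * 30 + "end"
folded = fold_header(subject)
print(folded)
# Removing the CRLFs gives back exactly the text that was folded.
print(folded.replace("\r\n", "") == subject)
```

Because the continuation line begins with the character that was already in the subject, a reader that unfolds by removing only the CRLF gets the pre-folded text back unchanged, whether that character was a space or a tab.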

Comment 9 madcap 2008-03-08 02:19:52 UTC

I am hitting the following bug, which seems to be the same as what is described above minus any word encodings:

If the message goes over the 78 character limit and is wrapped, a tab is inserted after the CR/LF instead of a WSP. 

Sounds similar to what is described above... except the result when viewing the email is that a tab character is inserted where the space ought to be; both on Evolution as well as Outlook (2007).

One caveat: I've encountered this bug on Evo 1.4.5. Yeah, I know, ancient, but I'm currently stuck on RHEL3. Anyway, it looks like the bug still exists in some form.

Comment 10 madcap 2008-03-08 03:27:28 UTC

Created attachment 106827 [details] [review]
Patch against version *1.4.5* to use spaces to fold long headers instead of tabs.

This is against 1.4.5. Make of it what you will.

Note: the *real* solution should probably look at what the actual whitespace character is and use that (because as written, if someone uses tabs in their subject line instead of spaces, a tab might get replaced with a space; although I don't know how to insert tabs into the subject line in evolution anyway).

Comment 11 Milan Crha 2008-04-02 16:41:37 UTC

fejj, could I ask you to review this, please? Thanks.

Comment 12 Jeffrey Stedfast 2008-04-02 19:47:32 UTC

as madcap has noted, the patch isn't correct because it forces the use of a space when it should really use the lwsp char it folded on.

anyways... fixed this myself in svn

Comment 13 André Klapper 2008-04-04 15:12:04 UTC

*** Bug 523259 has been marked as a duplicate of this bug. ***