Bug 713060 - Garbled email addresses in conversation viewer
Status: RESOLVED FIXED
Product: geary
Classification: Other
Component: charset-encoding
Version: master
Hardware: Other All
Importance: High normal
Target Milestone: 0.6.0
Assigned To: Geary Maintainers
QA Contact: Geary Maintainers
review
Depends on:
Blocks:
 
 
Reported: 2013-06-25 07:29 UTC by Eric Gregory
Modified: 2014-02-22 09:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Screenshot showing the sender OK in the list and garbled in the conversation (119.11 KB, image/png)
2014-02-18 19:01 UTC, Jakob Unterwurzacher
Meld diff comparing header in the Geary DB and Thunderbird source view (442.80 KB, image/png)
2014-02-18 19:25 UTC, Jakob Unterwurzacher
[PATCH] Initialize GMime with ENABLE_RFC2047_WORKAROUNDS (1019 bytes, patch)
2014-02-20 21:27 UTC, Jakob Unterwurzacher

Description Charles Lindsay 2013-11-21 20:18:59 UTC


---- Reported by eric@yorba.org 2013-06-25 12:29:00 -0700 ----

Original Redmine bug id: 7158
Original URL: http://redmine.yorba.org/issues/7158
Searchable id: yorba-bug-7158
Original author: Eric Gregory
Original description:

Certain email addresses don't appear to get decoded correctly in the
conversation viewer.

For example, emails from AboutUs.org appear as follows:

    
    =?utf-8?Q?AboutUs.org?= updates@aboutus.org

Related issues:
related to geary - 5711: subject contains garbled UTF-8 characters (Open)



---- Additional Comments From geary-maint@gnome.bugs 2013-10-02 13:07:00 -0700 ----

### History

####

#1

Updated by Jim Nelson 5 months ago

  * **Category** set to _charset-encoding_

####

#2

Updated by Charles Lindsay 5 months ago

What does the source say the From header looks like?

####

#3

Updated by Eric Gregory 5 months ago

The from header is:

    
    From: =?utf-8?Q?AboutUs.org?= <updates@aboutus.org>

But it looks like the headers are all encoded the same way:

    
    Subject: =?utf-8?Q?Check=20out=20our=20infographic=20about=20content=20marketing?=
    From: =?utf-8?Q?AboutUs.org?= <updates@aboutus.org>
    Reply-To: =?utf-8?Q?AboutUs.org?= <updates@aboutus.org>
    To: =?utf-8?Q?Eric=20Gregory=20?= <eric@yorba.org>

####

#4

Updated by Charles Lindsay 5 months ago

_Spec lawyer alert._ Those From and Reply-To headers are technically in
violation of the spec. See [RFC 2047](http://www.faqs.org/rfcs/rfc2047.html)
section 5 subsection (3):

> ... the set of characters that may be used ... is restricted to: <upper and
lower case ASCII letters, decimal digits, "!", "*", "+", "-", "/", "=", and
"_" (underscore, ASCII 95.)>.

The dot in those addresses should be encoded as `=2E`. This might have the
same underlying cause as #5711.
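
A minimal Python sketch of the rule quoted above: it checks whether the encoded text of a Q-encoded word used in a phrase (such as a display name) stays inside the character set listed in RFC 2047 section 5 (3). The function name and regular expressions are illustrative assumptions, not code from Geary or GMime. `=XX` escapes pass because `=` and the hex digits are in the allowed set.

```python
import re

# Encoded-word layout from RFC 2047: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")

# Characters RFC 2047 section 5 (3) allows inside a Q-encoded word that
# appears in a phrase (for example, a display name in From or Reply-To).
PHRASE_SAFE = re.compile(r"[A-Za-z0-9!*+\-/=_]*")


def is_phrase_safe_q_word(word: str) -> bool:
    """Return True if `word` is a Q-encoded word using only phrase-safe characters."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None or match.group(2).upper() != "Q":
        return False
    return PHRASE_SAFE.fullmatch(match.group(3)) is not None


print(is_phrase_safe_q_word("=?utf-8?Q?AboutUs.org?="))    # literal dot
print(is_phrase_safe_q_word("=?utf-8?Q?AboutUs=2Eorg?="))  # dot encoded as =2E
```

Under this check the header from the report fails only because of the literal dot; the variant with `=2E` passes.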

####

#5

Updated by Eric Gregory 5 months ago

Too bad "spec lawyer" isn't a real profession, one could make **billions** off
the violations in email RFCs alone. It's worth noting that Gmail's web
interface does decode the address, correct or not.

####

#6

Updated by Jim Nelson 3 months ago

  * **Target version** changed from _0.4.0_ to _0.5.0_

####

#7

Updated by Jim Nelson about 1 month ago

I received an email with a garbled email address. The From: in the header is
like this:

    
    
    From: =?utf-8?Q?CleanBayArea.com=20=2D=20Free=20e=2Dwaste=20pickup=20since=202003?=
     <contact@cleanbayarea.com>
    

Gmail displays this correctly.



--- Bug imported by chaz@yorba.org 2013-11-21 20:19 UTC  ---

This bug was previously known as _bug_ 7158 at http://redmine.yorba.org/show_bug.cgi?id=7158

Unknown milestone "unknown in product geary. 
   Setting to default milestone for this product, "---".
Setting qa contact to the default for this product.
   This bug either had no qa contact or an invalid one.
Resolution set on an open status.
   Dropping resolution 

Comment 1 Jakob Unterwurzacher 2014-02-13 23:57:31 UTC
I'm in the middle of analyzing this but cannot finish it right now. Progress so far:


Original From header that shows up garbled:
=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?= <info@westlicht.com>

This can be decoded by g_mime_utils_header_decode_text(), so I guess it's valid syntax. I tested it by adding the string to gmime-2.6.19/tests/test-mime.c and running "make check".

I tracked the problem down to rfc822-message.vala line 307:

from = new RFC822.MailboxAddresses.from_rfc822_string(message_sender);

message_sender at this point is

"=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?=" <info@westlicht.com>

Note that quotes have been added! This string cannot be decoded by g_mime_utils_header_decode_text().

Probably the string should have been decoded *before* adding the quotes?
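
As a cross-check outside GMime, the unquoted encoded-word from the original header can also be decoded with Python's standard `email.header` module. This is only an illustrative sketch; it says nothing about how GMime treats the quoted form.

```python
from email.header import decode_header, make_header

# Display name from the original From header, without surrounding quotes.
raw_name = "=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?="

# decode_header() returns (bytes, charset) pairs; make_header() joins them
# back into a single Unicode string.
decoded_name = str(make_header(decode_header(raw_name)))
print(decoded_name)
```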
Comment 2 Jim Nelson 2014-02-17 19:59:25 UTC
I did a little research into this problem last week.  I believe the problem is this:

* When Geary adds the email address to the database, it uses GMime.utils_quote_string().
* When it's parsed from the database back into memory, the quoted string is passed directly to GMime.utils_header_decode_text().

This works for most strings, even when quoting is applied.  (utils_quote_string only modifies the string if necessary.)  My first thought was to use utils_unquote_string before utils_header_decode_text, but that's not quite accurate -- the quoted addresses are concatenated when stored in the database, much like they're listed in an RFC822 header.  Unquoting the entire string will fail when multiple addresses are stored, hence the reason for calling utils_header_decode_text to break them apart.

Looking at the gmime source code (https://git.gnome.org/browse/gmime/tree/gmime/gmime-utils.c#n1167 and specifically https://git.gnome.org/browse/gmime/tree/gmime/gmime-utils.c#n1133), it appears it's the period in the quoted-printable that's triggering the addition of quotes.  This is not what we want, and so we may need to do our own verification that quoting is desirable before running it through GMime.
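
One way to picture the "own verification" idea is the Python sketch below. It is a hypothetical helper, not GMime's `utils_quote_string` and not Geary code: it leaves a display name alone when the name consists only of RFC 2047 encoded-words, and otherwise quotes it only if it contains RFC 822 special characters such as the period.

```python
import re

# RFC 822 "specials": plain display names containing any of these need quotes.
RFC822_SPECIALS = set('()<>@,;:\\".[]')

# One or more RFC 2047 encoded-words separated by whitespace, and nothing else.
_WORD = r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?="
ENCODED_WORDS_ONLY = re.compile(rf"{_WORD}(?:\s+{_WORD})*")


def quote_display_name(name: str) -> str:
    """Quote `name` only when it is plain text containing RFC 822 specials."""
    if ENCODED_WORDS_ONLY.fullmatch(name):
        # Already RFC 2047 encoded: adding quotes would hide it from the decoder.
        return name
    if any(ch in RFC822_SPECIALS for ch in name):
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return name


print(quote_display_name("=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?="))
print(quote_display_name("WestLicht. Schauplatz"))
```

With this approach the encoded name from this bug would be stored without quotes, while an already-decoded name containing a period would still be quoted.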
Comment 3 Jakob Unterwurzacher 2014-02-17 22:00:25 UTC
For reference, this is what it looks like in the database:

sqlite> SELECT from_field,sender from MessageTable WHERE from_field LIKE '%?Q?West%';
"=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?=" <info@westlicht.com>|"=?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?=" <info=westlicht.com@mail128.atl61.mcsv.net>
Comment 4 Jakob Unterwurzacher 2014-02-18 19:01:19 UTC
Created attachment 269592
Screenshot showing the sender OK in the list and garbled in the conversation
Comment 5 Jakob Unterwurzacher 2014-02-18 19:23:18 UTC
I dug a bit deeper this time and created a new, empty Gmail account to debug this cleanly.

In the SQLite database, from_field actually looks good (maybe I dumped the wrong email last time?):

sqlite> SELECT from_field, sender from MessageTable;
"WestLicht. Schauplatz für Fotografie" <info@westlicht.com>|"<info=westlicht.com@mail128.atl61.mcsv.net>" <=?ISO-8859-1?Q?"WestLicht._Schauplatz_f=FCr_Fotografie"?=@>

This explains why the sender looks OK in the list (see screenshot).

However, if you look at the "header" column, you notice that everything has been rewrapped!
I will attach a screenshot of the meld diff in a second.

Original:
From: =?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?= <info@westlicht.com>

Geary DB:
From: =?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?=
 <info@westlicht.com>

Jim, I'm not sure how this fits in with your theory, but gmime cannot decode that rewrapped "From:". Is the header rewrapping intentional? Disabling it may solve other corruption problems, too. I have not found out how to do that, though. Hints are welcome :)
Comment 6 Jakob Unterwurzacher 2014-02-18 19:25:20 UTC
Created attachment 269600
Meld diff comparing header in the Geary DB and Thunderbird source view

Geary rewraps the header lines
Comment 7 Jim Nelson 2014-02-18 22:22:13 UTC
(In reply to comment #5)
> Jim, I'm not sure how this fits in with your theory, but gmime cannot decode
> that rewrapped "From:". Is the header rewrapping intentional? Disabling it may
> solve other corruption problems, too. I have not found out how to do that,
> though. Hints are welcome :)

This might be the issue, but it's not really Geary's fault.

I suspect Thunderbird pulls down the entire RFC822 message at once.  With IMAP, that's done with a FETCH RFC822 command (there are other variants).

Geary was designed to build a sparse database of your email.  That is, it can fetch fragments of the email (headers, body, metadata, etc.).  This design was to quickly pull down the "envelope" and then later fetch the full email.  Because of this, Geary fetches the body and header separately using IMAP's FETCH RFC822.HEADER and FETCH RFC822.TEXT commands.

Whether or not this is a good idea, it's valid IMAP.  But, for whatever reason, Gmail (not Geary) re-wraps the headers when they're pulled with RFC822.HEADER:

$ openssl s_client -connect imap.gmail.com:993 -crlf
a login <email> <pw>
a select Inbox
a fetch * RFC822
<normal headers>
<body>
a fetch * RFC822.HEADER
<re-wrapped headers>

I haven't seen this with other servers, only Gmail.

In prior investigation of the problem, the re-wrapping seemed innocuous (i.e. it followed RFC822 wrapping rules and so was not technically causing any data loss).  It's entirely possible you've found a case where it is an issue, but we need to investigate more closely.

If you want to keep looking into this, we'd welcome hearing more of what you find.
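
The point that folding loses no data can be sketched in Python: under the RFC 822 folding rules, a line break followed by a space or tab is only a continuation marker, so removing the break restores the original header line. The `unfold` helper is an illustrative assumption, not Geary or GMime code.

```python
import re


def unfold(header: str) -> str:
    """Undo RFC 822 folding: drop each line break that precedes a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


original = (
    "From: =?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?="
    " <info@westlicht.com>"
)
# The same header as stored after being re-wrapped by the server.
rewrapped = (
    "From: =?utf-8?Q?WestLicht.=20Schauplatz=20f=C3=BCr=20Fotografie?=\r\n"
    " <info@westlicht.com>"
)

print(unfold(rewrapped) == original)
```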
Comment 8 Jakob Unterwurzacher 2014-02-20 21:27:20 UTC
Created attachment 269841
[PATCH] Initialize GMime with ENABLE_RFC2047_WORKAROUNDS

It turns out the wrapping was indeed innocuous and the problem is in fact the dot - as you suspected, Jim! 

The dot does seem to be technically illegal. The good thing is that GMime tolerates it once you enable ENABLE_RFC2047_WORKAROUNDS. The attached patch does that and I have not noticed regressions so far.
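
To illustrate the difference between strict and tolerant decoding, here is a minimal Python sketch. It is not GMime's implementation: the `workarounds` parameter is a hypothetical stand-in for the effect of the flag, and the strict branch simply assumes that a non-compliant word is left undecoded.

```python
import re
from email.header import decode_header, make_header

# A single Q-encoded word; group 1 captures the encoded text.
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[Qq]\?([^?\s]*)\?=")
# Characters RFC 2047 section 5 (3) allows in an encoded-word inside a phrase.
PHRASE_SAFE = re.compile(r"[A-Za-z0-9!*+\-/=_]*")


def decode_display_name(word: str, workarounds: bool = False) -> str:
    """Decode a Q-encoded display name.

    In strict mode a word with characters outside the phrase-safe set is
    returned undecoded. With workarounds=True it is decoded anyway.
    """
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return word  # not a single Q-encoded word: leave untouched
    if not workarounds and not PHRASE_SAFE.fullmatch(match.group(1)):
        return word  # strict mode: refuse the non-compliant word
    return str(make_header(decode_header(word)))


name = "=?utf-8?Q?AboutUs.org?="
print(decode_display_name(name))                    # strict: stays encoded
print(decode_display_name(name, workarounds=True))  # tolerant: decoded
```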
Comment 9 Jim Nelson 2014-02-20 22:10:26 UTC
Great work!  Let me look into this further before committing.

(In reply to comment #8)
> It turns out the wrapping was indeed innocuous and the problem is in fact the
> dot - as you suspected, Jim! 

Charles Lindsay pointed out the dot problem to me, so he deserves the congratulations for that.
Comment 10 Jim Nelson 2014-02-20 22:54:04 UTC
This patch also fixes bug #714339.
Comment 11 Jim Nelson 2014-02-20 23:19:26 UTC
As mentioned, this closes a separate bug and potentially fixes other problems, so I'm sold.  Pushed to master, commit 1cbd39b.  Thanks for sticking with this!

Incidentally, the ticket describing our investigation into Gmail's header re-wrapping is bug #713326.
Comment 12 Jakob Unterwurzacher 2014-02-22 09:02:20 UTC
Jim, your responsiveness has created the best experience I have seen in a bug tracker so far.