GNOME Bugzilla – Bug 713326
GMail compresses header whitespace (was: view source munges the source)
Last modified: 2021-07-05 13:25:16 UTC
---- Reported by chaz@yorba.org 2012-04-20 12:01:00 -0700 ----

Original Redmine bug id: 5089
Original URL: http://redmine.yorba.org/issues/5089
Searchable id: yorba-bug-5089
Original author: Charles Lindsay
Original description:

When I view source, I expect to be shown exactly what the mail server is delivering to the client. Geary seems to violate that expectation in two ways.

1) Judging by how neatly formatted all the headers of each message are, it would appear that Geary is parsing the email, then reconstituting the message in RFC-822 format from its own interpretation. If there were zero bugs in Geary and any other program that we input that source into, and we made sure to carefully preserve things like header comments that every program **should** ignore, it wouldn't make a difference, but I'm not that optimistic. See difference in the screenshots showing Thunderbird's message source and Geary's message source for the same message.

2) Geary shows the source for the entire conversation, which is fine in general, but it does it in a way that makes it impossible to reliably reconstruct the thread. It appears that Geary is taking the messages in order and simply concatenating each message's source with an extra double newline between them. Without a Content-Length header, it can be impossible to tell whether what appears to be a new message is in fact a new message, or e.g. part of the previous message where someone had pasted some email headers. Using a format like [mbox](https://en.wikipedia.org/wiki/Mbox) (as odious as it is) would solve this problem -- there are probably other ways to deal with it too.

The second issue is much smaller a concern than the first, because it seems unlikely that anyone would need to import a whole conversation anywhere else.
It also somewhat conflicts with the first point -- there's no format to display that doesn't involve some munging, but I feel like I would trust un-munged message data combined in a lightweight container format more than what we have now: munged message data simply concatenated together.

---- Additional Comments From geary-maint@gnome.bugs 2013-01-14 17:47:00 -0800 ----

### History

#### #1 Updated by Charles Lindsay over 1 year ago

* **Description** updated (diff)

#### #2 Updated by Charles Lindsay over 1 year ago

I just discovered point 2 only applies to when right clicking and clicking view source, not clicking on the arrow button on a message and viewing source. I wasn't aware those did two different things. Perhaps the right click version should be titled "view conversation source".

#### #3 Updated by Jim Nelson over 1 year ago

* **Category** set to _13_
* **Target version** set to _0.1_

Charles and I have investigated and discovered a couple of things:

1. The Geary UI is displaying the source after it's been pushed through a GMime interpreter.
2. However, in the cases we've seen so far, the results are byte-for-byte identical with the headers/body Geary pulls from the server. We verified this by manually logging into GMail and examining the results of a FETCH operation.
3. Thunderbird and GMail's web client display the same results (that is, with extra spaces at the start of a header line continuation, not a single space as we're seeing).

This leads me to believe that Thunderbird somehow is pulling the "true" RFC822 message off the server through some mechanism we're not aware of. It may be an extension (GMail-specific or otherwise) that allows for this. The only reason I can think GMail is sending headers with reduced whitespace is as a form of network compression.

I'm going to patch Geary so it displays the raw source (with no GMime interpretation), but won't close this ticket until we're retrieving the "true" headers.
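For illustration, the whitespace difference described in #3 can be reproduced with a few lines of Python. This is only a sketch: `compress_continuations` is a hypothetical helper that mimics the observed effect (leading whitespace on a folded continuation line reduced to a single space), it is not GMail's actual algorithm, and the sample header is made up.

```python
import re

# Hypothetical helper: reduce the whitespace that starts each folded
# continuation line to one space. It mimics the effect seen in the ticket
# and makes no claim about how GMail really produces it.
_CONTINUATION = re.compile(rb"(\r?\n)[ \t]+")


def compress_continuations(raw_headers: bytes) -> bytes:
    return _CONTINUATION.sub(rb"\1 ", raw_headers)


def unfold(raw_headers: bytes) -> bytes:
    """Unfold headers by dropping each line break that precedes whitespace."""
    return re.sub(rb"\r?\n(?=[ \t])", b"", raw_headers)


# Made-up sample with an eight-space continuation indent.
original = (
    b"Received: from mail.example.org (mail.example.org [192.0.2.1])\r\n"
    b"        by mx.example.com with ESMTP id abc123;\r\n"
    b"        Fri, 20 Apr 2012 12:01:00 -0700\r\n"
    b"Subject: view source\r\n"
)
compressed = compress_continuations(original)

# The raw bytes differ, so a byte-exact "view source" is not possible
# from the compressed form alone.
bytes_differ = original != compressed

# After unfolding, the whitespace-separated tokens are identical, which is
# why most header parsers would not notice the difference.
same_tokens = unfold(original).split() == unfold(compressed).split()
```

Under these assumptions the two forms carry the same header tokens but are not byte-identical, which matches the observation that only whitespace appears to change.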
#### #4 Updated by Jim Nelson over 1 year ago

* **Priority** changed from _Normal_ to _High_

Charles was able to perform a Thunderbird trace. We've identified the issue.

The Geary engine today uses the standard RFC822.HEADER topic to fetch the headers of the message. GMail returns them with whitespace compression.

Thunderbird fetches the entire message with the BODY[] topic. This returns the "true" headers with no whitespace compression. (It also returns the full body, attachments and all.)

From my quick investigation, BODY[] is the only flavor of IMAP command that gets GMail to return the true headers. All the other variations return headers with whitespace compression.

I'm marking this as High because this might be something we want to do for 0.1. I don't think it's critical for operation.

#### #5 Updated by Jim Nelson over 1 year ago

Geary now displays source without parsing it with GMime: 8676202f5450d9ce489612bcea8ff8168156c101

#### #6 Updated by Christian Dywan over 1 year ago

Is there a difference between "true" and "compressed" source besides whitespace? If not, it would make sense to me to only use the former in debugging mode, since otherwise it'd waste bandwidth.

#### #7 Updated by Adam Dingle over 1 year ago

I agree that View Source should preferably show exactly what went over the wire - ideally what was sent from the mail originator to the recipient. I think the whitespace compression here is probably not a huge deal, but this might be one more argument in favor of downloading all messages via BODY[] someday. (To do that incrementally, we'd need to have an incremental MIME parser. It's not yet clear that GMime is that.)

Charles, I can't reproduce your issue #2 - for me, Geary shows the source of each message individually rather than concatenating all messages from a conversation. Do you still see this in git master?

#### #8 Updated by Charles Lindsay over 1 year ago

Adam, see comment #2 on this ticket.
When I wrote the ticket, I didn't know there are two different behaviors for "view source" depending on how it's invoked.

#### #9 Updated by Adam Dingle over 1 year ago

* **Target version** deleted (<strike>_0.1_</strike>)

#### #10 Updated by Jim Nelson 10 months ago

* **Subject** changed from _view source munges the source_ to _GMail compresses header whitespace (was: view source munges the source)_
* **Category** set to _engine_
* **Priority** changed from _High_ to _Low_

One addendum to Adam's suggestion: it should be investigated whether the mechanism for incremental downloads preserves header whitespace before jumping in to this ticket. If my memory serves me, any call other than a full BODY call compressed whitespace, in which case incremental downloads won't help.

Marking as Low, as this has not proven to be much of an issue so far.

--- Bug imported by chaz@yorba.org 2013-11-21 20:20 UTC ---

This bug was previously known as _bug_ 5089 at http://redmine.yorba.org/show_bug.cgi?id=5089
Imported an attachment (id=260650)
Imported an attachment (id=260651)
Unknown version " in product geary. Setting version to "!unspecified".
Unknown milestone "unknown in product geary. Setting to default milestone for this product, "---".
Setting qa contact to the default for this product. This bug either had no qa contact or an invalid one.
Resolution set on an open status. Dropping resolution
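The two fetch variants compared in #4 and #10 (RFC822.HEADER versus a full BODY[]) can be tried side by side with Python's standard `imaplib`. This is a rough sketch, not Geary's code: `fetch_raw` and `headers_of` are hypothetical helper names, the host and credentials in the usage note are placeholders, and whether a given server returns different whitespace for the two requests is exactly what the comparison is meant to reveal.

```python
import imaplib

HEADER_ONLY = "(RFC822.HEADER)"  # the item the ticket says Geary requested
FULL_MESSAGE = "(BODY[])"        # the item seen in the Thunderbird trace;
                                 # note that BODY[] also sets the \Seen flag


def fetch_raw(conn, msg_num: str, item: str) -> bytes:
    """Return the literal bytes the server sent for one FETCH data item.

    `conn` is expected to behave like an authenticated imaplib.IMAP4
    connection with a mailbox already selected.
    """
    typ, data = conn.fetch(msg_num, item)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"FETCH {item} failed: {data!r}")
    for part in data:
        # imaplib returns each literal as an (envelope, payload) tuple.
        if isinstance(part, tuple):
            return part[1]
    raise imaplib.IMAP4.error(f"FETCH {item} returned no literal")


def headers_of(raw_message: bytes) -> bytes:
    """Cut a full message down to its header block, blank line included."""
    head, sep, _body = raw_message.partition(b"\r\n\r\n")
    return head + sep


def header_fetches_differ(conn, msg_num: str) -> bool:
    """True if RFC822.HEADER is not byte-identical to the BODY[] headers."""
    header_only = fetch_raw(conn, msg_num, HEADER_ONLY)
    from_full = headers_of(fetch_raw(conn, msg_num, FULL_MESSAGE))
    return header_only != from_full
```

Against a live account this would be used roughly as `conn = imaplib.IMAP4_SSL("imap.example.com")`, then `conn.login(user, password)`, `conn.select("INBOX", readonly=True)` and `header_fetches_differ(conn, "1")`. Selecting the mailbox read-only keeps the BODY[] fetch from changing message flags.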
I've been working on fixing GMime to reserialize message/mime-part headers exactly as the parser found them. This is all happening for GMime 3.0 since it required API changes.
I should mention that you don't need an "incremental parser" to download parts individually; GMime should be plenty fine for that.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/geary/-/issues/ Thank you for your understanding and your help.