After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 714317 - Hide HTML code in preview
Hide HTML code in preview
Status: RESOLVED FIXED
Product: geary
Classification: Other
Component: engine
master
Other All
: High normal
: 0.12.0
Assigned To: Michael Gratton
Geary Maintainers
: 713571 713605 751143 775489 (view as bug list)
Depends on:
Blocks: 772607 776330
 
 
Reported: 2012-08-30 09:35 UTC by Geary Maintainers
Modified: 2016-12-25 00:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test.txt (28.00 KB, text/plain)
2012-08-30 21:54 UTC, Geary Maintainers
Details
Message Text (1.91 KB, text/plain)
2016-10-27 23:20 UTC, Ken Berland
Details
HTML Rendered as Preview (88.42 KB, image/png)
2016-10-27 23:24 UTC, Ken Berland
Details
e-mail affected by problem as mentioned in comment #12 (5.48 KB, text/plain)
2016-12-21 08:40 UTC, Marvin W
Details

Description Charles Lindsay 2013-11-21 20:26:02 UTC


---- Reported by geary-maint@gnome.bugs 2012-08-30 14:35:00 -0700 ----

Original Redmine bug id: 5743
Original URL: http://redmine.yorba.org/issues/5743
Searchable id: yorba-bug-5743
Original author: Alexander Wilms
Original description:

When receiving an HTML mail, the preview shows stuff like <br><div
style="width: 700px and so on. This should probably be hidden so that only
actual content is visible.

Related issues:
related to geary - 5130: Strip common message armor from previews (Fixed)
related to geary - 7263: MIME-Version header visible in preview (Open)
related to geary - Bite-sized #4405: previews don't fill message list pane (Invalid)
related to geary - 7565: Text attachment used as preview when message has
no body (Open)
related to geary - 7221: Generate preview from full message text when
available (Open)
duplicated by geary - 7443: Message body preview doesn't strip all html (Duplicate)



---- Additional Comments From geary-maint@gnome.bugs 2013-09-23 15:44:00 -0700 ----

### History

####

#1

Updated by Adam Dingle about 1 year ago

I don't see this behavior. What version of Geary are you running (0.1? git
master?) If you're running the latest build, can you attach the source of a
message that exhibits this problem?

####

#2

Updated by Alexander Wilms about 1 year ago

  * **File** test.txt added

I'm using the latest build from ppa:yorba/daily-builds.

####

#3

Updated by Adam Dingle about 1 year ago

  * **Priority** changed from _Normal_ to _High_
  * **Target version** set to _0.2_

Thanks for the additional info!

####

#4

Updated by Adam Dingle about 1 year ago

  * **Tracker** changed from _Feature_ to _Bug_

####

#5

Updated by Adam Dingle about 1 year ago

Alexander,

This looks like a plain text mail message with HTML inside it. Can you view
the same message using another mail client and let us know what it looks like
there?

####

#6

Updated by Adam Dingle about 1 year ago

  * **Target version** deleted (<strike>_0.2_</strike>)

####

#7

Updated by Alexander Wilms about 1 year ago

There is some formatted text with images and links. At the bottom is some non-
formatted text, a quote.

####

#8

Updated by Jim Nelson 11 months ago

  * **Category** set to _client_
  * **Target version** set to _0.3.0_

####

#9

Updated by Eric Gregory 11 months ago

Fixing the preview is easy enough, it would just involve adding a call to our
HTML striping function to the preview generator. But I assume the message
itself isn't displayed correctly by since the MIME info is incorrect.

####

#10

Updated by Jim Nelson 9 months ago

  * **Target version** changed from _0.3.0_ to _0.4.0_

####

#11

Updated by Robert Schroll 6 months ago

As I noted over in #4215, we now have two different code paths for generating
previews. The previews in the conversation list are stored in the database,
and is ultimately set from some FetchResults. The previews in the conversation
view are made from RFC822.Messages each time the email is loaded. (This was
changed when we added support for nested emails and wanted to use the same
code for dealing with both top-level and nested emails.) These two have their
own sets of problems; here's what I've noticed.

  * Previews from the FetchResults don't always strip out HTML tags; the ones from RFC822.Message do better.
  * Previews from RFC822.Message may included quoted text; ones from FetchResults are better at only including new text.
  * For messages with short bodies, the preview from FetchResults can end up with headers from the following Mime part. This doesn't happen with RFC822.Message previews.

We may not be able to completely unify these code paths, since it may be
useful to get previews without needing the full message downloaded and parsed
into a RFC822.Message. But since each is demonstrating problems solved by the
other, I suspect there's at least some duplicated code we can excise.

####

#12

Updated by Jim Nelson 6 months ago

This is a good summary of where we stand with previews. Two things to add:

  * We could try making the FetchResults preview smarter by fetching the message's MIME structure beforehand and using that to make an informed decision about what portion to fetch for the preview (and how much to get).
  * The FetchResult previews tend to be fetched only for the message at the head of a conversation, whereas the RFC822.Message previews are used when all of the message (headers and body) has been pulled down. If the RFC822 message is available, including the head of the conversation, we could use that for preview instead. This means that any message's preview could change over time, depending on what's available.

The two of these could make our preview story better: smarter pulling of the
preview text from the server, and even if it's not so great, it will be
updated to an RFC822.Message preview when the full email is available. The
client could monitor a signal and update its preview when it's "upgraded".

####

#13

Updated by Robert Schroll 5 months ago

One other preview issue that I've just noticed: An email with the first mime
part being "multipart/signed" with content "This is an S/MIME signed message."
gets that text as its conversation list preview (which comes from the
FetchResult). The conversation viewer preview (from the RFC822.Message)
correctly comes from a "text/plain" part.

####

#14

Updated by Jim Nelson 4 months ago

  * **Category** changed from _client_ to _engine_

####

#15

Updated by Jim Nelson about 1 month ago

  * **Target version** changed from _0.4.0_ to _0.5.0_



--- Bug imported by chaz@yorba.org 2013-11-21 20:26 UTC  ---

This bug was previously known as _bug_ 5743 at http://redmine.yorba.org/show_bug.cgi?id=5743
Imported an attachment (id=260945)

Unknown milestone "unknown in product geary. 
   Setting to default milestone for this product, "---".
Setting qa contact to the default for this product.
   This bug either had no qa contact or an invalid one.
Resolution set on an open status.
   Dropping resolution 

Comment 1 Paul Rensing 2014-10-11 15:22:30 UTC
This is still a problem in 0.8.1 (Linux). I see it in an "Other" type account.
Comment 2 Ken Berland 2016-10-27 23:20:10 UTC
Still an issue in 0.11.2-1~trusty1.  Attaching the message source an screenshot.
Comment 3 Ken Berland 2016-10-27 23:20:54 UTC
Created attachment 338653 [details]
Message Text
Comment 4 Ken Berland 2016-10-27 23:24:00 UTC
Created attachment 338654 [details]
HTML Rendered as Preview
Comment 5 Michael Gratton 2016-12-18 15:20:51 UTC
WIP branch going up here: wip/714317-hide-html-in-preview

Already fixed:
 - CSS and HTML sometimes showing up in the header
 - Merged code paths for fetch and conversation message preview generation
 - Don't try to generate a preview unless the part is text/plain or text/html
 - Added a bunch of unit tests

Still todo:
 - Fetch body structure so we know what the best part to fetch is, rather than guessing the first 
 - Update the saved preview when full body has been fetched & notify the client
 - Use the saved preview in the conversation viewer for the primary message's preview (instead of re-generating it every time)
Comment 6 Michael Gratton 2016-12-20 01:20:25 UTC
*** Bug 775489 has been marked as a duplicate of this bug. ***
Comment 7 Michael Gratton 2016-12-20 13:19:32 UTC
WIP branch has been updated to use the full message body for generating the preview, in all circumstances: The email prefetcher was getting the complete message anyway, but now the conversation list also now fetches the complete message for unread messages, which is rather suboptimal, but necessary since fetching PREVIEW is still rather broken, for two reasons:

1. It's only prefetching a very small number of bytes, so with all of the markup and CSS overhead, little to no preview text can be obtained from HTML parts
2. Since we don't know the structure of the message beforehand, the preview is blindly choosing the first part, which is often a bad choice as it's commonly neither text/plain or text/html.

To fix fetching PREVIEW properly, we need to fetch the IMAP BODYSTRUCTURE in advance, and that probably means also storing it in the DB. I've filed Bug 776309 to take care of that in the future, since it's non-trivial.

For now, I'm inclined to land this on master and see if people are impacted by the conversation list pulling down complete messages. I'd kind of be surprised though since the email prefetcher would start downloading it almost immediately anyway.

Before I do, can people here try building the wip/714317-hide-html-in-preview and running with that for a while and let me know how well it works for you? Note that it won't fix old, broken previews in the DB. That will require a rebuild.
Comment 8 Michael Gratton 2016-12-20 13:36:16 UTC
*** Bug 751143 has been marked as a duplicate of this bug. ***
Comment 9 Michael Gratton 2016-12-20 13:37:28 UTC
*** Bug 713605 has been marked as a duplicate of this bug. ***
Comment 10 Michael Gratton 2016-12-20 13:40:30 UTC
*** Bug 713571 has been marked as a duplicate of this bug. ***
Comment 11 Michael Gratton 2016-12-20 14:05:36 UTC
A few other minor comments:

- Base64 encoded plain text parts commonly use LF for line delims, and hence quotes, etc aren't being stripped by Geary.RFC822.Utils::to_preview_text(): it expects RFC822 style CRLF line delims. It should probably instead require plain text input normalised to just LF, and fix callers to provide that
- There's maybe some further tweaks left for stripping out unwanted text: e.g. the signature delim and everything after for plain parts, any URLs in the first n chars (unless that's the only content), etc.
- The branch also contains some fixes along the lines of Bug 713191 (commit deb0c41) but again would need a DB FTS table rebuild to apply it to old messages.
Comment 12 Marvin W 2016-12-20 20:06:50 UTC
When text/html only e-Mails are passed through a mailman, it uses multipart/mixed to add the text/plain footer as a separate part to the e-mail. Before merging this branch, the preview for such a mail would be only the text/plain footer (which is useless), now it's just empty (not better). I think there should be some work put into correctly handling multipart/mixed here.
Comment 13 Michael Gratton 2016-12-21 00:50:55 UTC
I just pushed another fix to the WIP branch that addresses the Base64 issue above, and that also incidentally fixes quotes, armour, etc not being stripped from plain-text-only messages.

(In reply to Marvin W from comment #12)
> When text/html only e-Mails are passed through a mailman, it uses
> multipart/mixed to add the text/plain footer as a separate part to the
> e-mail. Before merging this branch, the preview for such a mail would be
> only the text/plain footer (which is useless), now it's just empty (not
> better). I think there should be some work put into correctly handling
> multipart/mixed here.

Thanks for pointing this out. There's a few things going on here. Geary does have some issues handling Mailman's multipart/mixed  when HTML is involved (I've just filed Bug 776329 to cover that). Also, in some situations it defaults to text parts to generate the preview, which is why you may have seen only the footer in the past.

If you still have that message around, can you attach it here so I can look at what is going on in this instance? If it's private, feel free to redact email addresses and content, it's mostly the structure that I'm interested in.
Comment 14 Marvin W 2016-12-21 08:40:07 UTC
Created attachment 342309 [details]
e-mail affected by problem as mentioned in comment #12
Comment 15 Michael Gratton 2016-12-23 00:25:44 UTC
(In reply to Marvin W from comment #14)
> Created attachment 342309 [details]
> e-mail affected by problem as mentioned in comment #12

Thanks for that. So it looks like the issue here is indeed the fact that for generating the preview, Geary prefers the plain text version of the message's content, and we're not doing a very good job of extracting that for these mailman messages (per Bug 776329).

A workaround would be to make Geary prefer the HTML version for generating the preview, but I think that could be sub-optimal for cases where senders are doing the right thing and actually providing a proper plain text version of their HTML content. Because of that, and because fixing Bug 776329 will also fix this issue, I'm thinking I'm going to leave the current preference as-is, unless someone has a strong argument otherwise.
Comment 16 Marvin W 2016-12-23 00:39:57 UTC
I am not an expert in MIME, but as far as I know, there are different types of multiparts: multipart/alternative when the same content is provided with different content types (e.g. text/html and text/plain) and multipart/mixed when different content is provided with different content types. In the multipart/mixed case both should be displayed to the user whereas multipart/alternative only one of the alternatives should be displayed. The same should obviously hold for the preview, which means that a multipart/mixed should concatenate all parts and multipart/alternative should choose the easiest-to-parse alternative (which is text/plain in most cases).

ref: RFC 2046, 5.1.3/5.1.4

By the way: while the preview of the linked mail displayed the footer only, the content display the html part only and skips the footer. This looks to me like multipart/mixed as handled the same as multipart/alternative. However the RFC says that 'Any "multipart" subtypes that an implementation does not recognize must be treated as being of subtype "mixed"', which means that support for multipart/mixed is important.
Comment 17 Michael Gratton 2016-12-23 00:55:12 UTC
(In reply to Marvin W from comment #16)
> I am not an expert in MIME, but as far as I know, there are different types
> of multiparts: multipart/alternative when the same content is provided with
> different content types (e.g. text/html and text/plain) and multipart/mixed
> when different content is provided with different content types. In the
> multipart/mixed case both should be displayed to the user whereas
> multipart/alternative only one of the alternatives should be displayed. The
> same should obviously hold for the preview, which means that a
> multipart/mixed should concatenate all parts and multipart/alternative
> should choose the easiest-to-parse alternative (which is text/plain in most
> cases).
> 
> ref: RFC 2046, 5.1.3/5.1.4
> 
> By the way: while the preview of the linked mail displayed the footer only,
> the content display the html part only and skips the footer. This looks to
> me like multipart/mixed as handled the same as multipart/alternative.
> However the RFC says that 'Any "multipart" subtypes that an implementation
> does not recognize must be treated as being of subtype "mixed"', which means
> that support for multipart/mixed is important.

Absolutely, but all of this are separate bugs (i.e. Bug 769868 & Bug 776329), so let's handle these issues over there.

Given the (incorrect) text that is currently being fed into the preview generator in this case, the preview generator itself is I think currently doing the right thing with it, which is what this bug is about.
Comment 18 Michael Gratton 2016-12-25 00:17:34 UTC
Branch merged to master as commit 1ad0228.