GNOME Bugzilla – Bug 714317
Hide HTML code in preview
Last modified: 2016-12-25 00:17:34 UTC
---- Reported by geary-maint@gnome.bugs 2012-08-30 14:35:00 -0700 ---- Original Redmine bug id: 5743 Original URL: http://redmine.yorba.org/issues/5743 Searchable id: yorba-bug-5743 Original author: Alexander Wilms Original description: When receiving an HTML mail, the preview shows stuff like <br><div style="width: 700px and so on. This should probably be hidden so that only actual content is visible. Related issues: related to geary - 5130: Strip common message armor from previews (Fixed) related to geary - 7263: MIME-Version header visible in preview (Open) related to geary - Bite-sized #4405: previews don't fill message list pane (Invalid) related to geary - 7565: Text attachment used as preview when message has no body (Open) related to geary - 7221: Generate preview from full message text when available (Open) duplicated by geary - 7443: Message body preview doesn't strip all html (Duplicate) ---- Additional Comments From geary-maint@gnome.bugs 2013-09-23 15:44:00 -0700 ---- ### History #### #1 Updated by Adam Dingle about 1 year ago I don't see this behavior. What version of Geary are you running (0.1? git master?) If you're running the latest build, can you attach the source of a message that exhibits this problem? #### #2 Updated by Alexander Wilms about 1 year ago * **File** test.txt added I'm using the latest build from ppa:yorba/daily-builds. #### #3 Updated by Adam Dingle about 1 year ago * **Priority** changed from _Normal_ to _High_ * **Target version** set to _0.2_ Thanks for the additional info! #### #4 Updated by Adam Dingle about 1 year ago * **Tracker** changed from _Feature_ to _Bug_ #### #5 Updated by Adam Dingle about 1 year ago Alexander, This looks like a plain text mail message with HTML inside it. Can you view the same message using another mail client and let us know what it looks like there? #### #6 Updated by Adam Dingle about 1 year ago * **Target version** deleted (<strike>_0.2_</strike>) #### #7 Updated by Alexander Wilms about 1 year ago There is some formatted text with images and links. At the bottom is some non- formatted text, a quote. #### #8 Updated by Jim Nelson 11 months ago * **Category** set to _client_ * **Target version** set to _0.3.0_ #### #9 Updated by Eric Gregory 11 months ago Fixing the preview is easy enough, it would just involve adding a call to our HTML striping function to the preview generator. But I assume the message itself isn't displayed correctly by since the MIME info is incorrect. #### #10 Updated by Jim Nelson 9 months ago * **Target version** changed from _0.3.0_ to _0.4.0_ #### #11 Updated by Robert Schroll 6 months ago As I noted over in #4215, we now have two different code paths for generating previews. The previews in the conversation list are stored in the database, and is ultimately set from some FetchResults. The previews in the conversation view are made from RFC822.Messages each time the email is loaded. (This was changed when we added support for nested emails and wanted to use the same code for dealing with both top-level and nested emails.) These two have their own sets of problems; here's what I've noticed. * Previews from the FetchResults don't always strip out HTML tags; the ones from RFC822.Message do better. * Previews from RFC822.Message may included quoted text; ones from FetchResults are better at only including new text. * For messages with short bodies, the preview from FetchResults can end up with headers from the following Mime part. This doesn't happen with RFC822.Message previews. We may not be able to completely unify these code paths, since it may be useful to get previews without needing the full message downloaded and parsed into a RFC822.Message. But since each is demonstrating problems solved by the other, I suspect there's at least some duplicated code we can excise. #### #12 Updated by Jim Nelson 6 months ago This is a good summary of where we stand with previews. Two things to add: * We could try making the FetchResults preview smarter by fetching the message's MIME structure beforehand and using that to make an informed decision about what portion to fetch for the preview (and how much to get). * The FetchResult previews tend to be fetched only for the message at the head of a conversation, whereas the RFC822.Message previews are used when all of the message (headers and body) has been pulled down. If the RFC822 message is available, including the head of the conversation, we could use that for preview instead. This means that any message's preview could change over time, depending on what's available. The two of these could make our preview story better: smarter pulling of the preview text from the server, and even if it's not so great, it will be updated to an RFC822.Message preview when the full email is available. The client could monitor a signal and update its preview when it's "upgraded". #### #13 Updated by Robert Schroll 5 months ago One other preview issue that I've just noticed: An email with the first mime part being "multipart/signed" with content "This is an S/MIME signed message." gets that text as its conversation list preview (which comes from the FetchResult). The conversation viewer preview (from the RFC822.Message) correctly comes from a "text/plain" part. #### #14 Updated by Jim Nelson 4 months ago * **Category** changed from _client_ to _engine_ #### #15 Updated by Jim Nelson about 1 month ago * **Target version** changed from _0.4.0_ to _0.5.0_ --- Bug imported by chaz@yorba.org 2013-11-21 20:26 UTC --- This bug was previously known as _bug_ 5743 at http://redmine.yorba.org/show_bug.cgi?id=5743 Imported an attachment (id=260945) Unknown milestone "unknown in product geary. Setting to default milestone for this product, "---". Setting qa contact to the default for this product. This bug either had no qa contact or an invalid one. Resolution set on an open status. Dropping resolution
This is still a problem in 0.8.1 (Linux). I see it in an "Other" type account.
Still an issue in 0.11.2-1~trusty1. Attaching the message source an screenshot.
Created attachment 338653 [details] Message Text
Created attachment 338654 [details] HTML Rendered as Preview
WIP branch going up here: wip/714317-hide-html-in-preview Already fixed: - CSS and HTML sometimes showing up in the header - Merged code paths for fetch and conversation message preview generation - Don't try to generate a preview unless the part is text/plain or text/html - Added a bunch of unit tests Still todo: - Fetch body structure so we know what the best part to fetch is, rather than guessing the first - Update the saved preview when full body has been fetched & notify the client - Use the saved preview in the conversation viewer for the primary message's preview (instead of re-generating it every time)
*** Bug 775489 has been marked as a duplicate of this bug. ***
WIP branch has been updated to use the full message body for generating the preview, in all circumstances: The email prefetcher was getting the complete message anyway, but now the conversation list also now fetches the complete message for unread messages, which is rather suboptimal, but necessary since fetching PREVIEW is still rather broken, for two reasons: 1. It's only prefetching a very small number of bytes, so with all of the markup and CSS overhead, little to no preview text can be obtained from HTML parts 2. Since we don't know the structure of the message beforehand, the preview is blindly choosing the first part, which is often a bad choice as it's commonly neither text/plain or text/html. To fix fetching PREVIEW properly, we need to fetch the IMAP BODYSTRUCTURE in advance, and that probably means also storing it in the DB. I've filed Bug 776309 to take care of that in the future, since it's non-trivial. For now, I'm inclined to land this on master and see if people are impacted by the conversation list pulling down complete messages. I'd kind of be surprised though since the email prefetcher would start downloading it almost immediately anyway. Before I do, can people here try building the wip/714317-hide-html-in-preview and running with that for a while and let me know how well it works for you? Note that it won't fix old, broken previews in the DB. That will require a rebuild.
*** Bug 751143 has been marked as a duplicate of this bug. ***
*** Bug 713605 has been marked as a duplicate of this bug. ***
*** Bug 713571 has been marked as a duplicate of this bug. ***
A few other minor comments: - Base64 encoded plain text parts commonly use LF for line delims, and hence quotes, etc aren't being stripped by Geary.RFC822.Utils::to_preview_text(): it expects RFC822 style CRLF line delims. It should probably instead require plain text input normalised to just LF, and fix callers to provide that - There's maybe some further tweaks left for stripping out unwanted text: e.g. the signature delim and everything after for plain parts, any URLs in the first n chars (unless that's the only content), etc. - The branch also contains some fixes along the lines of Bug 713191 (commit deb0c41) but again would need a DB FTS table rebuild to apply it to old messages.
When text/html only e-Mails are passed through a mailman, it uses multipart/mixed to add the text/plain footer as a separate part to the e-mail. Before merging this branch, the preview for such a mail would be only the text/plain footer (which is useless), now it's just empty (not better). I think there should be some work put into correctly handling multipart/mixed here.
I just pushed another fix to the WIP branch that addresses the Base64 issue above, and that also incidentally fixes quotes, armour, etc not being stripped from plain-text-only messages. (In reply to Marvin W from comment #12) > When text/html only e-Mails are passed through a mailman, it uses > multipart/mixed to add the text/plain footer as a separate part to the > e-mail. Before merging this branch, the preview for such a mail would be > only the text/plain footer (which is useless), now it's just empty (not > better). I think there should be some work put into correctly handling > multipart/mixed here. Thanks for pointing this out. There's a few things going on here. Geary does have some issues handling Mailman's multipart/mixed when HTML is involved (I've just filed Bug 776329 to cover that). Also, in some situations it defaults to text parts to generate the preview, which is why you may have seen only the footer in the past. If you still have that message around, can you attach it here so I can look at what is going on in this instance? If it's private, feel free to redact email addresses and content, it's mostly the structure that I'm interested in.
Created attachment 342309 [details] e-mail affected by problem as mentioned in comment #12
(In reply to Marvin W from comment #14) > Created attachment 342309 [details] > e-mail affected by problem as mentioned in comment #12 Thanks for that. So it looks like the issue here is indeed the fact that for generating the preview, Geary prefers the plain text version of the message's content, and we're not doing a very good job of extracting that for these mailman messages (per Bug 776329). A workaround would be to make Geary prefer the HTML version for generating the preview, but I think that could be sub-optimal for cases where senders are doing the right thing and actually providing a proper plain text version of their HTML content. Because of that, and because fixing Bug 776329 will also fix this issue, I'm thinking I'm going to leave the current preference as-is, unless someone has a strong argument otherwise.
I am not an expert in MIME, but as far as I know, there are different types of multiparts: multipart/alternative when the same content is provided with different content types (e.g. text/html and text/plain) and multipart/mixed when different content is provided with different content types. In the multipart/mixed case both should be displayed to the user whereas multipart/alternative only one of the alternatives should be displayed. The same should obviously hold for the preview, which means that a multipart/mixed should concatenate all parts and multipart/alternative should choose the easiest-to-parse alternative (which is text/plain in most cases). ref: RFC 2046, 5.1.3/5.1.4 By the way: while the preview of the linked mail displayed the footer only, the content display the html part only and skips the footer. This looks to me like multipart/mixed as handled the same as multipart/alternative. However the RFC says that 'Any "multipart" subtypes that an implementation does not recognize must be treated as being of subtype "mixed"', which means that support for multipart/mixed is important.
(In reply to Marvin W from comment #16) > I am not an expert in MIME, but as far as I know, there are different types > of multiparts: multipart/alternative when the same content is provided with > different content types (e.g. text/html and text/plain) and multipart/mixed > when different content is provided with different content types. In the > multipart/mixed case both should be displayed to the user whereas > multipart/alternative only one of the alternatives should be displayed. The > same should obviously hold for the preview, which means that a > multipart/mixed should concatenate all parts and multipart/alternative > should choose the easiest-to-parse alternative (which is text/plain in most > cases). > > ref: RFC 2046, 5.1.3/5.1.4 > > By the way: while the preview of the linked mail displayed the footer only, > the content display the html part only and skips the footer. This looks to > me like multipart/mixed as handled the same as multipart/alternative. > However the RFC says that 'Any "multipart" subtypes that an implementation > does not recognize must be treated as being of subtype "mixed"', which means > that support for multipart/mixed is important. Absolutely, but all of this are separate bugs (i.e. Bug 769868 & Bug 776329), so let's handle these issues over there. Given the (incorrect) text that is currently being fed into the preview generator in this case, the preview generator itself is I think currently doing the right thing with it, which is what this bug is about.
Branch merged to master as commit 1ad0228.