GNOME Bugzilla – Bug 712904
Header labels are affected by message body
Last modified: 2014-09-10 02:21:17 UTC
---- Reported by geary-maint@gnome.bugs 2012-06-21 15:24:00 -0700 ---- Original Redmine bug id: 5440 Original URL: http://redmine.yorba.org/issues/5440 Searchable id: yorba-bug-5440 Original author: Arturo Torres Sánchez Original description: I received a HTML message from Edublogs, and the “From:”, “Subject:”, “Date:“ labels are shown with a bigger font and misplaced. Related issues: related to geary - 6990: HTML/CSS processing (Open) duplicated by geary - 6896: Elements of <head> in email end up in body of document in... (Fixed) duplicated by geary - 5191: Formatting errors in email header (Duplicate) duplicated by geary - 6054: From/Subject/Date in wrong font in Conversation Viewer (Duplicate) ---- Additional Comments From geary-maint@gnome.bugs 2013-09-04 13:20:00 -0700 ---- ### History #### #1 Updated by Adam Dingle over 1 year ago * **Priority** changed from _Normal_ to _High_ * **Target version** set to _0.2_ #### #2 Updated by Adam Dingle about 1 year ago * **Target version** deleted (<strike>_0.2_</strike>) #### #3 Updated by Jim Nelson 10 months ago * **Category** set to _client_ * **Target version** set to _0.3.0_ #### #4 Updated by Jim Nelson 10 months ago Incidentally, I have messages in my personal Inbox that trigger this bug. #### #5 Updated by Jim Nelson 9 months ago * **Target version** changed from _0.3.0_ to _0.4.0_ #### #6 Updated by Jim Nelson 7 months ago * **Category** changed from _client_ to _conversations_ #### #7 Updated by Jim Nelson 7 months ago See Robert Schroll's note in comment 3 of #6896 for how this ticket is different than that ticket. Both are related to a similar problem. #### #8 Updated by Robert Schroll 6 months ago What we need to do here is take each CSS rule and prepend "#message_1234 .body". If "body" appears in the rule, it should be removed. There is an additional complication of what to do with nested emails, since they don't have a message id. If you'll allow me to wax philosophic: It would be nice to be able to write something like #message_1234 .body { ...existing CSS rules... } and have it correctly interpreted. This could also be useful for our own stylesheet: instead of writing these long conditional rules, we could write body:not(.nohide) { .email.hide { .body {...} .header_container {...} } } Stylesheet preprocessors like [LESS](http://lesscss.org/) and [SASS](http ://sass-lang.com/) do exactly this. These two are are written in Javascript and Ruby, respectively, so we may not want to depend on them. They also include all sorts of features beyond nesting that we don't necessarily need. But I wonder if we're going to do enough work to fix this bug, if we should make a general nested-rule resolver that we can use elsewhere? #### #9 Updated by Jim Nelson 6 months ago Robert Schroll wrote: > But I wonder if we're going to do enough work to fix this bug, if we should make a general nested-rule resolver that we can use elsewhere? I presume you mean a nested-rule resolver we would use at runtime and not at compile-time. Um ... hmm. I mean, I can see the real value in this, but I also know that we're opening up a new frontier here when we start building our own document- manipulation tools. One thing I've been thinking about is all the DOM manipulation we do inside the client (particularly the Conversation Viewer). A lot of the reason is that we're piggybacking on WebKit's DOM tools. Before I was against using WebKit inside of the engine to manipulate HTML. Now I'm wondering if we should offer some manipulation tools, that way the HTML mail can be "prepped" for the client. It strikes me that some of what you're talking about here falls into that rubric, and perhaps that's how we should be proceeding -- a new module inside the engine for manipulating HTML mail in DOM-ish ways to prepare it for examination and display by whatever client wants to use it. Right now so much of the prepwork seems ad hoc. We do have dreams of other clients using the Geary engine, and it would be great if all that code was reusable. Does that make sense? Curious if you agree. #### #10 Updated by Robert Schroll 6 months ago Jim Nelson wrote: > I presume you mean a nested-rule resolver we would use at runtime and not at compile-time. Yes, so it could be used for solving this particular bug. That said, we should keep in mind that the CSS manipulation to solve this would be significantly easier than a general nested-rule resolver, so it may be better to just solve this bug at run-time and use a CSS preprocessor to do the static CSS at compile time (if desired). > Does that make sense? Curious if you agree. It does strike me as rather silly that our HTML-to-plain-text converter lives in the composer itself, just because that's where WebKit is. So in the abstract, a separate set of DOM and CSS tools sounds good. I wonder about the implementation, though. It doesn't seem that WebKit lets you make a DOM document separately from a WebView. Would this library have its own WebView for doing all these manipulations? Does that even work if it's never shown? #### #11 Updated by Jim Nelson 6 months ago Robert Schroll wrote: > That said, we should keep in mind that the CSS manipulation to solve this would be significantly easier than a general nested-rule resolver, so it may be better to just solve this bug at run-time and use a CSS preprocessor to do the static CSS at compile time (if desired). I have concerns about using a preprocessor even at compile time, as that means future developers have to learn the preprocessor's syntax. Neither tool's syntax looks all that bad, I suppose it wouldn't be such a burden. > Would this library have its own WebView for doing all these manipulations? Does that even work if it's never shown? The reason my thoughts returned to this problem is that Charles' recent work on HTML normalization (#6837) uses libxml2 to parse the HTML and build a simple DOM. libxml2 is not the prettiest thing, but we've used it in Shotwell and it does work as advertised. I also recall that the library can be compiled with XPath, which I think would ease some of our DOM traversal woes. My thought is that, just as the RFC822 module abstracts out GMime, we could have a DOM module that abstracts out libxml2. We definitely want to wrap it, incidentally, as libxml2's memory management is error-prone. How well this would work with our current approach I don't know. But it something I'm interested in exploring. That plus some CSS manipulation routines might alleviate both issues you mention here. #### #12 Updated by Robert Schroll 6 months ago Jim Nelson wrote: > I have concerns about using a preprocessor even at compile time, as that means future developers have to learn the preprocessor's syntax. Neither tool's syntax looks all that bad, I suppose it wouldn't be such a burden. A nice thing about both LESS and SASS (in the SCSS syntax) is that they are supersets of CSS. We can just start with CSS and only use the additional features when they help. We could also purposefully restrict ourselves to a subset of their syntax. I'd be more worried about simply adding another dependency, especially one outside of the GObject ecosystem. (This could be mitigated by including the resultant CSS files in the git repository.) This is why I find building our own sort of tempting, especially if we have to build half of it for this bug. #### #13 Updated by Jim Nelson 6 months ago Robert Schroll wrote: > I'd be more worried about simply adding another dependency, especially one outside of the GObject ecosystem. (This could be mitigated by including the resultant CSS files in the git repository.) This is why I find building our own sort of tempting, especially if we have to build half of it for this bug. You probably understand when I say that I prefer not to include auto-generated files into source code control. There are always exceptions to these kinds of rules, though. The dependency thing is ok if these tools are widely distributed in the Linux world. If they're packaged for Debian and RPM, then that's probably good enough for us. I would be more inclined to handroll our own CSS preprocessor if there was some stable tool out there that could parse CSS. I'm sure it's not difficult to write our own, but said parser will become something we have to maintain. I've written HTML parsers in the past and the problem is not in the HTML but in writing a _forgiving_ HTML parser. Since this parser has to deal with CSS "in the wild", it faces a similar problem. (If it was only parsing our own CSS, I would have less issue here.) Still, I do see the benefits of an HTML/CSS unit in Geary. I think it's worthwhile to start considering what shape it would take and what features it would offer. #### #14 Updated by Jim Nelson 6 months ago I've opened a ticket for this task: #6990. Let's move further discussion there. #### #15 Updated by Robert Schroll 6 months ago I've posted code for a CSS de-nester in #6990. This would have the capability of solving this bug, among other things. If we don't want the full de-nester for other purposes, we can rip out the fancier parts and use the remainder here. #### #16 Updated by Jim Nelson 3 months ago * **Target version** changed from _0.4.0_ to _0.5.0_ --- Bug imported by chaz@yorba.org 2013-11-21 20:17 UTC --- This bug was previously known as _bug_ 5440 at http://redmine.yorba.org/show_bug.cgi?id=5440 Imported an attachment (id=260515) Imported an attachment (id=260516) Unknown milestone "unknown in product geary. Setting to default milestone for this product, "---". Setting qa contact to the default for this product. This bug either had no qa contact or an invalid one. Resolution set on an open status. Dropping resolution
*** Bug 726477 has been marked as a duplicate of this bug. ***
Bug #726477 I believe shows off another aspect of this same bug: styles aren't properly limited in scope.
You’re right, that’s the same issue. Just to know, why the scoping solution wasn’t adopted ? Seems doable with a simple regexp.
Both the shadow DOM and <style scoped> would be rather easy solutions here, but I can't find any evidence that either is available through WebKitGTK. The last comment on this bug (https://bugs.webkit.org/show_bug.cgi?id=97655) suggests that scoped styles aren't currently implemented. My testing suggests that createShadowRoot() is unavailable by default; I didn't see any likely looking switches to enable it. If and when WebkitGTK gives us these, this would be a rather easy fix. Until then, our options are manipulating or disabling any <style> elements in emails.
Created attachment 272471 [details] [review] patch I made a fix using a regex-based sanitize_css function to manually scope a CSS stylesheet. Seems pretty resilent, since it works with weird stylsheets like for[class="\{\\\""], bar[class=\\\{]:after{content:'\}\\\';}
Thanks, we'll review soon.
Robert, this patch fell off my radar screen. Can you review? If it passes muster, please commit.
There are a number of issues here, none of them fatal. I'll start with the largest: I recently learned there's a W3C interface to style sheets, which lets you go through the rules programmatically. (See https://developer.mozilla.org/en-US/docs/Web/API/CSSStyleSheet and subsequent pages.) I don't know if WebKitGTK implements these or not, but if it does, it would allow us to manipulate style sheets without needing our own parser. This would be a great help in avoiding various edge cases. I'll look into this further and let you know what I find. This code does not handle @media declarations. That's not so bad in and of itself, but the parser gets confused enough that additional rules can sneak through un-prefixed. For example, these rules @media (screen) { * { color: blue; } } div { background: yellow; } get turned into html>body>#message_container>div[id="message_[34490/32745]"]>.email_container>.body @media (screen){ * { color: blue; } html>body>#message_container>div[id="message_[34490/32745]"]>.email_container>.body } div{ background: yellow; } and everything is bright yellow. I guess the regexes aren't matching brackets? It'd be nice if body (and html) rules got turned into just "<prefix>" rules, instead of "<prefix> body", so they still apply to the message body. A question: Why did the web_view.container.insert_before() call get moved? There are a few minor code style issues: trailing white spaces and no spaces between keywords and open parens. All that said, we'd be no worse off with this, and in most cases it'd work just fine. We could commit it and try to tidy things up later. My instinct, though, is to see if this new CSS interface will work for us first, so I haven't committed it.
Created attachment 285106 [details] [review] Scope style rules in emails The WebKitGTK bindings contain enough to let us read the rules, but not alter them. We can make do with that, by re-writing the whole <style> element. This also means that we can't descend into @media rules and the like, so we skip them completely. Some types of rules, like @font-face, don't make sense to scope, so we let them through un-molested.
> A question: Why did the web_view.container.insert_before() call get moved? I have no idea. I assume you mean in the WebKit API? > There are a few minor code style issues: trailing white spaces and no spaces > between keywords and open parens. Yorba's policy is to clean up minor code guideline problems and commit with the submitted credited as author (i.e. git commit --author "name <email@example.com>") Regarding your patch, do you have a ready test case for this issue? Is your patch to be added on top of Simon's?
(In reply to comment #10) > > A question: Why did the web_view.container.insert_before() call get moved? > > I have no idea. I assume you mean in the WebKit API? Sorry, I meant in Simon's patch. It was more curiosity on my part, but DOM operations have been rather sensitive to order in the past. > Regarding your patch, do you have a ready test case for this issue? Is your > patch to be added on top of Simon's? My patch would be in place of Simon's. I think using the CSS object model is the way to go. But the missing API means that I've taken it as far as it can go. If we want to handle @media rules, for example, we're going to need to do our own parsing, in which case Simon's approach may be the way to go. (I don't really see a need for this, though.)
Created attachment 285213 [details] Email body You can use this as an email body as a test case. Ideally, the email would have a red background with blue text in a serif font. My patch gets all of this except the blue (which is set in a @media rule). Simon's patch only gets the font-family and incorrectly gives everything a yellow background.
I agree, using the CSS API looks to me a more contained approach. I have one code request: please add a comment to scope_style_sheet() that explains what's going on there, so the next person isn't baffled. Commit!
Comment on attachment 285106 [details] [review] Scope style rules in emails Attachment 285106 [details] pushed as 6454860 - Scope style rules in emails
Now that this is in place, we may want to revisit bug #714452. If we allowed entire HTML document in, style sheets would no longer be a problem. This is related to bug #713547 and bug #730680.