GNOME Bugzilla – Bug 610797
Improve UTF-8 text sanitizing
Last modified: 2010-08-20 22:41:44 UTC
Created attachment 154487 (0001-Use-_g_utf8_make_valid-for-better-sanitizing.patch). We've added this patch to maemo GtkHTML to improve recovery in modest with emails that have broken or unknown encoding. The patch uses a copy of glib's _g_utf8_make_valid(), but we'll be proposing shortly that this function be made public.
Thanks for the bug report. Evolution-related packages depend on glib 2.24, where I see _g_utf8_make_valid also declared in gunicode.h, so I think you can use it already rather than redefining it. Also, please do not create functions in GtkHTML with the glib prefix, as it may cause issues later (just rename that _g_utf8_len_and_size). Otherwise it should work, I guess, though I haven't compiled it yet, as I expect an updated patch. Thanks in advance.
This patch fixed my problems and allows me to read my email, with only a few "<?>" glyphs instead of entire paragraphs elided. Very nice! Can we please get this fix into gtkhtml, even if the updated patch is not forthcoming? It's really critical, because without it entire email messages are simply unreadable. Thanks!
Oh ah... oops... OK. Well, this fix makes the viewing of my emails very nice, but it COMPLETELY bollixes up the Evolution composer. Every character I type is printed with a few extra bizarre graphical characters afterwards. A complete mess. I'll have to back this out. But, can someone PLEASE figure out how to fix this problem in a way that allows me to read my email AND write it in a coherent way? Thanks very much!
FYI, I'm building the latest Evolution 2.30 (evolution-data-server, evolution, etc.) from the git gnome-2.30 branch locally, and ditto with gtkhtml, libsoup.
I can confirm Paul's finding on current master with the patch applied: typing any letter adds some garbage after it.
Paul, if I understood the discussion on the mailing list [1] correctly, you have an HTML email downloaded from IMAP whose HTML part either doesn't have a charset set in its Content-Type header, or has one that differs from the one used in the HTML content itself, right? Could you paste that part's Content-* headers and the META tag from the HTML here, just for testing purposes? (Hmm, thinking about it, shouldn't we open a new bug report? I'm not sure now.) [1] http://mail.gnome.org/archives/evolution-hackers/2010-April/msg00009.html
I filed bug 624234 (with an example) to demonstrate the problem which occurs when the MIME Content-Type: header doesn't specify the (correct) charset, and we should be taking it from the <META HTTP-EQUIV> tag.
Created attachment 165908: updated gtkhtml patch, replacing the previous one. I realized that my request to use _g_utf8_make_valid instead of creating a copy of it in GtkHTML was incorrect, because it expects a NUL-terminated string, which we do not always have. Hence this change. It works both for showing a message and in the composer, so I'm committing it to master.
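For illustration, a minimal sketch in plain C (no glib) of the idea described above: sanitize a buffer given an explicit byte length instead of relying on a terminating NUL, replacing each byte that cannot start a valid UTF-8 sequence with U+FFFD REPLACEMENT CHARACTER, as glib's private _g_utf8_make_valid() does for NUL-terminated input (that character is likely what shows up as the "<?>" glyphs mentioned earlier). The names utf8_seq_len() and sanitize_utf8_len() are hypothetical; this is not the code committed to gtkhtml.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the byte length of the valid UTF-8
 * sequence starting at s (with at most len bytes available), or 0 if
 * the bytes at s do not form a complete, valid sequence. Rejects
 * overlong forms, UTF-16 surrogates and code points above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t len)
{
    unsigned char c = s[0];
    size_t need;
    unsigned long cp;

    if (c < 0x80)
        return 1;                                  /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF) { need = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 4; cp = c & 0x07; }
    else
        return 0;   /* stray continuation byte, C0/C1, or F5..FF */

    if (need > len)
        return 0;   /* sequence truncated by the buffer length */

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                              /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if ((need == 3 && cp < 0x800) ||               /* overlong 3-byte form */
        (cp >= 0xD800 && cp <= 0xDFFF) ||          /* UTF-16 surrogate */
        (need == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
        return 0;
    return need;
}

/* Hypothetical name: copies len bytes of text (which need not be
 * NUL-terminated) into a newly allocated, NUL-terminated buffer,
 * replacing every byte that does not start a valid UTF-8 sequence with
 * U+FFFD (EF BF BD). Stores the output length in *out_len if non-NULL.
 * The caller frees the result. */
static char *sanitize_utf8_len(const char *text, size_t len, size_t *out_len)
{
    static const char replacement[] = "\357\277\275";   /* U+FFFD */
    size_t i = 0, o = 0;
    char *out;

    assert(text != NULL || len == 0);

    /* Worst case: every input byte becomes a 3-byte replacement. */
    out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    while (i < len) {
        size_t n = utf8_seq_len((const unsigned char *)text + i, len - i);
        if (n > 0) {
            memcpy(out + o, text + i, n);          /* keep valid sequence */
            o += n;
            i += n;
        } else {
            memcpy(out + o, replacement, 3);       /* replace one bad byte */
            o += 3;
            i += 1;
        }
    }
    out[o] = '\0';
    if (out_len != NULL)
        *out_len = o;
    return out;
}
```

Because the loop is driven only by len, the input may be a slice of a larger buffer with no NUL after it, which is the situation that ruled out calling _g_utf8_make_valid() directly. Skipping a single byte on each error keeps as much of the surrounding text as possible, rather than dropping the rest of a paragraph.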
Created commit 17e5d90 in gtkhtml master (3.31.6+)
Argh. I've not been getting email from bugzilla for months now and all my requests to the bugzilla devs to look into it have gone for naught. I'll try to get this built and tested out, but it would be good if this patch could be backported to the current stable stream since I'm not using the latest stuff yet.
Paul, the problem with receiving mail is almost certainly a known misconfiguration of GNOME bugzilla. Reported as bug 621198 two months ago, but unfortunately still not fixed.