Bug 610797 - Improve UTF-8 text sanitizing
Status: RESOLVED FIXED
Product: GtkHtml
Classification: Other
Component: Parsing
Version: 3.30.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkhtml-maintainers
QA Contact: gtkhtml-maintainers
Depends on: 610969
Blocks:
 
 
Reported: 2010-02-23 11:29 UTC by Claudio Saavedra
Modified: 2010-08-20 22:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0001-Use-_g_utf8_make_valid-for-better-sanitizing.patch (8.02 KB, patch)
2010-02-23 11:29 UTC, Claudio Saavedra
Status: needs-work
gtkhtml patch (6.82 KB, patch)
2010-07-14 19:37 UTC, Milan Crha
Status: committed

Description Claudio Saavedra 2010-02-23 11:29:42 UTC
Created attachment 154487
0001-Use-_g_utf8_make_valid-for-better-sanitizing.patch

We've added this patch to the Maemo GtkHTML to improve recovery in Modest when emails have a broken or unknown encoding. The patch uses a copy of glib's _g_utf8_make_valid(), but we'll be proposing shortly that this function be made public.
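For readers unfamiliar with this kind of sanitizing, here is a minimal sketch of the general idea: copy the valid UTF-8 sequences through and replace each invalid byte with U+FFFD (the replacement character). It is not the attached patch and not glib's source; the names utf8_seq_len and sanitize_utf8 are made up for illustration, and it assumes a NUL-terminated input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length in bytes of the valid UTF-8 sequence starting at p (with `avail`
 * readable bytes), or 0 if the bytes there do not form valid UTF-8.
 * Hypothetical helper, not a glib function. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    size_t n, i;
    unsigned long cp;

    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                                   /* plain ASCII */
    if ((p[0] & 0xE0) == 0xC0)      { n = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; cp = p[0] & 0x07; }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (n > avail)
        return 0;                                   /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and values above U+10FFFF. */
    if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000))
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return n;
}

/* Returns a newly allocated, NUL-terminated copy of `text` in which every
 * invalid byte is replaced by U+FFFD. The caller frees the result. */
char *sanitize_utf8(const char *text)
{
    static const char replacement[] = "\xEF\xBF\xBD";   /* U+FFFD in UTF-8 */
    const unsigned char *p = (const unsigned char *) text;
    size_t len, i = 0, o = 0;
    char *out;

    assert(text != NULL);
    len = strlen(text);
    out = malloc(len * 3 + 1);      /* worst case: every byte is replaced */
    if (out == NULL)
        return NULL;

    while (i < len) {
        size_t n = utf8_seq_len(p + i, len - i);
        if (n > 0) {
            memcpy(out + o, p + i, n);              /* keep the valid sequence */
            o += n;
            i += n;
        } else {
            memcpy(out + o, replacement, 3);        /* substitute one bad byte */
            o += 3;
            i += 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

With this approach a message containing a few bad bytes loses only those bytes, while the rest of the text is preserved.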
Comment 1 Milan Crha 2010-04-27 14:04:39 UTC
Thanks for the bug report. Evolution-related packages depend on glib 2.24, where I also see _g_utf8_make_valid in gunicode.h, so I think you can use it already rather than redefining it. Also, please do not create functions in GtkHTML with a glib prefix, as it may cause issues later (just rename that _g_utf8_len_and_size). Otherwise it should work, I guess, though I didn't compile it yet, as I expect an updated patch. Thanks in advance.
Comment 2 Paul Smith 2010-06-20 21:55:02 UTC
This patch fixed my problems and allows me to read my email, with only a few "<?>" glyphs instead of entire paragraphs elided.  Very nice!

Can we please get this fix into gtkhtml, if the updated patch is not forthcoming?  It's really critical as without it entire email messages are simply unreadable.

thanks!
Comment 3 Paul Smith 2010-06-20 22:00:57 UTC
Oh ah... oops... OK.  Well, this fix makes the viewing of my emails very nice, but it COMPLETELY bollixes up the Evolution composer.  Every character I type is printed with a few extra bizarre graphical characters afterwards.  A complete mess.  I'll have to back this out.

But, can someone PLEASE figure out how to fix this problem in a way that allows me to read my email AND write it in a coherent way?

Thanks very much!
Comment 4 Paul Smith 2010-06-20 22:01:58 UTC
FYI, I'm building the latest Evolution 2.30 (evolution-data-server, evolution, etc.) from the git gnome-2.30 branch locally, and ditto with gtkhtml, libsoup.
Comment 5 Milan Crha 2010-07-12 17:11:24 UTC
I can confirm Paul's finding on current master with the patch applied. Typing any letter adds some garbage after the new letter.
Comment 6 Milan Crha 2010-07-12 17:18:29 UTC
Paul, if I got it right from the discussion on the mailing list [1], you have an HTML email downloaded from IMAP whose HTML part either has no charset set in its Content-Type header, or has one that differs from the charset used in the HTML content itself, right? Could you paste here that part's Content-* headers and the META tag from the HTML, just for testing purposes? (Hmm, thinking of it, shouldn't we open a new bug report? I'm not sure now.)

[1] http://mail.gnome.org/archives/evolution-hackers/2010-April/msg00009.html
Comment 7 David Woodhouse 2010-07-13 10:45:01 UTC
I filed bug 624234 (with an example) to demonstrate the problem that occurs when the MIME Content-Type: header doesn't specify the (correct) charset and we should be taking it from the <META HTTP-EQUIV> tag instead.
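To illustrate the fallback being discussed, here is a rough sketch that prefers a charset from the MIME Content-Type value and otherwise looks for one in the HTML's META tag. It is only an assumption of how such a lookup could work, not Evolution's or GtkHTML's code: the function names are hypothetical, the HTML is scanned with plain string matching rather than a real parser, and it does not handle the case where the header names the wrong charset.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring search; returns a pointer to the match or NULL.
 * Hypothetical helper. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char) hay[i]) == tolower((unsigned char) needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/* Copies the token following "charset=" into out. Returns 1 if found. */
static int extract_charset(const char *text, char *out, size_t out_size)
{
    const char *p;
    size_t o = 0;

    if (text == NULL || out_size == 0)
        return 0;
    p = find_nocase(text, "charset=");
    if (p == NULL)
        return 0;
    p += strlen("charset=");
    if (*p == '"' || *p == '\'')
        p++;                                    /* skip an opening quote */
    while (*p != '\0' && o + 1 < out_size &&
           (isalnum((unsigned char) *p) || *p == '-' || *p == '_' ||
            *p == '.' || *p == ':'))
        out[o++] = *p++;
    out[o] = '\0';
    return o > 0;
}

/* Prefer the charset from the Content-Type header value; if it has none,
 * fall back to the first charset that appears after a "<meta" in the HTML.
 * Returns 1 and fills `out` on success, 0 if no charset was found. */
int pick_charset(const char *content_type, const char *html,
                 char *out, size_t out_size)
{
    const char *meta;

    if (extract_charset(content_type, out, out_size))
        return 1;
    if (html == NULL)
        return 0;
    meta = find_nocase(html, "<meta");
    if (meta == NULL)
        return 0;
    return extract_charset(meta, out, out_size);
}
```

A caller would pass the raw Content-Type value and the start of the HTML body, then use the returned name to convert the part to UTF-8 before rendering.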
Comment 8 Milan Crha 2010-07-14 19:37:17 UTC
Created attachment 165908
gtkhtml patch

for gtkhtml;

Updated patch, based on the previous one. I realized that my request to use _g_utf8_make_valid instead of creating a copy of it in GtkHTML was incorrect, because it expects a NUL-terminated string, which we do not always use. Thus this change. It works both for showing a message and in the composer, so I'm committing to master.
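The NUL-termination point can be shown with a small sketch. A function that expects a NUL-terminated string will read past the end of a length-delimited chunk, so the chunk either needs a length-aware function (which, judging by the comment above, is the route the committed patch takes) or has to be copied into a terminated string first. The helper below shows only the second option; its name is hypothetical and it is not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes of a chunk that is NOT NUL-terminated into a freshly
 * allocated string that is, so it can be passed safely to functions that
 * rely on a terminating NUL. The caller frees the result.
 * Note: a NUL byte embedded in the chunk would still cut the string short
 * for such functions. */
char *chunk_to_cstring(const char *chunk, size_t len)
{
    char *copy;

    assert(chunk != NULL || len == 0);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    if (len > 0)
        memcpy(copy, chunk, len);
    copy[len] = '\0';               /* the terminator the chunk lacked */
    return copy;
}
```

The extra allocation and copy on every chunk is the cost of this approach, which is one reason a length-aware function can be preferable.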
Comment 9 Milan Crha 2010-07-14 19:38:28 UTC
Created commit 17e5d90 in gtkhtml master (3.31.6+)
Comment 10 Paul Smith 2010-08-20 22:16:52 UTC
Argh.  I've not been getting email from bugzilla for months now, and all my requests to the bugzilla devs to look into it have come to naught.

I'll try to get this built and tested, but it would be good if this patch could be backported to the current stable stream, since I'm not using the latest stuff yet.
Comment 11 David Woodhouse 2010-08-20 22:41:44 UTC
Paul, the problem with receiving mail is almost certainly a known misconfiguration of GNOME bugzilla. Reported as bug 621198 two months ago, but unfortunately still not fixed.