GNOME Bugzilla – Bug 200921
charset override command for MailDisplay
Last modified: 2013-09-10 13:59:18 UTC
Apparently some mailers send text with explicit, but incorrect, charset encodings. http://lists.helixcode.com/archives/public/evolution-hackers/2000-November/001060.html Also, we currently have no way to correctly display text sent with no charset encoding where the real encoding is not iso-8859-1. So, we want some sort of command to try an alternate encoding. Emacs appears to have some ability to automagically detect the encoding of a buffer, although it's not 100% reliable (it can't distinguish iso-8859-1 from iso-8859-2 for instance). We might want to look into that.
OK, I see 3 major scenarios with this:
1. The charset specified exists, and we convert it to UTF-8 inside the message. When the user asks for a different charset, we convert from UTF-8 back to the charset specified in the message, then convert from that, treating it as the user's charset, back to UTF-8 for display.
2. The charset specified doesn't exist or we don't know about it. In this case we'll have the raw (unconverted) data stored in the message, and we just convert that to UTF-8 using the charset the user asks for.
3. The charset specified exists but the content contains characters outside of that charset. In this case the conversion to UTF-8 may be incomplete, so conversion back may also be incomplete. Not sure what we do about this case.
For conversion we can just use a stream-filter to write to, with the right filters attached, as we write the stream to memory for processing in the display code. So it's probably not all that hard to handle; we could add a charset parameter/member to the messagedisplay API, or something like that. It's probably not worth adding the automatic charset detector, although I guess it could be used to prompt the user.
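A minimal sketch of scenarios 1 and 2 above, written in Python for brevity rather than against Evolution's actual C stream-filter code. The function names `redecode` and `decode_raw` are hypothetical and only illustrate the order of the conversions.

```python
def redecode(text: str, declared_charset: str, user_charset: str) -> str:
    """Scenario 1: `text` was already decoded using the declared charset.

    Undo that conversion to recover the original bytes, then decode
    those bytes again as the charset the user picked.
    """
    raw = text.encode(declared_charset)  # back to the bytes that were sent
    return raw.decode(user_charset)      # reinterpret as the user's choice


def decode_raw(raw: bytes, user_charset: str) -> str:
    """Scenario 2: no usable declared charset, so the bytes are still raw.

    Only a single conversion is needed, using the user's charset.
    """
    return raw.decode(user_charset)
```

Scenario 3 is the case where `text.encode(declared_charset)` cannot reproduce the original bytes, because the first conversion was lossy.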
iconv() tells you when it either can't translate, or badly translates characters. So the code can detect when case #3 is happening and bail out and fall back to case #2. For #1, there's the possibility that foo -> UTF8 -> foo isn't an identity, because there could be multiple ways of representing the same characters (by decomposing accents or whatever). This might be enough of an edge case that we don't care though.
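As a rough illustration of that fallback, here is a sketch in Python, whose codec machinery raises exceptions where iconv() would report an error. `decode_with_fallback` is a hypothetical name, and falling back to the user's charset with replacement characters is an assumption, not something the thread has settled on.

```python
def decode_with_fallback(raw: bytes, declared_charset: str, user_charset: str):
    """Try the declared charset first; fall back to the user's charset.

    Returns a (text, charset_used) pair so the caller can tell which
    charset actually produced the text.
    """
    try:
        return raw.decode(declared_charset), declared_charset
    except LookupError:
        # Case 2: the declared charset is unknown to the converter.
        pass
    except UnicodeDecodeError:
        # Case 3: the bytes are not valid in the declared charset.
        pass
    # Fall back to the user's charset, marking undecodable bytes
    # with U+FFFD instead of failing a second time.
    return raw.decode(user_charset, errors="replace"), user_charset
```

This only catches conversions that fail outright. A single-byte charset such as iso-8859-1 accepts every byte value, so a wrong label of that kind is never detected this way.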
*** bug 206448 has been marked as a duplicate of this bug. ***
*** bug 209759 has been marked as a duplicate of this bug. ***
I've just implemented this in CVS - if the data wrapper's contents are in raw form, then it assumes the charset is the user's preferred charset encoding and so uses that when converting to UTF8 before sending it off to GtkHTML.
No, the bug is not fixed. In raw form, all non-ASCII is displayed as ??? (if the message is raw 8bit) or just as QP or base64!!! Anyway, even if it weren't, it's VERY BACKWARDS to make the 'preferred encoding for sending mail' setting affect the display of the current message! And what about mails that contain attachments? They will also be visible (in raw form) in the raw view!!! And think about HTML mails: the user will see machine-generated HTML code instead of the message text! What is needed is a 'Message charset' item in the 'View' menu, whose subitems are radio menu items, one for each charset name known to Evo.
the bug *is* fixed. This fix is not supposed to affect viewing in raw form. Raw form is exactly that - RAW. This means it does NO charset conversions, no base64 decoding, no QP decoding...NOTHING.
I understand that the fix didn't do anything about fixing raw view. As I understand your comment, upon receipt of a message with an incorrectly specified charset, the user has to switch to raw message display, then go to the options and select another charset, right? If yes, it's completely broken at best. Add to that that raw view shows non-ASCII as ??? and that it doesn't decode content transfer encodings like base64 or QP (not decoding is OK for raw view, I agree). This means that the user won't be able to read anything at all. And what about replying? Reopened the bug.
you got it totally wrong. a data-wrapper in raw form is NOT anything that the user sees. That is the backend representation of the data.
OK, please explain how the user should override the charset of the message then. Which menu item? Are any means for this available in 0.15.99? I have the Oct 4 snapshot and I didn't find any means for this.
Haven't received any answer to my question. Reopening.
Now actually reopening..
*** bug 208474 has been marked as a duplicate of this bug. ***
View -> Character Encoding -> *** charset ***
Reopening. It seems not to work at all. When one displays a mail with, say, Russian text and selects a different Russian charset, or even latin-1, from View->Encoding, NOTHING CHANGES. The text of the letter should change regardless of whether the charset is specified in the mail headers and regardless of the content transfer encoding. Tested with Evo-0.16 from current Ximian GNOME (not a snapshot).
I discovered a bug recently (a week or two ago?), so the fix may not have made it into the 0.16 release (yaneti was the one who let me know of the bug). Anyway, you still can't override a message's charset if it was transformed to UTF-8 without problems, and I don't see why you'd want to anyway? If it was validly transformed to UTF-8 given the charset that the message claimed to be in, then changing it to another charset is more likely to make it render badly than it is to render correctly.
You are totally wrong - one needs the ability to override the charset of the message regardless of whether it was correctly transformed to UTF-8 or not. For example, when one writes a mail through a broken WWW interface, e.g. usa.net, the mail gets its charset marked as iso8859-1. So Russian is displayed as a mess of latin1 characters, most of them with umlauts. A properly implemented recoding would allow one to read such mail, and mature MUAs like Mozilla and Outlook Express have this functionality implemented and working correctly. Reopening..
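A small Python illustration of why a "successful" conversion is not enough in that case. It assumes the webmail actually sent KOI8-R bytes under the iso8859-1 label; the real encoding used by usa.net is not stated in this report.

```python
original = "Привет"  # what the sender typed

# Assumption: the broken web interface sends KOI8-R bytes
# but labels the message as iso-8859-1.
wire_bytes = original.encode("koi8-r")

# Decoding with the declared charset never fails, because every byte
# value is a valid iso-8859-1 character. The result is simply wrong.
shown = wire_bytes.decode("iso-8859-1")

# The override: recover the bytes via the declared charset, then
# decode them with the charset the user selected.
recovered = shown.encode("iso-8859-1").decode("koi8-r")
```

Here `shown` is a string of accented Latin letters of the same length as the original, and `recovered` is the original Russian text again.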
Fine, fine, implemented. For future reference, can you hit return when you're typing in Bugzilla? Otherwise the text scrolls forever to the right and I have to manually scroll to read what you wrote, which is a royal pain for me.
Thank you very much for fixing it (though I haven't checked it yet)!! I'm very sorry for the bad formatting of my comments - I'm using lynx and they look pretty reasonable in lynx. I will format my comments properly next time.
much appreciated :-)