GNOME Bugzilla – Bug 582678
Evolution could do a better job when displaying characters that are in windows-1252 but not iso-8859-1
Last modified: 2021-05-19 11:09:37 UTC
Please describe the problem:

Ubuntu 9.04, evolution 2.26.1-0ubuntu1. I'm using the Exchange connector, but I doubt that this affects Evolution's behaviour.

I received a multi-part HTML email from another user of the same Exchange server. Despite the HTML part starting with

    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

it contained the quoted character "=92", which is the windows-1252 code for a curly quote. This appeared in Evolution as a "character not available in font" box containing "0092". The same problem occurred with a hand-crafted plain text email.

This is clearly Exchange's or Outlook's fault for claiming iso-8859-1 but then using windows-1252 characters. Even so, it would seem to be quite easy for Evolution to work around this and show the expected glyph. The bytes in that range are control characters in iso-8859-1 and could therefore be automatically mapped to their corresponding Unicode code points based on the windows-1252 encoding. See http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout .

Steps to reproduce:

Here's how I reproduced it without involving Exchange or Outlook:

1. Enter the following as an SMTP session (changing addresses as appropriate):

    mail from: Me <me@here>
    rcpt to: Me <me@there>
    data
    From: Me <me@here>
    To: Me <me@there>
    Subject: 1252 test
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    There=92s a strange character here.
    .

2. View the email in Evolution.

Actual results:
The character represented by =92 appears as a box containing 0092.

Expected results:
The character represented by =92 appears as the curly quote character at Unicode code point U+2019.

Does this happen every time?
Yes.

Other information:
All characters in the range 0x80 to 0x9f behave the same and would benefit from the same workaround.
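The workaround proposed in this report can be sketched in a few lines of Python. This is only an illustration of the byte-level remapping, not Evolution's or gtkhtml's actual code; the function name `decode_latin1_lenient` is hypothetical. It assumes Python's built-in `cp1252` codec, which leaves five bytes in the range (0x81, 0x8d, 0x8f, 0x90, 0x9d) unassigned, so those fall back to the iso-8859-1 control characters.

```python
import quopri


def decode_latin1_lenient(raw: bytes) -> str:
    """Decode bytes labelled iso-8859-1, remapping 0x80-0x9f via windows-1252.

    Hypothetical helper for illustration only.
    """
    out = []
    for b in raw:
        if 0x80 <= b <= 0x9F:
            try:
                # Use the windows-1252 meaning of this byte where one exists.
                out.append(bytes([b]).decode("cp1252"))
            except UnicodeDecodeError:
                # Unassigned in Python's cp1252 codec: keep the C1 control.
                out.append(chr(b))
        else:
            # Everything else keeps its iso-8859-1 meaning (byte == code point).
            out.append(chr(b))
    return "".join(out)


# The body line from the reproduction steps, still quoted-printable encoded.
body = quopri.decodestring(b"There=92s a strange character here.")
text = decode_latin1_lenient(body)
print(text)
```

With this remapping the =92 byte becomes U+2019 instead of the control character U+0092, which is the expected result described in this report.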
This bug was originally entered at https://bugs.launchpad.net/ubuntu/+source/gtkhtml3.14/+bug/373325
0) Confirming. As suggested in comment #0, the Exchange Connector angle is irrelevant.

1) The wikipedia page contains a link to (a version of) the HTML 5 Draft Recommendation, with this interesting phrase:

"When a user agent would otherwise use an encoding given in the first column of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row. When a byte or sequence of bytes is treated differently due to this encoding aliasing, it is said to have been misinterpreted for compatibility.

Character encoding overrides

    Input encoding    Replacement encoding    References
    [...]
    ISO-8859-1        windows-1252            [RFC1345] [WIN1252]
    [...]

Note: The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification, motivated by a desire for compatibility with legacy content."

2) Haven't looked at any code yet. Can't say whether evolution can be said to "convert" as described in that phrase.

3) Anyway, one question that comes up is whether evolution should treat iso-8859-1 as windows-1252:
- for all characters; or
- for characters in the range 0x80 to 0x9f only.
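As a rough aid for question 3, the following Python sketch lists the bytes that the two encodings decode differently. It relies on Python's built-in `latin-1` and `cp1252` codecs as stand-ins for iso-8859-1 and windows-1252, which is an assumption: other converters such as iconv may treat the unassigned windows-1252 bytes differently. The helper name `differing_bytes` is made up for this example.

```python
def differing_bytes():
    """Return the byte values that decode differently under the two codecs.

    Hypothetical helper for illustration only.
    """
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # never fails: byte == code point
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Byte is unassigned in Python's cp1252 codec.
            cp1252 = None
        if cp1252 != latin1:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
print([hex(b) for b in diffs])
```

Under these codecs, only bytes in the range 0x80 to 0x9f show up in the list, so for decoding received mail the two options in question 3 would give the same result. The encoding direction (Unicode to bytes) is a separate matter that this sketch does not cover.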
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/Community/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/evolution/-/issues/ Thank you for your understanding and your help.