GNOME Bugzilla – Bug 163999
Determine the encoding from the HTML headers
Last modified: 2018-05-26 14:02:14 UTC
When opening an HTML document (from a file, not from the web), gedit should check whether a meta tag declares the document's character encoding. If one is found, that encoding should be used instead of auto-detection.

1. Save the web page at http://www.mechon-mamre.org/i/t/t0101.htm
2. Open it in gedit.
3. You see gibberish Western characters instead of Hebrew, even though the page explicitly states which encoding it uses.
Yes, it is a nice idea, even though we will have to handle HTML documents (and XML too, I think, right?) as special cases. Why do you say "not from the web"? Why is it different? If you run "gedit http://www.mechon-mamre.org/i/t/t0101.htm", you will see the same problem as when opening a local copy of the same HTML page. I think there should be code in Bluefish and/or in screem that we could try to use for this.
Oops, I hadn't actually tried running gedit with the URL. In that case, let's add part two of this bug: gedit should also obey the HTTP Content-Type header when it provides a charset. In this case the server does not set a charset in that header, but the page does provide a meta tag, so we should use that. So the behaviour should be:

1. Look at the HTTP headers.
2. Look at the HTML or XML headers.
3. Guess.

References:
http://www.w3.org/TR/html4/charset.html#h-5.2.2
http://www.ietf.org/rfc/rfc2616.txt
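For illustration only, the lookup order proposed in this comment (HTTP headers, then HTML or XML headers, then guessing) could be sketched roughly as follows in Python. This is not gedit code: every function name here is hypothetical, the 4096-byte scan limit is an arbitrary assumption, the regular expressions are a crude stand-in for real markup parsing, and the "guess" step is reduced to trying UTF-8 and falling back to Latin-1.

```python
import codecs
import re
from email.message import Message

# Crude patterns (assumptions for this sketch, not a real HTML/XML parser).
XML_DECL_RE = re.compile(
    rb"""^\s*<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._:-]+)["']""",
    re.IGNORECASE)
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE)


def _known(name):
    """Return Python's canonical codec name, or None if the label is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def charset_from_content_type(header):
    """Step 1: charset parameter of an HTTP Content-Type header value."""
    if not header:
        return None
    msg = Message()
    msg["Content-Type"] = header
    charset = msg.get_content_charset()
    return _known(charset) if charset else None


def charset_from_markup(data, scan_limit=4096):
    """Step 2: XML declaration or HTML meta tag near the start of the bytes."""
    head = data[:scan_limit]
    for pattern in (XML_DECL_RE, META_CHARSET_RE):
        match = pattern.search(head)
        if match:
            name = _known(match.group(1).decode("ascii"))
            if name:
                return name
    return None


def guess_charset(data):
    """Step 3: placeholder guess; accept UTF-8 if it decodes, else Latin-1."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def determine_encoding(data, content_type=None):
    """Apply the three steps in order; return (encoding, source)."""
    name = charset_from_content_type(content_type)
    if name:
        return name, "http"
    name = charset_from_markup(data)
    if name:
        return name, "markup"
    return guess_charset(data), "guess"
```

For a local file there is no HTTP header, so `determine_encoding(data)` goes straight to the markup check, which is the case described in the original report. A real implementation would also need to decide what to do when the declared encoding fails to decode the document.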
Still present in 3.2.
It's unlikely that this feature is ever going to be implemented, so I am closing the bug.