Bug 163999 - Determine the encoding from the HTML headers
Status: RESOLVED WONTFIX
Product: gtksourceview
Classification: Platform
Component: File loading and saving
Version: unspecified
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: GTK Sourceview maintainers
Reported: 2005-01-13 21:49 UTC by Josh Lee
Modified: 2018-05-26 14:02 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement



Description Josh Lee 2005-01-13 21:49:54 UTC
When opening an HTML document (from a file, not from the web), gedit should see
if there's a meta tag that describes what character encoding it is. If found,
that encoding should be used instead of auto-detecting.

1. Save the web page at http://www.mechon-mamre.org/i/t/t0101.htm
2. Open it in gedit.
3. You see gibberish Western characters instead of Hebrew, even though the page
explicitly declares which encoding it uses.
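The request above could be sketched roughly as follows. This is an illustrative Python sketch, not gedit or GtkSourceView code; the function name `sniff_meta_charset` and the 4096-byte scan limit are assumptions made for the example. It scans the start of the raw bytes for a charset declared in a meta tag and returns it only if it names a codec Python knows.

```python
import codecs
import re

# Matches both <meta charset="..."> and the older form
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(data, limit=4096):
    """Return the codec name declared in a meta tag, or None.

    Hypothetical helper: only the first `limit` bytes are scanned, on the
    assumption that the declaration sits near the top of the document.
    """
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    label = match.group(1).decode("ascii")
    try:
        # Normalize aliases and reject labels Python has no codec for.
        return codecs.lookup(label).name
    except LookupError:
        return None
```

A caller would decode the file with the returned codec when one is found and fall back to auto-detection when the function returns None.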
Comment 1 Paolo Maggi 2005-01-14 09:57:08 UTC
Yes, it is a nice idea, even if we will have to handle HTML documents (and XML
too, I think, right?) as special cases.
Why do you say "not from the web"? Why is that different?

If you run "gedit http://www.mechon-mamre.org/i/t/t0101.htm", you will see the
same problem you see opening a local copy of the same HTML page.

I think there should be code in Bluefish and/or in screem we could try to use
for this operation.
Comment 2 Josh Lee 2005-01-14 14:32:17 UTC
Oops, I hadn't actually tried running gedit with the url.
In that case, let's add part two of this bug: gedit should also obey the
Content-Type header when it provides a charset. In this case the server does not
set a charset in the header, but the page does provide a meta tag, so we should use that.

So the behaviour should be:
1. Look at the HTTP headers.
2. Look at the HTML or XML headers.
3. Guess.
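The three-step order above could be sketched like this. It is a Python illustration under stated assumptions, not gedit's implementation: `choose_encoding`, the helper names, the 4096-byte scan limit, and the candidate list used for the "guess" step are all hypothetical. The Content-Type parsing relies on the standard library's `email.message.Message.get_content_charset`.

```python
import codecs
import re
from email.message import Message

# In-document declarations: an XML declaration or an HTML meta tag.
_DECLARED = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([\w.:-]+)"
    rb"|<meta[^>]*?charset\s*=\s*[\"']?\s*([\w.:-]+)",
    re.IGNORECASE,
)


def _normalize(label):
    """Map a charset label to a Python codec name, or None if unknown."""
    if not label:
        return None
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def guess(data, candidates=("utf-8", "iso-8859-15")):
    """Stand-in for auto-detection: first candidate that decodes cleanly."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return codecs.lookup(name).name
    return None


def choose_encoding(data, content_type=None):
    # 1. Look at the HTTP headers.
    if content_type:
        name = _normalize(charset_from_content_type(content_type))
        if name:
            return name
    # 2. Look at the HTML or XML headers.
    match = _DECLARED.search(data[:4096])
    if match:
        label = (match.group(1) or match.group(2)).decode("ascii")
        name = _normalize(label)
        if name:
            return name
    # 3. Guess.
    return guess(data)
```

Each step only wins if it yields a codec that can actually be looked up, so a missing or unrecognized charset falls through to the next step instead of failing.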

References:
http://www.w3.org/TR/html4/charset.html#h-5.2.2
http://www.ietf.org/rfc/rfc2616.txt
Comment 3 André Klapper 2012-08-01 12:16:41 UTC
Still present in 3.2.
Comment 4 Sébastien Wilmet 2018-05-26 14:02:14 UTC
It's unlikely that this feature will ever be implemented, so I am closing this bug.