Bug 163999 – Determine the encoding from the HTML headers




Summary:	Determine the encoding from the HTML headers


Status:	RESOLVED WONTFIX

Product:	gtksourceview
Classification:	Platform
Component:	File loading and saving
Version:	unspecified
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	GTK Sourceview maintainers
QA Contact:	GTK Sourceview maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2005-01-13 21:49 UTC by Josh Lee
Modified:	2018-05-26 14:02 UTC

See Also:
GNOME target:	---
GNOME version:	Unversioned Enhancement

Description Josh Lee 2005-01-13 21:49:54 UTC

When opening an HTML document (from a file, not from the web), gedit should see
if there's a meta tag that describes what character encoding it is. If found,
that encoding should be used instead of auto-detecting.
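For illustration only, the requested check could look roughly like the sketch below. It is not gedit or gtksourceview code; the names `META_CHARSET` and `sniff_meta_charset` and the 4096-byte scan limit are assumptions made for this example. It scans the start of the raw bytes for a charset declared in a meta tag and maps the label to a Python codec name.

```python
import codecs
import re

# Matches both <meta charset="..."> and the HTML 4 form
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(data: bytes, limit: int = 4096):
    """Return the codec name declared in a meta tag, or None if none is found."""
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    label = match.group(1).decode("ascii")
    try:
        # Normalize the label to the name Python uses for that codec.
        return codecs.lookup(label).name
    except LookupError:
        # Unknown label: let the caller fall back to auto-detection.
        return None
```

A caller would try `sniff_meta_charset(raw_bytes)` first and only run auto-detection when it returns None.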

1. Save the web page at http://www.mechon-mamre.org/i/t/t0101.htm
2. Open it in gedit.
3. You see gibberish Western characters instead of Hebrew, even though the page
explicitly declares which encoding it uses.

Comment 1 Paolo Maggi 2005-01-14 09:57:08 UTC

Yes, it is a nice idea, even if we will have to handle HTML documents (and XML too,
I think, right?) as special cases.
Why do you say "not from the web"? Why is it different?

If you run "gedit http://www.mechon-mamre.org/i/t/t0101.htm", you will see the
same problem you see opening a local copy of the same HTML page.

I think there should be code in Bluefish and/or in screem we could try to use
for this operation.

Comment 2 Josh Lee 2005-01-14 14:32:17 UTC

Oops, I hadn't actually tried running gedit with the URL.
In that case, let's add part two of this bug: gedit should also obey the
Content-Type header when it provides a charset. In this case the server does not
set a charset in that header, but the page does provide a meta tag, so we should use that.

So the behaviour should be:
1. Look at the HTTP headers.
2. Look at the HTML or XML headers.
3. Guess.
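The order above can be sketched as follows. This is a minimal illustration, not code from gedit: `charset_from_content_type` and `choose_encoding` are hypothetical names, the in-document declaration (meta tag or XML declaration) is assumed to have been extracted already, and `guess` stands in for whatever auto-detection the editor uses.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


def choose_encoding(content_type, declared_in_document, guess):
    """Apply the order: HTTP header, then in-document declaration, then guess."""
    return (
        charset_from_content_type(content_type)
        or declared_in_document
        or guess()
    )
```

For a local file there is no HTTP header, so `content_type` would be None and the in-document declaration is consulted first.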

References:
http://www.w3.org/TR/html4/charset.html#h-5.2.2
http://www.ietf.org/rfc/rfc2616.txt

Comment 3 André Klapper 2012-08-01 12:16:41 UTC

Still in 3.2

Comment 4 Sébastien Wilmet 2018-05-26 14:02:14 UTC

It's unlikely that this feature will ever be implemented, so I'm closing the bug.