Bug 395050 - autodetection of the CSS file's encoding fails
Status: RESOLVED WONTFIX
Product: libcroco
Classification: Core
Component: General
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: libcroco maintainers
libcroco maintainers
gnome[unmaintained]
Depends on:
Blocks:
 
 
Reported: 2007-01-10 16:05 UTC by Dominic Lachowicz
Modified: 2020-08-11 15:46 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dominic Lachowicz 2007-01-10 16:05:09 UTC
From http://mail.gnome.org/archives/libcroco-list/2006-November/msg00001.html:

I was trying to use

    cr_om_parser_simply_parse_file ((const guchar *) css_filename, CR_AUTO,
                                    &css_file_contents)

(hoping for autodetection of the CSS file's encoding), but it always returns
an error code. The reason is that
    cr_om_parser_simply_parse_file calls
    cr_parser_parse_file calls
    cr_tknzr_new_from_uri calls
    cr_input_new_from_uri calls
    cr_input_new_from_buf calls
    cr_enc_handler_get_instance
which doesn't know about the encoding!
Comment 1 Benjamin Dauvergne 2007-01-10 16:45:56 UTC
I think that we shouldn't try to implement charset detection inside libcroco.
Handling the @charset rule is OK, since that is not really detection, but that is all.

Even Mozilla sometimes has difficulty telling a Latin-1 page from a UTF-8 page (or maybe it was the content-encoding header from the web server that was wrong, I don't know).

I'm against the CR_AUTO flag and advocate some CR_DEFAULT, formally equivalent to CR_UTF_8.
Comment 2 Dominic Lachowicz 2007-01-10 16:51:07 UTC
I'm not sure that libcroco should have to deal with encodings *at all*. Libcroco should assume that its input comes in some well-defined encoding (most likely UTF-8), and put the burden of determining the file's encoding on a higher level, which might have some knowledge of the document's encoding.

For instance, it is possible that the CSS snippet is inside a UTF-8 encoded XML document, or that an HTTP header says that it is iso-8859-1. The responsibility should be on the invoking application to convert the CSS into libcroco's expected encoding.
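
To make the idea concrete, here is a minimal sketch of what "convert the CSS into libcroco's expected encoding" could look like on the application side, assuming the application already knows the input is iso-8859-1 and that the expected encoding is UTF-8. The function name latin1_to_utf8 is hypothetical and is not part of libcroco; a real application would more likely use iconv or g_convert for arbitrary encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical application-side helper (NOT a libcroco function).
 * Converts an ISO-8859-1 buffer into a newly allocated, NUL-terminated
 * UTF-8 string. The caller owns the result and must free() it.
 * If out_len is non-NULL it receives the UTF-8 length in bytes,
 * not counting the terminating NUL.
 */
char *
latin1_to_utf8 (const unsigned char *src, size_t len, size_t *out_len)
{
    char *dst;
    size_t i, j = 0;

    assert (src != NULL || len == 0);

    /* Worst case: every input byte expands to two UTF-8 bytes. */
    dst = malloc (2 * len + 1);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII maps to itself. */
            dst[j++] = (char) c;
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            dst[j++] = (char) (0xC0 | (c >> 6));
            dst[j++] = (char) (0x80 | (c & 0x3F));
        }
    }

    dst[j] = '\0';
    if (out_len != NULL)
        *out_len = j;
    return dst;
}
```

The converted buffer could then be handed to the parser with an explicit UTF-8 encoding argument instead of CR_AUTO, so libcroco never has to guess.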
Comment 3 Benjamin Dauvergne 2007-01-10 17:08:59 UTC
And what do you do about the @charset rule inside the stylesheet?
Are you asking the UA to parse a little bit of CSS to handle this?
Comment 4 Dominic Lachowicz 2007-01-10 17:20:19 UTC
That's a good point. However, if we were to follow my suggestion, we might be able to get away with pushing the responsibility to the user agent:

http://www.w3.org/International/questions/qa-css-charset

"Only one @charset rule may appear in an external style sheet and it must appear at the very start of the document. It must not be preceded by any characters, not even comments [other than byte-order markers]."

If I understand the specification correctly, the CSS 2.1 spec seems to delegate almost *all* of the responsibility of determining the character encoding of a CSS snippet to the user-agent. Of their list of 5 priorities, libcroco simply cannot know about #1, #3, or #4. #5 is a fall-back if nothing else is known. So that just leaves #2 as nebulous. Furthermore, "User agents must ignore style sheets in unknown encodings."

http://www.w3.org/TR/CSS21/syndata.html#q23
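
Because the @charset rule must sit at the very start of the sheet, preceded by nothing but a byte-order mark, a user agent would not need a real CSS parser to read it. The following is only a rough sketch of that check, under simplifying assumptions: it recognises only a UTF-8 byte-order mark, expects the exact byte sequence `@charset "` followed by the name and `";`, and ignores UTF-16/UTF-32 input entirely. The function name sniff_css_charset is hypothetical and is not part of libcroco.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-agent helper (NOT a libcroco function).
 * Looks for a leading   @charset "NAME";   rule, optionally preceded
 * by a UTF-8 byte-order mark, and copies NAME into `name`.
 * Returns 1 if a rule was found and fits in the buffer, 0 otherwise.
 */
int
sniff_css_charset (const unsigned char *buf, size_t len,
                   char *name, size_t name_size)
{
    static const unsigned char utf8_bom[] = { 0xEF, 0xBB, 0xBF };
    static const char prefix[] = "@charset \"";
    const size_t prefix_len = sizeof prefix - 1;
    const unsigned char *end;
    size_t n;

    assert (buf != NULL || len == 0);
    assert (name != NULL && name_size > 0);
    name[0] = '\0';

    /* A byte-order mark is the only thing allowed before the rule. */
    if (len >= sizeof utf8_bom && memcmp (buf, utf8_bom, sizeof utf8_bom) == 0) {
        buf += sizeof utf8_bom;
        len -= sizeof utf8_bom;
    }

    /* The rule must start at the very first character. */
    if (len < prefix_len || memcmp (buf, prefix, prefix_len) != 0)
        return 0;
    buf += prefix_len;
    len -= prefix_len;

    /* The encoding name runs up to the closing quote... */
    end = memchr (buf, '"', len);
    if (end == NULL)
        return 0;
    n = (size_t) (end - buf);

    /* ...which must be immediately followed by a semicolon. */
    if (n + 1 >= len || end[1] != ';')
        return 0;

    /* Reject empty names and names that do not fit the output buffer. */
    if (n == 0 || n >= name_size)
        return 0;

    memcpy (name, buf, n);
    name[n] = '\0';
    return 1;
}
```

A caller would treat a return value of 0 as "no usable @charset rule" and fall through to the next source of encoding information in the CSS 2.1 priority list.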
Comment 5 André Klapper 2020-08-11 15:46:34 UTC
libcroco is not under development anymore. Its codebase has been archived.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect
reality. Please feel free to reopen this ticket (or rather transfer the project
to GNOME GitLab, as GNOME Bugzilla is being shut down) if anyone takes the
responsibility for active development again.