Bug 162613 - UTF-8 BOM not recognised with push parser
Status: RESOLVED FIXED
Product: libxml
Classification: Deprecated
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: Daniel Veillard
Depends on:
Blocks:
Reported: 2004-12-31 12:54 UTC by jorton
Modified: 2005-02-11 14:36 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description jorton 2004-12-31 12:54:36 UTC
When using a push parser, a document which begins with the UTF-8 BOM cannot be
parsed; the parser reports the "Document is empty" error.

ctxt->charset seems to get initialized to XML_CHAR_ENCODING_UTF8 by the call to
xmlCreatePushParserCtxt, when no initial chunk is provided.

The code that auto-detects the encoding, i.e. in xmlParseTryOrFinish, is hence
never reached:

            case XML_PARSER_START:
		if (ctxt->charset == XML_CHAR_ENCODING_NONE) {
...
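
The effect of that guard can be illustrated with a toy model. This is only a sketch and not libxml2 code: the toy_ names are hypothetical stand-ins, and it assumes that a first byte other than '<' is what ends up reported as "Document is empty".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the libxml2 charset constants named above. */
enum toy_charset { TOY_ENCODING_NONE, TOY_ENCODING_UTF8 };

struct toy_ctxt {
    enum toy_charset charset;  /* value chosen when the context is created */
};

/*
 * Toy version of the XML_PARSER_START step: returns 0 when the buffer
 * looks like the start of a document, -1 for "Document is empty".
 */
static int toy_parse_start(struct toy_ctxt *ctxt,
                           const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    /* Auto-detection (including BOM handling) only runs for NONE. */
    if (ctxt->charset == TOY_ENCODING_NONE) {
        if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
            pos = 3;                        /* skip the UTF-8 BOM */
        ctxt->charset = TOY_ENCODING_UTF8;
    }

    /* Assumption: anything other than '<' here is treated as an error. */
    if (pos >= len || buf[pos] != '<')
        return -1;
    return 0;
}
```

With the charset preset to UTF-8 the BOM is never skipped and the toy parser fails; starting from NONE, the same bytes are accepted.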
Comment 1 jorton 2004-12-31 12:55:27 UTC
Using libxml2 2.6.16 (FC3 package libxml2-2.6.16-3).
Comment 2 Daniel Veillard 2004-12-31 13:08:22 UTC
Please provide an example. I assume xmllint --push --noout fails while
xmllint --noout on the same instance succeeds.

Daniel
Comment 3 jorton 2005-01-04 16:24:06 UTC
--push doesn't trigger it because it *does* pass an initial chunk to the
xmlCreatePushParserCtxt call; this bug only occurs when an initial chunk is not
provided.  Any document with a UTF-8 BOM is an example.

(printf '\xEF\xBB\xBF'; cat anyxml.xml) > utf8-bom.xml
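
For a test that feeds the push parser from memory instead of a file, the same input could be built in C. A minimal sketch, assuming a hypothetical helper name prepend_utf8_bom that is not part of libxml2:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated copy of src with the three-byte UTF-8 BOM
 * (EF BB BF) prepended, or NULL on allocation failure. The caller frees
 * the result. prepend_utf8_bom is a hypothetical helper name.
 */
static unsigned char *prepend_utf8_bom(const unsigned char *src, size_t len,
                                       size_t *out_len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char *out = malloc(len + sizeof bom);

    if (out == NULL)
        return NULL;
    memcpy(out, bom, sizeof bom);           /* BOM first */
    if (len > 0)
        memcpy(out + sizeof bom, src, len); /* then the document bytes */
    *out_len = len + sizeof bom;
    return out;
}
```

The resulting buffer can then be passed to the parser in chunks after creating the context without an initial chunk, which is the case that triggers the bug.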
Comment 4 William M. Brack 2005-02-11 14:36:02 UTC
I changed xmlCreatePushParserCtxt so that, if no initial chunk is given,
ctxt->charset is set to XML_CHAR_ENCODING_NONE (instead of the previous
XML_CHAR_ENCODING_UTF8, automatically provided by xmlNewParserCtxt).  After this
change, an additional change to xmlParseTryOrFinish was required to properly
take care of this case.  The changed code (parser.c) is in CVS.
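
The creation-time part of this change can be sketched as a toy model. This is not the code committed to parser.c: the toy_ names are hypothetical, and treating a zero size as "no initial chunk" is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the libxml2 charset constants. */
enum toy_charset { TOY_ENCODING_NONE, TOY_ENCODING_UTF8 };

struct toy_ctxt {
    enum toy_charset charset;
};

/* Stand-in for the UTF-8 default that xmlNewParserCtxt provides. */
static void toy_new_ctxt(struct toy_ctxt *ctxt)
{
    ctxt->charset = TOY_ENCODING_UTF8;
}

/*
 * Toy version of the creation step after the change: without an initial
 * chunk, the charset is left undetermined so that auto-detection can run
 * on the first data pushed later.
 */
static void toy_create_push_ctxt(struct toy_ctxt *ctxt,
                                 const char *chunk, int size)
{
    toy_new_ctxt(ctxt);
    /* Assumption: a NULL chunk or a zero size both mean "no chunk". */
    if (chunk == NULL || size == 0)
        ctxt->charset = TOY_ENCODING_NONE;
}
```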

Thanks for the report.

Bill