Bug 692915 - libxml2 v2.9.0 emits spurious errors when dealing with large CDATA sections in UTF-16 input
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Mac OS
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Reported: 2013-01-31 03:05 UTC by Mark Rowe
Modified: 2013-02-15 22:13 UTC

Attachments
Test case (283.91 KB, text/plain)
2013-01-31 03:05 UTC, Mark Rowe

Description Mark Rowe 2013-01-31 03:05:18 UTC
Created attachment 234884
Test case

I've been testing WebKit against libxml2 v2.9.0 recently and have discovered a regression in handling of UTF-16 input. When xmlParseChunk is passed a large block of data there appear to be some conditions in which libxml2 will only process a portion of the input, and will raise an error due to concluding that the input ended prematurely. When this happens there are bytes in the raw input buffer that have yet to be decoded.

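The attached program can be compiled with a command line like the following (the library is listed after the source file so that linkers which resolve symbols in order also work):
cc -g -I/usr/include/libxml2 -o libxml2-decoding-bug libxml2-decoding-bug.c -lxml2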

It can be run without arguments.

The incorrect output looks like so:
> Character count: 271751
> startElement: root
> CDATA block of length 300
> xmlParseChunk result: 0
> CDATA block of length 300
> Extra content at the end of the document
> xmlParseChunk result: 5

The expected output is:
> Character count: 271751
> startElement: root
> CDATA block of length 271700
> xmlParseChunk result: 0
> xmlParseChunk result: 0

The bug appears to have been introduced in the range 65c7d3b..a78d803, which was the implementation of the new input buffers.