Bug 633166 - XInclude large text: invalid character
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Reported: 2010-10-26 10:04 UTC by pu.medvidek
Modified: 2012-08-17 15:03 UTC
Attachments
main.xml and xincluded text.txt that demonstrate the bug (1.63 KB, application/x-gzip)
2010-10-26 10:04 UTC, pu.medvidek
Testcase for xinclude parse text (626 bytes, application/zip)
2012-05-18 11:21 UTC, Vitaly Ostanin
Patch (1.15 KB, patch)
2012-05-20 11:49 UTC, Vitaly Ostanin
Actual patch (1.19 KB, patch)
2012-05-26 18:45 UTC, Vitaly Ostanin

Description pu.medvidek 2010-10-26 10:04:58 UTC
Created attachment 173245
main.xml and xincluded text.txt that demonstrate the bug

The bug appears when xincluding a particular text file with <xi:include href="text.txt" parse="text"/>. I have reduced the text file as far as I could; the simplest failing file is attached. It contains only:
- 'x' characters (replacing the original alphanumeric and special characters)
- 's' characters (replacing the original whitespace)
- newline characters
- accented characters in UTF-8 encoding (ěščřžý...)

The bug also appears when the file is included with an explicit encoding: <xi:include href="text.txt" parse="text" encoding="utf-8"/>.

I find the bug mysterious, because the text file is correctly included when text.txt is modified in ANY of the following ways:
- the very first character is removed
- the first line is removed (it contains no special characters)
- the first special character is removed
- the first two special characters are removed
- the first line is moved to the end of file (including the newline char)

It seems that the size of the included file matters...

I'm testing xinclusion with:

  xmllint --xinclude main.xml

I'm using libxml2 v. 2.7.7
Comment 1 Vitaly Ostanin 2012-05-18 11:21:53 UTC
Created attachment 214302
Testcase for xinclude parse text
Comment 2 Vitaly Ostanin 2012-05-18 11:22:40 UTC
I can confirm the bug; it is reproducible.

$ xmllint --xinclude test.xml
test.xml:5: element include: XInclude error : test.txt contains invalid char
test.xml:5: element include: XInclude error : could not load test.txt, and no fallback was found
<?xml version="1.0" encoding="utf-8"?>
<test xmlns:xi="http://www.w3.org/2001/XInclude">

        <xi:include href="test.txt" parse="text"/>
</test>

Testcase attached.

xmllint: using libxml version 20708
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
Comment 3 Vitaly Ostanin 2012-05-20 09:15:10 UTC
This is a bug in xinclude.c, xmlXIncludeLoadTxt().

The problem occurs when a multibyte character crosses the boundary of the internal buffer (4000 bytes). The current buffer then ends with an incomplete character, so the IS_CHAR test fails on it.
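
The failure mode described above can be sketched in isolation. The following C fragment is not libxml2 code: utf8_seq_len, chunk_is_valid and validate_naive are hypothetical helpers, and the check is only structural (lead byte plus continuation bytes), not a full IS_CHAR test. It validates the input one fixed-size chunk at a time, so a multibyte character split by a chunk boundary is reported as invalid even though the input as a whole is well-formed.

```c
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its lead byte, or 0 if the byte
   cannot start a sequence. Hypothetical helper, not a libxml2 function. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0; /* continuation byte or invalid lead */
}

/* Returns 1 if buf[0..len) holds only complete UTF-8 sequences, else 0.
   Structural check only: overlong forms and surrogates are not rejected. */
static int chunk_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        size_t k;
        if (n == 0 || i + n > len)
            return 0; /* bad lead byte, or sequence cut off by the chunk end */
        for (k = 1; k < n; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;
        i += n;
    }
    return 1;
}

/* Naive approach: every chunk is validated on its own, with no memory of
   bytes left over from the previous chunk. chunk must be greater than 0. */
static int validate_naive(const unsigned char *data, size_t len, size_t chunk)
{
    size_t off;
    if (chunk == 0)
        return 0;
    for (off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (!chunk_is_valid(data + off, n))
            return 0;
    }
    return 1;
}
```

With a 4-byte chunk and the 5-byte input "xxx" followed by ě (0xC4 0x9B), validate_naive returns 0 because the chunk ends on the lead byte 0xC4. Dropping one leading 'x' moves the character fully inside the chunk and the result becomes 1, which is consistent with the report that removing the first character of text.txt hides the bug.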
Comment 4 Vitaly Ostanin 2012-05-20 11:49:43 UTC
Created attachment 214484
Patch

Added a fallback for a multibyte character at the buffer boundary: if IS_CHAR fails at the current position and that position is within 4 bytes of the end of the buffer, the buffer read is restarted.
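
For illustration only, the general idea of not rejecting an incomplete character at the end of a buffer can be sketched as follows. This is not the attached patch and does not use libxml2's buffer API: utf8_carry, feed_chunk and validate_carry are hypothetical names, and the approach shown (saving the trailing bytes and completing them from the next chunk) is a simplified stand-in for restarting the buffer read. The check is structural only, not a full IS_CHAR test.

```c
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its lead byte, or 0 if the byte
   cannot start a sequence. Hypothetical helper, not a libxml2 function. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Bytes of a sequence that was cut off by the end of the previous chunk. */
typedef struct {
    unsigned char pending[4];
    size_t npending;
} utf8_carry;

/* Validate one chunk. Returns 0 on a malformed byte, 1 otherwise. A
   sequence truncated by the chunk end is stored in st, not rejected. */
static int feed_chunk(utf8_carry *st, const unsigned char *buf, size_t len)
{
    size_t i = 0, k;

    /* Finish the sequence carried over from the previous chunk, if any. */
    while (st->npending > 0 && i < len) {
        size_t need = utf8_seq_len(st->pending[0]);
        if ((buf[i] & 0xC0) != 0x80)
            return 0;
        st->pending[st->npending++] = buf[i++];
        if (st->npending == need)
            st->npending = 0;
    }

    while (i < len) {
        size_t n = utf8_seq_len(buf[i]);
        size_t avail = (i + n > len) ? len - i : n;
        if (n == 0)
            return 0;
        for (k = 1; k < avail; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;
        if (avail < n) {
            /* Fewer than 4 bytes remain and the character is incomplete:
               keep them for the next chunk. */
            for (k = 0; k < avail; k++)
                st->pending[k] = buf[i + k];
            st->npending = avail;
            return 1;
        }
        i += n;
    }
    return 1;
}

/* Validate data in chunks of the given size (must be greater than 0). */
static int validate_carry(const unsigned char *data, size_t len, size_t chunk)
{
    utf8_carry st = { {0, 0, 0, 0}, 0 };
    size_t off;
    if (chunk == 0)
        return 0;
    for (off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        if (!feed_chunk(&st, data + off, n))
            return 0;
    }
    return st.npending == 0; /* input ending mid-character is invalid */
}
```

In this sketch the result no longer depends on where the chunk boundary falls: the same well-formed input passes with any chunk size, while input that really ends in the middle of a character is still rejected.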
Comment 5 Vitaly Ostanin 2012-05-26 18:45:13 UTC
Created attachment 215055
Actual patch

Attached a fixed version of the patch (it no longer duplicates buffer content). Thanks to Alexey Ponomarev <ponomarev@yandex-team.ru> for his comments.
Comment 6 Daniel Veillard 2012-08-17 15:03:27 UTC
Patch applied, thanks a lot, Vitaly!

http://git.gnome.org/browse/libxml2/commit/?id=dce1c8baaeaa4f23874c59da91d9ecc0e31a787c

Daniel