Bug 694982 - 2.9 fails to encode huge buffers completely thus truncating output xml
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2013-03-02 07:26 UTC by Mikhail Titov
Modified: 2013-03-27 03:02 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch (869 bytes, patch)
2013-03-03 16:46 UTC, Mikhail Titov
Reproducible example (766 bytes, text/plain)
2013-03-03 20:35 UTC, Mikhail Titov

Description Mikhail Titov 2013-03-02 07:26:42 UTC
libxml2 fails to read and save certain SVG files with embedded Base64 images; see https://bugs.launchpad.net/ubuntu/+source/inkscape/+bug/1130225

The test code is at http://pastebin.com/3WAZj62N and a sample SVG is available at http://www.sendspace.com/file/doyain .

I failed to create a standalone reproducible example, though I managed to roughly narrow down the issue. The bug was introduced somewhere between these two commits:

6f6feba876eeff3a75fc10cdc2f414cc66204dde 2012-07-25	Fixup for buf.c
baaf03f80f817bb34c421421e6cb4d68c353ac9a 2012-07-20	Fix an error in previous commit
Comment 1 Mikhail Titov 2013-03-03 06:23:28 UTC
The file is read correctly: the length and the ending of xlink:href look reasonable. Something goes wrong when either saving the doc to a file or dumping it to a memory buffer.
Comment 2 Mikhail Titov 2013-03-03 09:26:33 UTC
xmlCharEncOutput converts only 64 * 1024 bytes at a time. Either it or xmlOutputBufferFlush should make sure that the entire buffer is converted, flushed, and written before everything is closed. Otherwise the rest of a huge buffer simply gets lost.
Comment 3 Mikhail Titov 2013-03-03 16:46:07 UTC
Created attachment 237881
Proposed patch

This patch resolves the issue for me: no more crashes from Inkscape plugins.
Comment 4 Mikhail Titov 2013-03-03 17:31:17 UTC
Perhaps it was not the right place to patch. If I create an XML document from scratch in memory, then dump, read, and write it again, I see

I/O: wrote 6740848 chars

whereas with the sample SVG it always reports 65536, though it should be far more.
Comment 5 Mikhail Titov 2013-03-03 20:35:57 UTC
Created attachment 237899
Reproducible example

gcc -Wall -ggdb -I/opt/libxml2/include/libxml2 -L/opt/libxml2/lib tst2.c -lxml2 -o tst2

mlt@nb:/opt/libxml2$ LD_LIBRARY_PATH=/opt/libxml2/lib /opt/libxml2/tst2 2>111 && xmllint -noout /tmp/bad.xml
/tmp/bad.xml:3: parser error : AttValue: ' expected
 blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
                                                                               ^
/tmp/bad.xml:3: parser error : Premature end of data in tag root line 2
 blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
                                                                               ^
Comment 6 Daniel Veillard 2013-03-04 06:57:45 UTC
I have made fixes in this area upstream.
Could you check again with the git head of libxml2?
I'm not sure I really want to change that part of the buffer handling code
of libxml2 in the way suggested, since I made other fixes in git which could
actually resolve the issue, but at a different level. Would you
mind checking? http://git.gnome.org/browse/libxml2/

   thanks,

Daniel
Comment 7 Mikhail Titov 2013-03-04 17:10:36 UTC
I don't see many new commits compared to yesterday. Did you forget to push changes?

I guess the 65536 limit for encoding is meant to conserve memory. I think it is not that bad to encode the rest while flushing. This way, if a single huge chunk is followed by numerous small buffer writes, it will probably all be encoded before flushing. I guess I just came across a rare case.

mlt@nb:~/workspace/libxml2(master)$ git log -1
commit a09890684c71fcfcc13c2c4c8731cbb6e9d923f8
Author: Gilles Espinasse <g.esp@free.fr>
Date:   Mon Mar 4 22:46:21 2013 +0800

    Fix configure cannot remove messages
    
    this is the other way to solve ./configure cannot remove messages by
    simply removing rm detection in configure.in
    
    There is already a raw 'rm -f' at the end on configure.in

mlt@nb:~/workspace/libxml2(master)$ LD_LIBRARY_PATH=/opt/libxml2/lib /opt/libxml2/tst2 && xmllint -noout /tmp/bad.xml 
/tmp/bad.xml:3: parser error : AttValue: ' expected
 blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
                                                                               ^
/tmp/bad.xml:3: parser error : Premature end of data in tag root line 2
 blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
                                                                               ^
Comment 8 Daniel Veillard 2013-03-06 09:40:36 UTC
Okay, I will try to find the problem :-)
Comment 9 Daniel Veillard 2013-03-27 03:02:45 UTC
Okay, I looked closer and 100% agree with the patch: it's a Flush operation
and hence everything must be converted!

I have had trouble with buffer conversion in normal operations, hence my
initial reply. Both are somehow related; I think the default lower-level
operation changed semantics slightly.

  Thanks a lot, committed and pushed with a small comment fix!
https://git.gnome.org/browse/libxml2/commit/?id=8e2098aef7d119ee95228564174d6a87d4183f4a

Daniel