GNOME Bugzilla – Bug 694982
2.9 fails to encode huge buffers completely thus truncating output xml
Last modified: 2013-03-27 03:02:45 UTC
libxml2 fails to read & save certain SVG files with embedded BASE64 images: https://bugs.launchpad.net/ubuntu/+source/inkscape/+bug/1130225
Test code: http://pastebin.com/3WAZj62N
Sample SVG: http://www.sendspace.com/file/doyain
I failed to create a standalone reproducible example, though I managed to roughly narrow down the issue. The bug was introduced somewhere between
6f6feba876eeff3a75fc10cdc2f414cc66204dde 2012-07-25 Fixup for buf.c
baaf03f80f817bb34c421421e6cb4d68c353ac9a 2012-07-20 Fix an error in previous commit
The file is read correctly. The length and the ending of xlink:href look reasonable. Something weird happens when either saving the doc to a file or dumping it to a memory buffer.
xmlCharEncOutput converts only 64 * 1024 bytes at a time. Either it or xmlOutputBufferFlush should make sure that the entire buffer is converted, flushed, and written before closing everything. Otherwise the remaining content of a huge buffer simply gets lost.
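To make the failure mode concrete, here is a minimal standalone sketch of a per-call conversion cap and the two ways a flush can treat it. It is not libxml2 code: `OutBuf`, `convert_some`, `flush_once`, `flush_all`, and `demo_written` are hypothetical names, and the "encoding" is a plain byte copy. Only the 64 * 1024 cap is taken from the comment above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the behaviour described in this report;
 * none of these names exist in libxml2. */
#define CHUNK_LIMIT (64 * 1024)  /* per-call conversion cap from the report */

typedef struct {
    const char *in;   /* unconverted input */
    size_t in_left;   /* bytes of input still pending */
    char *out;        /* destination for converted bytes */
    size_t out_used;  /* bytes written to out so far */
} OutBuf;

/* Convert at most CHUNK_LIMIT bytes per call (identity copy stands in
 * for a real encoder). Returns the number of bytes converted. */
size_t convert_some(OutBuf *b)
{
    size_t n = b->in_left < CHUNK_LIMIT ? b->in_left : CHUNK_LIMIT;
    memcpy(b->out + b->out_used, b->in, n);
    b->in += n;
    b->in_left -= n;
    b->out_used += n;
    return n;
}

/* Single-pass flush: anything past the cap stays unconverted and is
 * lost if the buffer is closed right afterwards. */
size_t flush_once(OutBuf *b)
{
    convert_some(b);
    return b->out_used;
}

/* Draining flush: keep converting until no input remains. */
size_t flush_all(OutBuf *b)
{
    while (b->in_left > 0) {
        if (convert_some(b) == 0)
            break;  /* defensive: stop if no progress is made */
    }
    return b->out_used;
}

/* Helper for experiments: returns how many of `total` bytes reach the
 * output with the single-pass (drain == 0) or draining (drain != 0) flush. */
size_t demo_written(size_t total, int drain)
{
    char *src = malloc(total ? total : 1);
    char *dst = malloc(total ? total : 1);
    size_t written = 0;

    if (src && dst) {
        memset(src, 'x', total);
        OutBuf b = { src, total, dst, 0 };
        written = drain ? flush_all(&b) : flush_once(&b);
    }
    free(src);
    free(dst);
    return written;
}
```

In this model, `demo_written(200000, 0)` reports 65536 bytes while `demo_written(200000, 1)` reports all 200000, which matches the shape of the truncation seen with the sample SVG. Whether the loop belongs in the converter or in the flush routine is a design question for the real code; the sketch only shows that some caller has to loop.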
Created attachment 237881
Proposed patch

This patch resolves the issue for me. No crashes from Inkscape plugins.
Perhaps it was not the right place to patch. If I create an XML document from scratch in memory, then dump, read, and write it again, I see "I/O: wrote 6740848 chars", whereas with the sample SVG it always says 65536, though it should be far more.
Created attachment 237899
Reproducible example

gcc -Wall -ggdb -I/opt/libxml2/include/libxml2 -L/opt/libxml2/lib tst2.c -lxml2 -o tst2
mlt@nb:/opt/libxml2$ LD_LIBRARY_PATH=/opt/libxml2/lib /opt/libxml2/tst2 2>111 && xmllint -noout /tmp/bad.xml
/tmp/bad.xml:3: parser error : AttValue: ' expected
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
^
/tmp/bad.xml:3: parser error : Premature end of data in tag root line 2
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
^
I have made fixes in this area upstream. Could you check again with the git head of libxml2? I'm not sure I really want to change that part of the buffer handling code of libxml2 in the way suggested, since I made other fixes in git which could actually resolve the issue, but at a different level. Would you mind checking?
http://git.gnome.org/browse/libxml2/
thanks,
Daniel
I don't see many new commits compared to yesterday. Did you forget to push changes? I guess the 65536 limit for encoding is meant to conserve memory. I think it is not that bad to encode the rest while flushing. This way, if it is a single huge chunk followed by numerous small buffer writes, it will probably all be encoded before flushing. I guess I just came across a rare case.

mlt@nb:~/workspace/libxml2(master)$ git log -1
commit a09890684c71fcfcc13c2c4c8731cbb6e9d923f8
Author: Gilles Espinasse <g.esp@free.fr>
Date: Mon Mar 4 22:46:21 2013 +0800

    Fix configure cannot remove messages

    this is the other way to solve ./configure cannot remove messages by simply removing rm detection in configure.in
    There is already a raw 'rm -f' at the end on configure.in

mlt@nb:~/workspace/libxml2(master)$ LD_LIBRARY_PATH=/opt/libxml2/lib /opt/libxml2/tst2 && xmllint -noout /tmp/bad.xml
/tmp/bad.xml:3: parser error : AttValue: ' expected
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
^
/tmp/bad.xml:3: parser error : Premature end of data in tag root line 2
blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah
^
Okay, I will try to find the problem :-)
Okay, I looked closer and 100% agree with the patch: it's a Flush operation, and hence everything must be converted! I have had trouble with buffer conversion in normal operations, hence my initial reply. Both are somehow related; I think the default lower-level operation changed semantics slightly. Thanks a lot, committed and pushed with a small comment fix!
https://git.gnome.org/browse/libxml2/commit/?id=8e2098aef7d119ee95228564174d6a87d4183f4a
Daniel