GNOME Bugzilla – Bug 117702
[PATCH] xmlreader returns incomplete textnodes
Last modified: 2009-08-15 18:40:50 UTC
xmlreader sometimes returns a TEXT_NODE containing only the initial part of the real content. The rest of the text node is lost: it is not returned as another TEXT_NODE of the same parent ELEMENT, as it would be with SAX. The bug is specific to xmlreader, which is built on top of SAX; SAX itself is not affected. It only affects big text nodes: xmlreader has a buffer size of 512, so the text node must be larger than that. The problems occurred when parsing 8 KB text nodes.
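To make the symptom concrete, here is a minimal self-contained sketch in C. It is a toy model and does not use libxml2 or reproduce its internals: the names text_sink, take_first_chunk, append_chunk and deliver_in_chunks are hypothetical, and the 512-byte chunk size is only borrowed from the buffer size mentioned above. It contrasts a consumer that treats the first delivered piece of a long text run as the whole node (the behaviour reported here) with one that accumulates every piece.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 512   /* assumption: mirrors the 512-byte buffer in the report */
#define MAX_TEXT   16384 /* hypothetical capacity for this toy model */

/* Hypothetical accumulator standing in for one text node. */
struct text_sink {
    char   data[MAX_TEXT];
    size_t len;
    int    chunks_seen;
};

typedef void (*chunk_cb)(struct text_sink *, const char *, size_t);

/* Buggy consumer: keeps only the first chunk and silently drops the rest. */
void take_first_chunk(struct text_sink *s, const char *buf, size_t n)
{
    assert(n <= MAX_TEXT);
    if (s->chunks_seen == 0) {
        memcpy(s->data, buf, n);
        s->len = n;
    }
    s->chunks_seen++;
}

/* Correct consumer: appends every chunk to the same text node. */
void append_chunk(struct text_sink *s, const char *buf, size_t n)
{
    assert(s->len + n <= MAX_TEXT);
    memcpy(s->data + s->len, buf, n);
    s->len += n;
    s->chunks_seen++;
}

/* Toy "parser": hands the text to the callback in CHUNK_SIZE pieces. */
void deliver_in_chunks(const char *text, size_t total,
                       chunk_cb cb, struct text_sink *s)
{
    size_t off = 0;
    while (off < total) {
        size_t n = (total - off < CHUNK_SIZE) ? (total - off) : CHUNK_SIZE;
        cb(s, text + off, n);
        off += n;
    }
}
```

Feeding an 8 KB run of text through deliver_in_chunks with take_first_chunk leaves only the first 512 bytes in the sink, while append_chunk ends up with the complete text.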
Created attachment 18386: Full fix of this bug.
Created attachment 18387: Trivial debug dump to show the bug with testReader.
The patch seems a bit broken to me. Basically, as soon as you parse an element with a text node child, you would read the full subtree of the element into memory. I slightly changed the test to:

*** xmlreader.c 9 Jun 2003 09:10:33 -0000 1.49
--- xmlreader.c 18 Jul 2003 15:13:59 -0000
***************
*** 770,775 ****
--- 770,778 ----
  ((oldstate == XML_TEXTREADER_BACKTRACK) ||
  (reader->node->children == NULL) ||
  (reader->node->type == XML_ENTITY_REF_NODE) ||
+ ((reader->node->children != NULL) &&
+ (reader->node->children->type == XML_TEXT_NODE) &&
+ (reader->node->children->next == NULL)) ||
  (reader->node->type == XML_DTD_NODE) ||
  (reader->node->type == XML_DOCUMENT_NODE) ||
  (reader->node->type == XML_HTML_DOCUMENT_NODE)) &&

to be sure we stop if the parser started building a sibling node to the child text node.

Thanks, I hope this fixes it. I committed it to CVS.

Daniel
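The three added lines of the test can be read in isolation as a small predicate. The sketch below is a self-contained C illustration, not code from xmlreader.c: the node struct, the enum, and the function name only_child_is_open_text are hypothetical stand-ins for the libxml2 types. It assumes, based on the comment above, that the reader keeps pulling data while the condition holds and stops once a sibling appears after the text child.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for libxml2's node types. */
enum node_type { ELEMENT_NODE, TEXT_NODE };

struct node {
    enum node_type type;
    struct node   *children; /* first child, or NULL */
    struct node   *next;     /* next sibling, or NULL */
};

/*
 * Returns 1 when the node's only child so far is a text node with no
 * following sibling. In that state the text may still be growing, so a
 * reader would want more input before handing the node out. Once the
 * parser has started a sibling after the text child, this returns 0.
 */
int only_child_is_open_text(const struct node *n)
{
    assert(n != NULL);
    return n->children != NULL &&
           n->children->type == TEXT_NODE &&
           n->children->next == NULL;
}
```

An element with no children, or one whose text child already has a sibling, yields 0, so in this model the subtree is not read any further than needed to finish the text node.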
This should be closed by the release of libxml2-2.5.9. Thanks, Daniel