GNOME Bugzilla – Bug 159550
xmlNodeDump will never return
Last modified: 2009-08-15 18:40:50 UTC
If the xmlNodePtr tree passed to xmlNodeDump contains strings that are not valid UTF-8, xmlNodeDump never returns. xmlNodeDump should detect the invalid strings and fail with an error instead of recursing forever.
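The check requested here amounts to validating each string before it is serialized. The following is a minimal sketch in plain C with no libxml2 dependency. It assumes NUL-terminated input and the RFC 3629 rules: no overlong forms, no surrogates, nothing above U+10FFFF. `is_valid_utf8` is a hypothetical helper and not part of libxml2. libxml2 has its own checks, including xmlCheckUTF8() in xmlstring.h.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the NUL-terminated string s is
 * well-formed UTF-8 (RFC 3629), 0 otherwise. */
static int is_valid_utf8(const unsigned char *s)
{
    while (*s) {
        unsigned char c = *s;
        size_t len;
        unsigned long cp;

        if (c < 0x80) {                 /* plain ASCII byte */
            s++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                  /* stray continuation byte or 0xF8..0xFF */

        /* Each following byte must be 10xxxxxx; a NUL fails this test,
         * so we never read past the end of the string. */
        for (size_t i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i] & 0x3F);
        }

        /* Reject overlong encodings, UTF-16 surrogates, out-of-range values. */
        if ((len == 2 && cp < 0x80) ||
            (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return 0;

        s += len;
    }
    return 1;
}
```

A caller could run such a check on text content before calling xmlNodeDump and then reject or repair the node instead of serializing it.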
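Created attachment 34151: an Atom feed containing some invalid UTF-8 strings.

Build an xmlNodePtr tree from the attached file, then call the function below and see what happens: infinite recursion :(

```c
#include <libxml/tree.h>

/* Serialize cur (or cur and its following siblings when children != 0)
 * and return a newly allocated copy of the output, or NULL if nothing
 * was written. The caller frees the result with xmlFree(). */
xmlChar *extractHTMLNode(xmlNodePtr cur, int children)
{
    xmlBufferPtr buf;
    xmlChar *result = NULL;          /* was uninitialized on the empty path */

    /* printxmlNodeRecurse(cur, 0);  -- reporter's own debug helper, not included in this report */

    buf = xmlBufferCreate();
    if (buf == NULL)
        return NULL;

    if (children) {
        while (cur != NULL) {
            xmlNodeDump(buf, cur->doc, cur, 0, 0);
            cur = cur->next;
        }
    } else {
        xmlNodeDump(buf, cur->doc, cur, 0, 0);
    }

    if (xmlBufferLength(buf) > 0)
        result = xmlStrdup(xmlBufferContent(buf));   /* xmlStrdup takes const xmlChar * */

    xmlBufferFree(buf);
    return result;
}
```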
1) Your test routine has a questionable line of code, "printxmlNodeRecurse(cur, 0);", so I just removed it.
2) I'm not clear on what you are expecting. Your test file begins with an XML declaration stating that the encoding is UTF-8, which is not true. There are many places in the parser and elsewhere where checks ensure that the input data is valid UTF-8, which is the internal encoding. Apparently you are bypassing these checks (perhaps by using XML_PARSE_RECOVER?), which lets invalid text data into the tree. You then try to output this invalid data and are surprised when there is a problem.
3) Notwithstanding (2), I have added a test that should prevent the "dead loop" you experienced. (It was not infinite recursion; that would eventually have terminated when stack space was exhausted.) This change is in CVS (xmlIO.c). Although it should work, I strongly suggest you rethink your approach to working with data that is known to be invalid.
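Item 3 describes the hang as a "dead loop" rather than recursion. An output loop keeps asking an encoder to convert bytes it cannot handle. The encoder consumes nothing, and the loop retries forever. This report does not show the actual change to xmlIO.c. The sketch below is a generic illustration of a progress check that breaks such a loop. The names `toy_convert` and `convert_all` are hypothetical, and the "unconvertible byte" rule (any byte >= 0x80) is an assumption chosen to keep the example small.

```c
#include <stddef.h>

/* Hypothetical toy converter: copies bytes until the input ends, the output
 * chunk is full, or it meets a byte it cannot handle (here: any byte >= 0x80).
 * Returns the number of input bytes consumed, which may legitimately be 0. */
static size_t toy_convert(const unsigned char *in, size_t inlen,
                          char *out, size_t outlen)
{
    size_t n = 0;
    while (n < inlen && n < outlen && in[n] < 0x80) {
        out[n] = (char)in[n];
        n++;
    }
    return n;
}

/* Hypothetical driver: feeds the input to the converter in small chunks.
 * Returns the number of bytes written, or -1 when a call makes no progress
 * (an unconvertible byte or a full output buffer). */
static long convert_all(const unsigned char *in, size_t inlen,
                        char *out, size_t outcap)
{
    size_t done = 0, written = 0;

    while (done < inlen) {
        size_t chunk = outcap - written;
        if (chunk > 4)
            chunk = 4;                  /* small chunks, like a buffered writer */

        size_t used = toy_convert(in + done, inlen - done, out + written, chunk);
        if (used == 0)
            return -1;                  /* without this check the loop spins forever */

        done += used;
        written += used;
    }
    return (long)written;
}
```

Returning an error is only one option. An encoder could also skip the byte or emit a replacement or character reference. Either way, every iteration must consume input or stop.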
This should be closed by the release of libxml2-2.6.17. Thanks, Daniel