GNOME Bugzilla – Bug 344390
xmlReconciliateNs is not XML_PARSE_COMPACT aware and crashes
Last modified: 2006-10-14 08:46:44 UTC
Steps to reproduce: 1. parse a document with the XML_PARSE_COMPACT option 2. call xmlReconciliateNs on a node Stack trace: The function uses this code, irrespective of the node type: attr = node->properties; while (attr != NULL) { if (attr->ns != NULL) { Other information:
XML_PARSE_COMPACT should really not be used for document which are modified programmatically after parsing ! This may get fixed, but really you should not do that, back to normal priority ! Daniel
I wouldn't know why, though. Sure, I can't free text node content without calling into the API and I have to take care myself I don't do unsafe things with it. But as a user, I would definitely expect API calls to be safe, wherever the tree came from. So, this is still a crasher. I won't argue about the priority, since I've worked around it by copying the function into lxml and fixing the bug there. I would still prefer this bug to be also fixed in libxml2. Stefan
If XML_PARSE_COMPACT is not intended to be used if the tree is intended to be modified - if I understood correctly - then I see no reason why trying to adjust any other non-fitting function to work with XML_PARSE_COMPACT. I think it is cruicial to be aware that working around such restrictions, might lead to messing with the code.; you may have fixed xmlReconciliateNs() to work with XML_PARSE_COMPACT, but this may open end up in trying to adjust all other funtions to do work with it. And please, don't intermix Python's philosophy with this C library; it _will_ crash if not used properly. We should change the status back to normal, since your are misusing the API. Additionally consider the following: someone might add a flag to the xmlDoc struct in the future, indicating that the doc is immutable; if this flag would then be used by any other Libxml2 function, would you go and copy/fix the library in lxml? However, you seem to have already found an easy way to fix xmlReconciliateNs() for your needs, so posting that diff would be of help. Customizing is always a delicate battlefield, so I can understand your need for tricks. I'm using a customized way to handle namespace declarations/references in my Delphi wrapper, but the step to open a critical bug, because my wrapper crashes due to my misuse is something I wouldn't consider to be logical.
Ok, guess I'll take it back out. However, if there is apparently no interest in getting this to work, this would be a good thing to document somewhere. It is not a good idea to enable options that happily break the API without telling people about it *before* they use them.
Right, documentation is an Achilles' heel in Libxml2. I'll change the doc to the following (and wonder how it will look like in the generated HTML docs, since multi-line): XML_PARSE_COMPACT = 1<<16 /* compact small text nodes; no modification of the tree allowed afterwards; you might encounter crashes if you try */ Daniel, what do the stylesheets do with documentation if put before the item to be documented? I.e.: /* compact small text nodes; no modification of the tree allowed afterwards; you might encounter crashes if you try */ XML_PARSE_COMPACT = 1<<16
OK, committed the change to parser.h.
Stefan, regardless of my rant about the severity of this bug entry, we would still appreciate it if you post the fix which healed your problem with xmlReconciliateNs().
w.r.t. #2 and #7, if you have a bug fix please provide it of course ! Daniel
Sure, sorry. The problem is that the fixed function was not written in C but in Pyrex and also is based on our tree traversal framework (written as a pair of C macros). So I can't easily post a patch. The main thing to fix, however, is to check if the node is an element before accessing its attributes. This prevents reading the node->properties pointer for text nodes, where COMPACT uses it to store text data instead of a real pointer. BTW, I think the function is worth a refactoring anyway. Spelling out two identical loops that are pretty lengthy and only differ in their start value is a rather uncommon programming style. Stefan
Okay, I see the missing node type test, dohh ! For my excuse I will say I didn't wrote this but I should have checked it :-\ W.r.t. the duplicated loop, I see only one external loop on the tree recursion and another loop internally on the attributes, so I'm a bit surprized by the second comnment. Anyway ReconciliateNs is somehow experimental, various people have played with it over time, if you can do it more safely in an upper layer it's probably better. At least the core bug should be fixed in CVS now, thanks ! Daniel