GNOME Bugzilla – Bug 688171
generate-id generates the same ID for all document nodes
Last modified: 2012-11-21 03:16:02 UTC
Created attachment 228781 [details] Two small XML files and an XSL transform to run The xsltGenerateIdFunction function creates the same unique ID for all document nodes (namely, "idp0") This is a regression caused by change ecb6bcb8d1b7e44842edde3929f412d46b40c89f I have checked http://www.w3.org/TR/xslt to verify that generate-id is required to generate IDs that are unique for "nodes" and not "nodes in one specific document". The spec is silent on the issue but implies in the example towards the end of section 12.1 that generated IDs should be comparable between documents - my attached test case uses that example to demonstrate the problem. I've reproduced it with the latest git snapshot downloaded November 12th 2012 @ 13:12:00 UTC. This problem occurs 100% of the time. However, I discovered this whilst investigating occasional failures in our builds relating to one of our XSL transforms failing to generate valid output files. These fail because different nodes in different documents end up with the same generated ID, which is fundamentally the same problem, but dependent on the pointers provided by the memory manager (which makes it intermittent and hard to reproduce) The reason the failure occurs is that the algorithm for generating the ID is to compute the difference between the node pointer and the document pointer. Thus it is occasionally possible for two nodes in different documents to both be the same offset from their respective documents and end up with the same ID, although this happens only occasionally and is hard to reproduce reliably. However, when the node concerned is the document node itself, this is always at the same address as its document node, so all document nodes end up with the same ID ("idp0"), all the time. As proof of concept, I changed the algorithm back to "val = (unsigned long)((char *)cur - (char *)0);" to demonstate that this fixes the problem for me, but obviously reintroduces the issue that the earlier change was designed to eliminate. Additionally, neither the current nor previous expression for computing 'val' is safe on systems with 32-bit long and 64-bit pointers, which I had originally guessed might have been the cause of the problem I was having, but it isn't (long turned out to be 64-bits as well) The attached .tar.gz file contains test.xsl, one.xml and two.xml. Unpack somewhere and try: xsltproc test.xsl one.xml There is no output expected. The actual output is: Error: Two documents, but generate-id created: idp0 == idp0 (This message is generated by the test.xsl file) The fix is to ensure that something unique about the node's owner document is included in all the unique IDs. Appending or prepending some identifier based on the xmlDocPtr would fix the issue, even if it's just some serial number based on which document it is.
Created attachment 229048 [details] [review] Suggested patch that resolves the issue My proposed patch changes the ID generation function to include a representation related to the doc pointer - it uses the offset of the xmlDocPtr from the address of a static local variable, in the same style as the code already does for the node pointer. The buffer length has been extended to 48 to ensure sufficient space - the longest possible identifier now is 45 bytes when long is 64 bits. This algorithm will still fail if long is 32-bit and pointers are 64-bit, but I'll raise that as a separate issue.
Created attachment 229050 [details] [review] New testcases to verify the generate-id function is working This patch adds a testcase that verifies that the generate-id() function is working as expected.
Created attachment 229055 [details] [review] Smaller, simpler patch that fixes the issue The new suggested patch is much simpler two line change.
Okay, problem understood and that last patch looks just fine indeed ! I just commited your patch: http://git.gnome.org/browse/libxslt/commit/?id=d204e66735cb701e5b76e1489353faa48cfc6016 thanks a lot ! Daniel