GNOME Bugzilla – Bug 312945
xmlSchemaNewMemParserCtxt() incorrectly finds a validation error
Last modified: 2009-08-15 18:40:50 UTC
Please describe the problem:
We have a Windows MFC application that was successfully using libxml2 2.6.14 to load an XSD. After upgrading to 2.6.20, libxml2 gives the following error when loading the XSD:

    Error 1717 (line 24) local atomic type: Internal error: xmlSchemaDeriveAndValidateFacets

We traced the error to our source that parses the schema from a memory buffer:

    xmlSchemaParserCtxt* spctxt = xmlSchemaNewMemParserCtxt(buf, len);

If we change the line to read directly from a file instead of a memory buffer, then the problem goes away:

    xmlSchemaParserCtxt* spctxt = xmlSchemaNewParserCtxt("c:\\our_schema.xsd");

So there is some change from 2.6.14 to 2.6.20 that causes memory buffer parsing to give a "local atomic type" error.

Steps to reproduce:
Following is the actual XSD that causes the problem. Line 24 refers to a simpleType of integer with min/max restrictions of 1 and 16.

    <?xml version="1.0"?>
    <!-- edited with XMLSpy v2005 rel. 3 U (http://www.altova.com) by Dean Hill (DIGIMARC ID SYSTEMS) -->
    <xs:schema xmlns:z="http://www.digimarc.com/dmsg.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.digimarc.com/dmsg.xsd" elementFormDefault="qualified" attributeFormDefault="qualified" version="1.9, 2005-07-07">
      <!--The Document section is the highest-level wrapper for a request/response.-->
      <xs:element name="Document">
        <xs:annotation>
          <xs:documentation>Highest level wrapper for a request/response.</xs:documentation>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Application">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Card">
                    <xs:annotation>
                      <xs:documentation>Information associated with the printed card.</xs:documentation>
                    </xs:annotation>
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="CardType">
                          <xs:annotation>
                            <xs:documentation>Value from 1-16 indicating the type of card.</xs:documentation>
                          </xs:annotation>
                          <xs:simpleType>
                            <xs:restriction base="xs:integer">
                              <xs:minInclusive value="1"/>
                              <xs:maxInclusive value="16"/>
                            </xs:restriction>
                          </xs:simpleType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <!--Type Library-->
    </xs:schema>

Actual results:
Expected results:
Does this happen every time?
Other information:
I don't really understand where this could come from! All the schema parsing is based on an existing parse tree, so this would mean that parsing from memory and from disk lead to different results. The only points I can think of are:
- encoding, but your schema seems to be pure ASCII
- the XML base for the URI references may not be present, but your schema doesn't do any include...
The regression tests check that parsing from memory and from disk don't diverge on a hundred or so test cases. Once parsed to a tree, the code would be identical. I can't understand where your problem comes from.

I would need a reproducible standalone test case, with the schema and instance provided as an attachment or in a tar (cut and paste from a web form is completely unreliable and unusable for the kind of error you are reporting). I suggest double-checking that your tests for both APIs are linking with the same version of the library. Sorry, I can't process this bug further without complete information.

Daniel
When I build libxml2, testSchemas.exe is also created. I used testSchemas.exe to validate my XSD, but unfortunately it always parsed from file (not memory). After looking at testSchemas.c, I see that if HAVE_SYS_MMAN_H is defined the "--memory" flag becomes enabled to parse from memory. Since I'm running under Win32 and don't have mmap(), I'll modify testSchemas.c to load the file into memory the old fashioned way: open file, seek to EOF to get file size, alloc memory, read file into memory. Using a modified testSchemas should allow me to either verify the problem is in libxml2 or some mistake I'm making. I'll let you know the results. Dean
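A minimal sketch of the "old fashioned" loading steps described above (open file, seek to EOF for the size, allocate, read), using only standard C I/O. The helper name load_file_to_memory is hypothetical and is not part of libxml2 or of the actual testSchemas.c change; it only illustrates the approach.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libxml2): read a whole file into a
 * malloc'd buffer. Returns NULL on failure. On success the byte count is
 * stored in *out_len and the caller must free() the buffer. */
static char *load_file_to_memory(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    /* Binary mode avoids CR/LF translation on Windows, so the size
     * reported by ftell() matches the number of bytes fread() returns. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Seek to EOF to learn the file size, then go back to the start. */
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    /* One extra byte for a convenience NUL terminator. */
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }

    buf[got] = '\0';   /* terminator is not counted in *out_len */
    *out_len = got;
    return buf;
}
```

The returned buffer and length could then be handed to xmlSchemaNewMemParserCtxt(buf, len), as in the original report; note that this function takes the size as an int, so very large files would need a range check first.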
Hum, I tend to use xmllint --schema instead of testSchemas these days. But yeah, I understand... Daniel
I enhanced testSchemas.exe to validate from memory under Windows. This allowed me to reproduce the problem and find a fix in xmlschemas.c. The fix was to add the following line to xmlSchemaNewMemParserCtxt(). The line already exists in xmlSchemaNewParserCtxt().

    ret->type = XML_SCHEMA_CTXT_PARSER;

Do you want me to send you the new xmlschemas.c and enhanced testSchemas.c? Also, it seems that the regression tests need to be modified because they weren't catching this problem. Let me know what I should do next.
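To illustrate the class of bug, here is a small self-contained model of two constructors for the same context type, where one must not forget to set the type tag. All names below (ctxt_kind, parser_ctxt, new_file_ctxt, new_mem_ctxt, handle_facets) are hypothetical stand-ins, not libxml2's real definitions, and the assumption that shared code inspects the tag is only a plausible reading of the symptom reported here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT libxml2's real definitions. */
enum ctxt_kind { CTXT_UNSET = 0, CTXT_PARSER = 1, CTXT_VALIDATOR = 2 };

struct parser_ctxt {
    enum ctxt_kind kind;   /* tag that shared code is assumed to inspect */
    const char *url;       /* set when parsing from a file */
    const char *buffer;    /* set when parsing from memory */
    int size;
};

/* File-based constructor: sets the tag. */
static struct parser_ctxt *new_file_ctxt(const char *url)
{
    assert(url != NULL);
    struct parser_ctxt *ret = calloc(1, sizeof(*ret));
    if (ret == NULL)
        return NULL;
    ret->kind = CTXT_PARSER;
    ret->url = url;
    return ret;
}

/* Memory-based constructor. Without the marked line, calloc() leaves the
 * tag as CTXT_UNSET and the two constructors silently diverge. */
static struct parser_ctxt *new_mem_ctxt(const char *buffer, int size)
{
    assert(buffer != NULL && size > 0);
    struct parser_ctxt *ret = calloc(1, sizeof(*ret));
    if (ret == NULL)
        return NULL;
    ret->kind = CTXT_PARSER;   /* the analogue of the one-line fix */
    ret->buffer = buffer;
    ret->size = size;
    return ret;
}

/* Shared code path: returns 0 on success, -1 for an "internal error"
 * when it is handed a context whose tag it does not recognise. */
static int handle_facets(const struct parser_ctxt *ctxt)
{
    assert(ctxt != NULL);
    return ctxt->kind == CTXT_PARSER ? 0 : -1;
}
```

In this model, deleting the marked line makes handle_facets() fail only for contexts created from memory, which mirrors the file-versus-memory difference seen in the report.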
Doh, stupid mistake, sorry! I fixed a couple more occurrences of the same problem in xmlschemas.c; it is committed in CVS in the diff between revisions 1.165 and 1.166 for that file. I'm not sure this can really be plugged easily into the regression tests, nor that it makes much sense in that specific case. If the --memory flag of xmllint, when used in conjunction with --schema, loaded the schema from memory, then yes, that could be one way to provide checking, but I'm not sure it's crucial to add. There should be an updated snapshot tarball on ftp://xmlsoft.org/ within one hour of my commit with the current state. Thanks a lot for your report and for finding the problem! Daniel
I'm glad the change resolved the problem. I have two remaining questions.

(1) I notice that "xmllint --memory" actually calls xmlReaderForMemory(), whereas "testSchemas --memory" calls xmlSchemaNewMemParserCtxt(). Are both of these functions exercised during regression tests?

(2) In both xmllint and testSchemas the "--memory" flag causes the mmap() function to be used to read the file into memory. Since this function doesn't exist on the Windows systems I run under, I had to write equivalent code to simulate mmap(). I think my code is platform independent. Can xmllint and testSchemas be changed to use my new code so the "--memory" option is available on all platforms?

Thanks.
(1) xmlReaderForMemory() is called only if --stream is used, i.e. streaming validation on top of the reader. There are actually 3 modes for XSD validation: tree, reader and SAX. The instance can be parsed from a file, from memory, from a file descriptor, or through custom I/O. The schema can likewise be parsed from a file, from memory, from an fd, or through custom I/O. Not all of these variations are exercised during regression tests; most share common paths, and the slight variations from one code path to another are not worth running the full XSD test suite through all of them. xmlReaderForMemory() is exercised; xmlSchemaNewMemParserCtxt() may not be, but I'm unsure. The regression tests also depend on the platform. I don't run any regression tests outside of Linux. On Windows I rely on contributors, especially Igor.

(2) I want to use mmap() on Linux since this is by far the best performing solution for very large files. If you want to provide a patch so that --memory works on Windows, I will take it. Note that this is no longer a bug but rather a request for enhancement, but that's fine too.

Daniel
Created attachment 50594: Alternative to mmap() under Windows

(1) Thanks for the information.

(2) My "Windows mmap() patch" is attached in testSchemas.c. A diff will show my changes, but the summary is:
- I added the #define LIBXML_MEMPARSE_ENABLED to indicate whether memory parsing is allowed. The HAVE_SYS_MMAN_H #define used to serve this purpose.
- If HAVE_SYS_MMAN_H is defined, then mmap() is used for memory parsing; otherwise my patch is used.
Tweak this to fit our standards. xmllint.c could be changed in a similar way.

Thanks.
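The macro arrangement summarized above might look roughly like the following sketch. It is not the attached patch: only the macro names LIBXML_MEMPARSE_ENABLED and HAVE_SYS_MMAN_H come from this thread, while struct mem_file, mem_file_open() and mem_file_close() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_SYS_MMAN_H
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Memory parsing no longer depends on mmap() being available. */
#define LIBXML_MEMPARSE_ENABLED

#ifdef LIBXML_MEMPARSE_ENABLED
/* Hypothetical handle for a file held in memory. */
struct mem_file {
    const char *base;
    size_t len;
};

/* Returns 0 on success, -1 on failure. In this sketch, empty files are
 * treated as failures on both paths. */
static int mem_file_open(const char *path, struct mem_file *mf)
{
    assert(path != NULL && mf != NULL);
#ifdef HAVE_SYS_MMAN_H
    /* Preferred path where available: map the file read-only. */
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    mf->base = p;
    mf->len = (size_t)st.st_size;
    return 0;
#else
    /* Portable fallback: read the whole file with standard C I/O. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    if (size <= 0 || fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    char *buf = malloc((size_t)size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) {
        free(buf);
        return -1;
    }
    mf->base = buf;
    mf->len = got;
    return 0;
#endif
}

/* Release whatever mem_file_open() acquired. */
static void mem_file_close(struct mem_file *mf)
{
    assert(mf != NULL);
#ifdef HAVE_SYS_MMAN_H
    munmap((void *)mf->base, mf->len);
#else
    free((void *)mf->base);
#endif
    mf->base = NULL;
    mf->len = 0;
}
#endif /* LIBXML_MEMPARSE_ENABLED */
```

Keeping the platform choice inside one open/close pair means the code that handles the "--memory" option only has to test LIBXML_MEMPARSE_ENABLED, while mmap() remains the path used on systems that have it.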
This should be closed by release of libxml2-2.6.21, thanks, Daniel