Bug 312945 – xmlSchemaNewMemParserCtxt() incorrectly finds a validation error



Summary:	xmlSchemaNewMemParserCtxt() incorrectly finds a validation error


Status:	VERIFIED FIXED

Product:	libxml2
Classification:	Platform
Component:	general
Version:	2.6.20
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Daniel Veillard
QA Contact:	libxml QA maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2005-08-08 22:00 UTC by Dean Hill
Modified:	2009-08-15 18:40 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Alternative to mmap() under Windows (692.71 KB, text/plain) 2005-08-11 21:23 UTC, Dean Hill	Details

Description Dean Hill 2005-08-08 22:00:16 UTC

Please describe the problem:
We have a Windows MFC application that was successfully using libxml2 2.6.14 
to load an XSD.  After upgrading to 2.6.20, libxml2 gives the following error 
when loading the XSD:
    Error 1717 (line 24) local atomic type: Internal error: xmlSchemaDeriveAndValidateFacets

We traced the error to our source that parses the schema from a memory buffer:
    xmlSchemaParserCtxt* spctxt = xmlSchemaNewMemParserCtxt(buf, len);

If we change the line to read directly from a file instead of a memory buffer, 
then the problem goes away:
    xmlSchemaParserCtxt* spctxt = xmlSchemaNewParserCtxt("c:\\our_schema.xsd");

So there is some change from 2.6.14 to 2.6.20 that causes memory buffer 
parsing to give a "local atomic type" error.

Steps to reproduce:
Following is the actual XSD that causes the problem.  Line 24 refers to a 
SimpleType of integer with min/max restrictions of 1 and 16.

<?xml version="1.0"?>
<!-- edited with XMLSpy v2005 rel. 3 U (http://www.altova.com) by Dean Hill (DIGIMARC ID SYSTEMS) -->
<xs:schema xmlns:z="http://www.digimarc.com/dmsg.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.digimarc.com/dmsg.xsd" elementFormDefault="qualified" attributeFormDefault="qualified" version="1.9, 2005-07-07">
	<!--The Document section is the highest-level wrapper for a request/response.-->
	<xs:element name="Document">
		<xs:annotation>
			<xs:documentation>Highest level wrapper for a request/response.</xs:documentation>
		</xs:annotation>
		<xs:complexType>
			<xs:sequence>
				<xs:element name="Application">
					<xs:complexType>
						<xs:sequence>
							<xs:element name="Card">
								<xs:annotation>
									<xs:documentation>Information associated with the printed card.</xs:documentation>
								</xs:annotation>
								<xs:complexType>
									<xs:sequence>
										<xs:element name="CardType">
											<xs:annotation>
												<xs:documentation>Value from 1-16 indicating the type of card.</xs:documentation>
											</xs:annotation>
											<xs:simpleType>
												<xs:restriction base="xs:integer">
													<xs:minInclusive value="1"/>
													<xs:maxInclusive value="16"/>
												</xs:restriction>
											</xs:simpleType>
										</xs:element>
									</xs:sequence>
								</xs:complexType>
							</xs:element>
						</xs:sequence>
					</xs:complexType>
				</xs:element>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
	<!--Type Library-->
</xs:schema>



Comment 1 Daniel Veillard 2005-08-08 22:30:47 UTC

I don't really understand where this could come from!
All schema parsing is based on an existing parsed tree,
so this would mean that parsing from memory and from disk lead to 
different results. The only points I can think of are:
   - encoding, but your schema seems to be pure ASCII
   - the XML base for the URI references may not be present,
     but your schema doesn't do any include...
The regression tests check that parsing from memory and from disk don't
diverge on a hundred or so test cases. Once parsed to a tree, the code would
be identical.
I can't understand where your problem comes from. I would need a reproducible
standalone test case, with the schema and instance provided as an attachment
or in a tar (cut and paste from a web form is completely unreliable and 
unusable for the kind of error you are reporting). I suggest double-checking that
your tests for both APIs are linking with the same version of the library. 
Sorry, I can't process this bug further without complete information.

Daniel

Comment 2 Dean Hill 2005-08-08 22:42:59 UTC

When I build libxml2, testSchemas.exe is also created.  I used testSchemas.exe 
to validate my XSD, but unfortunately it always parsed from file (not 
memory).  After looking at testSchemas.c, I see that if HAVE_SYS_MMAN_H is 
defined the "--memory" flag becomes enabled to parse from memory.

Since I'm running under Win32 and don't have mmap(), I'll modify testSchemas.c 
to load the file into memory the old-fashioned way: open the file, seek to EOF 
to get the file size, allocate memory, and read the file into memory.
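
The load sequence described above (open, seek to EOF for the size, allocate, read) might look roughly like the following in portable C. This is only a sketch under assumptions: the helper name `read_file_to_buffer` is hypothetical and is not taken from testSchemas.c or from the later attachment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from testSchemas.c): loads a whole file into a
 * malloc'd buffer. Returns NULL on failure. On success, *out_len holds the
 * byte count and the caller must free() the buffer. */
static char *read_file_to_buffer(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no CRLF translation on Windows */
    char *buf = NULL;
    long size;

    if (fp == NULL)
        return NULL;

    /* Seek to EOF to learn the file size, then go back to the start. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    /* One extra byte so the buffer can also be used as a C string. */
    buf = malloc((size_t)size + 1);
    if (buf != NULL) {
        if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
            free(buf);
            buf = NULL;
        } else {
            buf[size] = '\0';
            *out_len = (size_t)size;
        }
    }
    fclose(fp);
    return buf;
}
```

The resulting buffer and length could then be passed to xmlSchemaNewMemParserCtxt(buf, (int)len), as in the original report, and released with free() once the parser context is no longer needed.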

Using a modified testSchemas should allow me to verify whether the problem is 
in libxml2 or is some mistake I'm making.

I'll let you know the results.

Dean

Comment 3 Daniel Veillard 2005-08-08 22:54:56 UTC

Hum, I tend to use xmllint --schema instead of testSchemas these days. But
yeah, I understand...

Daniel

Comment 4 Dean Hill 2005-08-10 20:25:43 UTC

I enhanced testSchemas.exe to validate from memory under Windows.  This 
allowed me to reproduce the problem and find a fix in xmlschemas.c.

The fix was to add the following line to xmlSchemaNewMemParserCtxt().  The 
line already exists in xmlSchemaNewParserCtxt().

    ret->type = XML_SCHEMA_CTXT_PARSER;
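
To illustrate the class of bug, here is a self-contained sketch of two constructors for the same context structure where only one sets the type tag. All names below are hypothetical stand-ins, not libxml2's real definitions, and the thread does not show how the library actually consumes `type`, so the check in `check_ctxt` is an assumption made purely for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT libxml2's real definitions. */
enum ctxt_type { CTXT_UNSET = 0, CTXT_PARSER = 1, CTXT_VALIDATOR = 2 };

struct parser_ctxt {
    enum ctxt_type type;   /* tag that later code relies on */
    const char *url;       /* set when parsing from a file */
    const char *buffer;    /* set when parsing from memory */
    int size;
};

/* File-based constructor: sets the tag. */
static struct parser_ctxt *new_file_ctxt(const char *url)
{
    struct parser_ctxt *ret = calloc(1, sizeof(*ret));
    if (ret == NULL)
        return NULL;
    ret->type = CTXT_PARSER;
    ret->url = url;
    return ret;
}

/* Memory-based constructor with the bug: the tag stays zeroed. */
static struct parser_ctxt *new_mem_ctxt_buggy(const char *buffer, int size)
{
    struct parser_ctxt *ret = calloc(1, sizeof(*ret));
    if (ret == NULL)
        return NULL;
    ret->buffer = buffer;
    ret->size = size;
    return ret;
}

/* Memory-based constructor with the one-line fix applied. */
static struct parser_ctxt *new_mem_ctxt_fixed(const char *buffer, int size)
{
    struct parser_ctxt *ret = new_mem_ctxt_buggy(buffer, size);
    if (ret != NULL)
        ret->type = CTXT_PARSER;
    return ret;
}

/* Assumed downstream check: code shared by both paths that reports an
 * "internal error" (-1) when the tag is not the expected one. */
static int check_ctxt(const struct parser_ctxt *ctxt)
{
    return ctxt->type == CTXT_PARSER ? 0 : -1;
}
```

With these definitions, a context from `new_file_ctxt` or `new_mem_ctxt_fixed` passes `check_ctxt`, while one from `new_mem_ctxt_buggy` fails it even though the schema bytes are identical, which matches the symptom of the same XSD loading from a file but not from memory.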

Do you want me to send you the new xmlschemas.c and enhanced testSchemas.c?  
Also, it seems that the regression tests need to be modified because they 
weren't catching this problem.

Let me know what I should do next.

Comment 5 Daniel Veillard 2005-08-10 21:46:03 UTC

Dohh, stupid mistake, sorry!
I fixed a couple more occurrences of the same problem in xmlschemas.c; it is 
committed in CVS in the diff between revisions 1.165 and 1.166 for that file.
I'm not sure this can really be plugged easily into the regression tests, nor
that it makes much sense in that specific case. If the --memory flag of xmllint,
when used in conjunction with --schema, loaded the schema from memory, then
yes, that could be one way to provide checking, but I'm not sure it's
crucial to add.

There should be an updated snapshot tarball on ftp://xmlsoft.org/ within 
one hour of my commit with the current state.

  Thanks a lot for your report and for finding the problem!

Daniel

Comment 6 Dean Hill 2005-08-11 13:48:50 UTC

I'm glad the change resolved the problem.

I have two remaining questions.
(1) I notice that "xmllint --memory" actually calls xmlReaderForMemory(),  
whereas "testschemas --memory" calls xmlSchemaNewMemParserCtxt().  Do both of 
these functions get exercised during regression tests?

(2) In both xmllint and testschemas the "--memory" flag causes the mmap() 
function to be used to read the file into memory.  Since this function doesn't 
exist on the Windows systems I run under, I had to write equivalent code to 
simulate mmap().  I think my code is platform independent.  Can xmllint and 
testschemas be changed to use my new code so the "--memory" option is available 
on all platforms?

Thanks.

Comment 7 Daniel Veillard 2005-08-11 19:26:06 UTC

(1) xmlReaderForMemory() is called only if --stream is used, i.e. streaming
    validation on top of the reader. There are actually 3 modes for XSD
    validation: tree, reader and SAX. The instance can be parsed from file,
    from memory, from a file descriptor or through custom I/O. 
    The schemas can likewise be parsed from file, from memory, from an fd or
    through custom I/O. Not all of these variations are exercised during
    regression tests. Most share common paths, and the slight variations from
    one code path to another are not worth running the full XSD test suite
    through all of them.

xmlReaderForMemory() is exercised; xmlSchemaNewMemParserCtxt() may not be, but 
I'm unsure. The regression tests also depend on the platform. I don't run
any regression tests outside of Linux. On Windows I rely on contributors, 
especially Igor.

(2) I want to use mmap() on Linux since it is by far the best-performing 
    solution for very large files. If you want to provide a patch so that
    --memory works on Windows, I will take it.
    Note that this is no longer a bug but rather a request for enhancement,
    but that's fine too.

Daniel

Comment 8 Dean Hill 2005-08-11 21:23:01 UTC

Created attachment 50594 [details]
Alternative to mmap() under Windows

(1) Thanks for the information.

(2) My "Windows mmap() patch" is attached in testschemas.c.  A diff will show
my changes, but the summary is:

- I added the #define LIBXML_MEMPARSE_ENABLED to indicate whether memory
parsing is allowed.  The HAVE_SYS_MMAN_H #define used to serve this purpose.

- If HAVE_SYS_MMAN_H is defined, then mmap() is used for memory parsing;
otherwise my patch is used.

Tweak this to fit your standards.  xmllint.c could be changed in a similar way.
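
The attachment itself is not reproduced in this thread, so the following is only a sketch of how such a compile-time switch could be structured, not the submitted patch. The macro names HAVE_SYS_MMAN_H and LIBXML_MEMPARSE_ENABLED come from the comment above; `struct mem_file`, `mem_file_load` and `mem_file_unload` are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef HAVE_SYS_MMAN_H
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#endif

/* In this sketch memory parsing is available on every platform, so the
 * flag is defined unconditionally (assumption about the patch's intent). */
#define LIBXML_MEMPARSE_ENABLED

/* Hypothetical holder for a file's contents in memory. */
struct mem_file {
    char *base;
    size_t len;
};

/* Loads path into memory; returns 0 on success, -1 on failure. */
static int mem_file_load(const char *path, struct mem_file *mf)
{
#ifdef HAVE_SYS_MMAN_H
    /* mmap() path, used where <sys/mman.h> is available. */
    struct stat info;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &info) < 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)info.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    mf->base = (char *)p;
    mf->len = (size_t)info.st_size;
    return 0;
#else
    /* Portable fallback: seek for the size, allocate, read. */
    FILE *fp = fopen(path, "rb");
    char *buf;
    long size;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    buf = malloc((size_t)size + 1);
    if (buf == NULL || fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        fclose(fp);
        return -1;
    }
    fclose(fp);
    mf->base = buf;
    mf->len = (size_t)size;
    return 0;
#endif
}

/* Releases whatever mem_file_load() acquired. */
static void mem_file_unload(struct mem_file *mf)
{
#ifdef HAVE_SYS_MMAN_H
    munmap(mf->base, mf->len);
#else
    free(mf->base);
#endif
    mf->base = NULL;
    mf->len = 0;
}
```

Keeping the platform difference behind one load/unload pair means the code that handles "--memory" would only need to test LIBXML_MEMPARSE_ENABLED, while Linux builds still get mmap() for very large files.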

Thanks.

Comment 9 Daniel Veillard 2005-09-05 08:59:59 UTC

This should be closed by the release of libxml2-2.6.21.

  Thanks,

Daniel