GNOME Bugzilla – Bug 323510
XML schema element occurrences are not enforced
Last modified: 2005-12-13 12:10:05 UTC
Please describe the problem:

The following document should not validate against the following schema. The schema should require each post to have at least one paragraph. The bug happens only when maxOccurs is "unbounded"; setting it to 40, for example, does not trigger the bug. I am using the latest snapshot available on the xmlsoft.org website. I am using the library through xmllint, not that it should matter.

Document:

<?xml version="1.0" encoding="iso-8859-1"?>
<blog xmlns="http://x2a.org/article/2005/11"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://x2a.org/pub/blog.xsd">
  <name>Jonathan’s Blog</name>
  <description>Life, the universe and everything as seen by Jonathan Bastien-Filiatrault</description>
  <post>
    <title>An analogy between prohibition and DVD playback on Free Software platforms</title>
    <content> </content>
  </post>
  <post>
    <title>Because fiction wants to be reality</title>
    <content>
      <paragraph>[The Onion](http://theonion.com) has a short story that tells how the [RIAA would ban telling other people about songs using “unauthorized peer-to-peer notification of the existence of copyrighted material”](http://www.theonion.com/content/node/43029).</paragraph>
      <paragraph>The best part is that it uses the same rhetoric of treating people as criminals that the RIAA uses frequently. Hopefully, privatization of culture at that point will not happen, not that we are headed in the right direction.</paragraph>
      <paragraph>Brought to my attention [by Robin](http://rym.waglo.com/wordpress/2005/12/03/links-for-2005-12-03/).</paragraph>
    </content>
  </post>
</blog>

Schema:

<?xml version="1.0" encoding="iso-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://x2a.org/article/2005/11"
           xmlns="http://x2a.org/article/2005/11"
           elementFormDefault="qualified">
  <xs:element name="blog" type="blog_type"/>
  <xs:complexType name="blog_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="post" type="post_type"/>
      </xs:sequence>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="post_type">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="content" type="text_type"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="text_type">
    <xs:sequence>
      <xs:element name="paragraph" minOccurs="1" maxOccurs="unbounded" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
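To make the expected result concrete, here is a minimal sketch that checks by hand the single constraint the report is about (at least one paragraph inside each post's content). It is not schema validation and does not use libxml2; it only uses Python's standard library. The document is an abbreviated stand-in for the one in the report, and the names DOC, NS and posts_missing_paragraphs are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the report; the "b" prefix is arbitrary.
NS = {"b": "http://x2a.org/article/2005/11"}

# Abbreviated stand-in for the reported document (titles and text shortened).
DOC = """<blog xmlns="http://x2a.org/article/2005/11">
  <name>Example blog</name>
  <post>
    <title>First post</title>
    <content> </content>
  </post>
  <post>
    <title>Second post</title>
    <content>
      <paragraph>Some text.</paragraph>
    </content>
  </post>
</blog>"""


def posts_missing_paragraphs(xml_text):
    """Return the titles of posts whose content has no paragraph child."""
    root = ET.fromstring(xml_text)
    missing = []
    for post in root.findall("b:post", NS):
        content = post.find("b:content", NS)
        paragraphs = content.findall("b:paragraph", NS) if content is not None else []
        if len(paragraphs) < 1:  # mirrors minOccurs="1" on paragraph
            missing.append(post.findtext("b:title", default="", namespaces=NS))
    return missing


print(posts_missing_paragraphs(DOC))  # ['First post']
```

The first post is the one a validator should reject, since its content element holds only whitespace.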
The algorithms in xmlFAEliminateSimpleEpsilonTransitions() and xmlFAEliminateEpsilonTransitions() change and eliminate transitions in such a way that the first state becomes the final state. As a result, a call to xmlRegExecPushString(inode->regexCtxt, NULL, NULL), which would normally catch the missing child element, returns 1 due to:

    if (comp->compact[state * (comp->nbstrings + 1)] ==
        XML_REGEXP_FINAL_STATE)
        return(1);

A reduced case for this issue:

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="content">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="paragraph" minOccurs="1" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Instance:

<content/>
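The effect described above can be illustrated with a toy automaton. The sketch below is not libxml2's code and does not claim to reproduce its elimination algorithm: it is a small Python model, with made-up names (NFA, eps_closure, eliminate_epsilons, naive_merge, accepts), that contrasts a closure-based epsilon elimination with a deliberately wrong one that collapses both ends of an epsilon transition into one state. The wrong variant ends up with a start state that is also final, so an empty child list is accepted.

```python
EPS = None  # label marking an epsilon (empty) transition

# Toy NFA for "paragraph+": 0 --paragraph--> 1, 1 --eps--> 0, state 1 is final.
NFA = {0: [("paragraph", 1)], 1: [(EPS, 0)]}
START, FINALS = 0, {1}


def eps_closure(state, trans):
    """All states reachable from `state` using epsilon transitions only."""
    seen, stack = {state}, [state]
    while stack:
        for label, target in trans.get(stack.pop(), []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def eliminate_epsilons(trans, finals):
    """Closure-based elimination: a state is final only if its closure hits a final state."""
    states = set(trans) | {t for edges in trans.values() for _, t in edges}
    new_trans, new_finals = {}, set()
    for s in states:
        closure = eps_closure(s, trans)
        if closure & finals:
            new_finals.add(s)
        for c in closure:
            for label, target in trans.get(c, []):
                if label is not EPS:
                    new_trans.setdefault(s, []).append((label, target))
    return new_trans, new_finals


def naive_merge(trans, finals):
    """Deliberately WRONG (hypothetical): treat both ends of an epsilon as the same state.
    Assumes there are no cycles made only of epsilon transitions."""
    alias = {src: tgt for src, edges in trans.items()
             for label, tgt in edges if label is EPS}

    def rep(s):
        while s in alias:
            s = alias[s]
        return s

    merged = {}
    for src, edges in trans.items():
        for label, tgt in edges:
            if label is not EPS:
                merged.setdefault(rep(src), []).append((label, rep(tgt)))
    return merged, {rep(f) for f in finals}


def accepts(tokens, trans, start, finals):
    """Run an epsilon-free automaton; True if the token list ends in a final state."""
    current = {start}
    for tok in tokens:
        current = {t for s in current for label, t in trans.get(s, []) if label == tok}
    return bool(current & finals)


good_trans, good_finals = eliminate_epsilons(NFA, FINALS)
bad_trans, bad_finals = naive_merge(NFA, FINALS)  # START stays 0 after merging

print(accepts([], good_trans, START, good_finals))  # False: <content/> is rejected
print(accepts([], bad_trans, START, bad_finals))    # True: start state became final
```

In the wrong variant the automaton effectively matches zero or more paragraphs rather than one or more, which is the same symptom as the reduced case: an end-of-children check on the start state reports a final state.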
Okay, I will look; it should be simple, but I'm quite busy at the moment, so it will probably be easier in a couple of weeks ...
Daniel
As suggested by Daniel, we replaced the epsilon transition with a labelled transition, in order to avoid a bug in xmlregexp.c that eliminated the epsilon transition and marked the initial state as final. This is fixed in CVS, xmlschemas.c revision 1.187. Thanks for the report!
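As a rough picture of what a content model built only from labelled transitions looks like for minOccurs="1" maxOccurs="unbounded", here is a small Python sketch. It is an assumption-laden illustration, not the code from xmlschemas.c revision 1.187: the names build_one_or_more, Matcher and validate are hypothetical, and the push() return convention (1 for a final state, 0 for a non-final state, negative on error, with None signalling end of children) is only modelled loosely on the xmlRegExecPushString() call quoted earlier in this report.

```python
def build_one_or_more(label):
    """Transition table for `label` repeated one or more times, with no epsilon transitions."""
    start, seen_one = 0, 1
    table = {
        (start, label): seen_one,     # the first, mandatory occurrence
        (seen_one, label): seen_one,  # any number of further occurrences
    }
    return table, start, {seen_one}


class Matcher:
    """Minimal deterministic matcher over child element names."""

    def __init__(self, table, start, finals):
        self.table, self.state, self.finals = table, start, finals

    def push(self, token):
        """Feed one child name; None signals the end of the children.
        Returns 1 in a final state, 0 in a non-final state, -1 on an unexpected name."""
        if token is None:
            return 1 if self.state in self.finals else 0
        nxt = self.table.get((self.state, token))
        if nxt is None:
            return -1
        self.state = nxt
        return 1 if self.state in self.finals else 0


def validate(children, label="paragraph"):
    """True if `children` is one or more occurrences of `label` and nothing else."""
    matcher = Matcher(*build_one_or_more(label))
    for name in children:
        if matcher.push(name) < 0:
            return False
    return matcher.push(None) == 1


print(validate([]))                          # False: the empty <content/> case
print(validate(["paragraph"]))               # True
print(validate(["paragraph", "paragraph"]))  # True
```

Because the start state has no epsilon transition leaving it, there is nothing for an epsilon-elimination pass to rewrite, and the start state cannot end up marked as final.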
Cheers, great work, I will try it out.