GNOME Bugzilla – Bug 326031
RELAXNG fails to allow valid grammar choices
Last modified: 2017-06-12 19:06:14 UTC
Please describe the problem:
Valid choices in a RELAX NG grammar are ignored, possibly due to overly aggressive optimizations of "epsilon transitions". The following patch seems to prevent the problem, but likely slows down processing and/or generates other problems. However, it works well for me at the moment.

================================ xmlregexp.c PATCH ============
--- xmlregexp.c.orig 2005-08-23 09:37:26.000000000 -0400
+++ xmlregexp.c 2006-01-06 15:34:48.000000000 -0500
@@ -1686,6 +1686,7 @@
 printf("Found simple epsilon trans from start %d to %d\n", statenr, newto);
 #endif
+#if 0
 } else {
 #ifdef DEBUG_REGEXP_GRAPH
 printf("Found simple epsilon trans from %d to %d\n",
@@ -1725,6 +1726,7 @@
 state->nbTrans = 0;
+#endif
 }
 }

Steps to reproduce:
Attempt to validate the following XML with the following RELAX NG schema:

================================ RELAX NG schema ============
<?xml version="1.0" ?>
<!DOCTYPE grammar>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="bug">
      <choice>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
          </element>
        </group>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
      </choice>
    </element>
  </start>
</grammar>

================================ XML sample file ============
<?xml version="1.0" ?>
<!DOCTYPE bug>
<bug>
  <test>
    <title>hey</title>
    <content>there</content>
  </test>
</bug>

Actual results:
The error message "Did not expect element content there" is generated and the XML is (incorrectly) deemed invalid.

Expected results:
No error message should be generated and the XML should be deemed valid. For example, validating this with jing generates no error output.

Does this happen every time?
Yes.

Other information:
This problem seems similar to the one reported in bug #302836.
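For readers unfamiliar with the term, here is a minimal sketch of what an epsilon transition is and why removing one incorrectly could produce the reported symptom. It is not the code in xmlregexp.c: the `NFA` class, `build_choice_nfa` and `without_edge` are hypothetical names, and the automaton is a simplified model of the child content of the two `<test>` alternatives, not the automaton libxml2 actually builds.

```python
from collections import defaultdict

EPS = None  # label used for epsilon (empty) transitions


class NFA:
    """Tiny epsilon-NFA over element names; state 0 is the start state."""

    def __init__(self):
        self.trans = defaultdict(list)  # state -> [(label, target)]
        self.accepting = set()

    def add(self, src, label, dst):
        self.trans[src].append((label, dst))

    def closure(self, states):
        """All states reachable from `states` via epsilon transitions only."""
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for label, dst in self.trans[s]:
                if label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, symbols):
        current = self.closure({0})
        for sym in symbols:
            step = {dst for s in current
                    for label, dst in self.trans[s] if label == sym}
            current = self.closure(step)
        return bool(current & self.accepting)


def build_choice_nfa():
    """Choice between the child sequences (title) and (title, content)."""
    nfa = NFA()
    nfa.add(0, EPS, 1)        # enter first alternative
    nfa.add(1, "title", 2)
    nfa.add(0, EPS, 3)        # enter second alternative
    nfa.add(3, "title", 4)
    nfa.add(4, "content", 5)
    nfa.accepting = {2, 5}
    return nfa


def without_edge(nfa, src, label, dst):
    """Copy of `nfa` with one transition dropped (mimics a faulty optimization)."""
    copy = NFA()
    copy.accepting = set(nfa.accepting)
    for s, edges in nfa.trans.items():
        for l, d in edges:
            if (s, l, d) != (src, label, dst):
                copy.add(s, l, d)
    return copy
```

With the full automaton, `build_choice_nfa().accepts(["title", "content"])` is true. Dropping the epsilon edge into the second alternative with `without_edge(nfa, 0, EPS, 3)` makes the same input fail while `["title"]` still passes, which resembles the behaviour described in the report. Whether this is the actual mechanism inside libxml2 is an assumption.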
Created attachment 56886 [details] [review] simple patch to bypass epsilon transition removals
Created attachment 56887 [details] simple test case RNG file
Created attachment 56888 [details] simple test case XML file
Created attachment 57038 [details] simple test case XML file
Created attachment 57039 [details] simple test case RNG file
I can confirm the bug, even with 2.6.23, but I have doubts about the patch...

Daniel
Thanks for looking at this! As you noted, it's still a problem with 2.6.23, although I had to change the examples to demonstrate it. In general, I happen to have a lot of test cases involving a <choice> between multiple variant definitions of an XML element. I, too, am suspicious of my hack patch -- I'll attach another one for 2.6.23/CVS head, but it's probably not the right thing to do.

An interesting tangent: on my RHEL 3 system, I get the following from the 2.6.23 Regexp regression tests, *without* any patches to the sources. I'm using libiconv 1.10 and zlib 1.2.3. I'll try on an RHEL 4 system in a bit.

## Regexp regression tests
xpath result
7c7
< a/b/c: Ok
---
> a/b/c: Fail
9,11c9,11
< a:*/b:*/c:*: Ok
< child::a/child::b:*: Ok
< child::a/child::b:*|a/*/b|.//a:b: Ok
---
> a:*/b:*/c:*: Fail
> child::a/child::b:*: Fail
> child::a/child::b:*|a/*/b|.//a:b: Fail

Adding my "hack patch" simply adds the following one extra failure:

hard result
7c7
< b0aaa: Ok
---
> b0aaa: Fail
Created attachment 57041 [details] [review] hack patch around epsilon transition removal
Yup, same reports from the Regexp regression tests on RHEL 4 (although I don't see why it would make a difference), without any patches ... maybe not a concern, I can't tell offhand, as the overall test suite reports "Success!"
I worked on fixes in the regexps a couple of weeks ago, and this seems fixed in CVS as far as I can tell:

paphio:~/XML -> ./xmllint --noout --relaxng tst.rng tst.xml
tst.xml validates

Daniel
I've tried both CVS head as of today, and 2.6.22 with the latest xmlregexp.c and relaxng.c from CVS head, and in both cases, I see a failure with the following RELAX NG file when validating the same sample XML file from the original bug report. I'll attach both as files as well. In general, my real-life test cases are simply more complex versions of these files, with multiple alternate definitions of an element, often with 5 or more variations in terms of which sub-elements are allowed and in which order.

================================ RELAX NG schema ============
<?xml version="1.0" ?>
<!DOCTYPE grammar>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="bug">
      <choice>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="other"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
      </choice>
    </element>
  </start>
</grammar>

================================ XML sample file ============
<?xml version="1.0" ?>
<!DOCTYPE bug>
<bug>
  <test>
    <title>hey</title>
    <content>there</content>
  </test>
</bug>
Created attachment 63408 [details] [review] same hack patch, for latest CVS version

Actually, it looks like the XML and RNG test files I updated a while back are sufficient to show the problem; they're the same as the ones I quote in the previous comment. With this hack patch applied, they still validate as expected.
Seems I fixed this when fixing #302836 earlier: the original test case validates as expected

paphio:~/XML -> cat tst2.rng
<?xml version="1.0" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="bug">
      <choice>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
          </element>
        </group>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
      </choice>
    </element>
  </start>
</grammar>
paphio:~/XML -> xmllint --relaxng tst2.rng tst.xml
<?xml version="1.0"?>
<bug>
  <test>
    <title>hey</title>
    <content>there</content>
  </test>
</bug>
tst.xml validates
paphio:~/XML ->

and the one with the extra 'other' element correctly indicates failure:

paphio:~/XML -> cat tst.rng
<?xml version="1.0" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="bug">
      <choice>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
        <group>
          <element name="test">
            <element name="title"> <text /> </element>
            <element name="other"> <text /> </element>
            <element name="content"> <text /> </element>
          </element>
        </group>
      </choice>
    </element>
  </start>
</grammar>
paphio:~/XML -> xmllint --relaxng tst.rng tst.xml
<?xml version="1.0"?>
<bug>
  <test>
    <title>hey</title>
    <content>there</content>
  </test>
</bug>
tst.xml:4: element content: Relax-NG validity error : Did not expect element content there
tst.xml fails to validate
paphio:~/XML ->

So it seems it was a dup of 302836; at least it's fixed in CVS now.

Thanks,
Daniel

*** This bug has been marked as a duplicate of 302836 ***
Sorry for the delay -- I finally had some time to take a look at this. I tried my test cases using 2.6.27, which looks like it has the fix for 302836 in it in the ChangeLog. As you note above, the test with the extra <other> element fails with the message "Did not expect element content there". Alas ... it should not fail, I think. Testing it with jing, it doesn't fail, and I don't see why the RelaxNG grammar would be invalid. It's just providing a choice between two definitions of the <test> element, one with and one without an <other> sub-element. So the test should pass, not fail, I'm afraid. Which means, I guess, that this can't be just a duplicate of 302836. So, I'm re-opening the bug report -- my apologies!
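The argument above can be checked by hand with a small derivative-style matcher. This is only a sketch under stated assumptions: it models just the child element names of `<test>` from the second schema (text content and the enclosing `<bug>` element are ignored), the tuple encoding and the names `elem`, `seq`, `choice`, `nullable`, `deriv` and `matches` are made up for this illustration, and it is not the code of libxml2 or jing.

```python
EMPTY = ("empty",)              # matches the empty sequence
NOT_ALLOWED = ("notallowed",)   # matches nothing


def elem(name):
    return ("elem", name)


def seq(p, q):
    return ("seq", p, q)


def choice(p, q):
    return ("choice", p, q)


def nullable(p):
    """True if pattern `p` matches the empty sequence of elements."""
    kind = p[0]
    if kind == "empty":
        return True
    if kind in ("notallowed", "elem"):
        return False
    if kind == "seq":
        return nullable(p[1]) and nullable(p[2])
    return nullable(p[1]) or nullable(p[2])  # choice


def deriv(p, name):
    """Pattern that remains after `p` has consumed one element `name`."""
    kind = p[0]
    if kind in ("empty", "notallowed"):
        return NOT_ALLOWED
    if kind == "elem":
        return EMPTY if p[1] == name else NOT_ALLOWED
    if kind == "choice":
        return choice(deriv(p[1], name), deriv(p[2], name))
    # seq: consume from the left part, or from the right part if left can be empty
    left = seq(deriv(p[1], name), p[2])
    if nullable(p[1]):
        return choice(left, deriv(p[2], name))
    return left


def matches(p, names):
    for name in names:
        p = deriv(p, name)
    return nullable(p)


# Children of <test> in the second schema: (title, content) | (title, other, content)
TEST_CHILDREN = choice(
    seq(elem("title"), elem("content")),
    seq(elem("title"), seq(elem("other"), elem("content"))),
)
```

Under this model, `matches(TEST_CHILDREN, ["title", "content"])` holds because the first alternative accepts it, regardless of what the second alternative requires. That is the behaviour the sample file relies on.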
RELAXNG is now correctly validating this sort of situation, but is reporting very much the wrong error.

<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="http://www.example.com/choice" xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="doc">
      <choice><ref name="option_2"/><ref name="option_1"/></choice>
    </element>
  </start>
  <define name="option_1">
    <attribute name="type"><value>content</value></attribute>
    <zeroOrMore><element name="something"><empty/></element></zeroOrMore>
  </define>
  <define name="option_2">
    <attribute name="type"><value>no-content</value></attribute>
  </define>
</grammar>

This grammar should lead to <doc type="content"><something/></doc> being valid, and <doc type="no-content"/> being valid, and <doc type="content"><something>Hah, not empty!</something></doc> being invalid. xmllint (compiled against libxml 20627) correctly reports these validity states, but for the third one:

$ xmllint --noout --relaxng sample.rng sample.xml
sample.xml:4: element something: Relax-NG validity error : Element doc has extra content: something
sample.xml fails to validate

This problem does not persist if the <choice> is collapsed down to remove the type attribute, or if the <zeroOrMore> is either removed or changed to <oneOrMore> -- in these cases, the correct error (unexpected text content in <something>) is reported.
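To make the three expected validity states concrete, here is a hand-written check for this one grammar, using only Python's standard `xml.etree.ElementTree`. It is a sketch, not a RELAX NG validator: `is_valid`, `local` and `blank` are hypothetical helpers, namespace checking is deliberately omitted (the grammar's `ns` attribute is ignored), and no error messages are produced.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix; namespaces are not checked in this sketch."""
    return tag.rsplit("}", 1)[-1]


def blank(text):
    """True for missing or whitespace-only text."""
    return text is None or text.strip() == ""


def is_valid(xml_text):
    doc = ET.fromstring(xml_text)
    # <doc> must carry exactly one attribute, named "type", and no text of its own.
    if local(doc.tag) != "doc" or set(doc.attrib) != {"type"}:
        return False
    if not blank(doc.text):
        return False
    kind = " ".join(doc.attrib["type"].split())  # whitespace-normalized value
    children = list(doc)
    if kind == "no-content":   # option_2: attribute only, no child elements
        return not children
    if kind == "content":      # option_1: zero or more empty <something/>
        return all(
            local(c.tag) == "something"
            and not c.attrib
            and len(c) == 0
            and blank(c.text)
            and blank(c.tail)
            for c in children
        )
    return False
```

For the three documents quoted above, this returns true, true and false respectively. The third is rejected because `<something>` contains text, which is the error one would hope to see reported instead of "Element doc has extra content".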
That's a very different issue. The error reporting is far from perfect, but I think the original bug of failing to validate (or not) the instances is correctly fixed.

Daniel