GNOME Bugzilla – Bug 340316
Schema validation doesn't work when using entities
Last modified: 2021-07-05 13:24:08 UTC
Please describe the problem:
Validating XML files that contain entities fails unless --noent is passed to xmllint.

Steps to reproduce:
Use the following schema, bug.xsd:

    <?xml version="1.0"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema">
      <element name="test" type="string"/>
    </schema>

to validate this document, bug.xml:

    <?xml version="1.0"?>
    <!DOCTYPE test [
    <!ENTITY nbsp "&#160;">
    ]>
    <test>&nbsp;</test>

Actual results:
If you run

    xmllint --noout --schema bug.xsd bug.xml

the result is

    Unimplemented block at xmlschemas.c:27500
    bug.xml:6: element test: Schemas validity error : Element 'test': Element content is not allowed, because the type definition is simple.
    bug.xml fails to validate

Expected results:
The document should validate.

Does this happen every time?
Yes.

Other information:
If you pass the --noent option, the document validates.
Created attachment 64628: Testcase
The processor currently does not handle entity references; they must be expanded before the document is presented to the schema processor. I don't know what mechanism the spec intends here, so it would be great if someone could search around a bit on this issue or, even better, ask on the mailing lists at xml-dev@lists.xml.org, xmlschema-dev@w3.org, etc.
XML Schema Part 0 has an appendix about entities: http://www.w3.org/TR/xmlschema-0/#usingEntities It says that entities should be resolved before validation. So xmllint should apply the effect of the --noent option whenever it validates against a schema. The document that xmllint outputs, however, should still contain the unresolved entities unless --noent is actually given.
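As a rough illustration of the primer's advice, here is a sketch in Python's standard library rather than libxml2, so none of it is xmllint's actual code. The expat-based parser behind xml.etree.ElementTree expands internal entities while parsing, so any check that runs afterwards only sees plain character data. The document is modeled on bug.xml from the report, and the helper name is_simple_content is hypothetical; it merely stands in for the type="string" rule in bug.xsd.

```python
import xml.etree.ElementTree as ET

# Document modeled on bug.xml: the entity is declared in the internal subset.
BUG_XML = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE test [\n'
    b'<!ENTITY nbsp "&#160;">\n'
    b']>\n'
    b'<test>&nbsp;</test>\n'
)


def is_simple_content(element):
    """Hypothetical stand-in for the string type rule: no child elements."""
    return len(element) == 0


# The parser substitutes &nbsp; while parsing, so the tree holds only text.
root = ET.fromstring(BUG_XML)

if is_simple_content(root):
    print("content is simple:", repr(root.text))
else:
    print("element content is not allowed")
```

Because the substitution happens inside the parser, the check never has to know that entities exist, which is essentially what --noent achieves for the schema processor.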
Ah, it was in the primer. I'm not sure how to do this, since I'm not familiar with entities. What should we do if the entity contains markup? Should we dynamically add nodes to the instance tree in that case?
I'd say yes, but I don't think I'm qualified to answer that question. For me the whole problem is solved using the --noent option. I just mentioned the issue because currently the error message you get when using entities seems completely bogus.
:-) True; we get the crazy message because the tree traversal breaks when it hits the XML_ENTITY_NODE and XML_ENTITY_REF_NODE handling code in xmlSchemaVDocWalk(). As a result, the element "test" is processed a second time, as a child of itself. I think it would be better to raise an internal error here.
We'll raise an internal error and stop the validation now if an entity is found in the instance document. The error message will invite the user to substitute entities before validation. Committed to CVS, xmlschemas.c, rev. 1.200. I think we should leave this bug open, as we might really need to perform automatic entity substitution in the future - but I'm not sure here. Thanks for the report!
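The fail-fast behaviour described above can be sketched roughly as follows. This is an illustrative model in Python, not the code committed to xmlschemas.c: the Node class, the node-kind constants, InternalValidationError and walk are all hypothetical names, only loosely modeled on libxml2's element and entity-reference node types.

```python
from dataclasses import dataclass, field

# Hypothetical node kinds, loosely modeled on libxml2's node types.
ELEMENT = "element"
ENTITY_REF = "entity_ref"


@dataclass
class Node:
    kind: str
    name: str
    children: list = field(default_factory=list)


class InternalValidationError(Exception):
    """Raised when the walker meets a node it cannot validate."""


def walk(node, visited=None):
    """Depth-first walk that stops at the first unexpanded entity reference."""
    if visited is None:
        visited = []
    if node.kind == ENTITY_REF:
        # Stop instead of continuing with a traversal that would go wrong.
        raise InternalValidationError(
            f"entity reference '&{node.name};' found; "
            "substitute entities before validation"
        )
    visited.append(node.name)
    for child in node.children:
        walk(child, visited)
    return visited


# An instance tree that still contains an unexpanded &nbsp; reference.
tree = Node(ELEMENT, "test", [Node(ENTITY_REF, "nbsp")])
try:
    walk(tree)
    outcome = "validated"
except InternalValidationError as exc:
    outcome = f"internal error: {exc}"
print(outcome)
```

Stopping at the first entity reference swaps the misleading "Element content is not allowed" report for a message that tells the user what to do next.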
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.