GNOME Bugzilla – Bug 148326
xsltproc fails to process standalone parsed external entities
Last modified: 2009-08-15 18:40:50 UTC
When a document passed to xsltproc refers to a parsed external entity that is declared as standalone, xsltproc fails. Consider the following simple document, called a.xml:

    <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
    <!DOCTYPE a [
    <!ENTITY b SYSTEM "b.xml">
    ]>

    <a>
    &b;
    </a>

and the document b.xml:

    <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
    <b/>

When this document is processed with any XSLT stylesheet, the following error is generated:

    $ xsltproc a.xsl a.xml
    b.xml:1: parser error : parsing XML declaration: '?>' expected
    <?xml version="1.0" encoding="utf-8" standalone="yes" ?>
                                         ^
    a.xml:7: error: Failure to process entity b
    &b;
       ^
    a.xml:7: parser error : Entity 'b' not defined
    &b;
       ^
    unable to parse a.xml

If b.xml is not specified as standalone, xsltproc processes it fine as a parsed external entity.
If you indicate that the document is standalone while it references external parsed entities, this is a well-formedness error; the processing *must* stop, as the document is not XML. I am 100% sure of this. The behaviour is normal, in accordance with the XML 1.0 specification, and again you should rather read and understand the specs before reporting erroneous bugs. If it was working for you before with other tools, those tools were just not compliant with the specification.

Daniel
Sorry, but I have to disagree with you.

First, let me make a slight change to my example above, to remove a source of confusion. Let a.xml be:

    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE a [
    <!ENTITY b SYSTEM "b.xml">
    ]>

    <a>
    &b;
    </a>

(Note that there is no standalone declaration there anymore.) Nevertheless, that is not the heart of the problem, which lies with b.xml. b.xml is marked as standalone, and it does _not_ refer to anything external. Thus, it is a correct standalone document. Please note the error message given by xsltproc above: it reports an error for b.xml, not for a.xml.

Second, the standalone declaration in the XML specification states that there are no external _markup declarations_ for an XML file. In my example there are no external markup declarations; in fact, there are no markup declarations at all, neither in the original a.xml, nor in the new a.xml, nor in b.xml. An external parsed entity is not a markup declaration. Thus, all the sample XML files presented here are valid XML files.

Moreover, there is a validity constraint in the XML specification for the standalone document declaration, but the above error is also given if the --novalid flag is used to invoke xsltproc. This flag, according to the documentation, turns off validation. So even if there were external markup declarations (which there aren't), the files should still pass with --novalid. (Actually, the validity constraint is not even that picky; see the XML specification.)

Before being 100% sure, please read up on the XML specification. In this case, section 2.9, Standalone Document Declaration: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-rmd
In your initial example a.xml was not well-formed. Processing could not work; that's normal. The fact that you're hitting the problem on b first instead of on a is just an implementation issue.

In the second example it's still not well-formed, because your external parsed entity is not well-formed. You're reading the wrong part of the spec: http://www.w3.org/TR/2004/REC-xml-20040204/#TextEntities

"External parsed entities SHOULD each begin with a text declaration."

    [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

A text declaration is *not* an XML declaration: it doesn't allow standalone, and the encoding is mandatory instead of optional. Again, libxslt is right (or rather libxml2); it reports the failure correctly.

W.r.t. --novalid, you are confusing validity checking and reporting with well-formedness checking. The first one consists of checking that the document conforms to the DTD. It's optional; libxml2 does not do it by default, and libxslt doesn't either, since the XSLT (or rather XPath) spec does not require it. The errors you are seeing are well-formedness errors, i.e. fatal failures to conform to the XML grammar. They must be checked, and the parser is forbidden to recover from them.

The spec is not trivial. Try to get a full grasp of it. libxml2 is compliant, really, and I know what I'm talking about since I'm in the W3C group which maintains the specification. There might be bugs, but not trivial ones.

Daniel
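The difference between the two productions can be checked mechanically. The following is a minimal Python sketch, not part of libxml2 or xsltproc: it transcribes production [77] TextDecl and production [23] XMLDecl into regular expressions, assuming the VersionNum of '1.0' used by the 2004 edition of the spec. The helper names (is_text_decl, is_xml_decl) are invented for illustration.

```python
import re

# Building blocks, transcribed from the XML 1.0 grammar (2004 edition).
S = r"[ \t\r\n]+"                      # [3]  S
OPT_S = r"[ \t\r\n]*"                  #      S?
EQ = OPT_S + "=" + OPT_S               # [25] Eq
ENC_NAME = r"[A-Za-z][A-Za-z0-9._-]*"  # [81] EncName

# [24] VersionInfo (VersionNum assumed to be exactly '1.0')
VERSION_INFO = S + "version" + EQ + r"""(?:'1\.0'|"1\.0")"""
# [80] EncodingDecl
ENCODING_DECL = S + "encoding" + EQ + "(?:'" + ENC_NAME + "'|\"" + ENC_NAME + "\")"
# [32] SDDecl
SD_DECL = S + "standalone" + EQ + r"""(?:'(?:yes|no)'|"(?:yes|no)")"""

# [77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
TEXT_DECL = re.compile(
    r"<\?xml" + "(?:" + VERSION_INFO + ")?" + ENCODING_DECL + OPT_S + r"\?>"
)
# [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
XML_DECL = re.compile(
    r"<\?xml" + VERSION_INFO + "(?:" + ENCODING_DECL + ")?"
    + "(?:" + SD_DECL + ")?" + OPT_S + r"\?>"
)


def is_text_decl(decl: str) -> bool:
    """True if decl is a text declaration (allowed in an external parsed entity)."""
    return TEXT_DECL.fullmatch(decl) is not None


def is_xml_decl(decl: str) -> bool:
    """True if decl is an XML declaration (allowed in a document entity)."""
    return XML_DECL.fullmatch(decl) is not None


if __name__ == "__main__":
    b_decl = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
    print(is_xml_decl(b_decl), is_text_decl(b_decl))
```

The first line of b.xml matches XMLDecl but not TextDecl, which is consistent with the parser error reported for b.xml:1.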
Indeed, b.xml fails the TextDecl rule; that's the problem. Thanks for pointing it out. (But then again, this has nothing to do with the standalone validity constraint mentioned in your first reply. On the contrary, standalone documents are free to reference external parsed entities.)
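One way to make b.xml acceptable as an external parsed entity is to drop the standalone pseudo-attribute from its leading declaration, which leaves a declaration that fits the TextDecl rule. A rough Python sketch of that fix follows, assuming the entity text begins directly with the declaration. The function name strip_standalone is hypothetical and not part of any libxml2 tool. If the declaration has no encoding, the result is still not a valid text declaration. Since the spec only says entities SHOULD begin with a text declaration, removing the declaration entirely is another option.

```python
import re

# A leading '<?xml ... ?>' declaration at the very start of the entity text.
LEADING_DECL = re.compile(r"<\?xml\b[^?]*\?>")

# The standalone pseudo-attribute together with the whitespace before it.
SD_DECL = re.compile(
    r"""[ \t\r\n]+standalone[ \t\r\n]*=[ \t\r\n]*(?:'(?:yes|no)'|"(?:yes|no)")"""
)


def strip_standalone(entity_text: str) -> str:
    """Return entity_text with 'standalone=...' removed from its leading declaration.

    Text that does not start with a '<?xml ... ?>' declaration is returned unchanged.
    """
    match = LEADING_DECL.match(entity_text)
    if match is None:
        return entity_text
    decl = SD_DECL.sub("", match.group(0), count=1)
    return decl + entity_text[match.end():]


if __name__ == "__main__":
    b_xml = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>\n<b/>\n'
    print(strip_standalone(b_xml), end="")
```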