GNOME Bugzilla – Bug 122001
Incorrect behavior when resolving an unparsed entity using a catalog
Last modified: 2009-08-15 18:40:50 UTC
When resolving an unparsed entity whose system identifier is a relative URI, xsltproc returns an absolute URI resolved against the catalog-rewritten location of the file that declares the entity, instead of against that file's real URI. This behavior is counter-intuitive: catalogs should be transparent. Here's an example of the problem.

* My XML file:

  <?xml version="1.0"?>
  <!DOCTYPE para PUBLIC "-//Norman Walsh//DTD Website Full V2.4.0//EN"
    "http://docbook.sourceforge.net/release/website/2.4.0/website-full.dtd" [
  <!ENTITY % entities SYSTEM "http://www.vinc17.org/www.ent">
  %entities;
  ]>
  <para><olink targetdocent="local.index.en">test</olink></para>

* My XSLT file:

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="olink">
    <a href="{unparsed-entity-uri(@targetdocent)}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>

  </xsl:stylesheet>

* http://www.vinc17.org/www.ent is a file in which I define unparsed entities whose system identifiers are relative to http://www.vinc17.org/. For instance:

  <!ENTITY local.index.en SYSTEM "index.en.html" NDATA XML>

As I don't want to connect to http://www.vinc17.org/ just to generate the URI, I use a catalog with the following entry:

  <rewriteSystem systemIdStartString="http://www.vinc17.org/www.ent"
                 rewritePrefix="file:///home/lefevre/wd/www-new/www.ent"/>

(In fact, http://www.vinc17.org/www.ent doesn't even exist in reality, but the XSLT processor doesn't need to know that.) However, xsltproc then generates the following output:

  <?xml version="1.0"?>
  <a href="file:///home/lefevre/wd/www-new/index.en.html">test</a>

instead of:

  <?xml version="1.0"?>
  <a href="http://www.vinc17.org/index.en.html">test</a>
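The two results above differ only in which base URI the relative system identifier "index.en.html" is resolved against. A minimal sketch in Python, assuming a simplified model of a single rewriteSystem entry (prefix match, then prefix replacement); `rewrite_system` is a hypothetical helper, not part of libxml2 or libxslt, and real catalog resolvers do much more (multiple entries, longest-prefix selection, nextCatalog, delegation). Resolution uses the standard library's `urllib.parse.urljoin`, which follows RFC 3986, the successor of RFC 2396; both agree on a simple relative path like this one.

```python
from urllib.parse import urljoin


def rewrite_system(system_id, start_string, rewrite_prefix):
    """Hypothetical model of one rewriteSystem catalog entry:
    if system_id starts with start_string, replace that prefix."""
    if system_id.startswith(start_string):
        return rewrite_prefix + system_id[len(start_string):]
    return system_id


ENT_URI = "http://www.vinc17.org/www.ent"
LOCAL_PREFIX = "file:///home/lefevre/wd/www-new/www.ent"
RELATIVE_SYSTEM_ID = "index.en.html"  # from <!ENTITY local.index.en SYSTEM ...>

# The location actually read after catalog resolution.
retrieved = rewrite_system(ENT_URI, ENT_URI, LOCAL_PREFIX)

# Base taken from the retrieved (rewritten) resource: what xsltproc reports.
via_catalog_base = urljoin(retrieved, RELATIVE_SYSTEM_ID)

# Base taken from the original system identifier: what the reporter expected.
via_original_base = urljoin(ENT_URI, RELATIVE_SYSTEM_ID)

print(via_catalog_base)   # file:///home/lefevre/wd/www-new/index.en.html
print(via_original_base)  # http://www.vinc17.org/index.en.html
```

The catalog itself only changes where the bytes come from; the disagreement in this bug is about whether that change should also move the base URI.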
Well, I disagree. Your entity's system identifier is a URI reference, and a relative URI reference is resolved against a base, which is the URI of the resource containing it. When that resource is fetched through a catalog, I really think the base becomes the URI of the local copy, not the URI the entity would have had without catalog resolution. I have heard this discussed and stated clearly in W3C groups, but I'm unable to find normative prose about it (though RFC 2396 really insists that the base comes from the resource itself). Another example is an HTTP redirect: the inherited base is really that of the redirected resource, not the initial URI. If you really want a definitive answer, the best thing is to ask Norman Walsh. He may say I'm wrong, but I doubt it :-)

Daniel
Well, this basically means that catalogs are not just a caching system but a redirection system. This is quite annoying: if I want the real URI to be returned, I can't use catalogs, and I then need a working Internet connection (which is not always available, in particular with a laptop). So, to solve the problem, I can see three possibilities (as a wishlist):

1) libxslt implements an option to treat catalogs either as a redirection system or as a caching system;
2) libxslt implements a separate caching system;
3) libxslt uses a separate, existing caching system.

What do you think? Since the XML/XSLT standards aren't really clear (and nothing says what catalogs are meant to be), (1) would be a good idea IMHO. In any case, I think this should be mentioned in the documentation.
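Possibility (1) amounts to choosing which URI serves as the base after a catalog rewrite. A minimal sketch, assuming a hypothetical `CatalogMode` switch and `resolve_entity_uri` helper; neither libxslt nor libxml2 is claimed to provide anything like them. In REDIRECT mode the base is the location actually read (the behavior described in this bug); in CACHE mode the local file is treated as a copy, so the original system identifier stays the base.

```python
from enum import Enum
from urllib.parse import urljoin


class CatalogMode(Enum):
    # Hypothetical modes for illustration only.
    REDIRECT = "redirect"  # base = URI actually retrieved (current behavior)
    CACHE = "cache"        # base = original system identifier


def resolve_entity_uri(relative_id, original_uri, retrieved_uri, mode):
    """Resolve a relative system identifier against the base selected
    by the hypothetical catalog mode."""
    base = retrieved_uri if mode is CatalogMode.REDIRECT else original_uri
    return urljoin(base, relative_id)


original = "http://www.vinc17.org/www.ent"
retrieved = "file:///home/lefevre/wd/www-new/www.ent"

as_redirect = resolve_entity_uri("index.en.html", original, retrieved,
                                 CatalogMode.REDIRECT)
as_cache = resolve_entity_uri("index.en.html", original, retrieved,
                              CatalogMode.CACHE)
print(as_redirect)  # file:///home/lefevre/wd/www-new/index.en.html
print(as_cache)     # http://www.vinc17.org/index.en.html
```

Note that a cache mode requires the resolver to keep the original system identifier alongside the rewritten location, since the base can no longer be derived from what was actually opened.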
Confirmed by Norm.

Daniel