Bug 303290 - xml catalog prefer="public" not supported
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: catalog
Version: 2.6.19
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Reported: 2005-05-06 19:11 UTC by Bob Stayton
Modified: 2021-07-05 13:26 UTC



Description Bob Stayton 2005-05-06 19:11:57 UTC
When XML catalogs are used, and the catalog has a prefer="public" on the 
catalog element or the group element, it doesn't work.  That is, the catalog 
is not consulted when a System ID works and there is a catalog entry for the 
PUBLIC id.  Also, if the System ID does not work, if the catalog is consulted, 
the match on system ID is preferred.

Here is a catalogtest.xml:
<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS/DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

<group prefer="public">

  <public
      publicId="-//SAGEHILL//General Entities//EN"
      uri="mysection2.ent"/>

  <system
      systemId="bogus.ent"
      uri="mysection2.ent"/>

  <system
      systemId="bogus1.ent"
      uri="mysection1.ent"/>

</group>


</catalog>

Here is a main test document entitytest.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article
[
<!ELEMENT article (section*)>
<!ELEMENT section (title?)>
<!ELEMENT title (#PCDATA)>
<!ENTITY good PUBLIC "-//SAGEHILL//General Entities//EN" "missing.ent">
<!ENTITY good2 PUBLIC "-//SAGEHILL//General Entities//EN" "bogus.ent">
<!ENTITY bad PUBLIC "-//SAGEHILL//General Entities//EN" "mysection1.ent">
<!ENTITY bad2 PUBLIC "-//SAGEHILL//General Entities//EN" "bogus1.ent">
]>
<article>
&good;

&good2;

&bad;

&bad2;
</article>

Here is one system entity file mysection1.ent:

<?xml version="1.0" encoding="utf-8"?>
<section>
<title>section 1 title</title>
</section>

Here is a second system entity file mysection2.ent:
<?xml version="1.0" encoding="utf-8"?>
<section>
<title>section 2 title</title>
</section>

Here is the xmllint version (on Windows XP):
c:\xml\libxml\xmllint.exe: using libxml version 20619CVS2407
   compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas Modules

Here is the command I tested with:
XML_DEBUG_CATALOG=1 \
XML_CATALOG_FILES="catalogtest.xml" \
xmllint --noent --valid entitytest.xml  > result.xml

Here is the result.xml output:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article [
<!ELEMENT article (section)*>
<!ELEMENT section (title)?>
<!ELEMENT title (#PCDATA)>
<!ENTITY good PUBLIC "-//SAGEHILL//General Entities//EN" "missing.ent">
<!ENTITY good2 PUBLIC "-//SAGEHILL//General Entities//EN" "bogus.ent">
<!ENTITY bad PUBLIC "-//SAGEHILL//General Entities//EN" "mysection1.ent">
<!ENTITY bad2 PUBLIC "-//SAGEHILL//General Entities//EN" "bogus1.ent">
]>
<article>

<section>
<title>section 2 title</title>
</section>

<section>
<title>section 2 title</title>
</section>

<section>
<title>section 1 title</title>
</section>

<section>
<title>section 1 title</title>
</section>

</article>

If prefer="public" were working, then all of these should say
"section 2 title".
Comment 1 Heiko Oberdiek 2005-05-11 11:58:22 UTC
Hello,

As far as I understand this, libxml2 tries to implement the resolution
algorithm, specified in:
  http://www.oasis-open.org/committees/entity/spec-2001-08-06.html

In your example, in each case a publicId and a systemId are provided.
So the algorithm  "7.1.2. Resolution of External Identifiers" has to
be applied:

1. initial catalog setup, clear.
2. for cases "good2" and "bad2" the system id exists in the catalog:
   "bogus.ent" and "bogus1.ent", which redirect to "mysection2.ent"
   and "mysection1.ent".
3. catalog does not contain rewriteSystem.
4. catalog does not contain delegateSystem.
5. public id is provided for &good; and &bad;, thus in both
   cases the rule for <public> matches and the resolution result
   should be "mysection2.ent" in *both* cases.

The real problem is that this is not the case for &bad;.
The output of XML_DEBUG_CATALOG shows that the resolution algorithm
is never called for this entity. The reason is that xmlIO.c's
xmlDefaultExternalEntityLoader does not apply a catalog lookup
for a system identifier that exists as a file:

    /*
     * If the resource doesn't exists as a file,
     * try to load it from the resource pointed in the catalogs
     */

Thus one of the purposes and advantages of using catalogs is
lost: redirecting or rewriting the location of a resource by means
of a catalog file. I consider this the real bug, located in the
entity loader functions of xmlIO.c, that your example files
expose.

The "prefer" attribute, however, is only used for delegatePublic
in step 6, which is never reached in this scenario because the
resolution was successful in previous steps. Thus the value of
"prefer" does not matter here.

See also "4.1.1. The prefer attribute" of the specification.

Yours sincerely
  Heiko <oberdiek@uni-freiburg.de>
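
The resolution order walked through in this comment can be sketched in a few lines of Python. This is only a simplified model of steps 2 and 5 as described above, using the entries from catalogtest.xml; it is not libxml2 code, the function name is made up for illustration, and it ignores "prefer", rewrite and delegate entries entirely.

```python
# Simplified model of the resolution order described in this comment.
# Not libxml2 code; resolve_external_id is a hypothetical name.

# <system> and <public> entries taken from catalogtest.xml in the report.
SYSTEM_ENTRIES = {
    "bogus.ent": "mysection2.ent",
    "bogus1.ent": "mysection1.ent",
}
PUBLIC_ENTRIES = {
    "-//SAGEHILL//General Entities//EN": "mysection2.ent",
}


def resolve_external_id(public_id, system_id):
    """Return the catalog URI for an external identifier, or None."""
    # Step 2: a matching <system> entry ends the resolution.
    if system_id is not None and system_id in SYSTEM_ENTRIES:
        return SYSTEM_ENTRIES[system_id]
    # Steps 3 and 4 (rewriteSystem, delegateSystem) are skipped because
    # the test catalog contains neither.
    # Step 5: otherwise a matching <public> entry is used.
    if public_id is not None and public_id in PUBLIC_ENTRIES:
        return PUBLIC_ENTRIES[public_id]
    return None


PUBLIC_ID = "-//SAGEHILL//General Entities//EN"
# Entity name -> system identifier, as declared in entitytest.xml.
ENTITIES = {
    "good": "missing.ent",
    "good2": "bogus.ent",
    "bad": "mysection1.ent",
    "bad2": "bogus1.ent",
}
RESULTS = {
    name: resolve_external_id(PUBLIC_ID, system_id)
    for name, system_id in ENTITIES.items()
}
```

Under this model only &bad2; resolves to mysection1.ent (through its <system> entry); &bad; resolves to mysection2.ent, which is the case where the reported xmllint output differs.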
Comment 2 Daniel Veillard 2005-05-11 13:16:08 UTC
Thanks for this analysis.
I don't think I will change libxml2 behaviour. Forcing a round trip
through the catalog for a local resource directly referenced by a
system URI is more likely to cause trouble, because it:
   - is unexpected behaviour
   - adds catalog parsing cost to every single file reference
     (any time one parses a single file with libxml2, the catalog
      would have to be parsed first)
so I think the current behaviour makes sense as the general default case.

The entity resolver in xmlDefaultExternalEntityLoader can be
overridden trivially with a single API call, and applications can very
easily force a different behaviour.
I'm not 100% sure the current behaviour can be considered an XML Catalog
failure, based on the Abstract of the spec which defines the intended 3
main use cases. 
There isn't really a conformance section in the spec. 
  http://www.oasis-open.org/committees/entity/spec-2001-08-06.html
 
Daniel
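
For readers following the thread, the two loader policies being compared can be modelled roughly as below. This is a Python sketch under stated assumptions, not the libxml2 implementation: the function names, the catalog_resolve callback and the demo helpers are all hypothetical, and file existence is injected so the example does not touch the disk.

```python
import os


def load_file_first(public_id, system_id, catalog_resolve, exists=os.path.exists):
    """Model of the behaviour described in this report: an existing file wins."""
    if exists(system_id):
        # The catalog is never consulted for a resource that exists.
        return system_id
    return catalog_resolve(public_id, system_id) or system_id


def load_catalog_first(public_id, system_id, catalog_resolve):
    """Model of the requested behaviour: the catalog is always asked first."""
    return catalog_resolve(public_id, system_id) or system_id


# Hypothetical stand-ins for a catalog lookup and for the file system.
def demo_resolve(public_id, system_id):
    if public_id == "-//SAGEHILL//General Entities//EN":
        return "mysection2.ent"
    return None


def demo_exists(path):
    return path in {"mysection1.ent", "mysection2.ent"}


PUBLIC_ID = "-//SAGEHILL//General Entities//EN"
# The &bad; entity from the report: its system identifier exists on disk.
FILE_FIRST = load_file_first(PUBLIC_ID, "mysection1.ent", demo_resolve, exists=demo_exists)
CATALOG_FIRST = load_catalog_first(PUBLIC_ID, "mysection1.ent", demo_resolve)
```

In this model the file-first policy returns mysection1.ent for &bad; and the catalog-first policy returns mysection2.ent. The cost argument made above corresponds to catalog_resolve being called on every load in the second policy.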
 
Comment 3 Bob Stayton 2005-05-18 17:32:16 UTC
I think this bug should be reopened.  I respectfully disagree with your 
assessment of the expected default behavior.  If a catalog file is specified, 
it must be possible for the catalog to override a hardcoded reference 
specified in a file, even if it is to a local resource that exists.  That 
means every reference goes through the catalog first to see if it should be 
remapped to a new location. 

I asked Norman Walsh, the author of the XML Catalog specification, about it, 
and he agrees that all references should go through the catalog first.  That 
is the behavior of the Apache Java resolver classes (which is not surprising 
since he wrote them).

I agree that the specification could be more clear on its conformance 
standards.  For example, I have another question in to him about how the spec 
describes the resolution of the "prefer" attribute.
Comment 4 Daniel Veillard 2005-05-18 21:59:42 UTC
"If a catalog file is specified"
On Linux a catalog file is *always* present.
This was done on the assumption that this would not cost
penalty for simple local file access.
If this is the case I can't revisit that decision, but
I'm not ready to inflict double parsing cost because of this
while the limited change in behaviour can be trivially overridden
using the public API, for the specific applications which may 
really need this.

A lot of applications use libxml2 to just parse a single config file.
Forcing all of them to parse 2 files even if they don't need a
catalog sounds like far too much of a cost for something which has never
raised a real error reported by any application. This does not sound
reasonable to me. What application has trouble with this?
Why can't that application be modified in a trivial way if it really
needs that support?
If I had known about this aspect of the spec before implementing and
pushing for it, I would first have objected to this aspect of the
spec and then would also have made catalog support in libxml2 at
user option and not the default.

  I stand on my position unless you can provide a reasonable justification
for the inherent cost of what you're suggesting.

Daniel
Comment 5 Bob Stayton 2005-05-18 22:46:00 UTC
I agree that not all libxml2 applications should be forced to use a catalog if 
it is simply present on a system.  I would like to change this request so that 
the default behavior of libxml2 with regards to catalogs is not changed, but 
that the two applications xmllint and xsltproc adopt this behavior.  You are 
correct that the presence of a catalog file on a system (like 
Linux's /etc/catalog) should not be the determining factor of whether the 
catalog is used.  Rather, if either application specifies a catalog file 
through the XML_CATALOG_FILES environment variable, then it should be used for 
all lookups needed by that application.  I believe that is the expected 
behavior of users of those two applications.

  
Comment 6 Bob Stayton 2005-05-19 07:00:06 UTC
I thought perhaps a specific use case would make this request more clear.

The DocBook XML DTD has several modules.  One of the modules is a placeholder 
designed to contain user-defined entities.  Here is how it is declared in the 
docbookx.dtd file:

<!ENTITY % dbgenent PUBLIC
"-//OASIS//ENTITIES DocBook Additional General Entities V4.4//EN"
"dbgenent.mod">
%dbgenent;

The dbgenent.mod file that ships with the DTD is empty, because it is just a 
placeholder.  One could edit the original file, but only if one has write 
access to the file on the system.  But that still leaves you with just one 
collection of user-defined entities.

My application requires reusing the same DocBook files with different 
conditional text.  One way to implement conditional text is with general 
entities.  For example, one could define a companyname entity, with the idea 
of substituting the actual company name at runtime.  So I would like to be 
able to define several collections of entities, with each collection 
containing the same entity names but with different expansion text.  I would 
like to be able to select the collection at runtime.  

So my runtime specifies a catalog file that maps the PUBLIC id 
"-//OASIS//ENTITIES DocBook Additional General Entities V4.4//EN"
to one of my entity collection files.  By choosing a different catalog at 
runtime, I can choose a different collection of entity values.  Unfortunately, 
this doesn't work in xmllint and xsltproc.  Because the default dbgenent.mod 
file exists, the catalog is not consulted, and I don't get my entities.

As I stated in my earlier message, I believe that users of these two 
applications expect all of their system references to be looked up in their 
catalog, and falling back to the default resource only if no catalog entry 
matches.  Even if you don't want to make this the default behavior for these 
two applications, I believe there should be a command line option to specify 
this behavior.  
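
One way to produce the per-run catalogs described in this use case is to generate them. The sketch below uses only the Python standard library; the function name build_catalog and the file name entities-acme.ent are hypothetical, and only the public identifier and the catalog namespace are taken from this report.

```python
import xml.etree.ElementTree as ET

# Namespace used by catalogtest.xml earlier in this report.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
# Public identifier of the DocBook placeholder module quoted above.
DBGENENT_PUBLIC_ID = (
    "-//OASIS//ENTITIES DocBook Additional General Entities V4.4//EN"
)


def build_catalog(entity_file):
    """Return catalog XML mapping the dbgenent public ID to entity_file."""
    # Serialize the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    ET.SubElement(
        catalog,
        f"{{{CATALOG_NS}}}public",
        {"publicId": DBGENENT_PUBLIC_ID, "uri": entity_file},
    )
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical entity collection chosen for one particular run.
CATALOG_XML = build_catalog("entities-acme.ent")
```

The resulting text could be written to a file and named in XML_CATALOG_FILES, as in the test command from the original description. As this report explains, xmllint and xsltproc would still load the existing dbgenent.mod instead of the mapped file.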

Comment 7 Bob Stayton 2005-05-25 08:43:10 UTC
I see that the status of this bug is still NEEDINFO.  Is there any other 
information that I can supply in order to get some resolution?  As I said, I'm 
only looking to change the behavior of the applications xmllint and xsltproc.
Comment 8 Daniel Veillard 2005-05-25 09:33:24 UTC
Hum, no, it should be switched to NEW,

Daniel
Comment 9 GNOME Infrastructure Team 2021-07-05 13:26:53 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.