GNOME Bugzilla – Bug 790226
Update Entities' URLs to fix FTBFS in offline environment
Last modified: 2018-04-26 12:57:39 UTC
Created attachment 363417: patch to update URLs

As reported in https://github.com/GNOME/gnumeric/pull/1

Outdated links to the entity documents cause Gnumeric to FTBFS in an offline environment (even when the "w3c-sgml-lib" package is installed):

~~~~
Error: Could not parse document:
connection refused
..//C/gnumeric.xml:51: I/O warning : failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent"
%isopub;
        ^
%isopub;
        ^
connection refused
..//C/gnumeric.xml:56: I/O warning : failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isonum.ent"
%isonum;
        ^
%isonum;
        ^
connection refused
..//C/gnumeric.xml:61: I/O warning : failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isogrk1.ent"
%isogrk1;
         ^
~~~~

The attached patch updates the URLs to the canonical locations of the entity documents.
http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent, http://www.oasis-open.org/docbook/xml/4.5/ent/isonum.ent and http://www.oasis-open.org/docbook/xml/4.5/ent/isogrk1.ent appear to work just fine. I see no point in changing them. Since we are using docbook to generate the documentation, I think it is cleaner to stay with the versions of these files included in docbook.
These links are only used when the documentation is built, so having them work only online seems acceptable to me.
If you look inside the .ent documents you will see that they refer to themselves by URLs (from www.w3.org), which are the ones I suggest using. As for an online build, there is no such option in Debian. Do you suggest not building the documentation at all? Offline builds are not just more secure, they are also more reliable (and reproducible) because the build does not depend on connectivity or on the availability of external resources. I had to introduce this patch downstream in Debian in order to be able to build Gnumeric...
Andreas: does anything break if we change?
Nothing should break if we change. In fact the full specification is:

<!ENTITY % isopub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN//XML" "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent">

where "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent" is the system identifier and "ISO 8879:1986//ENTITIES Publishing//EN//XML" is the public identifier. The system identifier should only be used if the resource is not locatable using the public identifier. In an offline environment where "ISO 8879:1986//ENTITIES Publishing//EN//XML" is available, the system identifier should not be used.

To quote from "Installing And Using An XML/SGML DocBook Editing Suite":

"If the processing tools fail to find a mapping from the PUBLIC identifier to a SYSTEM identifier in the catalog file(s) they will fall back to using the SYSTEM identifier specified in the document."
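
The fallback described in that quote can be illustrated with a small sketch. This is not how libxml2 or any real resolver is implemented: the function name `resolve_entity` and the trimmed-down `SAMPLE_CATALOG` are hypothetical, and the lookup ignores the full OASIS XML Catalogs rules (the `prefer` setting, delegation, `nextCatalog`, identifier normalization). It only shows why a missing catalog entry ends in a network fetch.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down catalog modelled on entries quoted in this thread.
SAMPLE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//ENTITIES Publishing//EN//XML"
          uri="REC-xml-entity-names-20100401/isopub.ent"/>
  <system systemId="http://www.w3.org/2003/entities/2007/isopub.ent"
          uri="REC-xml-entity-names-20100401/isopub.ent"/>
</catalog>
"""


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def resolve_entity(catalog_xml, public_id, system_id):
    """Return a catalog URI for the entity, or the system identifier unchanged.

    Simplified order (an assumption made for illustration): public entries,
    then system entries, then fall back to the document's system identifier.
    """
    entries = [(_local_name(e.tag), e) for e in ET.fromstring(catalog_xml).iter()]
    for name, entry in entries:
        if name == "public" and entry.get("publicId") == public_id:
            return entry.get("uri")
    for name, entry in entries:
        if name == "system" and entry.get("systemId") == system_id:
            return entry.get("uri")
    # Nothing matched locally: the caller would have to fetch this URL,
    # which is what fails in an offline build.
    return system_id
```

With this catalog, the public identifier "ISO 8879:1986//ENTITIES Publishing//EN//XML" has no entry, so `resolve_entity` hands back the oasis-open.org URL unchanged, whereas "-//W3C//ENTITIES Publishing//EN//XML" resolves to a local file.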
Nothing will break, of course. I just want to point out that if you open the document http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent you won't find "http://www.oasis-open.org" inside, only the canonical system identifier: http://www.w3.org/2003/entities/iso8879/isopub.ent. I suppose it would be better to use the very system identifier given in the document. At least we have one good reason to do so...
"w3c-sgml-lib" on my Linux Mint contains three isopub.ent files, two of them identical:

-rw-r--r-- 1 root root 6961 Mar 14 2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-MathML3-20101021/isopub.ent
-rw-r--r-- 1 root root 6961 Mar 14 2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xml-entity-names-20100401/isopub.ent
-rw-r--r-- 1 root root 6594 Mar 14 2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/XX-MathML2-20031104/iso8879/isopub.ent

# grep isopub /usr/share/xml/w3c-sgml-lib/schema/dtd/catalog.xml
<public publicId="-//W3C//ENTITIES Publishing//EN" uri="REC-MathML3-20101021/isopub.ent" />
<system systemId="http://www.w3.org/Math/DTD/mathml3/isopub.ent" uri="REC-MathML3-20101021/isopub.ent" />
<public publicId="-//W3C//ENTITIES Publishing//EN//XML" uri="REC-xml-entity-names-20100401/isopub.ent" />
<system systemId="http://www.w3.org/2003/entities/2007/isopub.ent" uri="REC-xml-entity-names-20100401/isopub.ent" />

None of these match the public id we're looking for, so there is no local match.

The public id seems to have changed between the 2003 version we refer to and the 2007 version that I end up with on my system. I think we would get a local match if we referred to the 2007 version of the file, but I don't see a way to make this work with both the 2003 and 2007 versions.

Dmitry: what do you get for the above grep command?
Dmitry: ping?
This is with w3c-sgml-lib/1.3-2:

~~~~
$ cd /usr/share/xml/w3c-sgml-lib/schema/dtd/ && ack isopub
xml.soc
93: "REC-MathML3-20101021/isopub.ent"
94:SYSTEM "http://www.w3.org/Math/DTD/mathml3/isopub.ent"
95: "REC-MathML3-20101021/isopub.ent"
1386: "REC-xml-entity-names-20100401/isopub.ent"
1387:SYSTEM "http://www.w3.org/2003/entities/2007/isopub.ent"
1388: "REC-xml-entity-names-20100401/isopub.ent"

XX-MathML2-20031104/xhtml-math11-f.dtd
8947:<!ENTITY % ent-isopub
8949: "iso8879/isopub.ent" >
8952: File isopub.ent produced by the XSL script characters.xsl

XX-MathML2-20031104/iso8879/isopub.ent
3: File isopub.ent produced by the XSL script characters.xsl

XX-MathML2-20031104/mathml2.dtd
2128:<!ENTITY % ent-isopub
2130: "iso8879/isopub.ent" >
2131:%ent-isopub;

catalog.xml
36: <public publicId="-//W3C//ENTITIES Publishing//EN" uri="REC-MathML3-20101021/isopub.ent" />
37: <system systemId="http://www.w3.org/Math/DTD/mathml3/isopub.ent" uri="REC-MathML3-20101021/isopub.ent" />
538: <public publicId="-//W3C//ENTITIES Publishing//EN//XML" uri="REC-xml-entity-names-20100401/isopub.ent" />
539: <system systemId="http://www.w3.org/2003/entities/2007/isopub.ent" uri="REC-xml-entity-names-20100401/isopub.ent" />

REC-MathML3-20101021/isopub.ent
3: File isopub.ent produced by the XSL script entities.xsl
32: System identifier: http://www.w3.org/2003/entities/2007/isopub.ent
39: <!ENTITY % isopub PUBLIC
41: "http://www.w3.org/2003/entities/2007/isopub.ent"
43: %isopub;

REC-MathML3-20101021/mathml3.dtd
76:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
77:%isopub;

REC-xml-entity-names-20100401/isopub.ent
3: File isopub.ent produced by the XSL script entities.xsl
32: System identifier: http://www.w3.org/2003/entities/2007/isopub.ent
39: <!ENTITY % isopub PUBLIC
41: "http://www.w3.org/2003/entities/2007/isopub.ent"
43: %isopub;

REC-xml-entity-names-20100401/w3centities.ent
46:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
47:%isopub;

REC-xml-entity-names-20100401/htmlmathml.ent
56:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
57:%isopub;
~~~~
A bit of a mess.

It's solvable for xmllint, which supports --nonet and --path. It does not appear easily solvable for itstool, which supports neither. I can fake --nonet by setting HTTP_PROXY=http://127.0.0.1. That will take care of itstool's urge to contact the outside world which, I assume, is a red flag all by itself. Then, of course, it fails.

Since both you and I have these public IDs available in the installed catalog:

"-//W3C//ENTITIES Publishing//EN"
"-//W3C//ENTITIES Publishing//EN//XML"

and since these probably are the modern-day versions of what we use, I am guessing we can update to one of those. Would that please everyone?
If not, we'll have to have configure look for an entry. That's just too much effort to throw at the problem.
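
For reference, the offline workaround mentioned above (xmllint with --nonet and --path, plus HTTP_PROXY to keep other tools off the network) might be scripted roughly as follows. This is a sketch only: the helper names `offline_env` and `xmllint_offline_command` are hypothetical, `--noout` is added just to suppress the dumped document, and whether a given tool honors HTTP_PROXY is taken from the comment above rather than verified here.

```python
import os


def offline_env(base_env=None, proxy="http://127.0.0.1"):
    """Copy an environment and point HTTP_PROXY at a dead local address.

    The proxy value is the one suggested in this thread; any HTTP fetch that
    honors the variable should then fail fast instead of leaving the machine.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HTTP_PROXY"] = proxy
    return env


def xmllint_offline_command(document, search_path=None):
    """Build an xmllint argument list that refuses network access.

    --nonet forbids fetching DTDs and entities over the network;
    --path supplies local directories to search for them instead.
    """
    cmd = ["xmllint", "--nonet", "--noout"]
    if search_path:
        cmd += ["--path", search_path]
    cmd.append(document)
    return cmd
```

The result can be passed to `subprocess.run(cmd, env=offline_env())`. A non-zero exit status then indicates that some entity could not be found locally, which is the condition the build needs to detect.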
OpenSuSE has "ISO 8879:1986//ENTITIES Publishing//EN//XML" and not the others. I.e., a configure solution is needed.
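
The logic such a configure check would need can be sketched as follows. This is an illustration, not the fix that was committed: `pick_public_id` and the candidate ordering are hypothetical, the two sample catalogs are reduced stand-ins for what the comments above report (the openSUSE `uri` is a placeholder), and chained catalogs (`nextCatalog`, `delegatePublic`) are ignored.

```python
import xml.etree.ElementTree as ET

# Public identifiers mentioned in this thread, in a hypothetical order of preference.
CANDIDATE_PUBLIC_IDS = [
    "ISO 8879:1986//ENTITIES Publishing//EN//XML",
    "-//W3C//ENTITIES Publishing//EN//XML",
    "-//W3C//ENTITIES Publishing//EN",
]

# Reduced stand-ins for the installed catalogs described in the comments.
DEBIAN_LIKE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//ENTITIES Publishing//EN"
          uri="REC-MathML3-20101021/isopub.ent"/>
  <public publicId="-//W3C//ENTITIES Publishing//EN//XML"
          uri="REC-xml-entity-names-20100401/isopub.ent"/>
</catalog>
"""

OPENSUSE_LIKE_CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="ISO 8879:1986//ENTITIES Publishing//EN//XML"
          uri="placeholder/isopub.ent"/>
</catalog>
"""


def catalog_public_ids(catalog_xml):
    """Collect every publicId declared by a <public> entry in the catalog."""
    ids = set()
    for element in ET.fromstring(catalog_xml).iter():
        # ElementTree tags look like '{namespace}public'; compare the local part.
        if element.tag.rsplit("}", 1)[-1] == "public" and element.get("publicId"):
            ids.add(element.get("publicId"))
    return ids


def pick_public_id(catalog_xml, candidates=CANDIDATE_PUBLIC_IDS):
    """Return the first candidate the catalog can resolve locally, else None."""
    available = catalog_public_ids(catalog_xml)
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None
```

A build script could substitute the chosen identifier into the DOCTYPE before running the documentation tools, and report an error when `pick_public_id` returns `None`.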
This problem has been fixed in our software repository. The fix will go into the next software release. Once that release is available, you may want to check for a software upgrade provided by your Linux distribution.