Bug 790226 - Update Entities' URLs to fix FTBFS in offline environment
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: Compilation
Version: 1.12.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Jody Goldberg
QA Contact: Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2017-11-12 04:06 UTC by Dmitry Smirnov
Modified: 2018-04-26 12:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch to update URLs (1.75 KB, patch)
2017-11-12 04:06 UTC, Dmitry Smirnov

Description Dmitry Smirnov 2017-11-12 04:06:54 UTC
Created attachment 363417
patch to update URLs

As reported in https://github.com/GNOME/gnumeric/pull/1

Outdated links to the entity documents cause Gnumeric to FTBFS (fail to build from source) in an offline environment, even when the "w3c-sgml-lib" package is installed:

~~~~
Error: Could not parse document:
 connection refused ..//C/gnumeric.xml:51:  I/O  warning :  failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent"
   %isopub;
           ^
  %isopub;
          ^
 connection refused ..//C/gnumeric.xml:56:  I/O  warning :  failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isonum.ent"
   %isonum;
           ^
  %isonum;
          ^
 connection refused ..//C/gnumeric.xml:61:  I/O  warning :  failed to load external entity "http://www.oasis-open.org/docbook/xml/4.5/ent/isogrk1.ent"
   %isogrk1;
            ^
~~~~

The attached patch updates the links to the canonical URLs of the entity documents.
Comment 1 Andreas J. Guelzow 2017-11-12 18:52:45 UTC
http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent, http://www.oasis-open.org/docbook/xml/4.5/ent/isonum.ent, and http://www.oasis-open.org/docbook/xml/4.5/ent/isogrk1.ent appear to work just fine. I see no point in changing them.

Since we are using docbook to generate the documentation, I think it is cleaner to stay with the versions of these files included in docbook.
Comment 2 Andreas J. Guelzow 2017-11-12 19:13:44 UTC
These links are only used when the documentation is built, so having them work only online seems acceptable to me.
Comment 3 Dmitry Smirnov 2017-11-13 20:58:59 UTC
If you look inside the .ent documents, you will see that they refer to themselves by the URLs (from www.w3.org) that I suggest using.

As for online builds, there is no such option in Debian. Do you suggest not building the documentation at all?
Offline builds are not just more secure; they are also more reliable (and reproducible) because the build does not depend on connectivity or the availability of external resources.

I had to introduce this patch downstream in Debian in order to be able to build Gnumeric...
Comment 4 Morten Welinder 2017-11-15 22:26:04 UTC
Andreas: does anything break if we change?
Comment 5 Andreas J. Guelzow 2017-11-15 23:33:18 UTC
Nothing should break if we change.

In fact the full specification is:
<!ENTITY % isopub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN//XML"
           "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent">
where "http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent" is the system identifier and "ISO 8879:1986//ENTITIES Publishing//EN//XML" the public identifier.

The system identifier should only be used if the resource is not locatable using the public identifier.  In an offline environment where "ISO 8879:1986//ENTITIES Publishing//EN//XML" is available, the system identifier should not be used. To quote from "Installing And Using An XML/SGML DocBook Editing Suite": "If the processing tools fail to find a mapping from the PUBLIC identifier to a SYSTEM identifier in the catalog file(s) they will fall back to using the SYSTEM identifier specified in the document."
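
The lookup order described above can be sketched in a few lines of Python. This is a simplified model, not the code of any real resolver: the function name `resolve_entity` and the two plain dictionaries standing in for catalog entries are hypothetical, and real XML catalog processing has further rules (for example `prefer`, rewrite and delegate entries) that are ignored here. The sketch tries system entries first, then public entries, and only then falls back to the system identifier as written in the document.

```python
from typing import Mapping, Optional


def resolve_entity(public_id: Optional[str], system_id: str,
                   public_map: Mapping[str, str],
                   system_map: Mapping[str, str]) -> str:
    """Return a local URI from the catalog, or the system identifier unchanged.

    public_map and system_map are hypothetical stand-ins for the <public>
    and <system> entries of an installed catalog.
    """
    # A catalog entry for the system identifier wins if one exists.
    if system_id in system_map:
        return system_map[system_id]
    # Otherwise try to map the public identifier to a local file.
    if public_id is not None and public_id in public_map:
        return public_map[public_id]
    # No catalog match: fall back to the system identifier from the document,
    # which for an http:// URL means a network fetch.
    return system_id
```

With a catalog that only knows "-//W3C//ENTITIES Publishing//EN//XML", looking up "ISO 8879:1986//ENTITIES Publishing//EN//XML" returns the oasis-open.org URL unchanged, which is the case that fails in an offline build.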
Comment 6 Dmitry Smirnov 2017-11-16 10:11:19 UTC
Nothing will break, of course. I just want to point out that if you open the document

  http://www.oasis-open.org/docbook/xml/4.5/ent/isopub.ent

you won't find "http://www.oasis-open.org" inside -- only canonical 
System identifier: http://www.w3.org/2003/entities/iso8879/isopub.ent

I suppose it would be better to use the very system identifier from the document.
At least we have one good reason to do so...
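
A rough sketch of the kind of rewrite being proposed, in Python. It is not the attached patch: the function name `update_entity_urls` is hypothetical, and only the isopub.ent URL is quoted in this thread, so applying the same prefix swap to isonum.ent and isogrk1.ent is an assumption.

```python
# Prefix used by the current system identifiers in gnumeric.xml.
OLD_PREFIX = "http://www.oasis-open.org/docbook/xml/4.5/ent/"
# Prefix of the system identifier quoted inside isopub.ent itself.
# Assumption: the other entity files live under the same prefix.
NEW_PREFIX = "http://www.w3.org/2003/entities/iso8879/"


def update_entity_urls(text: str) -> str:
    """Replace the oasis-open.org entity URLs with the w3.org ones."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)
```

Whether the new system identifier then resolves locally still depends on what the installed catalog lists.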
Comment 7 Morten Welinder 2017-11-17 01:58:35 UTC
"w3c-sgml-lib" on my Linux Mint contains three isopub.ent files,
two of them identical:

-rw-r--r-- 1 root root 6961 Mar 14  2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-MathML3-20101021/isopub.ent
-rw-r--r-- 1 root root 6961 Mar 14  2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xml-entity-names-20100401/isopub.ent
-rw-r--r-- 1 root root 6594 Mar 14  2011 /usr/share/xml/w3c-sgml-lib/schema/dtd/XX-MathML2-20031104/iso8879/isopub.ent

# grep isopub /usr/share/xml/w3c-sgml-lib/schema/dtd/catalog.xml 
  <public publicId="-//W3C//ENTITIES Publishing//EN" uri="REC-MathML3-20101021/isopub.ent" />
  <system systemId="http://www.w3.org/Math/DTD/mathml3/isopub.ent" uri="REC-MathML3-20101021/isopub.ent" />
  <public publicId="-//W3C//ENTITIES Publishing//EN//XML" uri="REC-xml-entity-names-20100401/isopub.ent" />
  <system systemId="http://www.w3.org/2003/entities/2007/isopub.ent" uri="REC-xml-entity-names-20100401/isopub.ent" />

None of these match the public id we're looking for, so no local match.
The public id seems to have changed between the 2003 version we refer to
and the 2007 version that I end up with on my system.

I think we would get a local match if we referred to the 2007 version of
the file, but I don't see a way to make this work with both 2003 and 2007
versions.

Dmitry: what do you get for the above grep command?
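
For reference, the same check as the grep can be sketched in Python with the standard library. The helper name `public_ids_for` is hypothetical, and the sample string is only a cut-down stand-in for the installed catalog.xml, with the OASIS catalog namespace assumed on the root element.

```python
import xml.etree.ElementTree as ET
from typing import List

# Cut-down stand-in for /usr/share/xml/w3c-sgml-lib/schema/dtd/catalog.xml
SAMPLE_CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//ENTITIES Publishing//EN" uri="REC-MathML3-20101021/isopub.ent" />
  <system systemId="http://www.w3.org/Math/DTD/mathml3/isopub.ent" uri="REC-MathML3-20101021/isopub.ent" />
  <public publicId="-//W3C//ENTITIES Publishing//EN//XML" uri="REC-xml-entity-names-20100401/isopub.ent" />
  <system systemId="http://www.w3.org/2003/entities/2007/isopub.ent" uri="REC-xml-entity-names-20100401/isopub.ent" />
</catalog>"""


def public_ids_for(catalog_xml: str, filename: str) -> List[str]:
    """List the public identifiers whose catalog entry points at filename."""
    root = ET.fromstring(catalog_xml)
    ids = []
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the namespace part of the tag
        if tag == "public" and elem.get("uri", "").endswith("/" + filename):
            ids.append(elem.get("publicId"))
    return ids


WANTED = "ISO 8879:1986//ENTITIES Publishing//EN//XML"
ids = public_ids_for(SAMPLE_CATALOG, "isopub.ent")
print(WANTED in ids)
```

On the sample above, the public id used by gnumeric.xml is not among the listed ones, matching the grep result.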
Comment 8 Morten Welinder 2018-04-22 03:19:40 UTC
Dmitry: ping?
Comment 9 Dmitry Smirnov 2018-04-25 09:54:34 UTC
This is with w3c-sgml-lib/1.3-2:

~~~~
$ cd /usr/share/xml/w3c-sgml-lib/schema/dtd/ && ack isopub

xml.soc
93:       "REC-MathML3-20101021/isopub.ent"
94:SYSTEM "http://www.w3.org/Math/DTD/mathml3/isopub.ent"
95:       "REC-MathML3-20101021/isopub.ent"
1386:       "REC-xml-entity-names-20100401/isopub.ent"
1387:SYSTEM "http://www.w3.org/2003/entities/2007/isopub.ent"
1388:       "REC-xml-entity-names-20100401/isopub.ent"

XX-MathML2-20031104/xhtml-math11-f.dtd
8947:<!ENTITY % ent-isopub
8949:             "iso8879/isopub.ent" >
8952:     File isopub.ent produced by the XSL script characters.xsl

XX-MathML2-20031104/iso8879/isopub.ent
3:     File isopub.ent produced by the XSL script characters.xsl

XX-MathML2-20031104/mathml2.dtd
2128:<!ENTITY % ent-isopub
2130:             "iso8879/isopub.ent" >
2131:%ent-isopub;

catalog.xml
36:  <public publicId="-//W3C//ENTITIES Publishing//EN" uri="REC-MathML3-20101021/isopub.ent" />
37:  <system systemId="http://www.w3.org/Math/DTD/mathml3/isopub.ent" uri="REC-MathML3-20101021/isopub.ent" />
538:  <public publicId="-//W3C//ENTITIES Publishing//EN//XML" uri="REC-xml-entity-names-20100401/isopub.ent" />
539:  <system systemId="http://www.w3.org/2003/entities/2007/isopub.ent" uri="REC-xml-entity-names-20100401/isopub.ent" />

REC-MathML3-20101021/isopub.ent
3:     File isopub.ent produced by the XSL script entities.xsl
32:       System identifier: http://www.w3.org/2003/entities/2007/isopub.ent
39:       <!ENTITY % isopub PUBLIC
41:         "http://www.w3.org/2003/entities/2007/isopub.ent"
43:       %isopub;

REC-MathML3-20101021/mathml3.dtd
76:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
77:%isopub;

REC-xml-entity-names-20100401/isopub.ent
3:     File isopub.ent produced by the XSL script entities.xsl
32:       System identifier: http://www.w3.org/2003/entities/2007/isopub.ent
39:       <!ENTITY % isopub PUBLIC
41:         "http://www.w3.org/2003/entities/2007/isopub.ent"
43:       %isopub;

REC-xml-entity-names-20100401/w3centities.ent
46:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
47:%isopub;

REC-xml-entity-names-20100401/htmlmathml.ent
56:<!ENTITY % isopub PUBLIC "-//W3C//ENTITIES Publishing//EN" "isopub.ent">
57:%isopub;
~~~~
Comment 10 Morten Welinder 2018-04-26 01:33:57 UTC
A bit of a mess.

It's solvable for xmllint which supports --nonet and --path.
It does not appear easily solvable for itstool which does not.

I can fake --nonet by setting HTTP_PROXY=http://127.0.0.1
That will take care of itstool's urge to contact the outside world which,
I assume, is a red flag all by itself.

Then, of course, it fails.  Since both you and I have these public IDs
available in the installed catalog:

    "-//W3C//ENTITIES Publishing//EN"
    "-//W3C//ENTITIES Publishing//EN//XML"

and since these probably are the modern-day versions of what we use, I am
guessing we can update to one of those.

Would that please everyone?
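
The HTTP_PROXY trick can be wrapped in a small helper when driving the build from a script. This is only a sketch: the names `offline_env` and `run_offline` are hypothetical, and also setting the lowercase `http_proxy` variable is an assumption about what the tool reads, not something tested in this thread.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

DEAD_PROXY = "http://127.0.0.1"


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and point HTTP traffic at a proxy that is not there."""
    env = dict(os.environ if base is None else base)
    env["HTTP_PROXY"] = DEAD_PROXY
    env["http_proxy"] = DEAD_PROXY  # assumption: some tools read the lowercase name
    return env


def run_offline(cmd: List[str]) -> int:
    """Run cmd with the fake proxy in place and return its exit status."""
    return subprocess.run(cmd, env=offline_env()).returncode
```

A call such as `run_offline(["make", "-C", "doc"])` (the target is made up) would then fail quickly on any entity that is not resolved locally instead of silently fetching it.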
Comment 11 Morten Welinder 2018-04-26 01:44:33 UTC
If not, we'll have to have configure look for an entry.  That's just too
much effort to throw at the problem.
Comment 12 Morten Welinder 2018-04-26 12:32:18 UTC
OpenSuSE has "ISO 8879:1986//ENTITIES Publishing//EN//XML" and not the
others.  I.e., a configure solution is needed.
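
What such a configure check has to decide can be sketched as follows, in Python rather than in configure's own shell. The function name `pick_public_id` is hypothetical and the preference order of the candidates is an assumption; the three identifiers are the ones seen in this thread.

```python
from typing import Iterable, Optional, Sequence

# Public ids mentioned in this bug; the order of preference is an assumption.
CANDIDATES = (
    "ISO 8879:1986//ENTITIES Publishing//EN//XML",
    "-//W3C//ENTITIES Publishing//EN//XML",
    "-//W3C//ENTITIES Publishing//EN",
)


def pick_public_id(available: Iterable[str],
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate the installed catalogs provide, else None."""
    have = set(available)
    for candidate in candidates:
        if candidate in have:
            return candidate
    return None
```

A system with only the two W3C entries would get "-//W3C//ENTITIES Publishing//EN//XML", one with only the ISO 8879 entry would keep the current id, and `None` means no local copy is available.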
Comment 13 Morten Welinder 2018-04-26 12:57:39 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Once that release is available, you may want to check for a software upgrade provided by your Linux distribution.