Bug 541529 - xsl:output/@encoding may produce character references in element and attribute names
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
Reported: 2008-07-04 10:06 UTC by Michael Ludwig
Modified: 2021-07-05 13:24 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Michael Ludwig 2008-07-04 10:06:42 UTC
Please describe the problem:
Using xsl:output/@encoding, we can control the output encoding.

Characters in text nodes that are not available in the selected encoding are converted to character references. Characters in element or attribute names that are not available in the selected encoding, however, must result in run-time errors.

LibXSLT does not report run-time errors. Instead, it converts said element and attribute names using character references, which are illegal in XML element and attribute names.

Steps to reproduce:
Run an identity transform on an XML document containing element and attribute names that are not available in the specified output encoding.

mludwig@forelle:~/Werkstatt/xsl > cat Uebelkeit.xml 
<Urmel>
        <Vorspeise>Süßkirschen mit Käsesoße</Vorspeise>
        <Übelkeit möglicherweise="beträchtlich"/>
</Urmel>
mludwig@forelle:~/Werkstatt/xsl > cat Uebelkeit-output-encoding.xsl 
<xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output encoding="US-ASCII"/>
        <xsl:template match="@*|node()">
                <xsl:copy>
                        <xsl:apply-templates select="@*|node()"/>
                </xsl:copy>
        </xsl:template>
</xsl:transform>

mludwig@forelle:~/Werkstatt/xsl > xsltproc Uebelkeit-output-encoding.xsl Uebelkeit.xml

Actual results:
The output is not well-formed XML: it contains character references in element and attribute names.

<?xml version="1.0" encoding="US-ASCII"?>
<Urmel>
        <Vorspeise>S&#252;&#223;kirschen mit K&#228;seso&#223;e</Vorspeise>
        <&#220;belkeit m&#246;glicherweise="betr&#228;chtlich"/>
</Urmel>
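
As a quick check that the output above is rejected by an XML parser, here is a minimal sketch using Python's standard library. The helper name is_well_formed is hypothetical and not part of libxml2 or libxslt. The sample bytes are the reported output with indentation removed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# The output reported above, with indentation removed.
actual_output = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<Urmel>"
    b"<Vorspeise>S&#252;&#223;kirschen mit K&#228;seso&#223;e</Vorspeise>"
    b'<&#220;belkeit m&#246;glicherweise="betr&#228;chtlich"/>'
    b"</Urmel>"
)

# Character references in text content only are fine.
text_only = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<Urmel>"
    b"<Vorspeise>S&#252;&#223;kirschen mit K&#228;seso&#223;e</Vorspeise>"
    b"</Urmel>"
)
```

Under these assumptions, is_well_formed(actual_output) is False while is_well_formed(text_only) is True, because a character reference is only allowed in content and attribute values, not in names.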

Expected results:
A run-time error should be reported. For example, the processor Saxon (version 9.0.0.4, Java) says:

  SERE0008: Element name contains a character (decimal + 220) not available in the selected encoding
Transformation failed: Run-time errors were reported
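
The expected behaviour could be approximated by checking names against the target encoding before serialization. The following is only a sketch in Python using the standard library, not how Saxon or libxml2 implement it. The names NameNotEncodableError and check_names_encodable are hypothetical, and namespace prefixes are not checked.

```python
import xml.etree.ElementTree as ET


class NameNotEncodableError(ValueError):
    """Raised when a name cannot be written in the target encoding (hypothetical)."""


def _local_name(name: str) -> str:
    # ElementTree stores namespaced names as "{uri}local"; keep the local part.
    return name.rsplit("}", 1)[-1]


def check_names_encodable(root: ET.Element, encoding: str) -> None:
    """Raise NameNotEncodableError if any element or attribute name
    contains a character that the target encoding cannot represent."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # comments and processing instructions have no name
        names = [("Element", elem.tag)]
        names += [("Attribute", attr) for attr in elem.attrib]
        for kind, name in names:
            for ch in _local_name(name):
                try:
                    ch.encode(encoding)
                except UnicodeEncodeError:
                    raise NameNotEncodableError(
                        f"{kind} name contains a character (decimal {ord(ch)}) "
                        f"not available in encoding {encoding}"
                    ) from None


doc = ET.fromstring(
    "<Urmel>"
    "<Vorspeise>Süßkirschen mit Käsesoße</Vorspeise>"
    '<Übelkeit möglicherweise="beträchtlich"/>'
    "</Urmel>"
)
```

Calling check_names_encodable(doc, "ascii") raises an error for the element name Übelkeit, while check_names_encodable(doc, "iso-8859-1") returns normally because every name is representable in that encoding. This check works on the tree, where the distinction between names and content is still known.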

Does this happen every time?
Yes.

Other information:
Comment 2 Daniel Veillard 2008-07-04 11:43:21 UTC
Not a libxslt bug but a libxml2 one, or rather an efficiency trade-off,
as explained on-list:

-----------------------------------------------
  You ask for something impossible. You get a non-XML document instead of
getting an immediate failure.
  It's a trade-off, unrelated to libxslt; it's actually in libxml2.
The transcoding is done on a preserialized UTF-8 document (or document
fragment). Detecting the error would mean that each time a character is not
serializable in the target encoding, the code issuing the escape sequence has
to do a rewind lookup and try to guess (it's guessing because at that point
you're manipulating strings and there is no notion of document structure)
whether you're within markup or within content.
  Basically it makes everybody pay a rather high cost for the few who asked
for something impossible.
  The current state has been there since the beginning of libxml2 (nearly a
decade), so the bug is extremely uncommon. This makes me even less comfortable
with the expansion of the cost. Again, it's a trade-off, a conscious one. For
more information see libxml2 encoding.c around line 2057; that's where the
escaping is done. If you see another way to handle this that does not heavily
penalize the normal process, I'm all for fixing this. But right now I
don't see a solution.
-----------------------------------------------

Daniel
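
The string-level transcoding described in this comment can be illustrated by analogy with Python's "xmlcharrefreplace" error handler. This is not libxml2 code; the function name transcode is hypothetical. It shows that a fallback applied to already-serialized text has no way to tell a name from content.

```python
def transcode(serialized: str, encoding: str) -> bytes:
    """Encode already-serialized XML text, replacing every unencodable
    character with a numeric character reference, wherever it occurs."""
    # Analogy only: the codec sees a flat string, not a document structure.
    return serialized.encode(encoding, errors="xmlcharrefreplace")


serialized = '<Übelkeit möglicherweise="beträchtlich">Käse</Übelkeit>'
result = transcode(serialized, "ascii")
```

Here result contains references inside the element and attribute names as well as in the text, the same pattern as in the reported output. Distinguishing the two cases would require inspecting the surrounding text on every fallback, which is the cost the comment refers to.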
Comment 3 GNOME Infrastructure Team 2021-07-05 13:24:18 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.