Bug 92438 - problem (?) with encoding and URL escaping (ISO-8859-1)
Status: VERIFIED NOTABUG
Product: libxslt
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
Daniel Veillard
Depends on:
Blocks:
 
 
Reported: 2002-09-03 23:21 UTC by Marc-Olivier Bernard
Modified: 2009-08-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Marc-Olivier Bernard 2002-09-03 23:21:21 UTC
Hi,

Working with ISO-8859-1 (input, stylesheet, output), my sample stylesheet
takes a text node and sets it as an attribute. The string must be URL
escaped, but the conversion is wrong, for example:

  "é" is converted to "%C3%A9"

I think this is due to the intermediate UTF-8 conversion, which converts "é" into
something like "Ã©", and that string is then URL encoded. In conclusion, the URL
encoding should be done *after* the UTF-8 -> ISO-8859-1 conversion, *not
before*.

(PS: by URL-unescaping the attribute values in the output below, then
converting to ISO-8859-1, then URL-escaping again, I got the right result)
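
To make the two candidate results concrete, here is a minimal Python sketch (not part of libxslt; it only uses the standard `urllib.parse` module) that percent-escapes "é" once from its UTF-8 bytes and once from its ISO-8859-1 bytes. It also shows what the UTF-8 bytes look like when misread as ISO-8859-1, which is the "Ã©" effect described above. The variable names are illustrative only.

```python
from urllib.parse import quote

ch = "é"

# The same character has different byte sequences in the two encodings.
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xA9
latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xE9

# UTF-8 bytes misinterpreted as ISO-8859-1 text.
mojibake = utf8_bytes.decode("iso-8859-1")

# Percent-escaping works on bytes, so the chosen encoding decides the result.
escaped_utf8 = quote(ch, encoding="utf-8")
escaped_latin1 = quote(ch, encoding="iso-8859-1")

print(escaped_utf8)    # %C3%A9 (what xsltproc emits)
print(escaped_latin1)  # %E9    (what the report expected)
```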

Consider file.xml :

~~~~~~~~~~~~~

<?xml version='1.0' encoding='iso-8859-1'?>
<main><title>Accéder à l'organisation des catégories</title></main>

~~~~~~~~~~~~~

and the following stylesheet (taking the text node and setting as attribute) :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
            version="1.0">
<xsl:output method="html" encoding="ISO-8859-1"/>

<xsl:template match="main">
<a>
<xsl:attribute name="href"><xsl:value-of select="title"/></xsl:attribute>
<xsl:apply-templates/>
</a>
</xsl:template>

</xsl:stylesheet>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

xsltproc file.xsl file.xml gives :

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<a
href="Acc%C3%A9der%20%C3%A0%20l'organisation%20des%20cat%C3%A9gories">Accéder
à l'organisation des catégories</a>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
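
The workaround mentioned in the PS can be sketched in Python as follows. This is only an illustration of the unescape / re-escape round trip, not something xsltproc does; `reescape` is a hypothetical helper name, and the apostrophe is kept literal to match the output above.

```python
from urllib.parse import quote, unquote

# href value as emitted by xsltproc in the output above.
href_utf8 = "Acc%C3%A9der%20%C3%A0%20l'organisation%20des%20cat%C3%A9gories"


def reescape(href: str, target: str = "iso-8859-1") -> str:
    """Unescape `href` as UTF-8, then escape the same text in `target`."""
    text = unquote(href, encoding="utf-8")
    # Keep the apostrophe unescaped, as in the original output.
    return quote(text, safe="'", encoding=target)


text = unquote(href_utf8, encoding="utf-8")
href_latin1 = reescape(href_utf8)

print(text)
print(href_latin1)
```

The resulting `href_latin1` only resolves correctly if whoever decodes it also assumes ISO-8859-1, which is the point taken up in the reply below.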
Comment 1 Daniel Veillard 2002-09-04 07:24:38 UTC
Hum, no, I'm afraid you're wrong. The portable way to do it is to
have the UTF-8 encoded version URI escaped. The fundamental
reason is that URLs ought to be context independent (it's one
of the principles of the Web, ask Tim Berners-Lee!).
In practice you may have to ask the XSLT stylesheet to output
a UTF-8 encoded result so that older browsers don't get confused.
  See
   http://www.w3.org/TR/xptr/#uri-escaping
   http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1

  Always a good idea to read the specs first :-)

Daniel
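
A small Python sketch of the context-independence argument, under the assumption that the receiver follows the UTF-8 convention from the references above: escapes built from UTF-8 bytes decode with one fixed rule, while escapes built from ISO-8859-1 bytes only decode correctly if the receiver happens to know the original charset. The sample strings are illustrative.

```python
from urllib.parse import unquote_to_bytes

utf8_escaped = "cat%C3%A9gories"   # escaped from UTF-8 bytes
latin1_escaped = "cat%E9gories"    # escaped from ISO-8859-1 bytes

# A receiver that always assumes UTF-8 recovers the text without
# knowing anything about the document the URL came from.
decoded = unquote_to_bytes(utf8_escaped).decode("utf-8")

# The ISO-8859-1 escape is not valid UTF-8, so the same receiver fails.
try:
    unquote_to_bytes(latin1_escaped).decode("utf-8")
    latin1_decodes_as_utf8 = True
except UnicodeDecodeError:
    latin1_decodes_as_utf8 = False

# It only works when the charset is supplied out of band.
decoded_with_hint = unquote_to_bytes(latin1_escaped).decode("iso-8859-1")

print(decoded, latin1_decodes_as_utf8, decoded_with_hint)
```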