GNOME Bugzilla – Bug 92438
problem (?) with encoding and URL escaping (ISO-8859-1)
Last modified: 2009-08-15 18:40:50 UTC
Hi,

Working with ISO-8859-1 (input, stylesheet, output), my sample stylesheet takes a text node and sets it as an attribute. The string must be URL-escaped, but the conversion is wrong: for example, "é" is converted to "%C3%A9". I think this is due to the intermediate UTF-8 conversion, which turns "é" into the two bytes C3 A9 (displayed as something like "Ã©" when read as ISO-8859-1), and that byte string is then URL-encoded.

In conclusion, the URL encoding should be done *after* the UTF-8 -> ISO-8859-1 conversion, *not before*.

(PS: by URL-unescaping the attribute values in the output below, then converting to ISO-8859-1, then URL-escaping again, I got the right result.)

Consider file.xml:
~~~~~~~~~~~~~
<?xml version='1.0' encoding='iso-8859-1'?>
<main><title>Accéder à l'organisation des catégories</title></main>
~~~~~~~~~~~~~

and the following stylesheet (taking the text node and setting it as an attribute):
~~~~~~~~~~~~~
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" encoding="ISO-8859-1"/>
  <xsl:template match="main">
    <a>
      <xsl:attribute name="href"><xsl:value-of select="title"/></xsl:attribute>
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
~~~~~~~~~~~~~

xsltproc file.xsl file.xml gives:
~~~~~~~~~~~~~
<a href="Acc%C3%A9der%20%C3%A0%20l'organisation%20des%20cat%C3%A9gories">Accéder à l'organisation des catégories</a>
~~~~~~~~~~~~~
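The workaround mentioned in the PS (unescape, convert to ISO-8859-1, escape again) can be sketched outside of XSLT. This is a minimal Python illustration, not anything xsltproc does itself: `reescape` is a hypothetical helper name, and keeping the apostrophe unescaped (`safe="'"`) is an assumption made only to match the output shown above.

```python
from urllib.parse import quote, unquote


def reescape(href: str, charset: str = "iso-8859-1") -> str:
    """Re-escape a UTF-8 percent-encoded string using another charset.

    Hypothetical helper: decodes the %XX sequences as UTF-8 bytes,
    then percent-encodes the resulting text using `charset` bytes.
    """
    text = unquote(href, encoding="utf-8")
    # safe="'" keeps the apostrophe literal, as in the xsltproc output.
    return quote(text, safe="'", encoding=charset)


escaped = "Acc%C3%A9der%20%C3%A0%20l'organisation%20des%20cat%C3%A9gories"
print(reescape(escaped))
# prints: Acc%E9der%20%E0%20l'organisation%20des%20cat%E9gories
```

The single-byte escapes `%E9` and `%E0` are what the report expected to see in the `href` attribute.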
Hum, no, I'm afraid you're wrong. The portable way to do it is to URI-escape the UTF-8 encoded version of the string. The fundamental reason is that URLs ought to be context-independent (it's one of the principles of the Web, ask Tim Berners-Lee!). In practice you may have to ask the XSLT stylesheet to output a UTF-8 encoded result so that older browsers don't get confused.

See
http://www.w3.org/TR/xptr/#uri-escaping
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1

Always a good idea to read the specs first :-)

Daniel
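To illustrate the context-independence argument, here is a small Python sketch comparing the two escaping strategies. It assumes a consumer that decodes percent-escapes as UTF-8 when it is told nothing else, which is the default of Python's `urllib.parse.unquote`; real browsers and servers may behave differently. The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

title = "Accéder à l'organisation des catégories"

# Escape the UTF-8 bytes (what xsltproc produced in the report).
utf8_href = quote(title, safe="'", encoding="utf-8")
# Escape the ISO-8859-1 bytes (what the report asked for).
latin1_href = quote(title, safe="'", encoding="iso-8859-1")

# The UTF-8 form round-trips without any knowledge of the document encoding.
utf8_roundtrip = unquote(utf8_href)

# The ISO-8859-1 form only round-trips if the consumer is told the charset;
# decoded as UTF-8, each accented letter becomes U+FFFD (replacement char).
latin1_blind = unquote(latin1_href)
latin1_informed = unquote(latin1_href, encoding="iso-8859-1")

print(utf8_roundtrip == title)    # True
print("\ufffd" in latin1_blind)   # True
print(latin1_informed == title)   # True
```

The ISO-8859-1 escapes are only meaningful alongside out-of-band charset information, whereas the UTF-8 escapes carry the same meaning wherever the URL is copied.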