GNOME Bugzilla – Bug 553560
Sort items according to @lang xsl:sort attribute
Last modified: 2021-07-05 11:01:04 UTC
Please describe the problem: I translated the 2.24 release notes to Hungarian, and the list of supported languages is sorted alphabetically in the translation: http://library.gnome.org/misc/release-notes/2.24/index.html.hu#rni18 However, the list appears to be sorted by the English sorting rules rather than the Hungarian ones. The visible symptom is that Estonian (Észt) ends up at the end of the list instead of between Danish (Dán) and Finnish (Finn).
That would be an xsltproc bug, as the XSLT file has a correct lang attribute. FWIW, I reduced it to:

    <languages>
      <lang>Dán</lang>
      <lang>Észt</lang>
      <lang>Finn</lang>
    </languages>

and:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text" encoding="UTF-8" omit-xml-declaration="yes"/>
      <xsl:template match="languages">
        <xsl:for-each select="lang">
          <xsl:sort lang="hu"/>
          <xsl:value-of select="."/><xsl:text> </xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

Output is:

    Dán Finn Észt

For the record, French would sort the same way and has the same problem.
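The output above is consistent with a comparison that looks only at Unicode code points and ignores the requested language. The following is a minimal Python sketch of that effect, not libxslt's actual code; the function name `codepoint_sorted` is made up for illustration.

```python
# Illustration only: plain code-point ordering, which ignores the language.
# Escapes are used so the accented letters are unambiguous:
# "D\u00e1n" is "Dán" and "\u00c9szt" is "Észt".
NAMES = ["D\u00e1n", "\u00c9szt", "Finn"]


def codepoint_sorted(words):
    """Sort by Unicode code points, as a locale-unaware comparison would."""
    return sorted(words)


if __name__ == "__main__":
    result = codepoint_sorted(NAMES)
    print(result)
    # Show the code point of each first letter to see why the order comes out this way.
    print([hex(ord(word[0])) for word in result])
```

Because É is U+00C9 and F is U+0046, a code-point comparison puts "Észt" after "Finn", which matches the symptom in the report.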
I have the same bug. :-)
I retried with the latest SVN snapshot on Linux. Also, an illegal lang value gives no warning: <xsl:sort lang="eoxxZZ" select="."/>
Wieland informed me about this issue. I'm the author of the winapi port; Nick Wellnhofer is the original author. For example, French sorting may not work as expected: the GNU libc collating sequence does not define accented symbols, so the order is based on position in the Unicode table. Bulgarian is not sorted in dictionary order either, as GNU libc defines a non-dictionary sort for it; let's call it "phonetic". So I would like to know your platform: GNU libc based, winapi, or OS X?
I use a GNU libc based system (Debian Linux). I am sorting language names in Esperanto.

xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="lingvojsort.xsl"?>
    <lingvoj>
      <lingvo>dana</lingvo>
      <lingvo>ĝuanga</lingvo>
      <lingvo>ĉina</lingvo>
      <lingvo>zulua</lingvo>
    </lingvoj>

xsl:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method='html' version='1.0' encoding='UTF-8' indent='yes'/>
      <xsl:template match="/">
        <html><body>
          <h2>Lingvoj</h2>
          <table border="1">
            <tr bgcolor="#9acd32">
              <th align="left">Nomo</th>
            </tr>
            <xsl:for-each select="lingvoj/lingvo">
              <xsl:sort lang="eo" select="."/>
              <tr>
                <td><xsl:value-of select="."/></td>
              </tr>
            </xsl:for-each>
          </table>
        </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
Maybe the bug is not related to xsltproc but to glibc/locale:

    $ /bin/echo -e "ĉina\nzulua" | env LC_COLLATE=eo sort

is also wrong.
I found this by googling:

    sudo apt-get install language-pack-eo

Now

    $ /bin/echo -e "ĉina\nzulua" | env LC_COLLATE=eo sort

works fine, and

    $ locale -a

also shows eo and eo.utf8. There is now a directory /usr/lib/locale/eo, not only the file /usr/share/i18n/locales/eo, and that file is new and changed. But

    /usr/local/bin/xsltproc lingvojsort.xsl lingvoj.xml > lingvoj.html

still sorts wrong. Any help?
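The `sort` test above exercises the C library's collation directly. The same check can be made from Python through the standard `locale` module, which helps separate "the locale is missing or wrong" from "xsltproc does not use it". This is a hedged sketch: the helper name `sort_with_locale` is made up, and the result depends entirely on which locales are installed on the machine.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the C library's collation rules for locale_name.

    Returns None when the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # e.g. "eo.utf8" before the language pack is installed
    try:
        # strxfrm turns each string into a key that compares per LC_COLLATE.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    for name in ("eo.utf8", "hu_HU.utf8", "C"):
        print(name, sort_with_locale(["\u0109ina", "zulua", "dana"], name))
```

If the helper returns None for a locale name, that locale is not installed, and no XSLT processor relying on the C library could collate with it either.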
It should have been:

    $ /bin/echo -e "ĉina\nzulua" | LC_COLLATE=eo sort

I made a simple text test.

l.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="l.xsl"?>
    <lingvoj>
      <lingvo>jida</lingvo>
      <lingvo>joruba</lingvo>
      <lingvo>ĝuanga</lingvo>
      <lingvo>ĉina</lingvo>
      <lingvo>zulua</lingvo>
    </lingvoj>

l.xsl:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method='text' encoding='UTF-8'/>
      <xsl:template match="/">Lingvoj:<xsl:for-each select="lingvoj/lingvo">
          <xsl:sort lang="eo" select="."/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="."/>
        </xsl:for-each>
        <xsl:text> </xsl:text>
      </xsl:template>
    </xsl:stylesheet>

To run the test:

    $ /usr/local/bin/xsltproc l.xsl l.xml
    Lingvoj: jida joruba zulua ĉina ĝuanga

    $ /usr/local/bin/xsltproc -V
    Using libxml 20631, libxslt 10122 and libexslt 813
    xsltproc was compiled against libxml 20631, libxslt 10124 and libexslt 813
    libxslt 10122 was compiled against libxml 20631
    libexslt 813 was compiled against libxml 20631

Maybe I have to read the source of libxslt?
Looking into the source:

    $ grep -n "xsl:sort lang attribute" libxslt/xsltutils.c
    988: /* TODO: xsl:sort lang attribute */

Is this not implemented yet? That would explain why it is not working correctly.
Wieland, thanks for your efforts. I would like to see the LC_COLLATE section from your /usr/share/i18n/locales/eo.
Maybe the content was not changed, only the mod-time.

    LC_COLLATE
    copy "iso14651_t1"
    collating-symbol <ccirc>
    collating-symbol <gcirc>
    collating-symbol <hcirc>
    collating-symbol <jcirc>
    collating-symbol <scirc>
    collating-symbol <ubreve>
    reorder-after <c>
    <ccirc>
    reorder-after <g>
    <gcirc>
    reorder-after <h>
    <hcirc>
    reorder-after <j>
    <jcirc>
    reorder-after <s>
    <scirc>
    reorder-after <u>
    <ubreve>
    reorder-after <U0043>
    <U0108> <ccirc>;<CIR>;<CAP>;IGNORE % Ĉ
    reorder-after <U0063>
    <U0109> <ccirc>;<CIR>;<MIN>;IGNORE % ĉ
    reorder-after <U0047>
    <U011C> <gcirc>;<CIR>;<CAP>;IGNORE % Ĝ
    reorder-after <U0067>
    <U011D> <gcirc>;<CIR>;<MIN>;IGNORE % ĝ
    reorder-after <U0048>
    <U0124> <hcirc>;<CIR>;<CAP>;IGNORE % Ĥ
    reorder-after <U0068>
    <U0125> <hcirc>;<CIR>;<MIN>;IGNORE % ĥ
    reorder-after <U004A>
    <U0134> <jcirc>;<CIR>;<CAP>;IGNORE % Ĵ
    reorder-after <U006A>
    <U0135> <jcirc>;<CIR>;<MIN>;IGNORE % ĵ
    reorder-after <U0053>
    <U015C> <scirc>;<CIR>;<CAP>;IGNORE % Ŝ
    reorder-after <U0073>
    <U015D> <scirc>;<CIR>;<MIN>;IGNORE % ŝ
    reorder-after <U0055>
    <U016C> <ubreve>;<BRE>;<CAP>;IGNORE % Ŭ
    reorder-after <U0075>
    <U016D> <ubreve>;<BRE>;<MIN>;IGNORE % ŭ
    reorder-end
    END LC_COLLATE
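The reorder-after rules above place each accented Esperanto letter directly after its base letter (c then ĉ, g then ĝ, and so on). As a rough illustration of the ordering this is meant to produce, here is a Python sketch that does not use the C library at all. It is a simplification: it uses a single alphabet table and ignores the multi-level weights (accent, case) that a real LC_COLLATE definition carries, and the names `ESPERANTO_ALPHABET` and `esperanto_key` are made up for this example.

```python
import unicodedata

# Esperanto alphabet with each accented letter right after its base letter:
# a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z
ESPERANTO_ALPHABET = "abc\u0109defg\u011dh\u0125ij\u0135klmnoprs\u015dtu\u016dvz"
RANK = {ch: i for i, ch in enumerate(ESPERANTO_ALPHABET)}


def esperanto_key(word):
    """Map a word to a list of letter ranks; unknown characters sort last."""
    # NFC makes sure "c" + combining circumflex is treated as the single letter ĉ.
    normalized = unicodedata.normalize("NFC", word).lower()
    return [RANK.get(ch, len(RANK)) for ch in normalized]


if __name__ == "__main__":
    words = ["jida", "joruba", "\u011duanga", "\u0109ina", "zulua"]
    print(sorted(words, key=esperanto_key))
```

With this key, the test words from the thread come out as ĉina, ĝuanga, jida, joruba, zulua, which is the order the reporter expected from xsltproc.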
The function xsltNewLocale in libxslt/xsltlocale.c expects something like "pt-br" (<language>-<country>) and tries to convert it into "pt_BR.utf8", as its comment says: /* Convert something like "pt-br" to "pt_BR.utf8" */ But if the lang attribute is only "eo" or "hu", the code has a bug: the result becomes "eo........." with no terminating '\0', just random garbage. Now I use "eo-us" and it works fine. Maybe Frederic can try "hu-hu" instead of "hu"; the locale "hu_HU.utf8" must be working. Of course this is a bug, but fixing it seems easy to me.
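The conversion described above, including the case where the lang value has no region, can be sketched as follows. This is an illustration in Python, not the libxslt implementation: the `DEFAULT_REGION` table and the function name `lang_to_locale` are hypothetical, and falling back to a region-less name such as "eo.utf8" is only one possible choice.

```python
# Hypothetical default-region table for illustration; libxslt's own table differs.
DEFAULT_REGION = {"hu": "HU", "fr": "FR", "pt": "PT"}


def lang_to_locale(lang):
    """Convert a lang value such as "pt-br" to a locale name like "pt_BR.utf8"."""
    language, _, region = lang.partition("-")
    language = language.lower()
    if not language.isalpha():
        raise ValueError(f"not a usable language tag: {lang!r}")
    # Use the explicit region if present, else a default, else no region at all.
    region = region.upper() or DEFAULT_REGION.get(language, "")
    # The result is always a complete string, with or without a region part.
    return f"{language}_{region}.utf8" if region else f"{language}.utf8"


if __name__ == "__main__":
    for tag in ("pt-br", "hu", "eo", "eo-us"):
        print(tag, "->", lang_to_locale(tag))
```

The point of the sketch is that every branch yields a well-formed name, so a bare "eo" or "hu" never produces an unterminated or partially filled buffer.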
Wieland, I missed one important part of your report: libxslt 10122 (!). The initial sort support was added after release 1.1.24. Quote from the change log:

    -----
    Tue Jun 3 18:26:26 CEST 2008 Daniel Veillard <daniel@veillard.com>
    ...
    patch from Nick Wellnhofer adding xsl:sort lang support using the locale support from the C library.
    ...
    Tue May 13 17:51:05 CEST 2008 Daniel Veillard <daniel@veillard.com>
    * configure.in doc/*: release of 1.1.24
    -----

Next, about Esperanto. It is a vendor-specific language and is not in GNU libc HEAD. The "eo" file is not from "language-pack-eo". On a Debian-based distribution this package contains the file /var/lib/locales/supported.d/eo with the content:

    eo_US.UTF-8 UTF-8
    eo.UTF-8 UTF-8

Note eo_US. "US" as a region (a "subtag" in rfc3066.txt) is not appropriate. I don't know how to handle this. The simple solution is to modify xsltlocale.c so that, after the call to xsltDefaultRegion, it does not return NULL if the default region is missing. For example:

    +++++
    if (region == NULL)
        q--;
    else {
        *q++ = region[0];
        *q++ = region[1];
    }
    +++++
    memcpy(q, ".utf8", 6);

The other option is to modify xsltDefaultRegion to return "US" for "eo" :( but in that case the locale command has to return eo_us in its list. For now I won't propose a patch for libxslt, not before Esperanto is accepted into GNU libc. Will glibc support a language without a region, or will it assign a neutral region such as "UN"?
About Hungarian: it works for me.

    xsltproc --version ... libxslt 10124-SVN1494 ...., i.e. trunk.

Input file:

    <?xml version="1.0" encoding="UTF-8"?>
    <languages>
      <lang>É</lang>
      <lang>E</lang>
      <lang>F</lang>
      <lang>D</lang>
    </languages>

Output file:

    D E É F

I would like to propose that the bug be closed as invalid.
I later compiled a newer version. Sorry.

    $ xsltproc -V
    Using libxml 20631, libxslt 10124 and libexslt 813
    xsltproc was compiled against libxml 20631, libxslt 10124 and libexslt 813
    libxslt 10124 was compiled against libxml 20631
    libexslt 813 was compiled against libxml 20631

The bug is that other XSLT processors can handle lang="eo"; Firefox and Saxon, for example, can handle it. I would rather not change to lang="eo-us". It is better to use "eo" or "eo.utf8" than to use no locale at all without a warning. But if you want to close the bug, that's OK. Thanks a lot for your help and patience.
Created attachment 129213: eo-sample vendor patch

Wieland, maybe your Linux vendor will accept a patch similar to the attached "eo-sample vendor patch" for the next libxslt version. With the patch, Esperanto works for me. For now I don't know how to deal properly with locales that have no region.
Wieland, your issue is not the same as the reporter's (Gabor's). It is a feature request, as GNU libc does not support Esperanto as a locale. May I suggest that you open a new request for Esperanto? As for the original report (Hungarian), it works for me with the trunk version, which is why I propose closing the bug as invalid. Daniel, what do you think?
I cloned this to http://bugzilla.gnome.org/show_bug.cgi?id=573327
I'm probably doing something wrong, but it is still not right, even after compiling from trunk:

    $ xsltproc srt.xslt langs.xml
    D E F É

    gabor@gabor-desktop:~/Asztal$ cat srt.xslt
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text" encoding="UTF-8" omit-xml-declaration="yes"/>
      <xsl:template match="languages">
        <xsl:for-each select="lang">
          <xsl:sort lang="hu"/> <!-- hu-hu or hu_HU.utf8 gives the same result -->
          <xsl:value-of select="."/><xsl:text> </xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

    gabor@gabor-desktop:~/Asztal$ cat langs.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <languages>
      <lang>É</lang>
      <lang>E</lang>
      <lang>F</lang>
      <lang>D</lang>
    </languages>

    gabor@gabor-desktop:~/Asztal$ which xsltproc
    /usr/local/bin/xsltproc

    gabor@gabor-desktop:~/Asztal$ xsltproc -V
    Using libxml 20632, libxslt 10124 and libexslt 813
    xsltproc was compiled against libxml 20632, libxslt 10124 and libexslt 813
    libxslt 10124 was compiled against libxml 20632
    libexslt 813 was compiled against libxml 20632

This is on Ubuntu Jaunty, if that matters. However, sort works as expected:

    $ echo -e "é\ne\nf\nd" | sort
    d e é f
Gabor, my output is:

    $ ./xsltproc --version
    Using libxml 20632, libxslt 10124-SVN1494 and libexslt 813 ...

It seems to me that your program is using the system libraries, not the new ones.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxslt/-/issues/ Thank you for your understanding and your help.