Bug 566044 - Do not sort output from .xml.in
Status: RESOLVED NOTGNOME
Product: intltool
Classification: Deprecated
Component: general
Version: 0.40.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: intltool maintainers
QA Contact: intltool maintainers
Depends on:
Blocks:
Reported: 2008-12-30 12:32 UTC by Dwayne Bailey
Modified: 2012-03-16 12:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
An XML file that show the problematic ordering (1.63 KB, application/xml)
2009-01-05 07:36 UTC, Dwayne Bailey
Resultant .h file produced by intltool-extract (615 bytes, text/plain)
2009-01-05 07:37 UTC, Dwayne Bailey
Proposed changes (1.23 KB, patch)
2010-02-07 18:42 UTC, Rail Aliev

Description Dwayne Bailey 2008-12-30 12:32:31 UTC
Running intltool-update on various xml.in files results in the translation strings being sorted alphabetically.

This creates a problem for translation, because the file follows a structure like:

group
  name
  description
group
  etc

The result is that the connection between name and description is lost and a translator cannot infer information from the description to help translate a cryptic name.

A simple hack to remove the sorting results in an order that doesn't represent the order of the underlying file either.
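To make the effect concrete, here is a minimal Python sketch (not intltool's own Perl code) that pulls the translatable `_name` and `_description` elements out of a cut-down, comps-style document and compares document order with alphabetical order. The sample XML and the `extract` helper are hypothetical; the XFCE description in particular is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down comps-style input: each <group> pairs a name with its description.
SAMPLE = """<comps>
  <group>
    <id>lxde-desktop</id>
    <_name>LXDE</_name>
    <_description>A lightweight desktop environment that works well on low end machines.</_description>
  </group>
  <group>
    <id>xfce-desktop</id>
    <_name>XFCE</_name>
    <_description>A small, fast desktop environment.</_description>
  </group>
</comps>"""

def extract(xml_text):
    """Return the text of translatable elements in document order."""
    root = ET.fromstring(xml_text)
    # Element.iter() walks the tree depth-first, i.e. in document order.
    return [el.text for el in root.iter() if el.tag in ("_name", "_description")]

in_order = extract(SAMPLE)
alphabetical = sorted(in_order)

print(in_order)      # each name is immediately followed by its description
print(alphabetical)  # both descriptions (starting with "A") come first, names last
```

With only two groups the damage is small; in a full comps file the same sort scatters every name/description pair across the whole POT file.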
Comment 1 Rodney Dawes 2008-12-31 15:34:25 UTC
Can you attach a simple test case showing this problem? Or point to a publicly accessible source repo that exhibits the issue?
Comment 2 Dwayne Bailey 2009-01-05 07:22:05 UTC
comps.xml for Fedora exhibits this problem.  You can access the files here:
http://cvs.fedoraproject.org/viewvc//comps/
Comment 3 Dwayne Bailey 2009-01-05 07:36:33 UTC
Created attachment 125770 [details]
An XML file that show the problematic ordering

I've taken the Fedora comps-f11.xml.in and reduced it to show a simple reordering problem. You can quickly see that the order in this file is important, but because of the sorting we lose any association.

E.g. the LXDE desktop environment starts with L and appears very far down the real file. Yet its description, because it starts with 'A', appears at the start of the resultant PO files.

I used:
intltool-extract --type=gettext/xml demo.xml.in

to get the .h files for analysis.
Comment 4 Dwayne Bailey 2009-01-05 07:37:30 UTC
Created attachment 125771 [details]
Resultant .h file produced by intltool-extract
Comment 5 Dwayne Bailey 2009-01-28 15:34:17 UTC
I think I understand the problem. In intltool-extract there is a section that sorts the keys of the resultant messages, which I assume is done because the actual dictionary in which the messages are stored is a Perl hash, which has an arbitrary order.

Somehow %messages needs to remember its extraction order so that we can produce files that follow the order of the original file. Otherwise we get these wacko-ordered lists.
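A minimal Python sketch of what "remembering extraction order" could look like, assuming a message table keyed by msgid: Python's `dict` keeps first-insertion order (guaranteed since 3.7), which is the property an insertion-ordered Perl hash would provide. A duplicate message is merged into its first entry and keeps that position. The `add_message` helper, file names, and line numbers are hypothetical.

```python
# msgid -> list of (file, line) references.
# A plain dict keeps first-insertion order (guaranteed since Python 3.7),
# playing the role an insertion-ordered hash would play in Perl.
messages = {}

def add_message(msgid, location):
    """Record a message; a later duplicate is merged and keeps the first position."""
    messages.setdefault(msgid, []).append(location)

add_message("P_age Up", ("menus.glade", 10))
add_message("_Page down", ("menus.glade", 11))
add_message("Add Missing Language...", ("dialog.glade", 5))
add_message("P_age Up", ("toolbar.glade", 3))  # duplicate: merged, position unchanged

for msgid, locations in messages.items():  # extraction order, no sorted()
    print(msgid, locations)
```

Iterating over `sorted(messages)` instead would put "Add Missing Language..." first and separate nothing from nothing by design, which is exactly the behaviour being reported.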

I now have a Glade file where the messages from the same dialogue are spread across the POT file, with absolutely no way to convey any kind of context to the translator.

With this bug, it's a wonder anyone has translated any Glade-based applications.
Comment 6 Rodney Dawes 2009-01-30 22:19:20 UTC
I think the problem is that you're expecting there to be context based on the position of something in the file, which doesn't really work anyway, because the resulting PO files will have enough metadata between the strings that any context you were hoping to gain is quickly lost.

The correct way to provide additional context to translators for this stuff is to use comments and the standard gettext context method of "context|string" in a string. Also, if the exact same string appears in multiple files in your source, it will only appear in one place in the po files, unless you provide context as "context|str" to distinguish them. This is how gettext behaves. It has nothing to do with intltool, really.

String contexts are relational, not positional, so I don't see how even leaving things unsorted here would help if you're not providing the context in the strings. You would still need to inform the translator of the relation; otherwise they are still just separate strings.
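For reference, the "context|string" convention Rodney describes is, in GLib-based code, usually handled by the `Q_()` macro: the full prefixed text is the msgid, and when no translation is found the prefix up to the first `|` is stripped before display. (gettext's separate msgctxt field serves the same purpose; Python exposes it as `gettext.pgettext` from 3.8.) A minimal Python sketch of the pipe-prefix fallback, with a hypothetical `catalog` dict standing in for a loaded translation:

```python
def q_(msgid, catalog):
    """Translate a "context|string" msgid; if untranslated, drop the context prefix."""
    translation = catalog.get(msgid, msgid)
    if translation == msgid and "|" in msgid:
        # No translation found: show only the text after the first '|'.
        return msgid.split("|", 1)[1]
    return translation

# Hypothetical catalog: the same English word disambiguated by two contexts.
catalog = {"menu|Open": "Öffnen"}
print(q_("menu|Open", catalog))    # translated via its context
print(q_("status|Open", catalog))  # untranslated, so the prefix is stripped
```

Note that this only helps when the developer has written the prefix into the source string, which is the point the following comments dispute.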
Comment 7 Claude Paroz 2009-01-30 22:47:33 UTC
Ideally, Rodney is right. But in practice, when translating, I more often than not look into the Glade files to see whether a string applies to a button, a window title, a tooltip, etc. You cannot decently ask the UI designer to add a translator comment to each element stating its nature. Until we have an automated way to output the nature of each element of Glade files into the PO file (either as context or as a comment), the correct ordering of Glade strings in the PO file is of great help to translators.

As another interesting case, see also the libgweather po-locations POT file. The maintainer has provided his own script to generate the POT file, so that the ordering of strings is optimized for translation (countries, capitals, small cities). In this special case, the ordering is even different from that of the strings in the original XML file. Yes, there are cases where order matters.
Comment 8 F Wolff 2009-01-30 23:05:14 UTC
In all my translations I have found previous and following messages extremely useful, and often the most valuable source of contextual information. Saying that programmers should do things they often don't do doesn't solve the problem that we are losing context that is available if we provide the strings in their natural order.

I recently encountered this when translating a glade file. A menu had two entries "P_age Up" and "_Page down" that were nowhere close to each other in the .pot file, whereas their order in the glade file would have helped with consistency. No programmer is going to add notes to the first saying "_Page down follows" and to the latter that "P_age Up precedes this". Realistically no programmer can ever add all the context about where a string occurs.

When translating a dialogue box, whose labels and buttons can refer to each other, it is much easier if I can see them all on the screen at once. It also makes it far easier to avoid problems with consistency and accelerators while translating, before I review the translated GUI; that is another thing programmer comments or context can't help with in any way. It is therefore not just about grouping the source strings together, but also their translations.

While duplicate messages mean that a few messages would end up out of context, this doesn't detract from the fact that the ordering will be correct for at least the first occurrence, and would be useful for most messages in 90% of the cases.

Lastly, if people want it sorted alphabetically, it is trivial to do after the fact with msgcat or xgettext by giving the --sort-output parameter. There is however no way to get the useful ordering after the fact.

Please consider keeping the order of the strings as we obtain them from all the source files.
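The asymmetry in the msgcat/xgettext point can be shown with a small Python sketch. `sort_output` here only approximates what `--sort-output` does (plain alphabetical order by msgid), and the message list is a hypothetical extraction-order sample built from strings quoted in this thread.

```python
def sort_output(msgids):
    """Alphabetical order by msgid, roughly what msgcat/xgettext --sort-output produce."""
    return sorted(msgids)

# Hypothetical extraction-order list: a name and its description, a button and its tooltip.
extracted = [
    "LXDE",
    "A lightweight desktop environment that works well on low end machines.",
    "Add Missing Language...",
    "Your language is not in this list, clicking on this button will allow you to add it to the list",
]

sorted_view = sort_output(extracted)  # one call turns the natural order into a sorted one

# The reverse is impossible: any permutation of the same messages sorts identically,
# so the sorted list carries no trace of the original order.
print(sort_output(list(reversed(extracted))) == sorted_view)
```

Sorting is a many-to-one mapping, so extracting in natural order loses nothing for people who prefer sorted output, while extracting sorted loses the order for everyone.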
Comment 9 Dwayne Bailey 2009-02-02 07:35:44 UTC
(In reply to comment #6)
> I think the problem is that you're expecting there to be context based on the
> position of something in the file, which doesn't really work anyway, because
> the resulting po files will have enough metadata between the strings, as to
> quickly lose any context you were hoping to gain.

This is true in some respects, because PO merges duplicates unless they have context. But there is extremely valuable information in an entry's position. A mimetype XML definition that has a _name and a _description means much more when they follow each other logically. So order is important, and it's the main reason why xgettext doesn't sort by default.

If order is kept, a translator using a tool that filters to just the xml.in messages gets more than enough positive hits for this to be very useful. With sorted data they are completely lost and cannot ensure consistency between messages before and after their current message.

> The correct way to provide additional context to translators for this stuff is
> to use comments and the standard gettext context method of "context|string" in
> a string. Also, if the exact same string appears in multiple files in your
> source, it will only appear in one place in the po files, unless you provide
> context as "context|str" to distinguish them. This is how gettext behaves. It
> has nothing to do with intltool, really.

As Friedel says, expecting programmers to provide context is near impossible, especially when simply not sorting the extracted data would provide enough useful context to a translator.

I am aware of the behaviour of gettext in terms of merging terms. It would probably interest you to realise that msgctxt, KDE's _: and GNOME's xxx|word were created for translators so that they could get more context. It's really what this bug is asking for. Please don't take away valuable context that is available through correct ordering.

Gettext gets correct ordering by default (or by accident), in that it tracks line numbers during extraction. That is unfortunately lost in all of the intltool extraction. A thought: putting the correct line numbers from the XML into the output might be the right solution, as it would make the items sortable in the correct order.
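One way to read the line-number suggestion, sketched in Python with the standard-library `xml.parsers.expat` parser (intltool itself is Perl and does not necessarily work this way): record the source line of each translatable element, so entries can later be sorted by (file, line) instead of by msgid. The `extract_with_lines` helper, the sample, and the `#:` output formatting are illustrative assumptions.

```python
import xml.parsers.expat

def extract_with_lines(xml_text, filename):
    """Return (filename, line, text) for each element whose tag starts with '_'."""
    results = []
    current = None  # (start line, text chunks) of the element being read

    def start(tag, attrs):
        nonlocal current
        if tag.startswith("_"):
            # During a callback, expat reports the position where the event began.
            current = (parser.CurrentLineNumber, [])

    def chars(data):
        if current is not None:
            current[1].append(data)

    def end(tag):
        nonlocal current
        if tag.startswith("_") and current is not None:
            results.append((filename, current[0], "".join(current[1])))
            current = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return results

SAMPLE = """<comps>
  <group>
    <_name>LXDE</_name>
    <_description>A lightweight desktop environment.</_description>
  </group>
</comps>"""

entries = extract_with_lines(SAMPLE, "demo.xml.in")
# Sorting by (file, line) reproduces source order, whatever the msgids are.
for fname, line, text in sorted(entries, key=lambda e: (e[0], e[1])):
    print(f"#: {fname}:{line}")
    print(f'msgid "{text}"')
```

Because the sort key is the location rather than the text, a later sort-by-file pass would keep each name next to its description.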

> String contexts are relational, not positional, so I don't see how even having
> stuff unordered here would help, if you're not providing the context in the
> strings. You would still need to inform the translator of the relation,
> otherwise they are still just separate strings.

I disagree. Relational context does exist, e.g. strings from the same file, but name one file format in intltool where we preserve or hint at relational context. We lose all of it. We don't know that some name and some description belong to text/x-wish-i-had-context-to-translate-this, or that a given message is in a particular dialog in Glade. We do get enough context from positional order, because in a Glade file all the messages we want to translate are at the same level of the XML hierarchy.

One last comment: I think you overstate the merging issue in gettext. The reality is that many messages that are translated and would benefit from context would never get merged. A tooltip is unlikely to be duplicated, and in an unsorted world it would be close to the actual button text or dialogue family. But in a sorted world it could be at the start while its button is at the end.

Try this for size:
1) A button with text: "Add Missing Language..."
2) A tooltip: "Your language is not in this list, clicking on this button will allow you to add it to the list"

A bit contrived, but mostly valid in an application we develop.
a) None of those will ever get merged
b) One is at the start and the other at the end of a 500 string file.

Hopefully that gives enough of an understanding of why we'll take as much context as we can get :)
Comment 10 Dwayne Bailey 2009-02-03 06:14:16 UTC
I realise that description might have been long-winded, so let me give you a very real example.

The snippet I posted 'An XML file that show the problematic ordering' is taken from Fedora's comps.xml files.

The entry about the LXDE desktop appears like this:

  <group>
    <id>lxde-desktop</id>
    <_name>LXDE</_name>
    <_description>A lightweight desktop environment that works well on low end machines.</_description>

Now there are many desktops and that description could also describe XFCE.  But in the XML you have good context and know they are related.

In the resultant POT file after sorting we have the following:

msgid "A lightweight desktop environment that works well on low end machines." - line 21

msgid "LXDE" - line 1213

The file is 1344 lines long, so the _description of the entry is near the top of the POT file while the _name is roughly 90% of the way down. The same happens throughout this data.
Comment 11 Danilo Segan 2009-03-30 22:59:31 UTC
We should simply run 'msgcat -F' or equivalent on the produced POT file.
Comment 12 F Wolff 2009-04-05 09:48:55 UTC
(In reply to comment #11)
> We should simply run 'msgcat -F' or equivalent on the produced POT file.

The file locations (#: comments) come from the generated header file, which is already in the wrong order. msgcat -F will simply preserve this order, which is already alphabetical.
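Friedel's objection can be checked with a small Python model (hypothetical data, and only an approximation of msgcat): if the intermediate .h file was written in alphabetical order, its line numbers are alphabetical too, so sorting the POT entries by file location, roughly what `msgcat -F` (`--sort-by-file`) does, just reproduces the alphabetical order.

```python
def header_line_numbers(msgids):
    """Model the sorted .h file: line numbers are assigned after an alphabetical sort."""
    return {msgid: line for line, msgid in enumerate(sorted(msgids), start=1)}

def sort_by_file(msgids, locations):
    """Order entries by their #: reference; with a single .h file only the line matters."""
    return sorted(msgids, key=lambda m: locations[m])

source_order = ["LXDE", "A lightweight desktop environment that works well on low end machines."]
locations = header_line_numbers(source_order)
print(sort_by_file(source_order, locations))  # description first, name second: still alphabetical
```

The fix therefore has to happen before the .h file is written, not after the POT file is generated.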
Comment 13 Rail Aliev 2010-02-07 18:42:45 UTC
Created attachment 153222 [details] [review]
Proposed changes

Please take a look at the attached patch.

It works fine with Dwayne's file, at least. The main idea is to use Tie::IxHash, which remembers insertion order, and to stop sorting %messages afterwards.
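The shape of the change can be sketched in Python; this is an analog, not the Perl patch itself. An insertion-ordered mapping stands in for the Tie::IxHash-tied `%messages`, and the writer iterates it directly instead of over a sorted key list. The `char *s = N_("...");` line format approximates what intltool-extract writes into its .h files and should be treated as an assumption; `c_escape` and `write_header` are hypothetical helpers.

```python
def c_escape(s):
    """Escape a message for use inside a C string literal (backslash, quote, newline)."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def write_header(messages):
    """Emit one N_() line per message in the mapping's own order; note: no sorted()."""
    return "".join(f'char *s = N_("{c_escape(m)}");\n' for m in messages)

# Insertion-ordered mapping, standing in for the Tie::IxHash-tied %messages.
messages = dict.fromkeys([
    "LXDE",
    "A lightweight desktop environment that works well on low end machines.",
])
print(write_header(messages), end="")
```

Since xgettext then assigns `#:` references from the .h line numbers, keeping the .h in source order is enough for the POT file to follow it.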
Comment 14 Dwayne Bailey 2010-02-08 06:13:37 UTC
Rail - thanks!

I've just tested the patch on demo.xml and it works correctly as Rail already stated.

I've run the same patched intltool-extract against Fedora's comps-f12.xml.in and the results are what I'd expect. The .h file is no longer sorted, and thus comments follow names as they do in the original XML file. This should then result in a POT file ordered in a way that assists localisers.
Comment 15 Leonardo Ferreira Fontenelle 2010-05-23 03:36:53 UTC
Is the patch OK to be submitted?
Comment 16 Leonardo Ferreira Fontenelle 2011-01-06 22:18:38 UTC
Is this still the place to hold bug reports for intltool?
Comment 17 F Wolff 2011-09-20 08:03:18 UTC
Leonardo, the patch works perfectly; I have been using it on my machine since it was posted here. Activity for intltool seems to have moved to Launchpad. The bug with the patch is now here:
https://bugs.launchpad.net/intltool/+bug/520986
Comment 18 André Klapper 2012-03-16 12:39:30 UTC
intltool switched from GNOME to launchpad.net infrastructure nearly three years ago: https://mail.gnome.org/archives/gnome-i18n/2009-April/msg00275.html
The intltool product in bugzilla.gnome.org has been deprecated and closed for new bug entry since April 2009.

I am now closing all remaining open reports about intltool as NOTGNOME as part of GNOME Bugzilla Housekeeping.

Reporter: If the problem that you reported here is still valid in a recent version of intltool we kindly ask you to report it again to https://bugs.launchpad.net/intltool/ so the intltool developers get notified about it.