Bug 421155 - maybe intltool-merge should drop msgstr's that are identical to msgids
Status: RESOLVED NOTGNOME
Product: intltool
Classification: Deprecated
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: intltool maintainers
QA Contact: intltool maintainers
Duplicates: 426925 459509
Depends on:
Blocks:
 
 
Reported: 2007-03-21 18:32 UTC by Ray Strode [halfline]
Modified: 2012-03-16 12:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
filter out translations that aren't different than the untranslated form (744 bytes, patch)
2007-03-21 18:36 UTC, Ray Strode [halfline]
rejected

Description Ray Strode [halfline] 2007-03-21 18:32:22 UTC
Oftentimes the translation for a string will end up being exactly the same as the untranslated version of the string.

In fact, right now Locations.xml in the weather applet has something like 7 megabytes of translations that are no different from the untranslated msgids.

It might be a good idea to just omit the translation in cases where the msgid and msgstr are the same and rely on the application to fall back to the untranslated string.
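
A minimal sketch of the idea, written in Python rather than intltool's own Perl, and not the attached patch: given the untranslated string and a per-language mapping of translations, keep only the entries that differ. The function name `drop_identical` and the sample data are assumptions made for illustration.

```python
def drop_identical(msgid, translations):
    """Return only the translations that differ from the untranslated msgid.

    `translations` maps a language code to its msgstr. Entries equal to
    `msgid` are dropped, on the assumption that the application falls back
    to the untranslated string when no translation is present.
    """
    return {lang: msgstr for lang, msgstr in translations.items() if msgstr != msgid}


# Hypothetical sample data for one Locations.xml entry.
names = {"en_GB": "Colombia", "es": "Colombia", "fr": "Colombie"}
kept = drop_identical("Colombia", names)
print(kept)
```

Whether this is safe depends entirely on how the consumer falls back when an entry is missing, which is what the rest of this report discusses.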
Comment 1 Ray Strode [halfline] 2007-03-21 18:36:38 UTC
Created attachment 85064
filter out translations that aren't different than the untranslated form
Comment 2 Danilo Segan 2007-03-22 11:34:50 UTC
I highly doubt this passes the testsuite: this is an incompatible change (some apps might have depended on it, like comparing gettext(something) == something), it will break multifile behaviour, and this is simply opening a whole new can of worms.

intltool provides for what you are asking, and that happens when there is no translation for a string.  So, you may instead post-process PO files if that's what you want, though I'd suggest using the "-m" option for Locations.xml.

Anyway, before discussing this further, I wonder what it is you want to achieve/solve.

If this is solely about Locations.xml, then let's concentrate on that instead of changing intltool (incompatibly) for everybody.
Comment 3 Ray Strode [halfline] 2007-03-22 17:03:13 UTC
Hi,

Originally, the patch was motivated by some observations Matthias made on IRC about the size of Locations.xml.  It's not an optimization I feel super strongly about, it just seemed like it might be worthwhile to do.

Running "make check" in the tests directory passes with the patch.

Can you clarify what you mean by "break multifile behavior"?  Also, what "-m" option are you talking about?

On the surface, your pointer equality check seems a bit dubious to me.  Is this an idiom you've seen used before?  Do you think this patch is going to break existing applications?

This patch might be a bad idea, I don't know.  It just seemed like a quick hack that could have some real wins in some cases.
Comment 4 Danilo Segan 2007-07-20 21:19:19 UTC
*** Bug 426925 has been marked as a duplicate of this bug. ***
Comment 5 Danilo Segan 2007-07-23 17:08:45 UTC
*** Bug 459509 has been marked as a duplicate of this bug. ***
Comment 6 Danilo Segan 2007-07-23 17:12:28 UTC
So, this is what I can consider a real solution to the problem (from my comment in 459509):

"Everybody wants this to solve the Locations.xml problem. That's solving a wrong
problem. Just use "-m" to generate per-locale Locations.xml,
and provide it using language packs (that way, you can get even better memory,
disk space and speed improvements)."

And I am leaning strongly to marking this as NOTABUG or WONTFIX, unless I am convinced that you can gain more memory, disk space and speed than with language packs.
Comment 7 Matthias Clasen 2007-07-23 17:16:06 UTC
How is that solving the problem? If the per-locale Locations.xml files are all in the same place, then the "language packs" will conflict with each other. And if they are installed in different places or named differently, then the weather applet will not find them.
What language packs, by the way? Last I checked, my rpms installed all languages at the same time, and I haven't seen any plans to change that...
Comment 8 Rodney Dawes 2007-07-23 21:39:29 UTC
Solaris has "language packs" and Debian/Ubuntu have been working on it as well. The Locations.xml won't conflict, because they will be named Locations-en_GB.xml or en_GB/Locations.xml and loaded that way. See the multi-file output support in intltool. A few projects utilize this already.
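
As a rough sketch of what "loaded that way" could look like, the Python snippet below lists candidate paths for both layouts named above and picks the first one that exists. It does not show how intltool or the weather applet actually locate files. The helper names, the fallback from `en_GB` to `en`, and the final fallback to the untranslated file are all assumptions of this example.

```python
import os


def candidate_paths(base_dir, locale_name, filename="Locations.xml"):
    """List per-locale file candidates, most specific first.

    Covers both layouts mentioned above: Locations-en_GB.xml and
    en_GB/Locations.xml. The fallback order is an assumption of this sketch.
    """
    stem, ext = os.path.splitext(filename)
    # "en_GB" -> ["en_GB", "en"]; "fr" -> ["fr"]
    locales = [locale_name]
    if "_" in locale_name:
        locales.append(locale_name.split("_", 1)[0])

    paths = []
    for loc in locales:
        paths.append(os.path.join(base_dir, f"{stem}-{loc}{ext}"))
        paths.append(os.path.join(base_dir, loc, filename))
    paths.append(os.path.join(base_dir, filename))  # untranslated fallback
    return paths


def find_locations_file(base_dir, locale_name):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_paths(base_dir, locale_name):
        if os.path.exists(path):
            return path
    return None
```

Because each locale gets its own file name or directory, packages for different languages would not overwrite one another under this scheme.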

As for optimizing the actual .mo files for cases with internal strings, I think we probably don't generally have enough cases where this is an issue for program internal-only strings. However, perhaps stripping the strings that we merge back into static text files, and that aren't used internally by the program, from the resulting .mo files, would be a good optimization. It seems like the .mo file issue is only really an issue where we have a lot of strings in an external file that get translated, such as in gnome-applets.

Beyond that, I think ignoring msgid == msgstr is something that should probably be implemented in gettext, not intltool, as this would also benefit projects that don't use intltool, but do use gettext.
Comment 9 Matthias Clasen 2007-07-23 21:52:06 UTC
And it is still not solving the problem at all, since the sum of the sizes of all the Locations-foo_bar.xml files will be even larger than the current humongous all-in-one Locations.xml. Are you interested in solving the size issue, or not?

Pointing to hypothetical langpacks is just a cop-out.
Comment 10 Danilo Segan 2007-07-24 23:42:23 UTC
Matthias, with a separate file per language, you only need to install those that are actually required on the system. And you end up mmap-ing a smaller file.

And indeed, if ignoring msgid==msgstr strings is safe, it should go into gettext.  I am sure Bruno will be glad to implement that if anyone does the homework and checks that no programs are actually doing checks like "something" == gettext("something") to check that it's indeed translated, and change the logic based on that.

Introducing possibly incompatible behavior to solve one problem of Locations.xml file is wrong: if you want a hack to solve the size issue for Locations.xml, put it inside gnome-applets.

If you want the real solution, put it into gettext, or implement language packs.
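
To make the compatibility question concrete, here is a small Python sketch that uses a plain dict as a stand-in for a message catalog. It is not gettext itself; `lookup` and `strip_identical` are hypothetical helpers, and the sample entries are made up. It suggests that a comparison by value gives the same answer before and after stripping, while any check that can tell whether an entry exists (a pointer comparison in C would be one such check) sees a difference.

```python
def lookup(catalog, msgid):
    """Stand-in for gettext(): return the msgstr, or the msgid itself if absent."""
    return catalog.get(msgid, msgid)


def strip_identical(catalog):
    """Drop entries whose msgstr equals their msgid."""
    return {msgid: msgstr for msgid, msgstr in catalog.items() if msgid != msgstr}


# Hypothetical Spanish catalog.
full = {"Colombia": "Colombia", "Germany": "Alemania"}
stripped = strip_identical(full)

for msgid in ("Colombia", "Germany"):
    # Comparing by value gives the same answer before and after stripping.
    same_before = lookup(full, msgid) == msgid
    same_after = lookup(stripped, msgid) == msgid
    # Only a check that can see whether an entry exists notices the change.
    print(msgid, same_before, same_after, msgid in full, msgid in stripped)
```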
Comment 11 Matthias Clasen 2007-07-24 23:50:14 UTC
files that are actually required on the system == all languages
don't buy that language pack crap
Comment 12 Matthias Clasen 2007-07-24 23:57:28 UTC
to explain further, the origin of this bug report is that we are trying to fit stuff on a live cd. And we are pretty much opposed to creating multiple region-specific live cds just because intltool plays stupid.
Comment 13 Danilo Segan 2007-07-25 07:19:58 UTC
I like it that you want to include all the languages on the live CD.

But, what other problems except Locations.xml do you have?

In general, it is *not* common to use the same string for translation except when you want the same string (Locations.xml is an exception to this, because it is huge, and it affects the status too much, so translators used to "msgen" it).

In other cases, I am sure it's not compatible with old behaviour, but I am not sure if it will break any existing programs.
Comment 14 Simos Xenitellis 2007-07-25 14:22:26 UTC
Locations.xml appears to be low-hanging fruit, and as shown above, it can shave off 7MB of space without extra effort. If intltool-merge had an option to discard messages when msgid==msgstr when merging, this would be great. With such an option in place, it would be a one-line patch in the gnome-applets Makefile.

The part that I do not understand (considering your comment at bug 459509) is the difference for an application when a message is untranslated and a message is translated (but msgid==msgstr). 

Wouldn't it be wrong for an application to assume that specific messages have some translation? There are several locales in GNOME that have no translations but do not experience regressions due to the lack of translations.

A valid concern I noticed was for a custom configuration when one gives custom values to the LANGUAGE variable. See http://lists.kde.org/?l=kde-i18n-doc&m=118521750624019&w=2

This happens when someone uses the LANGUAGE variable with a value such as "es:fr" which means show me messages in Spanish and if something is untranslated show me in French. If a message has msgid==msgstr for Spanish but not for French, then it would show in French if we go along with the proposed optimisation. 
Comment 15 Rodney Dawes 2008-04-12 18:20:45 UTC
(In reply to comment #14)
> This happens when someone uses the LANGUAGE variable with a value such as
> "es:fr" which means show me messages in Spanish and if something is
> untranslated show me in French. If a message has msgid==msgstr for Spanish but
> not for French, then it would show in French if we go along with the proposed
> optimisation. 

Which may be incorrect, as the "translation" in Spanish may be that it is the same as the untranslated C locale version, but in French, it is not. For instance, the country name "Colombia" is "Colombia" in both C and es, but in fr, is "Colombie". With the suggested change, the incorrect translation would be used in the case where LANGUAGE="es:fr" is specified.
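
The "Colombia" case can be reproduced with a short Python sketch. It uses the fallback mechanism of Python's standard `gettext` module as a stand-in for the LANGUAGE="es:fr" lookup order; the in-memory `DictTranslations` class and the `chain` helper are assumptions of this example, since real programs load .mo files instead.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog; real programs load .mo files instead."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message):
        if message in self._catalog:
            return self._catalog[message]
        # Defer to the fallback chain, or return the message unchanged.
        return super().gettext(message)


def chain(*catalogs):
    """Build a LANGUAGE-style chain: the first catalog wins, later ones are fallbacks."""
    translations = [DictTranslations(c) for c in catalogs]
    head = translations[0]
    for fallback in translations[1:]:
        head.add_fallback(fallback)
    return head


es_full = {"Colombia": "Colombia"}
es_stripped = {}                      # identical entry removed
fr = {"Colombia": "Colombie"}

print(chain(es_full, fr).gettext("Colombia"))      # Colombia
print(chain(es_stripped, fr).gettext("Colombia"))  # Colombie
```

With the Spanish entry stripped, the lookup falls through to French, so a Spanish-first user would see "Colombie".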
Comment 16 Simos Xenitellis 2008-04-12 20:14:11 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > This happens when someone uses the LANGUAGE variable with a value such as
> > "es:fr" which means show me messages in Spanish and if something is
> > untranslated show me in French. If a message has msgid==msgstr for Spanish but
> > not for French, then it would show in French if we go along with the proposed
> > optimisation. 
> 
> Which may be incorrect, as the "translation" in Spanish may be that it is the
> same as the untranslated C locale version, but in French, it is not. For
> instance, the country name "Colombia" is "Colombia" in both C and es, but in
> fr, is "Colombie". With the suggested change, the incorrect translation would
> be used in the case where LANGUAGE="es:fr" is specified.
> 

Thanks for picking this up. It was a typo on my side, I should have written "If a message has msgid==msgstr for Spanish but not for French, then it would **not** show in French if we go along with the proposed optimisation". I describe this as a concern, a side effect.

The way I position the optimisation of clearing messages when msgid==msgstr is that it should be a decision by each distribution. When a distribution creates packages, they can choose whether to strip the MO files, either for some languages or for all. It is up to the distribution whether to support locale sequences of the type "non-english:english". As far as I know, there is no GUI tool to select complex locale sequences such as "es:fr:en".

It might become an issue to the distributions when locales such as en_AU, en_NZ, en_CA, and so on start to complete the translations, and reach a level of en_GB (19MB MO files instead of a bit over 2MB when stripped). Ditto for Spanish in Latin America. Currently, when you install the English locale, you get MO files for all English-based locales. Again, same for Spanish.

Currently, Ubuntu (in 8.04) is stripping messages.

Where GNOME would be concerned, is whether to make the job of the distributions easier, if they choose to strip the messages from the final packages.
Comment 17 Ray Strode [halfline] 2008-04-13 04:39:10 UTC
The proposed optimization is either right or wrong.  If it's right then it should be upstream, if it's wrong then it shouldn't be used.

I don't see how it can be too broken for upstream but good enough for distros.  That's just weird.
Comment 18 Simos Xenitellis 2008-04-13 12:01:46 UTC
(In reply to comment #17)
> The proposed optimization is either right or wrong.  If it's right then it
> should be upstream, if it's wrong then it shouldn't be used.
> 
> I don't see how it can be too broken for upstream but good enough for distros. 
> That's just weird.
> 

Let's take a similar example. Now, GNOME provides tarballs for distributions which include all available translations, whether translations are completed up to 1%, 80% or 100% for a language. There is no policy which says, when we produce tarballs, we exclude translations that are less than something like 70% completed. Of course, this is a decision that the distribution will make, whether they are happy to give to their users partial translations for some packages/languages.

In the above example, what GNOME could do to make the job of the distributions easier, is to provide some options such as "./autogen.sh --translation-minimum=70", which trickles down to some support in intltool to produce the translation level.

In our case, and in this bug report, it is again up to the distribution whether to support locale sequences such as "es:fr:de:en", or to tell the end-user that they can use only a single language at a time.

If GNOME wanted to be friendly to the distributions, we could provide some options such as 

./autogen.sh --strip-all-translations

or 

./autogen.sh --strip-translations=en,es

Whether the optimisation benefits a distribution depends on whether they want to save packaging space/disk space, and whether they want to save some memory when running GNOME (about 2MB). 

An important aspect here is that we do not have someone from a distribution commenting, or requesting such functionality. To the best of my knowledge, only Ubuntu (8.04) does this, but with their custom scripts.

If you still do not see a fit for this in this bug report, you can change the GNOME version to "Unversioned Enhancement", and come back here when distributions start requesting this.
Comment 19 Rodney Dawes 2008-04-13 13:14:41 UTC
(In reply to comment #18)
> If GNOME wanted to be friendly to the distributions, we could provide some
> options such as 
> 
> ./autogen.sh --strip-all-translations
> 
> or 
> 
> ./autogen.sh --strip-translations=en,es
> 
> The benefits for a distribution to perform the optimisation is whether they
> want to save packaging space/disk space, and whether they want to save some
> memory when running GNOME (about 2MB). 

Distributions already have the option of specifying a specific list of translations they want to ship, by using the LINGUAS environment variable during the build. There's no need to add a configure option to do so. If there are issues with LINGUAS support, they should be opened in separate bug reports.

> If you still do not see fit in this bug report, you can change the GNOME
> version to "Unversioned Enhancement", and come back here when distributions
> start requesting this.

I don't think what you are talking about in this comment is related to this report. You seem to be suggesting that we should add support to allow distributors to do something they already have the ability to do.
Comment 20 Rodney Dawes 2008-04-13 13:37:40 UTC
(In reply to comment #17)
> The proposed optimization is either right or wrong.  If it's right then it
> should be upstream, if it's wrong then it shouldn't be used.

Yes. It is either right or wrong. And I don't think that has been determined entirely yet.

> I don't see how it can be too broken for upstream but good enough for distros. 
> That's just weird.

Yes, but since you are the distro, your argument is that it's good enough. So sure, it seems weird to you. And as upstream, my argument is "I don't have enough information to make a concrete decision one way or the other at this point."

If the case I presented in comment #15 is true, then the answer is that this isn't good enough, because it will break valid translations where msgid==msgstr by falling back to other locales where msgid!=msgstr for the same msgid.
Comment 21 Ray Strode [halfline] 2008-04-14 04:22:45 UTC
If it's not clear, I'm not convinced the change is right either.  It could be the LANGUAGE issue is legitimate and the patch shouldn't go in.

We did ship the patch in Fedora for a while, but ended up dropping it when we realized it didn't look like it was going to get upstream.
Comment 22 Simos Xenitellis 2008-04-14 11:37:36 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > If GNOME wanted to be friendly to the distributions, we could provide some
> > options such as 
> > 
> > ./autogen.sh --strip-all-translations
> > 
> > or 
> > 
> > ./autogen.sh --strip-translations=en,es
> > 
> > The benefits for a distribution to perform the optimisation is whether they
> > want to save packaging space/disk space, and whether they want to save some
> > memory when running GNOME (about 2MB). 
> 
> Distributions already have the option of specifying a specific list of
> translations they want to ship, by using the LINGUAS environment variable
> during the build. There's no need to add a configure option to do so. If there
> are issues with LINGUAS support, they should be opened in separate bug reports.

When I use the term "strip", I mean something similar to "man 1 strip". You would perform a "strip" on a PO file, which would remove messages when msgid==msgstr, then compile to MO files in order to create the package.
With the LINGUAS environment variable, you specify which translations to include or not. 

To summarize what we have so far:

1. The "optimization" we talk about is to strip messages when msgid==msgstr, when compiling PO files.
2. One would (if they chose to) only perform the optimization when building packages for their distribution. No translation files change within GNOME.
3. The default behavior of the GNOME building tools would not change; one would have to use some special parameter to strip MO files (either a subset or all).
4. The distribution must declare to their users that they support a single language at a time, (with the implicit/explicit fallback to English (en_US)). The UI should not permit the user to set the LANGUAGE variable to a sequence such as es:fr:de:en. 
5. One would start considering all these if they have the burning need to reduce space (for example, to fit packages on a single CDROM, or install on some small device where space is at a premium). Or, when RAM is really at a premium.

One would consider "stripping" translation files if *all 5 items* apply.

If we consider this whole concept of "stripping" translation files to be a big no-no, then this report should be closed (maybe INVALID?).

If we consider that someone in the future would need to come back here, we could leave it, marked as enhancement or something.

If there is something else I have missed in this comment entry, please say so.
Comment 23 Loïc Minier 2008-04-15 20:50:09 UTC
Wouldn't it be possible to have a new attribute on the string expressing that some languages have the same translation as the original string?

For example:
        <name>Paris</name>
        <name xml:lang="ar">ﺏﺍﺮﻴﺳ</name>
        <name xml:lang="as">প্যাৰিস</name>
        <name xml:lang="az">Paris</name>
...
        <name xml:lang="cy">Paris</name>
        <name xml:lang="da">Paris</name>
        <name xml:lang="de">Paris</name>
...


Would become:
        <name xml:langs="C az cy da de">Paris</name>
        <name xml:lang="ar">ﺏﺍﺮﻴﺳ</name>
        <name xml:lang="as">প্যাৰিস</name>
...
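
A rough Python sketch of how such a collapsed form could be generated. The `xml:langs` attribute is the non-standard attribute proposed in this comment, not an existing XML or intltool feature. The function name `collapse_names` and the sample data are assumptions for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def collapse_names(untranslated, translations):
    """Render <name> lines, grouping languages that match the untranslated text.

    `translations` is a list of (lang, text) pairs. Languages whose text
    equals `untranslated` are folded into one element carrying the proposed
    (non-standard) xml:langs attribute; the rest keep xml:lang.
    """
    same = ["C"] + [lang for lang, text in translations if text == untranslated]
    lines = []
    if len(same) > 1:
        langs = quoteattr(" ".join(same))
        lines.append(f"<name xml:langs={langs}>{escape(untranslated)}</name>")
    else:
        lines.append(f"<name>{escape(untranslated)}</name>")
    for lang, text in translations:
        if text != untranslated:
            lines.append(f"<name xml:lang={quoteattr(lang)}>{escape(text)}</name>")
    return lines


# Hypothetical sample data.
sample = [("az", "Paris"), ("cy", "Paris"), ("da", "Paris"),
          ("de", "Paris"), ("lt", "Paryžius")]
for line in collapse_names("Paris", sample):
    print(line)
```

A reader of such a file would need to understand the new attribute, so this trades file size for a format change in every consumer.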
Comment 24 André Klapper 2012-03-16 12:39:50 UTC
intltool switched from the GNOME to the launchpad.net infrastructure nearly three years ago: https://mail.gnome.org/archives/gnome-i18n/2009-April/msg00275.html
The intltool product in bugzilla.gnome.org has been deprecated and closed for new bug entry since April 2009.

I am now closing all remaining open reports about intltool as NOTGNOME as part of GNOME Bugzilla Housekeeping.

Reporter: If the problem that you reported here is still valid in a recent version of intltool we kindly ask you to report it again to https://bugs.launchpad.net/intltool/ so the intltool developers get notified about it.