Bug 116236 – Use ngettext for handling plurals in GNOME




Summary:	Use ngettext for handling plurals in GNOME


Status:	RESOLVED FIXED

Product:	general
Classification:	Other
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Unknown User
QA Contact:	Unknown User

URL:
Whiteboard:

Depends on:	106695 106696 106697 106698 106699 106700 106701 106702 106703 106704 116235 116986 117997 118049 119790 123847 123849 123857 123866 125175 133142 133942 134340 134458 135077 149525 327106 327109 327116
Blocks:

Reported:	2003-06-29 00:51 UTC by Christian Rose
Modified:	2006-03-03 23:00 UTC

See Also:
GNOME target:	---
GNOME version:	2.7/2.8

Attachments
Preproccess PO files for non-plural-forms capable msgfmt (453 bytes, text/plain) 2003-11-15 17:14 UTC, Danilo Segan

Description Christian Rose 2003-06-29 00:51:12 UTC

As mentioned in
http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#plurals,
the common way of handling plurals is broken for many locales. A way to solve
this is by using ngettext instead, as mentioned in that document.
A simple example of code using ngettext:

  g_printf (ngettext ("Found %d file.", "Found %d files.", nbr_of_files),
            nbr_of_files);

We should make all of GNOME use this when needed.

Comment 1 Christian Rose 2003-07-28 10:59:02 UTC

I think the consensus at the recent GUADEC was that it was all right to
include ngettext calls, since common platforms like Linux and Solaris
already support it, and #ifdefs can be used to catch the cases where
ngettext is not supported (something about using *HAVE_NGETTEXT or
something like that).

I admit I'm not very knowledgeable about the details in this proposed
solution though. Jody, Havoc, could you please correct me and fill in
the details?

Comment 2 Matthias Clasen 2003-07-28 11:53:58 UTC

Hmm, I would propose adding a g_ngettext() to glib, but I think the
real problem is the po format, isn't it? Will a ngettext()-less
gettext implementation understand the ngettext()-enhanced po files?

Comment 3 Christian Rose 2003-07-29 18:39:57 UTC

I think something like a P_() macro for ngettext () was already
suggested at GUADEC, similar to _() and N_(). P as in plural or
something like that.

As for compatibility on the po format level, that's a very interesting
issue, something we forgot at GUADEC. It shouldn't be hard to figure
out; gnome-games (bug 106697) among others already use ngettext and
have po files with ngettext syntax, so it should be trivial to test.
In fact, I did so now, and it seems that msgfmt on an unpatched,
unupdated Solaris 8 machine issues an error with such a file. So it
seems we're still breaking compatibility here on the po file level for
systems that don't support ngettext.

But given the choice between keeping compatibility with unpatched,
unupdated environments and never moving forward or fixing issues, and
occasionally breaking that compatibility, I know which one I prefer.
ngettext isn't exactly a brand new thing either; it's been around for
a few years now on Linux, so it's not like we would require bleeding
edge stuff.

It's also included in Solaris 9 and available as a Sun patch for
Solaris 8, and mandated in the OpenI18N standard, so other compatible
environments are expected to follow if they aren't already supporting it.

Comment 4 Matthias Clasen 2003-07-29 18:53:02 UTC

Maybe the cleanest solution would be to make glib require ngettext()
like it already requires gettext() so that the rest of the stack can
depend on it being there.

Comment 5 Jody Goldberg 2003-08-03 19:03:28 UTC

Seems like a good idea.  I'd like to see a full set of _(, N_(, L_(
in glib so that we can stop having them pop up in various random places.

Comment 6 Christian Rose 2003-08-13 12:58:04 UTC

Ok, I put that in the glib bug 119790.

Comment 7 Christian Rose 2003-09-30 08:30:25 UTC

As mentioned in glib bug 119790, glib won't require this until GTK+ 2.6.

But we should just bite the bullet and use this for GNOME 2.5
(http://lists.gnome.org/archives/gnome-i18n/2003-August/msg00127.html)
anyway. This is severely needed but has been delayed so much in many
cases that it's just tragic.

Comment 8 Danilo Segan 2003-11-15 17:12:23 UTC

Another reason why it's OK to use ngettext.

Most of the tarballs already include generated MO files (so, there's
no need to generate them from PO files, which might cause problems
with non-GNU msgfmt if they contain entries like msgid_plural, or
msgstr[2]).

The interesting thing here is that MO files are quite simple, and
addition of plural forms to them didn't break backward compatibility.
So, if those MO files worked with other gettext's in the past, they'll
also work in the future.

So, doing
#define ngettext(a,b,c) (a)
should be enough (along with a check for HAVE_NGETTEXT, and making
this definition conditional on it) to make *tarballs* compile and
install on any system with gettext (not necessarily GNU's) support.
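
As a rough illustration of the conditional definition described above (a sketch under assumptions, not code from any GNOME module): HAVE_NGETTEXT is assumed to be defined by configure when the system provides ngettext, and format_found_files is a hypothetical helper. The fallback shown here differs from the macro quoted above in that it picks between the two English strings by count, so it does no catalog lookup at all and the output is untranslated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef HAVE_NGETTEXT            /* assumed to be set by configure */
#include <libintl.h>
#else
/* Assumed fallback: no catalog lookup, just choose the English form. */
#define ngettext(singular, plural, n) ((n) == 1 ? (singular) : (plural))
#endif

/* Hypothetical helper: writes the message for nbr_of_files into buf and
 * returns the length reported by snprintf. */
int format_found_files(char *buf, size_t size, unsigned long nbr_of_files)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    ngettext("Found %lu file.", "Found %lu files.",
                             nbr_of_files),
                    nbr_of_files);
}
```

Calling code stays the same whether or not the real ngettext is available; only the header section changes.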

However, this leaves the problem of compiling from CVS, and compiling
a couple of packages that don't include MO files in the tarball (I
think Gnumeric is one of them, though, there might be some that are in
the Desktop/Developer platform too).

One approach to solving this problem, besides using definitions like
the one above, is to preprocess PO files and remove the "offending"
features (msgid_plural and msgstr[N] forms). The Perl program I'll
attach below does this, and can be used in a pipe (it reads standard
input and writes to standard output).

Concretely, this program replaces every msgid_plural line with an
empty translation (msgstr ""), i.e. the string will be untranslated
for those folks (which means we should *recommend* using GNU gettext
for Gnome, but we won't require it); it also comments out all lines
matching "^msgstr\[[0-9]+\]". A file generated this way would
(should?) be compilable with any msgfmt out there.
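
The per-line rewriting described above can be sketched in C as follows. This is not the attached Perl script, only an illustration of the two rules as stated; is_plural_msgstr and strip_plural_line are hypothetical names, the input line is assumed to have no trailing newline, and multi-line (continued) strings are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if line matches ^msgstr\[[0-9]+\], otherwise 0. */
int is_plural_msgstr(const char *line)
{
    const char *p;

    if (strncmp(line, "msgstr[", 7) != 0)
        return 0;
    p = line + 7;
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    return *p == ']';
}

/* Hypothetical per-line filter: writes the rewritten form of line to out.
 * - a msgid_plural line becomes an empty translation (msgstr "")
 * - a msgstr[N] line is commented out
 * - every other line is copied unchanged */
void strip_plural_line(const char *line, char *out, size_t size)
{
    assert(line != NULL && out != NULL && size > 0);
    if (strncmp(line, "msgid_plural", 12) == 0)
        snprintf(out, size, "msgstr \"\"");
    else if (is_plural_msgstr(line))
        snprintf(out, size, "# %s", line);
    else
        snprintf(out, size, "%s", line);
}
```

A real filter would loop over standard input and print each rewritten line, and would need to decide what to do with continuation lines that follow a removed entry.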

I've tested it on a couple of Serbian translations with plural forms,
but any input is greatly desired (I don't have any other gettext
available other than GNU's, so please test it if you can).

How does everyone else feel about this solution?

Comment 9 Danilo Segan 2003-11-15 17:14:17 UTC

Created attachment 21476
Preproccess PO files for non-plural-forms capable msgfmt

Comment 10 Danilo Segan 2003-11-15 17:45:06 UTC

Just to add some other thoughts (sorry for the spam).

I think all of this can be integrated into intltool autoconf macros,
except maybe for noop ngettext definition, thus making it
straightforward for build maintenance (not requiring every module to
include all the same checks in configure.in/ac).

intltool automake/autoconf macros should define HAVE_NGETTEXT if it's
present. Since these macros are also used to build MO files out of PO
files, it would require a change from something like:
 msgfmt -o $(OUTPUTFILE) $(INPUTFILE)
to
 if HAVE_NGETTEXT; then
   msgfmt -o $(OUTPUTFILE) $(INPUTFILE)
 else
   pre-msgfmt.pl < $(INPUTFILE) | msgfmt -o $(OUTPUTFILE) -
 fi

I guess this simple Perl script could also be improved to check the
HAVE_NGETTEXT environment variable and pass the file through unchanged
when it is set, which would reduce the above to simply:
 HAVE_NGETTEXT=$(HAVE_NGETTEXT) pre-msgfmt.pl < $(INPUTFILE) | msgfmt -o $(OUTPUTFILE) -

Of course, all Gnome 2.6 packages with plural forms should require
this "new and improved" intltool version. Since I'm not exactly best
friends with Autoconf/Automake, I'll wait for others' comments before
even trying to hack on it.

Comment 11 Danilo Segan 2004-01-28 16:13:25 UTC

External bug this one depends on, regarding Evolution:
http://bugzilla.ximian.com/show_bug.cgi?id=53464

Comment 12 bill.haneman 2004-01-28 21:53:13 UTC

How can we deal with plurals in XML files?  gnopernicus for example
will need to mark as translatable strings which include plural
constructions, which live in XML files.
It's possible for us to do something with our XML parser so that it
calls ngettext at runtime; for instance, we can include the 'format
string' in the XML content:

<_ngettext-format>%d items found.
  <some-element-that-evaluates-to-an-int/>
</_ngettext-format>

but will the _ngettext-format element get properly pulled into the .PO
files for translation in a way appropriate to pluralization?  Or do we
need something like:

<ngettext-format _singular="%d item found." _plural="%d items found.">
<some-element-that-evaluates-to-an-int/>
</ngettext-format>

so that both strings get pulled into the .po files, as localized
attributes?

Comment 13 Danilo Segan 2004-01-29 16:29:24 UTC

That wouldn't be enough.  The number of plural forms may be up to 4
(perhaps even more, but I don't know of any such language), and the
way to determine which of them is used is given as a C expression in
the PO file.

So, first step would be to introduce something like:
<ngettext locale="C">
  <plural num="0">%d item</plural>
  <plural num="1">%d items</plural>
</ngettext>

and to have translations of the following form integrated:
<ngettext locale="sr@Latn">
  <plural num="0">%d stavka</plural>
  <plural num="1">%d stavke</plural>
  <plural num="2">%d stavki</plural>
</ngettext>

This is just example syntax, but it needs to scale with the number of
plural forms (i.e. for two strings we may get 1--4 or possibly more
translations).

The problem here is how to decide which form to use. The GNU gettext
library parses a string of the form (Serbian example):
"nplurals=3; plural = n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 &&
(n%100<10 || n%100>=20) ? 1 : 2;"
and evaluates "plural" as the index of the plural form. So, if you want
to put plural forms inside XML files, you've got two choices: either
hardcode the language algorithms, or (re)implement an entire C
arithmetic parser and use gettext("") to get the header from the MO
file and extract the "Plural-Forms" field.
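
To make the first of those two options concrete, here is what hardcoding a single language's rule would look like: the Serbian "plural" expression quoted above, transcribed into a C function. This is only a sketch; serbian_plural_index and serbian_plural_form are hypothetical names, and nothing like them exists in gettext or glib.

```c
#include <assert.h>
#include <stddef.h>

/* The Serbian "plural" expression from the header above (nplurals=3),
 * returning the index of the plural form to use for n. */
int serbian_plural_index(unsigned long n)
{
    return (n % 10 == 1 && n % 100 != 11) ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}

/* Hypothetical lookup: forms must hold exactly three translations,
 * ordered by plural index. */
const char *serbian_plural_form(const char *const forms[3], unsigned long n)
{
    assert(forms != NULL);
    return forms[serbian_plural_index(n)];
}
```

With the forms "%d stavka", "%d stavke" and "%d stavki", n = 1 and n = 21 select the first, n = 2 and n = 22 the second, and n = 5, n = 11 and n = 12 the third. Every supported language would need its own function like this, which is why keeping the rule in the PO/MO header is attractive.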

The number "n" is not needed in the XML file, because the string is
chosen at display time, and the choice depends on the number to be
used with it.

So, the best way would probably be to keep this data in PO/MO files
instead of putting them in XML, and using gettext to find out the
needed string.

Comment 14 bill.haneman 2004-01-29 16:36:42 UTC

Danilo, I do not understand your reply.  I do of course understand
that there are potentially more than 2 plural forms - but my question
had to do with the extraction of format strings _suitable for passing
to ngettext_.  Since C code using ngettext only passes two marked
strings in, it seems obvious to me that only two strings need to be
marked in the 'C' locale (i.e. only two msgids are required).

It is required for this data to be in XML; perhaps you could re-read
my previous question.  The issue has to do with:

* a means of extracting strings from XML in a way that allows
ngettext-appropriate plural translation

Your suggestion above does not seem workable at all, since it
presupposes that the XML.in file knows all the possible plural forms,
which we know is not feasible.

Comment 15 Danilo Segan 2004-01-29 16:51:53 UTC

Ah, I thought you wanted the translations to be in the XML file as
well (which is usually done using "xml:lang").

This would require adding functionality to intltool, because it
handles extracting strings from everything apart from source code --
perhaps it's best to discuss a long-term solution on
intltool@freedesktop.org?

OTOH, a short-term solution, and probably the most painless way to do
this right away without depending on the latest intltool (which is not
even created yet :), is to extract these strings into a .h file, and
put that in po/POTFILES.in.

Comment 16 bill.haneman 2004-01-29 16:58:06 UTC

Danilo:
intltool-extract already extracts translatable strings from XML files,
including both element content, and attributes.

So if we use either technique I mention, the strings will get pulled
into the .po files.  The issue is, what's "special" about the way
plural-form strings get listed in the PO files that allows
ngettext-type translation?  Do the translators just grep for "%d", or
what?

I think this may be technically feasible without changing
intltool-extract, but I need more info about how the ngettext-type
extraction and translation work (not the internals of ngettext, which
I can find info about, but how translators find stuff in .po files
that needs ngettext-appropriate translation).

- Bill

Comment 17 Danilo Segan 2004-01-29 18:30:38 UTC

They're completely special-cased in PO files, and intltool-extract
would *have* to be extended to support it. And no, intltool-extract
doesn't support it at this time.

Instead of the "regular":
msgid "Original string"
msgstr "Translation of it"

PO file contains something like:

msgid "%d original string"
msgid_plural "%d original strings"
msgstr[0] ""
msgstr[1] ""
...

As far as I could tell, intltool-extract is designed with only one
form per message (it puts all messages in a hash/array in Perl, and
constructs a PO file later on), which means small architectural
changes would be needed (like, allowing arrays/pairs to be keys in a
hash as well, and treating them as plural forms).

As for finding out about this, it's documented in "info gettext" as
well, topic "PO Files" (use "m PO Files[RET]"): two completely
separate styles of "items" are documented: one without, and other with
plural forms.

I hope it's now finally clear that this is *not* possible without
changes to intltool, and that's why I'd really like this discussion to
be moved to intltool@freedesktop.org.

Comment 18 bill.haneman 2004-01-29 18:45:16 UTC

Are you saying that the apps that already use ngettext have
hand-edited .pot files?

Comment 19 Christian Rose 2004-01-29 19:04:26 UTC

No, intltool-update calls xgettext which can extract gettext and
ngettext messages out of C files just fine.

Comment 20 Danilo Segan 2004-01-30 10:54:21 UTC

Bill, look at my suggestion above to extract strings from XML files
and put them in a C header (.h) file, where I implied using xgettext
to extract ngettext calls. You, of course, wouldn't use this .h file
anywhere in your code, except for putting strings into PO file, and
that's why they would have to be the same as those you use in XML
files, and pass to ngettext later on: in that case, you may choose DTD
which suits you best.

The other option, as I already said, is to extend intltool to support
it (which should probably be done anyway, but it seems not to solve
your immediate problem).

Comment 21 bill.haneman 2004-01-30 13:07:22 UTC

intltool-update currently extracts marked-up strings to (temporary) .h
files, then runs xgettext on them, it seems.  The possibility remains
open that intltool-update's current behavior can be leveraged to do
without having to create new Makefile rules to create and update the
.h files manually (and add the persistent .h files to the
POTFILES.in).  It might take a small tweak to intltool-update, not
sure without reading the code more closely.  The point I am
investigating here is leveraging intltool's existing functionality.

Comment 22 Mathias Hasselmann (IRC: tbf) 2006-03-03 09:43:20 UTC

As I am facing this problem of missing ngettext support in glib yet again, I'd like to raise this bug once more.

Maybe we should start by providing the relevant macros in <glib/gi18n.h>.
Once they start to be used, all the other problems will be resolved quickly.

Comment 23 Matthias Clasen 2006-03-03 13:57:36 UTC

What macros? I don't think there are any "standard" macros for ngettext,
or are there? In any case, glib requires ngettext support now, so
you can feel free to use ngettext() wherever you need it.

Comment 24 Danilo Segan 2006-03-03 18:14:54 UTC

Regarding comment #22: Mathias, just call ngettext() directly; you need no macros.

Anyway, this was just a "container" bug for all the ngettext bugs we found when we finally started supporting ngettext().  Those missing instances are pretty rare now, and I believe we can close this bug (all dependants seem resolved).

Matthias?

Comment 25 Matthias Clasen 2006-03-03 23:00:30 UTC

agreed