Bug 97556 – Eel functions for adding message context markers should be moved to glib


Summary:	Eel functions for adding message context markers should be moved to glib


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:	122111
Blocks:	113932

Reported:	2002-11-03 13:50 UTC by Christian Rose
Modified:	2011-02-18 16:07 UTC

See Also:
GNOME target:	---
GNOME version:	Unversioned Enhancement

Description Christian Rose 2002-11-03 13:50:30 UTC

Eel has some functions for parsing messages marked for translation so that
developers can add context to messages, in case the message could otherwise
be interpreted differently or the English word has multiple meanings that
need to be translated differently in other languages. The context comment
is then removed prior to display in the user interface.

It works like this example (taken from bug 97482):

  Q_("Russian[ charset]");

The word "Russian" is what should be translated, and that inside [] is the
contextual comment. In this example it's so that the message for the
"Russian" language can be differentiated from the message for the "Russian"
character set in Galeon, which needs to be translated differently in other
languages where the words don't happen to be the same.

I suggest this eel feature be moved to glib, since it's sometimes a
requirement for proper localization.

Comment 1 Kenneth Rohde Christiansen 2002-11-03 14:30:59 UTC

This is the function discussed:

/* Remove all text in brackets.  Used where context is included in strings
 * to be internationalized, to help translators, and to make sure that
 * strings that may be used in different places with a different meaning
 * may be translated separately.  If brackets are not even, it will just
 * return a copy of the original string.
 */
char *   eel_str_remove_bracketed_text     (const char    *text);
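
For illustration only, here is a minimal sketch of what such a function could look like. It is not the Eel source: the name remove_bracketed_text is hypothetical, and treating nested brackets by depth is an assumption. It follows the behaviour described in the comment above, including the plain copy on uneven brackets, and it shows why the caller ends up owning an allocated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for eel_str_remove_bracketed_text(): returns a newly
 * allocated copy of text with every "[...]" span removed.  If the brackets
 * are uneven, an unmodified copy is returned.  The caller frees the result;
 * NULL is returned only if allocation fails. */
char *
remove_bracketed_text (const char *text)
{
  assert (text != NULL);

  size_t len = strlen (text);
  char *result = malloc (len + 1);
  if (result == NULL)
    return NULL;

  size_t out = 0;
  int depth = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (text[i] == '[')
        depth++;
      else if (text[i] == ']')
        {
          if (depth == 0)
            {
              /* Closing bracket without an opener: return a plain copy. */
              memcpy (result, text, len + 1);
              return result;
            }
          depth--;
        }
      else if (depth == 0)
        result[out++] = text[i];   /* keep text outside brackets */
    }

  if (depth != 0)
    {
      /* Unclosed bracket: return a plain copy. */
      memcpy (result, text, len + 1);
      return result;
    }

  result[out] = '\0';
  return result;
}
```

Calling remove_bracketed_text ("Russian[ charset]") yields a fresh string "Russian" that must be passed to free().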

Comment 2 Owen Taylor 2002-11-19 22:05:52 UTC

Ugh, but maybe as good as it is going to get. Definitely
would need a facility for escaping brackets.

Comment 3 Matthias Clasen 2002-11-20 11:22:54 UTC

This should be done as explained in the gettext manual. The translations
should not contain the contextual comments in brackets, thus in the normal
case, no stripping is necessary. Only in the case where no translation is
found and the original string is returned, the stripping needs to be done.

See
http://www.gnu.org/manual/gettext/html_node/gettext_151.html#SEC151
for an example.

Comment 4 Owen Taylor 2002-11-20 17:15:09 UTC

My trust that translators can be convinced to translate

 Russian[ charset]

As:

 Russe

Not:

 Russe[ charset]

 
or:

 Russe[ <charset translated into French>]

Is frankly fairly low (from experience with places where there
have been explicit comments telling the translators what
to do...) but maybe if it's standard enough...

Note also that the main advantage of the no-strip-if-translated
approach can only be achieved if you either:

 - Hash the stripped results to avoid the caller having
   to deallocate.
 - Use a form for the comments that can be stripped in place, such
   as "charset|Russian" (which is perhaps less clear to the
   translator that "Russian" is an adjective that should be
   translated to the proper form to apply to Russian).

It certainly would be nice if the caller did not have to
deallocate, though...

Maybe you could do:

 "Russian[ charset]|Russian"

And accept \| and \\ escapes before the |. intltool could
possibly be convinced to do checks that the translator
translated the string properly.
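
A rough sketch of the escape handling suggested here, under the assumption that escapes are only interpreted in the context part before the separator. The function name skip_context is hypothetical. Because the displayed text is simply the tail of the msgid, the result can point into the original string and nothing has to be allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the part of msgid after the first unescaped
 * '|'.  In the context part, "\|" stands for a literal bar and "\\" for a
 * literal backslash.  The result points into msgid, so no memory is
 * allocated.  If there is no unescaped '|', msgid is returned unchanged. */
const char *
skip_context (const char *msgid)
{
  assert (msgid != NULL);

  for (const char *p = msgid; *p != '\0'; p++)
    {
      if (*p == '\\' && p[1] != '\0')
        p++;                    /* skip the escaped character */
      else if (*p == '|')
        return p + 1;           /* text to display starts here */
    }

  return msgid;
}
```

With this, skip_context ("Russian[ charset]|Russian") points at the trailing "Russian", and a context containing "\|" does not end the context early.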

Comment 5 Kenneth Rohde Christiansen 2002-11-20 19:31:27 UTC

Sounds good Owen, but maybe we should mark what should be translated
instead. That seems more logical to me.

Example string                       Danish translation

Q_("search [after] files|after");  -> "efter"
Q_("[russian] charset|russian");   -> "russisk"
Q_("[view] picture|view");         -> "vis"
Q_("a [view]|view");               -> "visning"

Comment 6 Matthias Clasen 2002-11-20 21:21:39 UTC

I think the gettext manual example very clearly shows how to do this
properly: use a prefix for the context information and avoid any
string copies: "charset|Russian" or, if brackets are considered
necessary, "[charset]Russian"

We obviously need a convention for handling cases where the message
itself starts with a bracketed string. Silly example: "[ and ] are
brackets" would have to be entered as something like "[dummy context][
and ] are brackets".

I would expect translators to quickly internalize the information that
context information must not be translated if they see it come up in
the GUI of their translated apps once. And as you mentioned, Owen, the
tools could easily check for translated context information.

Comment 7 Kenneth Rohde Christiansen 2002-11-20 21:47:29 UTC

I don't see how this:

Q_("search [after] files|after");  -> "efter"

doesn't follow the gettext manual. We use the text after the | to show
in the GUI. Instead of just writing "adjective|blah" or whatever, it
is better to write the exact sentence in which it will be used.

Especially because endings change in heavily conjugated languages (for
instance Finnish has 15 cases, and 2 geni afaik). So when writing more
context it is nice to have a standard convention of pointing out the
word to translate. I only suggested using [] for this. The function
doesn't need to know anything about []. The only thing it might need
to do is allow escaping |.

Comment 8 Matthias Clasen 2002-11-20 21:59:41 UTC

Sorry Kenneth, my comment was on the initial proposal of using a suffix
for the comments. But let me ask you a question: Why would anybody
want to translate "after" as a separate string when it appears in a
larger context like "search after files" ? Surely you would translate the 
whole string as one unit, putting all available context in the
translatable message. Or did I somehow misunderstand your example ?

Comment 9 Kenneth Rohde Christiansen 2002-11-20 22:10:26 UTC

Oh well, it was just an example...probably not very well thought out :).

but it could for instance have been:

"Put thumbnail [after] filename|after"
"Put thumbnail [before] filename|before"

if there was a pulldown menu. I have seen something like that before.

Comment 10 Christian Rose 2002-11-20 22:15:12 UTC

> Why would anybody want to translate "after" as a separate string
> when it appears in a larger context like "search after files" ?
> Surely you would translate the whole string as one unit, putting
> all available context in the translatable message. Or did I somehow
> misunderstand your example ?

Happens all the time in practice. One example is "Search for [files]
in [        ] where [size] is [larger than] [     ] MB" or stuff like
that in search dialogs, where the stuff in brackets are text fields
or drop-down boxes. There are many more occasions in GUIs where
sentences (unfortunately) are in no way possible to translate in their
entirety since they have widget elements or the like in them.

Comment 11 Christian Rose 2002-11-20 22:19:52 UTC

Another common example is "Time-out after [      ] seconds." or any
other case where the unit follows.

Comment 12 Matthias Clasen 2002-11-20 22:31:31 UTC

But this is "broken as designed" from an I18N perspective anyway, isn't
it? What if the "widget elements" which are effectively part of the
sentence have to be reordered for the sentence to make sense in a
translation? If you want to embed GUI elements into a sentence,
you should probably translate something like:

"Search for {files} in {location} where {size} is {larger than} {number} MB"

to

"Suche in {location} nach {files} mit {size} {larger than} {number} MB"

then post-process the translated string and arrange for the proper GUI
elements to be inserted in place of the {xyz} placeholders.
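
The post-processing step could be sketched roughly as follows. This is only an illustration: placeholder_order and MAX_NAME are hypothetical names, and the "{name}" syntax is taken from the example strings above. The function reports the order in which placeholders occur in the translated string, which is the information a dialog would need to lay out its widgets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 32

/* Hypothetical helper: copy the names of the "{name}" placeholders found in
 * translated, in order of appearance, into names[].  Returns the number of
 * names stored (at most max_names).  Names that do not fit in MAX_NAME
 * bytes are skipped; scanning stops at an unterminated "{". */
size_t
placeholder_order (const char *translated,
                   char        names[][MAX_NAME],
                   size_t      max_names)
{
  assert (translated != NULL && names != NULL);

  size_t count = 0;
  const char *p = translated;

  while (count < max_names && (p = strchr (p, '{')) != NULL)
    {
      const char *end = strchr (p + 1, '}');
      if (end == NULL)
        break;                          /* unterminated placeholder */

      size_t len = (size_t) (end - (p + 1));
      if (len < MAX_NAME)
        {
          memcpy (names[count], p + 1, len);
          names[count][len] = '\0';
          count++;
        }
      p = end + 1;                      /* continue after the '}' */
    }

  return count;
}
```

For the German string above, the reported order starts with "location" and "files", so the location widget would be placed before the file-type widget.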

Comment 13 Christian Rose 2002-11-20 22:41:47 UTC

Please remember that this is just one of the badly needed uses for
this Q syntax with more context, and that this is needed now, not when
all current GUI l10n problems are solved in the future.

Matthias, are you subscribed to gnome-i18n@gnome.org? This has been
discussed for years there and been rehashed over and over, and it
seems terribly redundant to rehash everything and all consensus that
led to the Q thing in eel in this bug report.

Comment 14 Matthias Clasen 2002-11-20 23:23:09 UTC

Christian, I'm not subscribed to gnome-i18n, so I have probably missed
years of interesting discussion... I'm not opposed to the feature, I'm
only concerned that after years of discussion, you still ended up with
an implementation in eel which dups strings, when the right approach
has been explained (with example code) in the gettext manual for even
longer.

Comment 15 Christian Rose 2002-11-21 12:53:48 UTC

Syntactically, the solution proposed in the gettext manual isn't much
different, but I'd argue that the Q syntax is clearer for the translator.

But the big issue is toolkit support. We sometimes have trouble
convincing maintainers about changing even the most trivial message
code so that it will improve the situation for translators, often on
the grounds of "this will make my code ugly/I don't want to add that
much junk code/I like it the way it is/writing it like this will just
introduce memory leaks, no way". And by my own experience, asking
developers to reinvent the wheel every time, and spend time writing
bugfree additional code for something that may seem like a trivial and
nonimportant thing to people that don't directly experience the
localization problems and the impact of those, usually isn't a very
successful task. This kind of stuff needs support directly in the
programming environment.

Comment 16 Matthias Clasen 2002-11-21 14:41:08 UTC

I think the technically cleanest thing would be to introduce two-argument
macros like Q_(msgkey,context) - no need to invent a syntax for encoding of
context in msgkey, no danger of translating it, no need to strip the context
out of the msgkey. This would of course need support in the extraction
tools, which would have to put the context as a comment in the pot file.

Comment 17 Kenneth Rohde Christiansen 2002-11-21 15:09:09 UTC

We still need the strings to be unique.

Comment 18 Christian Rose 2002-11-23 02:17:22 UTC

We also still need something that doesn't break gettext or any other
tool working with po files. The Q syntax in Eel doesn't do that.

Comment 19 Matthias Clasen 2003-04-03 22:52:58 UTC

I found out that xgettext almost lets you do the two-parameter approach:

with xgettext --add-comments,

 gettext("Russian" /*Russian[charset]*/)

will yield

#. Russian[charset]
msgid "Russian"
msgstr ""

Unfortunately, the comment syntax can't be hidden behind a macro,
since xgettext operates on the unpreprocessed source. And, as you
rightly pointed out, this approach doesn't solve the msgid collision
problem, so we will have to encode the context in the msgid anyway.

Here is a very simple, but efficient implementation:

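#include <string.h>   /* strchr */
#include <libintl.h>  /* gettext */

#define Q_(String) g_sgettext (String)

/* Return msgval, unless the lookup fell back to the untranslated msgid
 * (gettext then returns the msgid pointer itself); in that case return
 * the part of msgid after the first '|', without copying anything. */
const char *
g_strip_context (const char *msgid,
                 const char *msgval)
{
  if (msgval == msgid)
    {
      const char *c = strchr (msgid, '|');
      if (c != NULL)
        return c + 1;
    }

  return msgval;
}

/* gettext wrapper that strips the context prefix from untranslated msgids. */
const char *
g_sgettext (const char *msgid)
{
  return g_strip_context (msgid, gettext (msgid));
}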


Then Q_("Russian[charset]|Russian")

will come out as

msgid "Russian[charset]|Russian"
msgstr ""

and translators can (hopefully) be trained to translate only the part
after the first |. When Q_() is mixed with _(), problems can arise:

_("boolean operators are |&^")

will leave the translator puzzled whether he has to translate the part 
before the |. Possible ways to avoid this confusion are: 

1) always use Q_() (at least use Q_() whenever the message
contains a |): then the above would have to be coded as
Q_("no context|boolean operators are |&^")

2) give hints to translators like above:
Q_("Russian[charset]|Russian"/* translate after |*/)
_("boolean operators are |&^"/* | is part of the message */)
would come out as

#. translate after |
msgid "Russian[charset]|Russian"
msgstr ""

#. | is part of the message
msgid "boolean operators are |&^"
msgstr ""

Comment 20 Owen Taylor 2003-09-12 13:25:00 UTC

'|' occurs infrequently enough in translations that it's probably
not a big deal what happens there. And I think in those cases,
it should be pretty clear that the part before the | doesn't
look like a context.

Comment 21 Christian Stimming 2003-11-16 12:24:31 UTC

I see that some functions have been added to glib/gi18n.h a few days ago.
Which format do they implement? There's no direct API documentation right
in these header files.

However, this problem would probably be solved even better if support for
this is moved to libintl/libc. I started a little bit of discussion on
translation-i18n@lists.sourceforge.net, but it's not yet clear whether the
guys there can be convinced to include such a solution into
gettext/libintl/libc. Here's my posting to that list:
 
The real problem with this is that there is *no standard* for such 
non-ambiguous msgids. What exactly should be the po file format for the 
non-ambiguous msgids? For Qt/KDE it's "_: disambiguating comment\nmsgid", but 
if you follow the proposal in the gettext manual then it would be 
"disambiguating comment|msgid". And the bug report 
http://bugzilla.gnome.org/show_bug.cgi?id=97556 even thinks about yet another 
way and discusses either "[disambiguating comment]msgid" or 
"msgid[disambiguating comment]". And what are the parameters for the 
respective q_gettext call? For Qt/KDE it accepts two strings where one of 
them is the non-ambiguous comment and the other is the msgid. From the 
gettext manual's proposal it would accept one string just like the usual 
gettext call. 
 
So if you keep the position that this problem should be solved by each GUI 
library on their own, then each library will invent its own format for both 
the msgid format in the po file and for the parameter format of the q_gettext 
call. This will only increase the confusion over time. 
 
Instead, if *you* as gettext/libintl/libc project now introduce *one* solution 
for this, then this kind of format will be unified throughout the whole GNU 
translation community. Needless to say, this will also increase the chance 
that translators are going to handle this correctly as opposed to having to 
adapt to each project's non-ambiguous-solution format. This is why I think 
this is really important and should be solved on libintl/libc level. 
 
> > Therefore I would like to ask you, the gettext developers: Are there 
> > plans to include such a prefix_gettext() function into the gettext 
> > library? 
> 
> There is no plan to include such a function in the libintl/libc library. 
> The reason is simply that any project can write this function with 10 
> lines of code. 
 
Again, as I stated above: The problem is not the amount of code. The problem 
is that a standard format is needed. Really needed. 
 
> However, the real limitation is on the xgettext side. xgettext currently 
> can only extract "context" when it comes from a comment. Some other 
> conventions, like 
>            _("msgid", "disambiguating comment") 
> exist in other GUI toolkits (Qt), and we can talk about what can be done 
> on this side. 
 
If gettext agrees on a standard convention, then surely xgettext can provide 
an implementation for extracting these conventions. 
 
Personally I would prefer the proposal from the gettext manual: 
"disambiguating comment|msgid" and that's it. No need to change xgettext. 
Even no need to change any GUI-creation tool like glade/libglade. However,  a 
solution that keeps compatibility to Qt/KDE folks would probably be even 
better. 
 
> But first, can you please brief me on what a "context" or "disambiguating 
> comment" can look like in practice? 
 
Think of any English word that can both be a noun and a verb (e.g. "a file"
and "to file"). Think of the fact that almost always in at least some 
languages the translation of the verb is [very] different from the 
translation of the noun (e.g. in German the noun is "Datei" and the verb is 
"ablegen"). Now think of a GUI button that is labelled with this word. Now 
think of a case where this button has the meaning of the verb, and another 
case where this button has the meaning of the noun (e.g. "File" meaning "to 
file something somewhere" as opposed to "File" meaning "do something with a 
file"). There you are -- the msgid in both cases is identical, but the msgstr 
should be different. Therefore we need a disambiguating addition in the 
msgids. In the example this can be as simple as (in gettext manual's format) 
"noun|File" and "verb|File", but you could also use the real meaning: "to 
file something somewhere|File" and so on. I hope you get the point.
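
To make the "noun|File" / "verb|File" case concrete, here is a small self-contained sketch. The catalog array and fake_gettext are stand-ins invented for this example so that it runs without a real .po file; a real program would call gettext(). lookup_with_context is likewise a hypothetical name, and it applies the "context|msgid" convention from the gettext manual as described above.

```c
#include <assert.h>
#include <string.h>

/* Toy catalog standing in for a German .po file (assumption for the demo). */
struct entry { const char *msgid; const char *msgstr; };

static const struct entry catalog[] = {
  { "noun|File", "Datei" },
  { "verb|File", "ablegen" },
};

/* Behaves like gettext(): returns the translation, or the msgid pointer
 * itself when the catalog has no entry. */
static const char *
fake_gettext (const char *msgid)
{
  for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
    if (strcmp (catalog[i].msgid, msgid) == 0)
      return catalog[i].msgstr;
  return msgid;
}

/* Look up a "context|text" msgid.  The context is stripped only when the
 * lookup fell back to the untranslated msgid. */
const char *
lookup_with_context (const char *msgid)
{
  assert (msgid != NULL);

  const char *msgval = fake_gettext (msgid);
  if (msgval == msgid)
    {
      const char *bar = strchr (msgid, '|');
      if (bar != NULL)
        return bar + 1;
    }
  return msgval;
}
```

The two identical English labels now map to different German strings, while a msgid without a catalog entry, such as "adjective|File", falls back to plain "File".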

Comment 22 Christian Rose 2005-01-24 18:06:26 UTC

The current implementation of this in glib has some limitations; for example, it
doesn't work in case the translator translates the context as well. This has
been put in bug 164373.
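
One conceivable guard against this limitation is a check in the translation tooling, along the lines of the tool checks mentioned in comments 4 and 6. The sketch below is purely hypothetical: it is not what glib or intltool do, and translation_keeps_context is an invented name. It flags entries where the msgid has a context prefix and the translation still contains a '|'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lint check: returns true if msgid carries a "context|"
 * prefix and the translation still contains a '|', which suggests the
 * translator kept or translated the context.  This is a heuristic only:
 * a message may legitimately contain '|', as in "boolean operators
 * are |&^". */
bool
translation_keeps_context (const char *msgid,
                           const char *msgstr)
{
  assert (msgid != NULL && msgstr != NULL);

  if (strchr (msgid, '|') == NULL)
    return false;               /* no context marker in the msgid */
  if (msgstr[0] == '\0')
    return false;               /* entry not translated yet */

  return strchr (msgstr, '|') != NULL;
}
```

A msgstr of "Russe[charset]|Russe" for the msgid "Russian[charset]|Russian" would be flagged, while "Russe" would pass.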