Bug 569118 – Use C_() instead of Q_() with context

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 569118 - Use C_() instead of Q_() with context


Summary:	Use C_() instead of Q_() with context


Status:	RESOLVED FIXED

Product:	orca
Classification:	Applications
Component:	i18n
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	2.26.0
Assigned To:	Willie Walker
QA Contact:

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-01-25 19:51 UTC by André Klapper
Modified:	2009-03-10 14:25 UTC

See Also:
GNOME target:	2.26.x
GNOME version:	---

Attachments
Patch for the source code (50.58 KB, patch) 2009-01-27 19:01 UTC, Willie Walker	none	Details \| Review
Patch for the po files (just in case I'm allowed to touch these) (990.81 KB, patch) 2009-01-27 19:02 UTC, Willie Walker	committed	Details \| Review
nl.po diff (requested by Wouter) (26.50 KB, patch) 2009-01-27 20:26 UTC, Willie Walker	reviewed	Details \| Review
Test program for the C_ function from Python (407 bytes, text/plain) 2009-01-28 15:02 UTC, Wouter Bolsterlee (uws)		Details
Patch to update C_ definition (50.62 KB, patch) 2009-01-28 15:02 UTC, Willie Walker	committed	Details \| Review

Description André Klapper 2009-01-25 19:51:55 UTC

Orca is the last module using old-fashioned Q_() and we want to get rid of this entirely for 2.26.

See http://live.gnome.org/GnomeGoals/MsgctxtMigration .
See e.g. bug 558407 for a sample patch.

String Freeze is Feb 16, 2009. See http://live.gnome.org/TwoPointTwentyfive .
Getting this in soon and announcing it to gnome-i18n@ highly appreciated.

Comment 1 Willie Walker 2009-01-25 23:02:03 UTC

Do you have an example of how to do this in Python?  The current Orca code does the equivalent of this:

def Q_(s):
    s = _(s)
    s = s[s.find('|')+1:]
    return s

That was easy to do since we controlled the whole string.  I was hoping to find an example somewhere (e.g., http://bugzilla.gnome.org/show_bug.cgi?id=567103), but that bug says "This patch does not affect the relevant messages in python files -- because I don't know how to do it (yet)."  :-(  The bug was then eventually closed as FIXED.  So, I searched all the gnome-games *.py files from trunk and didn't see what appeared to be an answer (i.e., I don't know if any of the gnome-games *.py files ever used Q_ -- I certainly don't see relevant C_ uses in there).

Comment 2 André Klapper 2009-01-25 23:33:12 UTC

Uh. Good question. :-)
Can you ask on gnome-i18n@ about this?

Comment 3 Wouter Bolsterlee (uws) 2009-01-26 12:34:12 UTC

(I've sent this message to gnome-i18n as well.)

This is not yet officially supported in the Python locale/gettext modules. 
See http://bugs.python.org/issue2504 for more information.

In the mean time, you could just stick to using Q_(), since it's
functionally equivalent (but a bit harder for translators to use correctly.)

Comment 4 Wouter Bolsterlee (uws) 2009-01-26 12:37:02 UTC

(In reply to comment #1)
> Do you have an example of how to do this in Python?  The current Orca code does
> the equivalent of this:
> 
> def Q_(s):
>     s = _(s)
>     s = s[s.find('|')+1:]
>     return s

Btw, I have also seen this approach:

  def Q_(s):
      return _(s).split('|', 1)[0]

I'm not sure where it's used. Accerciser perhaps?

Comment 5 Wouter Bolsterlee (uws) 2009-01-26 14:42:51 UTC

(In reply to comment #4)
> Btw, I have also seen this approach:
>   def Q_(s): return _(s).split('|', 1)[0]
> I'm not sure where it's used. Accerciser perhaps?

Yes, it is. See bug #520296.

Comment 6 Willie Walker 2009-01-26 14:55:16 UTC

(In reply to comment #3)
> (I've sent this message to gnome-i18n as well.)
> 
> This is not yet officially supported in the Python locale/gettext modules. 
> See http://bugs.python.org/issue2504 for more information.
> 
> In the mean time, you could just stick to using Q_(), since it's
> functionally equivalent (but a bit harder for translators to use correctly.)

Given this comment, I'm going to remove the 2.26 targets.

Comment 7 Willie Walker 2009-01-27 14:07:02 UTC

From gnome-i18n@:

In MO files contexts are separated by U+0004 from the message proper, so I
used something equivalent to this in a code I work on:

  def C_ (c, s):
      s = _(c + "\x04" + s)
      p = s.find("\x04")
      if p >= 0: # satisfied only if no translation found
          s = s[p + 1:]
      return s

-- 
Chusslove Illich (Часлав Илић)

Comment 8 Wouter Bolsterlee (uws) 2009-01-27 14:13:58 UTC

Hmmm. That does make sense. However, does this mean the context will end up on a line of itself in the .po file genereted by intltool/xgettext, as is the case with pgettext()/C_() in C/GLib code?

Comment 9 Wouter Bolsterlee (uws) 2009-01-27 14:18:23 UTC

(In reply to comment #7)
>   def C_ (c, s):
>       s = _(c + "\x04" + s)
>       p = s.find("\x04")
>       if p >= 0: # satisfied only if no translation found
>           s = s[p + 1:]
>       return s

And wouldn't this do the same, but without the overhead of string allocation/splitting?


def C_(ctx, str):
    translated = _("%s\x04%s" % (ctx, str))
    if "\x04" in translated:
        # no translation found, return input string
        return str
    return translated


Oh, and I dislike string concatenation for various reasons, so I used a printf-style string.

Comment 10 Willie Walker 2009-01-27 14:33:43 UTC

(In reply to comment #8)
> Hmmm. That does make sense. However, does this mean the context will end up on
> a line of itself in the .po file genereted by intltool/xgettext, as is the case
> with pgettext()/C_() in C/GLib code?

As a quick check, I did this:

1) Created a C_ method as shown.
2) Found a Q_ line in Orca and added an extra C_ line that was basically a converted duplicate of the Q_ line
3) Reran intltool-update -po

Here's the relevant lines from the code:

            text = Q_("radiobutton|selected")
            text = C_("radiobutton", "selected")

Here's the resulting lines in the po file:

#: ../src/orca/speechgenerator.py:1201 ../src/orca/where_am_I.py:226
msgid "radiobutton|selected"
msgstr ""

#: ../src/orca/where_am_I.py:227
msgctxt "radiobutton"
msgid "selected"
msgstr ""

As for whether the new C_ method actually works or not, I don't know.  I haven't tested it with orca running and verifying that the C_ form gives me the desired text.

As a quick question: is there a mechanism to quickly convert the *.po files from the Q_ form to the C_ form (i.e., replace strings with "|" in them with separate msgctx and msgid lines)?  If so, is this something I am allowed to do, or should I defer it all to the l10n team?  I've been harangued in the past for touching *.po files, so I shy away from doing that.

Comment 11 Wouter Bolsterlee (uws) 2009-01-27 14:39:35 UTC

(In reply to comment #0)
> Orca is the last module using old-fashioned Q_() and we want to get rid of this
> entirely for 2.26.

Btw, Andre, it seems that Accerciser does it as well.

Comment 12 Wouter Bolsterlee (uws) 2009-01-27 14:45:32 UTC

(In reply to comment #10)
> Here's the resulting lines in the po file:
> msgctxt "radiobutton"
> msgid "selected"
> msgstr ""

Great, that's what it is supposed to do.

> As for whether the new C_ method actually works or not, I don't know.  I
> haven't tested it with orca running and verifying that the C_ form gives me the
> desired text.

If you could check that, that would be great. Perhaps this should be document somewhere (on live.gnome.org?) so that others can use it as well.

> As a quick question: is there a mechanism to quickly convert the *.po files
> from the Q_ form to the C_ form (i.e., replace strings with "|" in them with
> separate msgctx and msgid lines)?  If so, is this something I am allowed to do,
> or should I defer it all to the l10n team?  I've been harangued in the past for
> touching *.po files, so I shy away from doing that.

It's doable using a few magical sed expressions, I suppose. I don't think it is a problem, because you are not changing anything: the translator already had the context before, and still has the same context, so there is no need to review the translations. (Things would've been different if you started adding translation contexts, for instance.) Anyway, I'd suggest asking others on gnome-i18n@ before you proceed with this.

The only thing that would be worth checking is whether translators had previously messed things up by including a translated context string, followed by a | character, in the msgstr. If that is the case, the translator should fix his translation...

Comment 13 Wouter Bolsterlee (uws) 2009-01-27 14:48:00 UTC

(In reply to comment #12)
> If that is the case, the translator should fix his translation...

Eh, s/his/the/. There is no reason at all to assume translators are male (though statistics suggest otherwise), and I want to be politically correct.

(Sorry for the spam ;))

Comment 14 André Klapper 2009-01-27 14:48:50 UTC

(In reply to comment #11)
> Btw, Andre, it seems that Accerciser does it as well.

Yay. Filed bug 569341 and updated the wikipage. Thanks

Comment 15 Willie Walker 2009-01-27 18:19:46 UTC

Here's a shell command that seems to do the trick to convert the *.po files.  It basically matches msgid lines that have a | in them and replaces that with two new lines, where the first is a msgctxt line containing the first match (the stuff in the first set of parentheses) and a msgid line containing the second match (the stuff in the second set of parentheses):

for foo in *.po; do mv $foo $foo.old; sed -r 's/^msgid ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old > $foo; done

Comment 16 Willie Walker 2009-01-27 18:24:16 UTC

(In reply to comment #15)
> Here's a shell command that seems to do the trick to convert the *.po files. 
> It basically matches msgid lines that have a | in them and replaces that with
> two new lines, where the first is a msgctxt line containing the first match
> (the stuff in the first set of parentheses) and a msgid line containing the
> second match (the stuff in the second set of parentheses):
> 
> for foo in *.po; do mv $foo $foo.old; sed -r 's/^msgid
> ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old > $foo; done

whoops - also need to change the msgstr lines as well.  coming up...

Comment 17 Willie Walker 2009-01-27 18:36:44 UTC

OK - this includes something to fix up the msgstr's where the context was left in the translation:

for foo in *.po
do 
    mv $foo $foo.old
    sed -r 's/^msgid ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old | \
    sed -r 's/^msgstr ["](.+)[|](.+)["]$/msgstr "\2"/g' > $foo
done

Comment 18 Willie Walker 2009-01-27 19:01:54 UTC

Created attachment 127348 [details] [review]
Patch for the source code

Comment 19 Willie Walker 2009-01-27 19:02:48 UTC

Created attachment 127349 [details] [review]
Patch for the po files (just in case I'm allowed to touch these)

Comment 20 Willie Walker 2009-01-27 19:08:00 UTC

(In reply to comment #19)
> Created an attachment (id=127349) [edit]
> Patch for the po files (just in case I'm allowed to touch these)

PS - in order for these to compile on OpenSolaris, you need the SUNWgnu-gettext package.

Comment 21 Wouter Bolsterlee (uws) 2009-01-27 19:43:30 UTC

Thanks for figuring out the regexes! Note that sed can also be passed the -i parameter to modify files in place (we have version control for a reason!).

Comment 22 Willie Walker 2009-01-27 20:26:54 UTC

Created attachment 127357 [details] [review]
nl.po diff (requested by Wouter)

Comment 23 Wouter Bolsterlee (uws) 2009-01-27 20:34:39 UTC

Thanks, Willie. This approach to update PO files seems to work quite well. The nl.po patch looks good, please commit nl.po for me. Thanks again :)

Comment 24 Willie Walker 2009-01-27 21:24:11 UTC

(In reply to comment #22)
> Created an attachment (id=127357) [edit]
> nl.po diff (requested by Wouter)

See also http://master.gnome.org/~wwalker/569118/ for the diff of each *.po broken out into separate files.

Comment 25 Eitan Isaacson 2009-01-28 11:50:59 UTC

Hey Will,

I am doing the same thing for Accerciser, should I update all the po files? Let me know what you do!

Comment 26 Willie Walker 2009-01-28 13:19:19 UTC

(In reply to comment #25)
> Hey Will,
> 
> I am doing the same thing for Accerciser, should I update all the po files? Let
> me know what you do!

I'm waiting for comments from gnome-i18n@.  So far, it's looking like I should go ahead and check the *.po files in, but I'm going to wait until this afternoon to give them more time to comment.

The reason I want to get the source code changes and the *.po changes in at the same time is that we have a lot of users that regularly pull from trunk.  So, I'd like this to be a smooth transition for those users.

Comment 27 Wouter Bolsterlee (uws) 2009-01-28 14:04:23 UTC

But then, it's only 2 Q_() strings for Accerciser, and a lot more for Orca. As far as I can see, Eitan has already updated the PO files in Accerciser trunk after I reviewed his patches; see bug #569341 for more information.

Comment 28 Eitan Isaacson 2009-01-28 14:12:42 UTC

Yeah, it wasn't such a big change, I just went ahead and did it. I am usually reluctant at touching po files, but Wouter held my hand.

Comment 29 Wouter Bolsterlee (uws) 2009-01-28 14:36:32 UTC

(In reply to comment #28)
> Yeah, it wasn't such a big change, I just went ahead and did it. I am usually
> reluctant at touching po files, but Wouter held my hand.

Hey, I'm not taking responsibility either! ;)

Anyway, if translators start complaining (which they likely won't), you can at least defend yourself by telling them "someone from the i18n coordination team told me it was alright!" ;)

Okay, to make this message not totally off-topic: the C_ approach for Python programs should probably be documented somewhere on live.gnome.org. Does anyone know a good place for it?

Comment 30 Wouter Bolsterlee (uws) 2009-01-28 14:44:39 UTC

Will,

It would be nice if both Orca and Accerciser used the same function. The C_
function from the patch provided in comment #18  is different from what
Accerciser uses already, which is:

def C_(ctx, s):
    translated = _('%s\x04%s' % (ctx, s))
    if '\x04' in translated:
        # no translation found, return input string
        return s
    return translated

This is cleaner than the original function and voids useless string splitting.

See also
http://svn.gnome.org/viewvc/accerciser/trunk/src/lib/accerciser/i18n.py.in?view=markup

Comment 31 Wouter Bolsterlee (uws) 2009-01-28 15:02:08 UTC

Created attachment 127395 [details]
Test program for the C_ function from Python

I wrote a small test program, which works fine on my machine using the nl_NL.UTF-8 locale. I used the Gnome Control Center translation because it was one of the few .mo files available on my machine with msgctxt specifiers.

Output on my machine:

$ LANG=C python i18n-gettext-context-test.py 
Alert sound
$ LANG=nl_NL.UTF-8 python i18n-gettext-context-test.py 
Waarschuwingsgeluid

Comment 32 Willie Walker 2009-01-28 15:02:25 UTC

Created attachment 127397 [details] [review]
Patch to update C_ definition

(In reply to comment #30)
> It would be nice if both Orca and Accerciser used the same function. The C_
> function from the patch provided in comment #18  is different from what
> Accerciser uses already, which is:
> 
> def C_(ctx, s):
>     translated = _('%s\x04%s' % (ctx, s))
>     if '\x04' in translated:
>         # no translation found, return input string
>         return s
>     return translated
> 
> This is cleaner than the original function and voids useless string splitting.

Whoops!  Sorry I forgot to incorporate your desired form -- I didn't do it on purpose, it just got lost in the shuffle somehow.  This patch removes the inadvertent offense and does things the "Wouter Way"  :-).

Comment 33 Willie Walker 2009-01-28 15:34:41 UTC

(In reply to comment #29)
> Okay, to make this message not totally off-topic: the C_ approach for Python
> programs should probably be documented somewhere on live.gnome.org. Does anyone
> know a good place for it?

http://live.gnome.org/GnomeGoals/MsgctxtMigration

:-)

Comment 34 Willie Walker 2009-01-28 17:55:30 UTC

Pylinted, tested, committed.  Closing.  Thanks everyone!

Comment 35 Willie Walker 2009-02-02 17:14:03 UTC

(In reply to comment #30)
> def C_(ctx, s):
>     translated = _('%s\x04%s' % (ctx, s))
>     if '\x04' in translated:
>         # no translation found, return input string
>         return s
>     return translated

I think it might be better (perhaps even necessary) to change the _ to gettext.gettext.  Otherwise, the '%s\x04%s' string will be marked for translation.  Wouter, what are your thoughts?

Comment 36 Willie Walker 2009-02-02 17:25:39 UTC

(In reply to comment #35)
> I think it might be better (perhaps even necessary) to change the _ to
> gettext.gettext.  Otherwise, the '%s\x04%s' string will be marked for
> translation.  Wouter, what are your thoughts?

Note that this came up because 'make distcheck' was ending up with tons of nasty warnings and such with the use of "_" in the file, but worked fine with the use of gettext.gettext.  So, if we do use gettext.gettext, we also need to remember to update the WIKI to provide the new recommended form.

Comment 37 Wouter Bolsterlee (uws) 2009-02-02 17:30:35 UTC

Hmmm, I think intltool/xgettext is also supposed to extract strings inside gettext() calls, since the _, N_, Q_ and C_ variants are just shorthand notation after all. Could you try something like this to fool it (untested, but trivial)?

> def C_(ctx, s):
>     translate_func = gettext.gettext
>     translated = translate_func('%s\x04%s' % (ctx, s))
>     if '\x04' in translated:
>         # no translation found, return input string
>         return s
>     return translated

Also, I would be interested to see the exact warnings or error messages.

Comment 38 Willie Walker 2009-02-02 17:56:52 UTC

(In reply to comment #37)
> Also, I would be interested to see the exact warnings or error messages.

Here's what I get when using _ in orca_i18n.py, which is the file containing the definition of C_.  The errors go away when I just use gettext.gettext directly:

...
The following files contain translations and are currently not in use. Please consider adding these to the POTFILES.in file, located in the po/ directory.

src/orca/orca_i18n.py
src/orca/orca_i18n.py
src/orca/orca_i18n.py

If some of these files are left out on purpose then please add them to
POTFILES.skip instead of POTFILES.in. A file 'missing' containing this list
of left out files has been written in the current directory.
if [ -r missing -o -r notexist ]; then \
	  exit 1; \
	fi
make[2]: *** [check] Error 1
make[2]: Leaving directory `/export/home/wwalker/work/orca/trunk/orca-2.25.90/_build/po'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/export/home/wwalker/work/orca/trunk/orca-2.25.90/_build'
make: *** [distcheck] Error 2

I suppose I could put orca_i18n.py in POTFILES.skip and then cross my fingers that we never plan on putting strings in the file.  But, that seems like a bad solution to me and I'd rather fix the real problem.

(In reply to comment #37)
> Hmmm, I think intltool/xgettext is also supposed to extract strings inside
> gettext() calls, since the _, N_, Q_ and C_ variants are just shorthand
> notation after all. Could you try something like this to fool it (untested, but trivial)?

I'm not sure there's a need to try to fool it.  The gettext.gettext form seems to work and the associated string does not end up in the *.pot file created when I run intltool-update -po.

My schedule is a bit tight today, so I'll need to postpone the experimentation until after I get the 2.25.90 release out.  For 2.25.90, I've modified the file to use gettext.gettext so as to avoid the distcheck failure.

Comment 39 Wouter Bolsterlee (uws) 2009-02-02 18:04:23 UTC

(In reply to comment #38)
> (In reply to comment #37)
> > Also, I would be interested to see the exact warnings or error messages.
> 
> Here's what I get when using _ in orca_i18n.py, which is the file containing
> the definition of C_.  The errors go away when I just use gettext.gettext
> directly:
> [...]
> I suppose I could put orca_i18n.py in POTFILES.skip and then cross my fingers
> that we never plan on putting strings in the file.  But, that seems like a bad
> solution to me and I'd rather fix the real problem.

Probably intltool/xgettext doesn't recognize "gettext.gettext" as a "function that is used for translation" (which is a bug, since it is). Might be because it has a "." character in front or something like that. Anyway, looks like using gettext.gettext() should also considered to be "fooling". But if it works, just leave it like that for now :)

Comment 40 Willie Walker 2009-02-02 18:14:30 UTC

Thanks!  Reclosing.  :-)