GNOME Bugzilla – Bug 569118
Use C_() instead of Q_() with context
Last modified: 2009-03-10 14:25:33 UTC
Orca is the last module using old-fashioned Q_() and we want to get rid of this entirely for 2.26. See http://live.gnome.org/GnomeGoals/MsgctxtMigration . See e.g. bug 558407 for a sample patch. String Freeze is Feb 16, 2009. See http://live.gnome.org/TwoPointTwentyfive . Getting this in soon and announcing it to gnome-i18n@ highly appreciated.
Do you have an example of how to do this in Python? The current Orca code does the equivalent of this: def Q_(s): s = _(s) s = s[s.find('|')+1:] return s That was easy to do since we controlled the whole string. I was hoping to find an example somewhere (e.g., http://bugzilla.gnome.org/show_bug.cgi?id=567103), but that bug says "This patch does not affect the relevant messages in python files -- because I don't know how to do it (yet)." :-( The bug was then eventually closed as FIXED. So, I searched all the gnome-games *.py files from trunk and didn't see what appeared to be an answer (i.e., I don't know if any of the gnome-games *.py files ever used Q_ -- I certainly don't see relevant C_ uses in there).
Uh. Good question. :-) Can you ask on gnome-i18n@ about this?
(I've sent this message to gnome-i18n as well.) This is not yet officially supported in the Python locale/gettext modules. See http://bugs.python.org/issue2504 for more information. In the mean time, you could just stick to using Q_(), since it's functionally equivalent (but a bit harder for translators to use correctly.)
(In reply to comment #1) > Do you have an example of how to do this in Python? The current Orca code does > the equivalent of this: > > def Q_(s): > s = _(s) > s = s[s.find('|')+1:] > return s Btw, I have also seen this approach: def Q_(s): return _(s).split('|', 1)[0] I'm not sure where it's used. Accerciser perhaps?
(In reply to comment #4) > Btw, I have also seen this approach: > def Q_(s): return _(s).split('|', 1)[0] > I'm not sure where it's used. Accerciser perhaps? Yes, it is. See bug #520296.
(In reply to comment #3) > (I've sent this message to gnome-i18n as well.) > > This is not yet officially supported in the Python locale/gettext modules. > See http://bugs.python.org/issue2504 for more information. > > In the mean time, you could just stick to using Q_(), since it's > functionally equivalent (but a bit harder for translators to use correctly.) Given this comment, I'm going to remove the 2.26 targets.
From gnome-i18n@: In MO files contexts are separated by U+0004 from the message proper, so I used something equivalent to this in a code I work on: def C_ (c, s): s = _(c + "\x04" + s) p = s.find("\x04") if p >= 0: # satisfied only if no translation found s = s[p + 1:] return s -- Chusslove Illich (Часлав Илић)
Hmmm. That does make sense. However, does this mean the context will end up on a line of itself in the .po file genereted by intltool/xgettext, as is the case with pgettext()/C_() in C/GLib code?
(In reply to comment #7) > def C_ (c, s): > s = _(c + "\x04" + s) > p = s.find("\x04") > if p >= 0: # satisfied only if no translation found > s = s[p + 1:] > return s And wouldn't this do the same, but without the overhead of string allocation/splitting? def C_(ctx, str): translated = _("%s\x04%s" % (ctx, str)) if "\x04" in translated: # no translation found, return input string return str return translated Oh, and I dislike string concatenation for various reasons, so I used a printf-style string.
(In reply to comment #8) > Hmmm. That does make sense. However, does this mean the context will end up on > a line of itself in the .po file genereted by intltool/xgettext, as is the case > with pgettext()/C_() in C/GLib code? As a quick check, I did this: 1) Created a C_ method as shown. 2) Found a Q_ line in Orca and added an extra C_ line that was basically a converted duplicate of the Q_ line 3) Reran intltool-update -po Here's the relevant lines from the code: text = Q_("radiobutton|selected") text = C_("radiobutton", "selected") Here's the resulting lines in the po file: #: ../src/orca/speechgenerator.py:1201 ../src/orca/where_am_I.py:226 msgid "radiobutton|selected" msgstr "" #: ../src/orca/where_am_I.py:227 msgctxt "radiobutton" msgid "selected" msgstr "" As for whether the new C_ method actually works or not, I don't know. I haven't tested it with orca running and verifying that the C_ form gives me the desired text. As a quick question: is there a mechanism to quickly convert the *.po files from the Q_ form to the C_ form (i.e., replace strings with "|" in them with separate msgctx and msgid lines)? If so, is this something I am allowed to do, or should I defer it all to the l10n team? I've been harangued in the past for touching *.po files, so I shy away from doing that.
(In reply to comment #0) > Orca is the last module using old-fashioned Q_() and we want to get rid of this > entirely for 2.26. Btw, Andre, it seems that Accerciser does it as well.
(In reply to comment #10) > Here's the resulting lines in the po file: > msgctxt "radiobutton" > msgid "selected" > msgstr "" Great, that's what it is supposed to do. > As for whether the new C_ method actually works or not, I don't know. I > haven't tested it with orca running and verifying that the C_ form gives me the > desired text. If you could check that, that would be great. Perhaps this should be document somewhere (on live.gnome.org?) so that others can use it as well. > As a quick question: is there a mechanism to quickly convert the *.po files > from the Q_ form to the C_ form (i.e., replace strings with "|" in them with > separate msgctx and msgid lines)? If so, is this something I am allowed to do, > or should I defer it all to the l10n team? I've been harangued in the past for > touching *.po files, so I shy away from doing that. It's doable using a few magical sed expressions, I suppose. I don't think it is a problem, because you are not changing anything: the translator already had the context before, and still has the same context, so there is no need to review the translations. (Things would've been different if you started adding translation contexts, for instance.) Anyway, I'd suggest asking others on gnome-i18n@ before you proceed with this. The only thing that would be worth checking is whether translators had previously messed things up by including a translated context string, followed by a | character, in the msgstr. If that is the case, the translator should fix his translation...
(In reply to comment #12) > If that is the case, the translator should fix his translation... Eh, s/his/the/. There is no reason at all to assume translators are male (though statistics suggest otherwise), and I want to be politically correct. (Sorry for the spam ;))
(In reply to comment #11) > Btw, Andre, it seems that Accerciser does it as well. Yay. Filed bug 569341 and updated the wikipage. Thanks
Here's a shell command that seems to do the trick to convert the *.po files. It basically matches msgid lines that have a | in them and replaces that with two new lines, where the first is a msgctxt line containing the first match (the stuff in the first set of parentheses) and a msgid line containing the second match (the stuff in the second set of parentheses): for foo in *.po; do mv $foo $foo.old; sed -r 's/^msgid ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old > $foo; done
(In reply to comment #15) > Here's a shell command that seems to do the trick to convert the *.po files. > It basically matches msgid lines that have a | in them and replaces that with > two new lines, where the first is a msgctxt line containing the first match > (the stuff in the first set of parentheses) and a msgid line containing the > second match (the stuff in the second set of parentheses): > > for foo in *.po; do mv $foo $foo.old; sed -r 's/^msgid > ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old > $foo; done whoops - also need to change the msgstr lines as well. coming up...
OK - this includes something to fix up the msgstr's where the context was left in the translation: for foo in *.po do mv $foo $foo.old sed -r 's/^msgid ["](.+)[|](.+)["]$/msgctxt "\1"\nmsgid "\2"/g' $foo.old | \ sed -r 's/^msgstr ["](.+)[|](.+)["]$/msgstr "\2"/g' > $foo done
Created attachment 127348 [details] [review] Patch for the source code
Created attachment 127349 [details] [review] Patch for the po files (just in case I'm allowed to touch these)
(In reply to comment #19) > Created an attachment (id=127349) [edit] > Patch for the po files (just in case I'm allowed to touch these) PS - in order for these to compile on OpenSolaris, you need the SUNWgnu-gettext package.
Thanks for figuring out the regexes! Note that sed can also be passed the -i parameter to modify files in place (we have version control for a reason!).
Created attachment 127357 [details] [review] nl.po diff (requested by Wouter)
Thanks, Willie. This approach to update PO files seems to work quite well. The nl.po patch looks good, please commit nl.po for me. Thanks again :)
(In reply to comment #22) > Created an attachment (id=127357) [edit] > nl.po diff (requested by Wouter) See also http://master.gnome.org/~wwalker/569118/ for the diff of each *.po broken out into separate files.
Hey Will, I am doing the same thing for Accerciser, should I update all the po files? Let me know what you do!
(In reply to comment #25) > Hey Will, > > I am doing the same thing for Accerciser, should I update all the po files? Let > me know what you do! I'm waiting for comments from gnome-i18n@. So far, it's looking like I should go ahead and check the *.po files in, but I'm going to wait until this afternoon to give them more time to comment. The reason I want to get the source code changes and the *.po changes in at the same time is that we have a lot of users that regularly pull from trunk. So, I'd like this to be a smooth transition for those users.
But then, it's only 2 Q_() strings for Accerciser, and a lot more for Orca. As far as I can see, Eitan has already updated the PO files in Accerciser trunk after I reviewed his patches; see bug #569341 for more information.
Yeah, it wasn't such a big change, I just went ahead and did it. I am usually reluctant at touching po files, but Wouter held my hand.
(In reply to comment #28) > Yeah, it wasn't such a big change, I just went ahead and did it. I am usually > reluctant at touching po files, but Wouter held my hand. Hey, I'm not taking responsibility either! ;) Anyway, if translators start complaining (which they likely won't), you can at least defend yourself by telling them "someone from the i18n coordination team told me it was alright!" ;) Okay, to make this message not totally off-topic: the C_ approach for Python programs should probably be documented somewhere on live.gnome.org. Does anyone know a good place for it?
Will, It would be nice if both Orca and Accerciser used the same function. The C_ function from the patch provided in comment #18 is different from what Accerciser uses already, which is: def C_(ctx, s): translated = _('%s\x04%s' % (ctx, s)) if '\x04' in translated: # no translation found, return input string return s return translated This is cleaner than the original function and voids useless string splitting. See also http://svn.gnome.org/viewvc/accerciser/trunk/src/lib/accerciser/i18n.py.in?view=markup
Created attachment 127395 [details] Test program for the C_ function from Python I wrote a small test program, which works fine on my machine using the nl_NL.UTF-8 locale. I used the Gnome Control Center translation because it was one of the few .mo files available on my machine with msgctxt specifiers. Output on my machine: $ LANG=C python i18n-gettext-context-test.py Alert sound $ LANG=nl_NL.UTF-8 python i18n-gettext-context-test.py Waarschuwingsgeluid
Created attachment 127397 [details] [review] Patch to update C_ definition (In reply to comment #30) > It would be nice if both Orca and Accerciser used the same function. The C_ > function from the patch provided in comment #18 is different from what > Accerciser uses already, which is: > > def C_(ctx, s): > translated = _('%s\x04%s' % (ctx, s)) > if '\x04' in translated: > # no translation found, return input string > return s > return translated > > This is cleaner than the original function and voids useless string splitting. Whoops! Sorry I forgot to incorporate your desired form -- I didn't do it on purpose, it just got lost in the shuffle somehow. This patch removes the inadvertent offense and does things the "Wouter Way" :-).
(In reply to comment #29) > Okay, to make this message not totally off-topic: the C_ approach for Python > programs should probably be documented somewhere on live.gnome.org. Does anyone > know a good place for it? http://live.gnome.org/GnomeGoals/MsgctxtMigration :-)
Pylinted, tested, committed. Closing. Thanks everyone!
(In reply to comment #30) > def C_(ctx, s): > translated = _('%s\x04%s' % (ctx, s)) > if '\x04' in translated: > # no translation found, return input string > return s > return translated I think it might be better (perhaps even necessary) to change the _ to gettext.gettext. Otherwise, the '%s\x04%s' string will be marked for translation. Wouter, what are your thoughts?
(In reply to comment #35) > I think it might be better (perhaps even necessary) to change the _ to > gettext.gettext. Otherwise, the '%s\x04%s' string will be marked for > translation. Wouter, what are your thoughts? Note that this came up because 'make distcheck' was ending up with tons of nasty warnings and such with the use of "_" in the file, but worked fine with the use of gettext.gettext. So, if we do use gettext.gettext, we also need to remember to update the WIKI to provide the new recommended form.
Hmmm, I think intltool/xgettext is also supposed to extract strings inside gettext() calls, since the _, N_, Q_ and C_ variants are just shorthand notation after all. Could you try something like this to fool it (untested, but trivial)? > def C_(ctx, s): > translate_func = gettext.gettext > translated = translate_func('%s\x04%s' % (ctx, s)) > if '\x04' in translated: > # no translation found, return input string > return s > return translated Also, I would be interested to see the exact warnings or error messages.
(In reply to comment #37) > Also, I would be interested to see the exact warnings or error messages. Here's what I get when using _ in orca_i18n.py, which is the file containing the definition of C_. The errors go away when I just use gettext.gettext directly: ... The following files contain translations and are currently not in use. Please consider adding these to the POTFILES.in file, located in the po/ directory. src/orca/orca_i18n.py src/orca/orca_i18n.py src/orca/orca_i18n.py If some of these files are left out on purpose then please add them to POTFILES.skip instead of POTFILES.in. A file 'missing' containing this list of left out files has been written in the current directory. if [ -r missing -o -r notexist ]; then \ exit 1; \ fi make[2]: *** [check] Error 1 make[2]: Leaving directory `/export/home/wwalker/work/orca/trunk/orca-2.25.90/_build/po' make[1]: *** [check-recursive] Error 1 make[1]: Leaving directory `/export/home/wwalker/work/orca/trunk/orca-2.25.90/_build' make: *** [distcheck] Error 2 I suppose I could put orca_i18n.py in POTFILES.skip and then cross my fingers that we never plan on putting strings in the file. But, that seems like a bad solution to me and I'd rather fix the real problem. (In reply to comment #37) > Hmmm, I think intltool/xgettext is also supposed to extract strings inside > gettext() calls, since the _, N_, Q_ and C_ variants are just shorthand > notation after all. Could you try something like this to fool it (untested, but trivial)? I'm not sure there's a need to try to fool it. The gettext.gettext form seems to work and the associated string does not end up in the *.pot file created when I run intltool-update -po. My schedule is a bit tight today, so I'll need to postpone the experimentation until after I get the 2.25.90 release out. For 2.25.90, I've modified the file to use gettext.gettext so as to avoid the distcheck failure.
(In reply to comment #38) > (In reply to comment #37) > > Also, I would be interested to see the exact warnings or error messages. > > Here's what I get when using _ in orca_i18n.py, which is the file containing > the definition of C_. The errors go away when I just use gettext.gettext > directly: > [...] > I suppose I could put orca_i18n.py in POTFILES.skip and then cross my fingers > that we never plan on putting strings in the file. But, that seems like a bad > solution to me and I'd rather fix the real problem. Probably intltool/xgettext doesn't recognize "gettext.gettext" as a "function that is used for translation" (which is a bug, since it is). Might be because it has a "." character in front or something like that. Anyway, looks like using gettext.gettext() should also considered to be "fooling". But if it works, just leave it like that for now :)
Thanks! Reclosing. :-)