GNOME Bugzilla – Bug 164373
Glib should support context being included in translations
Last modified: 2011-02-18 16:14:16 UTC
It seems the current implementation of translation context support in glib does not allow the context to be included in the actual translation as well. For example, for the original message msgid "keyboard|Key" you can, and will, end up with all of the following variants in translations:
msgstr "keyboard|Tangent"
msgstr "tangentbord|Tangent"
msgstr "Tangent"
Under the current glib behavior, only the last of these is valid. The previous eel implementation was robust enough to allow all of these variants, because it also stripped everything up to and including the | from the translation. That strategy is much more robust, since nothing breaks if the translator keeps the context in his or her translation, or even translates it by accident without realizing that it is simply a context marker. I suggest that glib be changed to strip everything up to and including the first | marker from the translated message as well.
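To make the difference concrete, here is a minimal C sketch of the two behaviors. It assumes Q_() amounts to a gettext() lookup followed by a context-stripping step. Both function names are hypothetical. strip_context_current mirrors the glib behavior described above: it strips the context only when no translation was found, which it detects by gettext returning the msgid pointer itself. strip_context_proposed is the eel-style rule being requested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the current behavior: a translation is returned
 * verbatim; the "context|" prefix is stripped only from an untranslated
 * msgid (i.e. when the lookup returned the msgid pointer itself). */
static const char *strip_context_current(const char *msgid, const char *msgval)
{
    assert(msgid != NULL && msgval != NULL);
    if (msgval == msgid) {
        const char *bar = strchr(msgid, '|');
        return bar != NULL ? bar + 1 : msgid;
    }
    return msgval;
}

/* Hypothetical eel-style variant requested in this report: strip
 * everything up to and including the first '|' from whatever string is
 * displayed, translated or not. */
static const char *strip_context_proposed(const char *msgid, const char *msgval)
{
    assert(msgid != NULL && msgval != NULL);
    const char *bar = strchr(msgval, '|');
    return bar != NULL ? bar + 1 : msgval;
}
```

With msgid "keyboard|Key", the current rule displays msgstr "keyboard|Tangent" unchanged, while the proposed rule displays "Tangent" for all three msgstr variants, and still falls back to "Key" when there is no translation.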
What if the string after the | contains another |?
Then the translation will have to include the context marker as well. However, such cases would be extremely rare (a message must have marked-up context and also contain "|"; messages with "|" are rare in themselves, which was one of the rationales for choosing it as a separator in the first place, right?). Translators would probably end up translating most such messages with the context included (either translated or not), so it won't be a problem in practice, IMO. We don't need to educate translators about anything if we allow translations with context included, since translating the entire message is the natural thing to do.
If we can get annotations for messages marked up with Q_() in the po files, then whatever tool is currently used to check that translations behave properly with respect to % could also check that translations contain one fewer | than the prefixed ids.
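A minimal sketch of such a check, assuming the checker already has each annotated Q_() msgid/msgstr pair in hand (PO parsing is omitted). count_bars and q_translation_ok are hypothetical names, not part of any existing tool. An empty msgstr is treated as "untranslated" and accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count occurrences of the context separator '|' in s. */
static size_t count_bars(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == '|')
            n++;
    return n;
}

/* Hypothetical check for a Q_() message: the "context|" prefix must not be
 * carried into msgstr, so the translation should contain exactly one '|'
 * fewer than the msgid. An empty msgstr (untranslated) is accepted. */
static bool q_translation_ok(const char *msgid, const char *msgstr)
{
    size_t id_bars = count_bars(msgid);
    if (msgstr[0] == '\0')
        return true;
    return id_bars > 0 && count_bars(msgstr) == id_bars - 1;
}
```

For example, msgid "keyboard|Key" with msgstr "keyboard|Tangent" is flagged, while msgid "format|a|b" with msgstr "A|B" passes, because the real text legitimately contains one '|'.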
It is a pity that the comments do not have some kind of prefix. In KDE they use "_: blah\n". This makes it easy to do PO syntax highlighting, e.g. in vim, that marks those strings so that they look like comments, giving translators a hint that they shouldn't be translated. In the Translate Toolkit we can handle special cases, so we would implement this for the GNOME case. That is, we need to eliminate these strings before we can run pofilter checks on them. We will of course have the same problem mentioned in #1 and #2. In our pocount scripts we now also need to introduce a GNOME-specific option (--gnome) to trap these comments and not count them, which is a pity and could have been avoided with a more distinct delimiter. So those are some reasons to add a prefix. Also, use a delimiter that is more identifiable than |, such as _| or really anything that is unlikely to occur in any message. Combined with a prefix it would be bulletproof.
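As an illustration of why a distinct prefix helps tools, here is a minimal sketch, assuming the KDE-style convention quoted above (msgid starts with "_:" and the context comment ends at the first newline). has_kde_context and kde_message_text are hypothetical names, not Translate Toolkit functions. A checker can recognize such messages from the msgid alone, without a GNOME-specific heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if msgid carries a KDE-style context comment, i.e. starts with "_:". */
static bool has_kde_context(const char *msgid)
{
    assert(msgid != NULL);
    return strncmp(msgid, "_:", 2) == 0;
}

/* Return the part of msgid that is actually meant to be translated:
 * for "_:comment\ntext" everything after the first newline, otherwise
 * the msgid unchanged. */
static const char *kde_message_text(const char *msgid)
{
    if (has_kde_context(msgid)) {
        const char *nl = strchr(msgid, '\n');
        if (nl != NULL)
            return nl + 1;
    }
    return msgid;
}
```

By contrast, "keyboard|Key" cannot be told apart from an ordinary message that happens to contain '|', which is the ambiguity described above.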
Just for the record, I'm asking Bruno (GNU gettext maintainer) about adding a marker for such messages (http://sourceforge.net/mailarchive/message.php?msg_id=10586243). Dwayne, you used to be subscribed to translation-i18n, so you can probably comment on that there as well.
Danilo, thanks for sending that request upstream. I'd add that an additional prefix would merely be a convention; no code changes are needed. Both Q_("[some prefix]|here is the real stuff") and Q_("_:some prefix|here is the real stuff") work with the current code.
Please keep track of compatibility in any proposed changes made here; we can't make Q_() do something incompatible with what it does currently.
This means that this bug is unfixable: I don't see a way to allow context in translated messages without breaking messages that already contain "|" and have context. Could this perhaps be fixed as "a bug in the specification", or would that still require a new API addition? This definitely affects a very small number of cases (any at all in practice?), but it's still a change in semantics. Christian, do you think having an "x-glib-context" flag (what Dwayne proposed) on messages is sufficient for translators (provided we get GNU gettext to allow setting it, and then get PO editing tools to respect it)?
The only option would be to add a Q_me_harder() macro with different semantics, but I personally think getting the messages marked with a comment in the po file should be enough.
I work every day with translations in a lot of languages as part of my job, and I can testify that translators translating the whole msgid, that is, including any context stuff, is a *very* common thing; in fact almost any new translator is likely to do it. So the requested change will indeed be a big improvement in robustness (and therefore in usability). There isn't any compatibility problem at all: when the context doesn't appear in the translation, the behaviour is exactly the same; when the context does appear in the translation, the current behaviour displays the wrong text, while the proposed behaviour displays the correct text. As for "|" appearing in translations: why would someone add it if it isn't in the msgid? No language on Earth uses "|" as a normal character for writing (the way apostrophes, dashes, dots, colons, etc. are used), so a "|" will only appear in a msgstr if there is one in the msgid, which makes this a concern for an unlikely situation. In the very unlikely case where it is needed, the solution is simple: just put another "|" at the beginning of the string, e.g. msgstr "|some odd|string" will display "some odd|string". To Matthias Clasen: using "_:some prefix|blabla" isn't a good idea; the format used by KDE is "_:some prefix\nblabla". Either adopt it exactly or just stick with the simpler "|"-only delimiter; having a similar-looking yet different style will only add more trouble. (I know that technically "some prefix|blabla" and "_:some prefix|blabla" are the same, but the latter should be discouraged, as it will only introduce confusion.)
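The leading-"|" escape works because, under the proposed rule, only the first "|" is consumed. A minimal sketch, assuming the translation is stripped up to and including its first "|" (strip_translation_context is a hypothetical name):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical proposed rule for a Q_() translation: drop everything up to
 * and including the first '|'. A translation without '|' is unchanged. */
static const char *strip_translation_context(const char *msgstr)
{
    assert(msgstr != NULL);
    const char *bar = strchr(msgstr, '|');
    return bar != NULL ? bar + 1 : msgstr;
}
```

So msgstr "|some odd|string" displays "some odd|string", whereas the same text without the leading "|" would lose "some odd|". This is exactly the case, raised earlier in the thread, of a message whose real text contains another "|". Translations without any "|" pass through untouched, which is the compatibility argument made above.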
GDM uses | for another purpose. Do we have to take it into account?
No, it doesn't use Q_() on those strings.
See http://mail.gnome.org/archives/gnome-i18n/2005-December/msg00087.html
C_() and msgctxt are the future, so I am going to close this one.
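The msgctxt approach sidesteps the whole problem because the context is no longer part of the translatable text. GNU gettext looks up a context-qualified message under a key made of the context, the byte \004, and the msgid. Translators see msgctxt as a separate PO field, so the msgstr never needs to carry it. A minimal sketch of that keying, where make_ctxt_key and ctxt_fallback are hypothetical helpers rather than glib or gettext API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the lookup key "context\004msgid" used for
 * messages that have a msgctxt. The caller frees the result. */
static char *make_ctxt_key(const char *context, const char *msgid)
{
    assert(context != NULL && msgid != NULL);
    size_t len = strlen(context) + 1 + strlen(msgid) + 1;
    char *key = malloc(len);
    if (key != NULL)
        snprintf(key, len, "%s\004%s", context, msgid);
    return key;
}

/* Hypothetical fallback: if the lookup found no translation (it returned
 * the key itself), show the part after the \004 separator. A real
 * translation is returned as-is and never contains the context. */
static const char *ctxt_fallback(const char *key, const char *msgval)
{
    if (msgval == key) {
        const char *sep = strchr(key, '\004');
        return sep != NULL ? sep + 1 : key;
    }
    return msgval;
}
```

Because "\004" does not occur in ordinary message text, there is no ambiguity about where the context ends, unlike with "|".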