GNOME Bugzilla – Bug 85718
UTF-8 in translatable strings
Last modified: 2009-08-15 18:40:50 UTC
The file plugins/numtheory/numtheory.c contains a non-ASCII character that causes many warnings from xgettext when I update sv.po (which is in UTF-8) with "intltool-update sv":

xgettext: warning: The following msgid contains non-ASCII characters.
This will cause problems to translators who use a character encoding
different from yours. Consider using a pure ASCII msgid instead.
@FUNCTION=NT_MU
@SYNTAX=NT_MU(n)
@DESCRIPTION=The NT_MU function (Möbius mu function) returns 0 if @n is divisible by the square of a prime .
Otherwise it returns:
-1 if @n has an odd number of different prime factors .
1 if @n has an even number of different prime factors .
If @n=1 it returns 1
@EXAMPLES=
@SEEALSO=NT_D, ITHPRIME, NT_PHI
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence

Also, the character (ö) will not be displayed in the msgid in the po file; it will simply be "Mbius".
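Both symptoms can be reproduced outside the toolchain. The Python sketch below is only an illustration, not how xgettext is implemented, and the helper name non_ascii_offsets is hypothetical. It flags every byte above 0x7F in a msgid, then shows that reading the UTF-8 bytes of "Möbius" as ASCII and discarding the undecodable bytes yields exactly "Mbius", which is one plausible way the character gets lost.

```python
def non_ascii_offsets(msgid):
    """Return the byte offsets in msgid (a bytes object) outside 7-bit ASCII."""
    return [i for i, b in enumerate(msgid) if b > 0x7F]


# "Möbius" as stored in a UTF-8 source file: U+00F6 takes two bytes.
source_bytes = "Möbius".encode("utf-8")

print(source_bytes)                     # b'M\xc3\xb6bius'
print(non_ascii_offsets(source_bytes))  # [1, 2]

# Decoding with the file's real encoding keeps the character...
print(source_bytes.decode("utf-8"))     # Möbius
# ...while treating the file as ASCII and dropping invalid bytes loses it.
print(source_bytes.decode("ascii", errors="ignore"))  # Mbius
```

The two-byte result for a single visible character is why the warning talks about translators using "a character encoding different from yours": the msgid is only meaningful if every tool in the chain agrees that the source is UTF-8.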
I don't see this as a gnumeric problem. That is properly encoded utf8 (at least it should be). If gettext 0.11.1 has a problem with it, then someone needs to explain what magic incantation is necessary to include utf8 in translatable text.
We're starting to expand our use of utf8 in translated text. This will need to work.
It won't work without changes to gettext.
Well, we could make the changes to intltools and filter out gettext's warnings.
Perhaps, but those changes to intltool will have to be made sooner rather than later.
See bug 99005.
Anyone mind if I reopen this one, since it largely depends on bug 99005?
Re-opening with changed summary.
This appears to have been addressed. Can we close it?
Doesn't seem to have been addressed. The messages are still there in gnumeric, and intltool doesn't seem to have been fixed either, according to the bug report.
I don't see anything for Gnumeric to fix here. We need the UTF-8 in there and [old versions of] gettext setting an ASCII-only policy is just plain misguided.
So explicitly require a newer version of (GNU) gettext.
That's not a gnumeric issue -- maybe an intltool issue. (We don't require anything specific about gettext beyond what other dependencies do. In particular, I don't think we require GNU gettext.)
In any case, requesting a specific behavior/feature that is known to be present only in GNU gettext from a certain version upwards, without making that an explicit requirement at the same time, seems broken to me. I'm not sure intltool can be blamed for this -- it's only doing what it's intended to do and can't be blamed for limitations in certain gettext versions. GNU gettext cannot be blamed either, since it has been fixed to allow for this feature in newer versions (UTF-8-only source files and po files), and the responsibility for requesting that this feature be enabled lies with the software using it. The logical conclusion can only be that it's up to the gnumeric developers to explicitly enable the GNU gettext feature in gnumeric, possibly also to provide intltool patches for triggering the gettext feature via intltool, and to add a requirement for GNU gettext in gnumeric. Just putting Unicode symbols in gnumeric source file messages and assuming (or pretending) that it should just magically work or fix itself when it reportedly does not, that's the seriously broken part, IMHO. Translators don't need to be kept hostage that way.
Hostage? Let's avoid pointless invective. The utf8 in the translated text is going to stay; these strings are not showstoppers and translators can ignore warnings. Eventually intltool will support this, because this is the right solution. I'm not clear where the gettext dependency belongs. This is an extraction issue, not a run-time issue.
I'd be happy if you really believed the accusation was pointless, although your past comments, as well as this one, strongly suggest that this is not the case. These messages have been present in gnumeric in their ASCII forms for a long time. There's nothing that would have prevented these messages from being kept that way in gnumeric until gettext/intltool had been fixed to allow for non-ASCII messages, and not the other way around (i.e. changing only after fixing, rather than changing long before even thinking about fixing). It's true that the messages are not showstoppers, but at the same time, our goal should be to fully localize gnumeric. Regressions on that road are bad, and should be kept very temporary if they are necessary at all. It's clear that this regression wasn't intended to be temporary, as there was, and seemingly still is, no incentive for gnumeric developers to fix the problem or get it fixed. Also, repeatedly trying to dismiss the problem as merely an issue of "translators can ignore warnings" misses the basic problem and isn't helpful. It's not an issue of just ignoring warnings. It's an issue of translations not working. In addition, if these messages aren't showstoppers, one could wonder why there was this urgent and obviously premature need to change them to UTF-8. If the messages aren't important, they could have been kept in ASCII for the time being. No one is questioning that UTF-8 is absolutely the correct road for the future, and what should be used long-term throughout messages and everywhere. The question is about introducing it in messages without checking the current support for it, then ignoring the problems it currently causes and providing no incentive to fix them, thus introducing an unnecessary long-term regression with no end in sight. If that's done on purpose, that's the precise definition of "keeping hostage".
Umm, if I remember correctly, all utf-8 strings in gnumeric are proper names (Möbius, Kåre and so on). Maybe split those strings up and not mark the names for translation? Or maybe have a small comment so translators know what character is supposed to be there?
I'll reiterate politely: please cut down on the invective. It is no longer pointless; it has graduated into insulting. This is absolutely nothing like hostage-taking, and I take issue with the abuse of language and the moral relativism this type of usage entails. You didn't answer the real question I asked. To me this seems like an extraction issue, not a run-time issue. Why do you feel we need to create a library versioning dependency? The strings are in their correct form so that they'll start to work when the tools do, without our having to monitor all the myriad mailing lists of the various toolsets involved, checking each release. The strings have oscillated between ASCII and Latin-1/UTF-8 many times as dueling commits have tried to 'fix' the problem. I'll be explicit: they will stay in UTF-8, and one day, in the fullness of time, the toolset will support it smoothly. In the meantime translators can continue to translate without restraint, and translations will continue to work relatively smoothly. Janne: it looks like there are comments there. I'll tack on an 'xgettext' to ensure that they get extracted.
You are right, the comments do show up - my bad. It had been a while since I actually looked at the entries.
> The strings are in their correct form so that they'll start to work when the tools do.

I find it quite astonishing that the procedure now, as you explain it, seems to be to put in incompatible changes and then wait for them to magically work some day with respect to the underlying tools and libraries, just so as not to forget to make the changes later, when support for them becomes available. Is this standard procedure for all things Gnumeric these days? Why not use Bugzilla instead to keep track of currently incompatible changes that will need to be made later, when the underlying tools and libraries support them? This is an honest question; it really puzzles me why someone would put incompatible changes directly into CVS for an indefinite amount of time in order not to forget to make the changes later, instead of tracking the needed changes in a tracker in the meantime. Tracking currently incompatible changes in a tracker, until there is support for them and they can safely be committed to CVS, seems to be standard procedure for most other modules. I wonder why that's not the case for Gnumeric.
It's the situation for this particular issue. The risks and costs of having it there are insignificant. At the very worst, some minor strings may not get fully translated and a few warnings are generated. We're using Bugzilla too, as evidenced by this bug. However, it's easy for bugs to fall below the attention threshold of maintainers, which would force us to periodically ping to see whether it's been fixed and to monitor the gettext and intltool mailing lists. Would we use this approach in all situations? No. In my estimation it is the best fit for this one.
*** Bug 115720 has been marked as a duplicate of this bug. ***
FWIW, for someone who has the latest gettext installed (0.12.1), intltool-update <lang> will result in errors, not warnings, an abort of the pot construction, and a failure to obtain an updated po to work on. This is what made me file #115720. Just for the record, I fully support Christian's arguments regarding this problem. To get an updated po, the only thing I have been able to do is to manually remove the 3 UTF-8 strings from the sources before running intltool-update. Most translators won't know how to get that far, or will just move on to the next module thinking "it'll get fixed". Fixing this as Jody proposes (fixing intltool) will probably make GNOME depend on a quite bleeding-edge GNU gettext version. That doesn't sound like something Sun or other non-GNU platforms would expect...
It's not as if intltool cares too much for Sun's xgettext right now...

troll:~/private/gnome/gnumeric/po> PATH=/usr/bin:$PATH intltool-update de
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- f
xgettext: illegal option -- i
xgettext: illegal option -- l
xgettext: illegal option -- e
xgettext: illegal option -- -
xgettext: illegal option -- f
xgettext: illegal option -- r
xgettext: illegal option -- o
Usage: xgettext [-a [-x exclude-file]] [-jns][-c comment-tag] [-d default-domain] [-m prefix] [-M suffix] [-p pathname] files ...
       xgettext -h
WARNING: It seems that none of the files in POTFILES.in contain marked strings
This *really* isn't a gnumeric problem. The solution is to upgrade to GNU gettext 0.12.1 and to fix intltool-update. (To hack it, change the installed intltool-update's call to xgettext by adding "--from-code=UTF-8".) See also bug 99005.
The solutions discussed at GUADEC were to either:
1) use the --from-code=NAME flag of GNU gettext 0.12.1 and add a check for it in intltool, or,
2) if (1) has no portable implementation, fall back to putting ASCII in the message with a comment (not in UTF-8, due to BSD) explaining what character it should be, and then providing an English translation with the UTF-8 form.
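Option 2 can be sketched with Python's standard gettext module, purely to show the shape of the fallback: the msgid stays pure ASCII ("Mobius"), a comment tells translators which character is meant, and an English catalog maps the ASCII msgid to the UTF-8 spelling. The DictTranslations class and the catalog contents are hypothetical stand-ins for a compiled English .mo file; gnumeric itself would do this through its C gettext calls and po files.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled en.mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        # Like real gettext, fall back to the msgid itself when the
        # catalog has no entry for it.
        return self._messages.get(message, message)


# Comment for translators: the "o" in "Mobius" is U+00F6 (o with diaeresis).
MSGID = "The NT_MU function (Mobius mu function)"

english = DictTranslations({MSGID: "The NT_MU function (M\u00f6bius mu function)"})
untranslated = gettext.NullTranslations()

print(english.gettext(MSGID))       # The NT_MU function (Möbius mu function)
print(untranslated.gettext(MSGID))  # The NT_MU function (Mobius mu function)
```

The trade-off is visible in the last two lines: any locale without a translation for the string falls back to the ASCII spelling, so the UTF-8 form appears only where a catalog supplies it.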
But I think this bug should be reopened as a reminder that the problem has not been fixed yet. Once intltool is fixed, gnumeric will need to require the newest version of intltool.
intltool has been fixed. The remaining issue is that gnumeric has to require intltool 0.27:

AC_PROG_INTLTOOL([0.27])

Otherwise, older versions of intltool would just stop working with GNU gettext 0.12.
Carlos has committed it. Closing bug as resolved.