Bug 85718 – UTF-8 in translatable strings



Summary:	UTF-8 in translatable strings


Status:	VERIFIED FIXED

Product:	Gnumeric
Classification:	Applications
Component:	Analytics
Version:	git master
Hardware:	Other other

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Jody Goldberg
QA Contact:	Jody Goldberg

URL:
Whiteboard:

Duplicates:	115720 (view as bug list)
Depends on:	99005
Blocks:

Reported:	2002-06-17 22:27 UTC by Christian Rose
Modified:	2009-08-15 18:40 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Christian Rose 2002-06-17 22:27:39 UTC

The file plugins/numtheory/numtheory.c contains a non-ascii character that
causes many warnings from xgettext when I update the sv.po (which is in
UTF-8) with "intltool-update sv":

xgettext: warning: The following msgid contains non-ASCII characters.
                   This will cause problems to translators who use a character encoding
                   different from yours. Consider using a pure ASCII msgid instead.
                   @FUNCTION=NT_MU
                   @SYNTAX=NT_MU(n)
                   @DESCRIPTION=The NT_MU function (MÃ¶bius mu function) returns 
                   0  if @n is divisible by the square of a prime .
                   Otherwise it returns: 
                   -1 if @n has an odd  number of different prime factors .
                   1  if @n has an even number of different prime factors .
                   If @n=1 it returns 1
                   
                   @EXAMPLES=
                   @SEEALSO=NT_D, ITHPRIME, NT_PHI
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence
xgettext: invalid multibyte sequence


Also, the character (ö) will not be displayed in the msgid in the po file,
it will simply be "Mbius".
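
One plausible reading of the garbled name in the warning above is that the two UTF-8 bytes of "ö" are being displayed as two separate ISO-8859-1 characters. A minimal sketch reproducing that effect, assuming a POSIX shell and an iconv that knows both encodings (the variable names are illustrative only):

```shell
# "ö" (U+00F6) is encoded in UTF-8 as the two bytes 0xC3 0xB6 (octal 303 266).
utf8_name=$(printf 'M\303\266bius')

# Treat those bytes as ISO-8859-1 and convert to UTF-8: each byte becomes
# a separate character, which yields the garbled two-character form.
garbled=$(printf '%s' "$utf8_name" | iconv -f ISO-8859-1 -t UTF-8)

# Count the bytes outside the ASCII range in the original string.
non_ascii_bytes=$(printf '%s' "$utf8_name" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')

printf 'garbled form: %s\n' "$garbled"
printf 'non-ASCII bytes: %s\n' "$non_ascii_bytes"
```

The source string itself is still valid UTF-8 in this sketch; only the interpretation of its bytes differs.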

Comment 1 Jody Goldberg 2002-06-18 00:24:19 UTC

I don't see this as a gnumeric problem.  That is properly encoded utf8 (at
least it should be).  If gettext has a problem with it in version 0.11.1, then
someone needs to explain what magic incantation is necessary to include utf8 in
translatable text.

Comment 2 Jody Goldberg 2002-11-15 15:01:23 UTC

We're starting to expand our use of utf8 in translated text.
This will need to work.

Comment 3 Christian Rose 2002-11-15 16:51:13 UTC

It won't work without changes to gettext.

Comment 4 Morten Welinder 2002-11-17 19:16:08 UTC

Well, we could make the changes to intltools and filter out gettext's
warnings.

Comment 5 Christian Rose 2002-11-17 19:56:35 UTC

Perhaps, but those changes to intltool will have to be made sooner
rather than later.

Comment 6 Morten Welinder 2002-11-19 15:21:35 UTC

See bug 99005.

Comment 7 Christian Rose 2002-11-19 16:20:45 UTC

Anyone mind if I reopen this one, since it largely depends on bug 99005?

Comment 8 Morten Welinder 2002-11-22 14:19:02 UTC

Re-opening with changed summary.

Comment 9 Jody Goldberg 2003-06-04 03:31:27 UTC

This appears to have been addressed.  Can we close it ?

Comment 10 Christian Rose 2003-06-04 10:19:26 UTC

Doesn't seem to have been addressed. The messages are still there in
gnumeric and intltool doesn't seem to have been fixed either,
according to the bug report.

Comment 11 Morten Welinder 2003-06-08 01:57:37 UTC

I don't see anything for Gnumeric to fix here.  We need the UTF-8
in there and [old versions of] gettext setting an ASCII-only policy
is just plain misguided.

Comment 12 Christian Rose 2003-06-08 20:10:14 UTC

So explicitly require a newer version of (GNU) gettext.

Comment 13 Morten Welinder 2003-06-09 18:31:24 UTC

That's not a gnumeric issue -- maybe an intltool issue.
(We don't require anything specific about gettext beyond what other
dependencies do.  In particular, I don't think we require gnu gettext.)

Comment 14 Christian Rose 2003-06-09 18:52:23 UTC

In any case, requesting a specific behavior/feature that is known to
be only present in GNU gettext from a certain version upwards, without
making that an explicit requirement at the same time, seems broken to me.

I'm not sure intltool can be blamed for this -- it's only doing what
it's intended to do and can't be blamed for limitations in certain
gettext versions. GNU gettext cannot be blamed, since it has been
fixed to allow for this feature in newer versions (UTF-8-only source
files and po files), and the responsibility for requesting this
feature to be enabled lies in the software using it.

The logical conclusion can only be that it's up to gnumeric developers
to explicitly enable the GNU gettext feature in gnumeric, and possibly
also provide intltool patches for doing the gettext feature triggering
via intltool, and provide a requirement for GNU gettext in gnumeric.

Just putting Unicode symbols in gnumeric source file messages and
assuming (or pretending) that it should just magically work or fix
itself when it is reportedly not so, that's the seriously broken part,
IMHO. Translators don't need to be kept hostage that way.

Comment 15 Jody Goldberg 2003-06-10 03:02:09 UTC

Hostage?  Let's avoid pointless invective.

The utf8 in the translated text is going to stay.  These strings are
not show stoppers and translators can ignore warnings.  Eventually
intltool will support this because this is the right solution.

I'm not clear where the gettext depend belongs.  This is an extraction
issue, not runtime.

Comment 16 Christian Rose 2003-06-10 06:32:17 UTC

I'm happy if you should really believe the accusation is pointless.
Although your past comments, as well as this one, strongly confirm
that this is not the case.

These messages have been present in gnumeric in their ASCII forms for
a long time. Nothing would have prevented these messages from being
kept that way in gnumeric until gettext/intltool had been fixed to
allow for non-ASCII messages (i.e. changing only after fixing, and not
the other way around, changing long before even thinking about
fixing).

It's true that the messages are not show stoppers, but at the same
time, our goal should be to fully localize gnumeric. Regressions on
that road are bad, and should be kept very temporary if they are at
all necessary. It's clear that this regression wasn't intended to be
temporary as there was, and seemingly still is, no incentive to fix
the problem, or get the problem fixed, by gnumeric developers.

Also, repeatedly trying to denounce the problem as only an issue of
"translators can ignore warnings" misses the basic problem, and isn't
helpful. It's not an issue of just ignoring warnings. It's an issue of
translations not working.

In addition, if these messages aren't showstoppers, one could wonder
why there was this urgent and obviously premature need to change them
to UTF-8. If the messages aren't important, they could have been kept
in ASCII for the time being.

No one is questioning that UTF-8 is the absolutely correct road for the
future, and what should be used long-term throughout messages and
everywhere. The questioning is about introducing it in messages
without checking the current support for it and in addition ignoring
the problems currently caused, and providing no incentive to fix
those, thus introducing an unnecessary long-time regression with no
end in sight. If that's done on purpose, that's the precise definition
of "keeping hostage".

Comment 17 Janne 2003-06-10 06:50:14 UTC

Umm, if I remember correctly, all utf-8 strings in gnumeric are proper
names (Möbius, Kåre and so on). Maybe split those strings up and not
mark the names for translation? Or maybe have a small comment so
translators know what character is supposed to be there?

Comment 18 Jody Goldberg 2003-06-10 14:03:04 UTC

I'll reiterate politely, please cut down on the invective.  It is no
longer pointless, it has graduated into insulting.  This is absolutely
nothing like hostage taking and I take issue with the abuse of
language and moral relativism this type of usage entails.

You did answer the real question I asked.
To me this seems like an extraction not a run time issue.  Why do you
feel we need to create a library versioning dependency ?

The strings are in their correct form so that they'll start to work
when the tools do, without our having to monitor all the myriad
mailing lists of the various toolsets involved, checking each release.
The strings have oscillated between ascii -> latin1/utf8 many times
as dueling commits have tried to 'fix' the problem.

I'll be explicit.  They will stay in utf8 and one day in the fullness
of time the toolset will support it smoothly.  In the mean time
translators can continue to translate without restraint, and
translations will continue to work relatively smoothly.

Janne : It looks like there are comments there.  I'll tack on a
'xgettext' to ensure that they get extracted.

Comment 19 Janne 2003-06-10 14:08:09 UTC

You are right, the comments do show up - my bad. It was a while since
I actually looked at the entries.

Comment 20 Christian Rose 2003-06-10 15:00:47 UTC

> The strings are in their correct form so that they'll start to
> work when the tools do.

I find it quite astonishing that the procedure now, as you explain 
it, seems to be to put in incompatible changes, and then wait for it 
to magically work some day wrt underlying tools and libraries, just 
in order not to forget to do the changes later when there should be 
support for the changes available. Is this standard procedure for all 
things Gnumeric these days?

Why not use Bugzilla instead to keep track of currently incompatible 
changes that will need to be done later when underlying tools and 
libraries support them? This is an honest question; it really puzzles 
my mind why someone would put in incompatible changes directly into 
CVS and for an indefinite amount of time in order to not forget to do 
the changes later, instead of tracking the changes needed in a 
tracker in the meantime. Tracking currently incompatible changes in a 
tracker, until there is support for those changes and they can safely 
be committed to CVS, seems to be standard procedure for most other 
modules. I wonder why that's not the case for Gnumeric.

Comment 21 Jody Goldberg 2003-06-10 16:53:19 UTC

It's the situation for this particular issue.  The risks and costs of
having it there are insignificant.  At the very worst some minor strings may not
get translated fully and a few warnings are generated.  We're using bugzilla
too, as evidenced by this bug.  However, it's easy for bugs to fall below
the attention threshold of maintainers, forcing us to periodically ping to see
if it's been fixed and to monitor the gettext and intltool mailing lists.

Would we use this approach in all situations?  No.
In my estimation it is the best fit for this one.

Comment 22 Andreas J. Guelzow 2003-06-22 17:04:19 UTC

*** Bug 115720 has been marked as a duplicate of this bug. ***

Comment 23 Jordi Mallach 2003-06-22 19:00:30 UTC

FWIW, for someone who has the latest gettext installed (0.12.1),
intltool-update <lang> will result in errors, not warnings, an abort
of the pot construction and a failure to obtain an updated po to work on.
This is what made me file #115720. Just for the record, I fully
support Christian's arguments regarding this problem. To get an
updated po, the only thing I have been able to do is to manually
remove the 3 utf strings from the sources before intltool-updating.
Most of the translators won't know how to get that far or will just
move to the next module thinking "it'll get fixed".

To fix this as Jody proposes (fixing intltool) will probably make
GNOME depend on a quite bleeding edge GNU gettext version. It doesn't
sound like something Sun or other non GNU platforms would expect...

Comment 24 Morten Welinder 2003-06-23 13:58:21 UTC

It's not as if intltool cares too much for Sun's xgettext right now...


troll:~/private/gnome/gnumeric/po> PATH=/usr/bin:$PATH intltool-update de
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- k
xgettext: illegal option -- e
xgettext: illegal option -- y
xgettext: illegal option -- w
xgettext: illegal option -- o
xgettext: illegal option -- r
xgettext: illegal option -- -
xgettext: illegal option -- f
xgettext: illegal option -- i
xgettext: illegal option -- l
xgettext: illegal option -- e
xgettext: illegal option -- -
xgettext: illegal option -- f
xgettext: illegal option -- r
xgettext: illegal option -- o
Usage:  xgettext [-a [-x exclude-file]] [-jns][-c comment-tag]
        [-d default-domain] [-m prefix] [-M suffix] [-p pathname]
files ...
        xgettext -h
WARNING: It seems that none of the files in POTFILES.in contain marked
strings

Comment 25 Morten Welinder 2003-06-23 14:51:37 UTC

This *really* isn't a gnumeric problem.

The solution is to upgrade to gnu gettext 0.12.1 and to fix
intltool-update.  (To hack it, change the installed intltool-update's
call to xgettext by adding "--from-code=UTF-8".)

See also bug 99005.
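
For illustration, the effect of that flag can be sketched with a direct xgettext call. This is not intltool-update's actual command line; the scratch directory, the file demo.c, and its one-line content are hypothetical stand-ins for the real source, and the sketch assumes a GNU xgettext recent enough to accept --from-code:

```shell
# Scratch directory and a hypothetical one-line C file whose msgid contains
# a UTF-8 "ö" (bytes 0xC3 0xB6); this stands in for numtheory.c.
workdir=$(mktemp -d)
printf 'const char *help = _("The NT_MU function (M\303\266bius mu function)");\n' \
    > "$workdir/demo.c"

if command -v xgettext >/dev/null 2>&1; then
    # --from-code declares the encoding of the input files;
    # --keyword=_ marks _() as a translation function.
    xgettext --from-code=UTF-8 --keyword=_ \
        --output="$workdir/demo.pot" "$workdir/demo.c"
    grep 'msgid "The NT_MU' "$workdir/demo.pot"
else
    echo "xgettext not found; skipping extraction"
fi
```

With the input encoding declared, the extracted msgid should keep the non-ASCII character instead of dropping it.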

Comment 26 Jody Goldberg 2003-06-24 02:16:48 UTC

The solutions discussed at guadec were to either

1) Use the  --from-code=NAME  flag of gnu gettext 0.12.1 and add a
check for it in intltool

If (1) had no portable implementation we can fall back to
2) put ascii in the message with a comment (not in utf8 due to bsd)
explaining what character it should be.  Then to have an english
translation with the utf8
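
Option (2) could look roughly like the sketch below. The catalog entry is hypothetical and written only to show the shape of the idea: an ASCII msgid, an ASCII comment naming the intended character, and an English msgstr carrying the UTF-8 form. It is not taken from Gnumeric's po files.

```shell
# Hypothetical catalog file; the entry below is illustrative only.
po_file=$(mktemp)
cat > "$po_file" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. "Mobius" should be spelled with o-umlaut (U+00F6) where the encoding allows.
msgid "The NT_MU function (Mobius mu function)"
msgstr "The NT_MU function (Möbius mu function)"
EOF

# Confirm that no msgid line contains a byte outside printable ASCII.
if grep '^msgid' "$po_file" | LC_ALL=C grep -q '[^ -~]'; then
    msgid_status="non-ASCII msgid found"
else
    msgid_status="all msgids are ASCII"
fi
echo "$msgid_status"
```

The trade-off is that English users would then need an English catalog installed to see the correctly spelled name.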

Comment 27 Abel Cheung 2003-08-04 01:39:50 UTC

But I think this bug should be reopened as a reminder that the problem
has not been fixed yet. Once intltool is fixed, gnumeric will need to
require the newest version of intltool.

Comment 28 Abel Cheung 2003-08-11 20:42:17 UTC

intltool has been fixed. The remaining issue is that gnumeric has to
require intltool 0.27:

AC_PROG_INTLTOOL([0.27])

otherwise, older versions of intltool would just stop working with GNU
gettext 0.12.

Comment 29 Abel Cheung 2003-08-11 21:28:52 UTC

Carlos has committed it. Closing bug as resolved.