Bug 124857 – GnomeMeeting shouldn't include UTF-8 symbols in msgids



Summary:	GnomeMeeting shouldn't include UTF-8 symbols in msgids


Status:	RESOLVED FIXED

Product:	ekiga
Classification:	Applications
Component:	general
Version:	GIT master
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Damien Sandras
QA Contact:	Ekiga maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-10-17 18:13 UTC by Christian Rose
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Christian Rose 2003-10-17 18:14:09 UTC

From src/codec_info.h:63:

  {"G.711-uLaw-64k", _("G.711 is the international standard for encoding "
   "telephone audio on 64 kbps channel. It is a pulse code modulation (PCM) "
   "scheme operating at 8 kHz sample rate, with 8 bits per sample, fully "
   "meeting ITU-T recommendations. This standard has two forms, A-Law and "
   "µ-Law. µ-Law G.711 PCM encoder converts 14 bit linear PCM samples into 8 "
   "bit compressed PCM (logarithmic form) samples, and the decoder does the "
   "conversion vice versa."), _("Excellent"), _("64 Kbps")},

This message includes 'µ' symbols that aren't valid ASCII, but are only
present in ISO-8859-1, UTF-8 and other non-ASCII charsets. Including
non-ASCII symbols in msgids hasn't been possible or allowed in the past
(see http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-ascii
for details).
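A minimal sketch of the ASCII-only rule, for illustration: a msgid is safe for older gettext versions only if every byte is in the 7-bit range. The helper name is hypothetical and not part of gettext or GnomeMeeting.

```c
#include <stdbool.h>

/* Returns true if every byte of the NUL-terminated string s is 7-bit
 * ASCII (0x00-0x7F). Both ISO-8859-1 and UTF-8 encode the micro sign
 * with bytes >= 0x80, so a msgid containing it fails this check.
 * is_ascii_msgid() is a hypothetical helper, not a gettext API. */
bool is_ascii_msgid(const char *s)
{
  for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++) {
    if (*p > 0x7F)
      return false;
  }
  return true;
}
```

For example, "G.711-uLaw-64k" passes, while the same name spelled with a UTF-8 or ISO-8859-1 micro sign does not.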

However, starting with the most recent versions of GNU gettext and intltool
(see bug 99005), it's possible to use UTF-8 in msgids, provided it's
properly marked as such in POTFILES.in. Still, I'm not sure adding a hard
requirement for these things:

GNU gettext >= 0.12
intltool >= 0.27

so that GnomeMeeting (and consequently GNOME) can still be fully translated
is a very good idea. Basically no stable distribution is shipping these
things yet. Instead, I propose the 'µ' symbols be replaced by 'u' so that
we don't need the above requirements.

Comment 1 Christian Rose 2003-10-17 18:23:03 UTC

Of course, this problem also applies to this message:

  {"G.711-ALaw-64k", _("G.711 is the international standard for encoding "
   "telephone audio on 64 kbps channel. It is a pulse code modulation (PCM) "
   "scheme operating at 8 kHz sample rate, with 8 bits per sample, fully "
   "meeting ITU-T recommendations. This standard has two forms, A-Law and "
   "µ-Law. A-Law G.711 PCM encoder converts 13 bit linear PCM samples into 8 "
   "bit compressed PCM (logarithmic form) samples, and the decoder does the "
   "conversion vice versa."), _("Excellent"), _("64 Kbps")},

Comment 2 Damien Sandras 2003-10-17 22:49:28 UTC

Hello Christian, I fear the µ symbol is really needed. uLaw doesn't make sense. Also, 
Danilo added UTF-8 as requirement. I don't know if this kind of fix fixes your problem, 
but that is exactly the kind of bug you shouldn't report to me. Please do what is 
required to fix the problem, you have better knowledge for that kind of things than me :)

Notice that GnomeMeeting has included UTF-8 in msgids for about 1 year.

Comment 3 Christian Rose 2003-10-18 00:27:35 UTC

> Hello Christian, I fear the µ symbol is really needed. uLaw doesn't
> make sense.

I'm not sure I follow you. Are you saying GNOME should have a hard
requirement on GNU gettext >= 0.12 and intltool >= 0.27 just because
two GnomeMeeting messages look more correct with the 'µ' symbol?


> Also, Danilo added UTF-8 as requirement. I don't know if this kind
> of fix fixes your problem,

This bug report is about GnomeMeeting, by including non-ASCII
characters in a few msgids, adding at this point of time entirely
unreasonable dependencies to GNOME. And no, what Danilo did did not
solve *this* problem, it only fixed so that using the UTF-8 character
would work in the first place.


> but that is exactly the kind of bug you shouldn't report to me.
> Please do what is required to fix the problem, you have better
> knowledge for that kind of things than me :)

This isn't a trivial typo bug or anything of that sort. This bug is a
matter of policy -- do we accept these dependencies or not. Thus,
anyone but the maintainer isn't really the right person to answer
that. Thus, I still believe you're the right bug owner, and I don't
think this is properly resolved by ignoring the fundamental question
and problem by shoving it to someone else.


> Notice that GnomeMeeting has included UTF-8 in msgids for about 1
> year.

That is simply not true -- there are no non-ASCII msgids in the
gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were added to
HEAD on 2003-09-22, less than a month ago, by you:
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting/src&command=DIFF_FRAMESET&file=codec_info.h&rev2=1.2&rev1=1.1

Comment 4 Damien Sandras 2003-10-18 09:06:57 UTC

> I'm not sure I follow you. Are you saying GNOME should have a hard
> requirement on GNU gettext >= 0.12 and intltool >= 0.27 just because
> two GnomeMeeting messages look more correct with the 'µ' symbol?

Who said that? I would be grateful if you could stop having your usual ironic tone when 
writing messages to me.

The codec name is µLaw, not uLaw. µ is a symbol that exists in ISO-8859-1 as I have 
written that code with emacs in ISO-8859-1. So I don't understand how it could be an 
invalid UTF-8 symbol as it is also an ISO-8859-1 one just like é or è.

> This bug report is about GnomeMeeting, by including non-ASCII
> characters in a few msgids, adding at this point of time entirely
> unreasonable dependencies to GNOME. And no, what Danilo did did not
> solve *this* problem, it only fixed so that using the UTF-8 character
> would work in the first place.

ok

> This isn't a trivial typo bug or anything of that sort. This bug is a
> matter of policy -- do we accept these dependencies or not. Thus,
> anyone but the maintainer isn't really the right person to answer
> that. Thus, I still believe you're the right bug owner, and I don't
> think this is properly resolved by ignoring the fundamental question
> and problem by shoving it to someone else.

Well, either you are the i18n maintainer, or you are not. But I'm certainly not. That is an 
area where I'm clueless.

> That is simply not true -- there are no non-ASCII msgids in the
> gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were added to

Yes that is simply true! 
If you look at:
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting%2Fsrc&command=DIFF_FRAMESET&root=/cvs/gnome&file=callbacks.cpp&rev1=1.69&rev2=1.70
in the function about_callback, you will see that Santiago Garcia Mantinan (and others)
have UTF-8 encoded names. So yes that is true, or I don't understand the problem.
Actually those names were UTF-8 encoded on purpose a very long time ago (the
1.70 diff above is from migrax, "2002-09-21 13:22"). So it seems UTF-8 was supported
by that time but is not supported anymore?

So my two questions simply are:
- Why encoding people names in UTF-8 in the source doesn't give problems since more 
than 1 year, but using the ISO-8859-1 µ gives a problem?
- Have you actually tested it was giving a problem with older gettext and intltool?

As I understand things, keeping the µ is not a problem and will work with older gettext 
versions. But keeping people names in UTF-8 is a problem for older gettext versions, 
but we have UTF-8 encoded names since 1 year. So why is it a problem now and not 
before? gettext regression?

Sorry but I'm confused.

Comment 5 Damien Sandras 2003-10-18 09:14:11 UTC

Actually I just checked http://www.asciitable.com/, µ has an extended ASCII code 
(230) and the n appearing in Santiago's name also has an extended ASCII code (164).

So both are ASCII chars. Why is ASCII char 230 giving a problem and not ASCII char 
164?

Comment 6 Christian Rose 2003-10-18 11:05:58 UTC

> Who said that? I would be grateful if you could stop having your
> usual ironic tone when writing messages to me.

Sorry if you were offended, it was just so blatantly obvious that you
hadn't entirely read or understood this report in the first place when
writing the first reply.


> µ is a symbol that exists in ISO-8859-1 as I have written that code
> with emacs in ISO-8859-1. So I don't understand how it could be an
> invalid UTF-8 symbol as it is also an ISO-8859-1 one just like é or
> è.

I said it was an invalid ASCII character, not an invalid UTF-8 symbol.


> > This isn't a trivial typo bug or anything of that sort. This bug
> > is a matter of policy -- do we accept these dependencies or not.
> > Thus, anyone but the maintainer isn't really the right person to
> > answer that. Thus, I still believe you're the right bug owner,
> > and I don't think this is properly resolved by ignoring the
> > fundamental question and problem by shoving it to someone else.
> 
> Well, either you are the i18n maintainer, or you are not. But I'm
> certainly not. That is an area where I'm clueless.

That's why I'm trying to educate you, so that we can get a decision.


> > That is simply not true -- there are no non-ASCII msgids in the
> > gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were
> > added to
> 
> Yes that is simply true!
> If you look at:
> http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting%2Fsrc&command=DIFF_FRAMESET&root=/cvs/gnome&file=callbacks.cpp&rev1=1.69&rev2=1.70
> in the function about_callback, you will see that Santiago Garcia
> Mantinan (and others) have UTF-8 encoded names. So yes that is
> true, or I don't understand the problem.

Again, as stated a hundred times in this bug report already, this bug
report is about using UTF-8 in *MSGIDS*, i.e. strings marked for
translation. The strings you quote aren't marked for translation, so
how would they be msgids?

In strings not marked for translation, you can use UTF-8 all you want
because those strings aren't handled by gettext, they're only rendered
by Pango. In msgids however the strings are handled by gettext and
used as keys in lookups, and thus, because of limitations in all older
gettext versions, they need to be in ASCII only. This is all explained
in detail on
http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-ascii
which is a link that was already cited in the initial bug report.
Again, actually reading bug reports and their referenced links is a
good practice.
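To illustrate why msgids used as lookup keys are sensitive to encoding, here is a toy catalog in C. It is a hypothetical model, not how GNU gettext is implemented: keys are compared byte for byte, so a 'µ' saved as ISO-8859-1 (one byte, 0xB5, as with the emacs setup described in comment 4) never matches the same character stored as UTF-8 (0xC2 0xB5), and the untranslated string comes back.

```c
#include <stddef.h>
#include <string.h>

/* A toy message catalog: msgid -> msgstr pairs compared byte for byte.
 * Hypothetical, for illustration only; real catalogs are binary .mo
 * files handled by gettext. */
struct toy_entry {
  const char *msgid;
  const char *msgstr;
};

static const struct toy_entry toy_catalog[] = {
  /* Built from a UTF-8 source: the micro sign is stored as 0xC2 0xB5. */
  { "\xc2\xb5-Law", "\xc2\xb5-Law (translated)" },
  { "A-Law", "A-Law (translated)" },
};

/* Returns the translation, or msgid itself when no key matches,
 * mirroring gettext's fallback of returning the untranslated string. */
const char *toy_lookup(const char *msgid)
{
  for (size_t i = 0; i < sizeof toy_catalog / sizeof toy_catalog[0]; i++) {
    if (strcmp(toy_catalog[i].msgid, msgid) == 0)
      return toy_catalog[i].msgstr;
  }
  return msgid;
}
```

With an ASCII-only msgid such as "A-Law" the bytes are identical in every ASCII-compatible encoding, so this kind of mismatch cannot occur.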


> So my two questions simply are:
> - Why encoding people names in UTF-8 in the source doesn't give
> problems since more than 1 year, but using the ISO-8859-1 µ gives
> a problem?

This problem has actually nothing to do with UTF-8 versus ISO-8859-1.

It has everything to do with everything non-ASCII, though. You can use
any UTF-8 character you want (well, in theory) everywhere you want,
but not in msgids.


> - Have you actually tested it was giving a problem with older
> gettext and intltool?

Yes, these messages cannot be translated or be built with translations
for these messages on a platform that doesn't have the already stated
software requirements. This has been extensively tested with other
software for many years. It's not a new problem (although this is the
first time it occurred in GnomeMeeting). See bug 101420, bug 101796,
bug 102010, bug 107056, bug 112960, bug 114190, bug 117449, bug
118682, bug 118960, bug 119008, and bug 119019.


> Actually I just checked http://www.asciitable.com/, µ has an
> extended ASCII code (230) and the n appearing in Santiago's name
> also has an extended ASCII code (164).
> So both are ASCII chars. Why is ASCII char 230 giving a problem and
> not ASCII char 164?

There is no such thing as "extended ASCII". If anyone calls characters
"extended ASCII" it's just a very unfortunate misnomer for some other
character set that is != ASCII. ASCII is the original, 7-bit character
encoding with the letters A-Z and a-z, digits, punctuation and control
characters, and not much else. While it's true that almost all
encodings in existence are extensions of the original ASCII encoding,
calling any such extension (that is not the original 7-bit ASCII
encoding) "ASCII" or "extended ASCII" or some such is just a misnomer
and a very unfortunate recipe for confusion. If you meet someone that
does that, slap them on the head, because they obviously don't know
what they're talking about.
http://www.joelonsoftware.com/articles/Unicode.html is a *very* useful
article about encodings aimed at programmers. I strongly recommend
that you read it.
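To make the "extended ASCII" point concrete, the sketch below lists the bytes for µ and ñ in three character sets. The 230 and 164 values from asciitable.com are IBM PC code page 437 codes; ISO-8859-1 and UTF-8 use different values, but in every case both characters need bytes above 0x7F, so neither is ASCII. The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte encodings of MICRO SIGN (U+00B5) and LATIN SMALL LETTER N WITH
 * TILDE (U+00F1) in three character sets. Unused array slots are 0. */
struct encoded_char {
  const char *charset;
  unsigned char micro[2];
  size_t micro_len;
  unsigned char n_tilde[2];
  size_t n_tilde_len;
};

const struct encoded_char encodings[] = {
  { "CP437",      { 0xE6 },       1, { 0xA4 },       1 },
  { "ISO-8859-1", { 0xB5 },       1, { 0xF1 },       1 },
  { "UTF-8",      { 0xC2, 0xB5 }, 2, { 0xC3, 0xB1 }, 2 },
};

/* True if any of the len bytes lies outside 7-bit ASCII (0x00-0x7F). */
bool has_non_ascii(const unsigned char *bytes, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    if (bytes[i] > 0x7F)
      return true;
  }
  return false;
}
```

So µ and ñ are equally non-ASCII; what differs in this bug is only that the names in about_callback are not msgids.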

Comment 7 Damien Sandras 2003-10-18 15:07:08 UTC

I really wonder why you persist in being ironic and aggressive. That
must be the result of a bad education or of a deep frustration. I
prefer to ignore it now.

Anyway, let's go back to the facts. If I put the codec name in a
non-translated string and use g_strdup_printf, it will solve the
problem. Would that solution suit you?

Comment 8 Christian Rose 2003-10-18 15:37:39 UTC

> I really wonder why you persist in being ironic and aggressive. That
> must be the result of a bad education or of a deep frustration.

As I've already explained, it *is* indeed a deep frustration. A
frustration of my detailed explanations being ignored. Most stuff in
this bug report I've had to repeat several times, which is amazing in
itself.


> Anyway, let's go back to the facts. If I put the codec name in a
> non-translated string and use g_strdup_printf, it will solve the
> problem. Would that solution suit you?

Yes, or preferably only the µ character. When doing that, please also
add a translator comment
(http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-comments)
explaining that the %s will be replaced by the Greek mu character, so
that translators know that.
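A rough sketch of the approach discussed here: keep only the µ out of the translatable string and document the %s for translators. To stay self-contained it uses snprintf() and a stub _() macro instead of GLib's g_strdup_printf() and the real gettext macro; the helper name and the MICRO_SIGN macro are hypothetical. The "Translators:" comment sits directly above the marked string, where xgettext can copy it into the .pot file when run with its --add-comments option.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stub for the gettext macro so the sketch compiles on its own;
 * in a real build _() would expand to a gettext() call. */
#define _(String) (String)

/* U+00B5 MICRO SIGN as UTF-8, kept out of every translatable string. */
#define MICRO_SIGN "\xc2\xb5"

/* Builds the "<mu>-Law" label from an ASCII-only msgid.
 * Hypothetical helper; the caller frees the result. */
char *make_mu_law_label(void)
{
  /* Translators: %s is replaced by the Greek letter mu (U+00B5). */
  const char *format = _("%s-Law");
  int needed = snprintf(NULL, 0, format, MICRO_SIGN);
  if (needed < 0)
    return NULL;
  char *label = malloc((size_t) needed + 1);
  if (label != NULL)
    snprintf(label, (size_t) needed + 1, format, MICRO_SIGN);
  return label;
}
```

In GLib code the body would shrink to something like g_strdup_printf (_("%s-Law"), MICRO_SIGN), with the result released by g_free().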

Comment 9 Damien Sandras 2003-10-19 09:44:24 UTC

I finally decided to use uLaw instead of µLaw as I found other
software doing the same, so it seemed to be a common thing.

That is fixed in CVS. Thanks for the report.

Comment 10 Christian Rose 2003-10-19 15:06:17 UTC

#: src/codec_info.h:63
msgid ""
"G.711 is the international standard for encoding telephone audio on 64 kbps "
"channel. It is a pulse code modulation (PCM) scheme operating at 8 kHz "
"sample rate, with 8 bits per sample, fully meeting ITU-T recommendations. "
"This standard has two forms, A-Law and µ-Law. A-Law G.711 PCM encoder "
"converts 13 bit linear PCM samples into 8 bit compressed PCM (logarithmic "
"form) samples, and the decoder does the conversion vice versa."

It seems this message wasn't fixed at the same time; it still contains
a µ character.

Comment 11 Damien Sandras 2003-10-20 21:52:03 UTC

I think it is really fixed now; please reopen if that is not the case!

Comment 12 Bugzilla Maintainers 2004-04-01 23:44:57 UTC

The URL field has been removed from bugzilla.gnome.org. This URL was in the old URL field, and is being added as a comment so that the data is not lost. Please email bugmaster@gnome.org if you have any questions.

URL: 
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting/src&command=DIFF_FRAMESET&file=codec_info.h&rev2=1.2&rev1=1.1