GNOME Bugzilla – Bug 124857
GnomeMeeting shouldn't include UTF-8 symbols in msgids
Last modified: 2004-12-22 21:47:04 UTC
From src/codec_info.h:63:

  {"G.711-uLaw-64k", _("G.711 is the international standard for encoding telephone audio on 64 kbps channel. It is a pulse code modulation (PCM) scheme operating at 8 kHz sample rate, with 8 bits per sample, fully meeting ITU-T recommendations. This standard has two forms, A-Law and µ-Law. µ-Law G.711 PCM encoder converts 14 bit linear PCM samples into 8 bit compressed PCM (logarithmic form) samples, and the decoder does the conversion vice versa."), _("Excellent"), _("64 Kbps")},

This message includes 'µ' symbols, which are not valid ASCII; they exist only in ISO-8859-1, UTF-8, and other non-ASCII charsets. Including non-ASCII symbols in msgids has been neither possible nor allowed in the past (see http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-ascii for details).

However, starting with the most recent versions of GNU gettext and intltool (see bug 99005), it's possible to use UTF-8 in msgids, provided it's properly marked as such in POTFILES.in. Still, I'm not sure that adding a hard requirement on

  GNU gettext >= 0.12
  intltool >= 0.27

so that GnomeMeeting (and consequently GNOME) can still be fully translated is a very good idea. Basically no stable distribution is shipping these versions yet.

Instead, I propose that the 'µ' symbols be replaced by 'u' so that we don't need the above requirements.
Of course, this problem also applies to this message:

  {"G.711-ALaw-64k", _("G.711 is the international standard for encoding telephone audio on 64 kbps channel. It is a pulse code modulation (PCM) scheme operating at 8 kHz sample rate, with 8 bits per sample, fully meeting ITU-T recommendations. This standard has two forms, A-Law and µ-Law. A-Law G.711 PCM encoder converts 13 bit linear PCM samples into 8 bit compressed PCM (logarithmic form) samples, and the decoder does the conversion vice versa."), _("Excellent"), _("64 Kbps")},
Hello Christian,

I fear the µ symbol is really needed. uLaw doesn't make sense. Also, Danilo added UTF-8 as a requirement. I don't know if this kind of fix fixes your problem, but that is exactly the kind of bug you shouldn't report to me. Please do what is required to fix the problem; you know this kind of thing better than I do :)

Note that GnomeMeeting has included UTF-8 in msgids for about a year.
> Hello Christian, I fear the µ symbol is really needed. uLaw doesn't
> make sense.

I'm not sure I follow you. Are you saying GNOME should have a hard requirement on GNU gettext >= 0.12 and intltool >= 0.27 just because two GnomeMeeting messages look more correct with the 'µ' symbol?

> Also, Danilo added UTF-8 as a requirement. I don't know if this kind
> of fix fixes your problem,

This bug report is about GnomeMeeting, by including non-ASCII characters in a few msgids, adding dependencies to GNOME that are entirely unreasonable at this point in time. And no, what Danilo did did not solve *this* problem; it only made using the UTF-8 character work in the first place.

> but that is exactly the kind of bug you shouldn't report to me.
> Please do what is required to fix the problem; you know this kind
> of thing better than I do :)

This isn't a trivial typo bug or anything of that sort. This bug is a matter of policy -- do we accept these dependencies or not. Thus, no one but the maintainer is really the right person to answer that. So I still believe you're the right bug owner, and I don't think this is properly resolved by ignoring the fundamental question and problem by shoving it to someone else.

> Note that GnomeMeeting has included UTF-8 in msgids for about a
> year.

That is simply not true -- there are no non-ASCII msgids in the gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were added to HEAD on 2003-09-22, less than a month ago, by you:
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting/src&command=DIFF_FRAMESET&file=codec_info.h&rev2=1.2&rev1=1.1
> I'm not sure I follow you. Are you saying GNOME should have a hard
> requirement on GNU gettext >= 0.12 and intltool >= 0.27 just because
> two GnomeMeeting messages look more correct with the 'µ' symbol?

Who said that? I would be grateful if you could stop using your usual ironic tone when writing messages to me. The codec name is µLaw, not uLaw.

µ is a symbol that exists in ISO-8859-1, and I wrote that code with Emacs in ISO-8859-1. So I don't understand how it could be an invalid UTF-8 symbol, as it is also an ISO-8859-1 one, just like é or è.

> This bug report is about GnomeMeeting, by including non-ASCII
> characters in a few msgids, adding dependencies to GNOME that are
> entirely unreasonable at this point in time. And no, what Danilo did
> did not solve *this* problem; it only made using the UTF-8 character
> work in the first place.

ok

> This isn't a trivial typo bug or anything of that sort. This bug is a
> matter of policy -- do we accept these dependencies or not. Thus,
> no one but the maintainer is really the right person to answer
> that. So I still believe you're the right bug owner, and I don't
> think this is properly resolved by ignoring the fundamental question
> and problem by shoving it to someone else.

Well, either you are the i18n maintainer, or you are not. But I'm certainly not. That is an area where I'm clueless.

> That is simply not true -- there are no non-ASCII msgids in the
> gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were added to

Yes, that is simply true! If you look at
http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting%2Fsrc&command=DIFF_FRAMESET&root=/cvs/gnome&file=callbacks.cpp&rev1=1.69&rev2=1.70
in the function about_callback, you will see that Santiago Garcia Mantinan (and others) have UTF-8 encoded names. So yes, that is true, or I don't understand the problem.
Actually, those names were UTF-8 encoded on purpose a very long time ago (the 1.70 diff above is from migrax, "2002-09-21 13:22"). So it seems UTF-8 was supported at that time but is not anymore?

So my two questions simply are:

- Why has encoding people's names in UTF-8 in the source caused no problems for more than a year, while using the ISO-8859-1 µ causes a problem?
- Have you actually tested that it causes a problem with older gettext and intltool?

As I understand things, keeping the µ is not a problem and will work with older gettext versions. But keeping people's names in UTF-8 is a problem for older gettext versions, yet we have had UTF-8 encoded names for a year. So why is it a problem now and not before? A gettext regression? Sorry, but I'm confused.
Actually I just checked http://www.asciitable.com/, µ has an extended ASCII code (230) and the n appearing in Santiago's name also has an extended ASCII code (164). So both are ASCII chars. Why is ASCII char 230 giving a problem and not ASCII char 164?
> Who said that? I would be grateful if you could stop using your
> usual ironic tone when writing messages to me.

Sorry if you were offended; it was just so blatantly obvious that you hadn't entirely read or understood this report in the first place when writing the first reply.

> µ is a symbol that exists in ISO-8859-1, and I wrote that code
> with Emacs in ISO-8859-1. So I don't understand how it could be an
> invalid UTF-8 symbol, as it is also an ISO-8859-1 one, just like é or
> è.

I said it was an invalid ASCII character, not an invalid UTF-8 symbol.

> > This isn't a trivial typo bug or anything of that sort. This bug
> > is a matter of policy -- do we accept these dependencies or not.
> > Thus, no one but the maintainer is really the right person to
> > answer that. So I still believe you're the right bug owner,
> > and I don't think this is properly resolved by ignoring the
> > fundamental question and problem by shoving it to someone else.
>
> Well, either you are the i18n maintainer, or you are not. But I'm
> certainly not. That is an area where I'm clueless.

That's why I'm trying to educate you, so that we can get a decision.

> > That is simply not true -- there are no non-ASCII msgids in the
> > gnome_2_4 branch of GnomeMeeting, and these 'µ' symbols were
> > added to
>
> Yes, that is simply true! If you look at
> http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting%2Fsrc&command=DIFF_FRAMESET&root=/cvs/gnome&file=callbacks.cpp&rev1=1.69&rev2=1.70
> in the function about_callback, you will see that Santiago Garcia
> Mantinan (and others) have UTF-8 encoded names. So yes, that is
> true, or I don't understand the problem.

Again, as stated a hundred times in this bug report already, this bug report is about using UTF-8 in *MSGIDS*, i.e. strings marked for translation. The strings you quote aren't marked for translation, so how would they be msgids?
In strings not marked for translation, you can use UTF-8 all you want because those strings aren't handled by gettext; they're only rendered by Pango. In msgids, however, the strings are handled by gettext and used as keys in lookups, and thus, because of limitations in all older gettext versions, they need to be in ASCII only. This is all explained in detail on http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-ascii which is a link that was already cited in the initial bug report. Again, actually reading bug reports and their referenced links is a good practice.

> So my two questions simply are:
> - Why has encoding people's names in UTF-8 in the source caused no
>   problems for more than a year, while using the ISO-8859-1 µ causes
>   a problem?

This problem actually has nothing to do with UTF-8 versus ISO-8859-1. It has everything to do with everything non-ASCII, though. You can use any UTF-8 character you want (well, in theory) everywhere you want, but not in msgids.

> - Have you actually tested that it causes a problem with older
>   gettext and intltool?

Yes, these messages cannot be translated, or built with their translations, on a platform that doesn't meet the software requirements already stated. This has been extensively tested with other software for many years. It's not a new problem (although this is the first time it has occurred in GnomeMeeting). See bug 101420, bug 101796, bug 102010, bug 107056, bug 112960, bug 114190, bug 117449, bug 118682, bug 118960, bug 119008, and bug 119019.

> Actually I just checked http://www.asciitable.com/, µ has an
> extended ASCII code (230) and the n appearing in Santiago's name
> also has an extended ASCII code (164).
> So both are ASCII chars. Why is ASCII char 230 giving a problem and
> not ASCII char 164?

There is no such thing as "extended ASCII". If anyone calls characters "extended ASCII", it's just a very unfortunate misnomer for some other character set that is != ASCII.
ASCII is the original, 7-bit character encoding with the letters A-Z and a-z, digits, punctuation, and control characters, and not much else. While it's true that almost all encodings in existence are extensions of the original ASCII encoding, calling any such extension (that is not the original 7-bit ASCII encoding) "ASCII" or "extended ASCII" or some such is just a misnomer and a very unfortunate recipe for confusion. If you meet someone who does that, slap them on the head, because they obviously don't know what they're talking about.

http://www.joelonsoftware.com/articles/Unicode.html is a *very* useful article about encodings, aimed at programmers. I strongly recommend that you read it.
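To make the 7-bit point concrete, here is a minimal C sketch of the distinction discussed above. It is only an illustration, not code from GnomeMeeting or gettext: the helper name is_ascii and the two constants are hypothetical. The micro sign is written as explicit byte escapes so the example does not depend on the encoding of the source file: it is the single byte 0xB5 in ISO-8859-1 and the two bytes 0xC2 0xB5 in UTF-8, and in both cases at least one byte is above 0x7F, so neither form is ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of s is 7-bit ASCII
 * (0x00-0x7F), 0 otherwise. */
static int is_ascii(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0; /* high bit set: not ASCII, whatever the charset */
    }
    return 1;
}

/* The micro sign (U+00B5) as explicit bytes in two encodings. */
static const char MU_LATIN1[] = "\xB5";     /* ISO-8859-1: one byte  */
static const char MU_UTF8[]   = "\xC2\xB5"; /* UTF-8:      two bytes */
```

With these definitions, is_ascii("u-Law") returns 1, while is_ascii(MU_LATIN1) and is_ascii(MU_UTF8) both return 0. The codes 230 and 164 quoted from asciitable.com appear to come from yet another 8-bit code page, which is a further reason not to call any of them ASCII.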
I really wonder why you persist in being ironic and aggressive. That must be the result of a bad education or of a deep frustration. I prefer to ignore it now.

Anyway, let's go back to the facts. If I put the codec name in a non-translated string and use g_strdup_printf, it will solve the problem. Would that solution suit you?
> I really wonder why you persist in being ironic and aggressive. That
> must be the result of a bad education or of a deep frustration.

As I've already explained, it *is* indeed a deep frustration. A frustration of my detailed explanations being ignored. Most stuff in this bug report I've had to repeat several times, which is amazing in itself.

> Anyway, let's go back to the facts. If I put the codec name in a
> non-translated string and use g_strdup_printf, it will solve the
> problem. Would that solution suit you?

Yes, or preferably only the µ character. Please also add a translator comment (http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html#use-comments) when doing that, explaining that the %s will be replaced by the Greek mu character, so that the translators will know that.
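The approach agreed on here could look roughly like the sketch below. It is an assumption-laden illustration, not the actual GnomeMeeting change: the thread proposes g_strdup_printf, but this version uses standard snprintf and a stand-in _() macro so that it builds without GLib or libintl. The function name format_two_forms, the constant MU_UTF8, and the shortened message are all hypothetical. The point it shows is that the msgid stays pure ASCII while the non-ASCII character is substituted at run time from a string that is not marked for translation.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for gettext's _() macro (assumption: a real build would
 * include libintl.h and call gettext here). */
#define _(s) (s)

/* Not marked for translation, so it never becomes part of a msgid. */
static const char MU_UTF8[] = "\xC2\xB5"; /* micro sign, UTF-8 bytes */

/* Hypothetical helper: writes the sentence into buf and returns the
 * number of bytes snprintf would have written. */
static int format_two_forms(char *buf, size_t size)
{
    assert(buf != NULL);
    /* Translators: %s is replaced by the Greek letter mu. */
    return snprintf(buf, size,
                    _("This standard has two forms, A-Law and %s-Law."),
                    MU_UTF8);
}
```

Called with a 128-byte buffer, this produces the sentence with the UTF-8 micro sign in place of %s, and the string seen by the translation tools contains only ASCII. Whether the translator comment is picked up depends on how the extraction tool is configured for comments.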
I finally decided to use uLaw instead of µLaw as I found other software doing the same, so it seemed to be a common thing. That is fixed in CVS. Thanks for the report.
#: src/codec_info.h:63
msgid ""
"G.711 is the international standard for encoding telephone audio on 64 kbps "
"channel. It is a pulse code modulation (PCM) scheme operating at 8 kHz "
"sample rate, with 8 bits per sample, fully meeting ITU-T recommendations. "
"This standard has two forms, A-Law and µ-Law. A-Law G.711 PCM encoder "
"converts 13 bit linear PCM samples into 8 bit compressed PCM (logarithmic "
"form) samples, and the decoder does the conversion vice versa."

It seems this message wasn't fixed at the same time; it still contains a µ character.
I think it is really fixed now, please reopen if it is not the case!
The URL field has been removed from bugzilla.gnome.org. This URL was in the old URL field, and is being added as a comment so that the data is not lost. Please email bugmaster@gnome.org if you have any questions. URL: http://cvs.gnome.org/bonsai/cvsview2.cgi?diff_mode=context&whitespace_mode=show&root=/cvs/gnome&subdir=gnomemeeting/src&command=DIFF_FRAMESET&file=codec_info.h&rev2=1.2&rev1=1.1