GNOME Bugzilla – Bug 58195
alias for iconv module names in g_iconv_open
Last modified: 2011-02-18 15:47:43 UTC
When glib is configured without with-libiconv option, g_iconv_open() quite often fails on Solaris, since iconv module names which are given to g_iconv_open() from pango x11 module don't match with what exists on Solaris. To fix this, we want to create an aliasing mechanim in g_iconv_open() so that if opening with the initial name fails, we try known equivalent names.
Created a patch to try other known names when iconv_open() failes with initial codeset names. (will create a new attachment for the diff after this comment.) I'm afraid if creaing a new file is allowed, but want to introduce $prefix/etc/glib-2.0/giconv.aliases file as an external database to define codeset names and their aliases. When iconv_open() first fails in g_iconv_open(), the file is opened and read once to get codeset names and their aliases pairs. Note that most of the code to open and read the file is created from pango's pango-utils.c and pangox-fontmap.c as reference, and file format is similar to $prefix/etc/pango/pangox.aliases.
Created attachment 796 [details] [review] try aliases when opening with given code set fails in g_iconv_open
Created attachment 797 [details] [review] the same patch as the previous but in diff -u output
Created attachment 798 [details] a sample alias file for testing
I think the approach of reading an external file is OK. Some quick comments: - g_iconv_read_line() would need to be static, but I think it would be better to use g_iochannel_read_line() here. - Code in GLib needs to use static variables in a thread safe fashion, so the initialization and use of 'iconv_modules' needs to be guarded by a mutex. Search for G_LOCK in glib for examples. - You should g_warning in the case of hitting an error when reading the alias file. - You leak the GIconvModule structure if you encounter an error. - You always leak the two buffers used in g_iconv_read_alias_file - I don't think it makes sense to have quoting _outside_ the list of aliases. If we want to allow quoting here (and I'm not completely sure it's necessary) then I think the quoting should be "UTF-8" "UTF8", "utf,8 ", utF8 Otherwise, we lose all the benefits of allowing quoting except for allowing \n and \t, since spaces and commas will be lost anyways. - You should be using g_ascii_isspace() not isspace() here, since we don't want this file to be locale dependent.
Created a new patch according to the comments: (will create a new patch attachment) - use g_iochannel - G_LOCK 'iconv_modules' - g_warning in the case of hitting error in opening or reading the alias file - free GIconvModule structure in the case of hitting an error. - fix memory leak of two GString in g_iconv_read_alias_file - use g_unichar_isspace() instead of isspace() - no quoting outside of list of the aliases But the new patch gets in a infinite loop which shows: testtext (pid:13955): GLib-CRITICAL **: file gstring.c: line 623 (g_string_erase): assertion `pos + len <= string->len' failed This is from g_io_channel_read_line_string, and the same error happens in glib/tests/iochannel-test. I'll have a further look..
Created attachment 837 [details] [review] a new patch but with g_io_channel_read_line_string issue
Created attachment 838 [details] alias file without quating outside of list of aliases
Created attachment 840 [details] [review] 2nd patch using g_io_channel
Created a patch again. This works in the testtext together with giochannel.c patch for 58472, and the alias file in the 08/02/01 17:14 patch.
Created new patch. This time, 'iconv_modules' is not static and the alias file is read every time g_iconv_open fails with opening with the original charset.
Created attachment 843 [details] [review] the 3rd patch - iconv_modules is not static
Note: The current patch does not use g_ascii_isspace but g_unichar_isspace as I could not find it in the current glib. I've just noticed that it will be included in Alex Larsson's patch which is under code review in the mailing list.(subject: URIs vs. half-baked URIs [glib PATCH]) Once it is in, I'll modify the patch using it. On that condition, how do you think about the other parts of the patch? I'd prefer to go with the last patch to make iconv_module non-static.
May I commit the 08/03/01 15:21 patch? If I'm allowed, I'd include g_ascii_isspace in glib/gstrfuncs.c and commit together. The fix of this bug is critical as this blucks cut&pate problem and also blocks us from building glib with Solaris native iconv.
I just noticed GNU gettext, fileutils, and libiconv take different way to solve the similar problem. Shoule glib also follow the same way by having intl/config.charset in the source tree, and install lib/charset.alias?
I've been planning for a while to use the libcharset code that the charset.alias comes from for a real implemention of g_get_charset() so this might be the right approach; just using the libcharset code would fix the cut-and-paste problem. However, I'm not sure that charset.alias will do what we need for the reverse mapping; it's purpose is to map: nl_langinfo name => canonical name And thus it might not have all entries you need to do: canonical name => iconv name For a particular system. It probably will be _close_, so it might be best to just go ahead, integrate libcharset, then if necessry, augment charset.config to match our needs. (One question is: what charsets do we guarantee to support? -- if we only guarantee UTF-8 and the result of g_get_charset() then this might be sufficient.)
I have most of the libcharset integration now, need to finish a bit more cleanup before committing. Then I'll look at using the aliases from that in the reverse direction for iconv,
I've checked in the libcharset based patch to CVS now. I'd appreciate it if you could look at and see how close it is to fixing the problems (and also just give it some basic testing ... on Linux I have no real way of testing this code so it could be completely broken and segfaulting right now.)
My apologies for being late. I tested this on today's cvs HEAD build on Solaris8 sparc. Initially it did not work, but with a one-line patch below, things start working well. I've so far verified locales including C, en_US, zh, zh_TW, ko and ja locales for both EUC and UTF-8 encodings. In ja_JP.PCK(SJIS) and zh_TW.BIG5, it works, too. No warning messages of iconv failure is seen, and cut&paste operation worked well, too. Here is the patch. Please review and check in. Index: gconvert.c =================================================================== RCS file: /cvs/gnome/glib/glib/gconvert.c,v retrieving revision 1.23 diff -u -r1.23 gconvert.c --- gconvert.c 2001/10/01 20:40:05 1.23 +++ gconvert.c 2001/10/12 17:41:02 @@ -111,7 +111,7 @@ if (!try_conversion (to_codeset, from_codeset, &cd)) { const char **to_aliases = _g_charset_get_aliases (to_codeset); - const char **from_aliases = _g_charset_get_aliases (to_codeset); + const char **from_aliases = _g_charset_get_aliases (from_codeset); if (from_aliases) {
Verified the fix on my local build. Marking the bug "FIXED".