Bug 502951 - g_convert / g_iconv support for transliteration
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: i18n
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 333312 752257
Depends on:
Blocks:
 
 
Reported: 2007-12-10 23:50 UTC by Behdad Esfahbod
Modified: 2018-05-24 11:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (643 bytes, patch)
2008-12-30 06:05 UTC, Behdad Esfahbod

Description Behdad Esfahbod 2007-12-10 23:50:04 UTC
In vte I just committed a change to try g_iconv_open with "targetcharset//translit" first and fall back to "targetcharset" if that fails.  With glibc and GNU iconv, that means conversion never fails and it does a very nice job of transliteration.  For example, converting from UTF-8 to Latin1//translit, Arabic chars are replaced by question marks, but "Ňň" will be converted to "Nn".

Not sure about other iconv implementations, but that's an extremely useful feature.
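
For illustration, the open-with-fallback idea described above might look roughly like the sketch below. It is not the actual vte change: it uses plain POSIX iconv_open rather than g_iconv_open so that it compiles without GLib, and the helper name open_prefer_translit and the fixed 128-byte buffer are assumptions made for the example. The suffix is spelled "//TRANSLIT" here; glibc accepts it in either case.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper (not vte or GLib code): open a conversion
 * descriptor for "<to_charset>//TRANSLIT" and fall back to plain
 * "<to_charset>" when the iconv implementation rejects the suffix.
 * Returns (iconv_t) -1 if neither variant can be opened. */
static iconv_t
open_prefer_translit (const char *to_charset, const char *from_charset)
{
  char target[128];
  iconv_t cd = (iconv_t) -1;
  int n = snprintf (target, sizeof target, "%s//TRANSLIT", to_charset);

  /* Only try the suffixed name if it fit into the buffer. */
  if (n > 0 && (size_t) n < sizeof target)
    cd = iconv_open (target, from_charset);

  /* Suffix not supported (or name too long): use the plain target. */
  if (cd == (iconv_t) -1)
    cd = iconv_open (to_charset, from_charset);

  return cd;
}
```

A caller would use the returned descriptor exactly like one from iconv_open and release it with iconv_close.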
Comment 1 Matthias Clasen 2008-12-30 00:07:15 UTC
Intuitively, I would try target first and, if that fails, fall back to target//translit. Are there any cases where both target//translit and target succeed but yield different results?
Comment 2 Behdad Esfahbod 2008-12-30 02:21:04 UTC
Oh, you mean trying the conversion with one and then trying with the other?  I was talking about trying to open the target//translit one and fall back to target if opening that one fails, which would happen if the system iconv doesn't support transliteration.

As for your question, no, I don't think there's any case where conversion succeeds under both but yields different results.  If we are to try target first and fall back to target//translit when conversion fails, we may as well do target//translit from the beginning.
Comment 3 Matthias Clasen 2008-12-30 04:24:29 UTC
Sounds like a good idea to do this, then. Just needs someone to produce a patch and test cases...
Comment 4 Behdad Esfahbod 2008-12-30 06:05:08 UTC
Created attachment 125500
patch

Not tested.
Comment 5 Matthias Clasen 2009-01-05 01:18:13 UTC
Seems to work fine, in brief testing. Needs a documentation update, I guess, pointing out that

a) g_convert tries transliteration now

b) if transliteration is not appropriate for you, use g_convert_with_iconv
Comment 6 Christian Persch 2009-01-05 15:15:40 UTC
I think it should only append "//translit" if to_charset doesn't already have //translit.
Comment 7 Behdad Esfahbod 2009-01-05 22:46:32 UTC
Feel free to do that.  Appending it unconditionally is safe: either the iconv implementation is fine with "whatever//translit//translit" and works (glibc's does), or we fall back to to_charset which is "whatever//translit".  End result is the same, and I didn't want to bother thinking about performance there.
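
A rough sketch of the conditional variant suggested in comment 6, for comparison with unconditional appending. The function names and the buffer-based interface are hypothetical and not taken from the attached patch. The check only looks at the end of the string, so a target such as "whatever//translit//ignore" would still get a second suffix appended.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define TRANSLIT_SUFFIX "//TRANSLIT"

/* Returns 1 if charset already ends in "//TRANSLIT", compared
 * case-insensitively, and 0 otherwise. */
static int
has_translit_suffix (const char *charset)
{
  size_t len = strlen (charset);
  size_t slen = strlen (TRANSLIT_SUFFIX);

  return len >= slen
         && strcasecmp (charset + len - slen, TRANSLIT_SUFFIX) == 0;
}

/* Hypothetical helper: write the name to try first into buf,
 * appending the suffix only when it is not already present.
 * Returns 0 on success, -1 if buf is too small. */
static int
translit_target (const char *to_charset, char *buf, size_t buflen)
{
  int n = snprintf (buf, buflen, "%s%s", to_charset,
                    has_translit_suffix (to_charset) ? "" : TRANSLIT_SUFFIX);

  return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

As comment 7 points out, the end result is the same either way with glibc; the check only avoids an open attempt with a doubled suffix.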
Comment 8 Matthias Clasen 2009-01-06 17:10:46 UTC
So is this a compatible enough change?
Comment 9 Behdad Esfahbod 2009-01-06 19:49:38 UTC
Basically, g_convert doesn't fail anymore for "can't convert" reasons.  I can't imagine any users relying on that particular behavior.

That said, the proposed change also affects g_iconv_open.  So there's no way to not get translit.  Maybe we should move the //translit logic to g_convert() only.
Comment 10 Behdad Esfahbod 2009-01-06 19:52:39 UTC
In that case, we should document in g_iconv_open that people can try //translit first if that's desired.  That's what vte is doing for example.

But then again, if transliteration is always desired, I don't know what's the best option forward.

There's g_convert_with_fallback () too.  Passing NULL as fallback there is documented as using \uxxxx notation though, so I don't think we can change that.

There are three options, really:

  - Add new API (g_convert_with_translit and g_iconv_open_with_translit?)

  - Make g_convert and g_iconv_open both try translit first

  - Make g_convert try translit first, document how to do it with g_iconv_open
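
To make the third option concrete, here is a minimal sketch of a one-shot conversion that prefers transliteration internally while the descriptor-level open call stays untouched. It is written against plain POSIX iconv so it builds without GLib; convert_prefer_translit is a hypothetical name, not a proposed GLib API. It assumes a byte-oriented target charset (one NUL terminates the result) and uses a fixed-size output guess instead of growing the buffer on E2BIG, which a real implementation would need to handle.

```c
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-shot converter: tries "<to_charset>//TRANSLIT"
 * first and plain "<to_charset>" if that cannot be opened.
 * Returns a malloc'd NUL-terminated string, or NULL on any error. */
static char *
convert_prefer_translit (const char *str,
                         const char *to_charset,
                         const char *from_charset)
{
  char target[128];
  iconv_t cd = (iconv_t) -1;
  int n = snprintf (target, sizeof target, "%s//TRANSLIT", to_charset);

  if (n > 0 && (size_t) n < sizeof target)
    cd = iconv_open (target, from_charset);
  if (cd == (iconv_t) -1)
    cd = iconv_open (to_charset, from_charset);
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (str);
  /* Assumption: 4 output bytes per input byte is enough; if it is
   * not, iconv fails with E2BIG and this sketch returns NULL. */
  size_t outsize = inleft * 4 + 16;
  char *out = malloc (outsize + 1);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) str;   /* iconv takes char **, input is not modified */
  char *outp = out;
  size_t outleft = outsize;

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &outp, &outleft);   /* flush shift state */
  iconv_close (cd);

  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }

  *outp = '\0';
  return out;
}
```

With this shape, callers who do not want transliteration can keep opening a descriptor for the plain target themselves, which is what comment 5 suggests documenting for g_convert_with_iconv.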
Comment 11 Matthias Clasen 2009-01-07 23:51:42 UTC
Actually the docs for g_convert_with_fallback already mention the possibility that it may use transliteration instead of honouring the fallback.
Comment 12 Behdad Esfahbod 2009-01-08 00:03:42 UTC
Ah, cool.  But if we do translit there, we get '?' for most unknown chars instead of \uxxxx, which for many uses is more useful anyway.  Not sure what the best plan is.
Comment 13 Philip Withnall 2018-02-01 22:19:43 UTC
*** Bug 752257 has been marked as a duplicate of this bug. ***
Comment 14 Philip Withnall 2018-02-01 22:28:12 UTC
*** Bug 333312 has been marked as a duplicate of this bug. ***
Comment 15 GNOME Infrastructure Team 2018-05-24 11:10:59 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/117.