GNOME Bugzilla – Bug 502951
g_convert / g_iconv support for transliteration
Last modified: 2018-05-24 11:10:59 UTC
In vte I just committed a change that first tries g_iconv_open with "targetcharset//translit" and falls back to "targetcharset" if that fails. With glibc and GNU iconv, that means conversion never fails because of unconvertible characters, and it does a very nice job of transliteration. For example, converting from UTF-8 to Latin1//translit, Arabic chars are replaced by question marks, but "Ňň" is converted to "Nn". Not sure about other iconv implementations, but that's an extremely useful feature.
Intuitively, I would try target first and, if that fails, fall back to target//translit. Are there any cases where both target//translit and target succeed but yield different results?
Oh, you mean trying the conversion with one and then trying with the other? I was talking about trying to open the target//translit one and fall back to target if opening that one fails, which would happen if the system iconv doesn't support transliteration. As for your question, no, I don't think there's any case that conversion under both succeeds but yields different results. If we are to try target first and if conversion fails fall back to target//translit, we may as well do target//translit from the beginning.
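For reference, the open-time fallback described above can be sketched against plain POSIX iconv. This is only an illustration, not the vte or GLib code: the helper name open_prefer_translit and the fixed-size buffer are assumptions made for the example.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper (not GLib or vte API): open a converter to `to`,
 * preferring the "//TRANSLIT" variant when the system iconv accepts it. */
static iconv_t
open_prefer_translit (const char *to, const char *from)
{
  char target[128];
  iconv_t cd = (iconv_t) -1;
  int n = snprintf (target, sizeof target, "%s//TRANSLIT", to);

  /* First attempt: ask for transliteration, if the name fit the buffer. */
  if (n > 0 && (size_t) n < sizeof target)
    cd = iconv_open (target, from);

  /* Fallback: the plain target, for iconv implementations that
   * reject the suffix. */
  if (cd == (iconv_t) -1)
    cd = iconv_open (to, from);

  return cd; /* (iconv_t) -1 if both attempts failed */
}
```

The caller cannot tell from the return value which variant was opened, which matches the idea that transliteration is a best-effort extra here.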
Sounds like a good idea to do this, then. Just needs someone to produce a patch and test cases...
Created attachment 125500: patch. Not tested.
Seems to work fine in brief testing. Needs a documentation update, I guess, pointing out that a) g_convert tries transliteration now, and b) if transliteration is not appropriate for you, use g_convert_with_iconv.
I think it should only append "//translit" if to_charset doesn't already have //translit.
Feel free to do that. Appending it unconditionally is safe: either the iconv implementation is fine with "whatever//translit//translit" and works (glibc's does), or we fall back to to_charset which is "whatever//translit". End result is the same, and I didn't want to bother thinking about performance there.
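If someone does want the conditional variant, a minimal sketch could look like the following. The names has_translit and add_translit are made up for this example and are not part of the attached patch; the check is case-insensitive and looks for the suffix anywhere in the name, which is an assumption about how callers spell it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: case-insensitive search for "//translit"
 * anywhere in the charset name. */
static int
has_translit (const char *charset)
{
  static const char needle[] = "//translit";
  size_t n = sizeof needle - 1;

  for (const char *p = charset; *p != '\0'; p++)
    {
      size_t i = 0;
      while (i < n && p[i] != '\0'
             && tolower ((unsigned char) p[i]) == needle[i])
        i++;
      if (i == n)
        return 1;
    }
  return 0;
}

/* Hypothetical helper: copy `charset` into `out`, appending "//TRANSLIT"
 * only when it is not already requested. Returns 0 on success, -1 if
 * `out` is too small. */
static int
add_translit (const char *charset, char *out, size_t out_size)
{
  int n = snprintf (out, out_size, "%s%s", charset,
                    has_translit (charset) ? "" : "//TRANSLIT");
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

As noted above, skipping this check is also safe; the sketch only avoids building names like "whatever//translit//translit".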
So is this a compatible enough change?
Basically, g_convert doesn't fail anymore for "can't convert" reasons. I can't imagine any users relying on that particular behavior. That said, the proposed change also affects g_iconv_open. So there's no way to not get translit. Maybe we should move the //translit logic to g_convert() only.
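To make the compatibility question concrete, here is a rough sketch using plain POSIX iconv rather than GLib, where the caller keeps control simply by choosing the target string. The helper name convert_bytes and its signature are assumptions for illustration only.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `in` (in_len bytes) into `out`.
 * `to` is handed to iconv_open() verbatim, so the caller decides
 * whether to request "//TRANSLIT". Returns the number of bytes
 * written, or -1 on any failure. */
static long
convert_bytes (const char *to, const char *from,
               const char *in, size_t in_len,
               char *out, size_t out_size)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char *src = (char *) in; /* iconv() takes char **, input is not modified */
  char *dst = out;
  size_t in_left = in_len;
  size_t out_left = out_size;

  size_t rc = iconv (cd, &src, &in_left, &dst, &out_left);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &dst, &out_left); /* flush shift state */
  iconv_close (cd);

  if (rc == (size_t) -1)
    return -1;
  return (long) (out_size - out_left);
}
```

With glibc, passing a plain target such as "ISO-8859-1" makes iconv() fail with EILSEQ on a character it cannot represent, while passing "ISO-8859-1//TRANSLIT" substitutes something instead. Other iconv implementations may behave differently in both cases.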
In that case, we should document in g_iconv_open that people can try //translit first if that's desired. That's what vte is doing, for example. But then again, if transliteration is always desired, I don't know what the best way forward is. There's g_convert_with_fallback () too. Passing NULL as fallback there is documented as using \uxxxx notation though, so I don't think we can change that. There are three options really:
- Add new API (g_convert_with_translit and g_iconv_open_with_translit?)
- Make g_convert and g_iconv_open both try translit first
- Make g_convert try translit first, document how to do it with g_iconv_open
Actually, the docs for g_convert_with_fallback already mention the possibility that it may use transliteration instead of honouring the fallback.
Ah, cool. But if we do translit there, we get '?' for most unknown chars instead of \uxxxx, which for many uses is more useful anyway. Not sure what the best plan is.
*** Bug 752257 has been marked as a duplicate of this bug. ***
*** Bug 333312 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/117.