GNOME Bugzilla – Bug 341341
Compose mechanism in simple input method doesn't support decomposed forms
Last modified: 2018-03-14 22:01:52 UTC
gtk/gtkimcontextsimple.h contains a table derived from the en_US.UTF-8/Compose file.
However, it doesn't support dead key combinations that resolve to more than one Unicode character. These are needed for decomposed forms, especially where no precomposed form exists, as is the case for Serbian Cyrillic.
Created attachment 65204
Serbian ~/.XCompose file
This file lists the combinations needed for Serbian. With a recent X.Org or XFree86, you can put it into ~/.XCompose and use it by selecting XIM as the GTK+ input method.
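For readers unfamiliar with the format, here is a hypothetical two-entry ~/.XCompose. It illustrates X.Org Compose syntax and is not the contents of attachment 65204. The `include "%L"` line pulls in the locale's default Compose file. Each entry maps a key sequence to a quoted string, and here each string holds a Cyrillic letter followed by a combining accent.

```
# Keep the locale's default sequences.
include "%L"

# Hypothetical entries: each result is a base letter plus a combining mark.
<dead_acute> <Cyrillic_a> : "а́"
<dead_grave> <Cyrillic_o> : "о̀"
```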
I could not find any examples of decomposed forms in the existing Compose file.
Are any already in use?
Is there a document listing which characters in Latin, Greek and Cyrillic have no precomposed form and can only be entered as decomposed sequences?
AFAIK, Unicode has not added new precomposed characters since about version 4.x (I am looking for a reference).
I know at least that Coptic has no precomposed characters, so it needs this functionality in GTK+.
(This is all new to me; I'm trying to learn.)
Simos, there are none in the en_US.UTF-8/Compose file because nobody bothered to add them or push for them. At some point I simply ran out of time to chase everything we've needed for Serbian (you can see that I am the author of the Serbian GNU libc locale and the Serbian XKB layouts, did many Serbian translations, worked on DejaVu Cyrillic... at some point you simply run out of energy to chase all the maintainers around ;).
The attached example contains several such decomposed forms (using Cyrillic). Before ~/.XCompose support was available, I used to append it to the en_US.UTF-8/Compose file on my systems (actually, I usually added my own sr_CS.UTF-8/Compose).
Also, Unicode stopped adding precomposed forms even earlier than 4.x. I know people asked for precomposed accented Serbian Cyrillic at least 3 years ago and were refused because the same result can be obtained with a combining sequence.
I believe that there is no existing function in glib that can determine if a sequence of Unicode characters is valid for a language (for example, CYRILLIC SMALL A WITH ACUTE).
I believe this information is available at ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt and a subset can be extracted for the affected languages.
That file is only a normalization test.
As said in bug 345254, all Unicode combinations are supposed to be valid. There is no real reason why <dead_foo>, <combining_foo> and <Multi_key> <foo> shouldn't be automatically converted into the Unicode combining equivalent.
Danilo, would it make sense to use a modified keyboard layout which assigns combining diacritics to keys?
Is your issue that you have a mixed environment (some precomposed characters already in use)?
I have put together a patch (bug 537457) that can make these compose sequences work with GTK+ IM.
I have tested it with the Khmer and Arabic compose sequences (already upstream in en_US/Compose.pre) and with the sequences in the attached .XCompose file.
If you want to go ahead and add your compose sequences to XOrg, I can make sure that GTK+ IM will work with them.
My guess is that this could land in the next stable GTK+ release, in about six months' time. The update to X.Org would have to be planned earlier, however.
Hope this helps.
We're moving to GitLab! As part of this move, we are closing bugs that haven't seen activity in more than 5 years. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please consider creating a GitLab issue.