GNOME Bugzilla – Bug 345254
dead accents should at least produce combining characters
Last modified: 2018-04-17 07:23:45 UTC
Hi, When trying to type, for instance, n̈ (n with diaeresis) in an edit widget, I type a dead diaeresis then 'n', but the application beeps instead of producing n̈. I know there is no precomposed Unicode character for n̈, but in such a case GTK should use the combining diaeresis character (U+0308), since GTK is already capable of displaying it appropriately. This actually applies to any dead accent. Samuel
A very good idea indeed. I once implemented this just for fun; if I recall correctly it was quite trivial. I could see if I still have the code somewhere.
Is n with diaeresis a valid Unicode sequence? If GTK+ knew which are valid Unicode sequences, it would be easy to print them. At ftp.unicode.org/Public/UNIDATA/NormalisationTest.txt there is a list of compose sequences, so an alternative would be to add a cut-down table for the affected languages (and add code in check_algorithmic in gtkimcontextsimple.c).
Unicode says "All combining characters can be applied to any base character and can, in principle, be used with any script. As with other characters, the allocation of a combining character to one block or another identifies only its primary usage; it is not intended to define or limit the range of characters to which it may be applied. In the Unicode Standard, all sequences of character codes are permitted. This does not create an obligation on implementations to support all possible combinations equally well. Thus, while application of an Arabic annotation mark to a Han character or a Devanagari consonant is permitted, it is unlikely to be supported well in rendering or to make much sense." In the case of this bug, putting a diaeresis over an n, though it is not used in common languages, makes full sense (and could even be useful, e.g., for maths), so GTK should support it. Actually, I don't see why you would want to exclude some combinations. A combining character "just" adds stuff around some existing glyph; excluding some combinations actually looks like more work to me... And this has nothing to do with Normalization, which exists only because there are precomposed forms and for the case where several combining characters are used together.
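As a rough illustration of the point about precomposed forms, here is a minimal Python sketch using the standard unicodedata module. It assumes nothing about GTK itself; the variable names are made up for the example. NFC folds base + combining diaeresis into one code point only when a precomposed character exists, so 'u' + U+0308 becomes U+00FC while 'n' + U+0308 stays a two-code-point sequence.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

# 'u' + U+0308 has a precomposed form, so NFC folds it into one code point.
u_composed = unicodedata.normalize("NFC", "u" + COMBINING_DIAERESIS)

# 'n' + U+0308 has no precomposed form, so NFC leaves the sequence alone.
n_composed = unicodedata.normalize("NFC", "n" + COMBINING_DIAERESIS)

print([f"U+{ord(c):04X}" for c in u_composed])  # ['U+00FC']
print([f"U+{ord(c):04X}" for c in n_composed])  # ['U+006E', 'U+0308']
```

Either way the text is valid Unicode; the only difference is whether a single-code-point spelling happens to exist.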
Samuel, thanks for the update. Now I can see that this report is not very similar to bug 341341 (so I am not duping). I believe you need to create a new Input Method module for GTK+ to support this functionality. The page http://gtk-im-extra.sourceforge.net/ should help you in this. A workaround for now might be to press n + Ctrl-Shift-u + 308 + Enter : n̈
Mmmm, to me it looks like it is actually exactly the same goal: automatically support turning "<dead_foo> <bar>" into "Ubar Ucombining_foo", "<combining_foo> <bar>" into "Ubar Ucombining_foo", and "<Multi_key> <foo> <bar>" into "Ubar Ucombining_foo".
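The three input forms listed above could be handled along these lines. This is only a sketch under stated assumptions: the key names are plain strings rather than real keysyms, and the two tables and the compose() helper are hypothetical, not GTK code.

```python
import unicodedata

# Hypothetical table: dead-key name -> combining character.
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
    "dead_diaeresis": "\u0308",
}

# Hypothetical table for the Multi_key form, keyed by the ASCII accent key.
MULTI_KEY_ACCENTS = {
    "`": "\u0300",
    "'": "\u0301",
    "^": "\u0302",
    "~": "\u0303",
    '"': "\u0308",
}


def is_combining(key):
    """True if key is a single character with a non-zero combining class."""
    return len(key) == 1 and unicodedata.combining(key) != 0


def compose(keys):
    """Return base + combining mark for the three forms, or None otherwise."""
    if len(keys) == 2:
        accent, base = keys
        if accent in DEAD_TO_COMBINING:      # <dead_foo> <bar>
            return base + DEAD_TO_COMBINING[accent]
        if is_combining(accent):             # <combining_foo> <bar>
            return base + accent
    elif len(keys) == 3 and keys[0] == "Multi_key":
        mark = MULTI_KEY_ACCENTS.get(keys[1])
        if mark is not None:                 # <Multi_key> <foo> <bar>
            return keys[2] + mark
    return None


print(compose(["dead_diaeresis", "n"]) == compose(["Multi_key", '"', "n"]))
```

All three forms end up as the base character followed by the combining mark, which is the common goal described in the comment.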
Bug 341241 wants to make that behaviour part of the "Default" GTK+ Input Method module, so when one has a Serbian keyboard layout, they can type those characters. If we enable in the Default GTK+ IM module the option to accept all combinations of diacritic marks on top of any base character, then I believe the typical user might be confused and produce wrong documents. You may want to supply a workflow as to how this should work. Would 'n' + 0x308 + 0x308 + 0x308 + 0x308 be a valid sequence?
n̈̈̈ (this is n with several 0x308)
n̈́̀ (this is n with 0x300, 0x301, 0x308)
This leads me to believe that a separate IM module would be more appropriate. But then again, it depends on what you have in mind for such a workflow.
I would also want to make this a default behaviour. And this BTW would enable all those languages that neither you nor I know to easily put their accents on letters. For instance, iirc the Vietnamese support is still not complete. Just implementing what I'm proposing would solve this kind of issue for _all_ languages, without having to enumerate cases as was done in the Serbian case. And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid sequence. If the user presses the dead diaeresis several times, he probably expects to see as many diaereses on the eventual letter. For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed (and all sorts of such combinations), and people would expect to see <dead_acute> <dead_dot_below> <u> producing it. Do you really want to enumerate all such combinations for all languages in the world?
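To make the multi-accent case concrete, here is a small Python sketch (standard unicodedata only; the variable names are illustrative). It shows how NFC treats stacked marks: marks with different combining classes can be typed in either order, marks with the same class cannot, and repeated marks are simply kept.

```python
import unicodedata


def nfc(text):
    """Normalize text to NFC."""
    return unicodedata.normalize("NFC", text)


# Circumflex (U+0302) and dot below (U+0323) have different combining
# classes, so both typing orders normalize to the same precomposed letter.
a_circ_dot = nfc("a\u0302\u0323")
a_dot_circ = nfc("a\u0323\u0302")

# Circumflex and tilde (U+0303) share a combining class, so order matters:
# only circumflex-then-tilde composes to U+1EAB.
a_circ_tilde = nfc("a\u0302\u0303")
a_tilde_circ = nfc("a\u0303\u0302")

# Repeated marks are preserved; nothing collapses four diaereses into one.
n_four = nfc("n" + "\u0308" * 4)

print(a_circ_dot == a_dot_circ, a_circ_tilde == a_tilde_circ, len(n_four))
```

None of these sequences needs to be listed in a table beforehand; the input side only has to emit base + marks.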
(In reply to comment #7)
> I would also want to make this a default behaviour. And this BTW would enable
> all those languages that neither you nor I know to easily put their accents
> on letters. For instance, iirc the Vietnamese support is still not complete.
> Just implementing what I'm proposing would solve this kind of issue
> for _all_ languages, without having to enumerate cases as was done in the
> Serbian case.
>
> And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid
> sequence. If the user presses the dead diaeresis several times, he probably
> expects to see as many diaereses on the eventual letter.
What's the difference between n̈ and n̈̈̈? (Answer: the first has a single diaeresis, the second has three. On my system these look the same.) When you press backspace to erase the character, the first diaeresis goes away. In the second case you would press backspace and nothing would appear to change on screen. The user gets confused and files a bug report. There are quite a few similar cases of confusion.
> For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed (and
> all sorts of such combinations), and people would expect to see <dead_acute>
> <dead_dot_below> <u> producing it.
>
> Do you really want to enumerate all such combinations for all languages in the
> world?
Not all of them but those that users request. At least at the beginning. These are defined at Unicode.org, so there is no need for guessing.
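The backspace concern can be sketched as follows. This assumes, purely for illustration, that one backspace removes exactly one code point; it does not claim to model what GTK actually does.

```python
# 'n' followed by three combining diaereses (U+0308).
text = "n" + "\u0308" * 3

# Assumed backspace model: drop the last code point.
after_backspace = text[:-1]

print(len(text), len(after_backspace))  # 4 3
```

If the renderer draws stacked diaereses on top of each other, the two strings look identical even though the underlying text has changed.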
> > I would also want to make this a default behaviour. And this BTW would
> > enable all those languages that neither you nor I know to easily put their
> > accents on letters. For instance, iirc the Vietnamese support is still not
> > complete. Just implementing what I'm proposing would solve this
> > kind of issue for _all_ languages, without having to enumerate cases as
> > was done in the Serbian case.
> >
> > And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid
> > sequence. If the user presses the dead diaeresis several times, he
> > probably expects to see as many diaereses on the eventual letter.
>
> What's the difference between n̈ and n̈̈̈?
>
> (Answer: the first has a single diaeresis, the second has three. On my system
> these look the same.)
>
> When you press backspace to erase the character, the first diaeresis goes
> away. In the second case you would press backspace and nothing would appear
> to change on screen. The user gets confused and files a bug report. There are
> quite a few similar cases of confusion.
Then your system is broken. On mine, it correctly displays several diaereses, which is what I expect. For instance, how is ẫ rendered on your system? If it is a jumble of tilde and circumflex, then your system can't correctly display Vietnamese.
> > For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed
> > (and all sorts of such combinations), and people would expect to see
> > <dead_acute> <dead_dot_below> <u> producing it.
> >
> > Do you really want to enumerate all such combinations for all languages in
> > the world?
>
> Not all of them but those that users request.
Please re-read 341241: it is _very_ tiring for foreign people to update everything in every place so that their language at last works. The temptation to stick to Windows can be very big here just for this. Now that we can do something so that this particular issue is solved for all languages, without requiring people to take the time to fix a list (which may be very big!), why not do it?
> These are defined at Unicode.org, so there is no need for guessing.
These? Where? Please show me where all the Vietnamese accent combinations are enumerated, for instance.
I think that it would be better to carry this discussion to the http://mail.gnome.org/mailman/listinfo/gtk-i18n-list mailing list. It would be good to get input from more people. It is not good to use Bugzilla for discussion at this stage, as notifications are sent to a big bunch of people. Could you please post a summary on the mailing list and we can pick up from there?
Mmm, it looks like the list moderator doesn't do his job: I sent an e-mail the same day you suggested it, and then asked for subscription ~12h later, still no news...
That is not a moderated mailing list; you follow the instructions to become a member of the mailing list, then you send your e-mail to the list address (you get instructions for all these). If you are not a member of the mailing list, then your e-mail falls through to moderation, which in most cases rejects the e-mail. My recent e-mail: http://mail.gnome.org/archives/gtk-i18n-list/2008-February/msg00000.html
That's what I did: I followed the link on the page, which sends an e-mail to the list address, and I got:
> Your mail to 'gtk-i18n-list' with the subject
> subscribe
> Is being held until the list moderator can review it for approval.
> The reason it is being held:
> Post by non-member to a members-only list
Consider that the e-mail that is "held for approval" will probably not make it to the list.
1) Complete the subscription to the list following the section "Subscribing to gtk-i18n-list" at http://mail.gnome.org/mailman/listinfo/gtk-i18n-list You will receive an e-mail to verify your account. Follow the instructions.
2) Then, send the e-mail to the list again.
Then the link on www.gtk.org/mailing-lists.html is bogus and needs to be fixed.
My opinion (now) on combining diacritics is that dead_keys should not produce them; a keyboard layout should be there to produce them. Dead keys appear to fit pre-composed characters and probably should stay limited to them. For details, please see http://blogs.gnome.org/simos/2008/02/20/keyboard-layout-for-combining-diacritics/ Internally, GTK+ uses a table (length: 5) to store the compose sequence. Having a compose sequence of indeterminate length would require some rethinking. Per Unicode, there is no restriction on how many combining diacritics you can place on a character. There are already keyboard layouts that assign combining diacritics to keys. With this in mind, I would consider this report as NOTGNOME (that means, it can be fixed somewhere else; in the xkeyboard-config project). Samuel, what do you think? Tor, do you have some input here?
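For comparison with the fixed-length table mentioned above, here is a hedged Python sketch of compose state that grows as needed. The ComposeBuffer class, the key-name strings, and the choice to emit marks in typed order are all assumptions made for the example; none of this is GTK's implementation.

```python
# Hypothetical table: dead-key name -> combining character.
# "dead_belowdot" stands for the key written <dead_dot_below> in this thread.
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_diaeresis": "\u0308",
    "dead_belowdot": "\u0323",
}


class ComposeBuffer:
    """Hypothetical compose state with no fixed length limit."""

    def __init__(self):
        # Combining marks waiting for a base character.
        self.pending = []

    def feed(self, key):
        """Feed one key name; return committed text, or '' while composing."""
        mark = DEAD_TO_COMBINING.get(key)
        if mark is not None:
            self.pending.append(mark)
            return ""
        # Any other key is treated as the base: emit it, then the marks
        # in the order they were typed (an assumption of this sketch).
        committed = key + "".join(self.pending)
        self.pending.clear()
        return committed


buf = ComposeBuffer()
outputs = [buf.feed(k) for k in ["dead_acute", "dead_belowdot", "u"]]
print([f"U+{ord(c):04X}" for c in outputs[-1]])  # ['U+0075', 'U+0301', 'U+0323']
```

A list sidesteps the length question, at the cost of deciding separately when (or whether) to cap how many marks a user may stack.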
dead_keys were never meant to be restricted to pre-composed characters. Unicode (and hence a potential list of precomposed characters) did not even exist at the time... As explained on the list, defining keys for combining accents just duplicates existing keys which the user _already_ expects to work as combining accents (since they are dead accents)... The fact that the X11 keysym space has duplications is not new: you can produce a Latin lower case a through XK_a or 0x1000061, and _both_ ways are supposed to work the same. Here it's the same: X11 has had a way to put accents on letters for quite a long time: dead accents. With the addition of Unicode, another way was added, but there is no reason to make any distinction between them.
We're moving to gitlab! As part of this move, we are closing bugs that haven't seen activity in more than 5 years. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please consider creating a gitlab issue for it.
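The keysym duplication described above can be sketched in Python. The helper name keysym_to_codepoint and the constant name are made up for the example, and only two keysym ranges are handled: Latin-1 keysyms, whose values equal their code points, and keysyms of the form 0x01000000 + code point.

```python
UNICODE_KEYSYM_FLAG = 0x01000000


def keysym_to_codepoint(keysym):
    """Return the Unicode code point for a keysym, or None if not covered."""
    # Latin-1 keysyms (e.g. XK_a == 0x61) equal their code points.
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return keysym
    # Unicode keysyms carry the code point in the low 24 bits.
    if keysym & 0xFF000000 == UNICODE_KEYSYM_FLAG:
        return keysym & 0x00FFFFFF
    # Everything else (function keys, dead keys, ...) is out of scope here.
    return None


# Both spellings of 'a' resolve to the same code point.
print(keysym_to_codepoint(0x61) == keysym_to_codepoint(0x1000061))  # True
```

By the same scheme, 0x1000308 denotes the combining diaeresis U+0308, the second route to an accent that the comment refers to.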
(I have reported the issue on gitlab: https://gitlab.gnome.org/GNOME/gtk/issues/10)