Bug 345254 – dead accents should at least produce combining characters



Summary:	dead accents should at least produce combining characters


Status:	RESOLVED OBSOLETE

Product:	gtk+
Classification:	Platform
Component:	Input Methods
Version:	unspecified
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Hidetoshi Tajima
QA Contact:	gtk-bugs

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-06-18 17:45 UTC by Samuel Thibault
Modified:	2018-04-17 07:23 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Samuel Thibault 2006-06-18 17:45:16 UTC

Hi,

When trying to type, for instance, n̈ (n with diaeresis) in an edit widget, I type a dead diaeresis then 'n', but the application beeps instead of producing n̈. I know there is no precomposed Unicode character for n̈, but in such a case GTK should use the combining diaeresis character (U+0308), since GTK is already capable of displaying it appropriately. This actually applies to any dead accent.

Samuel
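A minimal sketch of the situation described in the report, using Python's standard `unicodedata` module purely for illustration (the bug itself concerns GTK's C code). It shows that 'u' plus U+0308 has a precomposed form while 'n' plus U+0308 does not, so the combining sequence is the only way to represent n̈. The helper name `compose` is hypothetical.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"

def compose(base: str, combining: str) -> str:
    """Return the NFC (composed) form of base followed by a combining mark."""
    return unicodedata.normalize("NFC", base + combining)

u_diaeresis = compose("u", COMBINING_DIAERESIS)
n_diaeresis = compose("n", COMBINING_DIAERESIS)

# 'u' + U+0308 collapses to the single precomposed code point U+00FC.
print([hex(ord(c)) for c in u_diaeresis])
# 'n' + U+0308 has no precomposed form and stays as two code points.
print([hex(ord(c)) for c in n_diaeresis])
```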

Comment 1 Tor Lillqvist 2006-06-18 17:54:29 UTC

A very good idea indeed. I once implemented this just for fun; if I recall correctly it was quite trivial. I could see if I still have the code somewhere.

Comment 2 Simos Xenitellis 2008-01-31 02:39:15 UTC

Is n with diaeresis a valid Unicode sequence?

If GTK+ knew which are valid Unicode sequences, it would be easy to print them.

At ftp.unicode.org/Public/UNIDATA/NormalisationTest.txt there is a list of compose sequences, so an alternative would be to add a cut-down table for the affected languages (and add code in check_algorithmic in gtkimcontextsimple.c).

Comment 3 Samuel Thibault 2008-01-31 12:25:24 UTC

Unicode says "All combining characters can be applied to any base character and can, in principle, be used with any script. As with other characters, the allocation of a combining character to one block or another identifies only its primary usage; it is not intended to define or limit the range of characters to which it may be applied.  In the Unicode Standard, all sequences of character codes are permitted.

This does not create an obligation on implementations to support all possible combinations equally well. Thus, while application of an Arabic annotation mark to a Han character or a Devanagari consonant is permitted, it is unlikely to be supported well in rendering or to make much sense."

In the case of this bug, putting a diaeresis over an n, though it is not used in common languages, makes full sense (and could even be useful e.g. for Maths), so GTK should support it.

Actually, I don't see why you would want to exclude some combining combinations. A combining character "just" adds stuff around some existing glyph; excluding some combinations actually looks like more work to me...

And this has nothing to do with Normalization, which only exists because there are precomposed forms and because several combining characters can be used together.

Comment 4 Simos Xenitellis 2008-01-31 15:05:43 UTC

Samuel, thanks for the update.
Now I can see that this report is not very similar to bug 341341 (so I am not duping).

I believe you need to create a new Input Method module for GTK+ to support this functionality. The page http://gtk-im-extra.sourceforge.net/ should help you with this.

A workaround for now might be to press
n + Ctrl-Shift-u + 308 + Enter : n̈

Comment 5 Samuel Thibault 2008-01-31 15:22:06 UTC

Mmmm, to me it looks like it is actually exactly the same goal: automatically support turning "<dead_foo> <bar>" into "Ubar Ucombining_foo", "<combining_foo> <bar>" into "Ubar Ucombining_foo", and "<Multi_key> <foo> <bar>" into "Ubar Ucombining_foo".
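The three input forms listed above could be handled by one fallback rule. The following is only a sketch of that idea in Python, not GTK code: the tables and the function `fallback_compose` are hypothetical, the dead key names follow X11 keysym naming, and the choice of spacing characters accepted after `Multi_key` is an assumption.

```python
import unicodedata

# Hypothetical table: X11-style dead key names -> combining characters.
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
    "dead_diaeresis": "\u0308",
}

# Assumed spacing accents typed after Multi_key -> combining characters.
SPACING_TO_COMBINING = {
    "`": "\u0300",
    "'": "\u0301",
    "^": "\u0302",
    "~": "\u0303",
    '"': "\u0308",
}

def fallback_compose(keys):
    """Turn a key sequence into base + combining mark, or None if it does not apply."""
    if len(keys) == 2:
        accent, base = keys
        # <dead_foo> <bar>
        if accent in DEAD_TO_COMBINING:
            return base + DEAD_TO_COMBINING[accent]
        # <combining_foo> <bar>: the key already carries a combining character
        if len(accent) == 1 and unicodedata.combining(accent) != 0:
            return base + accent
    if len(keys) == 3 and keys[0] == "Multi_key":
        # <Multi_key> <foo> <bar>
        _, accent, base = keys
        if accent in SPACING_TO_COMBINING:
            return base + SPACING_TO_COMBINING[accent]
    return None
```

All three forms then yield the same two code points, for example `fallback_compose(["dead_diaeresis", "n"])` and `fallback_compose(["Multi_key", '"', "n"])` both give 'n' followed by U+0308.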

Comment 6 Simos Xenitellis 2008-01-31 16:00:56 UTC

Bug 341241 wants to make that behaviour part of the "Default" GTK+ Input Method module, so when one has a Serbian keyboard layout, they can type those characters.

If we enable in the Default GTK+ IM module the option to accept all combinations of diacritic marks on top of any base character, then I believe the typical user might be confused and produce wrong documents.

You may want to supply a workflow as to how this should work. Would 'n' + 0x308 + 0x308 + 0x308 + 0x308 be a valid sequence?

n̈̈̈  this is n with several 0x308.
n̈́̀  this is n with 0x300, 0x301, 0x308

This leads me to believe that a separate im module would be more appropriate. But then again, it depends on what you have in mind for how such a workflow would look.

Comment 7 Samuel Thibault 2008-01-31 16:25:06 UTC

I would also want to make this a default behaviour.  And this BTW would enable all those languages that neither you nor I know to easily put their accents on letters. For instance, iirc the Vietnamese support is still not complete.  Just implementing what I'm proposing would solve this kind of issue for _all_ languages, without having to enumerate cases like was done in the Serbian case.

And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid sequence. If the user presses several times on the dead diaeresis, he probably expects to see as many diaeresis on the eventual letter.

For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed (and all sorts of such combinations), and people would expect to see <dead_acute> <dead_dot_below> <u> producing it.

Do you really want to enumerate all such combinations for all languages in the world?
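To illustrate why no enumeration is needed, here is a hedged Python sketch that stacks any number of dead accents on a base letter and then lets Unicode normalization pick a precomposed character where one exists. The table and `compose_stacked` are hypothetical names, and applying the marks in the order typed is an assumption (for marks of the same combining class the order matters, so a real implementation would have to settle it). For reference, U+0300 is the combining grave accent and U+0301 the combining acute accent.

```python
import unicodedata

# Hypothetical table: X11-style dead key names -> combining characters.
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
    "dead_diaeresis": "\u0308",
}

def compose_stacked(dead_keys, base):
    """Apply every pending dead accent to base, in the order typed (assumption)."""
    marks = "".join(DEAD_TO_COMBINING[key] for key in dead_keys)
    # NFC yields a precomposed character when Unicode defines one and
    # leaves the remaining marks as combining characters otherwise.
    return unicodedata.normalize("NFC", base + marks)

# Vietnamese a with circumflex and tilde: a single precomposed code point exists.
a_circumflex_tilde = compose_stacked(["dead_circumflex", "dead_tilde"], "a")

# n with three diaereses: no precomposed form, so four code points remain.
n_triple = compose_stacked(["dead_diaeresis"] * 3, "n")
```

No per-language list is consulted here; the only data is one entry per dead key.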

Comment 8 Simos Xenitellis 2008-02-01 00:38:39 UTC

(In reply to comment #7)
> I would also want to make this a default behaviour.  And this BTW would enable
> all those languages that neither you nor I know to easily put their accents
> on letters. For instance, iirc the Vietnamese support is still not complete.
> By just implementing what I'm proposing, that would solve this kind of issue
> for _all_ languages, without having to enumerate cases like was done in the
> Serbian case.
> 
> And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid
> sequence. If the user presses several times on the dead diaeresis, he probably
> expects to see as many diaeresis on the eventual letter.

What's the difference between n̈ and n̈̈̈?

(Answer: the first has a single diaeresis, the second has three. On my system these look the same.)

When you press backspace to erase the character, the first diaeresis goes away. In the second case you would press backspace and nothing would appear to change on screen. User gets confused, files bug report. There are quite a few similar cases of confusion.

> For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed (and
> all sorts of such combinations), and people would expect to see <dead_acute>
> <dead_dot_below> <u> producing it.
> 
> Do you really want to enumerate all such combinations for all languages in the
> world?
 
Not all of them but those that users request. At least at the beginning. 
These are defined at Unicode.org, so there is no need for guessing.
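The backspace concern raised earlier in this comment can be made concrete with a small Python sketch. It assumes, as the comment describes, that backspace removes one code point at a time; the function `backspace` is a hypothetical stand-in, not GTK's implementation.

```python
# n followed by three combining diaereses (U+0308)
text = "n" + "\u0308" * 3

def backspace(s: str) -> str:
    """Delete the last code point, as a code-point-based backspace would."""
    return s[:-1]

after_one = backspace(text)

# The string shrinks from 4 to 3 code points, but the base letter is still
# there; whether the change is visible depends on how the renderer stacks marks.
print(len(text), len(after_one))
```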

Comment 9 Samuel Thibault 2008-02-01 01:02:03 UTC

> > I would also want to make this a default behaviour.  And this BTW would
> > enable all those languages that neither you nor I know to easily put their
> > accents on letters. For instance, iirc the Vietnamese support is still not
> > complete.  By just implementing what I'm proposing, that would solve this
> > kind of issue for _all_ languages, without having to enumerate cases like
> > was done in the Serbian case.
> >
> > And yes, to my mind 'n' + 0x308 + 0x308 + 0x308 + 0x308 should be a valid
> > sequence. If the user presses several times on the dead diaeresis, he
> > probably expects to see as many diaeresis on the eventual letter.
> 
> What's the difference between n̈ and n̈̈̈?
> 
> (Answer: the first has a single diaeresis, the second has three. On my system
> these look the same).
> 
> When you press backspace to erase the character, the first diaeresis goes
> away.  In the second case you would press backspace and nothing would appear
> to change on screen. User gets confused, files bug report. There are quite a
> few similar cases of confusion.

Then your system is broken.  On mine, it correctly displays several diaereses, which is what I expect. For instance, how is ẫ rendered on your system? If it is a jumble of tilde and circumflex, then your system can't correctly display Vietnamese.

> > For instance in the Vietnamese language, 'u' + 0x300 + 0x323 _is_ needed
> > (and all sorts of such combinations), and people would expect to see
> > <dead_acute> <dead_dot_below> <u> producing it.
> >
> > Do you really want to enumerate all such combinations for all languages in
> > the world?
> 
> Not all of them but those that users request.

Please re-read 341241: it is _very_ tiring for foreign people to update everything in every place so that their language at last works. The temptation to stick to Windows can be very big here just for this. Now that we can do something so that this particular issue is solved for all languages without people having to take the time to fix a list (which may be very big!), why not do it?

> These are defined at Unicode.org, so there is no need for guessing.

These? Where? Please show me where all the Vietnamese accent combinations are enumerated, for instance.

Comment 10 Simos Xenitellis 2008-02-01 10:30:48 UTC

I think that it would be better to carry this discussion to the
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
mailing list. 
It would be good to get input from more people.

It is not good to use Bugzilla for discussion at this stage as notifications are sent to a big bunch of people.

Could you please post a summary on the mailing list and we can pick up from there?

Comment 11 Samuel Thibault 2008-02-07 01:47:53 UTC

Mmm, it looks like the list moderator doesn't do his job: I sent an e-mail the same day you suggested it, and then asked for subscription ~12h later, still no news...

Comment 12 Simos Xenitellis 2008-02-07 02:22:59 UTC

That is not a moderated mailing list; you follow the instructions to become a member of the mailing list, then you send your e-mail to the list address (you get instructions for all these).

If you are not a member of the mailing list, then your e-mail goes through moderation, which in most cases rejects it.

My recent e-mail
http://mail.gnome.org/archives/gtk-i18n-list/2008-February/msg00000.html

Comment 13 Samuel Thibault 2008-02-07 02:25:11 UTC

That's what I did: I followed the link on the page, which sends an email to the list address, and I got

Your mail to 'gtk-i18n-list' with the subject

    subscribe

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Post by non-member to a members-only list

Comment 14 Simos Xenitellis 2008-02-07 10:27:46 UTC

Consider that the e-mail that is "held for approval" will probably not make it to the list.

1) Complete the subscription to the list following the section "Subscribing to gtk-i18n-list" at
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list

You will receive an email to verify your account. Follow the instructions.

2) Then, send the e-mail to the list again.

Comment 15 Samuel Thibault 2008-02-07 10:53:07 UTC

Then the link on www.gtk.org/mailing-lists.html is bogus and needs to be fixed.

Comment 16 Simos Xenitellis 2008-02-20 23:50:50 UTC

My opinion (now) on combining diacritics is that dead_keys should not produce them; a keyboard layout should be there to produce them. Dead keys appear to fit pre-composed characters and should probably stay limited to them.

Please see for details
http://blogs.gnome.org/simos/2008/02/20/keyboard-layout-for-combining-diacritics/

Internally, GTK+ uses a table (length: 5) to store the compose sequence. Having a compose sequence with indeterminate length would require some rethinking. Per Unicode, there is no restriction as to how many combining diacritics you can place on a character.

There are already keyboard layouts that assign combining diacritics to keys.

With this in mind, I would consider this report as NOTGNOME (that means, it can be fixed somewhere else; in the xkeyboard-config project).

Samuel, what do you think?

Tor, do you have some input here?

Comment 17 Samuel Thibault 2008-02-21 21:08:21 UTC

dead_keys were never meant to be restricted to pre-composed characters. Unicode (and hence any potential list of precomposed characters) did not even exist at the time...

As explained on the list, defining keys for combining accents is just duplicating existing keys which the user _already_ expects to work as combining accents (since they are dead accents)...

The fact that the X11 keysym space has duplications is not new: you can produce a Latin lowercase a through XK_a or 0x1000061, and _both_ ways are supposed to work the same. Here it's the same: X11 has had a way to put accents on letters for quite a long time: dead accents.  With the addition of Unicode, another way was added, but there is no reason to make any distinction between them.
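The keysym duplication mentioned above can be sketched in Python. This assumes the usual X11 conventions that Latin-1 keysyms equal their code points and that keysyms of the form 0x01000000 + code point denote Unicode characters; `keysym_to_char` is a hypothetical helper that covers only those two cases, not the full keysym space.

```python
UNICODE_KEYSYM_FLAG = 0x01000000
XK_a = 0x0061  # legacy Latin-1 keysym for 'a'

def keysym_to_char(keysym: int) -> str:
    """Map a keysym to a character for the two cases handled by this sketch."""
    # Latin-1 keysyms coincide with their Unicode code points.
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return chr(keysym)
    # Unicode keysyms carry the code point in the low 24 bits.
    if (keysym & 0xFF000000) == UNICODE_KEYSYM_FLAG:
        return chr(keysym & 0x00FFFFFF)
    raise ValueError(f"keysym {keysym:#x} is not handled by this sketch")

# Both spellings of 'a' resolve to the same character.
legacy_a = keysym_to_char(XK_a)
unicode_a = keysym_to_char(0x1000061)

# The combining diaeresis is reachable the same way, as keysym 0x1000308.
combining_diaeresis = keysym_to_char(0x1000308)
```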

Comment 18 Matthias Clasen 2018-02-10 03:25:23 UTC

We're moving to gitlab! As part of this move, we are closing bugs that haven't seen activity in more than 5 years. If this issue is still important to you and still relevant with GTK+ 3.22 or master, please consider creating a gitlab issue for it.

Comment 19 Samuel Thibault 2018-04-17 07:23:45 UTC

(I have reported the issue on gitlab: https://gitlab.gnome.org/GNOME/gtk/issues/10)