GNOME Bugzilla – Bug 345066
backspace changes independent indic characters
Last modified: 2012-08-09 06:35:22 UTC
Opened by Jatin Nansi (jnansi@redhat.com) on 2005-01-18 08:03 EST

Description of problem: When the Backspace key is pressed after a consonant with a dot at the bottom, the dot vanishes and the result is an independent, unrelated letter. The dot in this case is not a vowel sign; it is part of the consonant itself. The first example is Bengali YYA (U+09DF). A backspace changes it to Bengali YA (U+09AF). See the attached image for an example; it uses the Bengali Probhat keyboard layout.

Version-Release number of selected component (if applicable): 1.6.0-7

How reproducible: Every time

Steps to Reproduce:
1. Start gedit in a Bengali locale.
2. Press Ctrl+Space, then F6.
3. Press 'z', then Backspace.

Actual results: The dot below the YYA character is deleted, and it becomes YA.

Expected results: The complete YYA character should be deleted.

Additional info: Tested on RHEL4-RC-0107.0 WS
Created attachment 67457: screenshot of gedit
The same bug in Red Hat Bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=145431
Yeah, this should be fixed in the Indic Pango module by correctly setting the backspace_deletes_character (or similarly named) bit.
I debugged this bug and found that it is not a bug in Pango but a bug in GTK: gtk_entry_backspace() and gtk_text_buffer_backspace() need to be modified. I will write a patch for it. This bug can be closed, since it is not a Pango bug. I filed a new bug against GTK: http://bugzilla.gnome.org/show_bug.cgi?id=348107
I wrote a patch for this bug. It is attached to: http://bugzilla.gnome.org/show_bug.cgi?id=348107
See Owen's comment on the other bug. This should be fixed in the Indic module, not by special-casing inside some widgets.
This bug is not related to Pango. It is caused by the call to g_utf8_normalize() in gtk_entry_backspace() and gtk_text_buffer_backspace(). When 0x09DF is passed to g_utf8_normalize(), it returns 0x09AF and 0x09BC, so the single character becomes two glyphs. It looks like this bug is related to GLib, because g_utf8_normalize() finds the decomposition in decomp_table[] in gunidecomp.h.
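To make the mechanism concrete, here is a minimal Python model of the behaviour described above. It assumes, as a simplification, that when the backspace_deletes_character bit is set, the widget decomposes the cluster before the cursor (NFD here, with Python's `unicodedata` standing in for `g_utf8_normalize()`) and removes only its last code point, and that clearing the bit removes the whole cluster. `backspace_cluster` and `show` are hypothetical helpers, not GTK code.

```python
import unicodedata


def backspace_cluster(cluster: str, deletes_character: bool) -> str:
    """Return what is left of `cluster` after one Backspace (simplified model).

    Assumption: with the bit set, the cluster is decomposed (NFD) and only
    its last code point is removed; with the bit cleared, the whole cluster
    is removed.
    """
    if not deletes_character:
        return ""
    decomposed = unicodedata.normalize("NFD", cluster)
    return decomposed[:-1]


def show(s: str) -> list[str]:
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(c):04X}" for c in s]


yya = "\u09DF"  # BENGALI LETTER YYA
print(show(unicodedata.normalize("NFD", yya)))  # ['U+09AF', 'U+09BC']
print(show(backspace_cluster(yya, True)))       # ['U+09AF']: the bug, YA is left
print(show(backspace_cluster(yya, False)))      # []: the whole YYA is deleted
```

In this model, clearing the bit for YYA is enough to get the expected result; it says nothing about how a Pango language engine decides where to clear it.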
Sorry, I found that this bug is not related to GLib after all, but it is still a GTK bug. I wrote a new patch for it: http://bugzilla.gnome.org/show_bug.cgi?id=348107
*** Bug 348107 has been marked as a duplicate of this bug. ***
Seems like an Indic language engine is needed.
OK, we now have an Arabic lang engine in HEAD that implements the same feature requested in this bug, but for Arabic. LingNing, can you write the Indic module? See bug 350132 for the Arabic module.
The Indic lang engine is there. Just see what the Arabic lang engine is doing and do similarly in the Indic engine.
Created attachment 90121: Fix based on the Arabic lang engine

The attached patch handles the following characters:
* Bengali RRA (U+09DC)
* Bengali RHA (U+09DD)
* Bengali YYA (U+09DF)

It is based on the Arabic lang engine fix, as suggested in comment #12.
I think this is a bug in normalization and should be fixed there only. Bengali YYA need not be normalized to a nukta form if it is an independent character. But this is defined in the Unicode Character Database and needs to be fixed there first. From http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt:

09DF;09AF 09BC;09AF 09BC;09AF 09BC;09AF 09BC; # (য়; য◌়; য◌়; য◌়; য◌়; ) BENGALI LETTER YYA
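The quoted record can be checked against Python's `unicodedata`, whose tables are built from the UCD. The fields are, in order, the source and its NFC, NFD, NFKC and NFKD forms; the check shows that even NFC yields U+09AF U+09BC, because U+09DF is listed in the UCD's composition exclusions. The `parse_field` helper is illustrative only.

```python
import unicodedata

# The NormalizationTest.txt record quoted above.
# Fields: source; NFC; NFD; NFKC; NFKD (the trailing comment is omitted).
record = "09DF;09AF 09BC;09AF 09BC;09AF 09BC;09AF 09BC;"


def parse_field(field: str) -> str:
    """Turn a field such as '09AF 09BC' into the string it denotes."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


fields = [parse_field(f) for f in record.split(";")[:5]]
source = fields[0]
expected = dict(zip(["NFC", "NFD", "NFKC", "NFKD"], fields[1:]))

for form, want in expected.items():
    # Every form, NFC included, yields U+09AF U+09BC for U+09DF.
    print(form, unicodedata.normalize(form, source) == want)
```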
Created attachment 150160: patch for handling Indic NFC

behdad, as per your comment at https://bugzilla.gnome.org/show_bug.cgi?id=350132#c20, I am attaching this here just for review, since it is the same kind of bug. Somehow it does not work for split matras (IS_SPLIT_MATRA_BRAHMI): after NFC, (0995 + 09CB) becomes (09C7 + 0995 + 09BE), and a single Backspace deletes everything (0995 + 09CB). :( It would be nice if we could keep this going here.
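A short Python sketch of why split matras are awkward: U+09CB BENGALI VOWEL SIGN O decomposes canonically into U+09C7 U+09BE, so neither naive strategy leaves just the consonant, which is presumably what the comment above hopes for. The two strategy names are hypothetical labels for a simplified model, not a verbatim description of Pango or GTK behaviour.

```python
import unicodedata


def show(s: str) -> list[str]:
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(c):04X}" for c in s]


cluster = "\u0995\u09CB"  # BENGALI LETTER KA + BENGALI VOWEL SIGN O
nfd = unicodedata.normalize("NFD", cluster)
print(show(nfd))  # ['U+0995', 'U+09C7', 'U+09BE']

# Hypothetical strategies from the simplified model.
strategies = {
    "drop last code point": nfd[:-1],  # leaves U+09C7, half of the matra
    "delete whole cluster": "",        # the consonant KA is lost as well
}
for name, left in strategies.items():
    print(f"{name}: {show(left)}")
```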
Created attachment 150162: all Indic characters that require the fix

This is a list of all characters that require the backspace fix.
Thanks. I have created a patch that covers all these SPLIT_MATRAS and COMPOSITE characters, which need correct backspace behavior. You can test the Fedora 12 build of Pango from http://koji.fedoraproject.org/koji/taskinfo?taskID=1885360
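A list like the one in attachment 150162 can be approximated from the UCD. The sketch below is an approximation under stated assumptions, not the committed patch's list: it scans an assumed range, the Devanagari through Sinhala blocks (U+0900..U+0DFF), for characters with a two-part canonical decomposition, and splits them into split matras (the first part is itself a mark) and letter + mark composites. It also records which of them NFC does not preserve; U+0929, for instance, is a composite that survives NFC, unlike U+09DF.

```python
import unicodedata


def canonical_parts(ch: str) -> list[str]:
    """Return the canonical decomposition mapping of `ch`, or [] if none."""
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):  # skip compatibility mappings
        return []
    return [chr(int(cp, 16)) for cp in mapping.split()]


split_matras: list[int] = []      # first part is itself a mark, e.g. U+09CB
letter_plus_mark: list[int] = []  # base letter + mark, e.g. U+09DF, U+0929
nfc_unstable: set[int] = set()    # characters that NFC does not preserve

# Assumed scan range: Devanagari (U+0900) through Sinhala (ends U+0DFF).
for cp in range(0x0900, 0x0E00):
    ch = chr(cp)
    parts = canonical_parts(ch)
    if len(parts) != 2:
        continue
    if unicodedata.category(parts[0]).startswith("M"):
        split_matras.append(cp)
    else:
        letter_plus_mark.append(cp)
    if unicodedata.normalize("NFC", ch) != ch:
        nfc_unstable.add(cp)

print("split matras:", " ".join(f"U+{cp:04X}" for cp in split_matras))
print("letter + mark:", " ".join(f"U+{cp:04X}" for cp in letter_plus_mark))
print("not kept by NFC:", " ".join(f"U+{cp:04X}" for cp in sorted(nfc_unstable)))
```

Characters in the NFC-stable group are only split when a consumer decomposes them (as the NFD model above does), which is one way a character like U+0929 can be missed if a list is built from NFC behaviour alone.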
Created attachment 150272: patch to fix backspace behaviour for Indic characters
Patch committed.
Somehow we missed the character U+0929 in this patch. Should I provide a patch adding that character? https://bugzilla.redhat.com/show_bug.cgi?id=501900 Or will harfbuzz-ng deprecate all these fixes?
The problem here is: what if the user intends to delete only the nukta (the dot sign for which normalization is done)? In that case, deleting the whole character is not a good experience. In any case, most nukta characters are typed using two keystrokes, and most key layouts do not have a single direct key to input them. Hence, to support normalization without creating too much of a user-experience glitch, a good tradeoff would be to keep the behavior as it is and deprecate these fixes, at least for the nukta cases.
(In reply to comment #21)
> Problem here is, what if the user intends to delete only the nukta (dot sign
> for which normalization is done)? In such case it does not make a good
> experience if the whole character is deleted. In any case most of such nukta
> added characters are typed using two keystrokes and most keylayouts do not have
> a single direct key to input these.

With the applied patch, things happen as you say and expect, i.e. backspace deletes characters according to the user's input.
HarfBuzz doesn't fix cursoring and deletion issues, but now that I have a better understanding of the Indic scripts, I expect to rewrite the Pango Indic language module in a few months...
That is good to know. Thank you.