Bug 345066 – backspace changes independent indic characters




Summary:	backspace changes independent indic characters


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	indic
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Pango Indic
QA Contact:	Pango Indic

URL:
Whiteboard:

Duplicates:	348107
Depends on:
Blocks:

Reported:	2006-06-16 02:06 UTC by LingNing Zhang
Modified:	2012-08-09 06:35 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
screenshot of gedit (23.44 KB, image/png) 2006-06-16 02:07 UTC, LingNing Zhang
Fix based on the arabic lang engine (1.05 KB, patch) 2007-06-17 06:39 UTC, Sayamindu Dasgupta
patch for handling indic NFC (1.46 KB, application/octet-stream) 2009-12-21 09:07 UTC, Pravin Satpute
total indic characters require fix (276 bytes, text/plain) 2009-12-21 09:14 UTC, Pravin Satpute
patch to fix backspace behaviour for Indic characters (2.63 KB, patch) 2009-12-23 05:00 UTC, Parag AN

Description LingNing Zhang 2006-06-16 02:06:34 UTC

Please describe the problem:
Opened by Jatin Nansi (jnansi@redhat.com) on 2005-01-18 08:03 EST

Description of problem:
When the backspace key is pressed after a consonant with a dot at the bottom,
the dot vanishes and the result is an independent and unrelated
letter. The dot in this case is not a vowel sign; it is part of
the consonant itself.
The first example is Bengali Yaa - <09DF>. A backspace changes it to
Bengali Ya - <09AF>.
See the attached image for an example. The image uses the Bengali Probhat
keyboard layout.


Version-Release number of selected component (if applicable):
1.6.0-7


How reproducible:
Every time


Steps to Reproduce:
1. Start gedit in the Bengali locale
2. Ctrl+space, F6
3. Press 'z', then Backspace
  
Actual results:
The 'dot' below the yaa character gets deleted, and it becomes a ya.


Expected results:
The complete yaa character should get deleted.


Additional info:
Tested on RHEL4-RC-0107.0 WS


Comment 1 LingNing Zhang 2006-06-16 02:07:22 UTC

Created attachment 67457 [details]
screenshot of gedit

Comment 2 LingNing Zhang 2006-06-16 02:17:33 UTC

The same bug in Red Hat Bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=145431

Comment 3 Behdad Esfahbod 2006-06-17 01:45:58 UTC

Yeah, this should be fixed in the Indic Pango module by correctly setting the backspace_deletes_character (or similar named) bit.

Comment 4 LingNing Zhang 2006-07-20 09:50:04 UTC

I debugged this and found that it is not a bug in Pango but a bug in GTK.
gtk_entry_backspace( ) and gtk_text_buffer_backspace( ) need to be modified.
I will write a patch for it.
This bug can be closed, since it is not a Pango bug.
I filed a new bug against GTK:
http://bugzilla.gnome.org/show_bug.cgi?id=348107

Comment 5 LingNing Zhang 2006-07-21 03:13:48 UTC

I wrote a patch for this bug.
The patch is below:
http://bugzilla.gnome.org/show_bug.cgi?id=348107

Comment 6 Matthias Clasen 2006-07-22 13:46:14 UTC

See Owen's comment on the other bug. This should be fixed in the Indic module,
not by special-casing inside some widgets.

Comment 7 LingNing Zhang 2006-07-24 03:00:56 UTC

This bug is not related to Pango. It is caused by the call to
g_utf8_normalize( ) in gtk_entry_backspace( ) or
gtk_text_buffer_backspace( ). When 0x09df is passed to g_utf8_normalize( ), it
returns 0x09af and 0x09bc, which then become two glyphs.
It looks like this bug is related to glib, because g_utf8_normalize( )
finds these conjunctions in decomp_table[ ] of gunidecomp.h.
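
The decomposition described above can be reproduced outside GTK. This is a minimal Python sketch, assuming the backspace handlers request the canonical decomposition (NFD). It uses Python's unicodedata module as a stand-in for g_utf8_normalize( ), and the helper name describe is made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

yya = "\u09df"  # BENGALI LETTER YYA, stored as a single code point

# Assumption: NFD stands in for the normalization done before backspace.
decomposed = unicodedata.normalize("NFD", yya)

print(describe(yya))         # one entry: the precomposed letter
print(describe(decomposed))  # two entries: base letter plus nukta
```

Once the text is in this two-code-point form, removing the last code point leaves only the base letter, which matches the behaviour reported in the description.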

Comment 8 LingNing Zhang 2006-07-24 08:06:14 UTC

Sorry, I found that this bug is not related to glib, but is still related to
gtk.
I have written a new patch for this bug.
http://bugzilla.gnome.org/show_bug.cgi?id=348107

Comment 9 Behdad Esfahbod 2006-09-08 17:39:12 UTC

*** Bug 348107 has been marked as a duplicate of this bug. ***

Comment 10 Behdad Esfahbod 2006-09-08 17:39:58 UTC

Seems like an Indic language engine is needed.

Comment 11 Behdad Esfahbod 2006-09-18 22:14:48 UTC

Ok, we now have an Arabic lang engine in HEAD that implements the exact same feature requested in this bug but for Arabic.

LingNing, can you write the Indic module?

See bug 350132 for the Arabic module.

Comment 12 Behdad Esfahbod 2006-10-12 18:50:54 UTC

The Indic lang engine is there.  Just see what the Arabic lang engine is doing and do similarly in the Indic engine.

Comment 13 Sayamindu Dasgupta 2007-06-17 06:39:45 UTC

Created attachment 90121 [details] [review]
Fix based on the arabic lang engine

The attached patch handles the following characters:
* Bengali RRA (U+09DC)
* Bengali RHA (U+09DD)
* Bengali YYA (U+09DF)
It is based on the Arabic lang engine fix, as suggested in comment #12.
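
To illustrate the intended effect for these three characters, here is a rough Python model of a single backspace. It is not the Pango or GTK code: the function backspace and the set WHOLE_CHARACTER are hypothetical names, the model looks only at the last code point instead of the full cluster, and it assumes (from my reading of the Pango documentation) that when backspace_deletes_character is set the cluster is decomposed and one character is removed, and that when it is cleared the whole cluster is removed.

```python
import unicodedata

# Hypothetical set holding the three characters named in the patch description.
WHOLE_CHARACTER = frozenset({"\u09dc", "\u09dd", "\u09df"})

def backspace(text, whole_character=frozenset()):
    """Model one backspace at the end of text.

    Simplification: only the last code point is considered, whereas a real
    text widget works on the whole cluster before the cursor.
    """
    if not text:
        return text
    head, last = text[:-1], text[-1]
    if last in whole_character:
        # Assumed to model the flag being cleared: remove the whole letter.
        return head
    # Assumed to model the flag being set: decompose, drop the last code point.
    return head + unicodedata.normalize("NFD", last)[:-1]

before = backspace("\u09df")                   # unpatched: leaves U+09AF (YA)
after = backspace("\u09df", WHOLE_CHARACTER)   # patched: nothing is left
two_keys = backspace("\u09af\u09bc", WHOLE_CHARACTER)  # only the nukta goes
```

In this model a letter entered as YA followed by a separate nukta still loses only the nukta, because the last code point is then U+09BC and not one of the listed precomposed letters.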

Comment 14 Rahul Bhalerao 2008-06-16 09:37:15 UTC

I think this is a bug in normalization and should be fixed there only. Bengali Yaa need not be normalized to a nukta form if it is an independent character. But this is defined in the Unicode Character Database and needs to be fixed there first.

09DF;09AF 09BC;09AF 09BC;09AF 09BC;09AF 09BC; # (য়; য◌়; য◌়; য◌়; য◌়; ) BENGALI LETTER YYA

http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
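
The test line quoted above can be checked against a Unicode implementation. The sketch below uses Python's unicodedata module and assumes the usual NormalizationTest.txt column order (source, NFC, NFD, NFKC, NFKD). The name parse_fields is made up for illustration, and the trailing comment of the line is left out.

```python
import unicodedata

# Data part of the NormalizationTest.txt line for BENGALI LETTER YYA.
LINE = "09DF;09AF 09BC;09AF 09BC;09AF 09BC;09AF 09BC;"

def parse_fields(line):
    """Turn the first five ';'-separated hex fields into Python strings."""
    fields = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]

source, nfc, nfd, nfkc, nfkd = parse_fields(LINE)
expected = {"NFC": nfc, "NFD": nfd, "NFKC": nfkc, "NFKD": nfkd}

# Compare each expected form with what unicodedata actually produces.
results = {form: unicodedata.normalize(form, source) for form in expected}
for form, value in results.items():
    print(form, value == expected[form])
```

All four forms are the two-code-point sequence, including NFC, so no normalization form keeps U+09DF as a single code point.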

Comment 15 Pravin Satpute 2009-12-21 09:07:40 UTC

Created attachment 150160 [details]
patch for handling indic NFC

Behdad, as per your comment at
https://bugzilla.gnome.org/show_bug.cgi?id=350132#c20

Attaching this here for review only, since this bug already covers the same kind of problem.

However, it is not working for split matras (IS_SPLIT_MATRA_BRAHMI): after NFC,
(0995 + 09cb) becomes (09c7 + 0995 + 09be),
and a single backspace deletes all of (0995 + 09cb) :(

It would be nice if we could keep this going here now :)
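
For anyone who wants to look at the split matra case in isolation, here is a small Python sketch using the unicodedata module. It only shows what the normalization forms do to the sequence in storage order, which for the decomposed form is 0995 09c7 09be. It does not model the patch or the Pango module, and the helper name hexes is made up.

```python
import unicodedata

def hexes(text):
    """Format a string as space-separated lowercase hex code points."""
    return " ".join(f"{ord(ch):04x}" for ch in text)

typed = "\u0995\u09cb"  # BENGALI LETTER KA + BENGALI VOWEL SIGN O

decomposed = unicodedata.normalize("NFD", typed)       # split matra in two parts
recomposed = unicodedata.normalize("NFC", decomposed)  # parts joined again

print(hexes(typed))
print(hexes(decomposed))
print(hexes(recomposed))

# Dropping the last code point of the decomposed form leaves KA + VOWEL SIGN E.
print(hexes(decomposed[:-1]))
```

The last line shows why a decompose-then-delete backspace is awkward here: it would leave a different vowel sign on the consonant and not simply remove the one that was typed.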

Comment 16 Pravin Satpute 2009-12-21 09:14:50 UTC

Created attachment 150162 [details]
total indic characters require fix

This is the full list of characters that require the backspace fix.

Comment 17 Parag AN 2009-12-23 04:57:20 UTC

Thanks. I have created a patch that covers all the SPLIT_MATRAS and COMPOSITE characters that need correct backspace behavior.

The Fedora 12 build of pango can be tested from http://koji.fedoraproject.org/koji/taskinfo?taskID=1885360

Comment 18 Parag AN 2009-12-23 05:00:54 UTC

Created attachment 150272 [details] [review]
patch to fix backspace behaviour for Indic characters

Comment 19 Behdad Esfahbod 2010-03-04 01:44:09 UTC

Patch committed.

Comment 20 Pravin Satpute 2012-08-08 07:08:02 UTC

Somehow we missed the character U+0929 in this patch. Should I provide a patch that adds it?

https://bugzilla.redhat.com/show_bug.cgi?id=501900

Or will harfbuzz-ng make all these fixes obsolete?
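
A quick way to see how U+0929 behaves under normalization is the Python check below. It is only a sketch with the unicodedata module and says nothing about what the Pango patch does. U+09DF is included for comparison.

```python
import unicodedata

nnna = "\u0929"  # DEVANAGARI LETTER NNNA
nnna_nfd = unicodedata.normalize("NFD", nnna)      # base letter + nukta
nnna_nfc = unicodedata.normalize("NFC", nnna_nfd)  # composed back to one letter

yya = "\u09df"   # BENGALI LETTER YYA, for comparison
yya_nfc = unicodedata.normalize("NFC", yya)        # stays decomposed

for label, value in [("NNNA NFD", nnna_nfd),
                     ("NNNA NFC", nnna_nfc),
                     ("YYA NFC", yya_nfc)]:
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in value))
```

So U+0929 decomposes to U+0928 plus the nukta U+093C in the same way as the Bengali letters, and a decompose-then-delete backspace would leave U+0928 behind.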

Comment 21 Rahul Bhalerao 2012-08-08 10:17:14 UTC

The problem here is: what if the user intends to delete only the nukta (the dot sign for which normalization is done)? In that case it is not a good experience if the whole character is deleted. In any case, most such nukta characters are typed using two keystrokes, and most key layouts do not have a single direct key to input them. Hence, to support normalization without creating too much of a user experience glitch, a good tradeoff would be to keep it as it is and deprecate these fixes, at least for the nukta cases.

Comment 22 Pravin Satpute 2012-08-08 10:34:59 UTC

(In reply to comment #21)
> Problem here is, what if the user intends to delete only the nukta (dot sign
> for which normalization is done)? In such case it does not make a good
> experience if the whole character is deleted. In any case most of such nukta
> added characters are typed using two keystrokes and most keylayouts do not have
> a single direct key to input these. 

With the patch applied, things happen as you expect, i.e. backspace deletes characters according to how the user entered them.

Comment 23 Behdad Esfahbod 2012-08-08 17:51:05 UTC

HarfBuzz doesn't fix cursoring and deletion issues, but now that I have a better understanding of the Indic scripts, I expect to rewrite the Pango Indic language module in a few months...

Comment 24 Pravin Satpute 2012-08-09 06:35:22 UTC

That is good to know. Thank you.