GNOME Bugzilla – Bug 451682
Cursor positioning for Sinhala is broken
Last modified: 2007-08-02 16:17:53 UTC
Hi, The code added to support cursor positioning for Sinhala in Bug 353877 is incorrect. I'll attached a patch containing: * BUGFIX: Correct the broken Repaya handling code. * BUGFIX: Consonant clusters do NOT result in implicit conjuncts in SINHALA orthography. * The cursor should treat as a single glyph: SINHALA CONS + 0x0DCA + 0x200D + SINHALA CONS SINHALA CONS + 0x200D + 0x0DCA + SINHALA CONS Originally reported at: http://mail.gnome.org/archives/gtk-i18n-list/2007-June/msg00031.html Regards, Harshula
Created attachment 90768 [details] [review] Patch to correct Sinhala cursor positioning * BUGFIX: Correct the broken Repaya handling code. * BUGFIX: Consonant clusters do NOT result in implicit conjuncts in SINHALA orthography. * The cursor should treat as a single glyph: SINHALA CONS + 0x0DCA + 0x200D + SINHALA CONS SINHALA CONS + 0x200D + 0x0DCA + SINHALA CONS
Created attachment 90769 [details] Testcase
Why do you need to condition on being Sinhala? Isn't the character checks enough? And even if you need to check, isn't checking analysis.script == PANGO_SCRIPT_SINHALA easier? While at it, please first submit a patch to move this repeated four lines into a function: attrs[i].is_cursor_position = FALSE; attrs[i].is_char_break = FALSE; attrs[i].is_line_break = FALSE; attrs[i].is_mandatory_break = FALSE; In fact, they probably should just modify is_cursor_position and leave the rest as is.
> ------- Comment #3 from Behdad Esfahbod 2007-07-26 17:25 UTC ------- > Why do you need to condition on being Sinhala? Isn't the character checks > enough? Unfortunately not, Sinhala orthography is quite different to North Indian orthographies. The problem is 0x200D is used by all the scripts and it is not possible to control the behaviour of the Sinhala script without changing/breaking the behaviour of the Indic scripts. This should also avoid regressions in the Sinhala script when someone makes changes to the behaviour of the Indic scripts. That's the biggest problem at the moment. > And even if you need to check, isn't checking analysis.script == > PANGO_SCRIPT_SINHALA easier? Can you do that? PangoAnalysis (pango/pango-item.h) looks like: struct _PangoAnalysis { PangoEngineShape *shape_engine; PangoEngineLang *lang_engine; PangoFont *font; guint8 level; guint8 gravity; /* PangoGravity */ guint8 flags; PangoLanguage *language; GSList *extra_attrs; }; If there's a better way, I'm more than willing to change it. > While at it, please first submit a patch to move this repeated four lines into > a function: > > attrs[i].is_cursor_position = FALSE; > attrs[i].is_char_break = FALSE; > attrs[i].is_line_break = FALSE; > attrs[i].is_mandatory_break = FALSE; I followed the conventions already used by the code you have checked in. There's a lot of existing code that does exactly what you describe above. Are you asking me to clean up up the file in general? > In fact, they probably should just modify is_cursor_position and leave the > rest as is. What's your preference? I can create a macro which sets all 4 flags or if you are sure it is safe to delete all but the is_cursor_position flag, I can do that. cya, #
(In reply to comment #4) > > ------- Comment #3 from Behdad Esfahbod 2007-07-26 17:25 UTC ------- > > Why do you need to condition on being Sinhala? Isn't the character checks > > enough? > > Unfortunately not, Sinhala orthography is quite different to North Indian > orthographies. The problem is 0x200D is used by all the scripts and it is not > possible to control the behaviour of the Sinhala script without > changing/breaking the behaviour of the Indic scripts. > > This should also avoid regressions in the Sinhala script when someone makes > changes to the behaviour of the Indic scripts. That's the biggest problem at > the moment. > > > And even if you need to check, isn't checking analysis.script == > > PANGO_SCRIPT_SINHALA easier? > > Can you do that? PangoAnalysis (pango/pango-item.h) looks like: > > struct _PangoAnalysis > { > PangoEngineShape *shape_engine; > PangoEngineLang *lang_engine; > PangoFont *font; > > guint8 level; > guint8 gravity; /* PangoGravity */ > guint8 flags; > > PangoLanguage *language; > GSList *extra_attrs; > }; > > If there's a better way, I'm more than willing to change it. Your Pango is too old. Master has: struct _PangoAnalysis { PangoEngineShape *shape_engine; PangoEngineLang *lang_engine; PangoFont *font; guint8 level; guint8 gravity; /* PangoGravity */ guint8 flags; guint8 script; /* PangoScript */ PangoLanguage *language; GSList *extra_attrs; }; > > While at it, please first submit a patch to move this repeated four lines into > > a function: > > > > attrs[i].is_cursor_position = FALSE; > > attrs[i].is_char_break = FALSE; > > attrs[i].is_line_break = FALSE; > > attrs[i].is_mandatory_break = FALSE; > > I followed the conventions already used by the code you have checked in. > There's a lot of existing code that does exactly what you describe above. Are > you asking me to clean up up the file in general? > > > In fact, they probably should just modify is_cursor_position and leave the > > rest as is. > > What's your preference? I can create a macro which sets all 4 flags or if you > are sure it is safe to delete all but the is_cursor_position flag, I can do > that. Yes, remove it all and just leave the is_cursor_position lines. > cya, > #
> > > While at it, please first submit a patch to move this repeated four lines into > > > a function: > > > > > > attrs[i].is_cursor_position = FALSE; > > > attrs[i].is_char_break = FALSE; > > > attrs[i].is_line_break = FALSE; > > > attrs[i].is_mandatory_break = FALSE; > > > > I followed the conventions already used by the code you have checked in. > > There's a lot of existing code that does exactly what you describe above. Are > > you asking me to clean up up the file in general? > > > > > In fact, they probably should just modify is_cursor_position and leave the > > > rest as is. > > > > What's your preference? I can create a macro which sets all 4 flags or if you > > are sure it is safe to delete all but the is_cursor_position flag, I can do > > that. > > Yes, remove it all and just leave the is_cursor_position lines. Actually reading the code, it's easier to just set all four to false. Just move it to a function and should be fine.
Created attachment 92809 [details] [review] Updated patch to correct Sinhala cursor positioning * Create a new function bind_chars to set the flags of a PangoLogAttr. * Port patch from 16.4 to 17.5 * Use the new field 'script' in PangoLogAttr.
Created attachment 92812 [details] [review] committed patch I'm not sure that you are correctly handling is_conjunct. To me, you should be setting it to FALSE in more places. (in the else to the entire block maybe?)
2007-07-31 Behdad Esfahbod <behdad@gnome.org> Bug 451682 – Cursor positioning for Sinhala is broken Based on patch from Harshula * modules/indic/indic-lang.c (not_cursor_position), (indic_engine_break): Clean up cursor position stuff.
------- Comment #8 from Behdad Esfahbod 2007-07-31 18:59 UTC ------- > Created an attachment (id=92812) > --> (http://bugzilla.gnome.org/attachment.cgi?id=92812&action=view) > committed patch The purpose of my patch was to: /* * Consonant clusters do NOT result in implicit conjuncts * in SINHALA orthography. */ else if (!is_conjunct && prev_wc == 0x0DCA && this_wc != 0x200D) { attrs[i].is_cursor_position = TRUE; } Note that I'm setting is_cursor_position to *TRUE*. The modified patch you committed sets it to *FALSE*. > I'm not sure that you are correctly handling is_conjunct. To me, you should be > setting it to FALSE in more places. (in the else to the entire block maybe?) For valid sequences of chars the handling of is_conjunct is correct. For invalid sequences of chars the cursor positioning is undefined and is the least of our problems. i.e. The rendering will be a mess. BTW, your changes to the non-Sinhala Indic cursor positioning code was brave! ;-) cya, #
(In reply to comment #10) > ------- Comment #8 from Behdad Esfahbod 2007-07-31 18:59 UTC ------- > > Created an attachment (id=92812) [edit] > > --> (http://bugzilla.gnome.org/attachment.cgi?id=92812&action=view) > > committed patch > > The purpose of my patch was to: > > /* > * Consonant clusters do NOT result in implicit conjuncts > * in SINHALA orthography. > */ > else if (!is_conjunct && prev_wc == 0x0DCA && this_wc != 0x200D) > { > attrs[i].is_cursor_position = TRUE; > } > > Note that I'm setting is_cursor_position to *TRUE*. The modified patch you > committed sets it to *FALSE*. That slipped my eyes. Still, I'm not very happy about that. Is default rules setting cursor position to FALSE that you need to set it to TRUE here? If not, you better just leave it as is. I don't mind the change though. Committing the fix.
2007-08-01 Behdad Esfahbod <behdad@gnome.org> Bug 451682 – Cursor positioning for Sinhala is broken Based on patch from Harshula * modules/indic/indic-lang.c (indic_engine_break): Fix bug that I introduced in the patch.
Closing again.
------- Comment #11 from Behdad Esfahbod 2007-08-01 18:34 UTC ------- > Is default rules setting cursor position to FALSE that you need to set it > to TRUE here? Yep, that's right. The orthography rules for Sinhala are very different to the North Indian scripts. Indic support tends to be North Indian script centric. cya, #