GNOME Bugzilla – Bug 150883
Unicode LRO defect
Last modified: 2007-09-29 03:15:31 UTC
When LRO is used, PangoLayout shaping fails. Compile and run the attached snippet and notice that the shaping is wrong. The Unicode sequence is: \u202d\u0637\u0627\u0644\u0628\u0020\u0633\u0644\u0627\u0645\u0020\u0645\u062d\u0645\u062f where \u202d == LRO (LEFT-TO-RIGHT OVERRIDE).
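For readers without access to the attachment, here is a minimal sketch of how the test string above could be built in C. It is not the attached snippet; the names lro_test_text and encode_utf8_bmp are made up for illustration, and the encoder only handles code points in the Basic Multilingual Plane, which is all this sequence needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The code points from the report: LRO followed by three Arabic words. */
static const uint32_t lro_test_text[] = {
    0x202D,                                  /* LRO: LEFT-TO-RIGHT OVERRIDE */
    0x0637, 0x0627, 0x0644, 0x0628, 0x0020,  /* first word, space */
    0x0633, 0x0644, 0x0627, 0x0645, 0x0020,  /* second word, space */
    0x0645, 0x062D, 0x0645, 0x062F           /* third word */
};
#define LRO_TEST_LEN (sizeof lro_test_text / sizeof lro_test_text[0])

/* Hypothetical helper: encode BMP code points as NUL-terminated UTF-8.
 * Returns the number of bytes written, not counting the terminator. */
static size_t encode_utf8_bmp(const uint32_t *cps, size_t n,
                              char *out, size_t out_size)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        /* Only non-surrogate BMP code points are supported here. */
        assert(c <= 0xFFFF && (c < 0xD800 || c > 0xDFFF));
        size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);
        assert(pos + need < out_size);  /* leave room for the NUL */
        if (need == 1) {
            out[pos++] = (char)c;
        } else if (need == 2) {
            out[pos++] = (char)(0xC0 | (c >> 6));
            out[pos++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[pos++] = (char)(0xE0 | (c >> 12));
            out[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[pos++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[pos] = '\0';
    return pos;
}
```

The resulting buffer could then be handed to pango_layout_set_text() with a length of -1 to reproduce the problem, assuming a PangoLayout has already been created.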
Created attachment 30873: snippet. Related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=72413
Created attachment 30874: bad shaping
Created attachment 30875: good shaping
I used the same font (Tahoma) in both screenshots so that it is easier for us (English speakers) to see the differences.
Unicode standard reference: http://www.unicode.org/reports/tr9/#Shaping To get the behavior specified there, I think the right approach is, in arabic_engine_shape(), if the direction of the run is LTR, to just reverse the characters in it before feeding it to the rest of the shaping process. Then you'd have to fix up the logical clusters afterwards. (You might want to use character offsets, not indices, as the clusters input to pango_ot_buffer_add_glyph() and then map back to character indices later... might be easier.) What won't work, and would be very hard to get working within the Pango framework, is shaping across multiple directional runs.
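The reverse-then-fix-up idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in arabic_engine_shape(): the shape_fn callback type, identity_shape stand-in, and shape_ltr_run wrapper are all hypothetical, clusters are plain character offsets, and glyph ordering and conversion to Pango's own indices are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shaper callback: fills glyphs[] and clusters[] (character
 * offsets into chars[]) and returns the number of glyphs produced. */
typedef size_t (*shape_fn)(const uint32_t *chars, size_t n_chars,
                           uint32_t *glyphs, size_t *clusters);

/* Stand-in shaper for illustration only: one glyph per character. */
static size_t identity_shape(const uint32_t *chars, size_t n_chars,
                             uint32_t *glyphs, size_t *clusters)
{
    for (size_t i = 0; i < n_chars; i++) {
        glyphs[i] = chars[i];
        clusters[i] = i;
    }
    return n_chars;
}

/* Reverse a run of code points in place. */
static void reverse_chars(uint32_t *chars, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        uint32_t tmp = chars[i];
        chars[i] = chars[n - 1 - i];
        chars[n - 1 - i] = tmp;
    }
}

/* Shape an LTR run containing Arabic text: reverse the characters, shape
 * them, restore the original order, then map each cluster from an offset
 * in the reversed buffer back to an offset in logical order. */
static size_t shape_ltr_run(uint32_t *chars, size_t n_chars, shape_fn shape,
                            uint32_t *glyphs, size_t *clusters)
{
    assert(chars != NULL && shape != NULL);
    reverse_chars(chars, n_chars);
    size_t n_glyphs = shape(chars, n_chars, glyphs, clusters);
    reverse_chars(chars, n_chars);  /* leave the caller's text untouched */
    for (size_t i = 0; i < n_glyphs; i++) {
        assert(clusters[i] < n_chars);
        clusters[i] = n_chars - 1 - clusters[i];
    }
    return n_glyphs;
}
```

Working in character offsets keeps the fix-up to a single subtraction per glyph; translating those offsets to whatever index type the caller needs can then happen in one place afterwards.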
I'm supposed to fix this. Fortunately, the consensus is to limit shaping to directional runs, but that is still problematic, since ZWJ needs special handling. In normal day-to-day text you may have adjacent ZWJ and Arabic text where the ZWJ gets an even embedding level and the Arabic text gets an odd one... Unicode says that ZWJ and ZWNJ should affect the adjacent base letters, no matter what the embedding level is. Any ideas?
To be exact, ZWJ (and other BN chars) are removed in the bidi process. What happens right now is that FriBidi assigns some sane embedding level values to these chars, and that is what we get. Since they are Boundary Neutral characters, by definition, any assignment of an embedding level can break the adjacency to either the previous or the next char. In the current FriBidi CVS code, I do shaping on the whole paragraph in one pass, using the embedding levels as input. Do you think there's a way to do this in Pango in the near future? [I get all reports on pango, no need to CC; it's just that I don't have internet at home this week :(]
I don't think shaping across directional levels is in the future for Pango; it would require a major change in the shaping pipeline. And, after all, you might have: "Indic text ZWJ Arabic text". You can't pass all of that to the Arabic shaper! What might be possible is to have an extended version of the script_shape() virtual function that takes a "flags" argument with a ZWJ_BEFORE flag, or something like that. (This might be useful for dealing with special behavior at the end of lines, like hanging punctuation, as well.)
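To make the flags idea concrete, here is a rough sketch of how a caller might compute such flags for one run before invoking a shaper. Everything in it is hypothetical: the SHAPE_FLAG_* names and run_boundary_flags() are invented for illustration, ZWJ_AFTER is an extrapolation of the ZWJ_BEFORE suggestion, and no such argument exists on script_shape().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZWJ 0x200D  /* ZERO WIDTH JOINER */

/* Hypothetical flags describing what sits just outside a run. */
enum {
    SHAPE_FLAG_ZWJ_BEFORE = 1 << 0,  /* the character before the run is ZWJ */
    SHAPE_FLAG_ZWJ_AFTER  = 1 << 1   /* the character after the run is ZWJ */
};

/* Inspect the paragraph text around the run [run_start, run_start + run_len)
 * and report whether a ZWJ is adjacent to either edge, regardless of which
 * directional run the ZWJ itself was assigned to. */
static unsigned run_boundary_flags(const uint32_t *text, size_t text_len,
                                   size_t run_start, size_t run_len)
{
    assert(text != NULL);
    assert(run_start <= text_len && run_len <= text_len - run_start);

    unsigned flags = 0;
    size_t run_end = run_start + run_len;
    if (run_start > 0 && text[run_start - 1] == ZWJ)
        flags |= SHAPE_FLAG_ZWJ_BEFORE;
    if (run_end < text_len && text[run_end] == ZWJ)
        flags |= SHAPE_FLAG_ZWJ_AFTER;
    return flags;
}
```

A shaper receiving SHAPE_FLAG_ZWJ_AFTER could then treat the last letter of the run as joining forward even though the ZWJ landed in a neighbouring run with a different embedding level.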
I started testing Bidi on the Mac, and ATSUI gets this right. Behdad/Owen, any progress here?
2007-07-24  Behdad Esfahbod  <behdad@gnome.org>

        Bug 150883 – Unicode LRO defect

        * modules/arabic/arabic-fc.c (arabic_engine_shape):
        * modules/arabic/arabic-ot.c (Get_Joining_Class),
        (Arabic_Assign_Properties):
        * modules/arabic/arabic-ot.h: Correctly handle Arabic shaping in
        left-to-right runs.
Thanks Behdad
Behdad, could you please close the same bug in the Red Hat Bugzilla? https://bugzilla.redhat.com/show_bug.cgi?id=185490 Thank you :)