GNOME Bugzilla – Bug 127176
Arabic: Show NSMs applied to dotted circle
Last modified: 2006-07-08 03:51:41 UTC
Show them the same way they are shown in the Unicode book.
What if a font does not have a dotted circle? It will look ugly... BTW, the Pango Indic shapers do this automatically for the combining characters they handle. So I could argue that it should be done at that level. But I'm happy to hear your views. :)
Well, it's the effect I'd like to see. If the glyph is missing, it should simply be ignored; otherwise, I'd prefer it to be handled here rather than in Pango. If everyone agrees that it should be done in Pango, I'll go and implement it for ours (Arabic), but handling it in multiple places is not the cleanest approach IMHO.
You are talking about two different things here. The first is a malformed sequence, like a combining mark at the beginning of a line. The second is an impossible sequence, like two Fathas following an Arabic letter. The first should be implemented in the main Pango code, the second in the modules that have such issues. Behdad, if Pango should do the dotted circle thing for the first case, please check the Unicode book for recommendations on such cases. The dotted circle was only recommended in the OpenType specification, and only for the second case, IIRC.
Roozbeh: I think Pango can handle orphan marks (those at the beginning of a bidi run) quite well. AFAIR Unicode has no recommendation there. BTW, it's not quite clean to say "put the circle there, but not when a mark is applied to a space character". So now we want to cover spaces too, but then, should we forbid marks on digits? What about punctuation? And it goes on and on... So, better to forget about that. Hmm, gucharmap may take care of it, as long as the dotted circle cannot be selected. BTW, try the dotted circle in gucharmap; it seems my non-bitmap fonts do not have the glyph. Better to forget about that for the moment, I think... Owen: would you like Pango to handle that?
Behdad: I don't think this can be done in gucharmap or anywhere else that just wants to show one glyph; it must be done in a rendering engine like Pango. For Arabic, the OpenType specification says that if a font has a dotted circle glyph (mapped through the "cmap" table), renderers (meaning Uniscribe) will render conflicting marks on a dotted circle glyph. But if the font does not have one, renderers will not do "fall-back rendering" (rendering marks on a dotted circle). The current version of Uniscribe has these policies for fall-back rendering:
1. Orphan marks are not rendered on a dotted circle.
2. Marks at the end of the run (I mean an item) are not rendered on a dotted circle.
3. Marks after a space are not rendered on a dotted circle (because a space is just a base glyph).
4. Marks between spaces are not rendered on a dotted circle.
5. If there are two conflicting marks, and
5.1. the first is not rendered on a base glyph, both of them are rendered on a dotted circle;
5.2. the first is rendered on a base glyph, the second mark is rendered on a dotted circle.
6. If there is a mark after a mark rendered on a dotted circle, and
6.1. the last two marks are not conflicting, both are rendered on a single dotted circle;
6.2. the last two marks are conflicting, the last one is rendered on a new dotted circle.
By the way, a dotted circle is treated like any other base glyph.
I have discussed "fall-back rendering" with Mehran, and we concluded that just two policies are enough:
1. For any orphan mark (a mark at the beginning of the paragraph), put a dotted circle before it.
2. For any other conflict in the paragraph, put a dotted circle between the conflicting marks.
We also found that Uniscribe has a bug in rendering marks at the end of the run.
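The two policies above can be sketched in C as a single pass over a paragraph's code points. This is a minimal illustration under stated assumptions, not the attached patch and not Pango's API: the function names, the plain `uint32_t` code point arrays, and the conflict rule (two vowel marks from U+064B..U+0650 or U+0652, or two Shaddas U+0651, on the same base) are hypothetical simplifications. U+25CC is the Unicode DOTTED CIRCLE, and, as noted in the Uniscribe policies, the inserted circle then acts as an ordinary base.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOTTED_CIRCLE 0x25CCu

/* Hypothetical classification covering only the basic harakat. */
static bool is_vowel_mark(uint32_t c)
{
  return (c >= 0x064B && c <= 0x0650) || c == 0x0652;
}

static bool is_shadda(uint32_t c)
{
  return c == 0x0651;
}

static bool is_mark(uint32_t c)
{
  return is_vowel_mark(c) || is_shadda(c);
}

/* Copies `in` to `out`, inserting DOTTED_CIRCLE before an orphan mark
 * (policy 1) and between conflicting marks (policy 2).
 * `out` must have room for 2 * n_in code points (worst case: every
 * input character is a mark that needs its own circle).
 * Returns the number of code points written. */
static size_t
insert_dotted_circles(const uint32_t *in, size_t n_in, uint32_t *out)
{
  size_t out_pos = 0;
  bool have_base = false;   /* false only at the start of the paragraph */
  bool have_vowel = false;  /* current base already carries a vowel mark */
  bool have_shadda = false; /* current base already carries a Shadda */

  for (size_t in_pos = 0; in_pos < n_in; in_pos++)
    {
      uint32_t c = in[in_pos];

      if (!is_mark(c))
        {
          /* A new base character starts a fresh cluster. */
          have_base = true;
          have_vowel = have_shadda = false;
          out[out_pos++] = c;
          continue;
        }

      bool conflict = (is_vowel_mark(c) && have_vowel) ||
                      (is_shadda(c) && have_shadda);

      if (!have_base || conflict)
        {
          /* The dotted circle becomes the new base for this mark. */
          out[out_pos++] = DOTTED_CIRCLE;
          have_base = true;
          have_vowel = have_shadda = false;
        }

      if (is_vowel_mark(c))
        have_vowel = true;
      else
        have_shadda = true;
      out[out_pos++] = c;
    }
  return out_pos;
}
```

With this rule, beh, fatha, fatha (U+0628 U+064E U+064E) becomes U+0628 U+064E U+25CC U+064E, while beh, shadda, fatha is left unchanged. A real shaper would presumably skip this pass when the font lacks a dotted circle glyph, as discussed earlier in the thread.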
See also bug 103938. Is this a dup?
Created attachment 23803: Fall-Back Rendering for Arabic Language
This needs some rework against the current arabic-fc.c; hopefully things will come out cleaner with the current code in any case. The main thing I find unattractive about the patch is code like:

    if (a[i]) properties[i + dotted_circle_count++] = isolated_p;

This should rather be something like:

    if (a[in_pos]) properties[out_pos++] = isolated_p;

But why the name 'a'? And why is it an array of gushort if it is used to hold boolean values? (What could possibly be done instead is to put an "add_dotted_circle" flag into the properties values, since there are spare bits. A little bit hackish, but it should lead to more readable code.)
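To illustrate the in_pos/out_pos suggestion combined with the spare-bit idea, here is a minimal sketch. `ISOLATED_P`, `ADD_DOTTED_CIRCLE`, `DOTTED_CIRCLE`, and `expand_dotted_circles` are hypothetical stand-ins, not names or values taken from arabic-fc.c, and plain `uint32_t` arrays replace Pango's own types. Separate input and output indices remove the `i + dotted_circle_count++` arithmetic, and the flag bit replaces the separate `a` array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real masks in arabic-fc.c may differ. */
#define ISOLATED_P        (1u << 0)
#define ADD_DOTTED_CIRCLE (1u << 31)  /* spare bit used as a flag */
#define DOTTED_CIRCLE     0x25CCu

/* Copies n_in characters and their properties to the output buffers.
 * Wherever props_in carries ADD_DOTTED_CIRCLE, a dotted circle with
 * isolated-form properties is emitted first, and the flag is cleared
 * on the copied entry. Output buffers need room for 2 * n_in entries.
 * Returns the number of entries written. */
static size_t
expand_dotted_circles(const uint32_t *chars, const uint32_t *props_in,
                      size_t n_in, uint32_t *chars_out, uint32_t *props_out)
{
  size_t out_pos = 0;

  for (size_t in_pos = 0; in_pos < n_in; in_pos++)
    {
      if (props_in[in_pos] & ADD_DOTTED_CIRCLE)
        {
          chars_out[out_pos] = DOTTED_CIRCLE;
          props_out[out_pos++] = ISOLATED_P;
        }
      chars_out[out_pos] = chars[in_pos];
      props_out[out_pos++] = props_in[in_pos] & ~ADD_DOTTED_CIRCLE;
    }
  return out_pos;
}
```

The index bookkeeping lives in exactly two places (`in_pos` and `out_pos`), which is the readability gain being asked for; the cost is reserving one otherwise unused bit in the properties word.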
For reference, here is the spec that the patch implements: http://www.microsoft.com/typography/otfntdev/arabicot/other.htm Please mention it in the implementation too. I believe Roozbeh is going to take a look at the patch.
Marking as dup. *** This bug has been marked as a duplicate of 103938 ***