GNOME Bugzilla – Bug 127176
Arabic: Show NSMs applied to dotted circle
Last modified: 2006-07-08 03:51:41 UTC
Same as the way they are shown in the Unicode book.
What if not every font has a dotted circle? It will look ugly...
btw, the pango indic shapers do this automatically for combining
characters that they handle. So I could claim that it should be done
at that level. But I'm happy to listen to your views. :)
Well, it's the effect I like to see. If the glyph is missing, it should
simply be ignored; otherwise, I prefer it to be handled here rather than
in pango. If everyone agrees that it should be done in pango, I would go
implement ours (Arabic), but handling it multiple times is not the
cleanest way IMHO.
You are talking about two different things here. The first is a
malformed sequence, like a combining mark at the beginning of a line.
But the second is an impossible sequence, like two Fathas following an
The first should be implemented in the main pango, the second in
modules that have such issues.
Behdad, if pango should do that dotted circle thing for the first
case, please check the Unicode book about recommendations on such
cases. The dotted circle thing was only recommended in OpenType
specification, and for the second case here, IIRC.
Roozbeh: I think pango can handle orphan marks quite well (those
starting at the beginning of a bidi run). AFAIR Unicode has no
recommendation there. BTW, it's not quite clean to say, put the
circle there, but not when a mark is applied to a space char. So now
we want to cover spaces too, but then, should we forbid marks on
digits? What about punctuation? and it goes on and on... So, better
forget about that, and humm, gucharmap may take care of that, but not
if the dotted circle can be selected. BTW, try the dotted circle in
gucharmap, seems that my non-bitmap fonts do not have the glyph.
Better forget that for the moment I think...
Owen: Would you like Pango to handle that?
Behdad: I think this cannot be done in gucharmap or anywhere else that
wants to show just one glyph; it must be done in a rendering engine
like pango.
For Arabic, as the OpenType specification says, if a font has a dotted
circle glyph (one that is mapped through the "cmap" table), renderers
(they mean UNISCRIBE) will render conflicting marks on a dotted circle
glyph. But if it does not have one, renderers will not do "fall-back
rendering" (rendering marks on a dotted circle).
Now, in current version of UNISCRIBE they have these policies for
1. Orphan marks will not be rendered on a dotted circle.
2. Marks at the end of the run (I mean an item) will not be rendered
on a dotted circle.
3. Marks after a space will not be rendered on a dotted circle (because
a space is just a base glyph).
4. Marks between spaces will not be rendered on a dotted circle.
5. If there are two conflicting marks, and
5.1. the first is not rendered on a base glyph, both of them
will be rendered on a dotted circle.
5.2. the first is rendered on a base glyph, the second mark
will be rendered on a dotted circle.
6. If there is a mark after a mark rendered on a dotted circle:
6.1. If the last two marks are not conflicting, both will be
rendered on just one dotted circle.
6.2. If the last two marks are conflicting, the last one will
be rendered on a new dotted circle.
By the way, a dotted circle is treated like other base glyphs.
I have discussed "Fall-Back Rendering" with Mehran, and we have
concluded that for "Fall-Back Rendering" just two policies are enough:
1. For any orphan mark (a mark at the beginning of the paragraph), put
a "Dotted Circle" before it.
2. For any other conflict in the paragraph, put a "Dotted Circle"
between the conflicting marks.
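The two policies above can be sketched roughly as follows. This is only an illustration, not the attached patch: the function name insert_dotted_circles, the restriction of "marks" to the harakat U+064B..U+0652, and the conflict rule (a second vowel sign or a second shadda on the same base) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOTTED_CIRCLE 0x25CCu /* U+25CC DOTTED CIRCLE */
#define SHADDA        0x0651u /* U+0651 ARABIC SHADDA */

/* Assumption: only the Arabic harakat U+064B..U+0652 count as marks here. */
static int is_mark(uint32_t ch)
{
    return ch >= 0x064Bu && ch <= 0x0652u;
}

/* Copies `in` to `out`, inserting a dotted circle before an orphan mark at
 * the start of the paragraph (policy 1) and between conflicting marks
 * (policy 2). Hypothetical conflict rule: a base may carry at most one
 * shadda and at most one other mark. Returns the output length. */
size_t insert_dotted_circles(const uint32_t *in, size_t n_in,
                             uint32_t *out, size_t out_cap)
{
    size_t out_pos = 0;
    int have_base = 0;   /* nothing to attach to at paragraph start */
    int has_shadda = 0;  /* current base already carries a shadda */
    int has_vowel = 0;   /* current base already carries another mark */

    for (size_t in_pos = 0; in_pos < n_in; in_pos++) {
        uint32_t ch = in[in_pos];

        if (is_mark(ch)) {
            int shadda = (ch == SHADDA);
            int conflict = shadda ? has_shadda : has_vowel;

            if (!have_base || conflict) {
                /* The dotted circle becomes the new base for this mark. */
                assert(out_pos < out_cap);
                out[out_pos++] = DOTTED_CIRCLE;
                have_base = 1;
                has_shadda = has_vowel = 0;
            }
            if (shadda)
                has_shadda = 1;
            else
                has_vowel = 1;
        } else {
            /* Any non-mark character starts a fresh base. */
            have_base = 1;
            has_shadda = has_vowel = 0;
        }

        assert(out_pos < out_cap);
        out[out_pos++] = ch;
    }
    return out_pos;
}
```

An output buffer of 2 * n_in entries is always large enough, since at most one dotted circle is inserted per input character.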
And we have found that Uniscribe has a bug in rendering marks at the
end of the run.
See also bug 103938. Is this a dup?
Created attachment 23803
Fall-Back Rendering for Arabic Language
This needs some redoing with the current arabic-fc.c; hopefully
things will come out cleaner with the current code in any
The main thing that I find unattractive about the patch is
if( a[i] )
properties[i + dotted_circle_count++] = isolated_p;
It should rather be something like:
if( a[in_pos] )
properties[out_pos++] = isolated_p;
But why the name 'a'? Why is it an array of gushort if it is
used to hold boolean values?
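To make the suggestion concrete, here is a minimal standalone sketch of the two-index loop with a descriptively named boolean array. It is not the code from the patch: expand_properties, needs_dotted_circle and the value of ISOLATED_P are hypothetical stand-ins, and plain C types are used instead of the glib ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder; the real isolated_p value may differ. */
#define ISOLATED_P 0x1ul

/* Copies in_props to out_props, emitting an extra ISOLATED_P entry (for the
 * inserted dotted circle) before every position flagged in
 * needs_dotted_circle. Returns the number of entries written. */
size_t expand_properties(const unsigned long *in_props,
                         const bool *needs_dotted_circle, size_t n_in,
                         unsigned long *out_props, size_t out_cap)
{
    size_t out_pos = 0;

    for (size_t in_pos = 0; in_pos < n_in; in_pos++) {
        if (needs_dotted_circle[in_pos]) {
            assert(out_pos < out_cap);
            out_props[out_pos++] = ISOLATED_P;
        }
        assert(out_pos < out_cap);
        out_props[out_pos++] = in_props[in_pos];
    }
    return out_pos;
}
```

Keeping separate read and write indices avoids the running "i + dotted_circle_count++" offset and makes it obvious which array each index belongs to.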
(What possibly could be done is just to put an "add_dotted_circle"
flag into the properties values, since there are spare bits. A little
bit hackish, but it should lead to more readable code.)
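A rough sketch of the spare-bit idea, under stated assumptions: the bit position chosen for ADD_DOTTED_CIRCLE, the value of ISOLATED_P, and the function name expand_flagged_properties are all made up for illustration and would need to be checked against the bits actually used by the shaper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the real property bits may be laid out differently. */
#define ISOLATED_P        0x1ul
#define ADD_DOTTED_CIRCLE (1ul << 31) /* assumed to be an unused bit */

/* Expands a property array in which entries needing a dotted circle carry
 * the ADD_DOTTED_CIRCLE flag. The flag is cleared on output, and an
 * ISOLATED_P entry is emitted for each inserted dotted circle. */
size_t expand_flagged_properties(const unsigned long *in_props, size_t n_in,
                                 unsigned long *out_props, size_t out_cap)
{
    size_t out_pos = 0;

    for (size_t in_pos = 0; in_pos < n_in; in_pos++) {
        if (in_props[in_pos] & ADD_DOTTED_CIRCLE) {
            assert(out_pos < out_cap);
            out_props[out_pos++] = ISOLATED_P;
        }
        assert(out_pos < out_cap);
        out_props[out_pos++] = in_props[in_pos] & ~ADD_DOTTED_CIRCLE;
    }
    return out_pos;
}
```

This removes the need for a separate side array, at the cost of reserving one bit of every properties value.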
For reference, here is the spec that the patch implements:
Please note that in the implementation too. I believe Roozbeh is going to take
a look at the patch.
Marking as dup.
*** This bug has been marked as a duplicate of 103938 ***