GNOME Bugzilla – Bug 103938
feature request: better handling of invalid combinations
Last modified: 2012-08-18 17:10:26 UTC
OpenType documentation seems to say that invalid combinations should
be handled as follows:
- If a combining character is preceded by a space and a zero width
joiner, render the combining character by itself.
- Otherwise, render it on top of U+25CC DOTTED CIRCLE, so that it looks
the way it does in the unicode.org code charts.
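The two rules above can be sketched in Python as a preprocessing step that inserts U+25CC before any run of combining marks lacking a base. This is a minimal sketch, not Pango's or Uniscribe's code: the function names are hypothetical, and treating whitespace, control (Cc) and format (Cf) characters as "no base" is an assumption made here for illustration, since the OpenType documentation defines invalid combinations per script.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
SPACE_ZWJ = "\u0020\u200D"  # SPACE followed by ZERO WIDTH JOINER


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")


def can_carry_mark(ch: str) -> bool:
    # Assumption for this sketch: whitespace, control (Cc) and format (Cf)
    # characters are not treated as bases, so marks after them are invalid.
    return not (ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf"))


def insert_dotted_circles(text: str) -> str:
    """Return text with U+25CC inserted before each run of marks that has
    no base, except runs preceded by SPACE + ZWJ, which are left alone so
    the marks render by themselves (hypothetical helper)."""
    out = []
    for i, ch in enumerate(text):
        starts_run = is_mark(ch) and (i == 0 or not is_mark(text[i - 1]))
        if starts_run:
            if text[max(0, i - 2):i] == SPACE_ZWJ:
                pass  # explicit request to show the mark in isolation
            elif i == 0 or not can_carry_mark(text[i - 1]):
                out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)
```

For example, a lone U+0301 becomes U+25CC U+0301, while SPACE + ZWJ + U+0301 is returned unchanged. Only one dotted circle is inserted per run, so stacked marks share a single carrier.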
Also worth noting is that the Unicode Standard 3.2, section 2.6, says:
"By convention, diacritical marks used by the Unicode Standard may be
exhibited in (apparent) isolation by applying them to U+0020 SPACE or to
U+00A0 NO BREAK SPACE."
------- Bug moved to this database by email@example.com 2003-01-19 22:42 -------
Reassigning to the default owner of the component, firstname.lastname@example.org.
Created attachment 18445
patch to improve the current behavior somewhat
The patch above doesn't do anything with dotted circles. It seems like
that'd be pretty hard. Instead, it attempts to allot an appropriate
amount of room for a sequence of combining characters applied to a
space or not applied to anything (at the beginning of a line for example).
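The attachment itself is not reproduced here. As a hypothetical illustration of the general idea of giving marks on a space (or on nothing) their own room, the sketch below widens the cluster to fit the widest mark and centres each mark's ink in it. The `GlyphExtents` type, its fields, and the centring policy are all assumptions, not taken from the patch; like the patch, it assumes all glyphs come from one font.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class GlyphExtents:
    """Hypothetical glyph metrics, all in the same font units."""
    advance: int    # horizontal advance (usually 0 for combining marks)
    ink_left: int   # left edge of the ink, relative to the pen position
    ink_width: int  # width of the inked area


def layout_spacing_cluster(base: Optional[GlyphExtents],
                           marks: Sequence[GlyphExtents]) -> Tuple[int, List[int]]:
    """Return (cluster_advance, x_offsets) for marks applied to a space or
    to nothing, so they do not overlap the preceding cluster."""
    base_advance = base.advance if base is not None else 0
    widest = max((m.ink_width for m in marks), default=0)
    # The cluster is at least as wide as its base and its widest mark.
    advance = max(base_advance, widest)
    offsets = []
    for m in marks:
        # Shift the mark so its ink starts at (advance - ink_width) // 2.
        offsets.append((advance - m.ink_width) // 2 - m.ink_left)
    return advance, offsets
```

With a 250-unit space and a zero-advance mark whose 200-unit ink starts 300 units left of its origin, the cluster keeps an advance of 250 and the mark is shifted right by 325 so its ink begins 25 units into the cell.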
Incidentally, as I read the standard, a space followed by a combining
character isn't technically an invalid combination. Section 3.6, D17a:
"Defective combining character sequence: a combining character
sequence that does not start with a base character.
- Defective combining character sequences occur when a sequence of
combining characters appears at the start of a string or follows a
control or format character."
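Definition D17a can be checked mechanically. This is a minimal sketch in which "control or format character" is approximated by General_Category Cc and Cf, and the function name is hypothetical. Because SPACE is a base character, a space followed by a mark is not reported as defective, which is the point made in the comment.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # Mn, Mc and Me are the combining-mark general categories.
    return unicodedata.category(ch).startswith("M")


def defective_sequence_starts(text: str) -> list:
    """Return the indices where a defective combining character sequence
    (per D17a) begins: a run of marks at the start of the string, or
    directly after a control (Cc) or format (Cf) character."""
    starts = []
    for i, ch in enumerate(text):
        if not is_mark(ch):
            continue
        if i > 0 and is_mark(text[i - 1]):
            continue  # part of a run whose start was already examined
        if i == 0 or unicodedata.category(text[i - 1]) in ("Cc", "Cf"):
            starts.append(i)
    return starts
```

Note that ZWJ (U+200D) is a format character, so under this reading SPACE + ZWJ + mark forms a defective sequence starting at the mark.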
My patch above is wrong. It doesn't handle the case where the base
character and combining character, or different combining characters,
come from different fonts.
What was the conclusion of the discussion about this
on the Unicode mailing list? I thought some people were
arguing that the OpenType interpretation was clearly
incompatible with the Unicode spec.
(Though it may well be the case that we should deviate
from the Unicode spec as well if that is going to make
things better for our users.)
In my opinion, the conclusion of the thread was that space+diacritic
should show the diacritic in isolation, and a diacritic in isolation
should be shown on a dotted circle. It's not crystal clear though.
John Cowan does say, "This is a clear demonstration that Uniscribe
fails to implement a standard correctly, a property unique neither to
Microsoft nor to the Unicode Standard," in reference to the space+ZWJ convention.
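The policy described as the thread's conclusion (marks on a space shown in isolation, bare marks shown on a dotted circle) can be expressed as a classification of each run of marks. This is a hypothetical sketch: including U+00A0 follows the Unicode 3.2 section 2.6 convention quoted earlier, while treating only Cc and Cf characters (and the start of text) as "no base" is an assumption made for illustration.

```python
import unicodedata

# SPACE and NO-BREAK SPACE, per the Unicode 3.2 section 2.6 convention.
ISOLATING_BASES = {"\u0020", "\u00A0"}


def is_mark(ch: str) -> bool:
    return unicodedata.category(ch).startswith("M")


def classify_mark_runs(text: str) -> list:
    """Return (index, treatment) for the first mark of each run, where
    treatment is 'isolated', 'dotted-circle' or 'attached'."""
    result = []
    for i, ch in enumerate(text):
        if not is_mark(ch) or (i > 0 and is_mark(text[i - 1])):
            continue  # not a mark, or not the first mark of its run
        prev = text[i - 1] if i > 0 else None
        if prev in ISOLATING_BASES:
            result.append((i, "isolated"))       # show the mark by itself
        elif prev is None or unicodedata.category(prev) in ("Cc", "Cf"):
            result.append((i, "dotted-circle"))  # no base at all
        else:
            result.append((i, "attached"))       # ordinary base + mark
    return result
```

Unlike the OpenType rule, this policy needs no ZWJ: a plain space already requests isolated rendering.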
*** Bug 121095 has been marked as a duplicate of this bug. ***
*** Bug 132378 has been marked as a duplicate of this bug. ***
*** Bug 127176 has been marked as a duplicate of this bug. ***
Bug 127176 contains a patch that renders invalid mark sequences on a dotted circle in the Arabic shaper.
Red Hat bug about Punjabi and dotted circle (which is implemented in Qt):
Closing as obsolete. This should be addressed in HarfBuzz, if ever.