GNOME Bugzilla – Bug 103938
feature request: better handling of invalid combinations
Last modified: 2012-08-18 17:10:26 UTC
Package: pango
Severity: enhancement
Version: 1.2.x
Synopsis: feature request: better handling of invalid combinations
Bugzilla-Product: pango
Bugzilla-Component: general

Description:
OpenType documentation[1] seems to say that invalid combinations should be handled as follows:
- If a combining character is preceded by a space and a zero width joiner, render it all by itself.
- Otherwise, render it on top of U+25CC DOTTED CIRCLE.
This way it will look the way it does in the unicode.org code charts.

Also worth noting is that the Unicode Standard 3.2, section 2.6, says: "By convention, diacritical marks used by the Unicode Standard may be exhibited in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO BREAK SPACE."

[1] http://microsoft.com/typography/otfntdev/indicot/other.htm, http://microsoft.com/typography/otfntdev/arabicot/other.htm

------- Bug moved to this database by unknown@bugzilla.gnome.org 2003-01-19 22:42 -------
Reassigning to the default owner of the component, otaylor@redhat.com.
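
For illustration only, here is a rough Python sketch of the rule described in the report. It is not Pango or Uniscribe code, and it only edits the character string, whereas a real shaper would insert a glyph. It assumes that a run of combining marks is "invalid" when it is not preceded by a letter, number, punctuation mark, or symbol, so a bare SPACE does not count as a base (the OpenType reading, not the Unicode one). The helper names (`is_valid_base`, `insert_dotted_circles`) are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"
ZWJ = "\u200d"

def is_combining(ch: str) -> bool:
    # Combining marks have General_Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")

def is_valid_base(ch: str) -> bool:
    # Assumption: letters, numbers, punctuation and symbols can carry marks;
    # SPACE, controls and format characters (including ZWJ) cannot.
    return unicodedata.category(ch)[0] in "LNPS"

def insert_dotted_circles(text: str) -> str:
    """Hypothetical helper: prefix U+25CC to each run of combining marks
    that has no valid base, unless the run follows SPACE + ZWJ."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if not is_combining(text[i]):
            out.append(text[i])
            i += 1
            continue
        # Collect the whole run of consecutive combining marks.
        start = i
        while i < n and is_combining(text[i]):
            i += 1
        run = text[start:i]
        if start > 0 and is_valid_base(text[start - 1]):
            out.append(run)                  # ordinary base + marks
        elif text[max(0, start - 2):start] == " " + ZWJ:
            out.append(run)                  # SPACE + ZWJ: show the marks by themselves
        else:
            out.append(DOTTED_CIRCLE + run)  # no usable base: supply a dotted circle
    return "".join(out)
```

Under these assumptions, `insert_dotted_circles(" \u200d\u0301")` leaves the text unchanged, while `insert_dotted_circles(" \u0301")` yields `" \u25cc\u0301"`. That second case, space plus diacritic, is exactly the one the Unicode text quoted above treats as a legitimate way to show a mark in isolation.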
Created attachment 18445: patch to improve the current behavior somewhat
The patch above doesn't do anything with dotted circles. It seems like that'd be pretty hard. Instead, it attempts to allot an appropriate amount of room for a sequence of combining characters applied to a space or not applied to anything (at the beginning of a line, for example).
Incidentally, as I read the standard, a space followed by a combining character isn't technically an invalid combination. Section 3.6, D17a:
"Defective combining character sequence: a combining character sequence that does not start with a base character.
- Defective combining character sequences occur when a sequence of combining characters appears at the start of a string or follows a control or format character."
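
A minimal Python check for the definition quoted above, offered as a sketch rather than a reference implementation. It follows only the Unicode 3.2 wording: characters with General_Category Mn, Mc or Me count as combining characters, and a run of them is flagged when it starts the string or follows a control (Cc) or format (Cf) character. `defective_sequences` is a hypothetical name used for illustration.

```python
import unicodedata

def is_combining(ch: str) -> bool:
    # Combining characters: General_Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")

def defective_sequences(text: str) -> list[tuple[int, int]]:
    """Return (start, end) index ranges of runs of combining characters
    that begin the string or follow a control (Cc) or format (Cf) character."""
    ranges = []
    i, n = 0, len(text)
    while i < n:
        if not is_combining(text[i]):
            i += 1
            continue
        start = i
        while i < n and is_combining(text[i]):
            i += 1
        # The character before the run is never itself a combining mark,
        # because the run above swallowed every consecutive mark.
        prev_cat = unicodedata.category(text[start - 1]) if start > 0 else None
        if prev_cat is None or prev_cat in ("Cc", "Cf"):
            ranges.append((start, i))
    return ranges
```

With this reading, `defective_sequences(" \u0301")` returns an empty list, since SPACE is a base character, while `defective_sequences("\u0301")` returns `[(0, 1)]`.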
My patch above is wrong. It doesn't handle the case where the base character and combining character, or different combining characters, come from different fonts.
What was the conclusion of the discussion about this on the Unicode mailing list? I thought some people were arguing that the OpenType interpretation was clearly incompatible with the Unicode spec. (Though it may well be the case that we should deviate from the Unicode spec as well if that is going to make things better for our users.)
In my opinion, the conclusion of the thread was that space+diacritic should show the diacritic in isolation, and a diacritic in isolation should be shown on a dotted circle. It's not crystal clear though. John Cowan does say, "This is a clear demonstration that Uniscribe fails to implement a standard correctly, a property unique neither to Microsoft nor to the Unicode Standard," in reference to the space+ZWJ thing.
*** Bug 121095 has been marked as a duplicate of this bug. ***
*** Bug 132378 has been marked as a duplicate of this bug. ***
*** Bug 127176 has been marked as a duplicate of this bug. ***
Bug 127176 contains a patch to do dotted-circle for invalid mark sequences in the Arabic shaper.
Red Hat bug about Punjabi and dotted circle (which is implemented in Qt): https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202181
Closing obsolete. Should be addressed in HarfBuzz if ever.