Bug 787229 - g_unichar_iszerowidth does not handle Prepended_Concatenation_Mark correctly
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: i18n
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2017-09-03 23:14 UTC by Mike Frysinger
Modified: 2018-05-24 19:47 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Mike Frysinger 2017-09-03 23:14:06 UTC
glib currently marks all Cf (Format) characters as zero width, but this ignores the Prepended_Concatenation_Mark codepoints. I guess gen-unicode-tables.pl should be consulting PropList.txt from the Unicode releases.
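
As a rough illustration of what consulting PropList.txt could involve, here is a minimal sketch in C (not Perl, and not taken from gen-unicode-tables.pl). It assumes the usual PropList.txt line shape, "XXXX..YYYY ; Property_Name # comment" or "XXXX ; Property_Name # comment", as in the lines quoted in this report. The function name parse_proplist_line is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of glib or its table generator.
 * Parses one PropList.txt-style line and, if it assigns the property
 * `prop`, stores the inclusive code point range in *first and *last. */
static bool
parse_proplist_line (const char *line, const char *prop,
                     uint32_t *first, uint32_t *last)
{
  unsigned int lo, hi;
  char name[64];

  if (sscanf (line, "%x..%x ; %63s", &lo, &hi, name) == 3)
    {
      /* Range form: "0600..0605 ; Property # ..." */
    }
  else if (sscanf (line, "%x ; %63s", &lo, name) == 2)
    {
      /* Single code point form: "06DD ; Property # ..." */
      hi = lo;
    }
  else
    {
      /* Blank lines, comment lines, or anything else we do not recognise. */
      return false;
    }

  if (strcmp (name, prop) != 0)
    return false;

  *first = lo;
  *last = hi;
  return true;
}
```

A generator could call this for every line of the file with prop set to "Prepended_Concatenation_Mark" and record the resulting ranges as exceptions to the "all Cf are zero width" rule.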

Specifically, these should all return false with g_unichar_iszerowidth:
0600..0605 ; Prepended_Concatenation_Mark # Cf  ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
06DD       ; Prepended_Concatenation_Mark # Cf  ARABIC END OF AYAH
070F       ; Prepended_Concatenation_Mark # Cf  SYRIAC ABBREVIATION MARK
08E2       ; Prepended_Concatenation_Mark # Cf  ARABIC DISPUTED END OF AYAH
110BD      ; Prepended_Concatenation_Mark # Cf  KAITHI NUMBER SIGN
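
To make the expected behaviour concrete, the list above can be written as a small predicate. This is only a sketch under stated assumptions: both function names are hypothetical, the ranges are hard-coded from the five lines quoted above (a real fix would presumably generate them from PropList.txt), and the is_format_char parameter stands in for "General_Category is Cf", which glib would look up in its own tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of glib: true for the code points
 * listed above as Prepended_Concatenation_Mark. */
static bool
is_prepended_concatenation_mark (uint32_t c)
{
  return (c >= 0x0600 && c <= 0x0605) ||  /* ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE */
         c == 0x06DD ||                   /* ARABIC END OF AYAH */
         c == 0x070F ||                   /* SYRIAC ABBREVIATION MARK */
         c == 0x08E2 ||                   /* ARABIC DISPUTED END OF AYAH */
         c == 0x110BD;                    /* KAITHI NUMBER SIGN */
}

/* Hypothetical sketch of the Cf part of a zero-width check.
 * is_format_char is an assumption standing in for a real
 * General_Category lookup. */
static bool
format_char_is_zerowidth (uint32_t c, bool is_format_char)
{
  return is_format_char && !is_prepended_concatenation_mark (c);
}
```

With this exception in place, a format character such as U+200B ZERO WIDTH SPACE would still count as zero width, while U+06DD would not.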

Unicode 10.0.0, chapter 9, section 2, pages 377-378 [1] states:
Signs Spanning Numbers. Several other special signs are written in association with numbers in the Arabic script. All of these signs can span multiple-digit numbers, rather than just a single digit. They are not formally considered combining marks in the sense used by the Unicode Standard, although they clearly interact graphically with their associated sequence of digits. In the text representation they precede the sequence of digits that they span, rather than follow a base character, as would be the case for a combining mark. Their General_Category value is Cf (format character). Unlike most other format characters, however, they should be rendered with a visible glyph, even in circumstances where no suitable digit or sequence of digits follows them in logical order. The characters have the Bidi_Class value of Arabic_Number to make them appear in the same run as the numbers following them.

A few similar signs spanning numbers or letters are associated with scripts other than Arabic. See the discussion of U+070F SYRIAC ABBREVIATION MARK in Section 9.3, Syriac, and the discussion of U+110BD KAITHI NUMBER SIGN in Section 15.2, Kaithi. All of these prefixed format controls, including the non-Arabic ones, are given the property value Prepended_Concatenation_Mark=True, to identify them as a class. They also have special behavior in text segmentation. (See Unicode Standard Annex #29, “Unicode Text Segmentation.”)

[1] http://unicode.org/versions/Unicode10.0.0/ch09.pdf
Comment 1 GNOME Infrastructure Team 2018-05-24 19:47:08 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1286.