Bug 541507 - Ambiguous description of assigned characters in the Glib Unicode Manipulation reference
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: docs
Version: 2.17.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtk-bugs
QA Contact: gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2008-07-04 07:52 UTC by Eugene Shatokhin
Modified: 2008-07-04 18:05 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Eugene Shatokhin 2008-07-04 07:52:25 UTC
The description of the g_unichar_isdefined function states:

Returns : TRUE if the character has an assigned value

What is called a "character" here corresponds to a "code point" in the Unicode standard version 5.0 (http://www.unicode.org/versions/Unicode5.0.0/) and later. The standard says the following about the assignment of code points (http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf, Chapter 2 "General Structure", section 2.4 "Code Points and Characters"):

Not all assigned code points represent abstract characters; only Graphic, Format, Control and Private-use do. Surrogates and Noncharacters are assigned code points but are not assigned to abstract characters.

The meaning of the term "assigned" is unclear in the description of the g_unichar_isdefined function.

If "assigned" refers to code points assigned to abstract characters, g_unichar_isdefined should return FALSE for code points in the "Surrogates" and "Noncharacters" groups.

If "assigned" refers simply to assigned code points, g_unichar_isdefined should return TRUE for code points in the "Surrogates" and "Noncharacters" groups.

However, in glib up to and including 2.17.3, g_unichar_isdefined returns TRUE for "Surrogates" but FALSE for "Noncharacters" code points.

For example, one may check the return value of g_unichar_isdefined when it is called for the following code points: 
- 0xD800 (U+D800) - "surrogate" code point, the function returns TRUE 
- 0xFDD0 (U+FDD0) - "noncharacter" code point, the function returns FALSE

That is, the actual behaviour of the g_unichar_isdefined function matches neither of the two meanings of "assigned" given in the Unicode standard.
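
To make the two readings concrete, here is a minimal sketch in plain C that does not use glib. The helper names (is_surrogate, is_noncharacter, expected_isdefined) are hypothetical and exist only for illustration. The code point ranges are those defined by the Unicode standard. The sketch only covers surrogates and noncharacters; for any other code point the answer depends on the Unicode Character Database, which is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these are glib API. */

/* Surrogate code points occupy U+D800..U+DFFF. */
static bool is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Noncharacters are U+FDD0..U+FDEF plus the last two code points
 * (xxFFFE and xxFFFF) of each of the 17 planes. */
static bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;               /* outside the Unicode code space */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    return (cp & 0xFFFE) == 0xFFFE; /* low 16 bits are FFFE or FFFF */
}

/* What a consistent "is defined" check would return under each reading.
 * abstract_characters_only = true:  "assigned to an abstract character"
 * abstract_characters_only = false: "assigned code point"
 * Returns 1 or 0 when the reading alone settles the answer, and -1 when
 * the answer depends on the Unicode Character Database. */
static int expected_isdefined(uint32_t cp, bool abstract_characters_only)
{
    if (is_surrogate(cp) || is_noncharacter(cp))
        return abstract_characters_only ? 0 : 1;
    return -1;
}
```

Under either reading, this sketch gives the same answer for U+D800 and U+FDD0 (both 0, or both 1). The observed results, TRUE for U+D800 and FALSE for U+FDD0, match neither reading.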

If this is intentional and "has an assigned value" means something different in the description of g_unichar_isdefined than in the Unicode standard, it should be stated explicitly to avoid confusion.
Comment 1 Behdad Esfahbod 2008-07-04 18:05:26 UTC
2008-07-04  Behdad Esfahbod  <behdad@gnome.org>

        Bug 541507 – Ambiguous description of assigned characters in the Glib
        Unicode Manipulation reference

        * glib/guniprop.c
        (g_unichar_isgraph): Return true for PrivateUse too.
        (g_unichar_isprint): Return true for PrivateUse too.
        (g_unichar_isdefined): Return false for Surrogate.