Bug 107427 - some invalid characters considered valid, in g_unichar_validate and elsewhere
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 109378
Depends on:
Blocks:
 
 
Reported: 2003-03-03 00:21 UTC by Noah Levitt
Modified: 2011-02-18 15:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
I think this fixes it. (644 bytes, patch)
2003-03-03 00:23 UTC, Noah Levitt
Sorry, use this one. Should have the parens around (Char) (644 bytes, patch)
2003-03-03 00:26 UTC, Noah Levitt
argh, that was the same as the first one. This one is fixed, promise. (648 bytes, patch)
2003-03-03 00:28 UTC, Noah Levitt

Description Noah Levitt 2003-03-03 00:21:46 UTC
According to http://www.unicode.org/reports/tr27/

  There are 34 specific code points in Unicode 3.0 that are characterized as
  noncharacters. Unicode 3.1 adds an additional 32 noncharacters. To clarify
  the status of all 66, a definition (page 41) is added, and conformance
  rules C5 and C10 (pages 38, 39) are amended as follows:

    D7b  Noncharacter: a code point that is permanently reserved for
         internal use, and that should never be interchanged. In
         Unicode 3.1, these consist of the values U+nFFFE and U+nFFFF
         (where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.

g_unichar_validate() should return FALSE for all of these, in addition to
the surrogates.
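
A minimal sketch of the rule requested above, written as standalone C rather than as the attached patch. The helper names (is_noncharacter, is_surrogate, is_valid_scalar, count_noncharacters) are hypothetical and are not GLib API; the ranges come straight from the quoted definition D7b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GLib API: true if c is one of the 66
 * noncharacters listed in definition D7b. */
static bool is_noncharacter(uint32_t c)
{
    /* U+FDD0..U+FDEF: 32 code points. */
    if (c >= 0xFDD0 && c <= 0xFDEF)
        return true;
    /* U+nFFFE and U+nFFFF for n = 0x0..0x10: 34 code points. */
    return c <= 0x10FFFF && (c & 0xFFFF) >= 0xFFFE;
}

/* Hypothetical helper: true if c is a surrogate, U+D800..U+DFFF. */
static bool is_surrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDFFF;
}

/* Illustrative validity rule: in range, not a surrogate,
 * and not a noncharacter. */
static bool is_valid_scalar(uint32_t c)
{
    return c <= 0x10FFFF && !is_surrogate(c) && !is_noncharacter(c);
}

/* Counts the noncharacters in 0..0x10FFFF by brute force. */
static int count_noncharacters(void)
{
    int n = 0;
    for (uint32_t c = 0; c <= 0x10FFFF; c++)
        if (is_noncharacter(c))
            n++;
    return n;
}

/* Sanity check against the figure of 66 quoted from TR27. */
static void check_noncharacter_count(void)
{
    assert(count_noncharacters() == 66);
}
```

Calling check_noncharacter_count() confirms that the two ranges together cover 32 + 34 = 66 code points.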
Comment 1 Noah Levitt 2003-03-03 00:23:09 UTC
Created attachment 14740
I think this fixes it.
Comment 2 Noah Levitt 2003-03-03 00:26:54 UTC
Created attachment 14741
Sorry, use this one. Should have the parens around (Char)
Comment 3 Noah Levitt 2003-03-03 00:28:29 UTC
Created attachment 14742
argh, that was the same as the first one. This one is fixed, promise.
Comment 4 Owen Taylor 2003-03-28 18:57:27 UTC
Actually, it's only 2 in 3.0 - the nFFFE/FFFF weren't
added until 3.0.1.

The patch looks OK to me to commit (both glib-2-2 and
HEAD.) 

Since performance here is actually quite important,
I'll suggest one optimization:

(a & 0xffff) != 0xfffe && (a & 0xffff) != 0xffff

Is the same as:

(a & 0xfffe) != 0xfffe.
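
The equivalence holds because the low 16 bits equal 0xfffe or 0xffff exactly when bits 1 through 15 are all set. As a sketch, it can be confirmed by brute force over every code point; the function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts code points in 0..0x10FFFF where the
 * two-comparison test and the single-mask test disagree. */
static unsigned long mask_check_mismatches(void)
{
    unsigned long mismatches = 0;
    for (uint32_t a = 0; a <= 0x10FFFF; a++) {
        int two_compares = (a & 0xffff) != 0xfffe && (a & 0xffff) != 0xffff;
        int one_mask     = (a & 0xfffe) != 0xfffe;
        if (two_compares != one_mask)
            mismatches++;
    }
    return mismatches;
}

/* Aborts if the two forms ever disagree. */
static void verify_mask_check(void)
{
    assert(mask_check_mismatches() == 0);
}
```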

(Hmmm, I guess the surrogate check could also
be done like that:

 ((Char) < 0xd800 || (Char) >= 0xe000)

is, if I'm not mistaken, the same as:

 ((Char) & 0xfffff800) != 0xd800

You'd have to time that to see if it is 
a performance win or not.)
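
Before timing it, the surrogate rewrite can be checked for correctness in the same brute-force way. This is a sketch with hypothetical function names. It assumes the range test is meant to exclude exactly U+D800..U+DFFF, that is, with `>= 0xe000` as the upper comparison; a strict `> 0xe000` would also reject U+E000 and so would differ from the mask form at that one code point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts code points in 0..0x10FFFF where a range
 * test disagrees with the mask test. If strict_upper is nonzero the
 * range test uses "> 0xe000", otherwise it uses ">= 0xe000". */
static unsigned long surrogate_check_mismatches(int strict_upper)
{
    unsigned long mismatches = 0;
    for (uint32_t c = 0; c <= 0x10FFFF; c++) {
        int by_range = strict_upper ? (c < 0xd800 || c > 0xe000)
                                    : (c < 0xd800 || c >= 0xe000);
        int by_mask  = (c & 0xfffff800) != 0xd800;
        if (by_range != by_mask)
            mismatches++;
    }
    return mismatches;
}

/* Aborts unless the ">=" form matches the mask form everywhere and the
 * ">" form differs at exactly one code point (U+E000). */
static void verify_surrogate_check(void)
{
    assert(surrogate_check_mismatches(0) == 0);
    assert(surrogate_check_mismatches(1) == 1);
}
```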
Comment 5 Matthias Clasen 2003-03-30 21:52:31 UTC
Committed to both branches with the changed surrogate check, which
seemed to be a small performance win. 
Comment 6 Noah Levitt 2003-05-20 23:34:54 UTC
*** Bug 109378 has been marked as a duplicate of this bug. ***