GNOME Bugzilla – Bug 107427
some invalid characters considered valid, in g_unichar_validate and elsewhere
Last modified: 2011-02-18 15:57:04 UTC
According to http://www.unicode.org/reports/tr27/
There are 34 specific code points in Unicode 3.0 that are characterized as
noncharacters. Unicode 3.1 adds an additional 32 noncharacters. To clarify
the status of all 66, a definition (page 41) is added, and conformance
rules C5 and C10 (pages 38, 39) are amended as follows:
D7b Noncharacter: a code point that is permanently reserved for
internal use, and that should never be interchanged. In
Unicode 3.1, these consist of the values U+nFFFE and U+nFFFF
(where n is from 0 to 10₁₆) and the values U+FDD0..U+FDEF.
g_unichar_validate() should return false for all of these, in addition to
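To make the quoted definition concrete, here is a minimal sketch of a noncharacter test. The names is_noncharacter and count_noncharacters are hypothetical helpers written for this illustration; they are not GLib API and not the attached patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GLib: true if cp is one of the
 * noncharacters described in the definition quoted above. */
bool
is_noncharacter (uint32_t cp)
{
  if (cp > 0x10FFFF)
    return false;                 /* outside the Unicode code space */
  if (cp >= 0xFDD0 && cp <= 0xFDEF)
    return true;                  /* the contiguous block of 32 */
  return (cp & 0xFFFF) >= 0xFFFE; /* U+nFFFE and U+nFFFF, n = 0..0x10 */
}

/* Counts noncharacters over the whole code space 0..0x10FFFF. */
int
count_noncharacters (void)
{
  int count = 0;
  for (uint32_t cp = 0; cp <= 0x10FFFF; cp++)
    if (is_noncharacter (cp))
      count++;
  return count;
}
```

Counting over the full code space gives 32 + 17 * 2 = 66, which matches the "all 66" in the quoted text.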
Created attachment 14740
I think this fixes it.
Created attachment 14741
Sorry, use this one. Should have the parens around (Char)
Created attachment 14742
argh, that was the same as the first one. This one is fixed, promise.
Actually, it's only 2 in 3.0 - the nFFFE/FFFF weren't
added until 3.0.1.
The patch looks OK to me to commit (both glib-2-2 and
Since performance here is actually quite important,
I'll suggest one optimization:
(a & 0xffff) != 0xfffe && (a & 0xffff) != 0xffff
Is the same as:
(a & 0xfffe) != 0xfffe.
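The equivalence holds because 0xfffe and 0xffff differ only in the lowest bit, so masking that bit away folds both comparisons into one. A brute-force sketch that checks this over every code point follows; count_fffe_mismatches is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Returns how many values in 0..0x10FFFF make the two-comparison form
 * and the single-mask form disagree. */
unsigned
count_fffe_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t a = 0; a <= 0x10FFFF; a++)
    {
      int two_tests = (a & 0xffff) != 0xfffe && (a & 0xffff) != 0xffff;
      int one_test  = (a & 0xfffe) != 0xfffe;
      if (two_tests != one_test)
        mismatches++;
    }
  return mismatches;
}
```

The function returns 0, meaning the two forms agree everywhere in the code space.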
(Hmmm, I guess the surrogate check could also
be done like that:
((Char) < 0xd800 || (Char) >= 0xe000)
is, if I'm not mistaken, the same as:
((Char) & 0xfffff800) != 0xd800
You'd have to time that to see if it is
a performance win or not.)
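The surrogate check can be verified the same way. This sketch assumes the range test is meant to exclude exactly the surrogate range U+D800..U+DFFF; count_surrogate_mismatches is a hypothetical name for this illustration. It checks equivalence only and says nothing about speed.

```c
#include <stdint.h>

/* Returns how many values in 0..0x10FFFF make the range form and the
 * mask form of the "not a surrogate" test disagree. */
unsigned
count_surrogate_mismatches (void)
{
  unsigned mismatches = 0;
  for (uint32_t c = 0; c <= 0x10FFFF; c++)
    {
      /* Range form: outside U+D800..U+DFFF (assumed intent). */
      int range_form = c < 0xd800 || c > 0xdfff;
      /* Mask form: the upper 21 bits are not those of 0xd800. */
      int mask_form  = (c & 0xfffff800) != 0xd800;
      if (range_form != mask_form)
        mismatches++;
    }
  return mismatches;
}
```

The function returns 0. Note that the mask form accepts U+E000, so the range form it replaces must accept U+E000 as well for the two to be equivalent.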
Committed to both branches with the changed surrogate check, which
seemed to be a small performance win.
*** Bug 109378 has been marked as a duplicate of this bug. ***