Bug 694669 - consider unicode corrigendum #9
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 690531
Depends on:
Blocks:
 
 
Reported: 2013-02-25 13:49 UTC by Christian Persch
Modified: 2013-11-04 10:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (1.92 KB, patch)
2013-02-25 13:49 UTC, Christian Persch
accepted-commit_now
tests: clean up for Unicode corrigendum #9 (1.84 KB, patch)
2013-03-12 16:39 UTC, Allison Karlitskaya (desrt)
committed
tests: unicode-encoding: Update for unicode corrigendum #9 (807 bytes, patch)
2013-03-18 22:23 UTC, Christian Persch
none

Description Christian Persch 2013-02-25 13:49:37 UTC
Created attachment 237356
patch

From gutf8.c:

/*
 * Check whether a Unicode (5.2) char is in a valid range.
 *
 * The first check comes from the Unicode guarantee to never encode
 * a point above 0x0010ffff, since UTF-16 couldn't represent it.
 * 
 * The second check covers surrogate pairs (category Cs).
 * 
 * The last two checks cover "Noncharacter": defined as:
 *   "A code point that is permanently reserved for
 *    internal use, and that should never be interchanged. In
 *    Unicode 3.1, these consist of the values U+nFFFE and U+nFFFF
 *    (where n is from 0 to 10_16) and the values U+FDD0..U+FDEF."
 *
 * @param Char the character
 */
#define UNICODE_VALID(Char)                   \
    ((Char) < 0x110000 &&                     \
     (((Char) & 0xFFFFF800) != 0xD800) &&     \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) &&  \
     ((Char) & 0xFFFE) != 0xFFFE)
   

Unicode Corrigendum #9 [http://www.unicode.org/versions/corrigendum9.html] strikes the "and that should never be interchanged" clause, so IMHO we should update this code to allow the noncharacters through.
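
To make the proposal concrete, here is a minimal sketch of what dropping the two noncharacter checks could look like. The attached patch is not reproduced in this thread, so this may differ from what was actually committed. The names UNICODE_VALID_STRICT, UNICODE_VALID_RELAXED, is_noncharacter, valid_strict and valid_relaxed are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The check quoted above from gutf8.c, renamed for comparison:
 * rejects out-of-range values, surrogates, and noncharacters. */
#define UNICODE_VALID_STRICT(Char)            \
    ((Char) < 0x110000 &&                     \
     (((Char) & 0xFFFFF800) != 0xD800) &&     \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) &&  \
     ((Char) & 0xFFFE) != 0xFFFE)

/* Hypothetical relaxed form (an assumption, not the committed code):
 * keeps the range and surrogate checks, lets noncharacters through. */
#define UNICODE_VALID_RELAXED(Char)           \
    ((Char) < 0x110000 &&                     \
     (((Char) & 0xFFFFF800) != 0xD800))

/* True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane. */
static inline bool is_noncharacter(uint32_t c)
{
    return c < 0x110000 &&
           ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE);
}

/* Function wrappers so the macros can be called with a typed argument. */
static inline bool valid_strict(uint32_t c)
{
    return UNICODE_VALID_STRICT(c);
}

static inline bool valid_relaxed(uint32_t c)
{
    return UNICODE_VALID_RELAXED(c);
}
```

With these definitions, valid_strict(0xFFFE) is false while valid_relaxed(0xFFFE) is true. Surrogates such as 0xD800 and values from 0x110000 upward are rejected by both.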
Comment 1 Matthias Clasen 2013-03-02 01:56:10 UTC
Review of attachment 237356:

seems right
Comment 2 Christian Persch 2013-03-05 16:28:37 UTC
Pushed to master.
Comment 3 Allison Karlitskaya (desrt) 2013-03-12 16:00:37 UTC
This regresses the test suite:

/utf8/validate/29: **
GLib:ERROR:utf8-validate.c:285:do_test: assertion failed: (result == test->valid)
Comment 4 Allison Karlitskaya (desrt) 2013-03-12 16:22:15 UTC
After reading the corrigendum, it is utterly clear that the unexpected passing of this testcase is the entire point of the change.  I'll update the test.
Comment 5 Allison Karlitskaya (desrt) 2013-03-12 16:39:50 UTC
Created attachment 238709
tests: clean up for Unicode corrigendum #9

Unicode corrigendum #9 spells out in no uncertain terms that on
conversion interfaces we should not reject characters like U+FFFE and
U+FFFF which we were doing before.

Commit f91ef4ef15d220f6899c97aaf5b1c0a8f68cfe9a started accepting these
characters, but we had some testcases that were checking that strings
containing these characters should be rejected.

Update the tests.
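
As an illustration of the kind of expectation change described here, the following is a rough, self-contained sketch of a table-driven check in which strings containing noncharacters are expected to validate. It does not reproduce utf8-validate.c or the input of test 29, neither of which is shown in this thread. The Utf8Case struct, the decode3 and validate_relaxed helpers, and the sample table are all hypothetical. The decoder handles only ASCII and three-byte sequences to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, loosely modelled on the `test->valid`
 * field visible in the assertion message; not the real GLib struct. */
typedef struct {
    const char *text;  /* NUL-terminated UTF-8 input */
    bool valid;        /* expected result under corrigendum #9 */
} Utf8Case;

/* Decode one three-byte UTF-8 sequence (U+0800..U+FFFF).
 * Returns false for malformed or overlong input. */
static inline bool decode3(const unsigned char *p, uint32_t *out)
{
    if ((p[0] & 0xF0) != 0xE0 ||
        (p[1] & 0xC0) != 0x80 ||
        (p[2] & 0xC0) != 0x80)
        return false;
    *out = ((uint32_t)(p[0] & 0x0F) << 12) |
           ((uint32_t)(p[1] & 0x3F) << 6) |
           (uint32_t)(p[2] & 0x3F);
    return *out >= 0x0800;
}

/* Simplified validator: accepts ASCII and three-byte sequences,
 * rejects surrogates, and no longer rejects noncharacters. */
static inline bool validate_relaxed(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        uint32_t c;
        if (*p < 0x80) {
            p++;
            continue;
        }
        if (!decode3(p, &c))
            return false;
        if ((c & 0xFFFFF800) == 0xD800)
            return false;  /* surrogates stay invalid */
        p += 3;
    }
    return true;
}

/* Sample expectations (assumed, for illustration only). */
static const Utf8Case cases[] = {
    { "\xef\xbf\xbe", true  },  /* U+FFFE, previously expected invalid */
    { "\xef\xbf\xbf", true  },  /* U+FFFF, previously expected invalid */
    { "\xef\xb7\x90", true  },  /* U+FDD0 */
    { "\xed\xa0\x80", false },  /* U+D800, a surrogate */
};

/* Count table entries whose result differs from the expectation. */
static inline size_t count_mismatches(void)
{
    size_t bad = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if (validate_relaxed(cases[i].text) != cases[i].valid)
            bad++;
    return bad;
}
```

In a real GLib test the validator would be g_utf8_validate() rather than this hand-rolled helper. The sketch only shows which expectations flip and which (surrogates) stay as they were.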
Comment 6 Christian Persch 2013-03-12 16:45:49 UTC
Comment on attachment 238709
tests: clean up for Unicode corrigendum #9

Looks good to me, thanks for catching the problem. I only ran the 'unicode' test (computer trouble).
Comment 7 Allison Karlitskaya (desrt) 2013-03-12 16:47:24 UTC
Attachment 238709 pushed as e359bc0 - tests: clean up for Unicode corrigendum #9
Comment 8 Christian Persch 2013-03-18 22:23:30 UTC
Created attachment 239212
tests: unicode-encoding: Update for unicode corrigendum #9
Comment 9 Behdad Esfahbod 2013-11-04 10:40:01 UTC
*** Bug 690531 has been marked as a duplicate of this bug. ***