GNOME Bugzilla – Bug 694669
consider unicode corrigendum #9
Last modified: 2013-11-04 10:40:01 UTC
Created attachment 237356
patch

From gutf8.c:

/*
 * Check whether a Unicode (5.2) char is in a valid range.
 *
 * The first check comes from the Unicode guarantee to never encode
 * a point above 0x0010ffff, since UTF-16 couldn't represent it.
 *
 * The second check covers surrogate pairs (category Cs).
 *
 * The last two checks cover "Noncharacter": defined as:
 *   "A code point that is permanently reserved for
 *    internal use, and that should never be interchanged. In
 *    Unicode 3.1, these consist of the values U+nFFFE and U+nFFFF
 *    (where n is from 0 to 10_16) and the values U+FDD0..U+FDEF."
 *
 * @param Char the character
 */
#define UNICODE_VALID(Char)                   \
    ((Char) < 0x110000 &&                     \
     (((Char) & 0xFFFFF800) != 0xD800) &&     \
     ((Char) < 0xFDD0 || (Char) > 0xFDEF) &&  \
     ((Char) & 0xFFFE) != 0xFFFE)

Unicode Corrigendum #9 [http://www.unicode.org/versions/corrigendum9.html] strikes the "and that should never be interchanged" clause, so IMHO we should update this code to allow the noncharacters through.
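To make the proposal concrete, here is a minimal sketch of the check before and after dropping the noncharacter tests. The function names unicode_valid_strict and unicode_valid_relaxed are hypothetical and used only for illustration; the attached patch itself is not reproduced here, so treat the relaxed version as an assumption about what "allow the noncharacters through" amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: same four checks as the UNICODE_VALID macro quoted above. */
static bool unicode_valid_strict(uint32_t c)
{
    return c < 0x110000 &&                 /* not above U+10FFFF */
           (c & 0xFFFFF800) != 0xD800 &&   /* not a surrogate (U+D800..U+DFFF) */
           (c < 0xFDD0 || c > 0xFDEF) &&   /* not in U+FDD0..U+FDEF */
           (c & 0xFFFE) != 0xFFFE;         /* not U+nFFFE or U+nFFFF */
}

/* Hypothetical name: assumed relaxed check, keeping only the range and
 * surrogate tests so that noncharacters are accepted. */
static bool unicode_valid_relaxed(uint32_t c)
{
    return c < 0x110000 &&
           (c & 0xFFFFF800) != 0xD800;
}

/* Noncharacters flip from rejected to accepted; everything else is unchanged. */
static void check_examples(void)
{
    assert(!unicode_valid_strict(0xFFFE)  && unicode_valid_relaxed(0xFFFE));
    assert(!unicode_valid_strict(0xFDD0)  && unicode_valid_relaxed(0xFDD0));
    assert(!unicode_valid_strict(0x1FFFF) && unicode_valid_relaxed(0x1FFFF));
    assert(!unicode_valid_strict(0xD800)  && !unicode_valid_relaxed(0xD800));
    assert(!unicode_valid_strict(0x110000) && !unicode_valid_relaxed(0x110000));
    assert(unicode_valid_strict(0x41)     && unicode_valid_relaxed(0x41));
}
```

Surrogates and values above U+10FFFF are still rejected in both versions; only the noncharacter code points change behaviour.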
Review of attachment 237356:

Seems right.
Pushed to master.
This regresses the test suite:

/utf8/validate/29: **
GLib:ERROR:utf8-validate.c:285:do_test: assertion failed: (result == test->valid)
After reading the corrigendum, it is utterly clear that the unexpected passing of this testcase is the entire point of the change. I'll update the test.
Created attachment 238709
tests: clean up for Unicode corrigendum #9

Unicode corrigendum #9 spells out in no uncertain terms that on conversion interfaces we should not reject characters like U+FFFE and U+FFFF, which we were doing before.

Commit f91ef4ef15d220f6899c97aaf5b1c0a8f68cfe9a started accepting these characters, but we had some testcases that were checking that strings containing these characters should be rejected.

Update the tests.
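For readers following along, the sketch below shows what such a string looks like at the byte level. The helper decode_3byte is hypothetical and is not GLib code; the byte sequences are simply the standard UTF-8 encodings of U+FFFE and U+FFFF, and whether the affected testcases use exactly these bytes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one three-byte UTF-8 sequence.
 * Returns the code point, or -1 if the bytes are malformed, overlong,
 * or encode a surrogate. Noncharacters are deliberately NOT rejected. */
static int32_t decode_3byte(const unsigned char *s)
{
    if ((s[0] & 0xF0) != 0xE0 ||      /* lead byte must be 1110xxxx */
        (s[1] & 0xC0) != 0x80 ||      /* continuation bytes must be 10xxxxxx */
        (s[2] & 0xC0) != 0x80)
        return -1;

    int32_t c = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);

    if (c < 0x800)
        return -1;                    /* overlong encoding */
    if ((c & 0xF800) == 0xD800)
        return -1;                    /* surrogate, still invalid */
    return c;
}

static void check_noncharacters(void)
{
    const unsigned char fffe[] = { 0xEF, 0xBF, 0xBE };
    const unsigned char ffff[] = { 0xEF, 0xBF, 0xBF };
    const unsigned char surrogate[] = { 0xED, 0xA0, 0x80 };

    assert(decode_3byte(fffe) == 0xFFFE);     /* accepted after the change */
    assert(decode_3byte(ffff) == 0xFFFF);     /* accepted after the change */
    assert(decode_3byte(surrogate) == -1);    /* unaffected by the change */
}
```

A test that expected these inputs to be rejected would now fail, which matches the kind of assertion failure reported above.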
Comment on attachment 238709
tests: clean up for Unicode corrigendum #9

Looks good to me, thanks for catching the problem. I only ran the 'unicode' test (computer trouble).
Attachment 238709 pushed as e359bc0 - tests: clean up for Unicode corrigendum #9
Created attachment 239212
tests: unicode-encoding: Update for unicode corrigendum #9
*** Bug 690531 has been marked as a duplicate of this bug. ***