GNOME Bugzilla – Bug 123156
Unicode NFKC normalization doesn't do algorithmic hangul composition
Last modified: 2004-12-22 21:47:04 UTC
Hello,

GNU Libidn uses code from GLIB to implement Unicode NFKC normalization. During interoperability testing, it was discovered that the NFKC operation may have a problem. I have reproduced the problem in GLIB standalone as well.

The problem is that Hangul composition is not performed. See:

http://www.unicode.org/reports/tr15/tr15-22.html#Hangul

for details on how to implement this.

Unicode Inc. actually has test vectors for Unicode normalization, and I think it may be useful to test GLIB against them. I suspect this would have caught this problem. See:

http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt

A simple test case is the following code snippet. If I can be of any assistance in tracking down or analyzing this problem, let me know. I'm currently working on a fix, but thought I should let you know ASAP.

    #include <stdio.h>
    #include <string.h>   /* strlen */
    #include <glib.h>

    /* Unicode NFKC of algorithmic Hangul composition, by Simon Josefsson */
    int
    main ()
    {
      const gchar *in = "\xef\xbf\xbd\xc3\xad\x64\x6e";
      gchar *out;
      size_t i;

      /* Dump the input bytes in hex. */
      printf("in: ");
      for (i = 0; in[i]; i++)
        printf("%02x ", in[i] & 0xFF);
      printf("\n");

      out = g_utf8_normalize (in, strlen(in), G_NORMALIZE_NFKC);

      /* Should (?) result in \x2e\xea\xb0\x81. */
      printf("out: ");
      for (i = 0; out[i]; i++)
        printf("%02x ", out[i] & 0xFF);
      printf("\n");

      g_free (out);
      return 0;
    }
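For reference, the algorithmic Hangul composition step that the report says is missing can be sketched roughly as follows. This is a minimal illustration based on the constants in the Unicode Standard's Hangul syllable composition algorithm, not the GLIB code and not the patch attached to bug 100456. The names hangul_compose_pair and hangul_compose are hypothetical, and the sketch works on an array of already-decoded code points rather than on UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul composition algorithm. */
enum {
    S_BASE = 0xAC00,               /* first precomposed syllable */
    L_BASE = 0x1100,               /* first leading consonant */
    V_BASE = 0x1161,               /* first vowel */
    T_BASE = 0x11A7,               /* one before the first trailing consonant */
    L_COUNT = 19,
    V_COUNT = 21,
    T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT,   /* 588 */
    S_COUNT = L_COUNT * N_COUNT    /* 11172 */
};

/* Hypothetical helper: try to compose the pair (a, b).
 * Returns 1 and stores the syllable in *result on success, else 0. */
int hangul_compose_pair(uint32_t a, uint32_t b, uint32_t *result)
{
    /* Leading consonant + vowel -> LV syllable. */
    if (a >= L_BASE && a < L_BASE + L_COUNT &&
        b >= V_BASE && b < V_BASE + V_COUNT) {
        *result = S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT;
        return 1;
    }
    /* LV syllable (no trailing consonant yet) + trailing consonant -> LVT. */
    if (a >= S_BASE && a < S_BASE + S_COUNT &&
        (a - S_BASE) % T_COUNT == 0 &&
        b > T_BASE && b < T_BASE + T_COUNT) {
        *result = a + (b - T_BASE);
        return 1;
    }
    return 0;
}

/* Hypothetical helper: compose Hangul jamo in place in a code point buffer.
 * Returns the new length. Handles only the Hangul step; a full NFKC
 * implementation performs this as part of canonical composition. */
size_t hangul_compose(uint32_t *buf, size_t len)
{
    size_t out = 0, i;

    for (i = 0; i < len; i++) {
        uint32_t composed;

        if (out > 0 && hangul_compose_pair(buf[out - 1], buf[i], &composed))
            buf[out - 1] = composed;   /* merge into the previous code point */
        else
            buf[out++] = buf[i];       /* keep as-is */
    }
    return out;
}
```

With this sketch, the sequence U+002E U+1100 U+1161 U+11A8 composes to U+002E U+AC01, whose UTF-8 encoding is 2e ea b0 81. That appears to match the bytes the comment in the test case expects, although the input bytes shown in the snippet above are a different sequence.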
*** This bug has been marked as a duplicate of 100456 ***
There's a patch attached to bug 100456.