Bug 123156 - Unicode NFKC normalization doesn't do algorithmic hangul composition
Status: RESOLVED DUPLICATE of bug 100456
Product: glib
Classification: Platform
Component: general
Version: 2.0.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Reported: 2003-09-25 00:18 UTC by Simon Josefsson
Modified: 2004-12-22 21:47 UTC
GNOME target: ---
GNOME version: ---



Description Simon Josefsson 2003-09-25 00:18:12 UTC
Hello,

GNU Libidn uses code from GLIB to implement Unicode NFKC normalization. 
During interoperability testing, it was discovered that the NFKC operation
may have a problem.  I have reproduced the problem in GLIB standalone as
well.  The problem is that Hangul composition is not performed.  See:

http://www.unicode.org/reports/tr15/tr15-22.html#Hangul

for details on how to implement this.  Unicode, Inc. also publishes test
vectors for Unicode normalization, and I think it would be useful to test GLIB
against them.  I suspect they would have caught this problem.  See:

http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
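As a rough sketch of how such testing could work: in current versions of NormalizationTest.txt, each data line has five semicolon-separated columns (c1 source, c2 NFC, c3 NFD, c4 NFKC, c5 NFKD), each a space-separated list of hex code points, and the file header states the invariant c4 == toNFKC(c1) == toNFKC(c2) == toNFKC(c3) == toNFKC(c4) == toNFKC(c5). The C below parses one line and checks that invariant. The names `nt_parse_line`, `nt_check_nfkc`, `struct nt_line`, and the `nfkc_fn` normalizer interface are hypothetical and not part of GLib; to test GLib, one would supply an adapter that converts code points to UTF-8, calls `g_utf8_normalize` with `G_NORMALIZE_NFKC`, and converts back.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NT_FIELDS 5    /* c1 source, c2 NFC, c3 NFD, c4 NFKC, c5 NFKD */
#define NT_MAX_CPS 32  /* arbitrary per-column limit for this sketch */

/* One parsed data line of NormalizationTest.txt (hypothetical type). */
struct nt_line {
  uint32_t cp[NT_FIELDS][NT_MAX_CPS];
  size_t len[NT_FIELDS];
};

/* Parse "c1;c2;c3;c4;c5; # comment" into OUT.  Returns 1 on success and
   0 for comments, "@Part" headers, blank, malformed or over-long lines. */
int
nt_parse_line (const char *line, struct nt_line *out)
{
  const char *p = line;

  if (*p == '#' || *p == '@' || *p == '\n' || *p == '\0')
    return 0;

  for (int f = 0; f < NT_FIELDS; f++)
    {
      out->len[f] = 0;
      while (*p != ';')
        {
          char *end;
          unsigned long v;

          if (*p == ' ')
            {
              p++;
              continue;
            }
          v = strtoul (p, &end, 16);
          if (end == p || out->len[f] == NT_MAX_CPS)
            return 0;                 /* not hex, or column too long */
          out->cp[f][out->len[f]++] = (uint32_t) v;
          p = end;
        }
      p++;                            /* step over the ';' */
    }
  return 1;
}

/* Hypothetical normalizer interface (NOT a GLib API): writes the NFKC form
   of IN[0..N) to OUT and returns its length, or (size_t) -1 if CAP is
   too small. */
typedef size_t (*nfkc_fn) (const uint32_t *in, size_t n,
                           uint32_t *out, size_t cap);

/* Check c4 == NFKC(c1) == NFKC(c2) == NFKC(c3) == NFKC(c4) == NFKC(c5).
   Returns 1 if the line passes, 0 otherwise. */
int
nt_check_nfkc (const struct nt_line *t, nfkc_fn nfkc)
{
  uint32_t buf[4 * NT_MAX_CPS];

  for (int f = 0; f < NT_FIELDS; f++)
    {
      size_t n = nfkc (t->cp[f], t->len[f], buf, sizeof buf / sizeof buf[0]);
      if (n != t->len[3]
          || memcmp (buf, t->cp[3], n * sizeof buf[0]) != 0)
        return 0;
    }
  return 1;
}
```

A driver would read the file line by line, skip lines for which `nt_parse_line` returns 0, and report every line where `nt_check_nfkc` fails; a normalizer without algorithmic Hangul composition would fail on the lines whose NFKC column contains precomposed syllables in the U+AC00..U+D7A3 range.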

A simple test case is the following code snippet.

If I can be of any assistance in tracking down or analyzing this problem,
let me know.  I'm currently working on a fix, but thought I should let you
know ASAP.

#include <stdio.h>
#include <string.h>   /* strlen */
#include <glib.h>

/* Unicode NFKC of algorithmic Hangul composition, by Simon Josefsson */
int main (void)
{
  const gchar *in = "\xef\xbf\xbd\xc3\xad\x64\x6e";
  gchar *out;
  size_t i;

  /* Dump the input bytes in hex. */
  printf ("in: ");
  for (i = 0; in[i]; i++)
    printf ("%02x ", (guchar) in[i]);
  printf ("\n");

  out = g_utf8_normalize (in, strlen (in), G_NORMALIZE_NFKC);
  if (out == NULL)
    {
      fprintf (stderr, "input is not valid UTF-8\n");
      return 1;
    }

  /* Should (?) result in \x2e\xea\xb0\x81. */

  /* Dump the normalized output bytes in hex. */
  printf ("out: ");
  for (i = 0; out[i]; i++)
    printf ("%02x ", (guchar) out[i]);
  printf ("\n");

  g_free (out);
  return 0;
}
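For reference, the Hangul step described in the TR15 section cited above is purely arithmetic, so it needs no composition-table entries. Below is a minimal C sketch of that step on an array of code points, written for illustration and not taken from GLib or from the patch on bug 100456; `hangul_compose` is a hypothetical name, and it handles only Hangul, not full NFKC. For example, U+1100 U+1161 U+11A8 compose to SBase + (0*21 + 0)*28 + 1 = U+AC01 HANGUL SYLLABLE GAG. In UTF-8, 0xAC01 = 1010 110000 000001 fills the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx as 11101010 10110000 10000001 = EA B0 81, the bytes the comment in the test program expects after \x2e.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants of the Hangul algorithm (UAX #15; Unicode Standard, 3.12). */
enum
{
  SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7,
  LCount = 19, VCount = 21, TCount = 28,
  NCount = VCount * TCount,      /* 588 */
  SCount = LCount * NCount       /* 11172 */
};

/* Compose Hangul jamo in S[0..LEN) in place and return the new length.
   Only the algorithmic Hangul step is done; every other code point is
   copied unchanged, so this is not a full NFC/NFKC composer. */
size_t
hangul_compose (uint32_t *s, size_t len)
{
  size_t out = 0;

  for (size_t i = 0; i < len; i++)
    {
      uint32_t ch = s[i];

      if (out > 0)
        {
          uint32_t last = s[out - 1];
          /* Unsigned wrap-around turns each range test into one compare. */
          uint32_t l = last - LBase, v = ch - VBase;
          uint32_t si = last - SBase, t = ch - TBase;

          if (l < LCount && v < VCount)
            {
              /* L + V -> LV syllable */
              s[out - 1] = SBase + (l * VCount + v) * TCount;
              continue;
            }
          if (si < SCount && si % TCount == 0 && t > 0 && t < TCount)
            {
              /* LV + T -> LVT syllable */
              s[out - 1] = last + t;
              continue;
            }
        }
      s[out++] = ch;
    }
  return out;
}
```

In a complete composer this check would sit alongside the table-driven canonical pairs and run on the decomposed, canonically reordered string, which is why missing it only shows up for Hangul input.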
Comment 1 Owen Taylor 2003-09-25 02:14:17 UTC

*** This bug has been marked as a duplicate of 100456 ***
Comment 2 Owen Taylor 2003-09-25 02:14:56 UTC
There's a patch attached to bug 100456.