Bug 790391 – Found Korean Syllables Canonical Decomposition bug

Summary:	Found Korean Syllables Canonical Decomposition bug


Status:	RESOLVED FIXED

Product:	gnome-characters
Classification:	Other
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	GNOME Characters maintainer(s)
QA Contact:	GNOME Characters maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2017-11-15 17:42 UTC by DaeHyun Sung
Modified:	2017-11-26 10:50 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] Fixed Korean Hangul Syllables Canonical Decomposition bug on GNOME-characters (4.77 KB, patch) 2017-11-15 17:42 UTC, DaeHyun Sung	none	Details \| Review
present Korean Hangul Canonical Decomposition. It's bug. (36.92 KB, image/png) 2017-11-15 17:43 UTC, DaeHyun Sung		Details
Expected Korean Hangul Canonical Decomposition. (41.18 KB, image/png) 2017-11-15 17:43 UTC, DaeHyun Sung		Details
new patch (4.63 KB, patch) 2017-11-18 20:06 UTC, DaeHyun Sung	none	Details \| Review
Modified Fixed Korean Hangul Syllables Canonical Decomposition (3.85 KB, patch) 2017-11-19 08:42 UTC, DaeHyun Sung	none	Details \| Review
libgc: Perform full canonical decomposition for Hangul syllables (4.41 KB, patch) 2017-11-19 09:54 UTC, Daiki Ueno	committed	Details \| Review

Description DaeHyun Sung 2017-11-15 17:42:04 UTC

Created attachment 363718 [details] [review]
[PATCH] Fixed Korean Hangul Syllables Canonical Decomposition bug on  GNOME-characters

I found a bug in the canonical decomposition of Korean Hangul syllables: syllables are not fully decomposed.
Expected: U+D4DB → <U+1111, U+1171, U+11B6>  = full canonical decomposition.
Result: U+D4DB → <U+D4CC, U+11B6>  = intermediate step only.

I tracked the bug down. Its root is in GNU libunistring, which GNOME Characters depends on.
It is a Hangul syllable canonical decomposition bug in GNU libunistring.

According to the Unicode Standard, the Hangul Decomposition Algorithm directly
decomposes precomposed Hangul syllable characters into a sequence of either two or three Hangul jamo characters.

I fixed GNU libunistring's Hangul decomposition algorithm, also known as the Korean alphabet decomposition algorithm.
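
As a minimal sketch of the full decomposition described above, the arithmetic from the Unicode Standard, section 3.12, can be written in plain C. The function name hangul_decompose and the macro names are hypothetical; this is not the libunistring or GNOME Characters code.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard, section 3.12. */
#define S_BASE  0xAC00u
#define L_BASE  0x1100u
#define V_BASE  0x1161u
#define T_BASE  0x11A7u
#define V_COUNT 21u
#define T_COUNT 28u
#define N_COUNT (V_COUNT * T_COUNT)   /* 588 */
#define S_COUNT (19u * N_COUNT)       /* 11172 */

/* Hypothetical helper: writes the full canonical decomposition of the
   Hangul syllable UC into OUT and returns the number of jamo written
   (2 or 3), or 0 if UC is not a precomposed Hangul syllable.  */
size_t
hangul_decompose (uint32_t uc, uint32_t out[3])
{
  uint32_t s_index, t;

  if (uc < S_BASE || uc >= S_BASE + S_COUNT)
    return 0;

  s_index = uc - S_BASE;
  out[0] = L_BASE + s_index / N_COUNT;               /* choseong */
  out[1] = V_BASE + (s_index % N_COUNT) / T_COUNT;   /* jungseong */
  t = s_index % T_COUNT;
  if (t == 0)
    return 2;                                        /* LV syllable */
  out[2] = T_BASE + t;                               /* jongseong */
  return 3;
}
```

For U+D4DB this gives U+1111, U+1171, U+11B6, which is the expected result listed above.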

Check the documentation
The Unicode® Standard Version 10.0 – Core Specification
http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
3.12 Conjoining Jamo Behavior
Unicode® Standard Annex #15 - UNICODE NORMALIZATION FORMS
http://unicode.org/reports/tr15/


I will write a detailed explanation this weekend.
I will also report the Korean canonical decomposition bug to the GNU libunistring committers.

Comment 1 DaeHyun Sung 2017-11-15 17:43:05 UTC

Created attachment 363721 [details]
present Korean Hangul Canonical Decomposition. It's bug.

Comment 2 DaeHyun Sung 2017-11-15 17:43:31 UTC

Created attachment 363723 [details]
Expected Korean Hangul Canonical Decomposition.

Comment 3 DaeHyun Sung 2017-11-15 17:53:19 UTC

Hangul elements are commonly referred to as jamo (자모/字母), meaning "alphabet".

Korean has special terms for the jamo used to construct a Hangul syllable, depending on where in the syllable they appear:
- Choseong (초성/初聲) for the initial sound, usually a consonant
- Jungseong (중성/中聲) for the middle sound, usually a vowel
- Jongseong (종성/終聲) for the final sound, usually a consonant

Hangul syllables are the characters used to write contemporary Korean text.

ex1) Decomposition of hangul syllable 
Unicode codepoint: U+AC00
Hangul(한글) ‘가’ 
jamo(자모/字母): ㄱ plus ㅏ
choseong(초성/初聲): ㄱ (codepoint: U+1100)
jungseong(중성/中聲): ㅏ(codepoint: U+1161)

Selected Hangul syllable ‘가’ (U+AC00)
Present
Canonical decomposition:
Only 'ㄱ U+1100 HANGUL CHOSEONG KIYEOK' is shown;
'ㅏ U+1161 HANGUL JUNGSEONG A' is hidden.

Expected result
Canonical decomposition: 
ㄱ U+1100 HANGUL CHOSEONG KIYEOK 
ㅏ U+1161 HANGUL JUNGSEONG A

Hangul Choseong:ᄀ
Hangul Jungseong:ᅡ

ex2) Decomposition of hangul syllable 
Unicode code point: U+AC01
Hangul(한글) ‘각’
jamo(자모/字母):  ‘ᄀ’  plus ‘ᅡ’  plus ‘ᆨ’ 
choseong(초성/初聲):ㄱ (codepoint: U+1100)
jungseong(중성/中聲):ㅏ(codepoint: U+1161)
jongseong(종성/終聲):ᆨ (codepoint: U+11A8)


Selected Hangul syllable ‘각’ (U+AC01)
Present
Canonical decomposition:
Only '가 U+AC00 HANGUL SYLLABLE GA' is shown, but that is an intermediate step;
'ᆨ U+11A8 HANGUL JONGSEONG KIYEOK' is hidden.

Expected Result
Canonical decomposition(Fully): 
ㄱ U+1100 HANGUL CHOSEONG KIYEOK 
ㅏ U+1161 HANGUL JUNGSEONG A 
ᆨ U+11A8 HANGUL JONGSEONG KIYEOK

Hangul Choseong:ᄀ
Hangul Jungseong:ᅡ
Hangul Jongseong:ᆨ


Reference
Unicode Normalization forms http://unicode.org/reports/tr15/
Unicode Normalization forms #14.1.4. Hangul Decomposition and Composition http://unicode.org/reports/tr15/#Hangul_Composition 
Hangul Jamo (Range: U+1100-U+11FF) http://www.unicode.org/charts/PDF/U1100.pdf 
Hangul Syllables (Range: U+AC00-U+D7AF) http://www.unicode.org/charts/PDF/UAC00.pdf

Comment 4 DaeHyun Sung 2017-11-18 19:15:03 UTC

I also reported the bug to GNU libunistring.

Below is the bug report I posted to GNU libunistring.
Hello, my name is DaeHyun Sung (성대현, 成大鉉).

I am Korean and a GNOME Foundation member in Korea.
My mother tongue is Korean.

I found a canonical decomposition bug for Korean syllables in GNU libunistring.

I first noticed the bug in GNOME Characters and then traced it to GNU libunistring,
which GNOME Characters depends on.
libunistring/lib/uninorm/canonical-decomposition.c


      /* Hangul syllable.  See Unicode standard, chapter 3, section
         "Hangul Syllable Decomposition",  See also the clarification at
         <http://www.unicode.org/versions/Unicode5.1.0/>, section
         "Clarification of Hangul Jamo Handling".  */
#if 1 /* Return the pairwise decomposition, not the full decomposition.  */
          decomposition[0] = 0xAC00 + uc - t; /* = 0xAC00 + (l * 21 + v) * 28; */
          decomposition[1] = 0x11A7 + t;
          return 2;
#else
          unsigned int v, l;

          uc = uc / 28;
          v = uc % 21;
          l = uc / 21;

          decomposition[0] = 0x1100 + l;
          decomposition[1] = 0x1161 + v;
          decomposition[2] = 0x11A7 + t;
          return 3;
#endif
 
 

The source comment points to 'the clarification at <http://www.unicode.org/versions/Unicode5.1.0/>, section "Clarification of Hangul Jamo Handling"'.
That reference is misleading for people who do not know Korean well.


I found that the canonical decomposition does not fully decompose Hangul syllables.
Expected: U+D4DB → <U+1111, U+1171, U+11B6> = full canonical decomposition. Correct.
Result: U+D4DB → <U+D4CC, U+11B6> = intermediate step only. Incorrect.


See the Unicode Standard Version 10.0, Core Specification, Chapter 3, Section 3.12 "Conjoining Jamo Behavior", under "Hangul Decomposition":
The Hangul Decomposition Algorithm as specified above directly decomposes precomposed Hangul syllable characters into a sequence of either two or three Hangul jamo characters.
The Hangul Decomposition Algorithm could also be expressed equivalently as a recursion of binary decompositions, as is the case for other non-Hangul characters.
All LVT syllables would decompose into an LV syllable plus a T jamo.
The LV syllables themselves would in turn decompose into an L jamo plus a V jamo.
This approach can be used to produce somewhat more compact code than what is illustrated in this sample method.
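
The recursion of binary decompositions quoted above can be sketched as follows. This is only an illustration under the constants of section 3.12; the function names hangul_pairwise_step and hangul_decompose_recursive are hypothetical and do not exist in libunistring.

```c
#include <stddef.h>
#include <stdint.h>

#define HS_BASE  0xAC00u
#define HL_BASE  0x1100u
#define HV_BASE  0x1161u
#define HT_BASE  0x11A7u
#define HT_COUNT 28u
#define HN_COUNT 588u     /* 21 * 28 */
#define HS_COUNT 11172u   /* 19 * 21 * 28 */

/* One binary step: LVT -> LV + T, or LV -> L + V.
   Returns 2 on success, 0 if UC is not a precomposed Hangul syllable.  */
size_t
hangul_pairwise_step (uint32_t uc, uint32_t pair[2])
{
  uint32_t s_index, t;

  if (uc < HS_BASE || uc >= HS_BASE + HS_COUNT)
    return 0;

  s_index = uc - HS_BASE;
  t = s_index % HT_COUNT;
  if (t != 0)
    {
      pair[0] = uc - t;                 /* LV syllable */
      pair[1] = HT_BASE + t;            /* T jamo */
    }
  else
    {
      pair[0] = HL_BASE + s_index / HN_COUNT;               /* L jamo */
      pair[1] = HV_BASE + (s_index % HN_COUNT) / HT_COUNT;  /* V jamo */
    }
  return 2;
}

/* Applies the binary step recursively to the first element until no
   syllable is left.  Writes 2 or 3 jamo to OUT and returns the count,
   or 0 if UC is not a precomposed Hangul syllable.  */
size_t
hangul_decompose_recursive (uint32_t uc, uint32_t out[3])
{
  uint32_t pair[2];
  size_t n;

  if (hangul_pairwise_step (uc, pair) == 0)
    return 0;

  n = hangul_decompose_recursive (pair[0], out);
  if (n == 0)
    {
      /* pair[0] is already an L jamo, so the recursion ends here.  */
      out[0] = pair[0];
      n = 1;
    }
  out[n] = pair[1];
  return n + 1;
}
```

A single call to hangul_pairwise_step on U+D4DB stops at <U+D4CC, U+11B6>, which is the intermediate result reported in this bug; only the recursive wrapper reaches the three jamo.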

The #if 1 branch performs only one binary step and is not applied recursively, so it cannot fully decompose Hangul syllables.
A caller would have to invoke it repeatedly to reach the full decomposition.
So I suggest removing the #if 1 branch and using the #else branch instead.


Explanation of Korean alphabet (Hangul) canonical decomposition
Hangul elements are commonly referred to as jamo (자모/字母), meaning "alphabet".

Korean has special terms for the jamo used to construct a Hangul syllable, depending on where in the syllable they appear:
- Choseong (초성/初聲) for the initial sound, usually a consonant
- Jungseong (중성/中聲) for the middle sound, usually a vowel
- Jongseong (종성/終聲) for the final sound, usually a consonant

Hangul syllables are the characters used to write contemporary Korean text.

ex1) Decomposition of hangul syllable 
Unicode codepoint: U+AC00
Hangul(한글) ‘가’ 
jamo(자모/字母): ㄱ plus ㅏ
choseong(초성/初聲): ㄱ (codepoint: U+1100)
jungseong(중성/中聲): ㅏ(codepoint: U+1161)

Selected Hangul syllable ‘가’(U+AC00)
Present
Canonical decomposition: 
ㄱ U+1100 HANGUL CHOSEONG KIYEOK 
ㅏ U+1161 HANGUL JUNGSEONG A

Expected result
Canonical decomposition: 
ㄱ U+1100 HANGUL CHOSEONG KIYEOK 
ㅏ U+1161 HANGUL JUNGSEONG A

Hangul Choseong:ᄀ
Hangul Jungseong:ᅡ

ex2) Decomposition of hangul syllable 
Unicode code point: U+AC01
Hangul(한글) ‘각’
jamo(자모/字母):  ‘ᄀ’  plus ‘ᅡ’  plus ‘ᆨ’ 
choseong(초성/初聲):ㄱ (codepoint: U+1100)
jungseong(중성/中聲):ㅏ(codepoint: U+1161)
jongseong(종성/終聲):ᆨ (codepoint: U+11A8)


Selected Hangul syllable ‘각’(U+AC01)
Present  
Canonical decomposition: 
'가 U+AC00 HANGUL SYLLABLE GA' (an intermediate step)
'ᆨ U+11A8 HANGUL JONGSEONG KIYEOK'

Expected Result
Canonical decomposition(Fully): 
ㄱ U+1100 HANGUL CHOSEONG KIYEOK 
ㅏ U+1161 HANGUL JUNGSEONG A 
ᆨ U+11A8 HANGUL JONGSEONG KIYEOK

Hangul Choseong:ᄀ
Hangul Jungseong:ᅡ
Hangul Jongseong:ᆨ

---


I attached the diff files to this mail.

canonical-decomposition.c.diff -> libunistring/lib/uninorm/canonical-decomposition.c
test-canonical-decomposition.c.diff -> libunistring/tests/uninorm/test-canonical-decomposition.c
 
I also checked the Hangul decomposition in GNOME and KDE:
GNOME gucharmap, my suggestion: https://bugzilla.gnome.org/show_bug.cgi?id=777829 
GNOME gucharmap's Korean Hangul decomposition source code https://github.com/GNOME/gucharmap/blob/master/gucharmap/gucharmap-unicode-info.c

else if (wc >= 0xac00 && wc <= 0xd7af) 
{ 
    /* compute hangul syllable name as per UAX #15 */ 
    gint SIndex = wc - SBase; 
    gint LIndex, VIndex, TIndex; 
    if (SIndex < 0 || SIndex >= SCount) 
        return ""; 
    LIndex = SIndex / NCount; 
    VIndex = (SIndex % NCount) / TCount; 
    TIndex = SIndex % TCount; 
    g_snprintf (buf, sizeof (buf), "HANGUL SYLLABLE %s%s%s", JAMO_L_TABLE[LIndex], JAMO_V_TABLE[VIndex], JAMO_T_TABLE[TIndex]); 
    return buf; 
}
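
The gucharmap excerpt above relies on SBase, NCount, TCount, SCount and the JAMO_*_TABLE arrays, which are defined elsewhere in that file and are not shown here. A self-contained sketch of the same index arithmetic, assuming the jamo short names from the Unicode Character Database (Jamo.txt), could look like this. The function name hangul_syllable_name and the table names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Jamo short names as listed in the Unicode Character Database. */
static const char *const NAME_L[19] = {
  "G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
  "SS", "", "J", "JJ", "C", "K", "T", "P", "H"
};
static const char *const NAME_V[21] = {
  "A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
  "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"
};
static const char *const NAME_T[28] = {
  "", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB", "LS",
  "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C", "K", "T",
  "P", "H"
};

/* Writes the character name of the Hangul syllable WC into BUF and
   returns BUF, or returns NULL if WC is not a precomposed syllable.  */
const char *
hangul_syllable_name (uint32_t wc, char *buf, size_t size)
{
  uint32_t s_index;

  if (wc < 0xAC00u || wc >= 0xAC00u + 11172u)
    return NULL;

  s_index = wc - 0xAC00u;
  snprintf (buf, size, "HANGUL SYLLABLE %s%s%s",
            NAME_L[s_index / 588u],          /* LIndex */
            NAME_V[(s_index % 588u) / 28u],  /* VIndex */
            NAME_T[s_index % 28u]);          /* TIndex */
  return buf;
}
```

The same three indices that select the name parts here are the ones that select the L, V and T jamo in the full decomposition.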


KDE kwidgetsaddons, kcharselect: https://git.reviewboard.kde.org/r/129943/diff/1#index_header


Check the documentation 
The Unicode® Standard Version 10.0 – Core Specification 
http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf 
3.12 Conjoining Jamo Behavior 

Unicode® Standard Annex #15 - UNICODE NORMALIZATION FORMS
http://unicode.org/reports/tr15/ 

Unicode Normalization forms #14.1.4. Hangul Decomposition and Composition http://unicode.org/reports/tr15/#Hangul_Composition
Hangul Jamo (Range: U+1100-U+11FF) http://www.unicode.org/charts/PDF/U1100.pdf
Hangul Syllables (Range: U+AC00-U+D7AF) http://www.unicode.org/charts/PDF/UAC00.pdf

Please check the mail as soon as possible!

Thanks!


Sincerely,
DaeHyun Sung(성대현,成大鉉)

Comment 5 DaeHyun Sung 2017-11-18 19:50:59 UTC

Comment on attachment 363718 [details] [review]
[PATCH] Fixed Korean Hangul Syllables Canonical Decomposition bug on  GNOME-characters

>From 9d06c21c687c09336d3daf9814f0eadfc31e6868 Mon Sep 17 00:00:00 2001
>From: DaeHyun Sung <sungdh86+git@gmail.com>
>Date: Thu, 16 Nov 2017 01:57:05 +0900
>Subject: [PATCH] Fixed Korean Hangul Syllables Canonical Decomposition bug on
> GNOME-characters
>MIME-Version: 1.0
>Content-Type: text/plain; charset=UTF-8
>Content-Transfer-Encoding: 8bit
>
>Not fully decompose Hangul Syllables.
>Expected: U+D4DB → <U+1111, U+1171, U+11B6>  = Full canonical composition result.
>Result: U+D4DB → <U+D4CC,U+11B6>  = intermediate step.
>
>tracked the Bug, The base of this bug exists in GNU libunistring.
>It's GNU libunistring Korean Hangul Syllables Canonical Decomposition bug.
>It also depends on GNU libunistring.
>
>The Hangul Decomposition Algorithm as specified above directly
>decomposes precomposed Hangul syllable characters into a sequence of either two or three Hangul jamo characters.
>
>I fixed GNU libunistring's Hangul Decomposition Algorithm  as known as Korean Alphabet Decomposition algorithm.
>
>Check the documentation
>The Unicode® Standard Version 10.0 – Core Specification
>http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
>3.12 Conjoining Jamo Behavior
>Unicode® Standard Annex #15 - UNICODE NORMALIZATION FORMS
>http://unicode.org/reports/tr15/
>---
> gllib/uninorm/canonical-decomposition.c | 11 ++---------
> lib/gc.c                                |  9 +++++++--
> src/window.js                           |  3 ++-
> 3 files changed, 11 insertions(+), 12 deletions(-)
>
>diff --git a/gllib/uninorm/canonical-decomposition.c b/gllib/uninorm/canonical-decomposition.c
>index dfeea71..3862636 100644
>--- a/gllib/uninorm/canonical-decomposition.c
>+++ b/gllib/uninorm/canonical-decomposition.c
>@@ -1,6 +1,7 @@
> /* Canonical decomposition of Unicode characters.
>    Copyright (C) 2009-2017 Free Software Foundation, Inc.
>    Written by Bruno Haible <bruno@clisp.org>, 2009.
>+   Modified by DaeHyun Sung <sungdh86@gmail.com>, 2017.
> 
>    This program is free software: you can redistribute it and/or modify it
>    under the terms of the GNU General Public License as published
>@@ -30,9 +31,7 @@ uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition)
>   if (uc >= 0xAC00 && uc < 0xD7A4)
>     {
>       /* Hangul syllable.  See Unicode standard, chapter 3, section
>-         "Hangul Syllable Decomposition",  See also the clarification at
>-         <http://www.unicode.org/versions/Unicode5.1.0/>, section
>-         "Clarification of Hangul Jamo Handling".  */
>+         "Hangul Syllable Decomposition"*/
>       unsigned int t;
> 
>       uc -= 0xAC00;
>@@ -52,11 +51,6 @@ uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition)
>         }
>       else
>         {
>-#if 1 /* Return the pairwise decomposition, not the full decomposition.  */
>-          decomposition[0] = 0xAC00 + uc - t; /* = 0xAC00 + (l * 21 + v) * 28; */
>-          decomposition[1] = 0x11A7 + t;
>-          return 2;
>-#else
>           unsigned int v, l;
> 
>           uc = uc / 28;
>@@ -67,7 +61,6 @@ uc_canonical_decomposition (ucs4_t uc, ucs4_t *decomposition)
>           decomposition[1] = 0x1161 + v;
>           decomposition[2] = 0x11A7 + t;
>           return 3;
>-#endif
>         }
>     }
>   else if (uc < 0x110000)
>diff --git a/lib/gc.c b/lib/gc.c
>index 46bb0df..e4992cc 100644
>--- a/lib/gc.c
>+++ b/lib/gc.c
>@@ -851,10 +851,15 @@ populate_related_characters (GcCharacterIter *iter)
>           decomposition_base = decomposition[0];
>           if (decomposition_base != iter->uc)
>             g_array_append_val (result, decomposition_base);
>-        }
>+	  decomposition_base = decomposition[1];
>+          if (decomposition_base != iter->uc)
>+            g_array_append_val (result, decomposition_base);
>+          decomposition_base = decomposition[2];
>+          if (decomposition_base != iter->uc)
>+            g_array_append_val (result, decomposition_base);
>+	}
>       else
>         decomposition_base = iter->uc;
>-
>       script = uc_script (iter->uc);
>       if (script)
>         {
>diff --git a/src/window.js b/src/window.js
>index 10c51e0..a9a7cb3 100644
>--- a/src/window.js
>+++ b/src/window.js
>@@ -193,7 +193,8 @@ var MainWindow = new Lang.Class({
>             { artists: [ 'Allan Day <allanpday@gmail.com>',
>                          'Jakub Steiner <jimmac@gmail.com>' ],
>               authors: [ 'Daiki Ueno <dueno@src.gnome.org>',
>-                         'Giovanni Campagna <scampa.giovanni@gmail.com>' ],
>+                         'Giovanni Campagna <scampa.giovanni@gmail.com>',
>+                         'DaeHyun Sung <sungdh86@gmail.com>' ],
>               // TRANSLATORS: put your names here, one name per line.
>               translator_credits: _("translator-credits"),
>               program_name: _("GNOME Characters"),
>-- 
>2.14.3
>

Comment 6 DaeHyun Sung 2017-11-18 19:53:29 UTC

Comment on attachment 363718 [details] [review]
[PATCH] Fixed Korean Hangul Syllables Canonical Decomposition bug on  GNOME-characters


Comment 7 DaeHyun Sung 2017-11-18 20:06:21 UTC

Created attachment 363987 [details] [review]
new patch

edited patch file

Comment 8 Daiki Ueno 2017-11-18 20:21:50 UTC

Review of attachment 363987 [details] [review]:

Thank you for the patches, but please use the Bugzilla patch status properly ("committed" means that the patch has already been pushed to the git repository, but this is not the case).

::: lib/gc.c
@@ +852,3 @@
           if (decomposition_base != iter->uc)
             g_array_append_val (result, decomposition_base);
+	   decomposition_base = decomposition[1];

I have a couple of questions:

- Why did you remove the check of decomposition_length, from the previous patch?  Couldn't it lead to unbound array access?

- Now decomposition_base always points to the last character of a composed character; what if the character is a Latin composed character, e.g. á?

For the latter, I would suggest special-casing Hangul characters, since the current code assumes "base character + modifiers".
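
Both review points can be illustrated with a small sketch: read only as many elements as the decomposition actually has, and treat every jamo as a base character only for Hangul syllables. This is not the code in lib/gc.c; the names is_hangul_syllable and collect_decomposition are hypothetical, and a plain array stands in for the GArray.

```c
#include <stddef.h>
#include <stdint.h>

/* Same range test as in uc_canonical_decomposition(). */
int
is_hangul_syllable (uint32_t uc)
{
  return uc >= 0xAC00u && uc < 0xD7A4u;
}

/* Copies into RESULT the decomposition elements that should be listed
   as related characters of UC and returns how many were copied.
   Only the first LENGTH entries of DECOMPOSITION are ever read.
   For Hangul syllables all jamo are taken; for other characters only
   the first element (the base character) is taken.  */
size_t
collect_decomposition (uint32_t uc,
                       const uint32_t *decomposition, size_t length,
                       uint32_t *result, size_t result_size)
{
  size_t limit, i, n = 0;

  if (is_hangul_syllable (uc))
    limit = length;
  else
    limit = length > 0 ? 1 : 0;

  for (i = 0; i < limit && n < result_size; i++)
    if (decomposition[i] != uc)
      result[n++] = decomposition[i];

  return n;
}
```

With this shape, á (U+00E1) with the decomposition <U+0061, U+0301> contributes only U+0061, while 각 (U+AC01) contributes all three jamo, and an LV syllable of length 2 never touches index 2.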

Comment 9 DaeHyun Sung 2017-11-18 20:48:26 UTC

Answers to the two questions:

Q: - Why did you remove the check of decomposition_length, from the previous patch?  Couldn't it lead to unbound array access?

A: When I removed the decomposition_length check, the app did not crash, so I left it out.
After reading your message, I see that this was a mistake: it can cause an out-of-bounds array access.


Q: - Now decomposition_base always points to the last character of a composed character; what if the character is a Latin composed character, e.g. á?
A: I did not consider Latin letters, because I am Korean and am not familiar with Latin composed characters.
I think the Hangul special case should be implemented separately.

Comment 10 DaeHyun Sung 2017-11-19 08:42:28 UTC

Created attachment 363998 [details] [review]
Modified Fixed Korean Hangul Syllables Canonical Decomposition

Yesterday, I submitted a bug report about Korean canonical decomposition to GNU libunistring.

This morning, I received a mail from the GNU libunistring committer, Bruno Haible.

I agree with Bruno Haible's opinion.

 http://git.savannah.gnu.org/gitweb/?p=libunistring.git;a=commitdiff;h=4e49b798264d01433f64137fb525f507778fb781

Following his opinion, I modified the canonical decomposition of Korean Hangul syllables in GNOME Characters.
The Hangul special case and all other characters are now handled separately.

Please check my patch as soon as possible!

Thanks for the committer's opinion!

Comment 11 Daiki Ueno 2017-11-19 09:54:56 UTC

Created attachment 363999 [details] [review]
libgc: Perform full canonical decomposition for Hangul syllables

Previously, the code finding related characters only took into account
composed characters built from a base character and combining
characters.  However, Hangul syllables are composed of two or three
Hangul jamo characters, all of which should be considered as a base
character.

For the implementation, uc_canonical_decomposition() is not capable of
decomposing Hangul syllables.  Instead of the function, this patch
uses u32_normalize() with UNINORM_NFD, as suggested by Bruno Haible in:
https://lists.gnu.org/archive/html/bug-libunistring/2017-11/msg00002.html
--
I have slightly modified your patch based on Bruno's suggestion.  Does it make sense to you?

Comment 12 Daiki Ueno 2017-11-22 06:18:26 UTC

Assuming silence means no objection, I am going to push it soon.

Comment 13 Daiki Ueno 2017-11-22 06:24:17 UTC

Attachment 363999 [details] pushed as 70e5e05 - libgc: Perform full canonical decomposition for Hangul syllables

Comment 14 DaeHyun Sung 2017-11-26 10:50:23 UTC

Sorry for the late reply. I have been overworked at my job and only checked these messages recently.

The changes to libunistring and to GNOME Characters make sense to me.

I checked the GNU libunistring patch based on Bruno's suggestion.

I also read CJKV Information Processing, written by Ken Lunde.
CJKV Information Processing, p. 170:
"More details about how Normalization of hangul syllables is handled, including some useful historical information, can be found online.

The complexity of Normalization is clearly beyond the scope of this book, and I encourage you to explore Unicode resources if what is presented here does not satisfy your needs."

These links are given in CJKV Information Processing, 2nd Edition:
Hangul Conjoining Jamo Rendering http://www.i18nl10n.com/korean/jamo.html
http://www.unicode.org/charts/normalization/

I ran the changed code and confirmed that it produces the expected results.

Thanks!