Bug 95569 – Pango Hangul shaper crashes when thrown in a generic sequence of Hangul Conjoining Jamos




Summary:	Pango Hangul shaper crashes when thrown in a generic sequence of Hangul Conjo...


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	hangul
Version:	unspecified
Hardware:	Other other

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Changwoo Ryu
QA Contact:	Owen Taylor

URL:
Whiteboard:

Duplicates:	109699 (view as bug list)
Depends on:
Blocks:

Reported:	2002-10-11 23:05 UTC by Jungshik Shin
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
a simple patch (1.65 KB, patch) 2002-10-13 05:18 UTC, Jungshik Shin
patch v2 (9.21 KB, patch) 2002-10-13 11:38 UTC, Jungshik Shin
a new patch only fixing memory corruption (+ a little @) (1.92 KB, patch) 2002-10-13 14:10 UTC, Jungshik Shin
a bare-bone patch (1.36 KB, patch) 2002-10-15 09:00 UTC, Jungshik Shin

Description Jungshik Shin 2002-10-11 23:05:06 UTC

Package: pango
Severity: normal
Version: 2.0.2
Synopsis: Pango Hangul shaper crashes when thrown in a generic sequence of Hangul Conjoining Jamos
Bugzilla-Product: pango
Bugzilla-Component: general
BugBuddy-GnomeVersion: 2.0 (2.0.3)

Description:
Description of Problem:
When pre-1933 orthography Korean text is copy'n'pasted into
gedit, gedit crashes. 



Steps to reproduce the problem:
1. Run gedit under ko_KR.UTF-8 locale
2. Launch Mozilla and point it at
http://jshin.net/i18n/korean/hunmin.html
3. Copy the first paragraph of the main text in pre-1933 orthography
Korean and paste it into gedit

Actual Results:
gedit crashes

Expected Results:
gedit should render it as rendered by Mozilla (the screenshot at
the URL above) or Kwrite

How often does this happen?
Always


Additional Information:

The hangul_shaper module in Pango needs to be improved to support
as generic a form of Hangul Jamo sequence as possible
(i.e. up to 3 L's + up to 3 V's + optionally up to 3 T's).




Debugging Information:

Backtrace was generated from '/usr/bin/gedit'

(no debugging symbols found)...[New Thread 8192 (LWP 23076)]
0x420ae169 in wait4 () from /lib/i686/libc.so.6

+ Trace 28679

Thread 1 (Thread 8192 (LWP 23076))

#0 wait4
from /lib/i686/libc.so.6
#1 __DTOR_END__
from /lib/i686/libc.so.6
#2 waitpid
from /lib/i686/libpthread.so.0
#3 libgnomeui_module_info_get
from /usr/lib/libgnomeui-2.so.0
#4 __pthread_sighandler
from /lib/i686/libpthread.so.0
#5 <signal handler called>
#6 render_syllable
from /usr/lib/pango/1.1.0/modules/pango-hangul-xft.so
#7 hangul_engine_shape
from /usr/lib/pango/1.1.0/modules/pango-hangul-xft.so
#8 pango_shape
from /usr/lib/libpango-1.0.so.0
#9 process_item
from /usr/lib/libpango-1.0.so.0
#10 process_line
from /usr/lib/libpango-1.0.so.0
#11 pango_layout_check_lines
from /usr/lib/libpango-1.0.so.0
#12 pango_layout_get_extents_internal
from /usr/lib/libpango-1.0.so.0
#13 pango_layout_get_extents
from /usr/lib/libpango-1.0.so.0
#14 gtk_text_layout_get_line_display
from /usr/lib/libgtk-x11-2.0.so.0
#15 gtk_text_layout_real_wrap
from /usr/lib/libgtk-x11-2.0.so.0
#16 gtk_text_layout_wrap
from /usr/lib/libgtk-x11-2.0.so.0
#17 _gtk_text_btree_validate_line
from /usr/lib/libgtk-x11-2.0.so.0
#18 gtk_text_layout_validate_yrange
from /usr/lib/libgtk-x11-2.0.so.0
#19 gtk_text_view_validate_onscreen
from /usr/lib/libgtk-x11-2.0.so.0
#20 gtk_text_view_flush_first_validate
from /usr/lib/libgtk-x11-2.0.so.0
#21 first_validate_callback
from /usr/lib/libgtk-x11-2.0.so.0
#22 g_idle_dispatch
from /usr/lib/libglib-2.0.so.0
#23 g_main_dispatch
from /usr/lib/libglib-2.0.so.0
#24 g_main_context_dispatch
from /usr/lib/libglib-2.0.so.0
#25 g_main_context_iterate
from /usr/lib/libglib-2.0.so.0
#26 g_main_loop_run
from /usr/lib/libglib-2.0.so.0
#27 gtk_main
from /usr/lib/libgtk-x11-2.0.so.0
#28 main
#29 __libc_start_main
from /lib/i686/libc.so.6
#0 wait4
from /lib/i686/libc.so.6





------- Bug moved to this database by unknown@bugzilla.gnome.org 2002-10-11 19:05 -------

The original reporter (jshin@mailaps.org) of this bug does not have an account here.
Reassigning to the exporter, unknown@bugzilla.gnome.org.
Reassigning to the default owner of the component, otaylor@redhat.com.

Comment 1 Jungshik Shin 2002-10-11 23:35:06 UTC

By a generic sequence of Hangul Jamos, I meant 

S := L+ V+ T*

where L,V and T denote leading consonant, vowel
and trailing consonant, respectively. Currently,
hangul_shaper() in Pango can take care of 
'LVT?' sequence, but it can't take care of
'L+V+T*' sequence as specified in Unicode 3.0
(section 3.10?) and as required for the full
support of pre-1933 orthography Korean text.
(see http://jshin.net/i18n/uyeo.html)
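
The pattern 'L+V+T*' above can be sketched as a small matcher. This is
only an illustration, not code from Pango: the function names are
hypothetical, and the class ranges are an assumption that simply splits
the U+1100 Hangul Jamo block into leading, vowel and trailing parts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed class ranges inside the Hangul Jamo block (U+1100..U+11FF). */
static bool is_l(uint32_t c) { return c >= 0x1100 && c <= 0x115F; }
static bool is_v(uint32_t c) { return c >= 0x1160 && c <= 0x11A7; }
static bool is_t(uint32_t c) { return c >= 0x11A8 && c <= 0x11FF; }

/* Hypothetical helper: true if text[0..len) is exactly one L+ V+ T* run. */
static bool is_generic_syllable(const uint32_t *text, size_t len)
{
    size_t i = 0, n_l, n_v;

    assert(text != NULL || len == 0);

    while (i < len && is_l(text[i]))   /* one or more leading consonants */
        i++;
    n_l = i;
    while (i < len && is_v(text[i]))   /* one or more vowels */
        i++;
    n_v = i - n_l;
    while (i < len && is_t(text[i]))   /* zero or more trailing consonants */
        i++;

    /* Valid only if both L and V were seen and nothing is left over. */
    return n_l > 0 && n_v > 0 && i == len;
}
```

With these assumptions, 'U+1101 U+116E U+1167 U+11BB' is accepted as a
single syllable (one L, two V's, one T), while a sequence that starts
with a vowel is rejected.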

Comment 2 Jungshik Shin 2002-10-11 23:43:23 UTC

FYI, UCM can be used to enter a sequence of Hangul Jamos instead
of cut'n'pasting from an existing document. Alternatively,
vim (with a keymap defined for Hangul Jamo) under xterm
or Yudit can be used.
Sequences like 'U+1101 U+116E U+1167 U+11BB' 
(mentioned in http://jshin.net/i18n/uyeo.html)
always lead to a crash.

Comment 3 Changwoo Ryu 2002-10-12 06:36:26 UTC

The Hangul shaper supports the generic sequences.  Just it lacks some
testing.  :-)

Try the patch in #86591.  For your convenience:

http://bugzilla.gnome.org/showattachment.cgi?attach_id=9478

This adds Hangul Jamo shaper for Xft, but it also has some bug fixes.
It's applied to the HEAD and maybe 2.1.0, but not to 2.0.x branch.

Someone could make the bug-fixing-only patch from it.

Comment 4 Jungshik Shin 2002-10-13 05:18:38 UTC

Created attachment 11517
a simple patch

Comment 5 Jungshik Shin 2002-10-13 05:42:15 UTC

What attachment 11517 does:

 - correct the memory 'corruption' in hangul_engine_shape():
   when max_jamos is reached and jamos == jamos_static,
   a new chunk of memory is allocated by g_new, but the content of
   the buffer is not copied to the new place.

 - increase the default buffer size for jamos_static[] to 9
   for up to 3 L's, 3 V's and 3 T's. jamos_static[] is not
   on the heap, so 6 more entries are cheap.

 - replace a sequence of LVT and LV with a glyph for
   a precomposed Hangul syllable ONLY when
   the length of a given sequence is *equal to* (as opposed
   to 'equal to and greater than') 3 and 2, respectively.
   
    The rationale behind that is that even though LVT and LV
    form a syllable valid in modern Korean (i.e. in U+AC00
    Hangul precomposed block), both can be followed by
    an extra T or V. An example is given in my previous
    comment and http://jshin.net/i18n/uyeo.html. 
    Given a sequence of 'U+1101 U+116E U+1167 U+11BB',
    Pango should not replace 'U+1101 U+116E' with 
    'U+AFB8' but must treat the whole sequence as a single
    syllable.  PARK Won-kyu's BDF fonts (for simple overstriking
    of Jamos) and James Kass's
    CODE2000 can render it in a more or less legible way.

    When OTFs with GSUB and other OT tables for Hangul Conjoining
    Jamos become widely available (currently, the only source
    of those fonts is the Korean version of MS Office XP),
    this issue (rendering 'L+V+T*' sequences) has to be revisited.

    In the meantime, I think this bug can be closed after applying
    my patch.

    Another possibility is to make Pango do what Yudit 
    and Lambda do with Ogulim/Obatang/Ogunseo and
    fonts. These fonts 
    are distributed in Korean MS Office 2000 and
    Ogulim is also available as 'Old Korean support
    kit' at MS web site. They do not have OT tables
    for Hangul Jamos. Ogulim has a set of glyphs
    for all known consonant and vowel clusters which can
    be assembled together to render a pretty generic
    sequence of Hangul Jamos. 

    There is another set of fonts in MS Word 2000 (Korean)
    and the Old Korean support kit, namely Ngulim/Nbatang/Ngungseo.
    They have precomposed glyphs for
    all known Hangul syllables (thousands of
    them) ever found in Korean literature. Producing the mapping
    from Hangul Jamo sequences to those precomposed syllables
    is tedious, but doable.
 
    I'm wondering whether this font-specific 'hack' can be
    included in Pango. This is sort of like the hack for
    KAIST/Iyagi BDF johab fonts. If there's a way to
    uniquely identify these fonts, I think it's possible,
    and it would dramatically improve Pango's ability to
    render pre-1933 orthography Korean text until
    Korean OTFs with Hangul Jamo support are
    widely available.
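
The memory fix listed first in this comment follows a common pattern: a
buffer that starts out in a fixed array and moves to the heap on
overflow must copy what it already holds. The sketch below shows that
pattern with plain malloc/realloc rather than GLib; JamoBuf and its
functions are hypothetical names, not the code in the attachment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { STATIC_JAMOS = 9 };   /* room for 3 L's + 3 V's + 3 T's */

/* Hypothetical growable buffer; must not be copied by value because
 * data may point into the struct's own fixed[] array. */
typedef struct {
    uint32_t  fixed[STATIC_JAMOS];
    uint32_t *data;          /* points at fixed[] until the first growth */
    size_t    len, cap;
} JamoBuf;

static void jamo_buf_init(JamoBuf *b)
{
    b->data = b->fixed;
    b->len = 0;
    b->cap = STATIC_JAMOS;
}

/* Appends one jamo; returns 0 on success, -1 if allocation fails. */
static int jamo_buf_push(JamoBuf *b, uint32_t c)
{
    assert(b != NULL);
    if (b->len == b->cap) {
        size_t new_cap = b->cap * 2;
        uint32_t *grown;

        if (b->data == b->fixed) {
            grown = malloc(new_cap * sizeof *grown);
            if (grown == NULL)
                return -1;
            /* The step the bug describes as missing: carry over the
             * jamos collected so far into the new allocation. */
            memcpy(grown, b->fixed, b->len * sizeof *grown);
        } else {
            grown = realloc(b->data, new_cap * sizeof *grown);
            if (grown == NULL)
                return -1;
        }
        b->data = grown;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return 0;
}

static void jamo_buf_free(JamoBuf *b)
{
    if (b->data != b->fixed)
        free(b->data);
    jamo_buf_init(b);
}
```

Pushing a tenth jamo switches the buffer to the heap, and the first nine
entries remain readable through b->data afterwards.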

Comment 6 Jungshik Shin 2002-10-13 05:52:27 UTC

> The Hangul shaper supports the generic sequences.  Just it lacks some
> testing.  :-)

  Well, what I tested is Pango-1.1.1 with your patch for 
bug 86591 applied. Unfortunately, it's not generic enough
and that's why I filed this bug and came up with a patch
against it. 


My patch still needs some more work to
make it more generic (or at least to the extent
that Lambda(http://www.ktug.or.kr)
and Yudit(http://www.yudit.org. see swindow/SFontOFT.cpp)
 do with Hangul Jamos). You may
refer to <http://jshin.net/i18n/middle.html> for more
information. Sometime between Unicode 2.0 and Unicode 3.0,
the compatibility decompositions of consonant clusters and
vowel clusters into basic Jamo sequences were removed
at a not-so-wise request of the South Korean standards body to
ISO/IEC JTC1/SC2/WG2 and UTC, but IMO that has to be supported
as well. I'm not sure whether that has to be dealt with in Pango
or GLib (Unicode normalization related routines).

Comment 7 Jungshik Shin 2002-10-13 11:38:36 UTC

Created attachment 11519
patch v2

Comment 8 Changwoo Ryu 2002-10-13 13:19:27 UTC

1. please file another bug if you want more features than this
crashing bug.  And submit separate patches if you want more features.
 It is not very acceptable to apply your patch, which contains many
other fixes.  And a part of your patch should be applied to the 2.0.x
tree and some also to the HEAD.  Step by step...  :-)

2. why your patch has these?

-  if (length >= 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
+  if (length == 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
     composed = 3;
-  else if (length >= 2 && IS_L_S(text[0]) && IS_V_S(text[1]))
+  else if (length == 2 && IS_L_S(text[0]) && IS_V_S(text[1]))
It is a feature, not a bug.  It renders some possible prefix of jamo
sequences as a syllable.  Imagine what preedit string looks like when
you input with 2-bulsik Hangul keyboard and ksc5601.1987-0 font.

Comment 9 Jungshik Shin 2002-10-13 14:01:41 UTC

> 2. why your patch has these?

> -  if (length >= 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
> +  if (length == 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
>      composed = 3;
> -  else if (length >= 2 && IS_L_S(text[0]) && IS_V_S(text[1]))
> +  else if (length == 2 && IS_L_S(text[0]) && IS_V_S(text[1]))

> It is a feature, not a bug.  It renders some possible prefix of jamo
> sequences as a syllable.  Imagine what preedit string looks like when
> you input with 2-bulsik Hangul keyboard and ksc5601.1987-0 font.

  I'm not sure what you meant by 'possible prefix of Jamo
sequence'. Could you elaborate with a couple of examples? 

Anyway, whether a sequence like
'U+1101 U+116E U+1167 U+11BB' has to be treated
as a single syllable and rendered as such OR
as a syllable (U+1101 U+116E) followed by
stand-alone 'U+1167' and 'U+11BB' is arguably
debatable. However, Unicode 3.0 section 3.11
is very clear about that, IMO: the whole
sequence has to be treated as a single syllable.

Nonetheless, you may leave that alone in hangul-x.c, but
in hangul-xft.c, it should definitely be changed
as I explained in my comment dated 2002-10-13 01:42
(see the paragraph beg. with 'the rationale behind....')
With Xft deployment rapidly spreading, I don't
care much about hangul-x.c 

> 1. please file another bug if you want more features than this
> crashing bug. 

  Fine, I can do that although I prefer to 
change the summary line of this bug to 'enhance
Hangul shaper' and do all the work here because it's
a pain in the ass to make separate patches against
a single set of files and test them separately. 

 My first patch more or less fits what you want.
I'll upload a slightly revised patch.

Comment 10 Jungshik Shin 2002-10-13 14:10:09 UTC

Created attachment 11523
a new patch only fixing memory corruption (+ a little @)

Comment 11 Changwoo Ryu 2002-10-13 20:23:18 UTC

>   I'm not sure what you meant by 'possible prefix of Jamo
> sequence'. Could you elaborate with a couple of examples? 
> 
> Anyway, whether a sequence like
> 'U+1101 U+116E U+1167 U+11BB' has to be treated
> as a single syllable and rendered as such OR
> as a syllable (U+1101 U+116E) followed by
> stand-alone 'U+1167' and 'U+11BB'  is arguably
> debatable. However, Unicode 3.0 section 3.11
> is very clear about that, IMO. And, the whole
> sequence has to be treated as a single syllable

You don't understand what (length >= 3) or (length >= 2) means.
Carefully read the loop below the if() condition.  It renders 
the prefixing L+V or L+V+T as a syllable *and* the rest L, T, 
whatever jamos as separate jamo glyphs, rather than "length" number 
of jamo glyphs.

Why?  If not, with 2-bulsik Hangul keyboard, the preedit 
string will be widen first and narrowed next thus very confusing.
The early hangul shaper has =='s as your patch, and I got
numerous complaints from 2-bulsik users.

'U+1101 U+116E U+1167 U+11BB' as a single syllable is another game.

Comment 12 Jungshik Shin 2002-10-14 07:54:51 UTC

> You don't understand what (length >= 3) or (length >= 2) means.
> Carefully read the loop below the if() condition.  It renders 
> the prefixing L+V or L+V+T as a syllable *and* the rest L, T, 
> whatever jamos as separate jamo glyphs, rather than "length" number 
> of jamo glyphs.

Come on!! I do know how to read the code (have you
read my previous comments at all?).
That's exactly the problem I want to fix. hangul-xft.c
MUST NOT do that if it wants to be compliant with Unicode 3.0.

> Why?  If not, with 2-bulsik Hangul keyboard, the preedit 
> string will be widen first and narrowed next thus very confusing.
> The early hangul shaper has =='s as your patch, and I got
> numerous complaints from 2-bulsik users.

  As things stand now, I'm not aware of
any Korean XIM that uses U+1100 Jamos during
preedit. Ami uses Hangul Compatibility Jamos (U+3130 block).
Given that, I don't understand why my patch would present a problem to
2-set keyboard users. If you think I'm still missing anything,
could you please give a very concrete example with a sequence
of Jamos?

> 'U+1101 U+116E U+1167 U+11BB' as a single syllable is another game.

  How would you solve this problem? The problem has to be solved
for sure. Otherwise, pre-1933 orthography Korean text cannot
be properly rendered with Pango while it can be by Uniscribe
under MS Windows 2k/XP. Do you know how many Jamo clusters
have to be represented that way for the full support of pre-1933
orthography Korean text? There are a lot of them. If in doubt,
why don't you look at Ngulim.ttf and Ogulim.ttf with tools like
pfaedit?

Comment 13 Jungshik Shin 2002-10-14 08:24:02 UTC

> Why?  If not, with 2-bulsik Hangul keyboard, the preedit 
> string will be widen first and narrowed next thus very confusing.

There are a couple of problems in this line of argument.

If you're talking about a sequence like 'U+1100 U+116E U+1165',
you're trying to solve someone else's problem at the wrong level, assuming
there's a Korean XIM that uses U+1100 Jamos and exposes
a sequence like 'U+116E U+1165' to the underlying rendering
layer. Arguably, it's not Pango's responsibility but the XIM's
and that of fonts (fonts should have combining/non-spacing
glyphs for Hangul Conjoining Jamos).
An XIM should not expose such a sequence to the rendering
layer.

Secondly, even if a not-so-well-written XIM does, Pango can
solve the problem once my pre-Unicode 3.0 normalization routine
for Hangul Jamos is in place (it will convert 'U+116E U+1165' to
'U+116F'; if you still have the Unicode 2.0 book,
you can see that consonant clusters and vowel clusters have
compatibility decompositions into sequences of basic Jamos).
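
A rough idea of what such a normalization step could look like is given
below. It is a sketch under stated assumptions, not the routine
mentioned above: the function names are hypothetical, and the table
holds only three illustrative pairs (including the 'U+116E U+1165' to
U+116F case from this comment); a real table would be much larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, second, cluster; } JamoPair;

/* Tiny illustrative table of basic-jamo pairs and their cluster jamo. */
static const JamoPair jamo_pairs[] = {
    { 0x1100, 0x1100, 0x1101 },   /* KIYEOK + KIYEOK -> SSANGKIYEOK */
    { 0x1169, 0x1161, 0x116A },   /* O + A           -> WA          */
    { 0x116E, 0x1165, 0x116F },   /* U + EO          -> WEO         */
};

static uint32_t lookup_cluster(uint32_t a, uint32_t b)
{
    size_t i;
    for (i = 0; i < sizeof jamo_pairs / sizeof jamo_pairs[0]; i++)
        if (jamo_pairs[i].first == a && jamo_pairs[i].second == b)
            return jamo_pairs[i].cluster;
    return 0;
}

/* Merges adjacent pairs in place, left to right; returns the new length. */
static size_t merge_jamo_clusters(uint32_t *text, size_t len)
{
    size_t in, out = 0;

    assert(text != NULL || len == 0);

    for (in = 0; in < len; in++) {
        uint32_t merged = out > 0 ? lookup_cluster(text[out - 1], text[in]) : 0;
        if (merged != 0)
            text[out - 1] = merged;     /* fold into the previous jamo */
        else
            text[out++] = text[in];     /* keep as a separate jamo     */
    }
    return out;
}
```

Applied to 'U+1100 U+116E U+1165', this leaves 'U+1100 U+116F', a plain
LV sequence that can then be rendered as a precomposed syllable.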

Thirdly, even WITHOUT the normalization mentioned above, TTFs like
CODE2000 don't have the 'widening and narrowing' problem you wrote
about because Hangul vowels and trailing consonants in
CODE2000 are combining/non-spacing. Broken fonts like Arial MS
Unicode and Cyberbit have this problem because their glyphs
for Hangul Conjoining Jamos are spacing instead of
non-spacing/zero-width.  However, you have
to note that the so-called 'pre-edit' problem is fictitious (as far as I can
tell) because no Korean XIM (that I'm aware of) uses U+1100 Hangul
Jamos at the moment.  If there's a Korean XIM that uses
U+1100 Hangul Jamos, please let me know so that I can test
it myself (and I'll be glad to stand corrected).

Finally, if you're concerned about the problem when ksc5601.1987-0
font is used, I wouldn't insist on replacing '>=' with '=='
in hangul-x.c as I wrote in my comment 
dated 2002-10-13 10:01.  My latest patch doesn't have it
for hangul-x.c. However, it's absolutely necessary for
hangul-xft.c

Comment 14 Jungshik Shin 2002-10-14 10:19:49 UTC

> It is not very acceptable to apply your patch, which contains many
> other fixes.  And a part of your patch should be applied to the 2.0.x
> tree and some also to the HEAD.  Step by step...  :-)

  Please bear with me because I'm not familiar with the life cycle
of a bug in GNOME development (review, check-in rules, etc.) and,
for that matter, GNOME development in
general. My patch is against Pango 1.1.1, but the current CVS (HEAD?)
has exactly the same code as Pango 1.1.1 as far as pango/modules/hangul
is concerned ('cvs diff' yields exactly the same diff file as
a diff against Pango 1.1.1).

 You wrote about 2.0.x and HEAD. My understanding is
that you were referring to the GTK version number. What I'm not sure
of is the relationship between the GTK version and the Pango version.
It seems like GTK 2.0.x (stable branch?) corresponds to Pango 1.0.x
(stable) and
GTK 2.1.x (HEAD: development/bleeding edge branch) goes together with
Pango 1.1.x. Am I right?

 If that's the case, how about applying my latest (and simplest) patch
to the Pango 1.1.x tree first? Pango 1.0.x doesn't have hangul-xft.c,
does it? For Pango 1.0.x, the hangul-x.c part of the patch can be
applied. Then, we can close this bug as solved and go on to
fixing other problems and enhancing hangul-xft in (a) separate bug(s).
I filed bug 95708 for the font-specific hack (Oxxxx/Nxxx fonts).

Comment 15 Jungshik Shin 2002-10-14 10:32:19 UTC

> Why?  If not, with 2-bulsik Hangul keyboard, the preedit 
> string will be widen first and narrowed next thus very confusing.

  Now I know what kind of sequence you're talking about.
You meant a sequence like 'U+1100 U+1161 U+11AF U+1100'
during preedit, didn't you?
However, as I wrote before, that kind of sequence is not
used by any Korean XIM that I know of. All Korean XIMs I know
use U+3130 Compatibility Jamos.

EVEN IF they're used by an XIM during
preedit, fonts and rendering engines that implement
Hangul Conjoining Jamo behavior compliant with the Unicode spec
should have no problem.  If somebody complains about this,
(s)he has to be told to use fonts compliant with the Unicode spec.
ksc5601.1987-0 X11 BDF, ksc5601.1992-3 X11 BDF, and johab*-1 X11 BDF
are not compliant with the Unicode spec and I understand
your concern about 'narrowing and widening' when they're
used. That's why my patch doesn't replace '>=' with '==' in
render_syllable_with_(ksc5601|johab|ksx1005)() in hangul-x.c
while it does in render_syllable_with_iso10646() in hangul-x.c
and render_syllable() in hangul-xft.c.

Comment 16 Changwoo Ryu 2002-10-14 12:10:27 UTC

OK, let me summarize:

1. In the first, I've been confused by your comments.  You mentioned
'U+1101 U+116E U+1167 U+11BB as a single syllable...' when you ask to
replace '>='s with '=='s.  But it's a DIFFERENT problem, OK?  Perhaps
the code should be fixed to support that but I don't want to change
the way it works for un-normalizable jamos.

2. I couldn't find any sentence in Unicode 3.0 which specifies this
case, when a Jamo sequence is unable to be normalized.  The section
3.11 only specifies some ideal (but most used) cases and it never
specifies how to render such sequences when the underlying font is not
very capable.

Well if it is really in the Unicode, then...I won't follow it.  Then
the standard is worse than my code!  :)

3. I once wrote a prototype GTK+ Hangul input module which used Jamos.
 :-)  Yes, most Hangul input methods don't use Jamos, but they may in the
future.  In fact Choe, Hwanjin is working on his GTK+ input module
(http://imhangul.kldp.net) to input mid-age Hangul jamos (then it
_should_ use jamos).

AND...if XIMs don't uses Jamos, then it is clearer that my decision is
right for the consistency with the XIMs; let's take an example:

SIOS YU PHIEUPH EO

Assume ksc5601.1987-0 font, now most XIM servers render,

1) 'SIOS YU PHIEUPH' as 'SIOS YU' syllable and 'PHIEUPH' compatibility
jamo.
2) If the user types 'EO' on (1), they render it as 'SIOS YU' and
'PHIEUPH EO' syllables.
3) If the user stops input on (1), the current code renders the final
string as is; 'SIOS YU' syllable and 'PHIEUPH' jamo.  But your way
renders it as three 'SIOS', 'YU', and 'PHIEUPH' jamos.

4. What about other better Hangul fonts you suggested?  I don't think
any font _in the world_ could render all (unlimited number of)
possible jamo sequences perfectly.  I agree the jamos should be
rendered as syllable forms as possible, but some (even normalized)
jamo sequences still need the fallback.  And I think it's
better to render prefixing L+V or L+V+T as a syllable in this case.

Comment 17 Jungshik Shin 2002-10-14 16:39:42 UTC

> 1. In the first, I've been confused by your comments.  You mentioned
> 'U+1101 U+116E U+1167 U+11BB as a single syllable...' when you 
> ask to
> replace '>='s with '=='s.  But it's a DIFFERENT problem, OK? 

  No, it's not a different problem. Your code keeps that
sequence from being properly rendered by breaking it up
into two pieces, 'U+1101 U+116E' (which your code
converts into the Hangul syllable 'U+AFB8') and two stand-alone
Jamos, U+1167 and U+11BB. By replacing '>=' with '==',
'U+1101 U+116E U+1167 U+11BB' is carried over to the code
down the road and gets rendered correctly as a single
syllable by the magic of a 'font'.
 

>  Perhaps
> the code should be fixed to support that but I don't want to change
> the way it works for un-normalizable jamos.

  Hmm...  As I wrote before, you can leave your code
alone in hangul-x.c but hangul-xft.c needs to be changed.
Because there are already fonts that work more or less
with Hangul Conjoining Jamos the way they're supposed
to work according to Unicode 3.0. 
  

> 2. I couldn't find any sentence in Unicode 3.0 which specifies this
> case, when a Jamo sequence is unable to be normalized.  The section

   Unable to be normalized to what? a precomposed syllable?  
There are lots and lots of them and can be taken care of by
what I'm writing now. 

> 3.11 only specifies some ideal (but most used) cases and it never
> specifies how to render such sequences when the underlying font is 
> not very capable.

  Unicode 3.0 is very clear as to where the Hangul syllable
boundary is. Your code assumes that
the boundary is right after either LVT or LV (perhaps
you know well that it's not right but are doing
it as a fallback. Unfortunately, while doing it,
you're shooting for LESS than what's currently
possible.)  However, LVT or LV can
be followed by another T or V so that currently hangul-xft.c
renders a sequence like LVTT or LVV in a way NOT compliant
with the Unicode 3.0 standard. And I'm not making up a fictitious
case. Real-world Korean literature contains numerous
examples like that.


> Well if it is really in the Unicode, then...I won't follow it.  Then
> the standard is worse than my code!  :)

  How many times do I have to repeat that a Hangul syllable
is defined as 'L+V+T*M?' instead of 'LVT'? Your code
assumes it's defined as 'LVT?', which is downright wrong.
As for hangul-x.c, I can live with that, but as for hangul-xft.c,
it MUST be changed. Fonts like CODE2000 can render
'LVTT' or 'LVV' or 'LVM', 'LVTM' in a reasonably legible
way. By artificially splitting 'LVV' or 'LVTT' into
'LV' and 'V' or 'LVT' and 'T', your code makes it impossible
for it to work.  Moreover, fonts like Oxxx can
deal with all known instances of Hangul syllables in
literature. Why would you want to be content with a fallback
when we can do more?

> AND...if XIMs don't uses Jamos, then it is clearer that my decision is
> right for the consistency with the XIMs; 

  I'm not following you here.

> let's take an example:

> SIOS YU PHIEUPH EO

> Assume ksc5601.1987-0 font, now most XIM servers render,

  Why would XIM servers running under ko_KR.UTF-8 locale 
have to be limited to using ksc5601.1987-0 fonts? 
For instance, my patched version of Ami (for ko_KR.UTF-8 locale)
uses iso10646-1 fonts.  If CHOI Hwan-jin's new XIM uses
ksc5601.1987-0 fonts, he has to fix his XIM to use
iso10646-1 fonts. Pango should  not be held responsible
for problems of other programs. 

> 1) 'SIOS YU PHIEUPH' as 'SIOS YU' syllable and 'PHIEUPH' 
> compatibility jamo.
> 2) If the user types 'EO' on (1), they render it as 'SIOS YU' and
> 'PHIEUPH EO' syllables.
> 3) If the user stops input on (1), the current code renders the final
> string as is; 'SIOS YU' syllable and 'PHIEUPH' jamo.  But your way
> renders it as three 'SIOS', 'YU', and 'PHIEUPH' jamos.

  You're making an assumption that U+1100 Hangul Jamo sequences
(such as U+1109, U+1172 U+11C1)
won't be rendered as composed when left by themselves. That's
NOT the case with fonts like CODE2000 and PARK Won-kyu's
ISO 10646 X11 BDF fonts. As I wrote a couple
of times, **BROKEN** fonts like Arial MS Unicode
and Cyberbit would render them that way. However, fonts
with combining/non-spacing glyphs for Hangul Conjoining
Jamos would render them as syllables. Therefore, your point
is moot.

  Moreover, in this particular case, ' U+1109, U+1172 U+11C1'
wouldn't go down the road any way because it can be rendered
as a precomposed syllable. You wrote that ksc5601.1987-0 font
is assumed. Didn't I say that I don't care if you want to
leave '>=' alone in render_syllable_with_ksc5601()
(and render_syllable_with_johab|ksx1005 as well) in hangul-x.c?
 


> 4. What about other better Hangul fonts you suggested?  I don't think
> any font _in the world_ could render all (unlimited number of)
> possible jamo sequences perfectly.

  Sure, no font ever will be able to deal with the most generic
'L+V+T*M?' and even 'L{1,3}V{1,3}T{0,3}M?'. But some of them
are already more capable than your code assumes and Pango
shouldn't keep them from doing what they can do. 


> I agree the jamos should be
> rendered as syllable forms as possible, but some (even normalized)
> jamo sequences still need the fallback.  And I think it's
> better to render prefixing L+V or L+V+T as a syllable in this case.

  I'm not disputing your point that some kind of fallback is
inevitable. Problem is that  your fallback in hangul-xft.c is less
than optimal and  gets in the way of perfectly
capable fonts. Which would you prefer 'KuyeoSS' or 'Ku' followed
by stand-alone 'Yeo' and 'SS' for 'U+1100 U+116E U+1167 U+11BB'?

   Let me summarize what I've been saying all along:

   - I don't mind your leaving '>=' alone in hangul-x.c
     with the possible exception of render_syllable_with_iso10646().
     Especially in render_syllable_with_ksc5601(), your rationale
     behind using '>=' certainly makes quite a lot of sense.

   - I feel very strongly that '>=' in render_syllable()
     in hangul-xft.c has to be replaced with '=='. 

Hopefully, this time around, I was successful in getting through
to you what I want to.

Comment 18 Changwoo Ryu 2002-10-14 18:07:00 UTC

About the standard: now I understand why you think the hangul shaper
violates Unicode.  But I think it doesn't.  In my way, the hangul
shaper renders a *single* syllable but just with several glyphs; a
syllable glyph and jamos' glyphs.  It separates jamos as syllables as
specified in the standard.  But no standard specifies how to render
the each syllable with a poor font.  If a font renders the sequence as
N jamo glyphs, then it also violates the standard in your sense.

When I wrote the code, only Unicode 2.0 was out, so it still doesn't know
about the Hangul tone mark "M".  I don't know how to interpret it.  Please
file a bug if there's any issue around it.

And you made the point, the font magic.  Sorry, I did not know that
'font magic' can render such a sequence as a syllable.  I was thinking
about combining LVVTT or the like as a precomposed syllable if possible.

OK but it still might be considered..  Your patch could make it
prettier or (IMO) uglier, depending on the font.  Only a few fonts
work that way, don't they?  And most importantly there is no such free font
now.  Hmm...  Like the way you suggested in bug 95708, hangul-xft
could choose how to render according to the font's capability.  Well,
of course it'll be dirty hacks...  I can not agree to make hangul-xft
work more optimally with a few proprietary fonts and more poorly with
other fonts, including free ones.

> If CHOI Hwan-jin's new XIM uses
> ksc5601.1987-0 fonts, he has to fix his XIM to use
> iso10646-1 fonts. Pango should  not be held responsible
> for problems of other programs. 

It's not an XIM but a GTK+ native input module.  You should know that
the GTK+ input modules don't know what font it uses.  They can't
choose their fonts.

Comment 19 Jungshik Shin 2002-10-14 19:43:23 UTC

> About the standard: now I understand why you think the hangul shaper
> violates Unicode.  But I think it doesn't.

  I think I was not very careful in my choice of word. I should
not have used 'not compliant' or 'violate'.  

> OK but it still might be considered..  Your patch could make it
> prettier or (IMO) uglier, depending on the font.

  Sure, this is where different people can have different 
opinions. However, with your current code, EVEN if I
use fonts that can do some *rudimentary* 'magic', I have to
live with less-than-desirable
result. With my patch applied, Pango can blame fonts and 
we can tell people
to use fonts like CODE2000 or PARK Won-kyu's BDF fonts
instead of Arial MS Unicode or Cyberbit
(or Ogulim when my code to make use of it is put in place)

> Like the way you suggested in bug 95708, hangul-xft
> could choose how to render according to the font's capability.  Well,
> of course it'll be dirty hacks... 

  It's as dirty as special-casing johab(sh)-1 X11 BDF fonts in
hangul-x.c  

> I can not agree to make hangul-xft
> work more optimally with a few proprietary fonts

  Proprietary fonts? CODE2000 is free (not GPLed, but
nonetheless freely available. How many GPLed Korean
TTFs do we have?). So are PARK Won-kyu's.
The next release of Baekmuk TTFs can have combining/non-spacing
glyphs for Hangul Conjoining Jamos.
 
> and more poorly with other fonts, including free ones.

  Well well... 'More poorly' is certainly debatable. Even if
it is not, I can still make a case for my patch.
As I wrote above, Pango should not try to take the blame for
what it cannot do anything about. If some people
decide to use fonts without combining/non-spacing glyphs
for Hangul Conjoining Jamos, they cannot blame Pango. They do have a
choice (free
fonts with more or less correct glyphs for Jamos are
available) and they make the wrong choice. In trying to make up
for their wrong choice, Pango should not 'punish' others
who make a better choice.

>  Only a few fonts work that way, don't they?
> And most importantly there is no such free font now.  

  To begin with, we don't have many free TTFs for Korean
(we have only a few free TTFs, don't we?)
CODE2000 is freely available and does the 'magic'. So are/do
PARK Won-kyu's X11 BDF fonts in iso10646-1.  Besides,
PARK Won-kyu's X11 BDF fonts can be converted to TTFs
with appropriate spacing. It can happen anytime soon.

Of course, eventually this has to be dealt with in a way similar
to the way Indic scripts are handled using OpenType fonts.
Unfortunately, there's no free OTF with appropriate OpenType
tables for Hangul Conjoining Jamos.  It's frustrating that
Microsoft (or Korean commercial foundries) already
have a few such OTFs, but haven't published any spec.
For Indic scripts, MS published the full spec, and many
people have been working on making free Indic OTFs.
Pango also takes advantage of the published spec, I believe.
 

> In my way, the hangul
> shaper renders a *single* syllable but just with several glyphs; a
> syllable glyph and jamos' glyphs. 

>  It separates jamos as syllables as
> specified in the standard. 

It could well be used as a fallback, but only if  other
alternatives are exhausted, which I'm afraid is not the case
(Xft) here. Nonetheless, I admit that it's a debatable point.

> If CHOI Hwan-jin's new XIM uses
>> ksc5601.1987-0 fonts, he has to fix his XIM to use
>> iso10646-1 fonts. Pango should  not be held responsible
>> for problems of other programs. 

>It's not an XIM but a GTK+ native input module.  You should know that
>the GTK+ input modules don't know what font it uses.  They can't
>choose their fonts.

  If they can't, can I assume that it'll be given a font with
the widest coverage by Gtk+ (instead of fonts in ksc5601.1987-0
with very poor coverage)? Baekmuk fonts cover
the full repertoire of Hangul syllables and are available
in iso10646-1 encoding (even in the unlikely case that a
Gtk+ input module cannot use Xft fonts - in TTF - and can
only use X11 core fonts, X11 core fonts can be outline fonts
thanks to FT and the X-tt module). So how likely do you think
the example given in your previous comment
(a syllable not covered by a ksc5601.1987-0 font
in the middle of input) is to happen? Anyway,
for ksc5601.1987-0, it's perfectly all right with
me to leave '>='.


> When I wrote the code only Unicode 2.0 was out, so it still doesn't know
> about the Hangul tone "M".

  hangul-xft.c was written this summer, wasn't it? All right.
hangul-x.c must have been written a few years ago. 
  

> I don't know how to interpret it.  Plz
> file a bug if there's any issue around it.

  Hangul tone marks are non-spacing/combining and
can follow a Hangul precomposed syllable or
a sequence of Hangul Conjoining Jamos forming a syllable.
Although they follow a Hangul syllable or a seq. of
Hangul Jamos, they have to be rendered to the *left*
of the preceding Hangul syllable. Therefore, they
have to be put at the beginning of a seq. of glyphs
(that is, reordering is necessary),
as is done by Yudit.
 
  CODE2000 and PARK Won-kyu's X11 BDF fonts have glyphs
for them (combining/non-spacing).
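
The reordering described above can be sketched roughly as follows. This is only
an illustration, not code from Pango or Yudit: the helper names are hypothetical,
and it assumes one syllable cluster is held as an array of code points in logical
order, with at most one tone mark (U+302E or U+302F) at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* U+302E and U+302F are the Hangul single-dot and double-dot tone marks. */
static int is_hangul_tone_mark(uint32_t ch)
{
    return ch == 0x302E || ch == 0x302F;
}

/*
 * Hypothetical helper: `cluster` holds one syllable in logical order
 * (a precomposed syllable or a run of jamos, optionally followed by a
 * tone mark).  If the last code point is a tone mark, move it to the
 * front so that it is drawn to the left of the syllable.
 */
static void reorder_tone_mark(uint32_t *cluster, size_t n)
{
    assert(cluster != NULL || n == 0);

    if (n < 2 || !is_hangul_tone_mark(cluster[n - 1]))
        return;

    uint32_t tone = cluster[n - 1];
    /* Shift the syllable one slot to the right, then put the mark first. */
    memmove(cluster + 1, cluster, (n - 1) * sizeof cluster[0]);
    cluster[0] = tone;
}
```

A real shaper would reorder glyphs rather than code points and would have to
keep the cluster mapping consistent; the sketch only shows the rotation itself.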

Comment 20 Changwoo Ryu 2002-10-15 02:39:49 UTC

(I'm very tired by your long and detailed replies.  You don't have to
copy my previous replies all the time.  It's not a mailing list or
Usenet.)

1. CODE2000 is NOT free (libre, DFSG free, OSD compliant, ...).  'GPL'
is just one free license.  I won't say much about it.

2. I agree to make the capable fonts work better than now.  But also,
I don't wanna make the poor fonts work worse than now.  'Just
replacing >= with ==' is not an option.

3. When I first wrote hangul-x, it was to render Hangul 'as well as
possible with the given font'.  If I had ever wanted to blame poor, broken
fonts, I would have dropped ksc5601.1987-0 support.  Even when the font is not very
capable, the hangul shaper can still do its best with that font.  I want
such an approach, not just relying on the font's capability.

My conclusion is: 

I think doing some font-specific handling is the best solution.  The
default option can be '==' or '>=', but neither should make support
for the old, poor, and broken fonts worse.


>   If they can't, can I assume that it'll be given  a font with
> the widest coverage by Gtk+(instead of fonts in ksc5601.1987-0
> with a very poor coverage)?

No, input modules do not take care of displaying their preedit
strings. They just pass the strings to GTK+.  Well, more than 90% of
Korean X users still use the poor ksc5601.1987-0 fonts.

Comment 21 Jungshik Shin 2002-10-15 04:33:36 UTC

> 1. CODE2000 is NOT free (libre, DFSG free, OSD compliant, ...). 'GPL'
> is just one free license.  I won't say much about it.

  So, do you want to 'punish' users who choose to install it
for not agreeing with you on the definition of being free? 

> 2. I agree to make the capable fonts work better than now.  
> But also,
> I don't wanna make the poor fonts work worse than now.  'Just
> replacing >= with ==' is not an option.

  Can you tell me what poor fonts you're talking about here? 
Don't tell me you're talking about ksc5601.1987-0 fonts. 
That has nothing to do with hangul-xft.c



> No, input modules do not take care of displaying their preedit
> strings. They just pass the strings to GTK+. 

  The result is the same. Gtk+ will pick a font with the best
coverage, won't it? If not, it has to be fixed, IMHO.
If it's end-user-configurable, it's her/his responsibility
to pick the best font (s)he has.   

>  Well, more than 90% of
> Korean X users still use the poor ksc5601.1987-0 fonts.

  How did you come up with 90%? Baekmuk TTFs are widely available
and they're GPLed. With FT/X-TT module of XFree86 4.x,
they're presented as X11 core fonts in ksc5601.1987-0,
ksc5601.1992-3 and iso10646-1 encoding. Moreover, like RH 8.0
just does, other Linux distros will (if they have not done
already) begin to support Xft. 

  All right. You wrote X users not Linux/FreeBSD/OpenBSD/NetBSD
users. Solaris, AIX, Tru64 have been supporting ko_KR.UTF-8
for a few years and at least Solaris is shipped with a set of
commercial CID-keyed fonts with the full coverage of Hangul 
syllables that can be presented as ksc5601.1992-3 X11 core fonts.
Simply put, you don't have to worry about commercial Unix/X users.

Even if only ksc5601.1987-0 fonts are available to some people,
replacing '>=' with '==' in hangul-xft.c doesn't affect those poor souls 
at all. How many times do I have to write that you 
CAN LEAVE '>=' alone in render_syllable_with_ksc5601() in hangul-x.c? 
Three times, four times, five times?  Is it now clear enough?

Comment 22 Changwoo Ryu 2002-10-15 05:26:22 UTC

Well, you don't distinguish between 'free' and 'GPLed'?  Baekmuk is
free but NOT 'GPLed'.  Baekmuk has an MIT/X-like license.  First of all,
you should know Bugzilla is not a good place for flaming.  If you want
more flames, go to the gnu.discuss newsgroup.


The poor fonts include the 'broken' TTF fonts, including Baekmuk TTF.
Even in hangul-xft, some broken TTF fonts don't have any such 'magic'.
 Just replacing >= with == will make them worse.  

And don't get me tired; again, I want to improve support for the
capable fonts, but without sacrificing the current support for the
poor fonts.

Comment 23 Jungshik Shin 2002-10-15 06:14:08 UTC

> Well, you don't distinguish between 'free' and 'GPLed'?  Baekmuk is
> free but NOT 'GPLed'.  Baekmuk has MIT/X like license. 

  You're absolutely right. It was another instance of 
my momentary lapse of memory, which unfortunately happens so often
these days. 

>  In the first,
> you should know Bugzilla is not a good flaming place.  If you want
> more flame, go to gnu.discuss newsgroup.

  Well, your definition of the word 'flaming' must be different from
mine. I don't recall I ever did any flaming here. Anyway,
here goes my apology if you felt that way. 


 As for Baekmuk being poor, I can make  a case for hangul-x.c 
(render_with_ksc5601()) and hangul-xft.c being different. 

As you know too well, ksc5601.1987-0 has only 2350 precomposed
syllables while Baekmuk TTFs have 11,172 precomposed syllables
(at least Baekmuk batang and Baekmuk Gulim do). Therefore, 
when Baekmuk Batang/Gulim are used, NO modern Hangul syllable would
be  rendered as a series of stand-alone Jamos as would happen
with ksc5601.1987-0 fonts. That is, whether '>=' or '==' is used,
Jamo rendering routine wouldn't be reached  at all and
your so-called 'widening-narrowing' problem wouldn't happen
with Baekmuk batang/gulim. 
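
The 11,172 figure follows from the Unicode Hangul syllable composition
arithmetic: 19 leading consonants x 21 vowels x 28 trailing slots (27 consonants
plus "none"). A minimal sketch of that arithmetic is below; the constants are the
ones given in the Unicode Standard, while the function name and the "return 0
when not composable" convention are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

enum {
    S_BASE  = 0xAC00,  /* first precomposed syllable */
    L_BASE  = 0x1100,  /* first modern leading consonant */
    V_BASE  = 0x1161,  /* first modern vowel */
    T_BASE  = 0x11A7,  /* one before the first trailing consonant */
    L_COUNT = 19,
    V_COUNT = 21,
    T_COUNT = 28,      /* 27 trailing consonants + "no trailing consonant" */
    S_COUNT = L_COUNT * V_COUNT * T_COUNT   /* 11172 */
};

/*
 * Hypothetical helper: compose modern jamos into a precomposed syllable.
 * Pass t == 0 for "no trailing consonant".  Returns 0 when any jamo lies
 * outside the modern ranges, i.e. when no precomposed form exists and a
 * shaper would have to fall back to jamo glyphs.
 */
static uint32_t compose_modern_syllable(uint32_t l, uint32_t v, uint32_t t)
{
    uint32_t t_index = 0;

    if (l < L_BASE || l >= L_BASE + L_COUNT)
        return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT)
        return 0;
    if (t != 0) {
        if (t <= T_BASE || t >= T_BASE + T_COUNT)
            return 0;
        t_index = t - T_BASE;
    }

    uint32_t s = S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
    assert(s < (uint32_t)S_BASE + S_COUNT);  /* always inside the syllable block */
    return s;
}
```

For example, U+1112 U+1161 U+11AB composes to U+D55C, while a sequence starting
with an old jamo such as U+1140 has no precomposed form and yields 0.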

Now let me talk about generic Hangul syllables (represented with
a sequence of Hangul Jamos) that don't have precomposed forms
in U+AC00 block. You and I disagree on which is
a better way of rendering them. You want to render them 
as 'a precomposed syllable' followed by stand-alone Jamo glyphs
if a part of that generic sequence can form a precomposed
syllable in U+AC00 block.  Although I don't agree with you on this point
(I think a series of stand-alone Jamo glyphs for a whole syllable
is better partly because that was what some prominent Korean 
linguists - e.g. Choo Shi-gyung - in the early 20th century 
tried to implement), I'm willing to grant you  
a point here. So, let's say we're tied on this issue.
In other words, some people would agree to your view
of a better rendering while others would not and rather
agree to my view of a better rendering. Put it in yet
another way,  I don't think you can simply dismiss
my change as shooting for a better result with
a small set of fonts at the sacrifice of rendering
quality with 'poor' fonts.  

Because we're tied, we need a tie-breaker.  As I already wrote,
your current code blocks even a capable font from rendering
a generic Hangul syllable. With my change, users have a choice
at their disposal. If they care about generic Hangul syllables
and are not satisfied with the rendering quality obtained
with 'poor' fonts, they can buy/get/download/install/whatever fonts
that do the right thing. They can also contribute to enhance 
Baekmuk fonts or other fonts. 

If I'm successful making my case, that's great. If not,
let's just move on. Why don't you commit(if you can)
only a part of my patch that fixes 'buffer' problem. 

Then, we can resolve this issue in a different way.
Perhaps, we can look into a font and do things
differently depending on  whether they
have non-spacing/combining glyphs for Hangul vowels and
trailing consonants, which is a clear sign that
it can do some 'rudimentary magic'. Hopefully,
looking into a font this way is not so expensive an
operation.
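
One way such a font check might look is sketched below. Everything here is
hypothetical: the metric table stands in for whatever font query the backend
really offers, and treating "zero advance width" as the sign of a
non-spacing/combining glyph is an assumption of this sketch, not something
established in this discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-glyph metric table standing in for a real font query. */
typedef struct {
    uint32_t code_point;
    int      advance;      /* horizontal advance in font units */
} GlyphMetric;

typedef struct {
    const GlyphMetric *metrics;
    size_t             count;
} FontMetrics;

/* Returns the advance for `ch`, or -1 when the font has no glyph for it. */
static int lookup_advance(const FontMetrics *font, uint32_t ch)
{
    for (size_t i = 0; i < font->count; i++)
        if (font->metrics[i].code_point == ch)
            return font->metrics[i].advance;
    return -1;
}

/*
 * Assumption: a font whose vowel and trailing-consonant jamo glyphs have
 * zero advance is treated as capable of the 'rudimentary magic'.
 * Probing two representative jamos keeps the check cheap.
 */
static int font_has_combining_jamos(const FontMetrics *font)
{
    static const uint32_t probes[] = {
        0x1161,  /* vowel A */
        0x11A8   /* trailing consonant KIYEOK */
    };

    assert(font != NULL);
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++)
        if (lookup_advance(font, probes[i]) != 0)
            return 0;
    return 1;
}
```

The result could be cached per font so the probe runs only once.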

Comment 24 Changwoo Ryu 2002-10-15 08:45:50 UTC

Well, Baekmuk also has the widening-and-narrowing problem if a
(future) input method supports non-modern Jamo input.

And I don't like controlling users that way, especially if the 'better'
choice is a proprietary one.  You know, few people work on improving
Baekmuk to make their desktop better.  They just learn how to use
proprietary fonts in Linux.

Anyway I prefer doing the best thing possible with the given font,
rather than relying on the font.  I think the best thing is the way
you suggested last: work differently depending on the font.

Comment 25 Changwoo Ryu 2002-10-15 08:58:23 UTC

Patch applied.  See bug 95730 for stable branch.


2002-10-15  Changwoo Ryu  <cwryu@debian.org>

	* modules/hangul/hangul-xft.c (hangul_engine_shape): 
	* modules/hangul/hangul-x.c (hangul_engine_shape): Added missing
	memcpy() from the static jamo buffer to allocated jamo buffer
	(#95569).  Thanks to Jungshik Shin.
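
For readers unfamiliar with the pattern named in the ChangeLog entry, here is a
minimal sketch of a buffer that starts in fixed storage and moves to the heap
when it overflows. It is not the committed patch: the type, the function names
and the capacity are made up for illustration. The memcpy() in the first-overflow
branch is the kind of step the entry describes as having been missing; without
it the heap buffer starts out with uninitialized contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STATIC_CAPACITY 4   /* illustrative size only */

typedef struct {
    uint32_t  fixed[STATIC_CAPACITY];  /* used while the input is short */
    uint32_t *data;                    /* points at fixed[] or at heap memory */
    size_t    len;
    size_t    cap;
} JamoBuffer;

static void jamo_buffer_init(JamoBuffer *b)
{
    assert(b != NULL);
    b->data = b->fixed;
    b->len = 0;
    b->cap = STATIC_CAPACITY;
}

/* Appends one code point; returns 0 on success, -1 on allocation failure. */
static int jamo_buffer_push(JamoBuffer *b, uint32_t ch)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap * 2;
        uint32_t *heap;

        if (b->data == b->fixed) {
            heap = malloc(new_cap * sizeof *heap);
            if (heap == NULL)
                return -1;
            /* Carry over what was collected in the fixed buffer so far. */
            memcpy(heap, b->fixed, b->len * sizeof *heap);
        } else {
            heap = realloc(b->data, new_cap * sizeof *heap);
            if (heap == NULL)
                return -1;
        }
        b->data = heap;
        b->cap = new_cap;
    }
    b->data[b->len++] = ch;
    return 0;
}

static void jamo_buffer_free(JamoBuffer *b)
{
    if (b->data != b->fixed)
        free(b->data);
}
```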




(If you want more fixes or features from the hangul shaper, please
don't hesitate to file bugs or drop me a mail.  It won't be very
easy to fix even my own code once pango 1.1 becomes stable.)

Comment 26 Jungshik Shin 2002-10-15 09:00:43 UTC

Created attachment 11550
a bare-bone patch

Comment 27 Jungshik Shin 2002-10-15 09:07:05 UTC

Oops. Sorry, your comment about committing my patch was in transit
while I attached a bare-bone patch. That can now be ignored.
Thank you for committing it.

As for improving hangul-xft, let's keep on talking in bug 95708.
Now I've got a skeleton of code to resolve bug 95708.

As for fixing some obscure problems in hangul-x.c (addressed in my
second attachment), I'll open a new bug and post my patch there.

Comment 28 Jungshik Shin 2002-10-15 09:32:21 UTC

In your patch, you used 'gunichar2', but jamos_static and jamos
are gunichar, whose size is different from that of gunichar2.

> memcpy(jamos, jamos_static, n_jamos*sizeof(gunichar2));

Was it a typo?
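
To illustrate why the element type matters: in GLib, gunichar is a 32-bit type
and gunichar2 a 16-bit one, so n_jamos*sizeof(gunichar2) covers only half of the
bytes. The sketch below uses stand-in typedefs and hypothetical function names
so that it compiles without GLib; it is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t my_unichar;    /* stand-in for GLib's 32-bit gunichar */
typedef uint16_t my_unichar2;   /* stand-in for GLib's 16-bit gunichar2 */

/* Wrong element size: copies n * 2 bytes, i.e. only the first n/2 elements. */
static void copy_jamos_wrong(my_unichar *dst, const my_unichar *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(my_unichar2));
}

/* Taking the size from the destination keeps the count and type in sync. */
static void copy_jamos_right(my_unichar *dst, const my_unichar *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof *dst);
}
```

With four jamos, the first variant leaves the last two destination elements
untouched, while the second copies all four.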

Comment 29 Changwoo Ryu 2002-10-15 09:50:51 UTC

> > memcpy(jamos, jamos_static, n_jamos*sizeof(gunichar2));
> 
> Was it a typo? 

Fixed now, thanks.

Comment 30 Jungshik Shin 2002-10-15 10:06:44 UTC

> As for fixing some obscure problems in hangul-x.c (addressed in my
> second attachment), I'll open a new bug and post my patch there.

  It's bug 95800. Can you take a look?

Comment 31 Changwoo Ryu 2002-10-26 18:13:36 UTC

I think someone who has the right (maybe the reporter or Owen?) can
mark this bug as RESOLVED.

Many things were discussed, but they are not directly related to this bug...

Comment 32 Jungshik Shin 2002-10-26 19:07:13 UTC

I don't have enough privileges to change the status from
'unconfirmed' to 'resolved'. Owen, can you change the status? All other issues
discussed here have been filed as separate bugs.

Comment 33 Owen Taylor 2002-11-02 05:42:41 UTC

Moving bugs to new hangul component

Comment 34 Changwoo Ryu 2002-11-04 02:41:18 UTC

Marking this bug as RESOLVED FIXED

Comment 35 Elijah Newren 2003-04-01 22:57:24 UTC

*** Bug 109699 has been marked as a duplicate of this bug. ***