GNOME Bugzilla – Bug 95569
Pango Hangul shaper crashes when thrown in a generic sequence of Hangul Conjoining Jamos
Last modified: 2004-12-22 21:47:04 UTC
Package: pango
Severity: normal
Version: 2.0.2
Synopsis: Pango Hangul shaper crashes when thrown in a generic sequence of Hangul Conjoining Jamos
Bugzilla-Product: pango
Bugzilla-Component: general
BugBuddy-GnomeVersion: 2.0 (2.0.3)

Description:

Description of Problem:
When pre-1933 orthography Korean text is copied and pasted into gedit, gedit crashes.

Steps to reproduce the problem:
1. Run gedit under the ko_KR.UTF-8 locale
2. Launch Mozilla and point it at http://jshin.net/i18n/korean/hunmin.html
3. Copy the first paragraph of the main text in pre-1933 orthography Korean and paste it into gedit

Actual Results:
gedit crashes

Expected Results:
gedit should render it as rendered by Mozilla (the screenshot at the URL above) or Kwrite

How often does this happen?
Always

Additional Information:
The hangul_shaper module in Pango needs to be improved to support as generic a form of Hangul Jamo sequence as possible (i.e. up to 3 L's + up to 3 V's + optionally up to 3 T's).

Debugging Information:
Backtrace was generated from '/usr/bin/gedit'
(no debugging symbols found)...[New Thread 8192 (LWP 23076)]
0x420ae169 in wait4 () from /lib/i686/libc.so.6
+ Trace 28679
Thread 1 (Thread 8192 (LWP 23076))
------- Bug moved to this database by unknown@bugzilla.gnome.org 2002-10-11 19:05 ------- The original reporter (jshin@mailaps.org) of this bug does not have an account here. Reassigning to the exporter, unknown@bugzilla.gnome.org. Reassigning to the default owner of the component, otaylor@redhat.com.
By a generic sequence of Hangul Jamos, I meant S := L+ V+ T* where L,V and T denote leading consonant, vowel and trailing consonant, respectively. Currently, hangul_shaper() in Pango can take care of 'LVT?' sequence, but it can't take care of 'L+V+T*' sequence as specified in Unicode 3.0 (section 3.10?) and as required for the full support of pre-1933 orthography Korean text. (see http://jshin.net/i18n/uyeo.html)
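To make the 'L+ V+ T*' definition above concrete, here is a minimal sketch in C of finding where one such syllable ends. The function name jamo_syllable_length() and the is_L/is_V/is_T helpers are hypothetical (they are not Pango's macros), and the ranges assume the conjoining jamo assignments of Unicode 3.0, ignoring the filler characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Unicode 3.0 conjoining jamo ranges (fillers ignored). */
static int is_L(uint32_t c) { return c >= 0x1100 && c <= 0x1159; }
static int is_V(uint32_t c) { return c >= 0x1161 && c <= 0x11A2; }
static int is_T(uint32_t c) { return c >= 0x11A8 && c <= 0x11F9; }

/* Returns how many code points at the start of text[0..len) form one
 * syllable matching L+ V+ T*, or 0 if text does not start with one. */
size_t jamo_syllable_length(const uint32_t *text, size_t len)
{
    size_t i = 0, n_l, n_v;

    assert(text != NULL || len == 0);

    while (i < len && is_L(text[i]))    /* one or more leading consonants */
        i++;
    n_l = i;
    while (i < len && is_V(text[i]))    /* one or more vowels */
        i++;
    n_v = i - n_l;
    if (n_l == 0 || n_v == 0)
        return 0;                       /* both L and V are mandatory */
    while (i < len && is_T(text[i]))    /* zero or more trailing consonants */
        i++;
    return i;
}
```

With this definition, 'U+1101 U+116E U+1167 U+11BB' is one syllable of length 4, whereas an 'LVT?' matcher would stop after the first two code points.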
FYI, UCM can be used to enter a sequence of Hangul Jamos instead of cut'n'pasting from the existing document. Alternatively, vim (with keymap defined for Hangul Jamo) under xterm or Yudit can be used. Sequences like 'U+1101 U+116E U+1167 U+11BB' (mentioned in http://jshin.net/i18n/uyeo.html) always lead to a crash.
The Hangul shaper supports the generic sequences. Just it lacks some testing. :-) Try the patch in #86591. For your convenience: http://bugzilla.gnome.org/showattachment.cgi?attach_id=9478 This adds Hangul Jamo shaper for Xft, but it also has some bug fixes. It's applied to the HEAD and maybe 2.1.0, but not to 2.0.x branch. Someone could make the bug-fixing-only patch from it.
Created attachment 11517 [details] [review] a simple patch
Attachment 11517 does the following:

- Corrects the memory 'corruption' in hangul_shape_engine(): when max_jamos is reached and jamos == jamos_static, a new chunk of memory is allocated by g_new, but the content of the buffer is not copied to the new place.
- Increases the default buffer size of jamos_static[] to 9, for up to 3 L's, 3 V's and 3 T's. jamos_static[] is not on the heap, so 6 more entries are cheap.
- Replaces a sequence of LVT or LV with a glyph for a precomposed Hangul syllable ONLY when the length of the given sequence is *equal to* (as opposed to 'equal to or greater than') 3 or 2, respectively.

The rationale behind the last change is that even though LVT and LV form a syllable valid in modern Korean (i.e. in the precomposed Hangul block starting at U+AC00), both can be followed by an extra T or V. An example is given in my previous comment and at http://jshin.net/i18n/uyeo.html. Given the sequence 'U+1101 U+116E U+1167 U+11BB', Pango should not replace 'U+1101 U+116E' with 'U+AFB8' but must treat the whole sequence as a single syllable. PARK Won-kyu's BDF fonts (for simple overstriking of Jamos) and James Kass's CODE2000 can render it in a more or less legible way.

When OTFs with GSUB and other OT tables for Hangul Conjoining Jamos become widely available (currently, the only source of such fonts is the Korean version of MS Office XP), this issue (rendering an 'L+V+T*' sequence) has to be revisited. In the meantime, I think this bug can be closed after applying my patch.

Another possibility is to make Pango do what Yudit and Lambda do with the Ogulim/Obatang/Ogunseo fonts. These fonts are distributed with Korean MS Office 2000, and Ogulim is also available as the 'Old Korean support kit' at the MS web site. They do not have OT tables for Hangul Jamos. Ogulim has a set of glyphs for all known consonant and vowel clusters, which can be assembled together to render a pretty generic sequence of Hangul Jamos.
There is another set of fonts in MS Word 2000 (Korean) and the Old Korean support kit, namely Ngulim/Nbatang/Ngungseo. They have precomposed glyphs for all known Hangul syllables (thousands of them) ever found in Korean literature. Producing the mapping from Hangul Jamo sequences to those precomposed syllables is tedious, but doable. I'm wondering whether this font-specific 'hack' can be included in Pango. It is sort of like the hack for KAIST/Iyagi BDF johab fonts. If there's a way to uniquely identify these fonts, I think it's possible, and it would dramatically improve Pango's ability to render pre-1933 orthography Korean text until Korean OTFs with Hangul Jamo support are widely available.
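The memory problem described in the patch notes above (a buffer that moves from a static array to the heap without its contents being copied) can be sketched as follows. This is not the code from hangul_shape_engine(): the JamoBuffer type and the jamo_buffer_* functions are hypothetical, and plain malloc/realloc stand in for GLib's g_new so that the example is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define JAMOS_STATIC_LEN 9      /* room for 3 L's + 3 V's + 3 T's */

typedef struct {
    uint32_t  static_buf[JAMOS_STATIC_LEN];
    uint32_t *jamos;            /* points at static_buf or a heap block */
    size_t    n_jamos;
    size_t    max_jamos;
} JamoBuffer;

void jamo_buffer_init(JamoBuffer *b)
{
    b->jamos = b->static_buf;
    b->n_jamos = 0;
    b->max_jamos = JAMOS_STATIC_LEN;
}

/* Appends one jamo, growing the buffer when full. Returns 0 on success
 * and -1 if memory could not be allocated. */
int jamo_buffer_push(JamoBuffer *b, uint32_t c)
{
    assert(b != NULL);
    if (b->n_jamos == b->max_jamos) {
        size_t new_max = b->max_jamos * 2;
        uint32_t *p;

        if (b->jamos == b->static_buf) {
            p = malloc(new_max * sizeof *p);
            if (p == NULL)
                return -1;
            /* The step that was missing: carry the jamos collected so
             * far over to the newly allocated block. */
            memcpy(p, b->static_buf, b->n_jamos * sizeof *p);
        } else {
            p = realloc(b->jamos, new_max * sizeof *p);
            if (p == NULL)
                return -1;
        }
        b->jamos = p;
        b->max_jamos = new_max;
    }
    b->jamos[b->n_jamos++] = c;
    return 0;
}

void jamo_buffer_free(JamoBuffer *b)
{
    if (b->jamos != b->static_buf)
        free(b->jamos);
    jamo_buffer_init(b);
}
```

Keeping the first 9 entries in a static array means that ordinary syllables never touch the heap, which is the reasoning given for enlarging jamos_static[].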
> The Hangul shaper supports the generic sequences. Just it lacks some > testing. :-) Well, what I tested is Pango-1.1.1 with your patch for bug 86591 applied. Unfortunately, it's not generic enough and that's why I filed this bug and came up with a patch against it. My patch still needs some more work to make it more generic (or at least to the extent that Lambda(http://www.ktug.or.kr) and Yudit(http://www.yudit.org. see swindow/SFontOFT.cpp) do with Hangul Jamos). You may refer to <http://jshin.net/i18n/middle.html> for more information. Sometime between Unicode 2.0 and Unicode 3.0, compatibility decomposition of consonant clusters and vowel clusters into basic Jamo sequences have been removed by a not-so-wise request of South Korean standard body to ISO/IEC JTC1/WG2/SC2 and UTC but IMO, that has to be supported as well. I'm not sure that has to be dealt with in Pango or Glib (unicode normalization related routines).
Created attachment 11519 [details] [review] patch v2
1. Please file another bug if you want more features than this crashing bug, and submit separate patches if you want more features. It is not very acceptable to apply your patch, which contains many other fixes. Also, part of your patch should be applied to the 2.0.x tree and part to the HEAD. Step by step... :-)

2. Why does your patch have these?

-  if (length >= 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
+  if (length == 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && IS_T_S(text[2]))
     composed = 3;
-  else if (length >= 2 && IS_L_S(text[0]) && IS_V_S(text[1]))
+  else if (length == 2 && IS_L_S(text[0]) && IS_V_S(text[1]))

It is a feature, not a bug. It renders some possible prefix of jamo sequences as a syllable. Imagine what the preedit string looks like when you input with a 2-bulsik Hangul keyboard and a ksc5601.1987-0 font.
> 2. why your patch has these? > - if (length >= 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && > IS_T_S(text[2])) > + if (length == 3 && IS_L_S(text[0]) && IS_V_S(text[1]) && > IS_T_S(text[2])) > composed = 3; > - else if (length >= 2 && IS_L_S(text[0]) && IS_V_S(text[1])) > + else if (length == 2 && IS_L_S(text[0]) && IS_V_S(text[1])) > It is a feature, not a bug. It renders some possible prefix of jamo > sequences as a syllable. Imagine what preedit string looks like when > you input with 2-bulsik Hangul keyboard and ksc5601.1987-0 font. I'm not sure what you meant by 'possible prefix of Jamo sequence'. Could you elaborate with a couple of examples? Anyway, whether a sequence like 'U+1101 U+116E U+1167 U+11BB' has to be treated as a single syllable and rendered as such OR as a syllabel (U+1101 U+116E) followed by stand-alone 'U+1167' and 'U+11BB' is arguably debatable. However, Unicode 3.0 section 3.11 is very clear about that, IMO. And, the whole sequence has to be treated as a single syllable Nonetheless, you may leave that alone in hangul-x.c, but in hangul-xft.c, it should definitely be changed as I explained in my comment dated 2002-10-13 01:42 (see the paragraph beg. with 'the rationale behind....') With Xft deployment rapidly spreading, I don't care much about hangul-x.c > 1. please file another bug if you want more features than this > crashing bug. Fine, I can do that although I prefer to change the summary line of this bug to 'enhance Hangul shaper' and do all the work here because it's a pain in the ass to make separate patches against a single set of files and test them separately. My first patch more or less fits what you want. I'll upload a slightly revised patch.
Created attachment 11523 [details] [review] a new patch only fixing memory corruption (+ a little @)
> I'm not sure what you meant by 'possible prefix of Jamo
> sequence'. Could you elaborate with a couple of examples?
>
> Anyway, whether a sequence like
> 'U+1101 U+116E U+1167 U+11BB' has to be treated
> as a single syllable and rendered as such OR
> as a syllable (U+1101 U+116E) followed by
> stand-alone 'U+1167' and 'U+11BB' is arguably
> debatable. However, Unicode 3.0 section 3.11
> is very clear about that, IMO. And, the whole
> sequence has to be treated as a single syllable

You don't understand what (length >= 3) or (length >= 2) means. Carefully read the loop below the if() condition. It renders the prefixing L+V or L+V+T as a syllable *and* the remaining L, T, or whatever jamos as separate jamo glyphs, rather than rendering "length" jamo glyphs.

Why? Otherwise, with a 2-bulsik Hangul keyboard, the preedit string would first widen and then narrow, which is very confusing. The early hangul shaper had =='s, as your patch does, and I got numerous complaints from 2-bulsik users.

'U+1101 U+116E U+1167 U+11BB' as a single syllable is another game.
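The difference between the '>=' and '==' tests under discussion can be illustrated with a small sketch. It is not the code from hangul-x.c or hangul-xft.c: compose_prefix(), its exact_only flag and the lowercase is_L_S/is_V_S/is_T_S helpers are hypothetical stand-ins, and the ranges assume the modern jamos that take part in the precomposed syllable block at U+AC00. The composition arithmetic is the standard Unicode Hangul syllable formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ranges of the modern jamos used by precomposed syllables. */
static int is_L_S(uint32_t c) { return c >= 0x1100 && c <= 0x1112; }
static int is_V_S(uint32_t c) { return c >= 0x1161 && c <= 0x1175; }
static int is_T_S(uint32_t c) { return c >= 0x11A8 && c <= 0x11C2; }

/* Standard Unicode composition; pass t == 0 when there is no trailing
 * consonant. 21 vowels and 28 trailing slots (including "none"). */
static uint32_t compose(uint32_t l, uint32_t v, uint32_t t)
{
    uint32_t t_index = t ? t - 0x11A7 : 0;
    return 0xAC00 + ((l - 0x1100) * 21 + (v - 0x1161)) * 28 + t_index;
}

/* Returns how many leading jamos (0, 2 or 3) are replaced by a
 * precomposed syllable, which is stored in *syllable.
 * exact_only == 0 mimics the '>=' tests, exact_only != 0 the '==' tests. */
size_t compose_prefix(const uint32_t *text, size_t length,
                      int exact_only, uint32_t *syllable)
{
    assert(text != NULL && syllable != NULL);

    if ((exact_only ? length == 3 : length >= 3) &&
        is_L_S(text[0]) && is_V_S(text[1]) && is_T_S(text[2])) {
        *syllable = compose(text[0], text[1], text[2]);
        return 3;
    }
    if ((exact_only ? length == 2 : length >= 2) &&
        is_L_S(text[0]) && is_V_S(text[1])) {
        *syllable = compose(text[0], text[1], 0);
        return 2;
    }
    return 0;   /* nothing composed; all jamos are left to the font */
}
```

For 'U+1101 U+116E U+1167 U+11BB', the '>=' variant composes the first two jamos into U+AFB8 and leaves U+1167 and U+11BB as separate glyphs, while the '==' variant composes nothing and passes all four jamos on together.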
> You don't understand what (length >= 3) or (length >= 2) means. > Carefully read the loop below the if() condition. It renders > the prefixing L+V or L+V+T as a syllable *and* the rest L, T, > whatever jamos as separate jamo glyphs, rather than "length" number > of jamo glyphs. Come on !!. I do know how to read the code( have you read my previous comments at all? ) That's exactly the problem I want to fix. hangul-xft.c MUST NOT do that if it wants to be compliant to Unicode 3.0. > Why? If not, with 2-bulsik Hangul keyboard, the preedit > string will be widen first and narrowed next thus very confusing. > The early hangul shaper has =='s as your patch, and I got > numerous complaints from 2-bulsik users. As things stand now, I'm not aware of any Korean XIM that uses U+1100 Jamos during preedit. Ami uses Hangul Compatibility Jamos(U+3130 block). Given that,I don't understand why my patch would present a problem to 2set-keyboard users. If you think I'm still missing anything, could you please give a very concrete example with a sequence of Jamos? > 'U+1101 U+116E U+1167 U+11BB' as a single syllable is another game. How would you solve this problem? The problem has to be solved for sure. Otherwise, pre-1933 orthography Korean text cannot be properly rendered with Pango while it can be by Uniscribe under MS WIndows 2k/XP. Do you know how many Jamo clusters have to be represented that way for the full support of pre-1933 orthography Korean text? There are a lot of them. If in doubt, why don't you look at Ngulim.ttf and Ogulim.ttf with tools like pfaedit.
> Why? If not, with 2-bulsik Hangul keyboard, the preedit
> string will be widen first and narrowed next thus very confusing.

There are a couple of problems with this line of argument.

First, if you're talking about a sequence like 'U+1100 U+116E U+1165', you're trying to solve someone else's problem at the wrong level, assuming there is a Korean XIM that uses U+1100 Jamos and exposes a sequence like 'U+116E U+1165' to the underlying rendering layer. Arguably, it's not Pango's responsibility but that of the XIM and of fonts (fonts should have combining/non-spacing glyphs for Hangul Conjoining Jamos). An XIM should not expose such a sequence to the rendering layer.

Secondly, even if a not-so-well-written XIM does, Pango can solve the problem once my pre-Unicode 3.0 normalization routine for Hangul Jamos is in place. It will convert 'U+116E U+1165' to 'U+116F'. (If you still have the Unicode 2.0 book, you can see that consonant clusters and vowel clusters have compatibility decompositions into sequences of basic Jamos.)

Thirdly, even WITHOUT the normalization mentioned above, TTFs like CODE2000 don't have the 'widening and narrowing' problem you wrote about, because Hangul vowels and trailing consonants in CODE2000 are combining/non-spacing. Broken fonts like Arial MS Unicode and Cyberbit have this problem because their glyphs for Hangul Conjoining Jamos are spacing instead of non-spacing/zero-width. However, note that the so-called 'pre-edit' problem is fictitious (as far as I can tell) because no Korean XIM I'm aware of uses U+1100 Hangul Jamos at the moment. If there is a Korean XIM that uses U+1100 Hangul Jamos, please let me know so that I can test it myself (and I'll be glad to stand corrected).

Finally, if you're concerned about the problem when a ksc5601.1987-0 font is used, I wouldn't insist on replacing '>=' with '==' in hangul-x.c, as I wrote in my comment dated 2002-10-13 10:01. My latest patch doesn't have it for hangul-x.c. However, it's absolutely necessary for hangul-xft.c.
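The normalization mentioned above (turning 'U+116E U+1165' into 'U+116F') could look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the table lists just the seven compound vowels of modern Korean rather than the full set of clusters, and it has not been checked entry by entry against the Unicode 2.0 decomposition data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, second, cluster; } VowelPair;

/* Illustrative subset: modern compound vowels only. */
static const VowelPair vowel_pairs[] = {
    { 0x1169, 0x1161, 0x116A },   /* O  + A  -> WA  */
    { 0x1169, 0x1162, 0x116B },   /* O  + AE -> WAE */
    { 0x1169, 0x1175, 0x116C },   /* O  + I  -> OE  */
    { 0x116E, 0x1165, 0x116F },   /* U  + EO -> WEO */
    { 0x116E, 0x1166, 0x1170 },   /* U  + E  -> WE  */
    { 0x116E, 0x1175, 0x1171 },   /* U  + I  -> WI  */
    { 0x1173, 0x1175, 0x1174 },   /* EU + I  -> YI  */
};

/* Returns the cluster for the pair (a, b), or 0 if the table has none. */
uint32_t combine_vowels(uint32_t a, uint32_t b)
{
    size_t i;
    for (i = 0; i < sizeof vowel_pairs / sizeof vowel_pairs[0]; i++)
        if (vowel_pairs[i].first == a && vowel_pairs[i].second == b)
            return vowel_pairs[i].cluster;
    return 0;
}

/* Replaces adjacent vowel pairs in place and returns the new length.
 * Single pass only: clusters of three basic vowels are not handled. */
size_t normalize_vowel_clusters(uint32_t *text, size_t len)
{
    size_t in = 0, out = 0;

    assert(text != NULL || len == 0);
    while (in < len) {
        uint32_t c = text[in++];
        if (in < len) {
            uint32_t merged = combine_vowels(c, text[in]);
            if (merged != 0) {
                c = merged;
                in++;           /* consume the second vowel as well */
            }
        }
        text[out++] = c;
    }
    return out;
}
```

A full routine would need matching tables for leading and trailing consonant clusters as well.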
> It is not very acceptable to apply your patch, which contains many > other fixes. And a part of your patch should be applied to the 2.0.x > tree and some also to the HEAD. Step by step... :-) Pls, bear with me because I'm not familiar with the life cycle of a bug in Gnome development(review, check-in rules, etc) and for that matter, Gnome devel. in general. My patch is against Pango 1.1.1, but the current CVS(HEAD?) has exactly the same code as Pango 1.1.1 as far as pango/modules/hangul is concerned. ('cvs diff' yields exactly the same diff file as diff against Pango 1.1.1). You wrote about 2.0.x and HEAD. My understanding is that you were refering to Gtk version number. What I'm not sure of is the relationship between Gtk version and Pango version. It seems like gtk 2.0.x(stable branch?) corresponds to Pango 1.0.x (stable) and gtk 2.1.x(HEAD: development/bleeding edge branch) goes together with Pango 1.1.x . Am I right? If that's the case, how about applying my latest (and simplest) patch to Pango 1.1.x tree first? Pango 1.0.x doesn't have hangul-xft.c, does it? For Pango 1.0.x, a part of patch for hangul-x.c can be applied. Then, we can close this bug as solved and go onto fixing other problems and enhancing hangul-xft in (a) separate bug(s). I filed bug 95708 for font-specific hack (Oxxxx/Nxxx fonts)
> Why? If not, with 2-bulsik Hangul keyboard, the preedit
> string will be widen first and narrowed next thus very confusing.

Now I know what kind of sequence you're talking about. You meant a sequence like 'U+1100 U+1161 U+11AF U+1100' during preedit, didn't you? However, as I wrote before, that kind of sequence is not used by any Korean XIM that I know of. All Korean XIMs I know use U+3130 Compatibility Jamos. EVEN IF conjoining Jamos are used by an XIM during preedit, fonts and rendering engines that implement Hangul Conjoining Jamo behavior compliant with the Unicode spec should have no problem. If somebody complains about this, (s)he has to be told to use fonts compliant with the Unicode spec.

ksc5601.1987-0 X11 BDF, ksc5601.1992-3 X11 BDF and johab*-1 X11 BDF fonts are not compliant with the Unicode spec, and I understand your concern about 'narrowing and widening' when they're used. That's why my patch doesn't replace '>=' with '==' in render_syllable_with_(ksc5601|johab|ksx1005)() in hangul-x.c, while it does in render_syllable_with_iso10646() in hangul-x.c and render_syllable() in hangul-xft.c.
OK, let me summarize:

1. At first, I was confused by your comments. You mentioned 'U+1101 U+116E U+1167 U+11BB as a single syllable...' when you asked to replace '>='s with '=='s. But it's a DIFFERENT problem, OK? Perhaps the code should be fixed to support that, but I don't want to change the way it works for un-normalizable jamos.

2. I couldn't find any sentence in Unicode 3.0 which specifies this case, where a Jamo sequence cannot be normalized. Section 3.11 only specifies some ideal (but most used) cases, and it never specifies how to render such sequences when the underlying font is not very capable. Well, if it really is in Unicode, then... I won't follow it. Then the standard is worse than my code! :)

3. I once wrote a prototype GTK+ Hangul input module which used Jamos. :-) Yes, most Hangul input methods don't use Jamos, but they may in the future. In fact Choe, Hwanjin is working on his GTK+ input module (http://imhangul.kldp.net) to input mid-age Hangul jamos (then it _should_ use jamos). AND... if XIMs don't use Jamos, then it is even clearer that my decision is right for consistency with the XIMs. Let's take an example:

   SIOS YU PHIEUPH EO

   Assume a ksc5601.1987-0 font. Most XIM servers now render it as follows:

   1) 'SIOS YU PHIEUPH' as a 'SIOS YU' syllable and a 'PHIEUPH' compatibility jamo.
   2) If the user types 'EO' on (1), they render it as 'SIOS YU' and 'PHIEUPH EO' syllables.
   3) If the user stops input on (1), the current code renders the final string as is: a 'SIOS YU' syllable and a 'PHIEUPH' jamo. But your way renders it as three jamos, 'SIOS', 'YU', and 'PHIEUPH'.

4. What about the other, better Hangul fonts you suggested? I don't think any font _in the world_ could render all (unlimited number of) possible jamo sequences perfectly. I agree that jamos should be rendered in syllable form as far as possible, but some (even normalized) jamo sequences still need the fallback. And I think it's better to render the prefixing L+V or L+V+T as a syllable in this case.
> 1. In the first, I've been confused by your comments. You mentioned > 'U+1101 U+116E U+1167 U+11BB as a single syllable...' when you > ask to > replace '>='s with '=='s. But it's a DIFFERENT problem, OK? No, it's not a different problem. Your code keeps that sequence from being properly rendered by breaking it up into two pieces, 'U+1101 U+116E' (which your code converts into a Hangul Syllable 'U') and two stand-alone Jamos, U+1167 and U+11BB. By replacing '>=' with '==', 'U+1101 U+116E U+1167 U+11BB' is carried over to the code down the road and gets rendered correctly as a single syllable by the magic of a 'font'. > Perhaps > the code should be fixed to support that but I don't want to change > the way it works for un-normalizable jamos. Hmm... As I wrote before, you can leave your code alone in hangul-x.c but hangul-xft.c needs to be changed. Because there are already fonts that work more or less with Hangul Conjoining Jamos the way they're supposed to work according to Unicode 3.0. > 2. I couldn't find any sentence in Unicode 3.0 which specifies this > case, when a Jamo sequence is unable to be normalized. The section Unable to be normalized to what? a precomposed syllable? There are lots and lots of them and can be taken care of by what I'm writing now. > 3.11 only specifies some ideal (but most used) cases and it never > specifies how to render such sequences when the underlying font is > not very capable. Unicode 3.0 is very clear as to where Hangul syllable boundary is. Your code assumes that the boundary is either right after LVT or LV(perhaps you know well that it's not right but are doing it as a fallback. Unfortunately, while doing it, you're shooting for LESS than what's currently possible.) However, LVT or LV can be followed by another T or V so that currently hangul-xft.c renders a sequence like LVTT or LVV in a way NOT compliant to Unicode 3.0 standard. And, I'm not making up a fictious case. 
Real world Korean literature contain numerous examples like that. > Well if it is really in the Unicode, then...I won't follow it. Then > the standard is worse than my code! :) How many times do I have to repeat that a Hangul syllable is defined as 'L+V+T*M?' instead of 'LVT'? Your code assumes it's defined as 'LVT?' which is downright wrong. As for hangul-x.c, I can live with that, but hangul-xft.c, it MUST be changed. Fonts like CODE2000 can render 'LVTT' or 'LVV' or 'LVM', 'LVTM' in a reasonably legible way. By artificially splitting 'LVV' or 'LVTT' into 'LV' and 'V' or 'LVT' and 'T', your code makes it impossible for it to work. Moreover, fonts like Oxxx can deal with all known instances of Hangul syllables in literature. Why would you want to be content with a fallback when we can do more? > AND...if XIMs don't uses Jamos, then it is clearer that my decision is > right for the consistency with the XIMs; I'm not following you here. > let's take an example: > SIOS YU PHIEUPH EO > Assume ksc5601.1987-0 font, now most XIM servers render, Why would XIM servers running under ko_KR.UTF-8 locale have to be limited to using ksc5601.1987-0 fonts? For instance, my patched version of Ami (for ko_KR.UTF-8 locale) uses iso10646-1 fonts. If CHOI Hwan-jin's new XIM uses ksc5601.1987-0 fonts, he has to fix his XIM to use iso10646-1 fonts. Pango should not be held responsible for problems of other programs. > 1) 'SIOS YU PHIEUPH' as 'SIOS YU' syllable and 'PHIEUPH' > compatibility jamo. > 2) If the user types 'EO' on (1), they render it as 'SIOS YU' and > 'PHIEUPH EO' syllables. > 3) If the user stops input on (1), the current code renders the final > string as is; 'SIOS YU' syllable and 'PHIEUPH' jamo. But your way > renders it as three 'SIOS', 'YU', and 'PHIEUPH' jamos. You're making an assumption that U+1100 Hangul Jamo sequences (such as U+1109, U+1172 U+11C1) won't be rendered as composed when left by themselves. 
That's NOT the case with fonts like CODE2000 and PARK WOn-kyu's ISO 10646 X11 BDF fonts. As I wrote a couple of times, **BROKEN** fonts like Arial MS Unicode and Cyberbit would render them that way. However, fonts with combining/non-spacing glyphs for Hangul Conjoing Jamos would render them as syllables. Therefore, your point is mute. Moreover, in this particular case, ' U+1109, U+1172 U+11C1' wouldn't go down the road any way because it can be rendered as a precomposed syllable. You wrote that ksc5601.1987-0 font is assumed. Didn't I say that I don't care if you want to leave '>=' alone in render_syllable_with_ksc5601() (and render_syllable_with_johab|ksx1005 as well) in hangul-x.c? > 4. What about other better Hangul fonts you suggested? I don't think > any font _in the world_ could render all (unlimited number of) > possible jamo sequences with perfect. Sure, no font ever will be able to deal with the most generic 'L+V+T*M?' and even 'L{1,3}V{1,3}T{0,3}M?'. But some of them are already more capable than your code assumes and Pango shouldn't keep them from doing what they can do. > I agree the jamos should be > rendered as syllable forms as possible, but some (even normalized) > jamo sequences sequences still need the fallback. And I think it's > better to render prefixing L+V or L+V+T as a syllable in this case. I'm not disputing your point that some kind of fallback is inevitable. Problem is that your fallback in hangul-xft.c is less than optimal and gets in the way of perfectly capable fonts. Which would you prefer 'KuyeoSS' or 'Ku' followed by stand-alone 'Yeo' and 'SS' for 'U+1100 U+116E U+1167 U+11BB'? Let me summarize what I've been saying all along: - I don't mind your leaving '>=' alone in hangul-x.c with the possible exception of render_syllable_with_iso10646(). Esepcially, in render_syllable_with_ksc5601(), your rationale behind using '>=' certainly makes quite a lot of sense. 
- I feel very strongly that '>=' in render_syllable() in hangul-xft.c has to be replaced with '=='. Hopefully, this time around, I was successful in getting through to you what I want to.
About the standard: now I understand why you think the hangul shaper violates Unicode. But I think it doesn't. In my way, the hangul shaper renders a *single* syllable but just with several glyphs; a syllable glyph and jamos' glyphs. It separates jamos as syllables as specified in the standard. But no standard specifies how to render the each syllable with a poor font. If a font renders the sequence as N jamo glyphs, then it also violates the standard in your sense. When I wrote the code only Unicode 2.0 is out so it stil doesn't know about the Hangul tone "M". I don't know how to interpret it. Plz file a bug if there's any issue around it. And you made the point, the font magic. Sorry, I have not known the 'font magic' can render such sequence as a syllable. I was thinking about combining LVVTT or like as a precomposed syllable if possible. OK but it still might be considered.. Your patch could make it prettier or (IMO) uglier, depending on the font. Only a few fonts work that way, don't? And most importantly there is no such free font now. Hmm... Like the way you suggested in bug 95708, hangul-xft could chose how to render according to the font's capability. Well, of course it'll be dirty hacks... I can not agree to make hangul-xft works more optimally with a few proprietary fonts and more poorly with other fonts, including free ones. > If CHOI Hwan-jin's new XIM uses > ksc5601.1987-0 fonts, he has to fix his XIM to use > iso10646-1 fonts. Pango should not be held responsible > for problems of other programs. It's not an XIM but a GTK+ native input module. You should know that the GTK+ input modules don't know what font it uses. They can't choose their fonts.
> About the standard: now I understand why you think the hangul shaper > violates Unicode. But I think it doesn't. I think I was not very careful in my choice of word. I should not have used 'not compliant' or 'violate'. > OK but it still might be considered.. Your patch could make it > prettier or (IMO) uglier, depending on the font. Sure, this is where different people can have different opinions. However, with your current code, EVEN if I use fonts that can do some *rudimentary* 'magic', I have to live with less-than-desirable result. With my patch applied, Pango can blame fonts and we can tell people to use fonts like CODE2000 or PARK Won-kyu's BDF fonts instead of Arial MS Unicode or Cyberbit (or Ogulim when my code to make use of it is put in place) > Like the way you suggested in bug 95708, hangul-xft > could chose how to render according to the font's capability. Well, > of course it'll be dirty hacks... It's as dirty as special-casing johab(sh)-1 X11 BDF fonts in hangul-x.c > I can not agree to make hangul-xft > works more optimally with a few proprietary fonts Proprietary fonts? CODE2000 is free(not GPLed, but nonetheless freely available. How many GPLed Korean TTFs do we have? ). So are PARK Won-kyu's. Next release of Baekmuk TTFs can have combining/non-spacing glyphs for Hangul Conjoining Jamos. > and more poorly with other fonts, including free ones. Well well... 'More poorly' is certainly debatable. Even if it is not, I can still make a case for my patch. As I wrote above, Pango should not try to take a blame for what it cannot do anything over. If some people decide to use fonts without combining/non-spacing glyphs for Hangul Conjoining Jamo, they cannot blame Pango. They do have a choice(free fonts with more or less correct glyphs for Jamos are available) and make a wrong choice. Trying to make up for their wrong choice, Pango should not 'punish' others who make a better choice. > Only a few fonts work that way, don't? 
> And most importantly there is no such free font now. To begin with, we don't have many free TTFs for Korean (we have only a few free TTFs, don't we?) CODE2000 is freely available and does the 'magic'. So are/do PARK Won-kyu's X11 BDF fonts in iso10646-1. Besides, PARK Won-kyu's X11 BDF fonts can be converted to TTFs with appropriate spacing. It can happen anytime soon. Of course, eventually this has to be dealt with in a way similar to the way Indic scripts are handled using Opentype fonts. Unfortunately, there's no free OTF with appropriate Opentype tables for Hangul Conjoining Jamos. It's frustrating that Microsoft (or Korean commerical foundries) already have a few such OTFs, but haven't published any spec. For Indic scripts, MS published the full spec. and many people have been working on making free Indic OTFs. Pango also take advantage of the published spec as well, I believe. > In my way, the hangul > shaper renders a *single* syllable but just with several glyphs; a > syllable glyph and jamos' glyphs. > It separates jamos as syllables as > specified in the standard. It could well be used as a fallback, but only if other alternatives are exhausted, which I'm afraid is not the case (Xft) here. Nonetheless, I admit that it's a debatable point. > If CHOI Hwan-jin's new XIM uses >> ksc5601.1987-0 fonts, he has to fix his XIM to use >> iso10646-1 fonts. Pango should not be held responsible >> for problems of other programs. >It's not an XIM but a GTK+ native input module. You should know that >the GTK+ input modules don't know what font it uses. They can't >choose their fonts. If they can't, can I assume that it'll be given a font with the widest coverage by Gtk+(instead of fonts in ksc5601.1987-0 with a very poor coverage)? With Baekmuk fonts covering the full repertoire of Hangul syllables and available in iso10646-1 encoding (in an unlikely case Gtk+ input module cannot use Xft fonts - in TTF - and can only use X11 core fonts. 
even X11 core fonts can be outline fonts thanks to FT and the X-tt module), how likely do you think the example in your previous comment (a syllable not covered by a ksc5601.1987-0 font in the middle of input) is to happen? Anyway, for ksc5601.1987-0, it's perfectly all right with me to leave '>='.

> When I wrote the code only Unicode 2.0 is out so it stil doesn't know
> about the Hangul tone "M".

hangul-xft.c was written this summer, wasn't it? All right. hangul-x.c must have been written a few years ago.

> I don't know how to interpret it. Plz
> file a bug if there's any issue around it.

Hangul tone marks are non-spacing/combining and can follow a precomposed Hangul syllable or a sequence of Hangul Conjoining Jamos forming a syllable. Although they follow a Hangul syllable or a sequence of Hangul Jamos, they have to be rendered to the *left* of the preceding Hangul syllable. Therefore, they have to be put at the beginning of the sequence of glyphs (that is, reordering is necessary), as is done by Yudit. CODE2000 and PARK Won-kyu's X11 BDF fonts have (combining/non-spacing) glyphs for them.
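The tone mark reordering described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that the tone marks in question are U+302E and U+302F and that the array holds exactly one syllable, with an optional tone mark as its last element. A real shaper would reorder glyphs rather than code points; code points are used here only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hangul single-dot and double-dot tone marks. */
static int is_M(uint32_t c) { return c == 0x302E || c == 0x302F; }

/* If the syllable ends in a tone mark, moves that mark to the front so
 * that it is drawn to the left of the syllable. Returns 1 if the
 * sequence was reordered and 0 otherwise. */
int reorder_tone_mark(uint32_t *syllable, size_t len)
{
    uint32_t mark;

    assert(syllable != NULL || len == 0);
    if (len < 2 || !is_M(syllable[len - 1]))
        return 0;

    mark = syllable[len - 1];
    /* Shift the jamos (or the precomposed syllable) one slot right. */
    memmove(syllable + 1, syllable, (len - 1) * sizeof *syllable);
    syllable[0] = mark;
    return 1;
}
```

The same function covers both cases mentioned in the comment: a precomposed syllable followed by a mark (length 2) and a conjoining jamo sequence followed by a mark.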
(I'm very tired by your long and detailed replies. You don't have to copy my previous replies all the time. It's not a mailing list or Usenet.)

1. CODE2000 is NOT free (libre, DFSG free, OSD compliant, ...). 'GPL' is just one free license. I won't say much about it.

2. I agree with making the capable fonts work better than now. But I also don't want to make the poor fonts work worse than now. 'Just replacing >= with ==' is not an option.

3. When I first wrote hangul-x, it was to render Hangul 'as well as possible with the given font'. If I had ever wanted to blame poor, broken fonts, I would have dropped ksc5601.1987-0 support. Even when the font is not very capable, the hangul shaper can still do its best with that font. I want that approach, not just relying on the font's capability.

My conclusion is: I think doing some font-specific handling is the best solution. The default option can be '==' or '>=', but neither should make support for the old, poor, and broken fonts worse.

> If they can't, can I assume that it'll be given a font with
> the widest coverage by Gtk+(instead of fonts in ksc5601.1987-0
> with a very poor coverage)?

No, input modules do not take care of displaying their preedit strings. They just pass the strings to GTK+. Well, more than 90% of Korean X users still use the poor ksc5601.1987-0 fonts.
> 1. CODE2000 is NOT free (libre, DFSG free, OSD compliant, ...). 'GPL'
> is just one free license. I won't say much about it.

So, do you want to 'punish' users who choose to install it for not agreeing with you on the definition of being free?

> 2. I agree to make the capable fonts work better than now.
> But also,
> I don't wanna make the poor fonts work worse than now. 'Just
> replacing >= with ==' is not an option.

Can you tell me what poor fonts you're talking about here? Don't tell me you're talking about ksc5601.1987-0 fonts. They have nothing to do with hangul-xft.c.

> No, input modules do not take care of displaying their preedit
> strings. They just pass the strings to GTK+.

The result is the same. Gtk+ will pick a font with the best coverage, won't it? If not, it has to be fixed, IMHO. If it's end-user-configurable, it's her/his responsibility to pick the best font (s)he has.

> Well, more than 90% of
> Korean X users still use the poor ksc5601.1987-0 fonts.

How did you come up with 90%? Baekmuk TTFs are widely available and they're GPLed. With the FT/X-TT module of XFree86 4.x, they're presented as X11 core fonts in ksc5601.1987-0, ksc5601.1992-3 and iso10646-1 encodings. Moreover, as RH 8.0 has just done, other Linux distros will (if they have not done so already) begin to support Xft. All right. You wrote X users, not Linux/FreeBSD/OpenBSD/NetBSD users. Solaris, AIX, and Tru64 have been supporting ko_KR.UTF-8 for a few years, and at least Solaris is shipped with a set of commercial CID-keyed fonts with full coverage of Hangul syllables that can be presented as ksc5601.1992-3 X11 core fonts. Simply put, you don't have to worry about commercial Unix/X users.

Even if only ksc5601.1987-0 fonts are available to some people, replacing '>=' with '==' in hangul-xft.c doesn't affect those poor souls at all. How many times do I have to write that you CAN LEAVE '>=' alone in render_syllable_with_ksc5601() in hangul-x.c? Three times, four times, five times? Is it now clear enough?
Well, you don't distinguish between 'free' and 'GPLed'? Baekmuk is free but NOT 'GPLed'. Baekmuk has an MIT/X-like license. First of all, you should know Bugzilla is not a good place for flaming. If you want more flames, go to the gnu.discuss newsgroup.

The poor fonts include the 'broken' TTF fonts, including Baekmuk TTF. Even in hangul-xft, some broken TTF fonts don't have any such 'magic'. Just replacing >= with == will make them worse.

And don't get me tired; again, I want to make the support for capable fonts better, but without sacrificing the current support for poor fonts.
> Well, you don't distinguish between 'free' and 'GPLed'? Baekmuk is
> free but NOT 'GPLed'. Baekmuk has MIT/X like license.

You're absolutely right. It was another instance of my momentary lapse of memory, which unfortunately happens so often these days.

> In the first,
> you should know Bugzilla is not a good flaming place. If you want
> more flame, go to gnu.discuss newsgroup.

Well, your definition of the word 'flaming' must be different from mine. I don't recall that I ever did any flaming here. Anyway, here goes my apology if you felt that way.

As for Baekmuk being poor, I can make a case for hangul-x.c (render_with_ksc5601()) and hangul-xft.c being different. As you know too well, ksc5601.1987-0 has only 2350 precomposed syllables while Baekmuk TTFs have 11,172 precomposed syllables (at least Baekmuk Batang and Baekmuk Gulim do). Therefore, when Baekmuk Batang/Gulim are used, NO modern Hangul syllable would be rendered as a series of stand-alone Jamos, as would happen with ksc5601.1987-0 fonts. That is, whether '>=' or '==' is used, the Jamo rendering routine wouldn't be reached at all, and your so-called 'widening-narrowing' problem wouldn't happen with Baekmuk Batang/Gulim.

Now let me talk about generic Hangul syllables (represented with a sequence of Hangul Jamos) that don't have precomposed forms in the U+AC00 block. You and I disagree on which is the better way of rendering them. You want to render them as 'a precomposed syllable' followed by stand-alone Jamo glyphs if a part of that generic sequence can form a precomposed syllable in the U+AC00 block. Although I don't agree with you on this point (I think a series of stand-alone Jamo glyphs for a whole syllable is better, partly because that was what some prominent Korean linguists - e.g. Choo Shi-gyung - in the early 20th century tried to implement), I'm willing to grant you a point here. So, let's say we're tied on this issue.
In other words, some people would agree with your view of a better rendering while others would not and would rather agree with mine. To put it yet another way, I don't think you can simply dismiss my change as shooting for a better result with a small set of fonts at the sacrifice of rendering quality with 'poor' fonts. Because we're tied, we need a tie-breaker. As I already wrote, your current code blocks even a capable font from rendering a generic Hangul syllable. With my change, users have a choice at their disposal. If they care about generic Hangul syllables and are not satisfied with the rendering quality obtained with 'poor' fonts, they can buy/get/download/install/whatever fonts that do the right thing. They can also contribute to enhancing Baekmuk fonts or other fonts.

If I'm successful in making my case, that's great. If not, let's just move on. Why don't you commit (if you can) only the part of my patch that fixes the 'buffer' problem? Then we can resolve this issue in a different way. Perhaps we can look into a font and do things differently depending on whether it has non-spacing/combining glyphs for Hangul vowels and trailing consonants, which is a clear sign that it can do some 'rudimentary magic'. Hopefully, looking into a font this way is not so expensive an operation.
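For readers following the argument about which sequences have a precomposed form in the U+AC00 block: the 11,172 figure follows from the Unicode Hangul composition arithmetic (19 leading consonants x 21 vowels x 28 trailing slots, the 28th being "no trailing consonant"). Below is a rough sketch of that check; the function name is hypothetical and this is not the code in hangul-x.c or hangul-xft.c.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from the Unicode Hangul syllable composition algorithm. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28
};

/*
 * Hypothetical helper: compose a modern L V [T] sequence into its
 * precomposed syllable. Pass t == 0 when there is no trailing consonant.
 * Returns 0 when any jamo lies outside the modern ranges, i.e. when the
 * sequence has no precomposed form and must be rendered from jamos.
 */
uint32_t compose_modern_syllable(uint32_t l, uint32_t v, uint32_t t)
{
    uint32_t ti = 0;

    if (l < L_BASE || l >= L_BASE + L_COUNT)
        return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT)
        return 0;
    if (t != 0) {
        /* T_BASE itself is the "no trailing consonant" slot. */
        if (t <= T_BASE || t >= T_BASE + T_COUNT)
            return 0;
        ti = t - T_BASE;
    }
    assert(L_COUNT * V_COUNT * T_COUNT == 11172);
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + ti;
}
```

A sequence such as U+1140 U+1161 (an old leading consonant plus a modern vowel) makes the helper return 0, which is the case where the two rendering strategies discussed in this thread start to differ.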
Well, Baekmuk also has the widening-and-narrowing problem if a (future) input method supports non-modern Jamo input. And I don't like controlling users that way, especially if the 'better' choice is a proprietary one. You know, few people work on improving Baekmuk to make their desktop better. They just learn how to use proprietary fonts in Linux. Anyway, I prefer doing the best possible thing with the given font, rather than relying on the font. I think the best thing is the way you suggested at the end: work differently depending on the font.
Patch applied. See bug 95730 for the stable branch.

2002-10-15  Changwoo Ryu  <cwryu@debian.org>

	* modules/hangul/hangul-xft.c (hangul_engine_shape):
	* modules/hangul/hangul-x.c (hangul_engine_shape): Added missing
	memcpy() from the static jamo buffer to allocated jamo buffer
	(#95569). Thanks to Jungshik Shin.

(If you want more fixes or features from the hangul shaper, please don't hesitate to file bugs or send mail to me. It will not be very easy to fix even my own code once pango 1.1 becomes stable.)
Created attachment 11550: a bare-bone patch
Oops. Sorry, your comment about committing my patch was in transit while I attached a bare-bone patch. That attachment can now be ignored. Thank you for committing it. As for improving hangul-xft, let's keep talking in bug 95708. I've now got a skeleton of code to resolve bug 95708. As for fixing some obscure problems in hangul-x.c (addressed in my second attachment), I'll open a new bug and post my patch there.
In your patch, you used 'gunichar2', but jamos_static and jamos are gunichar, whose size differs from that of gunichar2.

> memcpy(jamos, jamos_static, n_jamos*sizeof(gunichar2));

Was it a typo?
> > memcpy(jamos, jamos_static, n_jamos*sizeof(gunichar2));
> >
> > Was it a typo?

Fixed now, thanks.
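To illustrate why the element size matters here, the following is a small sketch, not the actual Pango code. It assumes fixed-width stand-ins for the GLib types (`unichar4` for the 32-bit `gunichar`, `unichar2` for the 16-bit `gunichar2`) and a hypothetical helper `grow_jamo_buffer`. Copying `n_jamos * sizeof(gunichar2)` bytes would move only half of a `gunichar` array, so the sketch derives the size from the destination element instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t unichar4; /* stand-in for gunichar (32-bit) */
typedef uint16_t unichar2; /* stand-in for gunichar2 (16-bit) */

/*
 * Hypothetical helper: allocate a larger jamo buffer on the heap and
 * carry over the n_jamos entries already collected in the static one.
 * The caller owns the result and must free() it. Returns NULL on
 * allocation failure.
 */
unichar4 *grow_jamo_buffer(const unichar4 *jamos_static, size_t n_jamos,
                           size_t new_capacity)
{
    unichar4 *jamos;

    assert(new_capacity >= n_jamos);
    jamos = malloc(new_capacity * sizeof *jamos);
    if (jamos == NULL)
        return NULL;

    /* sizeof *jamos tracks the real element type, so a later type
       change cannot silently shorten the copy. */
    memcpy(jamos, jamos_static, n_jamos * sizeof *jamos);
    return jamos;
}
```

Writing `sizeof *jamos` rather than naming a type is a design choice that keeps the byte count tied to the buffer's declared element type.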
> As for fixing some obscure problems in hangul-x.c (addressed my
> second attachment), I'll open a new bug and post my patch there.

It's bug 95800. Can you take a look?
I think someone who has the right (maybe the reporter or Owen?) can mark this bug as RESOLVED. Many things were discussed that were not directly related to this bug...
I don't have enough privileges to change the status from 'unconfirmed' to 'resolved'. Owen, can you change the status? All other issues discussed here have been filed as separate bugs.
Moving bugs to new hangul component
Marking this bug as RESOLVED FIXED
*** Bug 109699 has been marked as a duplicate of this bug. ***