GNOME Bugzilla – Bug 96299
Hangul tone mark handling in Hangul-xft shaper
Last modified: 2004-12-22 21:47:04 UTC
spun off from bug 95708. Hangul tone marks U+302E and U+302F are non-spacing/combining characters that follow a Hangul syllable(either represented by U+1100 Jamos or U+AC00 precomposed syllables). There are a few fonts with combining glyphs for them, but most fonts don't even have glyphs for them, in which case a fallback glyph has to be used. For fonts with combining glyphs, not much has to be done on the part of Pango. However, when either a font has a spacing glyph or a font doesn't have a glyph, x-offset and width of PangoGlyph structure have to be adjusted as necessary. A similar approach is used in Thai shaper. I have a patch for the simplest case (included in my patch for bug 95708). I'll try to separate it out and put it up here. Another issue is that U+AC00 is dealt with by basic-xft at the moment. If we want to support Hangul tone marks, U+AC00 precomposed syllables have to be handled by hangul-xft as well. However, we can put it off for now and consider it in the future in another bug.
Created attachment 11702 [details] [review] a simplistic patch
My patch at the moment only works with fonts with combining glyphs for Hangul tone marks. Code2000 is one of them. It also doesn't deal with hangul-x.c. PARK Won-kyu's 10646 bitmap fonts have necessary combining glyphs for Hangul tone marks. To make it simple for the present, I want to focus on hangul-xft now and later we can have another bug to add tone mark support in hangul-x.
Created attachment 11703 [details] [review] patch v2(forgot to set the coverage for tone marks)
Created attachment 11704 [details] [review] patch v3(sorry for spamming. forgot hangul-defs.h)
Created attachment 11727 [details] [review] a new patch(syllable boundary check was missing)
I found out why tone marks are not rendered as expected when a font without glyphs for them is used and thus a fallback glyph has to be used. The fallback glyph at leaat has to be the left of a syllable, but it's rendered to the right of a syllable. Somewhere before hangul-shaper is invoked, a sequence of Jamos followed by a tone mark is broken into two separate clusters, a seq. of Jamos and a tone mark. It should not be pango/break.c because it does the right thing with Hangul tone mark. I'll try to track it down. I'll also modify render_tone so that when a fallback glyph(spacing) is used, width and x-offset are adjusted to emulate combining-glyph behavior as in Thai shaper.
Right after writing the prev. comment, I realized that basic-xft handles hangul tone marks. However, removing Hangul tone marks from basic-xft didn't make any difference in terms of rendering result when a font without tone marks is used except that render_tone in hangul-xft.c is indeed used. It turns out that Pango 'analysis' breaks up a seq. of Jamo + tone mark into two chunks, a seq. of Jamo and a tone mark and invokes hangul shaper twice, once with a seq. of Jamo and 'the default font' (a font specified downstream) and the other with the tone mark and a font Pango 'analysis' detected to have a glyph for the tone mark. If that 'fallback' font automatically picked by Pango 'analysis' has a spacing glyph for the tone mark, the result is less than optimal.... 'char/glyph-level-fallback' in hangul-xft.c would never be invoked because font-substition-fallback occurs upstream.....
Created attachment 11742 [details] [review] a patch with a fix for spacing tone mark
oops. The first half of attachment 11742 [details] [review] has to be ignored. I accidentally used '>>' when I meant '>' while making a diff. What it does: - remove Hangul tone marks from basic-xft - Make Hangul tone marks rendered more or less correctly with a font with spacing glyphs for them. (set_glyph_tone()). For instance, MS Arial Unicode works well. As before, rendering with fonts with non-spacing glyphs for them works well. (code2000) - skip any tone mark in a series of tone marks other than the first one. However, rendering with fonts WITHOUT glyphs for tone-mark does not work. Because font-substitution takes away the preceding Jamo seq. from Hangul tone mark. Although some kludge can be done in this situation, it's not reliable and I didn't include it. (it works for the case when a 'well-formed' syllable comes before a tone mark). The best(??) way to solve this problem appears to be to modify fontconfig so that 'charset' is editable in fonts.conf... Then, Pango wouldn't have to do anything other than just believing what fontconfig tells it...
Created attachment 11819 [details] [review] a patch (with a work-around for a font w/o glyphs for tone marks)
attachment 11819 [details] [review] includes a work-around for fonts without glyphs for Hangul tone marks mentioned in my prev. comment. Hack as it may be, it works more or less(although it looks a bit crowded !) if the preceding cluster is either a precomposed syllable or a syllable (represented in a Jamo sequence) that can be rendered as a modern or pre-1933 syllable (instead of a series of enumerated Jamos). That covers over 1.5 million syllables(all known cases) if NewGulim(or Olg Gulim) font is used. With Baekmuk fonts, tone marks work for 11,172 syllables(represented in U+AC00 syllable or U+1100 Jamos) and stand-alone Jamos. Needless to say, the easiest way to work around this problem is add glyphs for Hangul Tone marks to Baekmuk fonts and other free fonts. I'll do that soon. Owen, could you take a look and comment?
http://jshin.net/i18n/korean/tonemarks.sfd have outlines for two tone marks. (four other glyphs in the file are for U+20A9 WON Sign) They can be copied and pasted into U+302E and U+302F positions of Baekmuk Batang/Gulim/Dotum/Hline with pfaedit. Both glyphs have negative Lbearing and zero width to make them combine with the preceding glyph/glyph cluster at the leftmost position as is the case in Code2000. I may have to adjust the vertical position a little to make them fit better with glyphs of Baekmuk fonts. Anyway, modified Baekmuk fonts work fine with my latest patch.
Created attachment 11853 [details] [review] a new patch(fix 'off by 1' error in calling glyph_string_extents_range(). bug 96843)
> 'char/glyph-level-fallback' in hangul-xft.c would never be > invoked because font-substition-fallback occurs upstream..... I guess you have already resolved your problem..? $(sysconfdir)/pango/pango.modules file should always be updated when you change any module's rendering ranges.
>> 'char/glyph-level-fallback' in hangul-xft.c would never be >> invoked because font-substition-fallback occurs upstream..... >I guess you have already resolved your problem..? >$(sysconfdir)/pango/pango.modules file should always be updated when >you change any module's rendering ranges. Sure, the file is updated automatically every time I make a change and run 'make install' in the topmost directory of pango tree. However, that has little to do with what I wrote about above. If that file had not been updated, that would have blocked me from solving my problem __even with__ the solution I proposed a couple of times and am gonna write about again below (solving it at fontconfig level) implemented. My problem has exactly the same cause as the problem of 'fallback' not being invoked in your original hangul-xft.c (for that matter, my modified hangul-xft.c). The upstream font-substitution fallback mechanism (fontconfig + Pango's analysis routine) prevents fonts like Baekmuk Batang (without glyphs for U+1100 Jamos) from being used to render U+1100 Jamos although hangul-xft.c has its own fallback mechanism. Because of this, fallback mechanism in hangul-xft.c for both Hangul Jamos and tone marks is NEVER invoked if fonts like Baekmuk Batang are specified. Instead, fonts with glyphs for Hangul Jamos and tone markes are handed over to hangul-xft.c by what I'm calling upstream font-level-substition mechanism. This 'font-substitution' is kinda double-edged sword. As I wrote a couple of times, it'd be nice if end-users (advanced) could override this font-substitution mechanism of fontconfig (fonts.conf) by editing 'charset' property. (this would also make it possible to use hack/custom-encoded fonts for Hangul Jamos - e.g. Oxxx fonts- and Indic scripts) with little change in Pango's analyis routine). However, Keith doesn't seem very font of the idea.... Anyway, for Hangul tonemarks, it's not a big deal. Firstly, I already made glyphs for Baekmuk fonts and expect them to be included in next release of Baekmuk fonts. Secondly, even without adding those glyphs to fonts like Baekmuk, my set_glyph_tone() can do a reasonable job if a well-formed-syllable is followed by a tone mark.
Owen and cwryu, Could take a look at my latest patch and give some comments? As far as I can tell, it does all I want it to do the way I want it to and can be checked into HEAD barring some obvious problems I may have missed. TIA,
Yeah, yeah you don't have to repeat all the theories all the time... OK, now I know it was the same problem which suffers the bug 86591 and bug 95708.. If there's no other issue, I'll commit the patch with some GTK+ coding style fixes.
Applied the patch to the HEAD. 2002-11-01 Changwoo Ryu <cwryu@debian.org> * modules/hangul/hangul-defs.h modules/hangul/hangul-xft.c: Added Hangul Tone Marks rendering by Jungshik Shin (#96299). But I have not applied your 'rough guessing' code. It might look OK or not. Even if it looks OK, it won't work correctly anyway as the preceding syllable/jamos and tone mark are treated as separate clusters by pango (think about cursor movement, backspace/delete, etc.).
Thank you for commiting it. However, to put it very mildly, it'd have been better if you had discussed the issue of 'rough guess' with me in advance. As it's committed, unfortunately it breaks things without avoiding two problems you mentioned (cursor movement and deletion. Actually, IMO, deleting a tonemark by itself is not a problem but rather an unintended side-benefit). I don't know how much time you spent playing with various cases and various algorithms, but I'm almost sure I've played with this a lot longer than you have. Firstly, my comment about having info. on the preceding syllable without 'if (i) '-clause doesn't make sense because when 'i' is zero, we don't have info on the preceding syllable. Secondly, the following doesn't work with i = 0(i.e. j=-1). ---------------- pango_glyph_string_extents_range (glyphs, j + 1, i, font, NULL, &logical_rect_cluster); /* logical_rect_cluster.width is all the offset we need so that the * inherent x_offset in the glyph (ink_rect.x) should be * canceled out. */ glyphs->glyphs[i].geometry.x_offset = - logical_rect_cluster.width - ink_rect.x ; -------------------- When a glyph for tone mark has an inherent x_offset, this should NOT be canceled out if i = 0 (j=-1, logical_rect_cluster.width = 0). Canceling it out would put a tonemark to the right of the preceding syllable even if a glyph for a tonemark has an inherent negative x_offset and zero-logical-width. For a spacing tone mark, it's even worse. The following wouldn't work as intended for i=0(j=-1) case: --------------- /* make an additional room for a tone mark if it has a spacing glyph * because that's likely to be an indication that glyphs for other * characters in the font are not designed for combining with tone marks.*/ if ( logical_rect.width ) { glyphs->glyphs[i].geometry.x_offset -= ink_rect.width; glyphs->glyphs[j + 1].geometry.width += ink_rect.width; glyphs->glyphs[j + 1].geometry.x_offset += ink_rect.width; } -------------- Given all these, I beleive that my 'if (i) clause' and rough-guess fallback have to be put in as in my latest patch.
For example: syllableA + syllableB + tonemark Assume the cursor is between syllableA and syllableB. DELETE key only removes syllableB. Then syllableA will have the tonemark which was once at syllableB. Is it what you want? And please read what are different from your patch. The 'i' will never be zero. I fixed hangul_engine_shape so it ignore any tone mark which does not follow any syllable/jamo sequence. Well, it unfortunately mades all tonemarks disappear if Xft(or Pango) gives a tone mark as separate chunks. But it is what I intended.
Moving bugs to new hangul component
See also bug 100625
I guess this bug can be closed when my patch to bug 100625 is committed. I've got a small patch for a minor enhancement here, but I guess we can do it in 1.2.1 phase.
Closing now that bug 100625 is fixed - if you have another patch, please file it as a separate bug. (It gets hard to keep track of a bug report that is this long)