After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 96299 - Hangul tone mark handling in Hangul-xft shaper
Hangul tone mark handling in Hangul-xft shaper
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: hangul
1.1.x
Other Linux
: Normal normal
: 1.2.0
Assigned To: Changwoo Ryu
Owen Taylor
Depends on:
Blocks:
 
 
Reported: 2002-10-20 09:15 UTC by Jungshik Shin
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a simplistic patch (1.77 KB, patch)
2002-10-20 09:43 UTC, Jungshik Shin
none Details | Review
patch v2(forgot to set the coverage for tone marks) (2.13 KB, patch)
2002-10-20 09:50 UTC, Jungshik Shin
none Details | Review
patch v3(sorry for spamming. forgot hangul-defs.h) (1.67 KB, patch)
2002-10-20 10:02 UTC, Jungshik Shin
none Details | Review
a new patch(syllable boundary check was missing) (3.23 KB, patch)
2002-10-21 13:07 UTC, Jungshik Shin
none Details | Review
a patch with a fix for spacing tone mark (11.96 KB, patch)
2002-10-21 22:29 UTC, Jungshik Shin
none Details | Review
a patch (with a work-around for a font w/o glyphs for tone marks) (6.20 KB, patch)
2002-10-25 01:15 UTC, Jungshik Shin
none Details | Review
a new patch(fix 'off by 1' error in calling glyph_string_extents_range(). bug 96843) (6.82 KB, patch)
2002-10-26 18:10 UTC, Jungshik Shin
none Details | Review

Description Jungshik Shin 2002-10-20 09:15:36 UTC
spun off from bug 95708.

Hangul tone marks U+302E and U+302F are non-spacing/combining
characters that follow a Hangul syllable(either represented
by U+1100 Jamos or U+AC00 precomposed syllables). There are
a few fonts with combining glyphs for them, but most fonts
don't even have glyphs for them, in which case a fallback
glyph has to be used. For fonts with combining glyphs,
not much has to be done on the part of Pango. However, 
when either a font has a spacing glyph or a font doesn't
have a glyph, x-offset and width of PangoGlyph structure
have to be adjusted as necessary. A similar approach
is used in Thai shaper. 

I have a patch for the simplest case (included in my patch
for bug 95708). I'll try to separate it out and 
put it up here. 

Another issue is that U+AC00 is dealt with by basic-xft
at the moment. If we want to support Hangul tone marks,
U+AC00 precomposed syllables have to be handled by
hangul-xft as well. However, we can put it off for now
and consider it in the future in another bug.
Comment 1 Jungshik Shin 2002-10-20 09:43:55 UTC
Created attachment 11702 [details] [review]
a simplistic patch
Comment 2 Jungshik Shin 2002-10-20 09:48:19 UTC
My patch at the moment only works with fonts with combining glyphs
for Hangul tone marks. Code2000 is one of them. It also 
doesn't deal with hangul-x.c. PARK Won-kyu's 10646 bitmap fonts
have necessary combining glyphs for Hangul tone marks. 
To make it simple for the present, I want to focus on hangul-xft
now and later we can have another bug to add tone mark support
in hangul-x. 
Comment 3 Jungshik Shin 2002-10-20 09:50:52 UTC
Created attachment 11703 [details] [review]
patch v2(forgot to set the coverage for tone marks)
Comment 4 Jungshik Shin 2002-10-20 10:02:10 UTC
Created attachment 11704 [details] [review]
patch v3(sorry for spamming. forgot hangul-defs.h)
Comment 5 Jungshik Shin 2002-10-21 13:07:58 UTC
Created attachment 11727 [details] [review]
a new patch(syllable boundary check was missing)
Comment 6 Jungshik Shin 2002-10-21 17:09:05 UTC
I found out why tone marks are not rendered as expected when
a font without glyphs for them is used and thus a fallback glyph
has to be used. The fallback glyph at leaat has to be the left
of a syllable, but it's rendered to the right of a syllable.

 Somewhere before hangul-shaper is invoked, a sequence of Jamos 
followed by a tone mark is broken into two separate clusters, 
a seq. of Jamos and a tone mark. It should not be pango/break.c
because it does the right thing with Hangul tone mark. I'll
try to track it down.

  I'll also modify render_tone so that when a fallback
glyph(spacing) is used, width and x-offset are adjusted to emulate
combining-glyph behavior as in Thai shaper.
  
Comment 7 Jungshik Shin 2002-10-21 18:39:12 UTC
Right after writing the prev. comment, I realized that basic-xft
handles hangul tone marks. However,
removing Hangul tone marks from basic-xft didn't make any difference
in terms of rendering result when a font without tone marks is 
used except that render_tone in hangul-xft.c is indeed used.  

It turns out that 
Pango 'analysis' breaks up a seq. of Jamo + tone mark
into two chunks, a seq. of Jamo and a tone mark and invokes
hangul shaper twice, once with a seq. of Jamo and 'the default
font' (a font specified downstream) and the other with the tone
mark and a font Pango 'analysis' detected to have a glyph
for the tone mark. If that 'fallback' font automatically
picked by Pango 'analysis' has a spacing glyph for the tone mark,
the result is less than optimal....

'char/glyph-level-fallback' in hangul-xft.c would never be 
invoked because  font-substition-fallback occurs upstream.....

Comment 8 Jungshik Shin 2002-10-21 22:29:47 UTC
Created attachment 11742 [details] [review]
a patch with a fix for spacing tone mark
Comment 9 Jungshik Shin 2002-10-21 22:40:32 UTC
oops. The first half of attachment 11742 [details] [review] has to be ignored.
I accidentally used '>>' when I meant '>' while making a diff.

What it does:

  - remove Hangul tone marks from basic-xft
  - Make Hangul tone marks rendered more or less
    correctly with a font with spacing glyphs for
    them. (set_glyph_tone()). For instance, MS Arial Unicode
    works well.  As before, rendering
    with fonts with non-spacing glyphs for them
    works well. (code2000) 
  - skip any tone mark in a series of tone marks
    other than the first one.

  However, rendering with fonts WITHOUT glyphs for
tone-mark does not work. Because font-substitution
takes away the preceding Jamo seq. from Hangul tone mark.
Although some kludge can be done in this situation, 
it's not reliable and I didn't include it. (it works for 
the case when a 'well-formed' syllable comes before a tone mark).
    
  The best(??) way to solve this problem appears to be
to modify fontconfig so that 'charset' is editable in
fonts.conf... Then, Pango wouldn't have to do anything
other than just believing what fontconfig tells it...
Comment 10 Jungshik Shin 2002-10-25 01:15:21 UTC
Created attachment 11819 [details] [review]
a patch (with a work-around for a font w/o glyphs for tone marks)
Comment 11 Jungshik Shin 2002-10-25 01:51:08 UTC
attachment 11819 [details] [review] includes a work-around for fonts without glyphs
for Hangul tone marks mentioned in my prev. comment.
Hack as it may be, it works more or less(although it looks a bit
crowded !) if the preceding 
cluster is either
a precomposed syllable or a syllable (represented in a Jamo sequence) 
that can be rendered as a modern or pre-1933 
syllable (instead of a series of enumerated Jamos). That covers
over 1.5 million syllables(all known cases) if NewGulim(or Olg Gulim)
font is used. With Baekmuk fonts, tone marks work for 11,172
syllables(represented in U+AC00 syllable or U+1100 Jamos) and
stand-alone Jamos.

Needless to say, the easiest way to work around this problem
is add glyphs for Hangul Tone marks to Baekmuk fonts and other
free fonts. I'll do that soon. 


Owen, could you take a look and comment? 
Comment 12 Jungshik Shin 2002-10-25 20:08:55 UTC
http://jshin.net/i18n/korean/tonemarks.sfd
have outlines for two tone marks. (four other glyphs
in the file are for U+20A9 WON Sign)
They can be copied and pasted into U+302E and U+302F
positions of Baekmuk Batang/Gulim/Dotum/Hline
with pfaedit. Both glyphs have negative Lbearing
and zero width to make them combine with the preceding
glyph/glyph cluster at the leftmost position
as is the case in Code2000. 

I may have to adjust the vertical position a little
to make them fit better with glyphs of Baekmuk fonts.
Anyway, modified Baekmuk fonts work fine with my latest patch.
 
Comment 13 Jungshik Shin 2002-10-26 18:10:50 UTC
Created attachment 11853 [details] [review]
a new patch(fix 'off by 1' error in calling glyph_string_extents_range(). bug 96843)
Comment 14 Changwoo Ryu 2002-10-30 18:52:40 UTC
> 'char/glyph-level-fallback' in hangul-xft.c would never be 
> invoked because  font-substition-fallback occurs upstream.....

I guess you have already resolved your problem..?

$(sysconfdir)/pango/pango.modules file should always be updated when
you change any module's rendering ranges.
Comment 15 Jungshik Shin 2002-10-30 21:16:53 UTC
>> 'char/glyph-level-fallback' in hangul-xft.c would never be 
>> invoked because  font-substition-fallback occurs upstream.....

>I guess you have already resolved your problem..?

>$(sysconfdir)/pango/pango.modules file should always be updated when
>you change any module's rendering ranges.

  Sure, the file is updated automatically every time I
make a change and run 'make install' in the topmost directory of 
pango tree. However, that has little to do with
what I wrote about above. If that file had not been
updated, that would have blocked me from solving
my problem __even with__ the solution I proposed a couple
of times and am gonna write about again below
(solving it at fontconfig level) implemented.  


My problem has exactly the same cause
as the problem of 'fallback' not being invoked in your original
hangul-xft.c (for that matter, my modified hangul-xft.c).  
The upstream font-substitution fallback mechanism  
(fontconfig + Pango's analysis routine) prevents fonts like
Baekmuk Batang (without glyphs for U+1100 Jamos) from being
used to render U+1100 Jamos although hangul-xft.c has its
own fallback mechanism. Because of this, fallback mechanism
in hangul-xft.c for both Hangul Jamos and tone marks
is NEVER invoked if fonts like Baekmuk Batang are specified.
Instead, fonts with glyphs for Hangul Jamos and tone markes
are handed over to hangul-xft.c by what I'm calling
upstream font-level-substition mechanism. 

  This 'font-substitution' is kinda double-edged sword. As I wrote a
couple of times, it'd be nice if end-users
(advanced) could override this font-substitution mechanism
of fontconfig (fonts.conf) by editing 'charset' property. 
(this would also make it possible to use hack/custom-encoded
fonts for Hangul Jamos - e.g. Oxxx fonts-  and Indic scripts)
with little change in Pango's analyis routine). However,
Keith doesn't seem very font of the idea....
 
Anyway, for Hangul tonemarks, it's not a big deal. 
Firstly, I already made glyphs for Baekmuk fonts
and expect them to be included in next release of Baekmuk fonts.
Secondly, even without adding those glyphs to fonts like
Baekmuk, my set_glyph_tone() can do a reasonable job
if a well-formed-syllable is followed by a tone mark. 
Comment 16 Jungshik Shin 2002-10-31 01:12:34 UTC
Owen and cwryu,
Could take a look at my latest patch and give some comments? 
As far as I can tell, it does all I want it to do the way I want
it to and can be checked into HEAD barring some obvious problems
I may have missed.

TIA,
Comment 17 Changwoo Ryu 2002-10-31 02:58:14 UTC
Yeah, yeah you don't have to repeat all the theories all the time...

OK, now I know it was the same problem which suffers the bug 86591 and
bug 95708..   If there's no other issue, I'll commit the patch with
some GTK+ coding style fixes.
Comment 18 Changwoo Ryu 2002-11-01 04:08:41 UTC
Applied the patch to the HEAD.

2002-11-01  Changwoo Ryu  <cwryu@debian.org>

	* modules/hangul/hangul-defs.h modules/hangul/hangul-xft.c:
	Added Hangul Tone Marks rendering by Jungshik Shin (#96299).


But I have not applied your 'rough guessing' code.  It might look OK
or not.  Even if it looks OK, it won't work correctly anyway as the
preceding syllable/jamos and tone mark are treated as separate
clusters by pango (think about cursor movement, backspace/delete, etc.).
Comment 19 Jungshik Shin 2002-11-01 10:17:28 UTC
Thank you for commiting it.  
However, to put it very mildly, it'd have been better if you had 
discussed the issue of 'rough guess' with me in advance. 

As it's committed, unfortunately it breaks things without
avoiding two problems you mentioned (cursor movement and
deletion. Actually, IMO, deleting a tonemark by itself is
not a problem but rather an unintended side-benefit). 
I don't know how much time you spent playing
with various cases and various algorithms, 
but I'm almost sure I've played with this a lot longer than you have. 

Firstly, my comment about having info. on the
preceding syllable without 'if (i) '-clause doesn't make
sense because when 'i' is zero, we don't have 
info on the preceding syllable. 

Secondly, the following doesn't work with i = 0(i.e. j=-1).

----------------  
 pango_glyph_string_extents_range (glyphs, j + 1, i, font, 
					NULL, &logical_rect_cluster); 
   /* logical_rect_cluster.width is all the offset we need so that the
    * inherent x_offset in the glyph (ink_rect.x) should be 
    *  canceled out. */

    glyphs->glyphs[i].geometry.x_offset = 
               - logical_rect_cluster.width - ink_rect.x ;

--------------------
When a glyph for tone mark has an inherent x_offset, this
should NOT be canceled out if i = 0 (j=-1, 
logical_rect_cluster.width = 0). Canceling it out would put
a tonemark to the right of the preceding syllable 
even if a glyph for a tonemark has an inherent negative x_offset
and zero-logical-width. 

For a spacing tone mark,  it's even worse.
The following wouldn't work as intended for i=0(j=-1) case:
---------------
/* make an additional room for a tone mark if it has a spacing glyph
 * because that's likely to be an indication that glyphs for other 
 * characters in the font are not designed for combining with tone 
marks.*/

  if ( logical_rect.width )
  {
    glyphs->glyphs[i].geometry.x_offset -= ink_rect.width;
    glyphs->glyphs[j + 1].geometry.width += ink_rect.width;
    glyphs->glyphs[j + 1].geometry.x_offset += ink_rect.width;
  }
--------------



Given all these, I  beleive that 
my 'if (i) clause' and rough-guess fallback have to be put in 
as in my latest patch.  
  

Comment 20 Changwoo Ryu 2002-11-01 13:35:38 UTC
For example:

syllableA + syllableB + tonemark

Assume the cursor is between syllableA and syllableB.  DELETE key only
 removes syllableB.  Then syllableA will have the tonemark which was
once at syllableB.  Is it what you want?


And please read what are different from your patch.  The 'i' will
never be zero.  I fixed hangul_engine_shape so it ignore any tone mark
which does not follow any syllable/jamo sequence. 

Well, it unfortunately mades all tonemarks disappear if Xft(or Pango)
gives a tone mark as separate chunks.  But it is what I intended.
Comment 21 Owen Taylor 2002-11-02 05:42:43 UTC
Moving bugs to new hangul component
Comment 22 Owen Taylor 2002-12-07 22:15:17 UTC
See also bug 100625
Comment 23 Jungshik Shin 2002-12-12 23:20:31 UTC
I guess this bug can be closed  when my patch to
bug 100625 is committed. I've got a small patch for a minor
enhancement here, but I guess we can do it in 1.2.1 phase. 
Comment 24 Owen Taylor 2002-12-17 02:09:58 UTC
Closing now that bug 100625 is fixed - if you have another patch, 
please file it as a separate bug. (It gets hard to keep track 
of a bug report that is this long)