Bug 705727 – Incorrect rendering w/ Hangul syllable composition GSUB

Summary:	Incorrect rendering w/ Hangul syllable composition GSUB


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.32.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	pango-maint
QA Contact:	pango-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-08-09 16:06 UTC by Changwoo Ryu
Modified:	2014-07-31 19:31 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
test text file (365 bytes, text/plain) 2013-08-09 16:06 UTC, Changwoo Ryu
resulting gedit screenshot (70.56 KB, image/png) 2013-08-09 16:21 UTC, Changwoo Ryu

Description Changwoo Ryu 2013-08-09 16:06:58 UTC

Created attachment 251243
test text file

I'm modifying Hangul fonts so they have Hangul syllable composition with GSUB tables.

With my current implementation, the Hangul jamo string "U+1100 U+1100 U+1161 U+11A8 U+11A8" (2 leading consonants, 1 vowel, 2 trailing consonants) should be rendered as one Hangul syllable glyph, U+AE4E.

The HarfBuzz command works as expected, so I think my implementation is correct:

$ hb-shape build/JebudoSans.ttf $'\xe1\x84\x80\xe1\x84\x80\xe1\x85\xa1\xe1\x86\xa8\xe1\x86\xa8'
[uniAE4E=0+962]
$
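To check that the escaped bytes in the command above really are the jamo sequence described in the report, they can be decoded to code points. This is a minimal sketch and not part of the original report; the variable names are illustrative only.

```python
# UTF-8 bytes copied from the hb-shape command line above.
raw = b"\xe1\x84\x80\xe1\x84\x80\xe1\x85\xa1\xe1\x86\xa8\xe1\x86\xa8"

# Decode to text, then format each character as a U+XXXX code point.
text = raw.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in text]

print(" ".join(code_points))  # U+1100 U+1100 U+1161 U+11A8 U+11A8
```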

LibreOffice, which uses HarfBuzz, also renders it as expected.

But in Pango, the same string is rendered as three partially combined glyphs, "U+1101 U+1161 U+11A9". I suspect Pango splits this string into 3 separate strings.
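The three code points Pango ends up with correspond arithmetically to the expected glyph: under the Hangul syllable composition formula in the Unicode Standard, U+1101 U+1161 U+11A9 map to U+AE4E. A minimal sketch of that arithmetic follows; the constants are the ones defined by the Unicode Standard, while the helper name `compose_syllable` is hypothetical and does no range checking.

```python
# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_syllable(l, v, t=T_BASE):
    """Return the precomposed syllable for one L, one V and an optional T jamo.

    Hypothetical helper for illustration only. Passing t == T_BASE means
    "no trailing consonant".
    """
    l_index = l - L_BASE
    v_index = v - V_BASE
    t_index = t - T_BASE
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


syllable = compose_syllable(0x1101, 0x1161, 0x11A9)
print(f"U+{syllable:04X}")  # U+AE4E
```

The formula only covers one leading consonant, one vowel, and at most one trailing consonant. Merging U+1100 U+1100 into U+1101 is not part of it, which is why the report relies on GSUB rules in the font for that step.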

Test files will be attached.

Comment 1 Behdad Esfahbod 2013-08-09 16:08:22 UTC

That's very weird.

Comment 2 Changwoo Ryu 2013-08-09 16:21:51 UTC

Created attachment 251244
resulting gedit screenshot

My font is too large to attach. Get it from this URL: http://people.debian.org/~cwryu/bugs/JebudoSans.ttf

Comment 3 Behdad Esfahbod 2013-08-09 16:50:03 UTC

I don't have time to look into this right now.  Are you sure your Pango is actually recent enough to use HarfBuzz, and that it is using the HarfBuzz you compiled yourself?

Comment 4 Changwoo Ryu 2013-08-09 17:15:40 UTC

I have not compiled them myself. I just used the Debian packages of pango 1.32.5 and harfbuzz 0.9.19. And yes, this pango version uses harfbuzz.

Comment 5 Changwoo Ryu 2014-07-27 14:51:08 UTC

It is the same with the Noto Sans Korean font.

$ hb-shape ~/.fonts/NotoSansKR-Regular.otf 
간
[gid10047=0+920]
$

But the Pango rendering result is not one glyph.

Comment 6 Behdad Esfahbod 2014-07-30 19:35:00 UTC

We explicitly decided not to combine, e.g., U+1100 U+1100, because from what I understand those are now atomically encoded in Unicode.

So, the original report is out of the question.  It won't work.

But you are right that Pango doesn't seem to render even the shorter sequence correctly.  Investigating.
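For reference, the difference between the two sequences can also be seen in plain Unicode canonical composition (NFC), independent of any shaper. The sketch below uses Python's standard `unicodedata` module and illustrates normalization only, not what Pango or HarfBuzz do internally. The short sequence U+1100 U+1161 U+11AB is an assumption about the input used in comment 5, and the helper name is illustrative.

```python
import unicodedata


def nfc_code_points(text):
    """Normalize to NFC and return the result as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# Sequence from the original report: L L V T T.
long_seq = "\u1100\u1100\u1161\u11a8\u11a8"
# Assumed shorter sequence from comment 5: L V T.
short_seq = "\u1100\u1161\u11ab"

print(nfc_code_points(long_seq))   # ['U+1100', 'U+AC01', 'U+11A8']
print(nfc_code_points(short_seq))  # ['U+AC04']
```

NFC composes only one L, one V, and one T at a time, so the doubled consonants in the long sequence are left as separate code points, while the short sequence collapses to a single precomposed syllable.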

Comment 7 Behdad Esfahbod 2014-07-30 22:59:15 UTC

Thanks.  Fixed in Pango master.

commit 61aeba6257ec7691a7a5222fb69aec3cc042435b
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Wed Jul 30 18:58:14 2014 -0400

    Don't break run in the middle of Hangul jamo sequence
    
    See comments.
    
    Bug 705727 - Incorrect rendering w/ Hangul syllable composition GSUB
    https://bugzilla.gnome.org/show_bug.cgi?id=705727

Comment 8 Changwoo Ryu 2014-07-31 07:54:20 UTC

Actually, the Unicode Standard allows "one or more" L jamos and V jamos and "zero or more" T jamos in one Hangul syllable. So U+1100 U+1100 belong to one Hangul syllable. Such forms are not common, but some fonts have GSUB tables that map U+1100 U+1100 to U+1101, etc.

The standard 'ljmo', 'vjmo', and 'tjmo' features do exactly that job. See https://www.microsoft.com/typography/otfntdev/hangulot/features.aspx

Of course, no font in the world can render all arbitrary forms of such sequences. But Pango doesn't have to worry about that, because fallback rendering can also be done by fonts.
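The "one or more L, one or more V, zero or more T" rule described above can be sketched as a regular expression. This is a simplified illustration: it covers only the Hangul Jamo block (U+1100..U+11FF) and ignores precomposed syllables and the Hangul Jamo Extended-A/B blocks. The names `JAMO_SYLLABLE` and `split_syllables` are hypothetical.

```python
import re

# Simplified jamo classes, restricted to the Hangul Jamo block (U+1100..U+11FF).
L = "[\u1100-\u115F]"  # leading consonants (choseong)
V = "[\u1160-\u11A7]"  # vowels (jungseong)
T = "[\u11A8-\u11FF]"  # trailing consonants (jongseong)

# One syllable: one or more L, one or more V, zero or more T.
JAMO_SYLLABLE = re.compile(f"{L}+{V}+{T}*")


def split_syllables(text):
    """Return the jamo-only syllables found in text, in order."""
    return JAMO_SYLLABLE.findall(text)


seq = "\u1100\u1100\u1161\u11a8\u11a8"  # sequence from the original report
print(len(split_syllables(seq)))  # 1
```

Under this rule the five-jamo sequence from the report is a single syllable, so an itemizer following the same boundaries would not place a break inside it.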

Comment 9 Behdad Esfahbod 2014-07-31 14:56:20 UTC

That spec is old.  I believe the latest recommendation is to not form those.  At any rate, Pango doesn't care.  Please bring it up on the HarfBuzz list and someone will point you to the previous discussion.  I'm open to changing if Windows does that, but I get the impression that Windows doesn't do that anymore.

Comment 10 Changwoo Ryu 2014-07-31 19:16:46 UTC

I found the mail thread. The issue on the thread was whether HarfBuzz should do complex normalization or not. Well, I don't expect HarfBuzz to do complex normalization, just to not break a single Hangul syllable sequence. Maybe my example was too extreme. :) Now I see this bug has been fixed in Pango git. Thanks.

BTW, you seem to have been misled by that mail thread. The rule for determining Hangul syllable boundaries has not changed from Unicode 2.0 through 7.0. Only some examples on that Microsoft TrueType page are old and inappropriate; the basic Hangul composition rule and the font features are still valid.

Comment 11 Behdad Esfahbod 2014-07-31 19:31:37 UTC

I think HarfBuzz does what you are asking for.  At any rate, don't continue discussion here.  I'm not going to change anything in HarfBuzz without discussion on the harfbuzz list.  Thanks.