GNOME Bugzilla – Bug 705727
Incorrect rendering w/ Hangul syllable composition GSUB
Last modified: 2014-07-31 19:31:37 UTC
Created attachment 251243 test text file
I'm modifying Hangul fonts so that they compose Hangul syllables with GSUB tables. With my current implementation, the Hangul jamo string "U+1100 U+1100 U+1161 U+11A8 U+11A8" (two leading consonants, one vowel, two trailing consonants) should be rendered as a single Hangul syllable glyph, U+AE4E. The HarfBuzz command-line tool works as expected, so I think my implementation is correct:

$ hb-shape build/JebudoSans.ttf $'\xe1\x84\x80\xe1\x84\x80\xe1\x85\xa1\xe1\x86\xa8\xe1\x86\xa8'
[uniAE4E=0+962]

LibreOffice, which uses HarfBuzz, also renders it as expected. But in Pango, the same string is rendered as three partially combined glyphs, "U+1101 U+1161 U+11A9". I suspect Pango splits this string into three separate runs. Test files will be attached.
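The code-point arithmetic behind this report can be checked with a short Python sketch (standard library only). It decodes the byte string passed to `hb-shape` and applies the Hangul syllable composition formula from the Unicode Standard; `compose_lvt` is a hypothetical helper name. Note that the formula only handles one modern L, one V and an optional T: collapsing U+1100 U+1100 into U+1101 (and U+11A8 U+11A8 into U+11A9) is the font's GSUB job, not part of the formula.

```python
# Decode the UTF-8 bytes passed to hb-shape in the report.
raw = b"\xe1\x84\x80\xe1\x84\x80\xe1\x85\xa1\xe1\x86\xa8\xe1\x86\xa8"
text = raw.decode("utf-8")
print(" ".join(f"U+{ord(c):04X}" for c in text))  # U+1100 U+1100 U+1161 U+11A8 U+11A8

# Hangul syllable composition constants from the Unicode Standard (ch. 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_lvt(l, v, t=None):
    """Map one modern L, one V and an optional T jamo to a precomposed syllable."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = 0 if t is None else ord(t) - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a modern L/V jamo")
    if t is not None and not (0 < t_index < T_COUNT):
        raise ValueError("not a modern T jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# The three glyphs Pango drew are exactly the L, V and T of U+AE4E.
syllable = compose_lvt("\u1101", "\u1161", "\u11a9")
print(f"U+{ord(syllable):04X}")  # U+AE4E
```

In other words, Pango's output was one GSUB step away from the expected glyph: each doubled pair was merged, but the final L+V+T ligature was not formed.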
That's very weird.
Created attachment 251244 resulting gedit screenshot
My font is too large to attach. Get it from this URL: http://people.debian.org/~cwryu/bugs/JebudoSans.ttf
I don't have time to look into this right now. Are you sure your Pango is recent enough to use HarfBuzz, and that it was built against the HarfBuzz you have?
I have not compiled them myself. I just used pango 1.32.5 and harfbuzz 0.9.19 Debian packages. And yes, this pango version uses harfbuzz.
It is the same with the Noto Sans Korean font:

$ hb-shape ~/.fonts/NotoSansKR-Regular.otf 간
[gid10047=0+920]

But Pango does not render it as a single glyph.
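For readers less familiar with `hb-shape`: its default text output lists one `glyph=cluster+advance` entry per glyph, separated by `|`, so `[gid10047=0+920]` means a single glyph covering cluster 0. A rough Python parser for the fields seen in this thread, assuming that format (the function name is hypothetical, and HarfBuzz's serialization supports more options than this handles):

```python
import re

# One glyph entry: name=cluster, optional @x_offset,y_offset, then +x_advance
# (and an optional ,y_advance). Only these common fields are handled.
GLYPH_ENTRY = re.compile(
    r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)"
    r"(?:@(?P<x_off>-?\d+),(?P<y_off>-?\d+))?"
    r"\+(?P<x_adv>-?\d+)(?:,(?P<y_adv>-?\d+))?"
)


def parse_hb_shape(line):
    """Return (glyph, cluster, x_advance) tuples from one hb-shape output line."""
    body = line.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    glyphs = []
    for entry in body.split("|"):
        m = GLYPH_ENTRY.fullmatch(entry)
        if m is None:
            raise ValueError(f"unrecognized glyph entry: {entry!r}")
        glyphs.append((m["name"], int(m["cluster"]), int(m["x_adv"])))
    return glyphs


print(parse_hb_shape("[gid10047=0+920]"))  # [('gid10047', 0, 920)]
```

A result of length one is what "rendered as one glyph" means here; the Pango rendering in this bug corresponds to three entries instead.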
We explicitly decided not to combine, e.g., U+1100 U+1100, because as I understand it those are now encoded atomically in Unicode. So the original report is out of the question; it won't work. But you are right that Pango doesn't seem to render even the shorter sequence correctly. Investigating.
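The "encoded atomically" point can be checked with Python's standard `unicodedata` module: the doubled jamo have their own code points with no decomposition, and NFC never merges a doubled pair. Whatever collapses U+1100 U+1100 into U+1101 is therefore a font-level (GSUB) choice rather than Unicode normalization.

```python
import unicodedata

# U+1101 (HANGUL CHOSEONG SSANGKIYEOK) has no decomposition mapping,
# so normalization never relates it to U+1100 U+1100.
print(repr(unicodedata.decomposition("\u1101")))  # ''

seq = "\u1100\u1100\u1161\u11a8\u11a8"
nfc = unicodedata.normalize("NFC", seq)
# NFC composes one L + V (+ T) into a precomposed syllable and leaves the
# extra leading and trailing jamo as separate characters.
print(" ".join(f"U+{ord(c):04X}" for c in nfc))  # U+1100 U+AC01 U+11A8
```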
Thanks. Fixed in Pango master:

commit 61aeba6257ec7691a7a5222fb69aec3cc042435b
Author: Behdad Esfahbod <behdad@behdad.org>
Date: Wed Jul 30 18:58:14 2014 -0400

    Don't break run in the middle of Hangul jamo sequence

    See comments.

    Bug 705727 - Incorrect rendering w/ Hangul syllable composition GSUB
    https://bugzilla.gnome.org/show_bug.cgi?id=705727
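The commit message does not include the code, so the following is only a hypothetical Python sketch of the idea, not Pango's implementation. It uses the Hangul rules of the Unicode grapheme-cluster algorithm (UAX #29, rules GB6 to GB8) to drop any candidate run boundary that would fall inside a jamo sequence. The function names are illustrative; the code-point ranges come from the Hangul_Syllable_Type property.

```python
def hangul_type(ch):
    """Hangul_Syllable_Type of one character, or None if it is not Hangul."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def inside_syllable(a, b):
    """True if no boundary is allowed between a and b (UAX #29 GB6-GB8)."""
    ta, tb = hangul_type(a), hangul_type(b)
    if ta == "L":
        return tb in ("L", "V", "LV", "LVT")
    if ta in ("LV", "V"):
        return tb in ("V", "T")
    if ta in ("LVT", "T"):
        return tb == "T"
    return False


def safe_run_breaks(text, candidates):
    """Keep only candidate offsets that do not split a Hangul jamo sequence."""
    return [
        i for i in candidates
        if not (0 < i < len(text) and inside_syllable(text[i - 1], text[i]))
    ]


seq = "\u1100\u1100\u1161\u11a8\u11a8"
print(safe_run_breaks(seq, range(len(seq) + 1)))  # [0, 5]
```

Because the rule is pairwise, the whole five-jamo string stays in one run and the font's GSUB lookups get to see it at once.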
Actually, the Unicode Standard allows "one or more" L jamos, "one or more" V jamos, and "zero or more" T jamos in one Hangul syllable, so U+1100 U+1100 belongs to one Hangul syllable. Such forms are not common, but some fonts have GSUB tables that map U+1100 U+1100 to U+1101, and so on; the standard 'ljmo', 'vjmo', and 'tjmo' features do exactly that job. See https://www.microsoft.com/typography/otfntdev/hangulot/features.aspx. Of course, no font in the world can render every arbitrary sequence of this kind, but Pango doesn't have to worry about that, because fonts can also handle the fallback rendering.
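The syllable shape described in this comment (one or more L, one or more V, then zero or more T) can be written as a regular expression. A minimal Python sketch, assuming the Hangul_Syllable_Type ranges for L, V and T; the pattern and variable names are illustrative, and the actual substitutions such as 'ljmo' live in the font's GSUB tables, not in code like this.

```python
import re

# Conjoining jamo ranges by Hangul_Syllable_Type (L, V, T).
L = "\u1100-\u115f\ua960-\ua97c"
V = "\u1160-\u11a7\ud7b0-\ud7c6"
T = "\u11a8-\u11ff\ud7cb-\ud7fb"

# A standard Korean syllable block of conjoining jamo: L+ V+ T*.
SYLLABLE_BLOCK = re.compile(f"[{L}]+[{V}]+[{T}]*")

seq = "\u1100\u1100\u1161\u11a8\u11a8"
print(SYLLABLE_BLOCK.fullmatch(seq) is not None)  # True

# Two syllables back to back are found as two separate matches.
two = "\u1100\u1161\u1100\u1161\u11ab"
print(len(SYLLABLE_BLOCK.findall(two)))  # 2
```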
That spec is old; I believe the latest recommendation is not to form those. At any rate, Pango doesn't care. Please bring it up on the HarfBuzz list, and someone will point you to the previous discussion. I'm open to changing this if Windows does it, but my impression is that Windows doesn't do that anymore.
I found the mail thread. The issue there was whether HarfBuzz should do complex normalization. I don't expect HarfBuzz to do complex normalization, only not to break up a single Hangul syllable sequence. Maybe my example was too extreme. :) I see this bug has now been fixed in Pango git. Thanks.
BTW, you seem to have been misled by that mail thread. The rules for determining Hangul syllable boundaries have not changed from Unicode 2.0 through 7.0. Only some examples on that Microsoft TrueType page are outdated and inappropriate; the basic Hangul composition rules and the font features are still valid.
I think HarfBuzz does what you are asking for. At any rate, don't continue discussion here. I'm not going to change anything in HarfBuzz without discussion on the harfbuzz list. Thanks.