GNOME Bugzilla – Bug 78575
JOHAB KSC5601 1992-3 in hangul x shaper
Last modified: 2004-12-22 21:47:04 UTC
We would like to use JOHAB-encoded X fonts on Solaris. Can we recreate the mapping tables in tables-big.i by adding conversion rules from JOHAB.TXT, and add a new charset entry for "ksc5601.1992-3" in charsets[]? I'll file a separate report for Chinese-encoded fonts like GBK or GB18030. Should we create a separate shaper for this purpose, instead of continuing to enhance the basic x shaper?
This seems to be an issue for the hangul x shaper, so the summary field has been changed accordingly. "ksc5601.1992-3" X fonts are to be used in the hangul x shaper on Solaris. The ksc5601.1992-3 fonts have the Hangul Syllables and CJK glyphs defined in JOHAB.TXT.
Should be pretty straightforward to add such support to the hangul-x shaper.
See the attached code_patch.78575 for the source patch. The following lines should be added to pangox.aliases in order to render HKSCS/GB18030/CNS11643/KSX1005(KSC5601.1992-3) for this bug and bug #79812 on Solaris:
  -*-song-medium-r-normal--*-*-*-*-*-*-*-*,\
  -*-sung-medium-r-normal--*-*-*-*-*-*-*-*,\
  -*-myeongjo-medium-r-normal--*-*-*-*-*-*-*-*,\
Created attachment 8496 [details] [review] Patch for KSX1005 (KS C 5601.1992-3) support in Solaris
Reminder to everyone that using 'patch' makes life easier for everyone :)
Okay to commit? I'd integrate this into 1.2.0 for sure, so the commit should be done any time soon, right?
Since 1.0.2 was released, are we okay to commit this now?
* I don't think this patch handles syllables that aren't CVC right... even for well-formed syllables, it isn't going to handle 'CV' syllables. Compare render_syllable_with_ksc5601().
* The function seems to leak the result of g_iconv_open ("JOHAB", "UTF-8"); even if it weren't leaking it, there would still be the problem that opening and closing an iconv converter can be quite expensive compared to keeping it around.
* Since I assume the layout of the syllables in ksc5601 1992-3 is regular, I'm not sure why it's necessary to convert using iconv(). (The shaper is only registered for Hangul syllables, not for other characters that might be in KSC5601 1992-3.)
* I'd rather see initializations like:
    + size_t inbytesleft=3;
    + size_t outbytesleft = 2;
  done right next to the code that uses them (here, the g_iconv() call).
* A few minor indentation problems:
  - Space around '=': inbytesleft=3 => inbytesleft = 3
  - No return at the end of a function, like:
      + return;
      +}
  - Some other strange indentation:
      + if (cd == (GIConv)-1)
      + {
      + g_warning ("Could not load converter from UTF-8 to ko_KR.johap92\n");
      + return;
      + }
+* I don't think this patch handles syllables that aren't
+ CVC right... even for well-formed syllables, it isn't
+ going to handle 'CV' syllables. Compare
+ render_syllable_with_ksc5601()

This patch is meant to directly use the glyphs in the Solaris ksc5601.1992-3 font, which contains all 11172 Hangul syllables.

+* The function seems to leak the result of
+ g_iconv_open ("JOHAB", "UTF-8"); even if it wasn't leaking
+ it, then there would still be a problem that opening
+ and closing an iconv converter can be quite expensive
+ compared to keeping it around.

Sorry for the leak. I should call g_iconv_open() once and always check whether the converter is already available before trying to reopen it. But where should I call g_iconv_close()?

+* Since I assume the layout of the syllables in
+ ksc5601 1992-3 is regular, I'm not sure why its
+ necessary to convert using iconv().

It's because the encoding of the ksc5601.1992-3 font is JOHAB, but the fonts are used (only) in UTF-8 locales on Solaris.

+ (The shaper is only registered for Hangul syllables, not for
+ other characters that might be in KSC5601 1992-3

The Hanja characters and other symbols are already covered in the basic-x module using ksc5601.1987-0, so it's not a problem.

+* I'd rather see initializations like:
+
++ size_t inbytesleft=3;
++ size_t outbytesleft = 2;

Sorry, I will fix this and also the other indentation problems.
I don't mean missing syllables. I mean that if you look at render_syllable_with_ksc5601, the only thing that you are handling is "n_cho = 1, n_jung=1, n_jong=1". While only a few combinations are possible in legitimate syllables, if the source text is in Unicode combining jamos, you can be called for arbitrary combinations. I'm not sure what the locale encoding has to do with anything; what your render function is responsible for doing is turning a sequence of jamos into glyphs chosen from your font.
OK, I see what your question is. In hangul_engine_shape(), one Unicode Hangul is converted into three jamos before calling any rendering function:

    sindex = wc4 - SBASE;
    wcs[0] = LBASE + (sindex / NCOUNT);
    wcs[1] = VBASE + ((sindex % NCOUNT) / TCOUNT);
    wcs[2] = TBASE + (sindex % TCOUNT);

and render_syllable_with_ksc5601() draws all of the jamos as a fallback, except in the "n_cho = 1, n_jung=1, n_jong=1" case, where it draws the Hangul character itself. In the new render_syllable_with_ksx1005(), all of the Hanguls (11,172) that are in Unicode are supported by the ksc5601.1992-3 fonts, so we don't need a fallback. That's why I just convert the three jamos back to Unicode using:

    /* convert back to Unicode */
    gindex = (lindex * VCOUNT + vindex) * TCOUNT + tindex + SBASE;

and then convert it to UTF-8:

    inbuf[0] = ((gindex >> 12) & 0x0f) | 0xe0;
    inbuf[1] = ((gindex >> 6) & 0x3f) | 0x80;
    inbuf[2] = (gindex & 0x3f) | 0x80;

and then to Johab:

    g_iconv (cd, (char **)&inptr, &inbytesleft, &outptr, &outbytesleft);

before calling set_glyph(). I'm just trying to follow the existing code structure.
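For readers following the arithmetic in this comment, here is a minimal standalone sketch of the decompose / recompose / UTF-8 steps. It is not code from the patch: the function names decompose_syllable, compose_syllable and encode_utf8_3 are hypothetical, and the constants are the ones from the Unicode Hangul syllable algorithm that the shaper's SBASE/LBASE/VBASE/TBASE names appear to follow.

```c
#include <stdint.h>

/* Constants of the Unicode Hangul syllable algorithm (assumed to match
 * the SBASE/LBASE/... names used in the shaper). */
enum {
  SBASE = 0xAC00, LBASE = 0x1100, VBASE = 0x1161, TBASE = 0x11A7,
  LCOUNT = 19, VCOUNT = 21, TCOUNT = 28,
  NCOUNT = VCOUNT * TCOUNT,  /* 588 */
  SCOUNT = LCOUNT * NCOUNT   /* 11172 */
};

/* Hypothetical helper: split a precomposed syllable into three values.
 * wcs[2] == TBASE means "no final consonant".
 * Returns 1 on success, 0 if wc is not a precomposed syllable. */
int decompose_syllable(uint32_t wc, uint32_t wcs[3])
{
  uint32_t sindex;

  if (wc < SBASE || wc >= SBASE + SCOUNT)
    return 0;
  sindex = wc - SBASE;
  wcs[0] = LBASE + (sindex / NCOUNT);
  wcs[1] = VBASE + ((sindex % NCOUNT) / TCOUNT);
  wcs[2] = TBASE + (sindex % TCOUNT);
  return 1;
}

/* Hypothetical helper: the inverse of decompose_syllable(). */
uint32_t compose_syllable(const uint32_t wcs[3])
{
  uint32_t lindex = wcs[0] - LBASE;
  uint32_t vindex = wcs[1] - VBASE;
  uint32_t tindex = wcs[2] - TBASE;

  return (lindex * VCOUNT + vindex) * TCOUNT + tindex + SBASE;
}

/* Hypothetical helper: three-byte UTF-8 encoding, valid for
 * U+0800..U+FFFF, which covers the whole syllable block. */
void encode_utf8_3(uint32_t wc, unsigned char out[3])
{
  out[0] = (unsigned char) (((wc >> 12) & 0x0f) | 0xe0);
  out[1] = (unsigned char) (((wc >> 6) & 0x3f) | 0x80);
  out[2] = (unsigned char) ((wc & 0x3f) | 0x80);
}
```

Note that this round trip only works when the input really is one leading consonant, one vowel and at most one final in the modern ranges; it says nothing about arbitrary combining-jamo input.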
Created attachment 8829 [details] [review] Updated patch, please ignore the previous one
The point is, one Hangul is _not_ necessarily converted into 3 jamos, because the input to the shaper could also be combining Jamos. (Also, some precomposed syllables actually only have 2 jamos... this works a little funny in the existing code; it looks like we have n_jamos == 3, but the 3rd is a special value. See the Unicode spec, section 3.11.)
If Solaris just wants ISO10646 level 1 support, supporting only 2 or 3 modern jamos per syllable is enough. But when I wrote this, I wanted it to support the Hangul Jamos area as well. The ksc5601.1992-3 fonts do not have enough glyphs to render this area, but the shaper should render some fallback glyphs. In addition, the Unicode Hangul syllable => JOHAB conversion can be done with a simple expression; iconv() is expensive. A JOHAB Hangul syllable (16 bits) consists of a "1" at the MSB, 5 bits with the CHOSEONG index, 5 bits with the JUNGSEONG index, and 5 bits with the JONGSEONG index.
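The "simple expression" mentioned above might look roughly like the sketch below. This is not the code that was committed: the function name syllable_to_johab and the two index tables are my own, written from my understanding of the JOHAB bit layout (the 5-bit vowel and final fields skip some values, so they are not a plain offset), and they should be checked against JOHAB.TXT before being relied on.

```c
#include <stdint.h>

/* Assumed JOHAB 5-bit codes for the 21 modern vowels, in Unicode order. */
static const uint8_t johab_jung[21] = {
  3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
  18, 19, 20, 21, 22, 23, 26, 27, 28, 29
};

/* Assumed JOHAB 5-bit codes for "no final" (index 0) followed by the
 * 27 modern final consonants, in Unicode order. Code 18 is skipped. */
static const uint8_t johab_jong[28] = {
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
  19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
};

/* Hypothetical helper: map a precomposed Unicode Hangul syllable
 * (U+AC00..U+D7A3) to its 16-bit JOHAB code without calling iconv().
 * Returns 0 for any other input. */
uint16_t syllable_to_johab(uint32_t wc)
{
  uint32_t sindex, lindex, vindex, tindex;

  if (wc < 0xAC00 || wc > 0xD7A3)
    return 0;

  sindex = wc - 0xAC00;
  lindex = sindex / (21 * 28);         /* leading consonant, 0..18 */
  vindex = (sindex % (21 * 28)) / 28;  /* vowel, 0..20 */
  tindex = sindex % 28;                /* final, 0 = none */

  /* MSB set, then 5 bits each for choseong, jungseong, jongseong.
   * The choseong field is assumed to be the index plus 2. */
  return (uint16_t) (0x8000u
                     | ((lindex + 2) << 10)
                     | ((uint32_t) johab_jung[vindex] << 5)
                     | johab_jong[tindex]);
}
```

Under these assumed tables, U+AC00 maps to 0x8861 and U+D7A3 maps to 0xD3BD, which is consistent with the Hangul range usually quoted for JOHAB.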
Thanks, Changwoo, for the information; glad to know iconv is not necessary for converting the indexes back to JOHAB. Yes, you are correct, Solaris doesn't support the Hangul Jamos. I don't know the history, but it looks like we have no plan to support them in the near future. Also, I will follow the other modules and check whether there is exactly one CHOSEONG, one JUNGSEONG, and no more than one JONGSEONG, and will call the fallback function for everything else.
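The check described here could be sketched as follows. This is only an illustration, not the patch: the function name is_simple_modern_syllable is hypothetical, the jamo ranges are the modern ranges from the Unicode Hangul Jamo block, and treating 0x11A7 as a "no final" placeholder is an assumption based on the decomposition shown earlier in this thread (wcs[2] = TBASE + 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the jamo sequence is exactly one
 * modern choseong, one modern jungseong and at most one modern
 * jongseong, so it can be drawn as a single precomposed glyph.
 * Returns 0 when the caller should take the fallback path.
 * Ordering of the jamos is not checked in this sketch. */
int is_simple_modern_syllable(const uint32_t *jamos, size_t n)
{
  int n_cho = 0, n_jung = 0, n_jong = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint32_t c = jamos[i];

      if (c >= 0x1100 && c <= 0x1112)       /* modern leading consonants */
        n_cho++;
      else if (c >= 0x1161 && c <= 0x1175)  /* modern vowels */
        n_jung++;
      else if (c >= 0x11A8 && c <= 0x11C2)  /* modern final consonants */
        n_jong++;
      else if (c == 0x11A7)                 /* assumed "no final" placeholder */
        continue;
      else                                  /* archaic jamo, filler, or non-jamo */
        return 0;
    }

  return n_cho == 1 && n_jung == 1 && n_jong <= 1;
}
```

Anything this function rejects (doubled leading consonants, archaic jamos, lone vowels and so on) is exactly the input the later comments argue still needs some fallback rendering.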
Created attachment 9811 [details] [review] Updated patch without iconv, please ignore the previous ones
Is the fallback code correct? I don't have any ksc5601.1992-3 font (maybe there's no free ksc5601.1992-3 font). But I guess (because it's JOHAB encoded) it does not have the individual Hangul jamo glyphs at the Unicode Hangul Jamos code values. The fallback code should render each jamo with a reasonable glyph from the corresponding font. Maybe there's no reasonable glyph for some medieval Hangul jamos. But just let it do the best it can.
The fallback is not for Solaris. In the Solaris ksc5601.1992-3 fonts, all of the 11172 glyphs will be handled in the if (n_cho == 1 && n_jung == 1 && n_jong <= 1) section. That's why I didn't include the fallback in the previous patches. If possible, I'd still like to remove the fallback.
The fallback's not for Solaris, but for what? Your patch will render with wrong glyphs in ksc5601.1992-3 if (n_cho != 1 || n_jung != 1 || n_jong > 1). Even in render_with_iso10646 or render_with_johab*, which can render the 11172 syllables, there are fallbacks. render_with_johab* does more; it even renders some of the non-modern Jamo compositions as syllable forms. Processing the 11172 Hangul syllables is not an interesting issue in the hangul-x module. It's too easy, isn't it? :-)
This patch is to support 11172 hanguls on Solaris using ksc5601.1992-3 fonts. Without this patch, only 2350 hanguls are supported on Solaris, using ksc5601.1987-0 fonts. Since Solaris (or the ksc5601.1992-3 font) doesn't support any other hangul characters, I'd suggest not checking (n_cho != 1 || n_jung != 1 || n_jong > 1), which was added at Owen's request.
<blahblah> I remember a system engineer from Sun Microsystems Korea who came to (try to) fix a Solaris system when I was a university student. He often said, "we don't support it" -- "You installed XEmacs? We don't support it.", "What is it? Standard ML? We don't support it." </blahblah> Sorry, but I just want to say that it's not important whether Solaris now supports Jamos or not. hangul-x/pango/GNOME supports it. It is a hangul-x module policy that render_with_*() functions should render all possible Jamo combinations. Why do you want to remove the condition so much? Writing a fallback is not that difficult.
It's a FACT that it's not supported on Solaris. I want to remove the fallback because I don't know whether it's correct or not, since I couldn't test it in a Solaris environment. I have no objection to adding anything that has nothing to do with Solaris, so please feel free to provide your suggested fallbacks instead of just challenging it.
If you remove the fallback, it's incorrect. Umm... but it would be better to commit some example Hangul text into the pango/modules/hangul/ dir. As I don't have any ksc5601.1992-3 font, I can't provide fallback code. But as I said, it's easy. I guess the font has "jamo glyphs", glyphs which render each jamo. Then in the fallback you could just render Unicode Jamos as the corresponding jamo glyphs in the font. Is there any legal copy of any ksc5601.1992-3 font which can be used with XFree86? I could write the correct fallback if I get one.
OK. The font has the jamo glyphs at 0xda80. Then the fallback code can use the fallback jamo table for render_with_ksc5601. Here goes the patch with an improved fallback.
Created attachment 9925 [details] [review] Updated with proper fallback
Committed the last patch to stable and head with two changes: - Added a return after the non-fallback case - Fixed C++ comment