Bug 78575 - JOHAB KSC5601 1992-3 in hangul x shaper
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: 1.0.x
Hardware: Other opensolaris
Importance: Normal normal
Target Milestone: 1.0.4
Assigned To: Owen Taylor
Owen Taylor
Depends on:
Blocks:
 
 
Reported: 2002-04-13 01:18 UTC by Hidetoshi Tajima
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch for KSX1005 (KS C 5601.1992-3) support in Solaris (2.64 KB, patch)
2002-05-15 17:55 UTC, Qingjiang (Brian) Yuan
Updated patch, please ignore the previous one (2.17 KB, patch)
2002-05-29 22:40 UTC, Qingjiang (Brian) Yuan
Updated patch without iconv, please ignore the previous ones (4.58 KB, patch)
2002-07-11 16:49 UTC, Qingjiang (Brian) Yuan
Updated with proper fallback (5.09 KB, patch)
2002-07-17 12:35 UTC, Changwoo Ryu

Description Hidetoshi Tajima 2002-04-13 01:18:07 UTC
To use JOHAB-encoded X fonts on Solaris, can we recreate the
mapping tables in tables-big.i by adding conversion rules from
JOHAB.TXT, and add a new charset entry for "ksc5601.1992-3" in
charsets[]?

I'll file a separate report for chinese encoded fonts like GBK or
GB18030.

Should we create a separate shaper for this purpose, instead of
continuing to enhance the basic x shaper?
Comment 1 Hidetoshi Tajima 2002-04-19 18:54:51 UTC
This seems to be an issue with the hangul x shaper, so the summary
field has been changed accordingly.

"ksc5601.1992-3" X fonts are to be used in the hangul x shaper on Solaris.
The ksc5601.1992-3 fonts have the Hangul syllables and CJK glyphs
defined in JOHAB.TXT.
Comment 2 Owen Taylor 2002-04-30 04:35:47 UTC
Should be pretty straightforward to add such support to
the hangul-x shaper.
Comment 3 Qingjiang (Brian) Yuan 2002-05-15 17:53:54 UTC
See the attached code_patch.78575 for the source patch.
The following lines should be added to pangox.aliases in order
to render HKSCS/GB18030/CNS11643/KSX1005 (KSC5601.1992-3) for this bug
and bug #79812 on Solaris:

   -*-song-medium-r-normal--*-*-*-*-*-*-*-*,\
   -*-sung-medium-r-normal--*-*-*-*-*-*-*-*,\
   -*-myeongjo-medium-r-normal--*-*-*-*-*-*-*-*,\

Comment 4 Qingjiang (Brian) Yuan 2002-05-15 17:55:06 UTC
Created attachment 8496
Patch for KSX1005 (KS C 5601.1992-3) support in Solaris
Comment 5 Luis Villa 2002-05-17 01:34:25 UTC
Reminder to everyone that using 'patch' makes life easier for everyone :)
Comment 6 Hidetoshi Tajima 2002-05-24 16:18:06 UTC
Okay to commit? I'd integrate this into 1.2.0 for sure,
so the commit should be done any time soon, right?
Comment 7 Hidetoshi Tajima 2002-05-28 23:47:46 UTC
Since 1.0.2 was released, are we okay to commit this now?
Comment 8 Owen Taylor 2002-05-29 19:27:01 UTC
* I don't think this patch handles syllables that aren't 
  CVC right... even for well-formed syllables, it isn't
  going to handle 'CV' syllables. Compare
  render_syllable_with_ksc5601()

* The function seems to leak the result of
  g_iconv_open ("JOHAB", "UTF-8"); even if it wasn't leaking
  it, then there would still be a problem that opening
  and closing an iconv converter can be quite expensive
  compared to keeping it around.

* Since I assume the layout of the syllables in 
  ksc5601 1992-3 is regular, I'm not sure why it's
  necessary to convert using iconv().

  (The shaper is only registered for Hangul syllables, not for
   other characters that might be in KSC5601 1992-3.)

* I'd rather see initializations like:

+  size_t inbytesleft=3;
+  size_t outbytesleft = 2;

  that are done in close connection to some code (calling g_iconv())
  placed next to that code.

* A few minor indentation problems:

  Space around '='

   inbytesleft=3 => inbytesleft = 3

  No return at end of a function like:

 +  return;
 +}
 
 Some other strange indentation:
    
+    if (cd == (GIConv)-1)
+    {
+    g_warning ("Could not load converter from UTF-8 to ko_KR.johap92\n");
+    return;
+    }
Comment 9 Qingjiang (Brian) Yuan 2002-05-29 20:23:16 UTC
+* I don't think this patch handles syllables that aren't
+  CVC right... even for well-formed syllables, it isn't
+  going to handle 'CV' syllables. Compare
+  render_syllable_with_ksc5601()

This patch is to directly use the glyphs in the Solaris ksc5601.1992-3
font which contains all of 11172 hanguls

+
+* The function seems to leak the result of
+  g_iconv_open ("JOHAB", "UTF-8"); even if it wasn't leaking
+  it, then there would still be a problem that opening
+  and closing an iconv converter can be quite expensive
+  compared to keeping it around.

Sorry for the leak. I should call g_iconv_open() once and always check
whether the converter is available before trying to reopen it.
But where should I call g_iconv_close()?

+
+* Since I assume the layout of the syllables in
+  ksc5601 1992-3 is regular, I'm not sure why its
+  necessary to convert using iconv().

It's because the encoding of ksc5601.1992-3 font is JOHAB, but the
fonts are used (only) in UTF-8 locales in Solaris.

+
+  (The shaper is only registered for Hangul syllables, not for
+   other characters that might be in KSC5601 1992-3

The Hanja characters and other symbols are already covered in the
basic-x module using ksc5601.1987-0, so it's not a problem.
+
+* I'd rather see initializations like:
+
++  size_t inbytesleft=3;
++  size_t outbytesleft = 2;

Sorry, I will fix this and also the other indentation problems.
Comment 10 Owen Taylor 2002-05-29 20:52:25 UTC
I don't mean missing syllables. I mean that if you look
at render_syllable_with_ksc5601, the only thing that you
are handling is "n_cho = 1, n_jung=1, n_jong=1". While
only a few combinations are possible in legitimate syllables,
if the source text is in Unicode combining jamos, you can
be called for arbitrary combinations.

I'm not sure what the locale encoding has to do with
anything; what your render function is responsible for
doing is turning a sequence of jamos into glyphs chosen
from your font.
Comment 11 Qingjiang (Brian) Yuan 2002-05-29 21:20:58 UTC
OK, I see what your question is. In hangul_engine_shape(),
one Unicode Hangul is converted into three jamos before calling any
rendering function:
          sindex = wc4 - SBASE;
          wcs[0] = LBASE + (sindex / NCOUNT);
          wcs[1] = VBASE + ((sindex % NCOUNT) / TCOUNT);
          wcs[2] = TBASE + (sindex % TCOUNT);

In render_syllable_with_ksc5601(), all of the jamos are drawn
as a fallback unless "n_cho = 1, n_jung=1, n_jong=1", in which
case the Hangul character itself is drawn.
In the new render_syllable_with_ksx1005(), all of the Hanguls
(11,172) that are in Unicode are supported by ksc5601.1992-3 fonts, so
we don't need a fallback. That's why I just convert the three jamos
back to Unicode using:
   /* convert back to Unicode */
  gindex = (lindex * VCOUNT + vindex) * TCOUNT + tindex + SBASE;
and then convert it to UTF-8:
  inbuf[0] = ((gindex >> 12) & 0x0f) | 0xe0;
  inbuf[1] = ((gindex >> 6) & 0x3f) | 0x80;
  inbuf[2] = (gindex & 0x3f) | 0x80;
and then to Johab:

  g_iconv (cd, (char **)&inptr, &inbytesleft, &outptr, &outbytesleft);

before calling set_glyph().
  I'm just trying to follow the existing code structure.
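
For readers following the arithmetic in this comment, here is a minimal, self-contained sketch of the round trip: precomposed syllable to three jamos, back to the syllable, then to three UTF-8 bytes. The constants are those of the Unicode Hangul syllable algorithm; the function names (decompose_syllable, compose_syllable, encode_utf8_3) are hypothetical and are not taken from the patch or from pango.

```c
#include <stdint.h>

/* Constants of the Unicode Hangul syllable algorithm. */
#define SBASE  0xAC00
#define LBASE  0x1100
#define VBASE  0x1161
#define TBASE  0x11A7
#define VCOUNT 21
#define TCOUNT 28
#define NCOUNT (VCOUNT * TCOUNT)   /* 588 */

/* Hypothetical helper: split a precomposed syllable (U+AC00..U+D7A3)
 * into leading, vowel and trailing jamo code points.  For a syllable
 * without a trailing consonant, wcs[2] is TBASE itself. */
static void
decompose_syllable (uint32_t wc, uint32_t wcs[3])
{
  uint32_t sindex = wc - SBASE;

  wcs[0] = LBASE + (sindex / NCOUNT);
  wcs[1] = VBASE + ((sindex % NCOUNT) / TCOUNT);
  wcs[2] = TBASE + (sindex % TCOUNT);
}

/* Hypothetical helper: the inverse of decompose_syllable(). */
static uint32_t
compose_syllable (const uint32_t wcs[3])
{
  uint32_t lindex = wcs[0] - LBASE;
  uint32_t vindex = wcs[1] - VBASE;
  uint32_t tindex = wcs[2] - TBASE;

  return (lindex * VCOUNT + vindex) * TCOUNT + tindex + SBASE;
}

/* Hypothetical helper: encode a code point in U+0800..U+FFFF as three
 * UTF-8 bytes.  All Hangul syllables fall into this range. */
static void
encode_utf8_3 (uint32_t wc, unsigned char out[3])
{
  out[0] = (unsigned char) (((wc >> 12) & 0x0f) | 0xe0);
  out[1] = (unsigned char) (((wc >> 6) & 0x3f) | 0x80);
  out[2] = (unsigned char) ((wc & 0x3f) | 0x80);
}
```

Note that this round trip only works for input that came from a precomposed syllable; it says nothing about arbitrary sequences of combining jamos.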
Comment 12 Qingjiang (Brian) Yuan 2002-05-29 22:40:34 UTC
Created attachment 8829
Updated patch, please ignore the previous one
Comment 13 Owen Taylor 2002-06-12 15:47:22 UTC
The point is, one Hangul is _not_ necessarily converted
into 3 jamos, because the input to the shaper could
also be combining Jamos.

(Also, some precomposed syllables actually only have 2 
jamos... this works a little funny in the existing
code: it looks like we have n_jamos == 3, but the 3rd
is a special value. See the Unicode spec, section 3.11.)
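
A small sketch of the "special value" point, assuming the standard Unicode decomposition arithmetic: a precomposed syllable with no trailing consonant has trailing index 0, so the third decomposed code point is TBASE (U+11A7) itself, one below the first real trailing consonant U+11A8. The helper names are hypothetical and this is not the pango code.

```c
#include <stdint.h>

#define SBASE  0xAC00
#define TBASE  0x11A7
#define TCOUNT 28

/* Hypothetical helper: third code point produced by the decomposition.
 * It equals TBASE when the syllable has no trailing consonant. */
static uint32_t
syllable_trailing_jamo (uint32_t wc)
{
  return TBASE + (wc - SBASE) % TCOUNT;
}

/* Hypothetical helper: number of real jamos in a precomposed syllable,
 * 2 for a leading+vowel syllable, 3 when a trailing consonant exists. */
static int
syllable_jamo_count (uint32_t wc)
{
  return syllable_trailing_jamo (wc) == TBASE ? 2 : 3;
}
```

A renderer that always sees three decomposed values therefore has to treat TBASE as "no jongseong" rather than as a glyph to draw.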
Comment 14 Changwoo Ryu 2002-06-27 04:37:27 UTC
If Solaris just wants ISO 10646 level 1 support, supporting only
2 or 3 modern jamos per syllable is enough.  But when I wrote this,
I wanted it to support the Hangul Jamo area as well.  The
ksc5601.1992-3 fonts do not have enough glyphs to render this area,
but the shaper should render some fallback glyphs.

In addition, the Unicode Hangul syllable => JOHAB conversion can be done
with a simple expression.  iconv() is expensive.

A JOHAB Hangul syllable (16 bits) consists of a "1" at the MSB, followed
by 5 bits for the CHOSEONG index, 5 bits for the JUNGSEONG index, and
5 bits for the JONGSEONG index.
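
The bit layout above can be sketched as follows. This is an illustration, not the committed patch. The thread only gives the field layout; the 5-bit field values in the three tables are an assumption taken from the commonly documented JOHAB layout (the fields do not start at 0, and a few values are skipped), so check them against JOHAB.TXT before relying on them. The function name johab_from_syllable is hypothetical.

```c
#include <stdint.h>

#define SBASE  0xAC00
#define SCOUNT 11172
#define VCOUNT 21
#define TCOUNT 28
#define NCOUNT (VCOUNT * TCOUNT)

/* ASSUMPTION: JOHAB 5-bit field values, indexed by the Unicode
 * leading / vowel / trailing index.  Not stated in this thread. */
static const uint8_t johab_cho[19] = {
  2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
};
static const uint8_t johab_jung[VCOUNT] = {
  3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
  18, 19, 20, 21, 22, 23, 26, 27, 28, 29
};
/* Index 0 is "no trailing consonant" (the JOHAB fill value 1). */
static const uint8_t johab_jong[TCOUNT] = {
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
  19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
};

/* Hypothetical helper: map a precomposed Unicode Hangul syllable to its
 * 16-bit JOHAB code without iconv.  Returns 0 for anything else. */
static uint16_t
johab_from_syllable (uint32_t wc)
{
  uint32_t sindex;

  if (wc < SBASE || wc >= SBASE + SCOUNT)
    return 0;

  sindex = wc - SBASE;
  return (uint16_t) (0x8000
                     | (johab_cho[sindex / NCOUNT] << 10)
                     | (johab_jung[(sindex % NCOUNT) / TCOUNT] << 5)
                     | johab_jong[sindex % TCOUNT]);
}
```

Under these tables, U+AC00 maps to 0x8861 and U+D7A3 maps to 0xD3BD, which matches the usual JOHAB Hangul range.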
Comment 15 Qingjiang (Brian) Yuan 2002-07-10 20:37:37 UTC
Thanks, Changwoo, for the information. Glad to know iconv is not
necessary for converting the indexes back to JOHAB.
 
Yes, you are correct: Solaris doesn't support the Hangul Jamos. I
don't know the history, but it looks like we have no plan to support
them in the near future.
I will also follow the other modules and check whether there is one
CHOSEONG, one JUNGSEONG, and no more than one JONGSEONG, and will call
the fallback function for everything else.
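
The check described here can be sketched like this. It is a minimal illustration under stated assumptions, not the code from the attachment: the function name needs_fallback is hypothetical, only modern jamos count toward a direct glyph, TBASE is skipped as a "no trailing consonant" marker, and jamo order is not validated.

```c
#include <stddef.h>
#include <stdint.h>

#define TBASE 0x11A7   /* decomposition marker for "no trailing consonant" */

/* Hypothetical helper: return 1 if the jamo sequence cannot be drawn with a
 * single precomposed-syllable glyph and must go through the fallback path,
 * 0 if it is exactly one modern choseong, one modern jungseong and at most
 * one modern jongseong. */
static int
needs_fallback (const uint32_t *jamos, size_t n)
{
  int n_cho = 0, n_jung = 0, n_jong = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint32_t wc = jamos[i];

      if (wc >= 0x1100 && wc <= 0x1112)        /* modern leading consonants */
        n_cho++;
      else if (wc >= 0x1161 && wc <= 0x1175)   /* modern vowels */
        n_jung++;
      else if (wc >= 0x11A8 && wc <= 0x11C2)   /* modern trailing consonants */
        n_jong++;
      else if (wc == TBASE)                    /* filler from decomposition */
        continue;
      else
        return 1;                              /* archaic jamo or non-jamo */
    }

  return !(n_cho == 1 && n_jung == 1 && n_jong <= 1);
}
```

With a check of this shape, input made of combining jamos in unusual combinations is routed to the fallback instead of being mapped to a wrong syllable glyph.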
Comment 16 Qingjiang (Brian) Yuan 2002-07-11 16:49:23 UTC
Created attachment 9811
Updated patch without iconv, please ignore the previous ones
Comment 17 Changwoo Ryu 2002-07-12 18:45:34 UTC
Is the fallback code correct?

I don't have any ksc5601.1992-3 font (maybe there's no free
ksc5601.1992-3 font).  But I guess that, because it's JOHAB encoded,
it does not have the individual Hangul jamo glyphs at the Unicode
Hangul Jamo code points.

The fallback code should render each jamo with a reasonable glyph
from the corresponding font.  Maybe there's no reasonable glyph
for some medieval Hangul jamos.  But just let it do the best
it can.
Comment 18 Qingjiang (Brian) Yuan 2002-07-12 19:00:56 UTC
The fallback is not for Solaris. In the Solaris ksc5601.1992-3 fonts,
all of the 11172 glyphs fall into the if (n_cho == 1 && n_jung == 1
&& n_jong <= 1) section.

That's why I didn't include the fallback in the previous patches.
If possible, I'd still like to remove the fallback.
Comment 19 Changwoo Ryu 2002-07-13 01:54:06 UTC
The fallback's not for Solaris, but for what?  Your patch will render 
with wrong glyphs in ksc5601.1992-3 if (n_cho != 1 || n_jung != 1
|| n_jong > 1).

Even in render_with_iso10646 or render_with_johab*, which can render
the 11172 syllables, there are fallbacks.  render_with_johab* does
more; it even renders some of the non-modern jamo compositions
as syllable forms.

Processing the 11172 Hangul syllables is not an interesting issue
in the hangul-x module.  It's too easy, isn't it?  :-)
Comment 20 Qingjiang (Brian) Yuan 2002-07-14 05:41:14 UTC
This patch is to support 11172 hanguls in Solaris using ksc5601.1992-3
fonts. Without this patch, only 2350 hanguls are supported in Solaris,
using ksc5601.1987-0 fonts. Since Solaris (or the ksc5601.1992-3 font)
doesn't support any other hangul characters, I'd suggest not checking
(n_cho != 1 || n_jung != 1 || n_jong > 1), which was added at
Owen's request.
Comment 21 Changwoo Ryu 2002-07-15 12:15:18 UTC
<blahblah>
I remember a system engineer from Sun Microsystems Korea who came to
(try to) fix a Solaris system when I was a university student.  He
often said, "we don't support it" -- "You installed XEmacs?  We don't
support it.", "What is it?  Standard ML?  We don't support it."
</blahblah>

Sorry, but I just want to say that it's not important whether Solaris
now supports jamos or not.  hangul-x/pango/GNOME supports them.  It is
hangul-x module policy that render_with_*() functions should render all
possible jamo combinations.

Why do you want to remove the condition so much?  Writing a fallback is
not very difficult.
Comment 22 Qingjiang (Brian) Yuan 2002-07-15 15:47:00 UTC
It's a FACT that it's not supported in Solaris. I want to remove the
fallback because I don't know whether it's correct or not, since
I couldn't test it in a Solaris environment.
I have no objection to adding anything that has nothing to do with
Solaris, so please feel free to provide your suggested fallbacks
instead of just challenging it.
Comment 23 Changwoo Ryu 2002-07-16 02:03:06 UTC
If you remove the fallback, it's incorrect.  Umm..but it's better to
commit some example Hangul text into the pango/modules/hangul/ dir.

As I don't have any ksc5601.1992-3 font, I can't provide fallback
code.  But as I said, it's easy.  I guess the font has "jamo glyphs",
glyphs which render each jamo.  Then in the fallback you could just
render Unicode jamos as the corresponding jamo glyphs in the font.

Is there a legal copy of any ksc5601.1992-3 font which can be used
with XFree86?  I could write the correct fallback if I get one.
Comment 24 Changwoo Ryu 2002-07-17 12:34:26 UTC
OK.. The font has the jamo glyphs at 0xda80.  Then
fallback code can use the fallback jamo table for render_with_ksc5601.

Here goes the patch with improved fallback.
Comment 25 Changwoo Ryu 2002-07-17 12:35:54 UTC
Created attachment 9925
Updated with proper fallback
Comment 26 Owen Taylor 2002-07-26 23:28:59 UTC
Committed the last patch to stable and head with two changes:

 - Added a return after the non-fallback case
 - Fixed C++ comment