GNOME Bugzilla – Bug 101081
Non-BMP (plane 1 thru plane 16) characters are not supported
Last modified: 2004-12-22 21:47:04 UTC
Currently, Pango only supports BMP (plane 0) characters. Although they are not yet widespread, planes 1 and 2 have begun to fill up (plane 2 rapidly, with CJK ideographs), and there are a couple of TrueType fonts that support non-BMP characters. Code2001 by James Kass (http://home.att.net/~jameskass) has glyphs for plane 1 characters, and Mac OS X comes with Japanese fonts covering hundreds of plane 2 characters. There are also commercial TrueType fonts with all the CJK ideographs encoded so far in Unicode/ISO 10646. So it may be time to consider supporting non-BMP characters.

There was a thread on the Linux-UTF8 list (for the benefit of people other than Owen, who was there :-)):
http://mail.nl.linux.org/linux-utf8/2002-12/msg00000.html
http://mail.nl.linux.org/linux-utf8/2002-11/msg00148.html

Also, Mozilla bug 182877 (http://bugzilla.mozilla.org/show_bug.cgi?id=182877) is of some relevance. FreeType 2 had to be patched to use TTFs with a UCS4 cmap, and the patch was committed. The Xft patch is in the queue.

Incidentally, with a patched FreeType, gedit (Pango) segfaulted when Code2001 was chosen. I'll try to track down the cause.
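For context on why FreeType needed a patch: a "UCS4 cmap" is typically a Windows UCS-4 subtable (platform 3, encoding 10) stored in cmap format 12, because the older format 4 uses 16-bit character codes and cannot address anything above U+FFFF. Format 12 is a sorted list of sequential map groups (start char, end char, start glyph). The sketch below shows that lookup; the type and function names are hypothetical, and real code would read the groups from the font's big-endian table data rather than a C array.

```c
#include <stddef.h>
#include <stdint.h>

/* One sequential map group, modeled on a cmap format 12 subtable entry. */
typedef struct {
  uint32_t start_char;   /* first code point in the group */
  uint32_t end_char;     /* last code point in the group (inclusive) */
  uint32_t start_glyph;  /* glyph index for start_char */
} CmapGroup;

/* Binary search over groups sorted by start_char.
 * Returns the glyph index, or 0 (.notdef) if cp is unmapped. */
uint32_t cmap12_lookup(const CmapGroup *groups, size_t n, uint32_t cp) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cp < groups[mid].start_char)
      hi = mid;
    else if (cp > groups[mid].end_char)
      lo = mid + 1;
    else
      /* Glyphs within a group are consecutive. */
      return groups[mid].start_glyph + (cp - groups[mid].start_char);
  }
  return 0;
}
```

A plane 2 ideograph such as U+20000 is just another 32-bit key here, which is exactly what a 16-bit format 4 table cannot express.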
I tested it with a commercial font covering CJK Unified Ideographs Extension B after making the following change:

Index: pango/modules/basic/basic-fc.c
===================================================================
RCS file: /cvs/gnome/pango/modules/basic/basic-fc.c,v
retrieving revision 1.15
diff -u -r1.15 basic-fc.c
--- pango/modules/basic/basic-fc.c	14 Apr 2003 23:48:23 -0000	1.15
+++ pango/modules/basic/basic-fc.c	11 Jul 2003 21:06:22 -0000
@@ -66,7 +66,7 @@
   { 0xf900, 0xfa2d, "*" }, /* CJK Compatibility Ideographs */
   { 0xfe30, 0xfe6b, "*" }, /* CJK Compatibility Forms and Small Form Variants */
   { 0xff00, 0xffe3, "*" }, /* Halfwidth and Fullwidth Forms (partly) */
-  { 0x0000, 0xffff, "" },
+  { 0x0000, 0x2ffff, "" },
 };

Everything works flawlessly.
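To illustrate what the one-line change does, here is a sketch of a range table shaped like the one in the diff, before and after widening the catch-all entry. The struct and function names are hypothetical, and the simple first-match scan is an assumption for illustration; Pango's own map builder may resolve overlapping ranges differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: inclusive [start, end] plus a language string. */
typedef struct {
  uint32_t start;
  uint32_t end;
  const char *langs;
} RangeEntry;

/* Table as it was before the patch: the catch-all stops at U+FFFF. */
static const RangeEntry ranges_before[] = {
  { 0xf900, 0xfa2d, "*" },
  { 0xfe30, 0xfe6b, "*" },
  { 0xff00, 0xffe3, "*" },
  { 0x0000, 0xffff, "" },
};

/* Table after the patch: the catch-all extends through plane 2. */
static const RangeEntry ranges_after[] = {
  { 0xf900, 0xfa2d, "*" },
  { 0xfe30, 0xfe6b, "*" },
  { 0xff00, 0xffe3, "*" },
  { 0x0000, 0x2ffff, "" },
};

/* Returns the first entry whose range contains cp, or NULL if none does. */
const RangeEntry *find_range(const RangeEntry *table, size_t n, uint32_t cp) {
  for (size_t i = 0; i < n; i++)
    if (cp >= table[i].start && cp <= table[i].end)
      return &table[i];
  return NULL;
}
```

With the old table, an Extension B character such as U+20000 matches no entry, so the basic engine never claims it; with the new table it falls into the catch-all. Anything from U+30000 up is still unclaimed.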
Created attachment 18328: patch to draw the hex boxes for characters above U+FFFF, and to extend the basic-fc.c range up to U+10FFFF
Created attachment 18420: patch to optimize the pango map data structure
With the latter patch, the pango map structure takes 26k with the basic engine covering up to 10ffff (that is, with the former patch applied). For comparison, without the latter patch it takes 90k. Without either patch (covering only up to ffff) it takes 40k. (These are the numbers I measured; it's not impossible that I made a mistake.) It would be possible to optimize this patch even further by adding a PangoMapEntry * to the PangoSubmap.d union, but that would probably sacrifice some readability (I had enough trouble writing map_add_engine). It looks like it would save another 6k or so.
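The general idea behind shrinking such a map can be sketched as a three-level table (plane, page, character) in which a block whose code points all map to the same value stores that value once instead of allocating a child array. This is a minimal sketch of that technique under stated assumptions, not the code in attachment 18420: every name below is hypothetical, and a plain `int` stands in for the engine entry.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse map: plane (cp >> 16), page ((cp >> 8) & 0xFF),
 * char (cp & 0xFF). A NULL child means "the whole block maps to value". */
enum { N_PLANES = 17, N_PAGES = 256, N_CHARS = 256 };

typedef struct { int *chars; int value; } Page;
typedef struct { Page *pages; int value; } Plane;
typedef struct { Plane planes[N_PLANES]; } CodeMap;

void codemap_init(CodeMap *m) {
  for (int p = 0; p < N_PLANES; p++) {
    m->planes[p].pages = NULL;
    m->planes[p].value = 0;  /* 0 = no engine */
  }
}

/* Frees a plane's children so it becomes uniform again. */
static void plane_collapse(Plane *pl) {
  if (!pl->pages) return;
  for (int i = 0; i < N_PAGES; i++) free(pl->pages[i].chars);
  free(pl->pages);
  pl->pages = NULL;
}

void codemap_free(CodeMap *m) {
  for (int p = 0; p < N_PLANES; p++) plane_collapse(&m->planes[p]);
}

int codemap_lookup(const CodeMap *m, uint32_t cp) {
  if (cp > 0x10FFFF) return 0;
  const Plane *pl = &m->planes[cp >> 16];
  if (!pl->pages) return pl->value;
  const Page *pg = &pl->pages[(cp >> 8) & 0xFF];
  return pg->chars ? pg->chars[cp & 0xFF] : pg->value;
}

/* Assigns value to [start, end]. Fully covered planes and pages stay (or
 * become) uniform; only partially covered ones get child arrays.
 * Returns 0 on success, -1 on allocation failure. */
int codemap_set_range(CodeMap *m, uint32_t start, uint32_t end, int value) {
  if (end > 0x10FFFF) end = 0x10FFFF;
  uint32_t cp = start;
  while (cp <= end) {
    Plane *pl = &m->planes[cp >> 16];
    uint32_t plane_end = cp | 0xFFFF;
    if ((cp & 0xFFFF) == 0 && plane_end <= end) {  /* whole plane */
      plane_collapse(pl);
      pl->value = value;
      cp = plane_end + 1;
      continue;
    }
    if (!pl->pages) {  /* split a uniform plane into uniform pages */
      pl->pages = malloc(N_PAGES * sizeof *pl->pages);
      if (!pl->pages) return -1;
      for (int i = 0; i < N_PAGES; i++) {
        pl->pages[i].chars = NULL;
        pl->pages[i].value = pl->value;
      }
    }
    Page *pg = &pl->pages[(cp >> 8) & 0xFF];
    uint32_t page_end = cp | 0xFF;
    if ((cp & 0xFF) == 0 && page_end <= end) {  /* whole page */
      free(pg->chars);
      pg->chars = NULL;
      pg->value = value;
      cp = page_end + 1;
      continue;
    }
    if (!pg->chars) {  /* split a uniform page into per-char entries */
      pg->chars = malloc(N_CHARS * sizeof *pg->chars);
      if (!pg->chars) return -1;
      for (int i = 0; i < N_CHARS; i++) pg->chars[i] = pg->value;
    }
    uint32_t stop = page_end < end ? page_end : end;
    for (; cp <= stop; cp++) pg->chars[cp & 0xFF] = value;
  }
  return 0;
}
```

With this layout, a catch-all range like 0 to U+10FFFF costs only the 17 top-level slots and no child arrays; memory is spent only where ranges start or stop mid-block.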
*** Bug 118792 has been marked as a duplicate of this bug. ***
The 6-digit hex square stuff looks fine. The other changes shouldn't be necessary with the current script-based shaper selection code. Does PangoCoverage need optimization, i.e. moving to a 3-level page table like you did for PangoMap?
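One consideration for the PangoCoverage question is leaf size. Coverage has four levels (PangoCoverageLevel: NONE, FALLBACK, APPROXIMATE, EXACT), so two bits per code point suffice, and a 256-code-point block packs into 64 bytes. If such blocks were allocated densely across all 17 planes, that would be 17 × 256 = 4352 blocks × 64 bytes = 278,528 bytes (272 KiB) per coverage map, which is the kind of growth a sparse multi-level layout avoids. The sketch below shows only the packed leaf; the enum and function names are hypothetical and are not Pango's API.

```c
#include <stdint.h>
#include <string.h>

/* Four levels, mirroring the order of PangoCoverageLevel's values. */
typedef enum {
  COV_NONE = 0,
  COV_FALLBACK = 1,
  COV_APPROXIMATE = 2,
  COV_EXACT = 3
} CovLevel;

/* 256 code points x 2 bits = 64 bytes per leaf block. */
typedef struct { uint8_t bits[64]; } CovBlock;

void cov_block_clear(CovBlock *b) { memset(b->bits, 0, sizeof b->bits); }

/* Four entries per byte; entry k of a byte lives at bits 2k..2k+1. */
CovLevel cov_block_get(const CovBlock *b, uint8_t offset) {
  return (CovLevel) ((b->bits[offset >> 2] >> ((offset & 3) * 2)) & 3);
}

void cov_block_set(CovBlock *b, uint8_t offset, CovLevel level) {
  unsigned shift = (offset & 3) * 2u;
  unsigned byte = b->bits[offset >> 2];
  byte = (byte & ~(3u << shift)) | ((unsigned) level << shift);
  b->bits[offset >> 2] = (uint8_t) byte;
}
```

Here `offset` is the low byte of the code point (cp & 0xFF); the upper levels of a table would pick which block to use.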
2003-11-18  Noah Levitt  <nlevitt@columbia.edu>

	* pango/pangoxft-font.c (pango_xft_real_render): Draw 6-digit
	hex boxes for > U+FFFF. (#101081)
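The digit layout for the hex box can be sketched as follows: four digits for BMP characters and six digits above U+FFFF, since U+10FFFF needs six. The split into two rows (2 columns for BMP, 3 columns above) is an assumption made for illustration, as is the function name; the actual drawing code in pango_xft_real_render is not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills rows[0] and rows[1] with the hex digits of cp,
 * split evenly into two rows. Returns the number of columns (2 or 3),
 * or -1 if cp is outside the Unicode range. */
int hex_box_layout(uint32_t cp, char rows[2][4]) {
  char digits[7];
  int cols;
  if (cp > 0x10FFFF) return -1;  /* not a Unicode code point */
  if (cp > 0xFFFF) {
    cols = 3;  /* six digits, e.g. 020000 */
    snprintf(digits, sizeof digits, "%06X", (unsigned) cp);
  } else {
    cols = 2;  /* four digits, e.g. 4E00 */
    snprintf(digits, sizeof digits, "%04X", (unsigned) cp);
  }
  for (int r = 0; r < 2; r++) {
    memcpy(rows[r], digits + r * cols, (size_t) cols);
    rows[r][cols] = '\0';
  }
  return cols;
}
```

For U+20000 this yields the rows "020" and "000"; for U+4E00 it yields "4E" and "00".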
*** Bug 140570 has been marked as a duplicate of this bug. ***