Bug 91542 - Make some characters neutral for shaper selection
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: future
Assigned To: pango-maint
QA Contact: pango-maint
Depends on:
Blocks: 112503 118302
 
 
Reported: 2002-08-23 19:00 UTC by Owen Taylor
Modified: 2004-12-22 21:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
pango-script.c (9.65 KB, text/plain), 2002-11-18 02:25 UTC, Owen Taylor
pango-script.h (3.92 KB, text/plain), 2002-11-18 02:25 UTC, Owen Taylor
testscript.c (7.01 KB, text/plain), 2002-11-18 02:27 UTC, Owen Taylor
gen-script-table.pl (1.55 KB, text/plain), 2002-11-18 02:29 UTC, Owen Taylor

Description Owen Taylor 2002-08-23 19:00:49 UTC
Support needs to be added for identifying characters as
"neutral" with respect to the choice of language engine.
Currently a block of, say, Arabic text is split into
one-word Arabic runs, with intervening one-character runs
for the Basic shaper for each space character. This is, as
might be imagined, a fairly major performance problem.
Comment 2 Owen Taylor 2002-11-17 18:33:48 UTC
The code in ICU Eric was referring to is:

http://oss.software.ibm.com/cvs/icu/icu/source/extra/scrptrun/

Looks pretty simple given a function to compute UTR #24 script of
a given character.
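
A minimal sketch of the idea, not the ICU or Pango code: neutral (COMMON) characters join whichever run they fall in, so a space between two Arabic words no longer starts a new run. The names `toy_script`, `ScriptRun` and `script_runs` are hypothetical, the script function covers only ASCII letters and a slice of the Arabic block, and the paired-punctuation handling that the ICU code has is left out.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a UTR #24 script lookup. */
typedef enum { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_ARABIC } Script;

static Script toy_script(unsigned int ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
        return SCRIPT_LATIN;
    if (ch >= 0x0621 && ch <= 0x064A)   /* basic Arabic letters; U+0640 is treated as Arabic here for brevity */
        return SCRIPT_ARABIC;
    return SCRIPT_COMMON;               /* spaces, digits, punctuation, ... */
}

typedef struct {
    size_t start;   /* index of first character in the run */
    size_t end;     /* index one past the last character */
    Script script;  /* resolved script, COMMON if the run has no real script */
} ScriptRun;

/* Split text[0..len) into script runs. Returns the number of runs
 * written to runs[], at most max_runs. */
static size_t script_runs(const unsigned int *text, size_t len,
                          ScriptRun *runs, size_t max_runs)
{
    size_t n = 0, i = 0;

    while (i < len && n < max_runs) {
        size_t start = i;
        Script run_script = SCRIPT_COMMON;

        for (; i < len; i++) {
            Script s = toy_script(text[i]);
            if (s == SCRIPT_COMMON)
                continue;               /* neutral: stays in the current run */
            if (run_script == SCRIPT_COMMON)
                run_script = s;         /* first real script resolves the run */
            else if (s != run_script)
                break;                  /* a different real script ends the run */
        }

        runs[n].start = start;
        runs[n].end = i;
        runs[n].script = run_script;
        n++;
    }
    return n;
}
```

With this sketch, two Arabic words separated by a space come back as a single Arabic run, while Latin text followed by Arabic text splits into two runs, with the space attached to the preceding Latin run.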
Comment 3 Owen Taylor 2002-11-17 18:41:08 UTC
Source for script information is:

 http://www.unicode.org/Public/UNIDATA/Scripts.txt
Comment 4 Owen Taylor 2002-11-17 20:21:29 UTC
Appears that the ICU link above is a C++ prototype; there
is a C implementation:

 http://oss.software.ibm.com/cvs/icu/icu/source/common/usc_impl.c
 http://oss.software.ibm.com/cvs/icu/icu/source/common/usc_impl.h

That appears to be the current code.
Comment 5 Owen Taylor 2002-11-18 02:25:38 UTC
Created attachment 12359
pango-script.c
Comment 6 Owen Taylor 2002-11-18 02:25:49 UTC
Created attachment 12360
pango-script.h
Comment 7 Owen Taylor 2002-11-18 02:27:10 UTC
Created attachment 12361
testscript.c
Comment 8 Owen Taylor 2002-11-18 02:29:27 UTC
Created attachment 12362
gen-script-table.pl
Comment 9 Owen Taylor 2002-11-18 02:32:37 UTC
Attached port of the ICU algorithm to Pango, along with
code for looking up the script assignments.

(At least the script assignments should eventually go into
GLib, maybe the iterator too, so it probably makes sense to
protect this stuff with PANGO_ENABLE_ENGINE, to avoid
it being generally relied upon.)

Now just need to figure out how to hook it up to the engines.
I think it makes most sense to treat each engine as handling
some set of scripts. The problem is that COMMON and INHERITED
characters will then reach engines that didn't have to handle
them before, so all the engines will need to be audited in
this regard.
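
For illustration, looking up a script assignment could be a binary search over a sorted table of code point ranges, which is the kind of table a generator script would produce from Scripts.txt. This is a sketch under assumptions, not the attached pango-script.c: `ScriptRange`, `script_table` and `lookup_script` are hypothetical names, the table is a hand-picked handful of ranges rather than real generated data, and treating unlisted code points as COMMON is a choice made for this sketch.

```c
#include <stddef.h>

typedef enum { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_GREEK, SCRIPT_ARABIC } Script;

typedef struct {
    unsigned int first;   /* first code point of the range, inclusive */
    unsigned int last;    /* last code point of the range, inclusive */
    Script script;
} ScriptRange;

/* Illustrative excerpt only. Must be sorted by 'first' and non-overlapping. */
static const ScriptRange script_table[] = {
    { 0x0041, 0x005A, SCRIPT_LATIN  },  /* A..Z */
    { 0x0061, 0x007A, SCRIPT_LATIN  },  /* a..z */
    { 0x0391, 0x03A1, SCRIPT_GREEK  },  /* Alpha..Rho */
    { 0x0621, 0x063A, SCRIPT_ARABIC },
    { 0x0641, 0x064A, SCRIPT_ARABIC },
};

/* Binary search for the range containing ch. Code points not covered
 * by any range are reported as COMMON (an assumption of this sketch). */
static Script lookup_script(unsigned int ch)
{
    size_t lo = 0;
    size_t hi = sizeof script_table / sizeof script_table[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < script_table[mid].first)
            hi = mid;
        else if (ch > script_table[mid].last)
            lo = mid + 1;
        else
            return script_table[mid].script;
    }
    return SCRIPT_COMMON;
}
```

A range table keeps the data small compared with a per-character array, at the cost of a logarithmic search per lookup.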
Comment 10 Owen Taylor 2003-08-03 22:00:06 UTC
I've checked the script-range detection code into CVS now;
I'm still working on figuring out how to use it.
Comment 11 Owen Taylor 2003-09-23 23:58:14 UTC
OK, a complete rewrite of itemization is now in CVS. The algorithm
is more or less:

 - Correct language tags for rendering characters based on
   script information. (If Arabic script text is tagged as
   'en' change the language tag to 'ar')

 - Pick fonts for rendering characters based on corrected
   language and font description.

 - Pick fonts for non-rendering characters using the font
   for adjacent rendering characters.

Seems to work reasonably well, though I'm sure we'll discover
some additional problems that need to be fixed up.
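
As a rough illustration of the third step above (not the code that went into CVS), the font for each non-rendering character can be filled in from its neighbours in two passes. Everything here is an assumption made for the sketch: fonts are represented as plain integer ids, `NO_FONT` marks a non-rendering character, and the preference for the preceding neighbour over the following one is arbitrary.

```c
#include <stddef.h>

#define NO_FONT (-1)   /* hypothetical marker: no font chosen yet */

/* fonts[i] holds a font id for character i, or NO_FONT for a
 * non-rendering character. After the call, every NO_FONT entry has
 * taken the font of the nearest preceding rendering character, or of
 * the following one if nothing precedes it. If no character has a
 * font at all, the array is left unchanged. */
static void fill_non_rendering_fonts(int *fonts, size_t len)
{
    size_t i;
    int last = NO_FONT;

    /* Forward pass: inherit from the preceding rendering character. */
    for (i = 0; i < len; i++) {
        if (fonts[i] != NO_FONT)
            last = fonts[i];
        else
            fonts[i] = last;
    }

    /* Backward pass: leading entries inherit from what follows. */
    last = NO_FONT;
    for (i = len; i-- > 0; ) {
        if (fonts[i] != NO_FONT)
            last = fonts[i];
        else
            fonts[i] = last;
    }
}
```

Giving such characters the font of their neighbours means they do not force a run boundary of their own, which is the point of the original report.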