GNOME Bugzilla – Bug 348348
Add a way to get the script name of a gunichar
Last modified: 2007-08-21 02:57:06 UTC
Currently in glib there aren't functions to get the script name of a gunichar. I propose to add something like:

    typedef enum {
      G_UNICODE_SCRIPT_ARABIC,
      G_UNICODE_SCRIPT_ARMENIAN,
      ...
      G_UNICODE_SCRIPT_UGARITIC
    } GUnicodeScript;

    /* returns the script of c */
    GUnicodeScript g_unichar_get_script(gunichar c);

    /* returns the (translated?) name of the script */
    const gchar *g_unichar_get_script_name(GUnicodeScript script);

The code can be copied from libgunichar. The relevant files are gucharmap/gucharmap-script-codepoint-list.c, gucharmap/unicode-scripts.h and gucharmap/gen-guch-unicode-tables.pl (it generates unicode-scripts.h).
I suggest we don't do that just yet. I've been updating our stack to Unicode 5.0, and you don't know how much work it is: updating tables in fribidi, glib (two separate scripts), pango, gucharmap... This is a mess we need to fix, and while giulia is not quite a project just yet, the Unicode Character Database wrapper behnam and I have been thinking about can be put together rather rapidly. I suggest for now you just copy whatever you need internally. The code in Pango is way faster, btw. We optimized it by using a direct-lookup table for the first 8192 chars, and by remembering the range in the binary search for the rest...
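For readers unfamiliar with the optimization mentioned above, here is a minimal, self-contained sketch of the idea: a direct-lookup table for code points below 8192, plus a binary search over sorted ranges that starts from the last range that matched. This is not the actual Pango code. The `Script` enum, the tiny `ranges` table, and the names `script_for_char`, `easy_table` and `saved_mid` are all hypothetical, and the table covers only a handful of characters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily abbreviated script enum (illustration only). */
typedef enum {
  SCRIPT_COMMON,
  SCRIPT_LATIN,
  SCRIPT_GREEK,
  SCRIPT_GOTHIC,
  SCRIPT_UGARITIC,
  SCRIPT_DESERET,
  SCRIPT_UNKNOWN
} Script;

#define EASY_MAX 0x2000 /* 8192: code points below this use direct lookup */

typedef struct {
  uint32_t start;
  uint32_t end; /* inclusive */
  Script script;
} ScriptRange;

/* Toy stand-in for a generated table: sorted, non-overlapping ranges. */
static const ScriptRange ranges[] = {
  { 0x10330, 0x1034A, SCRIPT_GOTHIC },
  { 0x10380, 0x1039D, SCRIPT_UGARITIC },
  { 0x10400, 0x1044F, SCRIPT_DESERET },
};
#define N_RANGES ((int) (sizeof ranges / sizeof ranges[0]))

/* A real table would be generated; here it is filled lazily, and every
 * entry that is not filled stays SCRIPT_COMMON (a simplification). */
static uint8_t easy_table[EASY_MAX];
static int easy_table_ready = 0;

static void
fill (uint32_t first, uint32_t last, Script script)
{
  for (uint32_t ch = first; ch <= last; ch++)
    easy_table[ch] = (uint8_t) script;
}

static void
init_easy_table (void)
{
  fill ('A', 'Z', SCRIPT_LATIN);
  fill ('a', 'z', SCRIPT_LATIN);
  fill (0x03B1, 0x03C9, SCRIPT_GREEK);
  easy_table_ready = 1;
}

Script
script_for_char (uint32_t ch)
{
  /* Index of the last range that matched; the next search starts here,
   * which pays off when consecutive lookups hit the same script. */
  static int saved_mid = N_RANGES / 2;
  int lower = 0, upper = N_RANGES - 1, mid = saved_mid;

  if (ch < EASY_MAX)
    {
      if (!easy_table_ready)
        init_easy_table ();
      return (Script) easy_table[ch];
    }

  do
    {
      assert (mid >= 0 && mid < N_RANGES);
      if (ch < ranges[mid].start)
        upper = mid - 1;
      else if (ch > ranges[mid].end)
        lower = mid + 1;
      else
        {
          saved_mid = mid;
          return ranges[mid].script;
        }
      mid = (lower + upper) / 2;
    }
  while (lower <= upper);

  return SCRIPT_UNKNOWN;
}
```

Calling `script_for_char (0x10380)` repeatedly finds the Ugaritic range on the first probe after the initial hit, because `saved_mid` already points at it. Note that the static cache makes this sketch non-reentrant.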
I'm not sure if it's clear: EggRegex is proposed as an addition to GLib, so adding it "internally" doesn't remove the need to add it to GLib :-) I don't see much harm in having g_unichar_get_script() public; it's a pretty well-defined addition, but it could of course be added as _g_unichar_get_script(), if we think that the public interface to Unicode properties should be YAL.
So, the proposal is to move PangoScript and pango_script_for_unichar() to GUnicodeScript and g_unichar_get_script()? I think I can live with it if there are patches to cleanly move all the tables and generators to glib, and patches for pango to use the glib variants. It will mean we have two names for the same type, though. I don't think PangoScriptIter should be moved.
Created attachment 74319: the glib patch
Created attachment 74320: pango patch
GLib part committed, moving to Pango.

2006-10-08  Matthias Clasen  <mclasen@redhat.com>

        Add a way to obtain Unicode script information.
        (#348348, Marco Barisione)

        * glib/glib.symbols:
        * glib/gunicode.h: Add GUnicodeScript enumeration and
        g_unichar_get_script.
        * glib/guniprop.c: Implement g_unichar_get_script.
        * glib/gscripttable.h: Generated private header containing
        script tables.
        * glib/gen-script-table.pl: Script to generate gscripttable.h.
        * glib/Makefile.am: Update
Looks good. Thanks.
Is there a devel glib release out with the new API?
Not yet, but I expect to get to that in the next few days, ideally Friday
Actually, I was wrong, this api is in 2.13.0, which was released mid-March
So what should I do to the PangoScript enum?

  * Deprecate it and don't update it anymore.
  * Deprecate it and update it to stay in sync with GUnicodeScript.
  * Typedef it to GUnicodeScript, #define the current values, and don't update it anymore.
  * Don't deprecate it, but document that it's interchangeable with GUnicodeScript.

The reason for not deprecating is that I don't see us changing PangoScript in the public API to GUnicodeScript, though it appears in surprisingly few APIs. In fact, the only APIs not inside PANGO_ENABLE_ENGINE or PANGO_ENABLE_BACKEND are:

  * pango_gravity_get_for_script(). New in 1.16. Probably safe to change.
  * pango_script_iter_get_range().
  * pango_script_get_sample_language(). At some point the language stuff needs to move to glib too. Should we go on and do that now?
  * pango_language_includes_script(). Same as above.
Owen, do you have any thoughts on how to deprecate PangoScript and introduce GUnicodeScript?
The resolution was to not touch any API, and just document that PangoScript is equivalent to GUnicodeScript and that pango_script_for_unichar() returns g_unichar_get_script().

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        * glib/tmpl/unicode.sgml: Document that GUnicodeScript is
        interchangeable with PangoScript.

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        * glib/guniprop.c: Document that g_unichar_get_script() is
        equivalent to pango_script_for_unichar().

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        Bug 348348 – Add a way to get the script name of a gunichar

        * configure.in: Require glib 2.14, for GUnicodeScript stuff.

        * docs/tmpl/scripts.sgml: Document that #PangoScript is
        interchangeable with GUnicodeScript.

        * pango/pango-script.c (pango_script_for_unichar): Use
        g_unichar_get_script(), and document it.

        * tools/Makefile.am:
        * tools/gen-script-table.pl:
        * pango/Makefile.am:
        * pango/pango-script-table.h:
        Remove pango-script-table.h and its generator.

        * pango/pango-gravity.c (get_script_properties):
        * pango/pango-language.c (pango_script_get_sample_language):
        * pango/pango-ot-tag.c (pango_ot_tag_from_script):
        Protect against unexpected script values.
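As an illustration of what protecting against unexpected script values could look like: once script values come from another library, a newer version of that library may return values that are larger than any table compiled into the consumer. The sketch below range-checks the value before indexing a per-script table. It is not the committed Pango code. The enum, the `sample_languages` table, and the function name `sample_language_for_script` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, abbreviated script enum known to this build. */
typedef enum {
  SCRIPT_INVALID_CODE = -1,
  SCRIPT_COMMON,
  SCRIPT_ARABIC,
  SCRIPT_ARMENIAN,
  N_KNOWN_SCRIPTS /* number of scripts this build has table entries for */
} Script;

/* Toy per-script table, indexed by script value. NULL means the script
 * has no sample language in this sketch. */
static const char *const sample_languages[] = {
  NULL, /* SCRIPT_COMMON */
  "ar", /* SCRIPT_ARABIC */
  "hy", /* SCRIPT_ARMENIAN */
};

const char *
sample_language_for_script (int script)
{
  /* The table must have one entry per script known to this build. */
  assert ((int) (sizeof sample_languages / sizeof sample_languages[0])
          == N_KNOWN_SCRIPTS);

  /* Values outside the table (negative, or added by a newer Unicode
   * version in the providing library) are rejected, not used as an index. */
  if (script < 0 || script >= N_KNOWN_SCRIPTS)
    return NULL;

  return sample_languages[script];
}
```

The function takes a plain `int` rather than the enum so that out-of-range values can be passed and checked without relying on how the compiler represents the enum type.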
So, no deprecation whatsoever. PangoScript is fine to use.