Bug 348348 - Add a way to get the script name of a gunichar
Status: RESOLVED FIXED
Product: pango
Classification: Platform
Component: general
Version: unspecified
Hardware: Other All
Importance: Urgent blocker
Target Milestone: ---
Assigned To: Behdad Esfahbod
QA Contact: pango-maint
Depends on:
Blocks: gregex
Reported: 2006-07-22 17:35 UTC by Marco Barisione
Modified: 2007-08-21 02:57 UTC
See Also:
GNOME target: 2.20.x
GNOME version: 2.19/2.20


Attachments
the glib patch (277.52 KB, patch)
2006-10-09 03:08 UTC, Matthias Clasen
committed
pango patch (2.75 KB, patch)
2006-10-09 03:21 UTC, Matthias Clasen
rejected

Description Marco Barisione 2006-07-22 17:35:54 UTC
Currently glib has no function to get the script name of a gunichar. I propose adding something like:

typedef enum
{
  G_UNICODE_SCRIPT_ARABIC,
  G_UNICODE_SCRIPT_ARMENIAN,
  ...
  G_UNICODE_SCRIPT_UGARITIC
} GUnicodeScript;

/* returns the script of c */
GUnicodeScript g_unichar_get_script(gunichar c);

/* returns the (translated?) name of the script */
const gchar *g_unichar_get_script_name(GUnicodeScript script);

The code can be copied from libgunichar; the relevant files are gucharmap/gucharmap-script-codepoint-list.c, gucharmap/unicode-scripts.h and gucharmap/gen-guch-unicode-tables.pl (which generates unicode-scripts.h).
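
For illustration only, the second proposed function (mapping a script value to its name) could look roughly like the sketch below. The ToyScript enum, its three members and toy_script_name() are hypothetical stand-ins invented for this example; they are not the proposed GUnicodeScript, not the gucharmap code, and the question of translating the names is left out entirely.

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed enum; only three scripts. */
typedef enum
{
  TOY_SCRIPT_ARABIC,
  TOY_SCRIPT_ARMENIAN,
  TOY_SCRIPT_UGARITIC
} ToyScript;

/* Returns the untranslated English name of the script, or NULL if the
 * value is not one of the enum members listed above. */
const char *
toy_script_name (ToyScript script)
{
  switch (script)
    {
    case TOY_SCRIPT_ARABIC:
      return "Arabic";
    case TOY_SCRIPT_ARMENIAN:
      return "Armenian";
    case TOY_SCRIPT_UGARITIC:
      return "Ugaritic";
    default:
      return NULL;
    }
}
```

A caller would have to handle the NULL case, since a value outside the enum can still arrive through a cast.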
Comment 1 Behdad Esfahbod 2006-07-22 19:27:51 UTC
I suggest we don't do that just yet.  I've been updating our stack to Unicode 5.0, and you don't know how much work it is.  Updating tables in fribidi, glib (two separate scripts), pango, gucharmap...

This is a mess we need to fix, and while giulia is not quite a project just yet, the Unicode Character Database wrapper behnam and I have been thinking about can be put together rather rapidly.  I suggest for now you just copy whatever you need internally.

The code in Pango is way faster btw.  We optimized it by having a direct-lookup table for the first 8192 chars, and by remembering the last matched range in the binary search for the rest...
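
A minimal sketch of the two optimizations described above, under stated assumptions: every name here (ToyScript, ToyRange, toy_script_for_char, TOY_EASY_LIMIT) is hypothetical, the cutoff is 128 rather than 8192 to keep the example small, and the range data is coarse sample data rather than real Unicode script assignments. It is not the Pango implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not the Pango or GLib definitions. */
typedef enum { TOY_COMMON, TOY_LATIN, TOY_GREEK, TOY_HAN } ToyScript;
typedef struct { uint32_t start, end; ToyScript script; } ToyRange;

/* The comment above mentions 8192; 128 keeps the toy small. */
#define TOY_EASY_LIMIT 128u

/* Sorted, non-overlapping ranges for code points >= TOY_EASY_LIMIT.
 * Deliberately coarse sample data. */
static const ToyRange toy_ranges[] = {
  { 0x00C0, 0x024F, TOY_LATIN },
  { 0x0370, 0x03FF, TOY_GREEK },
  { 0x4E00, 0x9FFF, TOY_HAN },
};
#define TOY_N_RANGES (sizeof toy_ranges / sizeof toy_ranges[0])

/* Direct-lookup table for the low code points, filled on first use. */
static unsigned char toy_easy_table[TOY_EASY_LIMIT];
static int toy_easy_ready = 0;

static void
toy_init_easy_table (void)
{
  uint32_t c;
  for (c = 0; c < TOY_EASY_LIMIT; c++)
    {
      int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
      toy_easy_table[c] = is_letter ? TOY_LATIN : TOY_COMMON;
    }
  toy_easy_ready = 1;
}

ToyScript
toy_script_for_char (uint32_t c)
{
  /* Index of the range that matched last time; the search starts there. */
  static size_t saved_mid = 0;
  size_t lower = 0, upper = TOY_N_RANGES; /* half-open interval */
  size_t mid = saved_mid;

  if (c < TOY_EASY_LIMIT)
    {
      if (!toy_easy_ready)
        toy_init_easy_table ();
      return (ToyScript) toy_easy_table[c];
    }

  while (lower < upper)
    {
      const ToyRange *r = &toy_ranges[mid];
      if (c < r->start)
        upper = mid;
      else if (c > r->end)
        lower = mid + 1;
      else
        {
          saved_mid = mid; /* remember the hit for the next call */
          return r->script;
        }
      mid = lower + (upper - lower) / 2;
    }
  return TOY_COMMON; /* not covered by any range */
}
```

Because consecutive characters in running text usually share a script, starting the search at the remembered range tends to hit on the first probe. The static state makes this sketch unsafe to call from several threads at once.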
Comment 2 Owen Taylor 2006-07-24 16:38:25 UTC
I'm not sure if it's clear; EggRegex is proposed as an addition to GLib,
so adding it "internally" doesn't prevent the need to add it to GLib :-)

I don't see much harm with having g_unichar_get_script() public; it's
a pretty well-defined addition, but it could of course be added as
_g_unichar_get_script(), if we think that the public interface to 
Unicode properties should be YAL.
Comment 3 Behdad Esfahbod 2006-10-08 16:24:38 UTC
So, the proposal is to move PangoScript and pango_script_for_unichar() to GUnicodeScript and g_unichar_get_script()?  I think I can live with it if there are patches to cleanly move all the tables and generators to glib and patches for pango to use the glib variants.  It will mean we have two names for the same type though.

I don't think PangoScriptIter should be moved.
Comment 4 Matthias Clasen 2006-10-09 03:08:50 UTC
Created attachment 74319
the glib patch
Comment 5 Matthias Clasen 2006-10-09 03:21:03 UTC
Created attachment 74320
pango patch
Comment 6 Matthias Clasen 2006-10-09 04:25:38 UTC
GLib part committed, moving to Pango.


2006-10-08  Matthias Clasen  <mclasen@redhat.com>

        Add a way to obtain Unicode script information.  (#348348,
        Marco Barisione)

        * glib/glib.symbols:
        * glib/gunicode.h: Add GUnicodeScript enumeration and
        g_unichar_get_script.

        * glib/guniprop.c: Implement g_unichar_get_script.

        * glib/gscripttable.h: Generated private header containing
        script tables.

        * glib/gen-script-table.pl: Script to generate gscripttable.h.

        * glib/Makefile.am: Update
Comment 7 Behdad Esfahbod 2006-10-09 18:49:07 UTC
Looks good.  Thanks.
Comment 8 Behdad Esfahbod 2007-05-03 01:48:09 UTC
Is there a devel glib release out with the new API?
Comment 9 Matthias Clasen 2007-05-03 01:50:05 UTC
Not yet, but I expect to get to that in the next few days, ideally Friday.
Comment 10 Matthias Clasen 2007-05-03 02:20:34 UTC
Actually, I was wrong: this API is in 2.13.0, which was released in mid-March.
Comment 11 Behdad Esfahbod 2007-05-03 23:01:00 UTC
So what should I do to PangoScript enum?

  * Deprecate that and don't update it anymore.

  * Deprecate and update to sync with GUnicodeScript.

  * Typedef it to GUnicodeScript and #define the current values and don't update anymore.

  * Don't deprecate but document that it's interchangeable with GUnicodeScript.

The reason for not deprecating is that I don't see us changing PangoScript in the public API to GUnicodeScript.  Though it exists in surprisingly few APIs.  In fact, the only APIs not inside PANGO_ENABLE_ENGINE or PANGO_ENABLE_BACKEND are:

  pango_gravity_get_for_script().  New in 1.16.  Probably safe to change.
  pango_script_iter_get_range().
  pango_script_get_sample_language().  At some point the language stuff needs to move to glib too.  Should we go on and do that now?
  pango_language_includes_script().  Same as above.
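
To make the third and fourth options above concrete, here is a rough sketch with toy names. ToyUnicodeScript stands in for GUnicodeScript, and ToyOldScript and ToyKeptScript stand in for the two possible fates of the older enum; all of these names and the two helper functions are hypothetical, and the compile-time check assumes a C11 compiler.

```c
/* Stand-in for the newer enum; hypothetical toy names throughout. */
typedef enum
{
  TOY_UNICODE_SCRIPT_COMMON,
  TOY_UNICODE_SCRIPT_ARABIC,
  TOY_UNICODE_SCRIPT_LATIN
} ToyUnicodeScript;

/* Third option: the old type becomes an alias of the new one and the
 * old value names become macros, so existing callers still compile. */
typedef ToyUnicodeScript ToyOldScript;
#define TOY_OLD_SCRIPT_COMMON TOY_UNICODE_SCRIPT_COMMON
#define TOY_OLD_SCRIPT_ARABIC TOY_UNICODE_SCRIPT_ARABIC
#define TOY_OLD_SCRIPT_LATIN  TOY_UNICODE_SCRIPT_LATIN

/* Fourth option: keep a separate enum and document that its values are
 * interchangeable with the new one. */
typedef enum
{
  TOY_KEPT_SCRIPT_COMMON,
  TOY_KEPT_SCRIPT_ARABIC,
  TOY_KEPT_SCRIPT_LATIN
} ToyKeptScript;

/* Compile-time check (C11) that the two enums really agree. */
_Static_assert ((int) TOY_KEPT_SCRIPT_LATIN == (int) TOY_UNICODE_SCRIPT_LATIN,
                "kept enum drifted from the new enum");

/* Old-style API written against the alias accepts new-style values. */
int
toy_is_arabic (ToyOldScript script)
{
  return script == TOY_OLD_SCRIPT_ARABIC;
}

/* With interchangeable enums, conversion is a plain cast. */
ToyKeptScript
toy_kept_from_unicode (ToyUnicodeScript script)
{
  return (ToyKeptScript) script;
}
```

The alias approach leaves a single type behind the old name, while the separate-enum approach relies on the two value lists being kept identical, which is what the static assertion guards in this sketch.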
Comment 12 Behdad Esfahbod 2007-07-30 17:43:05 UTC
Owen, do you have any thoughts on how to deprecate PangoScript and introduce GUnicodeScript?
Comment 13 Behdad Esfahbod 2007-08-21 02:56:25 UTC
The resolution was not to touch any API: just document that PangoScript is equivalent to GUnicodeScript and that pango_script_for_unichar() returns the result of g_unichar_get_script().

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        * glib/tmpl/unicode.sgml: Document that GUnicodeScript is
        interchangeable with PangoScript.

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        * glib/guniprop.c: Document that g_unichar_get_script() is
        equivalent to pango_script_for_unichar().

2007-08-20  Behdad Esfahbod  <behdad@gnome.org>

        Bug 348348 – Add a way to get the script name of a gunichar

        * configure.in: Require glib 2.14, for GUnicodeScript stuff.

        * docs/tmpl/scripts.sgml: Document that #PangoScript is
        interchangeable with GUnicodeScript.

        * pango/pango-script.c (pango_script_for_unichar): Use
        g_unichar_get_script(), and document it.

        * tools/Makefile.am:
        * tools/gen-script-table.pl:
        * pango/Makefile.am:
        * pango/pango-script-table.h:
        Remove pango-script-table.h and its generator.

        * pango/pango-gravity.c (get_script_properties):
        * pango/pango-language.c (pango_script_get_sample_language):
        * pango/pango-ot-tag.c (pango_ot_tag_from_script):
        Protect against unexpected script values.
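
As a rough illustration of what protecting against unexpected script values can mean: once the script value comes from another library, it may lie outside the range a local per-script table was built for, so the index needs checking before use. The sketch below assumes that reading of the entry above; the types, table contents and toy_get_script_properties() are hypothetical and are not the Pango code.

```c
#include <stddef.h>

/* Hypothetical toy types; not the Pango definitions. */
typedef enum { TOY_SCRIPT_COMMON, TOY_SCRIPT_ARABIC, TOY_SCRIPT_LATIN } ToyScript;
typedef struct { int is_right_to_left; } ToyScriptProperties;

/* Per-script table, indexed by ToyScript value. */
static const ToyScriptProperties toy_properties[] = {
  { 0 }, /* TOY_SCRIPT_COMMON */
  { 1 }, /* TOY_SCRIPT_ARABIC */
  { 0 }, /* TOY_SCRIPT_LATIN */
};
#define TOY_N_PROPERTIES (sizeof toy_properties / sizeof toy_properties[0])

/* Takes an int because the value may come from a newer library that
 * knows scripts this table does not; those fall back to COMMON. */
ToyScriptProperties
toy_get_script_properties (int script)
{
  if (script < 0 || (size_t) script >= TOY_N_PROPERTIES)
    return toy_properties[TOY_SCRIPT_COMMON];
  return toy_properties[script];
}
```

Falling back to a neutral default is only one possible policy; a caller might prefer to report the unknown value instead.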

Comment 14 Behdad Esfahbod 2007-08-21 02:57:06 UTC
So, no deprecation whatsoever.  PangoScript is fine to use.