Bug 105626 – Add g_unichar_iswide_cjk()

Summary:	Add g_unichar_iswide_cjk()


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	2.2.x
Hardware:	Other Linux

Importance:	High enhancement
Target Milestone:	---
Assigned To:	Behdad Esfahbod
QA Contact:	gtkdev

URL:
Whiteboard:

Duplicates:	338305 (view as bug list)
Depends on:
Blocks:

Reported:	2003-02-09 08:44 UTC by zlb
Modified:	2011-02-18 16:13 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch to implement (slow) g_unichar_is_ambiguously_wide (14.10 KB, patch) 2003-06-03 02:34 UTC, Nalin Dahyabhai
Implement g_unichar_iswide_cjk() (6.33 KB, patch) 2006-04-27 07:55 UTC, Behdad Esfahbod

Description zlb 2003-02-09 08:44:22 UTC

The function g_unichar_iswide() returns an incorrect value
for some GBK (Simplified Chinese) punctuation characters.
An example of such a punctuation character is the
Unicode character 0x201c (whose GBK code is 0xa1b0).

The bug causes incorrect cursor positioning in
gnome-terminal on lines containing such characters.

(Sorry, I forgot whether I have already reported this bug.
I only remember that I reported it to
bugzilla.redhat.com.)

Comment 1 zlb 2003-02-10 06:17:40 UTC

Sorry, I made a mistake in my report.
The GBK code of the character that triggers
the bug should be 0xb0a1 (i.e., 0xa1 0xb0).

Comment 2 Owen Taylor 2003-02-12 22:41:24 UTC

Note that a number of Unicode characters have _ambiguous_
width. 

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

(a newer version of what g_unichar_iswide() is based upon)
has both wcwidth() and wcwidth_cjk().

I believe current gnome-terminal has its own wcwidth function
and does tricks to base the width of these ambiguous characters
on the source encoding.

Comment 3 zlb 2003-02-13 00:34:22 UTC

I'm using the version of gnome-terminal from RH8.0.
It is based on vte and the latter uses g_unichar_iswide()
to calculate cursor position.

I don't know much about Unicode, but I think
g_unichar_iswide() should take into account the
current locale. My workaround for gnome-terminal
is to replace the function g_unichar_iswide() with
a simple function (in vte/src/vte.c) which calls
iconv() to calculate the width. It works well and
the only drawback is the speed.

Comment 4 Owen Taylor 2003-06-02 21:39:54 UTC

Basing it off the locale wouldn't give the right results
for gnome-terminal or other applications. Imagine a user
running in an English locale, but:

 Going to a Chinese web page
 Reading a Chinese email

Or vice-versa.

Comment 5 Nalin Dahyabhai 2003-06-03 02:32:56 UTC

GNU libc's wcwidth() implementation sets the widths of characters on a
per-encoding basis, and assumes that the current codeset provides the
correct widths.

For example, in the ja_JP.UTF-8 locale, glibc treats ambiguous-width
characters as single-width, but in ja_JP.eucJP, it treats them as
double-width.

Something along the lines of a g_unichar_is_ambiguously_wide()
function would probably be more useful because it would allow
applications to select the right value for these cases, including
gnome-terminal which frequently deals with data which is encoded in a
non-default encoding.
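
As a rough illustration of the idea in this comment, an application could combine a "definitely wide" predicate with an "ambiguously wide" predicate and resolve the ambiguous case from what it knows about the data's source encoding. This is only a sketch: demo_is_wide(), demo_is_ambiguously_wide() and demo_cell_width() are hypothetical names, not GLib API, and the two predicates are deliberately tiny stand-ins rather than real Unicode tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "wide" predicate: only covers the
 * CJK Unified Ideographs block (U+4E00..U+9FFF). */
bool demo_is_wide(uint32_t c)
{
    return c >= 0x4E00 && c <= 0x9FFF;
}

/* Hypothetical stand-in for an "ambiguous width" predicate: only
 * covers the two double quotation marks discussed in this report. */
bool demo_is_ambiguously_wide(uint32_t c)
{
    return c == 0x201C || c == 0x201D;
}

/* Number of terminal cells for c. The caller, not the library,
 * decides how ambiguous characters are treated, based on whether
 * the data came from a legacy CJK encoding. */
int demo_cell_width(uint32_t c, bool source_is_legacy_cjk)
{
    assert(c <= 0x10FFFF);  /* must be a valid code point */

    if (demo_is_wide(c))
        return 2;
    if (demo_is_ambiguously_wide(c))
        return source_is_legacy_cjk ? 2 : 1;
    return 1;
}
```

With this split, a terminal displaying GBK-encoded output could pass true for the second argument even when the user's locale is English, which is the case that a purely locale-based answer would get wrong.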

Comment 6 Nalin Dahyabhai 2003-06-03 02:34:06 UTC

Created attachment 17082
patch to implement (slow) g_unichar_is_ambiguously_wide

Comment 7 Behdad Esfahbod 2006-04-19 03:07:07 UTC

*** Bug 338305 has been marked as a duplicate of this bug. ***

Comment 8 Behdad Esfahbod 2006-04-27 07:55:57 UTC

Created attachment 64371
Implement g_unichar_iswide_cjk()

Copying table from Markus Kuhn and using bsearch(3).

The data seems to be the same for Unicode 4.1 and 5.0.
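
For readers unfamiliar with the approach, the following is a minimal sketch of an interval-table lookup with bsearch(3). It is not the attached patch: the struct, the function names and the four-entry table are assumptions made for illustration, and the table is only a small excerpt of East Asian Ambiguous ranges, not the full data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Closed interval [first, last] of code points. */
struct interval {
    uint32_t first;
    uint32_t last;
};

/* Illustrative excerpt only: a few ambiguous-width ranges, sorted
 * and non-overlapping. A real table has many more entries. */
static const struct interval ambiguous_excerpt[] = {
    { 0x00A1, 0x00A1 },  /* INVERTED EXCLAMATION MARK */
    { 0x00A7, 0x00A8 },  /* SECTION SIGN, DIAERESIS */
    { 0x2018, 0x2019 },  /* single quotation marks */
    { 0x201C, 0x201D },  /* double quotation marks */
};

#define N_AMBIGUOUS (sizeof ambiguous_excerpt / sizeof ambiguous_excerpt[0])

/* bsearch(3) comparator: the key is a code point, the element is
 * an interval. Returns 0 when the code point lies inside it. */
static int interval_compare(const void *key, const void *elt)
{
    uint32_t c = *(const uint32_t *)key;
    const struct interval *iv = elt;

    if (c < iv->first)
        return -1;
    if (c > iv->last)
        return 1;
    return 0;
}

/* bsearch(3) only works on a sorted table, so check that each
 * interval ends before the next one begins. */
void check_table_sorted(void)
{
    for (size_t i = 1; i < N_AMBIGUOUS; i++)
        assert(ambiguous_excerpt[i - 1].last < ambiguous_excerpt[i].first);
}

/* True if c falls in one of the excerpted ambiguous ranges. */
bool is_ambiguous_excerpt(uint32_t c)
{
    return bsearch(&c, ambiguous_excerpt, N_AMBIGUOUS,
                   sizeof ambiguous_excerpt[0], interval_compare) != NULL;
}
```

Storing single characters as one-element ranges in the same table keeps the lookup to a single binary search, at the cost of a few extra bytes per entry.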

Comment 9 Matthias Clasen 2006-04-27 15:15:26 UTC

Looks fine to me in principle. How big is the table? Probably not worth
putting the single-character ranges in a separate table to save some space,
or is it?

Comment 10 Behdad Esfahbod 2006-04-27 20:44:37 UTC

The table is between 1 KB and 1.5 KB.  I don't think it's worth saving 500 bytes or so (at most) at the cost of having to generate the table ourselves.  And I know I'm going to replace all of these within a year or two when I write a separate library for UCD...

Comment 11 Matthias Clasen 2006-04-27 20:50:31 UTC

fine with me (keeping one table)
but separate library == more dirty pages...

Comment 12 Behdad Esfahbod 2006-04-27 21:14:38 UTC

2006-04-27  Behdad Esfahbod  <behdad@gnome.org>

        * docs/reference/glib/glib-sections.txt,
        * glib/gunicode.h glib/guniprop.c: Implement g_unichar_iswide_cjk().
        (#105626)

Comment 13 Behdad Esfahbod 2006-04-27 22:42:03 UTC

We can start by including the generated sources in glib.  The idea is to have a single library that everybody uses to access UCD, to not have to rerun a zillion different scripts in a million modules to update to the next UCD version...

Comment 14 Matthias Clasen 2006-05-13 03:21:53 UTC

Moving off API freeze milestone, since the API was added.