Bug 313411 – Update to UCD 4.1.0



Summary:	Update to UCD 4.1.0


Status:	RESOLVED FIXED

Product:	gucharmap
Classification:	Core
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Noah Levitt
QA Contact:	Noah Levitt

URL:
Whiteboard:

Duplicates:	317164 (view as bug list)
Depends on:	313409
Blocks:

Reported:	2005-08-13 18:38 UTC by Behnam Esfahbod
Modified:	2005-11-29 12:26 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Update of gucharmap/unicode-scripts.h for UCD 4.1.0 - Breaks String Freeze (43.58 KB, patch) 2005-08-15 20:39 UTC, Behnam Esfahbod
Update of gucharmap/unicode-blocks.h for UCD 4.1.0 - Breaks String Freeze (4.50 KB, patch) 2005-08-15 20:41 UTC, Behnam Esfahbod
Listing characters without script in "Unassigned" script instead of "Common" (5.08 KB, patch) 2005-11-18 15:50 UTC, Behnam Esfahbod
Adds Common script support for Scripts.txt (for UCD >= 4.0.1) (7.72 KB, patch) 2005-11-20 22:01 UTC, Behnam Esfahbod
BZ2 compressed of zwnj--gucharmap--unicode-.h--UCD-4.1--02.patch (775.42 KB, application/octet-stream) 2005-11-20 22:30 UTC, Behnam Esfahbod

Description Behnam Esfahbod 2005-08-13 18:38:48 UTC

Of course it's needed.

Noah:  Gnome 2.12?

Comment 1 Behnam Esfahbod 2005-08-15 18:38:56 UTC

I wanted to attach the patch, but it's about 5MB.  I put it here:
http://zwnj.org/proj/gucharmap/zwnj--gucharmap--unicode-.h--01.patch


Noah, would you please commit it for gnome 2.12?

Behdad, would you take a look at it and the patch for gen?

Comment 2 Noah Levitt 2005-08-15 20:09:21 UTC

Behnam, this is fantastic, thank you. Some things:

 - getting some warnings like this: (lt-gucharmap:5839): Pango-CRITICAL **:
pango_context_get_matrix: assertion `PANGO_IS_CONTEXT (context)' failed

 - the first character that shows up under the "Common" script for me is U+0242,
not sure what happened to U+0000 &c. 

 - if the patch is applied now won't it break string freeze?

Comment 3 Behdad Esfahbod 2005-08-15 20:13:56 UTC

Noah,  The warning may be what I reported and Owen fixed a while back.  Check
Pango ChangeLog for example.

About string freeze, yes.  We were thinking about asking for permission.  We
are pushing 4.1 into Pango for 2.12, and it makes a lot of sense here too.  I'll
go ahead and ask the i18n list.  But the only way they may agree is if the patch
is applied today, at least the parts that affect translation.  As far as I can
guess, the only visible change is the script names, right?  That means some 10
new items, which should be fine.  We can apply that part of the patch today and
work out the rest.

What do you think?

Comment 4 Noah Levitt 2005-08-15 20:24:14 UTC

Behdad, that sounds good, you can commit the script names. You have my
permission and gratitude.

Comment 5 Behnam Esfahbod 2005-08-15 20:39:41 UTC

Created attachment 50741 [details] [review]
Update of gucharmap/unicode-scripts.h for UCD 4.1.0 - Breaks String Freeze

It has 8 new translatable strings.

Comment 6 Behnam Esfahbod 2005-08-15 20:41:17 UTC

Created attachment 50742 [details] [review]
Update of gucharmap/unicode-blocks.h for UCD 4.1.0 - Breaks String Freeze

It has 21 new/changed translatable strings too.

Comment 7 Behnam Esfahbod 2005-08-15 20:47:16 UTC

btw, compiling with all unicode-*.h updated, I just get this on stderr, and I
cannot understand what the problem is.

144 /bin/sh ../libtool --mode=link gcc -DXTHREADS -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include
-I/usr/X11R6/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0
-I/usr/include/freetype2   -DORBIT2=1 -pthread -DXTHREADS
-I/usr/include/libgnomeui-2.0 -I/usr/include/libgnome-2.0
-I/usr/include/libgnomecanvas-2.0 -I/usr/include/gtk-2.0
-I/usr/include/libart-2.0 -I/usr/include/gconf/2 -I/usr/include/libbonoboui-2.0
-I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/orbit-2.0
-I/usr/include/libbonobo-2.0 -I/usr/include/gnome-vfs-2.0
-I/usr/lib/gnome-vfs-2.0/include -I/usr/include/bonobo-activation-2.0
-I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/lib/gtk-2.0/include
-I/usr/X11R6/include -I/usr/include/atk-1.0 -I/usr/include/libxml2   -g -O2   -o
libgucharmap.la -rpath
/home/behnam/opt/gucharmap//lib -version-info 4:3: 0 gucharmap-marshal.lo
gucharmap-intl.lo chartable_accessible.lo charcell_accessible.lo
gucharmap-charmap.lo guchar

Comment 8 Behnam Esfahbod 2005-11-18 12:44:31 UTC

*** Bug 317164 has been marked as a duplicate of this bug. ***

Comment 9 Behnam Esfahbod 2005-11-18 15:05:25 UTC

Noah: I found the problem.

Scripts.txt for UCD <= 4.0.0 doesn't contain Common script.  So you listed all
characters with no script in Common script.

UCD >= 4.0.1 has the Common script.  I was going to create an Unassigned script
and list all characters with no script in that.

But Scripts.txt (also 4.1.0) says:
#  All code points not explicitly listed for Script
#  have the value Common (Zyyy).

So we have two types of characters in the Common script: listed and unlisted.
I'm going to add the listed Common characters to the Common script too.
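
The Scripts.txt rule quoted in this comment can be sketched as a lookup that falls back to Common for any code point outside the listed ranges. This is only a minimal Python sketch, not gucharmap's code: the sample lines follow the `start..end ; Script # comment` layout of Scripts.txt, and `parse_scripts` and `script_of` are hypothetical names.

```python
import bisect

# A few lines in the layout used by Scripts.txt (sample data, not the full file).
SAMPLE = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0621..063A    ; Arabic # Lo  [26] ARABIC LETTER HAMZA..ARABIC LETTER GHAIN
"""

def parse_scripts(text):
    """Return a sorted list of (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not line:
            continue
        codes, script = (field.strip() for field in line.split(";"))
        start, _, end = codes.partition("..")
        # A single code point has no "..", so its end equals its start.
        ranges.append((int(start, 16), int(end or start, 16), script))
    ranges.sort()
    return ranges

def script_of(cp, ranges, default="Common"):
    """Look up a code point; unlisted code points get the default script."""
    # Index of the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default

RANGES = parse_scripts(SAMPLE)
```

With this sample, U+0020 is Common because it is listed, while U+E000 is Common only because nothing lists it, which is exactly the "listed and unlisted" split described here.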

Comment 10 Behnam Esfahbod 2005-11-18 15:22:40 UTC

Adding "Unassigned" and "Private Use" characters to the "Common" script is not
user-friendly.  Personally, I don't like it either.

Should gucharmap follow unicode's rule in this case?

Comment 11 Behnam Esfahbod 2005-11-18 15:47:43 UTC

I'm attaching a patch to list characters without script in "Unassigned" script.
 It works well with all UCD versions.

But another problem is some Private Use characters are listed and some are not!

Noah, Behdad, help please! :D

Comment 12 Behnam Esfahbod 2005-11-18 15:50:00 UTC

Created attachment 54919 [details] [review]
Listing characters without script in "Unassigned" script instead of "Common"

Also contains patch 54915
[http://bugzilla.gnome.org/attachment.cgi?id=54915&action=view].

With this patch, there will be no problem with updating unicode-* to 4.1.0
version.

Comment 13 Behdad Esfahbod 2005-11-18 19:38:27 UTC

Ah, good old duplication!  We've been over Scripts.txt in Pango recently.

I believe having the Common value for Script is orthogonal to being Unassigned
and to being PrivateUse.  So, keep the Common.  Have another property for being
assigned, another for being private use.
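
One way to read this suggestion: script, assignment, and private use are three separate fields of the same record. The Python sketch below assumes the last two can be derived from the general category as reported by the standard `unicodedata` module (`Cn` for unassigned, `Co` for private use); the `CharInfo` type and `describe` function are hypothetical, and the script value is passed in rather than looked up.

```python
import unicodedata
from dataclasses import dataclass

@dataclass
class CharInfo:
    codepoint: int
    script: str        # e.g. "Common", as Scripts.txt would give it
    assigned: bool     # False when the general category is Cn
    private_use: bool  # True when the general category is Co

def describe(cp, script="Common"):
    """Build a record in which script and the two flags vary independently."""
    category = unicodedata.category(chr(cp))
    return CharInfo(cp, script, category != "Cn", category == "Co")
```

Under this reading, `describe(0xE000)` and `describe(0xFFFF)` both keep the script value Common, but the first is flagged as private use and the second as not assigned.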

Comment 14 Behnam Esfahbod 2005-11-19 17:17:51 UTC

Behdad, so what's your suggestion?

Here's mine:  Follow the comment in Scripts.txt and put all unlisted items in
the Common script.  But add two options to the View menu, like "Show Unassigned
Codes" and "Show Private Use Codes".

Noah, are you ok with this?

Comment 15 Behdad Esfahbod 2005-11-19 23:23:06 UTC

Not sure about "Show ..." menu items.  I don't like adding any new menu items.
Just go on with Common like Scripts.txt says.  I'm not sure what it is that you
are suggesting.

Comment 16 Behnam Esfahbod 2005-11-20 22:01:38 UTC

Created attachment 54994 [details] [review]
Adds Common script support for Scripts.txt (for UCD >= 4.0.1)

gucharmap/gucharmap-script-codepoint-list.c:
- Merges get_other_chars function (for unlisted ranges) in
get_chars_for_script.

gucharmap/gen-guch-unicode-tables.pl:
- Correcting comments.
- Adding UCD version to generated unicode-*.h files.
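
Folding the unlisted ranges into the per-script query amounts to taking the gaps between the listed ranges and handing them to Common. A rough Python sketch of that idea follows. It is not the patch itself: `unlisted_ranges` and `chars_for_script` are hypothetical names, and the real `get_chars_for_script` is C code that may be organised differently.

```python
MAX_CODEPOINT = 0x10FFFF

def unlisted_ranges(listed):
    """Yield (start, end) gaps not covered by sorted, non-overlapping listed ranges."""
    next_free = 0
    for start, end, _script in listed:
        if start > next_free:
            yield (next_free, start - 1)
        next_free = end + 1
    if next_free <= MAX_CODEPOINT:
        yield (next_free, MAX_CODEPOINT)

def chars_for_script(script, listed):
    """Ranges for a script; Common also receives every unlisted gap."""
    ranges = [(s, e) for s, e, name in listed if name == script]
    if script == "Common":
        ranges.extend(unlisted_ranges(listed))
    return sorted(ranges)

# Sample data only: two listed ranges as (start, end, script).
LISTED = [(0x20, 0x20, "Common"), (0x41, 0x5A, "Latin")]
```

A single function then answers both cases, so no separate code path is needed for "other" characters.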

Comment 17 Behnam Esfahbod 2005-11-20 22:04:40 UTC

And here is the generated unicode-*.h files for UCD 4.1.0:
http://zwnj.org/proj/gucharmap/zwnj--gucharmap--unicode-.h--UCD-4.1--02.patch

Noah, Behdad, would you take a look at previous patch please?

Comment 18 Behnam Esfahbod 2005-11-20 22:30:30 UTC

Created attachment 54998 [details]
BZ2 compressed of zwnj--gucharmap--unicode-.h--UCD-4.1--02.patch

Comment 19 Behdad Esfahbod 2005-11-22 07:46:43 UTC

Thanks Behnam.

2005-11-22  Behdad Esfahbod  <behdad@gnome.org>

        Update to Unicode 4.1 Character Database. (#313411, Behnam Esfahbod)

        * gucharmap/gucharmap-script-codepoint-list.c,
        * gucharmap/gen-guch-unicode-tables.pl: Updated to handle 4.1 data.

        * gucharmap/unicode-blocks.h,
        * gucharmap/unicode-categories.h,
        * gucharmap/unicode-names.h,
        * gucharmap/unicode-nameslist.h,
        * gucharmap/unicode-scripts.h,
        * gucharmap/unicode-unihan.h: Updated outputs of above scripts.

Comment 20 Noah Levitt 2005-11-28 21:12:18 UTC

I agree with Behnam-- we should make unassigned codepoints, control characters,
private use, surrogates, and other noncharacters separate items in the script
list. It's ok with me if we indicate somehow in the ui that they're technically
part of Common. Right now Common contains many useful characters but is an
enormous pain to browse.

Comment 21 Behdad Esfahbod 2005-11-29 12:26:35 UTC

A good start is a tree view in which each script (e.g. Common, Arabic, etc.)
expands to its Unicode blocks.
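
The tree suggested here could be built by intersecting each script's ranges with the block ranges. Below is a small Python sketch under assumed sample data; `BLOCKS`, `SCRIPT_RANGES`, and `script_tree` are illustrative names, and no GTK tree model is involved.

```python
# Sample data only: (start, end, name) for a few blocks and script ranges.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0600, 0x06FF, "Arabic"),
    (0x2000, 0x206F, "General Punctuation"),
]
SCRIPT_RANGES = [
    (0x0020, 0x0040, "Common"),
    (0x0041, 0x005A, "Latin"),
    (0x0621, 0x063A, "Arabic"),
    (0x2000, 0x200B, "Common"),
]

def script_tree(script_ranges, blocks):
    """Map each script to the ordered list of blocks its ranges overlap."""
    tree = {}
    for s_start, s_end, script in script_ranges:
        children = tree.setdefault(script, [])
        for b_start, b_end, block in blocks:
            # Two closed ranges overlap when each starts before the other ends.
            overlaps = s_start <= b_end and b_start <= s_end
            if overlaps and block not in children:
                children.append(block)
    return tree

TREE = script_tree(SCRIPT_RANGES, BLOCKS)
```

With the sample data, Common expands to Basic Latin and General Punctuation, which would break the long Common list into smaller pieces for browsing.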