GNOME Bugzilla – Bug 313411
Update to UCD 4.1.0
Last modified: 2005-11-29 12:26:35 UTC
Of course it's needed. Noah: Gnome 2.12?
I wanted to attach the patch, but it's about 5MB, so I put it here: http://zwnj.org/proj/gucharmap/zwnj--gucharmap--unicode-.h--01.patch Noah, would you please commit it for GNOME 2.12? Behdad, would you take a look at it and the patch for gen?
Behnam, this is fantastic, thank you. Some things:
- getting some warnings like this: (lt-gucharmap:5839): Pango-CRITICAL **: pango_context_get_matrix: assertion `PANGO_IS_CONTEXT (context)' failed
- the first character that shows up under the "Common" script for me is U+0242, not sure what happened to U+0000 &c.
- if the patch is applied now won't it break string freeze?
Noah, the warning may be what I reported and Owen fixed a while back. Check the Pango ChangeLog, for example. About string freeze: yes, it would break it. We were thinking about asking for permission. We are pushing 4.1 into Pango for 2.12, and it makes a lot of sense here too. I'll go ahead and ask the i18n list. But the only way they may agree is if the patch is applied today, at least the parts that affect translation. As far as I can guess, the only visible change is the script names, right? That means some 10 new items, which should be fine. We can apply that part of the patch today and work out the rest. What do you think?
Behdad, that sounds good, you can commit the script names. You have my permission and gratitude.
Created attachment 50741: Update of gucharmap/unicode-scripts.h for UCD 4.1.0 - Breaks String Freeze. It has 8 new translatable strings.
Created attachment 50742: Update of gucharmap/unicode-blocks.h for UCD 4.1.0 - Breaks String Freeze. It has 21 new/changed translatable strings too.
By the way, compiling with all unicode-*.h files updated, I just get this on stderr, and I cannot understand what the problem is:
144 /bin/sh ../libtool --mode=link gcc -DXTHREADS -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/gtk-2.0 -I/usr/lib/gtk-2.0/include -I/usr/X11R6/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/include/freetype2 -DORBIT2=1 -pthread -DXTHREADS -I/usr/include/libgnomeui-2.0 -I/usr/include/libgnome-2.0 -I/usr/include/libgnomecanvas-2.0 -I/usr/include/gtk-2.0 -I/usr/include/libart-2.0 -I/usr/include/gconf/2 -I/usr/include/libbonoboui-2.0 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/orbit-2.0 -I/usr/include/libbonobo-2.0 -I/usr/include/gnome-vfs-2.0 -I/usr/lib/gnome-vfs-2.0/include -I/usr/include/bonobo-activation-2.0 -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/lib/gtk-2.0/include -I/usr/X11R6/include -I/usr/include/atk-1.0 -I/usr/include/libxml2 -g -O2 -o libgucharmap.la -rpath /home/behnam/opt/gucharmap//lib -version-info 4:3: 0 gucharmap-marshal.lo gucharmap-intl.lo chartable_accessible.lo charcell_accessible.lo gucharmap-charmap.lo guchar
*** Bug 317164 has been marked as a duplicate of this bug. ***
Noah: I found the problem. Scripts.txt for UCD <= 4.0.0 doesn't contain the Common script, so you listed all characters with no script under Common. UCD >= 4.0.1 has the Common script. I was going to create an Unassigned script and list all characters with no script in it. But Scripts.txt (also in 4.1.0) says:
# All code points not explicitly listed for Script
# have the value Common (Zyyy).
So we have two types of characters in the Common script: listed and unlisted. I'm going to add the listed Common characters in the Common block too.
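To make the listed/unlisted distinction concrete, here is a minimal Python sketch of a lookup that falls back to Common for anything Scripts.txt does not list. It is not gucharmap's code: the names `parse_scripts` and `script_of` are hypothetical, and `SAMPLE` is a small hand-written excerpt in the Scripts.txt format rather than the real file.

```python
import bisect
import re

# Hand-written excerpt in the Scripts.txt format (illustrative, not the real file).
SAMPLE = """\
# All code points not explicitly listed for Script
# have the value Common (Zyyy).
0020          ; Common # SPACE
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
"""

# Matches "XXXX ; Script" and "XXXX..YYYY ; Script"; comment lines do not match.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a sorted list of (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)
        ranges.append((start, end, m.group(3)))
    ranges.sort()
    return ranges


def script_of(ranges, cp):
    """Return (script, listed). Unlisted code points default to Common."""
    # Index of the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2], True
    return "Common", False


ranges = parse_scripts(SAMPLE)
print(script_of(ranges, 0x0041))  # listed Latin
print(script_of(ranges, 0x0020))  # listed Common
print(script_of(ranges, 0xE000))  # unlisted, so Common by default
```

The second element of the result keeps the "listed" and "unlisted" kinds of Common apart, which is the information the discussion here turns on.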
Adding "Unassigned" and "Private Use" characters to the "Common" script is not user friendly. Personally, I don't like this either. Should gucharmap follow Unicode's rule in this case?
I'm attaching a patch to list characters without a script in an "Unassigned" script. It works well with all UCD versions. But another problem is that some Private Use characters are listed and some are not! Noah, Behdad, help please! :D
Created attachment 54919: Listing characters without script in "Unassigned" script instead of "Common". Also contains patch 54915 [http://bugzilla.gnome.org/attachment.cgi?id=54915&action=view]. With this patch, there will be no problem with updating unicode-* to the 4.1.0 version.
Ah, good old duplication! We've been over Scripts.txt in Pango recently. I believe having the Common value for Script is orthogonal to being Unassigned and to being PrivateUse. So, keep the Common. Have another property for being assigned, another for being private use.
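As a rough illustration of keeping these properties separate, the Python sketch below derives "assigned", "private use", and related flags from the General_Category alone, independently of any Script value. The function name `code_point_status` is hypothetical, and Python's `unicodedata` module reflects the Unicode version bundled with the interpreter, not UCD 4.1, so only code points whose category is stable are used here.

```python
import unicodedata


def code_point_status(cp):
    """Flags derived from General_Category only; Script is not consulted."""
    cat = unicodedata.category(chr(cp))
    return {
        "assigned": cat != "Cn",     # Cn: unassigned (includes noncharacters)
        "private_use": cat == "Co",  # Co: private use
        "surrogate": cat == "Cs",    # Cs: surrogate
        "control": cat == "Cc",      # Cc: control character
    }


for cp in (0x0041, 0x0000, 0xE000, 0xD800, 0xFFFF):
    print(f"U+{cp:04X}", code_point_status(cp))
```

A code point's script (Common or otherwise) could then be stored alongside these flags without either value constraining the other.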
Behdad, so what's your suggestion? Here's mine: follow Scripts.txt's comment and put all unlisted items in the Common script, but add two options to the View menu, like "Show Unassigned Codes" and "Show Private Use Codes". Noah, are you ok with this?
Not sure about the "Show ..." menu items. I don't like adding any new menu items. Just go on with Common like Scripts.txt says. I'm not sure what it is that you are suggesting.
Created attachment 54994: Adds Common script support for Scripts.txt (for UCD >= 4.0.1)
gucharmap/gucharmap-script-codepoint-list.c:
- Merges the get_other_chars function (for unlisted ranges) into get_chars_for_script.
gucharmap/gen-guch-unicode-tables.pl:
- Corrects comments.
- Adds the UCD version to the generated unicode-*.h files.
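The idea of folding unlisted ranges into the per-script lookup can be sketched as follows. This is Python for illustration only, not the C code in the patch; `unlisted_ranges` and `ranges_for_script` are hypothetical names, and the input is assumed to be a list of non-overlapping (start, end, script) tuples.

```python
MAX_CP = 0x10FFFF  # last Unicode code point


def unlisted_ranges(listed):
    """Return the gaps between listed ranges as (start, end) tuples."""
    gaps = []
    next_cp = 0
    for start, end, _script in sorted(listed):
        if start > next_cp:
            gaps.append((next_cp, start - 1))
        next_cp = max(next_cp, end + 1)
    if next_cp <= MAX_CP:
        gaps.append((next_cp, MAX_CP))
    return gaps


def ranges_for_script(listed, script):
    """Ranges for one script; Common also receives every unlisted range."""
    result = [(s, e) for s, e, name in listed if name == script]
    if script == "Common":
        result.extend(unlisted_ranges(listed))
    return sorted(result)


# Illustrative input, not real UCD data.
listed = [(0x20, 0x20, "Common"), (0x41, 0x5A, "Latin"), (0x61, 0x7A, "Latin")]
for start, end in ranges_for_script(listed, "Common"):
    print(f"{start:04X}..{end:04X}")
```

With this shape there is a single entry point per script, and every code point from U+0000 to U+10FFFF ends up in exactly one script.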
And here are the generated unicode-*.h files for UCD 4.1.0: http://zwnj.org/proj/gucharmap/zwnj--gucharmap--unicode-.h--UCD-4.1--02.patch Noah, Behdad, would you take a look at the previous patch please?
Created attachment 54998: BZ2-compressed copy of zwnj--gucharmap--unicode-.h--UCD-4.1--02.patch
Thanks Behnam.

2005-11-22  Behdad Esfahbod  <behdad@gnome.org>

	Update to Unicode 4.1 Character Database. (#313411, Behnam Esfahbod)

	* gucharmap/gucharmap-script-codepoint-list.c,
	* gucharmap/gen-guch-unicode-tables.pl: Updated to handle 4.1 data.

	* gucharmap/unicode-blocks.h,
	* gucharmap/unicode-categories.h,
	* gucharmap/unicode-names.h,
	* gucharmap/unicode-nameslist.h,
	* gucharmap/unicode-scripts.h,
	* gucharmap/unicode-unihan.h: Updated outputs of above scripts.
I agree with Behnam: we should make unassigned codepoints, control characters, private use, surrogates, and other noncharacters separate items in the script list. It's ok with me if we indicate somehow in the UI that they're technically part of Common. Right now Common contains many useful characters but is an enormous pain to browse.
A good start would be a tree view in which each script (e.g., Common, Arabic, etc.) expands to its Unicode blocks.
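One possible data model behind such a tree view is sketched below in Python, purely as an assumption about how it could be built: each script maps to the blocks its ranges intersect. The function `script_tree` is hypothetical, and the tables are a small hand-picked sample, not the full UCD data.

```python
from collections import defaultdict

# Small hand-picked sample tables (illustrative only).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0600, 0x06FF, "Arabic"),
]
SCRIPT_RANGES = [
    (0x0020, 0x0040, "Common"),
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x00D6, "Latin"),
    (0x0621, 0x063A, "Arabic"),
]


def script_tree(script_ranges, blocks):
    """Map each script to the blocks it overlaps, in block-table order."""
    tree = defaultdict(list)
    for s_start, s_end, script in script_ranges:
        for b_start, b_end, block in blocks:
            overlaps = s_start <= b_end and b_start <= s_end
            if overlaps and block not in tree[script]:
                tree[script].append(block)
    return dict(tree)


for script, block_names in script_tree(SCRIPT_RANGES, BLOCKS).items():
    print(script)
    for name in block_names:
        print("   ", name)
```

Each top-level key would become an expandable row, with the block names as its children.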