GNOME Bugzilla – Bug 321896
Synch gdkkeysyms.h/gtkimcontextsimple.c with X.org 6.9/7.0
Last modified: 2008-12-10 02:10:07 UTC
In the gtk+ library, the files gdk/gdkkeysyms.h and gtk/gtkimcontextsimple.c contain information which comes from the X server. This information should be kept in synch. These two files are severely out of date when compared to the current X.org (6.9/7.0). Specifically, gdk/gdkkeysyms.h has 1341 keysyms, but X.org now defines 1708 keysyms; gtk/gtkimcontextsimple.c has 842 compose sequences, but X.org now defines 5545 of them. There should be a way to easily update these files and keep them in synch with upstream X.org.
Created attachment 54955 [details] Updates gdkkeysyms.h with keysymdef.h from X.org 6.9/7.0 Updates http://cvs.gnome.org/viewcvs/gtk%2B/gdk/gdkkeysyms.h from upstream (X.org 6.9/7.0), from http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h Author : Simos Xenitellis <simos at gnome dot org>. Version : 1.0 Input : http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h Output : http://cvs.gnome.org/viewcvs/gtk%2B/gdk/gdkkeysyms.h Notes : It downloads keysymdef.h from the Internet if not found locally Notes : and creates an updated gdkkeysyms.h (checks not to overwrite).
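The conversion this attachment describes can be pictured with a minimal sketch. It is not the attached script: it assumes the only work needed is renaming the `XK_` prefix to `GDK_` on each `#define` line, and the function name `convert_keysymdef` and the sample text are made up for illustration.

```python
import re

# Matches lines such as "#define XK_EcuSign 0x10020a0  /* U+20A0 ... */".
DEFINE_RE = re.compile(r"^#define\s+XK_(\w+)\s+(0x[0-9a-fA-F]+)")


def convert_keysymdef(text):
    """Return gdkkeysyms.h-style #define lines for every XK_ keysym in text."""
    out = []
    for line in text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            name, value = m.groups()
            out.append("#define GDK_%s %s" % (name, value.lower()))
    return out


# Hypothetical input fragment in the style of keysymdef.h.
sample = """\
#ifdef XK_CURRENCY
#define XK_EcuSign 0x10020a0  /* U+20A0 EURO-CURRENCY SIGN */
#define XK_EuroSign 0x20ac  /* U+20AC EURO SIGN */
#endif
"""

if __name__ == "__main__":
    print("\n".join(convert_keysymdef(sample)))
```

A real updater would also need to fetch the header and refuse to overwrite an existing output file, as the attachment notes say; that part is left out here.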
*** Bug 167940 has been marked as a duplicate of this bug. ***
The en_US.UTF-8 Compose file http://cvs.freedesktop.org/xorg/xc/nls/Compose/en_US.UTF-8 appears not to be sync with the keysymdef.h file http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h A bug has been logged for this at the FreeDesktop Bugzilla, https://bugs.freedesktop.org/show_bug.cgi?id=5107
The Compose file, http://cvs.freedesktop.org/xorg/xc/nls/Compose/en_US.UTF-8 contains Unicode codepoints in addition to keysyms. <U0313> <Greek_alpha> : "ἀ" U1F00 # GREEK SMALL LETTER ALPHA WITH PSILI U0313 is COMBINING COMMA ABOVE, so a comparison is possible with 0x0313. However, http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h?view=markup has keysyms with values that conflict with Unicode. For example, in the URL above, search for "Latin 4". You will notice the Latin 4 keysym group conflicts with the Greek Unicode block. Pending this issue, the script is ready to update gtk/gtkimcontextsimple.c.
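As a rough illustration of the mixed keysym/codepoint format quoted above, the following sketch splits one Compose line into its key names, result string and result name. The regular expression and the helper names are assumptions for this example only; the real Compose grammar has more constructs (includes, modifiers, octal escapes) than are handled here.

```python
import re

# <key> <key> ... : "result" OptionalName   # comment
LINE_RE = re.compile(r'^\s*((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)*)"\s*(\w+)?')
UNICODE_KEY_RE = re.compile(r"U[0-9A-Fa-f]{4,6}")


def parse_compose_line(line):
    """Return (key names, result string, result name) or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    keys = re.findall(r"<(\w+)>", m.group(1))
    return keys, m.group(2), m.group(3)


def is_unicode_key(name):
    """True for names written as a bare codepoint, e.g. U0313."""
    return UNICODE_KEY_RE.fullmatch(name) is not None


SAMPLE = ('<U0313> <Greek_alpha> : "\u1f00" U1F00 '
          '# GREEK SMALL LETTER ALPHA WITH PSILI')

if __name__ == "__main__":
    keys, result, name = parse_compose_line(SAMPLE)
    for key in keys:
        kind = "codepoint" if is_unicode_key(key) else "keysym name"
        print(key, "->", kind)
```

Keys flagged as codepoints can be compared numerically, while named keysyms still need a lookup in keysymdef.h, which is where the conflict described above shows up.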
Created attachment 57953 [details] WORK In PROGRESS - Updates gtkimcontextsimple.c automagically To update the main structure in gtkimcontextsimple.c requires access to several files and combining them together. This script does exactly that. It is marked as work in progress as the Xorg Compose file contains some constructs that I do not know how to process.
Changing status to NEEDINFO. This bug report is almost ready to be fixed. Some issues, described above, need to be attended to and then we are done! :)
the way in which i would like to see this addressed is by keeping the generated files in cvs. therefore, it is not the end of the world if the script output needs some manual tweaking...
(In reply to comment #4) > The Compose file, > http://cvs.freedesktop.org/xorg/xc/nls/Compose/en_US.UTF-8 > contains unicode codepoints in addition to keysyms. > <U0313> <Greek_alpha> : "ἀ" U1F00 # GREEK SMALL LETTER ALPHA WITH PSILI > > U0313 is COMBINING COMMA ABOVE, so a comparison is possible with 0x0313. > > However, > http://cvs.freedesktop.org/xorg/xc/include/keysymdef.h?view=markup > has keysyms with values that conflicts with Unicode. > For example, in the URL above, search for "Latin 4". > You will notice the Latin 4 keysym group conflicts with the Greek Unicode block. > > Pending this issue, the script is ready to update gtk/gtkimcontextsimple.c. The main issue is that the affected keysyms ("Latin 4" group but some others as well) should have 0x1000000 added to their values so that they do not conflict with real Unicode codepoints that may exist. I filed a bug report on this, at https://bugs.freedesktop.org/show_bug.cgi?id=5129
The new upstream location of the Compose files for X.org modular (compared to monolithic) is http://webcvs.freedesktop.org/xorg/lib/X11/nls/ The exact file is http://webcvs.freedesktop.org/xorg/lib/X11/nls/en_US.UTF-8/Compose.pre?view=markup
Bug 155010 has a patch that makes the compose sequences table configurable. That is, the user would be able to override the built-in compose sequences with a configuration file found in, let's say, /etc/gtk+/compose/. I am not sure if there are any performance issues with such a configuration. In any case, both bug 155010 and this bug require bringing in the new Compose file from upstream.
Simos, any update on this ? If I understand Daniel's comment on the fd.o bug correctly, what your script needs to do is use existing legacy keysyms where they exist, and otherwise use Unicode keysyms with the added 0x1000000
Matthias, there are a couple of questions that are still pending. 1. The Compose file has some keysyms of the form "combining_*" that I could not find the value of. I do not know where they are defined so I cannot assign them a value. One option could be to ignore the compose sequences that have them in GTK+ IM. I filed an issue on this, at https://bugs.freedesktop.org/show_bug.cgi?id=5107 2. The Compose file has legacy and Unicode keysyms. The Unicode keysyms do not have 0x1000000 added to them yet in the current Compose file. As far as I understand, GTK+ IM does not depend on the content of the Compose file. Is that correct? Therefore, are we not blocked by https://bugs.freedesktop.org/show_bug.cgi?id=5129 ? I assume there is already code in Xorg that understands x+0x1000000 keysyms. Once we have a view on the two issues above, it should be easy to get patches.
simos: 1) yeah, ignore this issue for the time being: i'll fix it a bit later on. 2) the Compose file and GTK are independent, so yes, you can freely ignore that. however, as I explained in #5129, some legacy keysyms have co-incidences with Unicode keysyms, and you just need to ignore that: 0x31B2 is not guaranteed to be U+31B2, or whatever.
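The rule discussed here (Unicode keysyms carry an added 0x1000000, while a legacy value must not be read as a codepoint) can be sketched as follows. The function names are hypothetical, and the sketch deliberately returns None for legacy keysyms rather than guessing, since mapping those needs a lookup table.

```python
UNICODE_KEYSYM_OFFSET = 0x1000000


def unicode_keysym(codepoint):
    """Build the Unicode keysym for a codepoint by adding the offset."""
    return UNICODE_KEYSYM_OFFSET + codepoint


def keysym_to_codepoint(keysym):
    """Return the codepoint of a Unicode keysym, or None for a legacy keysym.

    A legacy value such as 0x31B2 is not assumed to mean U+31B2.
    """
    if (keysym & 0xFF000000) == UNICODE_KEYSYM_OFFSET:
        return keysym - UNICODE_KEYSYM_OFFSET
    return None


if __name__ == "__main__":
    print(hex(unicode_keysym(0x20A0)))      # Unicode keysym for U+20A0
    print(keysym_to_codepoint(0x31B2))      # legacy range, no direct answer
```

With this split, a table generator can keep legacy names where they exist and fall back to offset keysyms only for characters that have no legacy keysym.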
Created attachment 63781 [details] Patch that updates gtkimcontextsimple.c to the latest Compose file in Xorg 7.x The patch applies to HEAD.
Created attachment 63782 [details] Script that automagically updates gtkimcontextsimple.c from Compose.pre in Xorg. Updates gtk+/gtk/gtkimcontextsimple.c from Compose.pre found at Xorg 7.0.
Created attachment 63783 [details] Generated gdkkeysyms.h from keysymdef.h in Xorg 7.x We used the script that is shown below to autogenerate the header file.
Created attachment 63784 [details] Script that automagically updates gtk+/gdk/gdkkeysyms.h from keysymdef.h in Xorg. This script uses the new location of keysymdef.h of modular Xorg (7.x).
Tor, I am adding you to this report as it affects GTK+/Windows as well (hope that's ok). This report tries to update the compose sequence table in GTK+ (gtk+/gtk/gtkimcontextsimple.c, gtk+/gdk/gdkkeysyms.h) from upstream, Xorg 7.0. Checking the history of http://cvs.gnome.org/viewcvs/gtk+/gtk/gtkimcontextsimple.c I can see that at least two compose sequences specific to Windows were added, as shown in bug 164859. Is there a bigger list that can be merged or is it just the following lines in bug 164859? + GDK_Greek_accentdieresis, GDK_Greek_iota, 0, 0, 0, 0x0390, /* GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS */ + GDK_Greek_accentdieresis, GDK_Greek_upsilon, 0, 0, 0, 0x03B0, /* GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS */
Tor, now I am adding you really. Please see message above.
I only know I added those two entries on Daniel Atallah's request. I don't really have any clue about Greek keyboards otherwise ;) Are the key sequences mentioned in bug #164859 not usable on Greek X11 keyboards?
(In reply to comment #20) > I only know I added those two entries on Daniel Atallah's request. I don't > really have any clue about Greek keyboards otherwise ;) Are the key sequences > mentioned in bug #164859 not usable on Greek X11 keyboards? > I see. Windows has a specific key to produce accentdiaeresis while Xorg is commonly configured to produce one at a time (accent or diaeresis), so the addition of accentdiaeresis makes sense. There will be need for similar work when Greek Polytonic is added with these patches.
I just tried out the patches listed above using jhbuild and they work. I used the released gtk+ (the HEAD version does not compile today). I tested with Hungarian where you can put a dot on letters and now it works (previously it was not available). I tested with Spanish and there was no regression. I also tested with Ancient Greek and it worked well. A few compose sequences, though, were not available due to inconsistencies in the Xorg file which we are working on.
Note that we might need extending gtk+ to support decomposed characters as well (for accented Cyrillic), i.e. a key sequence to result in several unicode characters instead of one (just like Compose files allow).
(In reply to comment #23) > Note that we might need extending gtk+ to support decomposed characters as well > (for accented Cyrillic), i.e. a key sequence to result in several unicode > characters instead of one (just like Compose files allow). > Danilo, Could you please file a bug report about this? I could not create a test case for this. I tried picking the Unicode characters from http://webcvs.freedesktop.org/xorg/lib/X11/nls/en_US.UTF-8/Compose.pre?view=markup and placing them in http://people.w3.org/rishida/scripts/uniview/conversion I tried manually and I could not find characters composed of more than one character. It looks like all are precomposed? This discussion can continue at the new bug report.
Simos, it's reported as bug #341341 (I thought I already discussed this with Owen back in 2003, but I may be lost altogether ;).
ok, in order to stop blocking on this and make progress, I compared your gdkkeysyms.h with the current one, and things look mostly fine (ie just additions). The one thing I stumbled over was XK_CURRENCY, where I see
-#define GDK_EcuSign 0x20a0
-#define GDK_ColonSign 0x20a1
-#define GDK_CruzeiroSign 0x20a2
-#define GDK_FFrancSign 0x20a3
-#define GDK_LiraSign 0x20a4
-#define GDK_MillSign 0x20a5
-#define GDK_NairaSign 0x20a6
-#define GDK_PesetaSign 0x20a7
-#define GDK_RupeeSign 0x20a8
-#define GDK_WonSign 0x20a9
-#define GDK_NewSheqelSign 0x20aa
-#define GDK_DongSign 0x20ab
+#define GDK_EcuSign 0x10020a0
+#define GDK_ColonSign 0x10020a1
+#define GDK_CruzeiroSign 0x10020a2
+#define GDK_FFrancSign 0x10020a3
+#define GDK_LiraSign 0x10020a4
+#define GDK_MillSign 0x10020a5
+#define GDK_NairaSign 0x10020a6
+#define GDK_PesetaSign 0x10020a7
+#define GDK_RupeeSign 0x10020a8
+#define GDK_WonSign 0x10020a9
+#define GDK_NewSheqelSign 0x10020aa
+#define GDK_DongSign 0x10020ab
why is this ? have the legacy keysyms been replaced by unicode ones for XK_CURRENCY ?
Looking at the imcontext simple compose sequences, there is a fairly obvious problem: with non-bmp keysyms, we need to go from guint16 to guint32, and we also seem to have a lot more sequences. The table size grows from 10116 to 113520, which is clearly a problem. At this size, we should probably look at going from the flat representation + bsearch to a tree
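For readers unfamiliar with the flat representation mentioned here, the sketch below models it in Python: fixed-width rows of up to five keys padded with zeros, followed by the resulting character, searched with a binary search. The three rows are illustrative data, not an excerpt from gtkimcontextsimple.c, and `bisect` stands in for C's bsearch().

```python
import bisect

MAX_LEN = 5  # key columns per row; the sixth column is the resulting character

# Illustrative rows only, kept sorted so that binary search works.
TABLE = sorted([
    (0xFE51, 0x0061, 0, 0, 0, 0x00E1),       # dead_acute, a      -> U+00E1
    (0xFE51, 0x0065, 0, 0, 0, 0x00E9),       # dead_acute, e      -> U+00E9
    (0xFF20, 0x0027, 0x0061, 0, 0, 0x00E1),  # Multi_key, ', a    -> U+00E1
])


def lookup(table, typed):
    """Classify the keys typed so far as 'none', 'prefix' or 'exact'."""
    n = len(typed)
    key = tuple(typed)
    # First row that sorts at or after the typed keys.
    i = bisect.bisect_left(table, key)
    if i == len(table) or table[i][:n] != key:
        return ("none", None)
    row = table[i]
    # A zero in the next key column means the sequence ends here.
    if n == MAX_LEN or row[n] == 0:
        return ("exact", row[MAX_LEN])
    return ("prefix", None)


def table_bytes(table, cell_bytes):
    """Size of the flat table for a given cell width (2 or 4 bytes)."""
    return len(table) * (MAX_LEN + 1) * cell_bytes


if __name__ == "__main__":
    print(lookup(TABLE, [0xFE51]))
    print(table_bytes(TABLE, 2), table_bytes(TABLE, 4))
```

Every row costs six cells regardless of sequence length, so doubling the cell width from two to four bytes doubles the whole table, which is the growth the comment is worried about.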
(In reply to comment #26) > ok, in order to stop blocking on this and make progress on this, > I compared your gdkkeysyms.h with the current one, and things look > mostly fine (ie just additions). The one thing I stumbled over was > XK_CURRENCY, where I see > > -#define GDK_EcuSign 0x20a0 > -#define GDK_ColonSign 0x20a1 > -#define GDK_CruzeiroSign 0x20a2 > -#define GDK_FFrancSign 0x20a3 > -#define GDK_LiraSign 0x20a4 > -#define GDK_MillSign 0x20a5 > -#define GDK_NairaSign 0x20a6 > -#define GDK_PesetaSign 0x20a7 > -#define GDK_RupeeSign 0x20a8 > -#define GDK_WonSign 0x20a9 > -#define GDK_NewSheqelSign 0x20aa > -#define GDK_DongSign 0x20ab > > +#define GDK_EcuSign 0x10020a0 > +#define GDK_ColonSign 0x10020a1 > +#define GDK_CruzeiroSign 0x10020a2 > +#define GDK_FFrancSign 0x10020a3 > +#define GDK_LiraSign 0x10020a4 > +#define GDK_MillSign 0x10020a5 > +#define GDK_NairaSign 0x10020a6 > +#define GDK_PesetaSign 0x10020a7 > +#define GDK_RupeeSign 0x10020a8 > +#define GDK_WonSign 0x10020a9 > +#define GDK_NewSheqelSign 0x10020aa > +#define GDK_DongSign 0x10020ab > > why is this ? have the legacy keysyms be replaced by unicode ones > for XK_CURRENCY ? > According to http://webcvs.freedesktop.org/xorg/proto/X11/keysymdef.h?view=markup only "XK_EuroSign" is a legacy keysym. The rest are Unicode keysyms. Markus Kuhn made this change 10 months ago: http://webcvs.freedesktop.org/xorg/proto/X11/keysymdef.h?r1=1.2&r2=1.3 Also notice that (same source as above) #define XK_EcuSign 0x10020a0 /* U+20A0 EURO-CURRENCY SIGN */ #define XK_EuroSign 0x20ac /* U+20AC EURO SIGN */ Also, according to http://www.unicode.org/charts/PDF/U20A0.pdf XK_EuroSign is favoured over XK_EcuSign ("U+20A0 EURO-CURRENCY SIGN").
ok, that sounds good enough to me for the keysyms. I'll commit that part.
2006-05-11 Matthias Clasen <mclasen@redhat.com> * gdk/gdkkeysyms.h: Regenerated from Xorg 7.1 keysyms.h, using gdkkeysyms-update.pl. * gdk/gdkkeysyms-update.pl: Script to sync gdkkeysyms.h with Xorg. (#321896, Simos Xenitellis) * gdk/Makefile.am (EXTRA_DIST): Add gdkkeysyms-update.pl
(In reply to comment #27) > Looking at the imcontext simple compose sequences, there is fairly > obvious problem: with non-bmp keysyms, we need to go from guint16 to > guint32, and we also seem to have a lot more sequences. The table > size grows from 10116 to 113520, which is clearly a problem. At this > size, we should probably look at going from the flat representation > + bsearch to a tree > It might be good to also split the compose sequences in both upstream (Xorg) and GTK+ into groups based on the language and let the end-user "decide" through the configuration which languages are actually supported. In Ubuntu, for example, you can pick and choose the writing aids for each of the supported languages. As it is now, languages that a user may never write in are potentially available. For example, Ancient Greek (Polytonic) currently takes about 35-40% of the compose sequences. For GNOME to manage different languages, something like bug 155010 would be able to help. Is the imcontext simple compose sequences table loaded just once and shared between GTK+ applications?
The table is compiled into GTK+ itself, as const data. Thus it is shared between apps.
The api affecting part of this has been committed; somebody needs to devise a compact table format for the additional sequences.
As mentioned above, it looks suitable to use a tree structure for the table. Looking at Glib, the N-ary tree (http://developer.gnome.org/doc/API/2.0/glib/glib-N-ary-Trees.html) might be a good option. Is there a good way to represent (serialise?) a tree as text so that it is included verbatim in the GTK+ source code? Should the table be instead saved as is and let GTK+ parse it on startup creating the tree?
No, I don't think using a runtime-generated pointerized tree structure like that is the right approach. It should still be a compiled in array of numbers, just being interpreted as a tree structure instead of the current flat table. Not sure about the best way of doing that.
Created attachment 68995 [details] Fragment of compose sequence table which shows what we want to convert from. This is a fragment of the compose table (array) that shows what we already have. Notice that the first column has lots of repeats. This is the first area of optimisation. Also notice that there are several 0s. This is the second area of optimisation.
Created attachment 68996 [details] Converted version of the previous fragment to optimise on memory. This is the suggested format, which will be generated by a script taking the Compose file from Xorg as input. We save space by reducing the repetitions in the first column. We also save space by eliminating the superfluous zeros. Some figures for the space we save: =====> Some stats for you. We have 4730 lines, with 6 guint32s per line, total 113520 bytes. Of all keysyms, 14190 have the value of zero and take up 56760 bytes. By optimising on the zeros, we end up occupying 56760 bytes. We also optimise on the first column: across the 4730 lines, there are fewer than about 30 different keysyms in that column. So we save a further approx. 18800 bytes. So, total savings are 75560 bytes, and we occupy 37960 bytes. Of course, take into account some memory overhead to support the optimisation. =====| What matters is that the data structure is shared among GTK+ applications. As static const, I believe we achieve this. If there are any comments at this stage to enhance the format, please add them here. The next step is to write the conversion script (easy) and then plug the structure into gtkimcontextsimple.c.
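The figures in this comment can be re-derived with a few lines of arithmetic. The sketch assumes the first-column saving is computed as one 4-byte cell per row minus roughly 30 retained start keys, which is the reading that reproduces the quoted "approx. 18800"; the variable names are made up for this example.

```python
ROWS = 4730            # lines in the flat table
COLS = 6               # guint32 cells per line
CELL_BYTES = 4         # sizeof(guint32)
ZERO_CELLS = 14190     # cells holding the value zero
DISTINCT_FIRST = 30    # "less than about 30" distinct first-column keysyms

flat_bytes = ROWS * COLS * CELL_BYTES
zero_bytes = ZERO_CELLS * CELL_BYTES
# Assumption: each row drops its first cell, but every distinct start key
# is still stored once.
first_col_saving = (ROWS - DISTINCT_FIRST) * CELL_BYTES
total_saving = zero_bytes + first_col_saving
remaining = flat_bytes - total_saving

if __name__ == "__main__":
    print("flat table      :", flat_bytes)
    print("zeros           :", zero_bytes)
    print("first column    :", first_col_saving)
    print("total savings   :", total_saving)
    print("remaining bytes :", remaining)
```

The result excludes whatever index or offset data the optimised layout needs, which is the overhead the comment asks the reader to keep in mind.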
Created attachment 69014 [details] Script that automagically updates gtkimcontextsimple.c from Compose.pre in Xorg, for the memory-optimised version of the table. The script will create a patched up version of gtkimcontextsimple.c with the new data structure. It is not usable yet as the code that implements the searching has not been adapted yet. Will do once I get a buildable GNOME using jhbuild.
Created attachment 69030 [details] gtkimcontextsimple.c with the latest upstream Compose data, arranged to save memory. We obsolete the previous unoptimised file, however there is still a bit of work to do to recode the search algorithm. That is, this file does not let us compile yet.
Thanks for this work. In its current form, this table needs relocations, since it uses pointers to point to the subtables, and thus it won't be shared (unless you use prelink). You need to replace the pointers by offsets to arrive at something that can be used without relocations.
Created attachment 91207 [details] [review] Updated generation script, updated compose table, move compose table to separate file. Applies to trunk.
This is an updated version of the initial script;
a. we take out the compose table by putting it in a separate file
b. we generate a fresh compose table based on upstream
1. after we 1,$s/U1000/U/g (we verify we did not touch the U1000 character)
2. after we remove U1xxxx (Plane 1) sequences. It's guint16 anyway.
3. after we replace the Greek section with the one from the el_GR.UTF-8/Compose.pre upstream file;
c. we add some auxiliary files generated by the script into .cvsignore. (ok for SVN?)
Files
Patch: /gtk/gtkimcontextsimple.c
Added: /gtk/gtkimcontextsimpleseqs.h
Added: /gtk/compose-sequence-update.pl
Patch: /gtk/.cvsignore
Space calculations:
a. Currently, the compose table takes up 10164 bytes, with 847 entries (847x6x2)
b. New compose table without space optimisations takes up 54120 bytes, with 4510 entries (4510x6x2)
c. The first column in the compose table has many repetitions. If we eliminate them, the table will take up ~45100 bytes (4510x5x2), a saving of about 9000 bytes.
Complexity when optimising the table
gtkimcontextsimple.c does four operations on the compose table,
1. run bsearch()
2. uses pointer to item
3. get next item
4. get previous item
To avoid the repetitions of the first column, we can use separate arrays based on the value of the first column. We generate about 30 such arrays. In order to bsearch() through those arrays, we use a script that implements (=generates C code) binary search through nested conditional statements (done). Overall, I find it would make the code quite complicated at this stage to squeeze these extra bytes. This patch has been tested for the Greek language (Ancient Greek now works) and Latin (US International w/ dead keys).
I don't think the "get previous" and "get next" operations are actually necessary for the gtkimcontextsimple use case. What is required is the information "does it not match, match a prefix, or match exactly ?" The patch does not actually work, since it still uses guint16, while some of the keysyms in the table are larger than that by now. Looking a bit closer, there are 44 rows containing non-BMP keysyms. I'd propose to put those into a separate guint32 table to avoid blowing up the data size needlessly. Looking at the remaining BMP keysyms, there are two things we could do to reduce the size:
- split the table by length of the sequence, since a lot of the entries are just 2 or 3 keys long.
- looking at the first column, there are only 31 different starting symbols, so it might be worthwhile to split the first column off
That would lead to a table roughly of the following form, with off2 in the startkey1 line pointing to the remainder of the first sequence of length 2 starting with startkey1, and so on:
{
  /* offsets */
  startkey1, off2, off3, off4, off5,
  startkey2, off2, off3, off4, off5,
  ...
  startkey31, off2, off3, off4, off5,
  0,
  /* sequences of length 2 */
  key2, value,
  ...
  /* sequences of length 3 */
  key2, key3, value,
  ...
}
From a quick run over your tables, it looks like this table layout would reduce the size of the BMP table from ~55k to ~30k, with the non-BMP table being at ~1k. It should still be possible to use bsearch() to find seq[0] in the offsets part, and then use bsearch repeatedly to find seq[1]...seq[n] in the tables of the right length.
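The offset-based layout proposed in this comment can be prototyped in a few lines. The sketch below is one possible reading, with these assumptions stated up front: the terminating 0 after the index is omitted, the end of each block is taken from the next stored offset, the blocks are scanned linearly where the comment suggests a repeated bsearch(), and the function names and sample sequences are invented for illustration.

```python
MAX_LEN = 5  # longest sequence; an index row is [startkey, off2, ..., off5]


def build_compact(seqs):
    """Flatten (keys, value) pairs into index rows followed by data blocks."""
    firsts = sorted({keys[0] for keys, _ in seqs})
    index_size = len(firsts) * MAX_LEN
    index, data = [], []
    for first in firsts:
        index.append(first)
        for length in range(2, MAX_LEN + 1):
            # Offset of the block holding sequences of this length.
            index.append(index_size + len(data))
            block = sorted(s for s in seqs
                           if s[0][0] == first and len(s[0]) == length)
            for keys, value in block:
                data.extend(keys[1:])   # first key lives in the index row
                data.append(value)
    return index + data, len(firsts)


def block_bounds(table, n_first, row, length):
    """Start and end positions of the block for one start key and length."""
    base = row * MAX_LEN
    start = table[base + length - 1]
    if length < MAX_LEN:
        end = table[base + length]
    elif row + 1 < n_first:
        end = table[(row + 1) * MAX_LEN + 1]
    else:
        end = len(table)
    return start, end


def lookup(table, n_first, typed):
    """Classify typed keys as 'none', 'prefix' or 'exact'."""
    lo, hi = 0, n_first
    while lo < hi:                       # binary search over index rows
        mid = (lo + hi) // 2
        if table[mid * MAX_LEN] < typed[0]:
            lo = mid + 1
        else:
            hi = mid
    if lo == n_first or table[lo * MAX_LEN] != typed[0]:
        return ("none", None)
    if len(typed) == 1:
        return ("prefix", None)
    rest = tuple(typed[1:])
    found_prefix = False
    for length in range(len(typed), MAX_LEN + 1):
        start, end = block_bounds(table, n_first, lo, length)
        for pos in range(start, end, length):   # linear scan, for brevity
            keys = tuple(table[pos:pos + length - 1])
            if keys[:len(rest)] == rest:
                if length == len(typed):
                    return ("exact", table[pos + length - 1])
                found_prefix = True
    return ("prefix", None) if found_prefix else ("none", None)


# Illustrative sequences only.
SEQS = [
    ((0xFE51, 0x61), 0xE1),
    ((0xFE51, 0x65), 0xE9),
    ((0xFF20, 0x27, 0x61), 0xE1),
    ((0xFF20, 0x6F, 0x63), 0xA9),
]
TABLE, N_FIRST = build_compact(SEQS)

if __name__ == "__main__":
    print(TABLE)
    print(lookup(TABLE, N_FIRST, [0xFF20, 0x6F, 0x63]))
```

Because the table holds only integers and offsets, it can live in read-only data with no relocations, and short sequences no longer pay for zero padding.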
I think what should be done is to remove those sequences that are painfully self-evident from the table, and instead just add a small amount of code to do the logical thing: If we get one or two dead keysyms, and together with the following keysym they combine into a precomposed unicode character, use that. Surely it is possible to deduce this without having explicit entries in the table for each sequence? All the dead keysyms are between 0xFE50 and 0xFE62 (and no other keysyms are in that interval), so it is trivial to determine if a keysym is "dead". Then one just converts the dead keysym(s) into the corresponding Unicode combining mark(s), appends them after the letter that follows into a string, and checks if that string normalizes (using NFC) into a single, precomposed Unicode character.
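A minimal sketch of this idea, using Python's `unicodedata` in place of g_utf8_normalize(). The mapping covers only a handful of dead keysyms, the marks are appended in the order the dead keys were typed (an assumption), and the names `DEAD_TO_COMBINING` and `compose` are hypothetical.

```python
import unicodedata

# Partial mapping from dead keysyms to combining marks, for illustration.
DEAD_TO_COMBINING = {
    0xFE50: "\u0300",  # dead_grave      -> COMBINING GRAVE ACCENT
    0xFE51: "\u0301",  # dead_acute      -> COMBINING ACUTE ACCENT
    0xFE52: "\u0302",  # dead_circumflex -> COMBINING CIRCUMFLEX ACCENT
    0xFE53: "\u0303",  # dead_tilde      -> COMBINING TILDE
    0xFE54: "\u0304",  # dead_macron     -> COMBINING MACRON
    0xFE57: "\u0308",  # dead_diaeresis  -> COMBINING DIAERESIS
}


def is_dead(keysym):
    """Range test described in the comment above."""
    return 0xFE50 <= keysym <= 0xFE62


def compose(dead_keysyms, base_char):
    """Return the precomposed character, or None if NFC yields several."""
    marks = []
    for keysym in dead_keysyms:
        mark = DEAD_TO_COMBINING.get(keysym)
        if mark is None:
            return None  # dead key not covered by this partial mapping
        marks.append(mark)
    result = unicodedata.normalize("NFC", base_char + "".join(marks))
    return result if len(result) == 1 else None


if __name__ == "__main__":
    print(compose([0xFE51], "a"))
    print(compose([0xFE51], "q"))
```

Sequences that come back as None would still need an explicit table entry, so the table and the code path complement each other.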
Created attachment 92007 [details] Updated version of script that converts Xorg Compose.pre to gtk+ optimised table This script creates a table similar to the description that Matthias gives at comment 42.
Created attachment 92008 [details] Optimised file (generated with above script) The file consists of two tables; a guint16 table with optimisations to reduce size and a guint32 table with the remaining sequences. What is remaining is the glue code in GTK+ to use these tables. Also, the optimisation that Tor describes at comment 43 is not reflected in this table.
Created attachment 92279 [details] [review] Rough implementation of table-less handling of dead accents Here's a first version of a patch that removes the straightforward dead diacritic key sequences from the table, and instead handles them using code. Presumably most of the table entries Simos wants to add can be handled by code like this, without a need for explicit table entries? (The patch still contains debugging printfs.) Comments, please... BTW, g_utf8_normalize(..., G_NORMALIZE_NFC) works a bit oddly in my opinion. It normalizes the sequence 03B9 0308 0301 to a single 0390, but not the sequence 03B9 0301 0308 (just swapping the order of the two combining diacritics), even though that should be equivalent? (03B9 = GREEK SMALL LETTER IOTA, 0308 = COMBINING DIAERESIS (Dialytika), 0301 = COMBINING ACUTE ACCENT (Oxia, Tonos), and 0390 = GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS)
Thanks Tor. I did some more work on the patch. g_utf8_normalize() appears to work ok; the function rearranges the sequence as long as the diacritic marks belong to different "canonical combining classes". In the case of GREEK SMALL LETTER IOTA with COMBINING DIAERESIS and COMBINING ACUTE ACCENT, both diacritic marks belong to the same canonical combining class (which has the value 230). In this case we need to try all orderings (n factorial) of the diacritic marks until we find a match. Greek Polytonic as a keyboard layout in Xorg reuses dead_ogonek, etc. which are meant for other languages and refer to a different diacritic mark. We just got dead_psili and dead_dasia added to Xorg which is good. This means we should also get dead_perispomeni added instead of the dead_tilde we currently use. Taking both these issues into account, I managed to get my system to write Greek Polytonic with GTK+ IM. I'll make a usable patch shortly.
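The "try all orderings" step can be sketched with `itertools.permutations`, again using Python's `unicodedata` as a stand-in for g_utf8_normalize(). The helper name `compose_any_order` is made up for this example.

```python
import itertools
import unicodedata


def compose_any_order(base, marks):
    """Try every ordering of the combining marks (n factorial of them).

    Returns the first single precomposed character NFC produces, else None.
    """
    for perm in itertools.permutations(marks):
        result = unicodedata.normalize("NFC", base + "".join(perm))
        if len(result) == 1:
            return result
    return None


IOTA = "\u03b9"        # GREEK SMALL LETTER IOTA
DIAERESIS = "\u0308"   # COMBINING DIAERESIS
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT

if __name__ == "__main__":
    # Both marks share one canonical combining class, so NFC keeps their order.
    print(unicodedata.combining(DIAERESIS), unicodedata.combining(ACUTE))
    print(compose_any_order(IOTA, ACUTE + DIAERESIS) == "\u0390")
```

With at most two or three dead keys per sequence the factorial cost stays tiny, which is presumably why brute force is acceptable here.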
Pretty cool. Do you know how much this will reduce the size of the updated tables ?
*** Bug 504383 has been marked as a duplicate of this bug. ***
(In reply to comment #48) > Pretty cool. > > Do you know how much this will reduce the size of the updated tables ? > The algorithmic function will reduce the size of the updated tables by almost 30% (~21KB) compared to the "unoptimised" solution. Thus, the addition of Tor's algorithmic function and leaving the rest unoptimised would create a table of about 46KB (compared to the totally flat table of about 68KB). If we also add the optimised solution you mentioned above for the sparse table, we shave off a further 17KB. I do not take into account for now compose sequences with guint32 elements; they add to complexity and there are no layouts that use them yet. I prefer to add them at a later date.
Created attachment 102700 [details] [review] Patch on top of Tor's patch to handle compose sequences algorithmically It has been modified to work well for Greek Polytonic; other scripts may vary. Xorg has just added dead_dasia and dead_psili so the Polytonic compose sequences do not need to re-use dead_ogonek (Polish) and dead_horn; something that causes conflict in the algorithmic function between the scripts. A patch has been submitted to include dead_perispomeni (we abuse dead_tilde in the patch) in Xorg which would let the algorithmic function cover all Greek Polytonic compose sequences, https://bugs.freedesktop.org/show_bug.cgi?id=14013 One can use this code to test other keyboard layouts, as long as they do not require dead_ogonek, dead_horn and dead_tilde.
Created attachment 102702 [details] Script to parse the Xorg compose file, calculate memory savings, verify algorithmic function, etc. This is a Python rewrite of the previous Perl script in this bug report. Input: a) Compose file en_US.UTF-8 from Xorg b) keysym to Unicode mapping, now using Markus Kuhn's list instead of gdkkeysyms.h Output: a) For each compose sequence in the Xorg Compose file, apply the algorithmic function (create Unicode sequence, normalize, check if it creates precomposed character) b) For the remaining compose sequences, put in a list, sort according to the order described at #42 and calculate roughly the savings.
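The first output step described here, deciding which sequences the algorithmic function already covers, might look roughly like the sketch below. It is not the attached compose-parse.py: the tuple layout (combining marks, base character, expected result), the function names and the three sample sequences are assumptions made for illustration.

```python
import unicodedata


def is_algorithmic(marks, base, expected):
    """True if NFC of base plus marks already yields the expected result."""
    return unicodedata.normalize("NFC", base + marks) == expected


def partition(sequences):
    """Split sequences into those handled by code and those needing a table."""
    algorithmic, table = [], []
    for seq in sequences:
        if is_algorithmic(*seq):
            algorithmic.append(seq)
        else:
            table.append(seq)
    return algorithmic, table


# Illustrative sequences: (combining marks, base character, expected result).
SAMPLE = [
    ("\u0301", "a", "\u00e1"),       # acute + a
    ("\u0308", "\u03b9", "\u03ca"),  # diaeresis + Greek iota
    ("\u0302", "1", "\u00b9"),       # circumflex + 1 gives superscript one
]

if __name__ == "__main__":
    algo, rest = partition(SAMPLE)
    print(len(algo), "algorithmic,", len(rest), "left for the table")
```

Only the second list would then go through the sorting and size estimation, so its length drives the final table size.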
Created attachment 104041 [details] Updated Python script that parses the Xorg compose file, provides stats, verifies algo-function, etc

$ ./compose-parse.py
compose-parse available parameters:
  -h, --help            this craft
  -s, --statistics      show overall statistics (both algorithmic, non-algorithmic)
  -a, --algorithmic     show sequences saved with algorithmic optimisation
  -g, --gtk             show entries that go to GTK+
  -u, --unicodedatatxt  show compose sequences derived from UnicodeData.txt (from unicode.org)
  -v, --verbose         show verbose output
  -p, --plane1          show plane1 compose sequences
  -n, --numeric         when used with --gtk, create file with numeric values only
  -e, --gtk-expanded    when used with --gtk, create file that repeats first column; not usable in GTK+
Default is to show statistics.

$ ./compose-parse.py
Total number of compose sequences (from file) : 5020
  of which can be expressed algorithmically : 1201
  of which cannot be expressed algorithmically : 3819
  of which have Multi_key : 3381

Algorithmic (stats for Xorg Compose file)
Number of sequences off due to algo from file (len(array)) : 1201
Number of sequences off due to algo (uniq(sort(array))) : 805
  of which are for Greek : 176

Unicode statistics from UnicodeData.txt
Number of entries that can be algorithmically produced : 925
  of which are for Greek : 239
Number of compose sequence combinations requiring : 1323
  of which are for Greek : 521
Note: We do not include partial compositions, thus the slight discrepancy in the figures

Non-algorithmic (stats from Xorg Compose file)
Number of sequences left : 3819
Flat array looks like : 3819 rows of 6 integers (2 bytes per int, or 12 bytes per row)
Flat array would have taken up (in bytes) : 45828 bytes from the GTK+ library
Number of items (i.e. ints) in flat array : 22914
  of which are zeroes : 9350 or 40%
Number of different first items : 22
Number of max bytes (if using flat array) : 45828
Number of savings : 18480
Memory needs if both algorithmic+optimised table in latest Xorg compose file : 27348

Existing (old) implementation in GTK+
Number of sequences in old gtkimcontextsimple.c : 691
The existing (old) implementation in GTK+ takes up : 16584 bytes
$ _
-----------------
This is the updated compose-parse.py file that automates some of the tasks of the processing of the compose file,
a. provides statistics on the benefits of the algorithmic approach on the Xorg compose file,
b. uses UnicodeData.txt (from unicode.org) to calculate the full benefit of the algorithmic approach
c. outputs the optimized table that GTK+ needs for non-algorithmic sequences
The executive summary is that with the suggested implementation (optimised table, as described by Matthias and algorithmic function, as described by Tor), the GTK+ compose sequence table increases in size by 11KB (from 16KB to 27KB), and still it covers the full Xorg compose file sequences.
Created attachment 104043 [details] [review] Patch for gtkimcontextsimple.c to enable optimized/algorithmic Patch applies to GTK+ HEAD. Contains a. patch to gtk/gtkimcontextsimple.c; b. new file gtk/gtkimcontextsimpleseqs.c; optimised table with compose sequences c. patch to gdk/gdkkeysyms.h (required for the version of this table) 1. Tested on Ubuntu Linux, äãâáạȧṗ, ⒼⓃⓄⓂⒺ 2. Requires testing on Win32 (algorithmic, does ´ + ¨ + ι == ΐ ;) 3. Greek Polytonic works apart from dead_psili, dead_dasia (keysyms will be available in new Xorg; did not add them anyway in the GTK+ compose table at this stage). Greek perispomeni works though with entries in GTK+ compose table (normally conflicts with dead_tilde). 4. GTK+ has support (function: gtk_im_context_simple_add_table()) to append to the compose table; I did not fix this functionality at this stage. Any comments would be greatly appreciated.
*** Bug 333710 has been marked as a duplicate of this bug. ***
*** Bug 162845 has been marked as a duplicate of this bug. ***
Simos, this looks very impressive indeed. Here is my take on what is needed to get this over the finish line: 1) reinstate the old check_table function, and use it on tables added by gtk_im_context_simple_add_table(), move the code that works on the compact tables to some new check_compact_table function and use that on gtk_compose_seqs. 2) remove all the debug printfs 3) coding style fixes: no // comments
Created attachment 106472 [details] [review] Patch to gtk+ (HEAD) to update compose table Affects four files, 1) Updated gtk+/gdk/gdkkeysyms.h (using gdkkeysyms-update.pl found in same dir) 2) Updated gtk+/gtk/gtkimcontextsimple.c 3) New file gtk+/gtk/gtkimcontextsimpleseqs.h (sequences now go here) 4) New file gtk+/gtk/compose-parse.py (script that auto-updates sequences). Tested with latin extended (¨~´^`˚¯˝ˇ˘, 12 dead keys), greek polytonic (10 dead keys). I attended to all three comments above (though I did not test the functionality of adding custom compose tables).
+ """ Grabs and opens the keysyms.txt file that Markus Khun maintains """ His name is Markus Kuhn, I believe. My build runs into the following: gtkimcontextsimple.c:62: error: 'gtk_compose_seqs_compact' undeclared here (not in a function) It seems that should be gtk_compose_seqs_optimised. The few simple tests that I did seemed to work. I assume you have given it some more extensive testing. Let's get this committed to trunk, and ask for more widespread testing on the mailing list. Does that sound like a good plan ?
Sounds great. I updated the surname of Markus and fixed the table name (using gtk_compose_seqs_compact[]). "gtk_compose_seqs_optimised" was the previous name of the table which I think was not a good choice (optimised vs optimized). I used the old file when producing the patch. I committed the patch with these changes. I'll mail the mailing list for more testing.
Created attachment 107326 [details] [review] Patch to gtk+ (HEAD) to update compose table (fixes one error, typos) Updated patch which corresponds to what was committed. There are four more occurrences of misspellings of the name of Markus (it's Markus Kuhn) which this patch fixes, but I will commit to SVN on the next opportunity. I requested testing of this patch on the gtk-i18n-list, at http://blogs.gnome.org/simos/2008/03/05/testing-the-updated-im-support-in-gtk/ and http://simos.info/blog/archives/661
*** Bug 88639 has been marked as a duplicate of this bug. ***
*** Bug 324021 has been marked as a duplicate of this bug. ***
I am closing this report as the patch has been submitted. I suppose that is ok. To summarize, a call for testing has been sent to a. Use JhBuild to create a custom http://blogs.gnome.org/simos/2008/03/05/testing-the-updated-im-support-in-gtk/ b. Creating patched .deb packages for Ubuntu http://simos.info/blog/archives/661 c. Email at gtk-i18n-list.
Heya!. I have a little peeve with the update: I can no longer use <compose><-><n> to produce an ñ. Now I have to use altgr+] which produces an ~; this is on a UK configured keyboard. I guess it was removed from one of the sources used in the update script and hence the new version of the file does not have it. Can it be added? How? Should I bug someone else? It's a big regression for me since the ~ key is far away from the typing area while the - key is just next to it. I can't speak for others but there's a chance of other people using compose like <-><n> for ñ. Don't kill the cute ñ.
(In reply to comment #65) > Heya!. I have a little peeve for the update, I can no longer use > <compose><-><n> to produce an ñ. Now I have to use altgr+] which produces an > ~, this is in an UK configured keyboard. > I guess it was removed from one of the sources used in the update script and > hence the new version of the file does not have it. Can it be added? How? > Should I bug someone else? I couldn't find the composition <compose><-><n> in Xorg. Since gtk keeps in sync with X, you probably need to track this bug both here and in freedesktop.
(In reply to comment #66) > (In reply to comment #65) > > Heya!. I have a little peeve for the update, I can no longer use > > <compose><-><n> to produce an ñ. Now I have to use altgr+] which produces an > > ~, this is in an UK configured keyboard. > > I guess it was removed from one of the sources used in the update script and > > hence the new version of the file does not have it. Can it be added? How? > > Should I bug someone else? > > I couldn't find the the composition <compose><-><n> in Xorg. Since gtk keeps > in sync with X, you probably need to track this bug both here and in > freedesktop. > That is correct. You would need to file a bug report at freedesktop.org (bugs.freedesktop.org), product xorg, component Lib/Xlib. You may also CC: me. My understanding is that "Compose + -" is more intuitive to be connected to sequences for the macron, as in āēūī which are already available. You can get ñ with Compose + ~ as well, which I use often for the "gb" basic layout.
There is an N with a macron below in Unicode: Ṉ (U+1E48). The guy will probably need to argue why composing to Ñ is better.