GNOME Bugzilla – Bug 104976
pango crashes with gtk IM
Last modified: 2004-12-22 21:47:04 UTC
I have written a Bengali gtk IM (based on imcyrillic-translit.c), but it crashes in pango when using the latest cvs (glib, gtk+, pango, everything from cvs). However, it works fine when compiled on a stock RH8.0 system. The problem seems to be with Bengali vowel signs only, as everything else (even vowel letters) works fine. FYI, everything is fine when the same vowel signs are typed directly through xkb.

Steps to reproduce the problem:
1. Open gedit (or any other gtk+ based app), and choose Bengali input method
2. Type 'k' and 'a'
3. app seg faults

Actual Results: seg fault

Expected Results: show unicode character 0x9BE

How often does this happen? every time

Additional Information:
Here is the code for the Bengali IM (this is a very simplified version, but even this crashes)

#include <string.h>

#include <gdk/gdkkeysyms.h>

#include <gtk/gtkimcontextsimple.h>
#include <gtk/gtkimmodule.h>

GType type_bengali_translit = 0;

static void bengali_translit_class_init (GtkIMContextSimpleClass *class);
static void bengali_translit_init (GtkIMContextSimple *im_context);

static void
bengali_translit_register_type (GTypeModule *module)
{
  static const GTypeInfo object_info =
  {
    sizeof (GtkIMContextSimpleClass),
    (GBaseInitFunc) NULL,
    (GBaseFinalizeFunc) NULL,
    (GClassInitFunc) bengali_translit_class_init,
    NULL,           /* class_finalize */
    NULL,           /* class_data */
    sizeof (GtkIMContextSimple),
    0,
    (GInstanceInitFunc) bengali_translit_init,
  };

  type_bengali_translit =
    g_type_module_register_type (module,
                                 GTK_TYPE_IM_CONTEXT_SIMPLE,
                                 "GtkIMContextBengaliTranslit",
                                 &object_info, 0);
}

/* The sequences here match the sequences used in the emacs quail
 * mode cyrillic-translit; they allow entering all characters
 * in iso-8859-5
 */
static guint16 bengali_compose_seqs[] = {
  GDK_a, 0,         0, 0, 0, 0x09be,
  GDK_a, GDK_grave, 0, 0, 0, 0x0986,
  GDK_k, 0,         0, 0, 0, 0x0995
};

static void
bengali_translit_class_init (GtkIMContextSimpleClass *class)
{
}

static void
bengali_translit_init (GtkIMContextSimple *im_context)
{
  gtk_im_context_simple_add_table (im_context,
                                   bengali_compose_seqs,
                                   4,
                                   G_N_ELEMENTS (bengali_compose_seqs) / (4 + 2));
}

static const GtkIMContextInfo bengali_translit_info = {
  "beng",                      /* ID */
  "Bengali (Transliterated)",  /* Human readable name */
  "gtk+",                      /* Translation domain */
  "/usr/share/locale",         /* Dir for bindtextdomain (not strictly needed for "gtk+") */
  "bn"                         /* Languages for which this module is the default */
};

static const GtkIMContextInfo *info_list[] = {
  &bengali_translit_info
};

void
im_module_init (GTypeModule *module)
{
  bengali_translit_register_type (module);
}

void
im_module_exit (void)
{
}

void
im_module_list (const GtkIMContextInfo ***contexts,
                int                      *n_contexts)
{
  *contexts = info_list;
  *n_contexts = G_N_ELEMENTS (info_list);
}

GtkIMContext *
im_module_create (const gchar *context_id)
{
  if (strcmp (context_id, "beng") == 0)
    return GTK_IM_CONTEXT (g_object_new (type_bengali_translit, NULL));
  else
    return NULL;
}

Debugging Information:

Backtrace was generated from '/usr/local/gnome/head/INSTALL/bin/gedit'

et_cursor_pos (layout=0x82504c0, index=6, strong_pos=0xbffff4bc, weak_pos=0xbffff4ac) at pango-layout.c:1711
Thread 1 (Thread 1024 (LWP 16844))
Can you figure out the exact set of text that is crashing Pango? Pango has no idea about the input method, so it must depend on the text that the input method is inserting.
First of all, after hacking more on this issue I found that if I don't use the preedit function call of the simple IM context in my hacked version, then everything is okay, except of course that I don't see any preedit string. The crash happens for all the matras of Bengali vowel letters (they are in the range U+09BE to U+09CC) inside the preedit function. I can't pinpoint the exact reason, but it seems the preedit function may be getting confused about the cursor location when a matra is added. As you [Owen] are also the author of the simple IM context, I think you can understand the problem even better.
Created attachment 15563: fix for the crash
The above patch should fix the problem. I was writing a Telugu input module and faced exactly the same problem.

Since the vowel produced by the key 'a' is a tentative string (due to the presence of "a~" as another possibility), it will be underlined by gtkimcontext. This is a vowel that should exist along with a consonant, so the glyph for 'a' will have the same log_cluster as the glyph for 'k', but the attribute iterator will give the ranges 0-3 and 3-6 (assuming the string is produced by these two key presses only). pango_glyph_item_apply_attrs will ask for a split of the 'ka' glyphs at 0-3. Since the glyph for 'a' has log_cluster = 0, pango_glyph_item_split will return NULL, thinking there is no splitting to do. This NULL is prepended, and then a lot of bad things happen. I hope I was clear enough.

But there is another problem. When I type 'ka', the glyphs corresponding to 'a' are underlined, and 'a' is then separated from 'k' until the underline is somehow gone. This should not happen; 'ka' should be underlined together so that 'a' is not separated. I will report this problem as a separate bug.

Ahmed: may I know what kind of keymap you are preparing? ITRANS style or phonetic style?
I'm not sure that the input being fed to Pango here is really sensible. Pango can't underline just half of the 'ka' glyph. But certainly Pango should be robust against such input. The question is, what is the right visual output? Imagine two attributes:

               ABCDkaHIJK
    red fg     -----
    underline       -----

Is the 'ka' glyph red? Underlined? Both? The 'both' answer is what you want in this case, but I don't think it's the logical answer. More logical is, I think:

 "The attributes of a cluster are the attributes of the
  first character in the cluster"

If we go that way, then to get an underline in this case, you'll need a more sophisticated input method that understands the clusters in the text and doesn't commit the text until it has an entire cluster.

I don't understand your second comment:

> But there is another problem. When I type 'ka', the glyphs corresponding
> to 'a' are underlined, and 'a' is then separated from 'k' until the
> underline is somehow gone. This should not happen; 'ka' should be
> underlined together so that 'a' is not separated. I will report this
> problem as a separate bug.

The very fact that this crash is occurring indicates to me that Pango is trying to underline half of a combined glyph.
First of all, "ka" has two chars -> k + a. In my case, what happens is that I commit "k" and set "a" as the pre-edit string. As Sunil mentioned, in real life "a" always comes after a consonant, but for an input method module that is not true. For example, if the user types "kai" then "ai" will commit a different char, but if the user types "kal" then I will commit "a" as "a". I thought the idea of the pre-edit string was to show such intermediate states.

BTW, RH9 crashes just like Debian Woody.

And Sunil, my first target was a phonetic module, but based on the feedback I got from beta testers, they want a configurable module.
Actually, I'm having a lot of trouble understanding how this actually maps onto Bengali, since as I understand it:

 k  - BENGALI_LETTER_KA + BENGALI_SIGN_VIRAMA
 ka - BENGALI_LETTER_KA

So, I don't see how you commit 'k' and not 'a'. I've been trying to discuss this in general terms, but maybe it would be easier to actually be specific about the Unicode characters involved.
I am sorry, I was talking about 'k' and 'a' in terms of the compose sequence I posted originally:

  GDK_a, 0,         0, 0, 0, 0x09be,
  GDK_a, GDK_grave, 0, 0, 0, 0x0986,
  GDK_k, 0,         0, 0, 0, 0x0995

Based on this list, if I type k+a then 0x0995 will be committed for 'k', and 0x09be will be set as pre-edit text for 'a'. If the user types '`' after 'a' then 0x0986 will be committed, otherwise 0x09be. From my experience I know pango can show the glyph for 0x09be by itself (even though, as I mentioned, this does not happen in real life as 0x09be is a dependent vowel), so I don't understand why it is crashing here.

Am I making any sense to you? :)

btw, please ignore my comment about RH9; I am using pango/gtk+ from cvs, so I don't know if stock RH9 has this problem or not.
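To make the table layout concrete, here is a minimal standalone sketch of how rows like these could be matched. It is not GTK+'s lookup code: the key constants (KEY_a and friends), lookup_seq and is_prefix_of_longer_seq are hypothetical names, and the sketch assumes each row holds four key slots followed by the high and low 16-bit words of the resulting code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GDK keysyms used in the report. */
enum { KEY_grave = 0x60, KEY_a = 0x61, KEY_k = 0x6b };

#define MAX_SEQ_LEN 4               /* key slots per row */
#define ROW_LEN (MAX_SEQ_LEN + 2)   /* plus high and low result words */

static const uint16_t compose_seqs[] = {
  KEY_a, 0,         0, 0, 0, 0x09be,
  KEY_a, KEY_grave, 0, 0, 0, 0x0986,
  KEY_k, 0,         0, 0, 0, 0x0995
};

#define N_ROWS (sizeof compose_seqs / sizeof compose_seqs[0] / ROW_LEN)

/* Return the code point for an exact match of keys[0..n_keys), or 0. */
static uint32_t
lookup_seq (const uint16_t *keys, size_t n_keys)
{
  assert (n_keys <= MAX_SEQ_LEN);
  for (size_t r = 0; r < N_ROWS; r++)
    {
      const uint16_t *row = compose_seqs + r * ROW_LEN;
      size_t i = 0;

      while (i < n_keys && row[i] == keys[i])
        i++;
      /* Exact match: every typed key agrees and the row ends there. */
      if (i == n_keys && (i == MAX_SEQ_LEN || row[i] == 0))
        return ((uint32_t) row[MAX_SEQ_LEN] << 16) | row[MAX_SEQ_LEN + 1];
    }
  return 0;
}

/* Return 1 if some row continues past keys[0..n_keys), which is the
 * situation where the result so far has to stay tentative (pre-edit). */
static int
is_prefix_of_longer_seq (const uint16_t *keys, size_t n_keys)
{
  if (n_keys >= MAX_SEQ_LEN)
    return 0;
  for (size_t r = 0; r < N_ROWS; r++)
    {
      const uint16_t *row = compose_seqs + r * ROW_LEN;
      size_t i = 0;

      while (i < n_keys && row[i] == keys[i])
        i++;
      if (i == n_keys && row[n_keys] != 0)
        return 1;
    }
  return 0;
}
```

With this table, 'k' matches 0x0995 and is not a prefix of anything longer, so it can be committed at once. 'a' matches 0x09be but is also a prefix of the 'a' + '`' row, which is why it stays in the pre-edit string.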
> From my experience I know pango can show the glyph for 0x9be by itself
> (even though as I mentioned this does not happen in real life as 0x9be
> is a dependent vowel), so I don't understand why it is crashing here.

The reason it is crashing is that while you may see the U+0995 U+09BE (BENGALI LETTER KA + BENGALI VOWEL SIGN AA) combination as simply two characters next to each other, Pango considers it a "logical cluster". The reason for that is probably more apparent if you consider U+09BF (BENGALI VOWEL SIGN I), which goes to the left of the base character, or U+09C1 (BENGALI VOWEL SIGN U), which goes under the character.

Pango currently crashes if you underline only part of a logical cluster, which is definitely a bug. But as I was saying above, I think the right behavior is to use the underline status from the first character in the cluster, which wouldn't work properly for your input method.

My only suggestion for the input method is that you don't base it on GtkIMContextSimple, and hold off on committing the cluster until it is complete. Then you can underline the cluster or not as you desire.
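As a rough illustration of "hold off on committing the cluster until it is complete", here is a minimal standalone sketch. It is only an assumption about how such an input method might buffer text, not code from any existing module: ClusterBuffer, cluster_buffer_push and is_dependent_vowel_sign are hypothetical names, and the cluster rule is deliberately simplified to "a vowel sign in U+09BE..U+09CC joins the preceding character" (a real module would also have to handle virama, nukta and other combining signs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

typedef struct {
  uint32_t pending[MAX_PENDING];  /* code points of the unfinished cluster */
  size_t   n_pending;
} ClusterBuffer;

/* Range of Bengali vowel signs as given in this report; a few code
 * points inside it are unassigned, which this sketch ignores. */
static int
is_dependent_vowel_sign (uint32_t ch)
{
  return ch >= 0x09BE && ch <= 0x09CC;
}

/* Feed one code point. If it starts a new cluster, the previous cluster
 * is copied to out (room for MAX_PENDING entries) and its length is
 * returned so the caller can commit it. Otherwise 0 is returned and the
 * code point joins the pending cluster, which the caller would show as
 * the pre-edit string. */
static size_t
cluster_buffer_push (ClusterBuffer *buf, uint32_t ch, uint32_t *out)
{
  size_t n_out = 0;

  assert (buf != NULL && out != NULL);
  if (!is_dependent_vowel_sign (ch)
      || buf->n_pending == 0
      || buf->n_pending == MAX_PENDING)
    {
      /* ch begins a new cluster: hand the finished one to the caller. */
      for (n_out = 0; n_out < buf->n_pending; n_out++)
        out[n_out] = buf->pending[n_out];
      buf->n_pending = 0;
    }
  buf->pending[buf->n_pending++] = ch;
  return n_out;
}
```

With this approach the committed text never ends in the middle of a cluster, so an underline on the pre-edit string always covers whole clusters.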
> "The attributes of a cluster are the attributes of the
> first character in the cluster"

Perfect. This is what one would expect pango to do. This was my concern in the second paragraph of my earlier post. When the layout underlines the preedit text, if it considers the logical clusters and underlines the entire cluster instead of just the preedit string, it will also solve the problem and make writing the imcontexts much simpler.

I am already working on an IM which will not be based on GtkIMContextSimple (due to other limitations). I shall try to hold on to the entire cluster and not commit it, in case you feel it is not correct for the layout to underline the entire cluster.

Coming to the actual problem: once I applied the above patch to fix the crash, I observed that pango_glyph_item_apply_attrs is (correctly) not splitting the cluster even if only part of it has an underline. But somewhere else the split is again being performed, and the underlining is still happening on the split cluster, which is not right. Where can that be?
Created attachment 16922: Tiny test case (needs Devanagari fonts)
I don't think you'd actually get the entire cluster underlined, because the first character in the cluster has already been committed, so *nothing* will be underlined.

I'm still not completely sure what the right fix here is, though the "properties of the first character are properties of the cluster" approach is probably the one I'll take for Pango-1.2.x.

An example of where it falls over is if you wanted to display the label with "mnemonic" f_inish, so that "i" is underlined. In fine roman typography, however, there is a ligature between f and i, so fi will be one logical cluster.

Fixing that would almost certainly require some sort of Pango API addition. (Possibly at the same time, one would want to try to handle partial character selections better than we do currently.)
See bug 113931 for some more thoughts.
Created attachment 16968: Apply all attrs that touch a cluster
With bug 113931 in mind, I decided to go with the rule:

 "The attributes of a cluster are the union of all attributes
  that apply to any character in the cluster"

This will allow us, in the future, to look at ranges in attributes and, for instance, underline only part of a cluster. As a side effect, it will do what you want for your input method.
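The difference between this rule and the earlier "first character" proposal can be shown with a small standalone sketch. This is not the Pango patch itself; AttrRange, the flag names and both helper functions are hypothetical, and attributes are reduced to a bitmask over byte ranges purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { ATTR_RED_FG = 1 << 0, ATTR_UNDERLINE = 1 << 1 };

typedef struct {
  int      start;  /* byte offset, inclusive */
  int      end;    /* byte offset, exclusive */
  unsigned flags;  /* attributes active over [start, end) */
} AttrRange;

/* Union rule: collect every attribute whose range overlaps the cluster. */
static unsigned
cluster_attrs_union (const AttrRange *attrs, size_t n_attrs,
                     int cluster_start, int cluster_end)
{
  unsigned flags = 0;

  assert (cluster_start < cluster_end);
  for (size_t i = 0; i < n_attrs; i++)
    if (attrs[i].start < cluster_end && attrs[i].end > cluster_start)
      flags |= attrs[i].flags;
  return flags;
}

/* First-character rule: only attributes covering the first byte count. */
static unsigned
cluster_attrs_first_char (const AttrRange *attrs, size_t n_attrs,
                          int cluster_start)
{
  unsigned flags = 0;

  for (size_t i = 0; i < n_attrs; i++)
    if (attrs[i].start <= cluster_start && cluster_start < attrs[i].end)
      flags |= attrs[i].flags;
  return flags;
}
```

For the input method case in this bug, the committed consonant occupies bytes 0-3 with no attributes and the pre-edit vowel sign occupies bytes 3-6 with an underline, forming one cluster over 0-6. The union rule gives the whole cluster the underline, while the first-character rule would give it nothing.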
Committed to CVS, testing appreciated.

Thu May 29 18:37:58 2003  Owen Taylor  <otaylor@redhat.com>

        * pango/pango-glyph-item.c (pango_glyph_item_apply_attrs):
        When applying attribute to a glyph item, handle attributes
        that split clusters by giving the cluster all the attributes
        that apply to it. (Previously caused a crash, #104976
        Taneem Ahmed, Sunil Mohan Adapa)
Created attachment 16971: Small fix needed on top of last patch
Yup, it is working fine. Thanks for the fix; this was giving us problems with the Bengali GNOME translation too. Translators were adding "_" without much consideration, and ended up adding it in front of vowel signs which have both pre and post matra forms. Thanks!

PS: should I close this bug, or does someone else need to do that?
I already closed the bug earlier.