Bug 104976 – pango crashes with gtk IM


Summary:	pango crashes with gtk IM


Status:	RESOLVED FIXED

Product:	pango
Classification:	Platform
Component:	general
Version:	1.2.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	Owen Taylor
QA Contact:	Owen Taylor

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-02-01 11:20 UTC by Taneem Ahmed
Modified:	2004-12-22 21:47 UTC

See Also:
GNOME target:	---
GNOME version:	2.1/2.2

Attachments
fix for the crash (1.06 KB, patch) 2003-04-08 16:56 UTC, Sunil Mohan Adapa
Tiny test case (needs Devanagari fonts) (406 bytes, patch) 2003-05-28 21:59 UTC, Owen Taylor
Apply all attrs that touch a cluster (9.96 KB, patch) 2003-05-29 22:20 UTC, Owen Taylor
Small fix needed on top of last patch (840 bytes, patch) 2003-05-30 00:09 UTC, Owen Taylor

Description Taneem Ahmed 2003-02-01 11:20:02 UTC

I have written a Bengali gtk IM (based on imcyrillic-translit.c), but it is
crashing in pango when using the latest cvs (glib, gtk+, pango, everything
from cvs). However, it works fine when compiled on a stock RH8.0 system.

The problem seems to be with Bengali vowel signs only, as everything else
(even vowel letters) works fine.

FYI, everything is fine when the same vowel signs are typed directly
through xkb.

Steps to reproduce the problem:
1. Open gedit (or any other gtk+ based app), and choose the Bengali input method
2. Type 'k' and 'a'
3. The app segfaults

Actual Results:
seg fault

Expected Results:
show unicode character 0x9BE

How often does this happen?
every time

Additional Information:
Here is the code for the Bengali IM (this is a very simplified version, but
even this crashes)

#include <string.h>   /* strcmp, used in im_module_create */

#include <gdk/gdkkeysyms.h>
#include <gtk/gtkimcontextsimple.h>
#include <gtk/gtkimmodule.h>

GType type_bengali_translit = 0;

static void bengali_translit_class_init (GtkIMContextSimpleClass *class);
static void bengali_translit_init (GtkIMContextSimple *im_context);

static void
bengali_translit_register_type (GTypeModule *module)
{
  static const GTypeInfo object_info =
  {
    sizeof (GtkIMContextSimpleClass),
    (GBaseInitFunc) NULL,
    (GBaseFinalizeFunc) NULL,
    (GClassInitFunc) bengali_translit_class_init,
    NULL,           /* class_finalize */
    NULL,           /* class_data */
    sizeof (GtkIMContextSimple),
    0,
    (GInstanceInitFunc) bengali_translit_init,
  };

  type_bengali_translit =
    g_type_module_register_type (module,
                                 GTK_TYPE_IM_CONTEXT_SIMPLE,
                                 "GtkIMContextBengaliTranslit",
                                 &object_info, 0);
}

/* The sequences here match the sequences used in the emacs quail
 * mode cryllic-translit; they allow entering all characters
 * in iso-8859-5
 */
static guint16 bengali_compose_seqs[] = {
  GDK_a, 0, 0, 0, 0, 0x09be,
  GDK_a, GDK_grave, 0, 0, 0, 0x0986,
  GDK_k, 0, 0, 0, 0, 0x0995
};

static void
bengali_translit_class_init (GtkIMContextSimpleClass *class)
{
}

static void
bengali_translit_init (GtkIMContextSimple *im_context)
{
  gtk_im_context_simple_add_table (im_context,
                                   bengali_compose_seqs,
                                   4,
                                   G_N_ELEMENTS (bengali_compose_seqs) / (4 + 2));
}

static const GtkIMContextInfo bengali_translit_info = {
  "beng",                     /* ID */
  "Bengali (Transliterated)", /* Human readable name */
  "gtk+",                     /* Translation domain */
  "/usr/share/locale",        /* Dir for bindtextdomain (not strictly needed for "gtk+") */
  "bn"                        /* Languages for which this module is the default */
};

static const GtkIMContextInfo *info_list[] = {
  &bengali_translit_info
};

void
im_module_init (GTypeModule *module)
{
  bengali_translit_register_type (module);
}

void
im_module_exit (void)
{
}

void
im_module_list (const GtkIMContextInfo ***contexts,
                int                      *n_contexts)
{
  *contexts = info_list;
  *n_contexts = G_N_ELEMENTS (info_list);
}

GtkIMContext *
im_module_create (const gchar *context_id)
{
  if (strcmp (context_id, "beng") == 0)
    return GTK_IM_CONTEXT (g_object_new (type_bengali_translit, NULL));
  else
    return NULL;
}

Debugging Information:

Backtrace was generated from '/usr/local/gnome/head/INSTALL/bin/gedit'

et_cursor_pos (layout=0x82504c0, index=6, strong_pos=0xbffff4bc,
    weak_pos=0xbffff4ac) at pango-layout.c:1711


Thread 1 (Thread 1024 (LWP 16844))

#0 wait4
from /lib/libc.so.6
#1 __check_rhosts_file
from /lib/libc.so.6
#2 waitpid
from /lib/libpthread.so.0
#3 libgnomeuiPangoAttrIterator *) 0x825bcc0
#9 pango_layout_get_iter
at pango-layout.c line 4069
#10 pango_layout_index_to_line_and_extents
at pango-layout.c line 1094
#11 pango_layout_get_cursor_pos
at pango-layout.c line 1711
#12 add_cursor
at gtktextlayout.c line 1436
#14 gtk_text_layout_real_wrap
at gtktextlayout.c line 937
#15 gtk_text_layout_wrap
at gtktextlayout.c line 554
#16 _gtk_text_btree_validate_line
at gtktextbtree.c line 5199
#17 gtk_text_layout_validate_yrange
at gtktextlayout.c line 853
#18 gtk_text_view_flush_scroll
at gtktextview.c line 1609
#19 gtk_text_view_flush_first_validate
at gtktextview.c line 3079
#20 first_validate_callback
at gtktextview.c line 3102
#21 g_idle_dispatch
at gmain.c line 3164
#22 g_main_dispatch
at gmain.c line 1653
#23 g_main_context_dispatch
at gmain.c line 2197
#24 g_main_context_iterate
at gmain.c line 2278
#25 g_main_loop_run
at gmain.c line 2498
#26 gtk_main
at gtkmain.c line 1092
#27 main
at gedit2.c line 394

Comment 1 Owen Taylor 2003-02-12 22:25:33 UTC

Can you figure out the exact set of text that is crashing
Pango? Pango has no idea about the input method, so it 
must depend on the text that the input method is inserting.

Comment 2 Taneem Ahmed 2003-02-12 23:17:28 UTC

First of all, after hacking more on this issue I found out that if I 
don't use the preedit function call of simple im context in my hacked 
version, then everything is okay, except of course I don't see any 
preedit string.

The crash happens for all the matras of Bengali vowel letters (they
are in the range U+09BE - U+09CC) inside the preedit function. I
can't pinpoint the exact reason, but it seems like the preedit function
may be getting confused about the cursor location when a matra is
added. As you [Owen] are also the author of the simple im context, I
think you can understand the problem even better.

Comment 3 Sunil Mohan Adapa 2003-04-08 16:56:01 UTC

Created attachment 15563
fix for the crash

Comment 4 Sunil Mohan Adapa 2003-04-08 17:16:54 UTC

The above patch should fix the problem.
I was writing a Telugu input module and hit exactly the same problem.
Because the vowel produced by the key 'a' is only a tentative string
(due to the presence of "a~" as another possibility), it is underlined
by gtkimcontext. This is a vowel that should exist along with a
consonant, so the glyph for 'a' has the same log_cluster as the
glyph for 'k'. But the attribute iterator yields the ranges
0-3 and 3-6 (assuming the string is produced by these
two key combinations only).
pango_glyph_item_apply_attrs asks for a split of the glyph item 'ka'
at 0-3. Since the log_cluster for 'a' is 0,
pango_glyph_item_split returns NULL, thinking
there is no splitting to do. This NULL is prepended and then a lot of
bad things happen.
I hope that was clear enough.
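
The failure mode described above can be modelled with a small stand-alone
sketch. This is not Pango's code: ToyGlyphRun and toy_find_split are
hypothetical names, and the model assumes a left-to-right run in which each
glyph records the byte offset where its cluster starts. U+0995 and U+09BE
each take 3 bytes in UTF-8, which is where the ranges 0-3 and 3-6 come from,
while both glyphs of 'ka' report cluster offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a shaped run (not a Pango type):
 * log_clusters[i] is the byte offset at which glyph i's cluster starts. */
typedef struct {
    const int *log_clusters;
    int        n_glyphs;
} ToyGlyphRun;

/* Returns the index of the first glyph whose cluster starts at or after
 * split_byte, or -1 when there is no such glyph, i.e. the requested split
 * point lies inside the last cluster of the run. */
int toy_find_split (const ToyGlyphRun *run, int split_byte)
{
    assert (run != NULL && run->log_clusters != NULL);

    for (int i = 0; i < run->n_glyphs; i++)
        if (run->log_clusters[i] >= split_byte)
            return i;

    return -1;  /* nothing to split off: the caller must check for this */
}
```

With log_clusters {0, 0} a split at byte 3 yields -1, the analogue of the
NULL return described above. A caller has to treat that as "nothing to
split" instead of using the result.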

But there is another problem. When I type 'ka', the glyphs
corresponding to 'a' are underlined and 'a' is then separated from
'k' until the underline is somehow gone. This should not happen; 'ka'
should be underlined together so that 'a' is not separated. I
will report this problem as a separate bug.

Ahmed: may I know what kind of keymap you are preparing? ITRANS style
or the phonetic style?

Comment 5 Owen Taylor 2003-04-15 16:42:07 UTC

I'm not sure that the input being fed to Pango here is
really sensible. Pango can't underline just half of the
'ka' glyph.

But certainly Pango should be robust against such input.
The question is, what is the right visual output? Imagine
two attributes:

       ABCDkaHIJK
red fg -----
            ----- underline

Is the 'ka' glyph red? underlined? both? The 'both' answer
is what you want in this case, but I don't think it's the
logical answer. More logical, is, I think:

 "The attributes of a cluster are the attributes of the
  first character in the cluster"

If we go that way, then to get an underline in this case,
you'll need a more sophisticated input method that understands
the clusters in the text and doesn't commit the text
until it has an entire cluster.

I don't understand your second comment:

 But there is another problem. When I type 'ka', the glyphs
 corresponding to 'a' are underlined and 'a' is then separated from
 'k' until the underline is somehow gone. This should not happen; 'ka'
 should be underlined together so that 'a' is not separated. I
 will report this problem as a separate bug.

The very fact that this crash is occurring indicates to me
that Pango is trying to underline half of a combined
glyph.

Comment 6 Taneem Ahmed 2003-04-16 07:27:11 UTC

First of all, "ka" has two chars -> k + a. In my case, what happens is that I
commit "k", and set "a" as the pre-edit string. As Sunil mentioned, in real life "a"
always follows a consonant, but for an input method module that is not true. For
example, if the user types "kai" then "ai" will commit a different char, but if the user types
"kal" then I will commit "a" as "a". I thought the idea of the pre-edit string was to show
such intermediate states.

BTW, RH9 crashes just like Debian Woody.

And Sunil, my first target was a phonetic module, but based on the feedback I got
from beta testers, they want a configurable module.

Comment 7 Owen Taylor 2003-04-16 15:07:00 UTC

Actually, I'm having a lot of trouble understanding
how this actually maps onto Bengali, since as I
understand it:

  k -  BENGALI_LETTER_KA + BENGALI_SIGN_VIRAMA
  ka - BENGALI_LETTER_KA

So, I don't see how you commit 'k' and not 'a'. I've
been trying to discuss this in general terms, but maybe
it would be easier to actually be specific about
the Unicode characters involved.

Comment 8 Taneem Ahmed 2003-04-16 20:03:32 UTC

I am sorry, I was talking about 'k' and 'a' in terms of the compose 
sequence I posted originally:

  GDK_a, 0, 0, 0, 0, 0x09be,
  GDK_a, GDK_grave, 0, 0, 0, 0x0986,
  GDK_k, 0, 0, 0, 0, 0x0995

Based on this list, if I type k+a then 0x0995 will be committed for
'k', and 0x09be will be set as pre-edit text for 'a'. If the user types '`'
after 'a' then 0x0986 will be committed, otherwise 0x09be. From my
experience I know Pango can show the glyph for 0x09be by itself (even
though, as I mentioned, this does not happen in real life as 0x09be is a
dependent vowel), so I don't understand why it is crashing here. Am I
making any sense to you? :)
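
A toy model of the tentative-match behaviour described here, assuming ASCII
codes in place of GDK keysyms and the same 4-key rows as the table above.
ToySeqState and toy_classify are hypothetical names; this is only a sketch of
the outcome for each key sequence, not how GtkIMContextSimple is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SEQ_LEN 4
#define TOY_ROW_LEN     (TOY_MAX_SEQ_LEN + 2)   /* keys + 2 value slots */

typedef enum {
    TOY_NO_MATCH,   /* no row starts with the typed keys */
    TOY_PENDING,    /* only longer rows start with them: wait for more keys */
    TOY_TENTATIVE,  /* complete row, but a longer row also matches: preedit */
    TOY_COMMIT      /* complete row and nothing longer: commit right away */
} ToySeqState;

/* Same rows as the table in the report; ASCII codes stand in for keysyms. */
static const unsigned short toy_seqs[] = {
    'a', 0,   0, 0,  0, 0x09be,
    'a', '`', 0, 0,  0, 0x0986,
    'k', 0,   0, 0,  0, 0x0995
};

ToySeqState toy_classify (const unsigned short *typed, int n_typed)
{
    size_t n_rows = sizeof toy_seqs / sizeof toy_seqs[0] / TOY_ROW_LEN;
    int exact = 0, longer = 0;

    assert (n_typed >= 1 && n_typed <= TOY_MAX_SEQ_LEN);

    for (size_t r = 0; r < n_rows; r++) {
        const unsigned short *row = toy_seqs + r * TOY_ROW_LEN;
        int i = 0;

        while (i < n_typed && row[i] == typed[i])
            i++;
        if (i < n_typed)
            continue;           /* row does not start with the typed keys */

        if (n_typed == TOY_MAX_SEQ_LEN || row[n_typed] == 0)
            exact = 1;          /* row ends exactly here */
        else
            longer = 1;         /* row needs more keys */
    }

    if (exact)
        return longer ? TOY_TENTATIVE : TOY_COMMIT;
    return longer ? TOY_PENDING : TOY_NO_MATCH;
}
```

In this model 'k' classifies as TOY_COMMIT, while 'a' is TOY_TENTATIVE
because the row for 'a' followed by '`' is still possible. That is the state
in which U+09BE sits in the preedit string right after an already committed
U+0995.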

btw, please ignore my comment about RH9, I am using pango/gtk+ from 
the cvs so I don't know if stock RH9 has this problem or not.

Comment 9 Owen Taylor 2003-04-17 14:18:02 UTC

> From my 
> experience I know pango can show the glyph for 0x9be by itself (even 
> though as I mentioned this does not happen in real life as 0x9be is a 
> dependent vowel), so I don't understand why it is crashing here.

The reason why it is crashing is that while you may see the

 U+0995 U+09BE (BENGALI LETTER KA + BENGALI VOWEL SIGN AA)

combination as simply two characters next to each other, 
Pango considers it a "logical cluster"; the reason for that
is probably more apparent if you consider the case of
U+09BF (BENGALI VOWEL SIGN I) which goes on the left of
the base character, or U+09C1 (BENGALI VOWEL SIGN U) which goes
under the character.

And Pango currently has a bug where if you underline only
part of a logical cluster, it crashes. Which is definitely
a bug. But as I was saying above, I think the right behavior
is to use the underline status from the first character
in the cluster. Which wouldn't work properly for your input
method. 

My only suggestion for the input method is that you don't
base it on GtkIMContextSimple, and hold off on committing
the cluster until it is complete. Then, you can underline
the cluster or not as you desire.
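
One way to follow this suggestion, sketched under loud assumptions: the input
method keeps the whole cluster as preedit and only commits it when the next
cluster begins. ToyClusterIm, toy_is_matra and toy_im_feed are hypothetical
names, the matra range is the U+09BE to U+09CC range quoted earlier in this
report, and conjuncts formed with the virama are ignored. A real module would
do this inside its own GtkIMContext subclass.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PENDING 8

typedef struct {
    unsigned int pending[TOY_MAX_PENDING];  /* uncommitted cluster (preedit) */
    int          n_pending;
} ToyClusterIm;

/* Assumption: matras are the range U+09BE..U+09CC quoted in this report. */
static int toy_is_matra (unsigned int cp)
{
    return cp >= 0x09BE && cp <= 0x09CC;
}

/* Feed one code point. A matra joins the pending cluster. Anything else
 * (or a full buffer) starts a new cluster: the previous one is copied to
 * out, which must hold TOY_MAX_PENDING entries, and its length is returned.
 * Returns 0 when nothing is committed. */
int toy_im_feed (ToyClusterIm *im, unsigned int cp, unsigned int *out)
{
    int n_out = 0;
    int joins;

    assert (im != NULL && out != NULL);

    joins = toy_is_matra (cp)
            && im->n_pending > 0
            && im->n_pending < TOY_MAX_PENDING;

    if (!joins) {
        /* Commit the finished cluster as a whole. */
        n_out = im->n_pending;
        memcpy (out, im->pending, (size_t) n_out * sizeof *out);
        im->n_pending = 0;
    }

    im->pending[im->n_pending++] = cp;
    return n_out;
}
```

Feeding U+0995 and then U+09BE commits nothing and leaves both code points
pending, so an underline on the preedit text covers the whole cluster. The
pair is committed together once the next base character arrives.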

Comment 10 Sunil Mohan Adapa 2003-04-17 14:59:41 UTC

 > "The attributes of a cluster are the attributes of the
 >  first character in the cluster"

Perfect. This is what one would expect Pango to do. This was my concern
in the second paragraph of my earlier post. If the layout, when underlining
the preedit text, considers the logical clusters and underlines the
entire cluster instead of just the preedit string, that will
also solve the problem and make writing imcontexts much
simpler. I am already working on an IM which does not base itself on
"GtkIMContextSimple" (due to other limitations). I shall try to hold
on to the entire cluster and not commit it, in case you feel it is not
correct for the layout to underline the entire cluster.

Coming to the actual problem: once I applied the above patch to
fix the crash, I observed that pango_glyph_item_apply_attrs
(correctly) does not split the cluster even if only part of it has an
underline. But somewhere else the split is performed again and
the underlining still happens (with the cluster split, which is
not right). Where can that be?

Comment 11 Owen Taylor 2003-05-28 21:59:58 UTC

Created attachment 16922
Tiny test case (needs Devanagari fonts)

Comment 12 Owen Taylor 2003-05-28 22:14:31 UTC

I don't think you'd actually get the entire cluster underlined ...
because the first character in the cluster has already
been committed, so *nothing* will be underlined.

I'm still not completely sure what the right fix here is,
though the "properties of the first character are properties
of the cluster" approach is probably the one I'll take
for Pango-1.2.x.

An example of where it falls over is if you wanted to 
display the label with "mnemonic"

 f_inish

So, "i" is underlined. However, in fine roman typography,
there is a ligature between f and i, so fi will be one
logical cluster.

Fixing that would almost certainly require some sort of
Pango API addition. (Possibly at the same time, one
would want to try and handle partial character selections
better than we do currently.)

Comment 13 Owen Taylor 2003-05-28 22:45:41 UTC

See bug 113931 for some more thoughts.

Comment 14 Owen Taylor 2003-05-29 22:20:20 UTC

Created attachment 16968
Apply all attrs that touch a cluster

Comment 15 Owen Taylor 2003-05-29 22:22:43 UTC

With bug 113931 in mind, I decided to go with the rule:

 "The attributes of a cluster are the union of
  all attributes that apply to any character in the
  cluster"

This will allow us, in the future, to look at ranges
in attributes and, for instance, underline only part
of a cluster.

As a side effect, it will do what you want for your input
method.
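
The difference between the first-character rule considered earlier in this
report and the union rule adopted here can be sketched as follows. The flag
names and both functions are hypothetical; they operate on per-character flag
words rather than on a PangoAttrList, and a cluster is given as a
[start, end) range of character indices.

```c
#include <assert.h>

enum {
    TOY_ATTR_RED_FG    = 1 << 0,
    TOY_ATTR_UNDERLINE = 1 << 1
};

/* Rule considered first: the cluster takes the flags of its first
 * character only. */
unsigned int toy_cluster_attrs_first (const unsigned int *char_attrs,
                                      int start, int end)
{
    assert (start < end);
    return char_attrs[start];
}

/* Rule adopted: the cluster takes the union of the flags of every
 * character it contains. */
unsigned int toy_cluster_attrs_union (const unsigned int *char_attrs,
                                      int start, int end)
{
    unsigned int flags = 0;

    assert (start < end);
    for (int i = start; i < end; i++)
        flags |= char_attrs[i];

    return flags;
}
```

For the 'ABCDkaHIJK' example from comment 5 (red on characters 0-4, underline
on characters 5-9, cluster 'ka' covering characters 4-5), the first-character
rule gives red only, while the union gives red plus underline.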

Comment 16 Owen Taylor 2003-05-29 22:44:25 UTC

Committed to CVS, testing appreciated.

Thu May 29 18:37:58 2003  Owen Taylor  <otaylor@redhat.com>
 
        * pango/pango-glyph-item.c (pango_glyph_item_apply_attrs):
        When applying attribute to a glyph item, handle attributes
        that split clusters by giving the cluster all the attributes
        that apply to it. (Previously caused a crash, #104976
        Taneem Ahmed, Sunil Mohan Adapa)

Comment 17 Owen Taylor 2003-05-30 00:09:01 UTC

Created attachment 16971
Small fix needed on top of last patch

Comment 18 Taneem Ahmed 2003-06-01 09:25:37 UTC

Yup, it is working fine. Thanks for the fix, this was giving us problems with
the Bengali GNOME translation too. Translators were adding "_" without
much consideration, and ended up adding them in front of vowel signs
which have both pre/post matra forms.
Thanks!

ps. should I close this bug, or does someone else need to do that?

Comment 19 Owen Taylor 2003-06-01 14:06:35 UTC

I already closed the bug earlier.