GNOME Bugzilla – Bug 121675
Crash on startup [accessibility]
Last modified: 2004-12-22 21:47:04 UTC
[Originally reported as http://bugs.debian.org/208991]

From: Sjoerd Simons <sjoerd@luon.net>
Subject: gnumeric: crash on startup
Date: Sat, 06 Sep 2003 20:05:15 +0200

Package: gnumeric
Version: 1.1.20-1
Severity: normal

Hi,

When /desktop/gnome/interface/accessibility is enabled in gconf, gnumeric crashed on startup. With the following errors:

-----
Bonobo accessibility support initialized
GTK Accessibility Module initialized

(gnumeric:7645): GLib-GObject-WARNING **: gsignal.c:1082: unable to lookup signal "link-selected" for non instantiatable type `AtkHypertext'

** (gnumeric:7645): WARNING **: Invalid signal type link-selected

(gnumeric:7645): GLib-GObject-WARNING **: gsignal.c:1082: unable to lookup signal "link_selected" for non instantiatable type `AtkHypertext'
Atk Accessibilty bridge initialized

** ERROR **: error condition on server fd is 0
aborting...
Bonobo accessibility support initialized
GTK Accessibility Module initialized

(gnome_segv:7646): GLib-GObject-WARNING **: gsignal.c:1082: unable to lookup signal "link-selected" for non instantiatable type `AtkHypertext'

** (gnome_segv:7646): WARNING **: Invalid signal type link-selected

(gnome_segv:7646): GLib-GObject-WARNING **: gsignal.c:1082: unable to lookup signal "link_selected" for non instantiatable type `AtkHypertext'
-----

Sjoerd
The problem is reproducible for me with CVS HEAD.
Considering that "link-selected" appears nowhere in gnumeric code this looks like a non-gnumeric bug. Would you be able to create a backtrace for: (gnumeric:7645): GLib-GObject-WARNING **: gsignal.c:1082: unable to lookup signal "link-selected" for non instantiatable type `AtkHypertext' Thanks (I can't reproduce since I am missing modules required to enable accessibility.)
Oh, also: which version of the atk library are you using? (The link-selected signal was added in January 2003, so it seems.)
Okay, the link-selected signal wasn't part of atk until 1.3.x. Since Debian has only been packaging 1.2.x in any of its releases, it is not available. The real question of course is: who is using that signal? It is not gnumeric (at least not directly). So, a backtrace should show us who is using that signal (and shouldn't, or at least should require atk 1.3.x).
This is the bugbuddy backtrace for 1.1.20-2: Backtrace was generated from '/usr/bin/gnumeric' Loading ~/.gdbinit (no debugging symbols found)... [New Thread 16384 (LWP 23345)] (no debugging symbols found)... 0x40cc23d1 in waitpid () from /lib/libpthread.so.0
+ Trace 40014
Thread 1 (Thread 16384 (LWP 23345))
This is with the following accessibility-related packages and versions installed: libatk1.0-0 1.2.4-1, libgail-common 1.2.2-1, libgail-gnome-module 1.0.2-2, libgail17 1.2.2-1, at-spi 1.1.9-1, libatspi1.0-0 1.1.9-1
The only file I can find that refers to the "link-selected" signal is /usr/lib/gtk-2.0/modules/libatk-bridge.so in the at-spi package (version 1.1.9-1). I realise that's a fairly old version of at-spi; there's already a bug on record (http://bugs.debian.org/bug=206545) to get it updated.
If /usr/lib/gtk-2.0/modules/libatk-bridge.so uses the atk signal which wasn't implemented in atk until 1.3.0 we obviously have a problem. The initial warnings all occur inside gnome_program_init so out of the realm of gnumeric. Probably a bug should be filed against /usr/lib/gtk-2.0/modules/libatk-bridge.so if that file in fact uses the link-selected signal with respect to AtkHypertext.
Version skew (or whatever the right term would be): at-spi is in fact too new compared with atk. at-spi 1.1.19 (released mid-February 2003) uses the "link-selected" signal that was introduced in atk CVS in late January. The first atk release including this seems to be 1.3.0 (1.2.4 was released prior to that change). So the bug is really in at-spi for using a signal from an unreleased version of atk. To fix this in Debian, at-spi 1.1.19 should require atk 1.3.0.
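The dependency rule proposed here (the bridge needs atk 1.3.0 or later) can be illustrated with a small version comparison. This is only a sketch: `Version`, `version_parse`, `version_compare` and `atk_satisfies_bridge` are hypothetical names made up for this example, not part of atk, at-spi or any packaging tool, and a real fix would be expressed as a versioned package dependency rather than in code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type: a parsed "major.minor.micro" version. */
typedef struct {
    int major;
    int minor;
    int micro;
} Version;

/* Parse the leading "X.Y.Z" of a version string. Anything after the
 * micro number (for example a Debian revision such as "-1") is ignored.
 * Returns false if the string does not start with three numbers. */
static bool version_parse(const char *text, Version *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%d.%d.%d",
                  &out->major, &out->minor, &out->micro) == 3;
}

/* Returns -1, 0 or 1 when a is older than, equal to, or newer than b. */
static int version_compare(Version a, Version b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.micro != b.micro)
        return a.micro < b.micro ? -1 : 1;
    return 0;
}

/* Assumption taken from the discussion above: "link-selected" first
 * shipped in atk 1.3.0, so a bridge that uses it needs at least that. */
static bool atk_satisfies_bridge(const char *installed_atk)
{
    const Version required = { 1, 3, 0 };
    Version have;

    if (!version_parse(installed_atk, &have))
        return false; /* unparseable version: treat as not satisfied */
    return version_compare(have, required) >= 0;
}
```

With the package versions quoted in this report, `atk_satisfies_bridge("1.2.4-1")` is false, which is the skew being described, while an atk 1.3.0 or 1.4.x install passes the check.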
(changing title to show scope of crash)
I have confirmed that the crash does not occur with atk 1.3.0.
I've done some tests loading gnumeric against atk 1.3 and 1.4 (the gnumeric binary was built against atk 1.2.4) and I can confirm that there is no longer a crash on 'link-selected'. Unfortunately, it appears to have been exchanged for a heisenbug: loading against atk 1.3 or 1.4, gnumeric starts without errors in about 1 in 4 cases; usually it crashes:

zensunni ray 15:07 ../tmp/hack/gnumeric-1.1.90 > env LD_LIBRARY_PATH=/tmp/atk-1.3.0/lib: build/src/gnumeric
Bonobo accessibility support initialized
GTK Accessibility Module initialized
Atk Accessibilty bridge initialized

** ERROR **: error condition on server fd is 0
aborting...
Bonobo accessibility support initialized
GTK Accessibility Module initialized
Atk Accessibilty bridge initialized
application finalize called

All of the crashes have the message "** ERROR **: error condition on server fd is 0" which is from liblinc.
Since we haven't really figured out this bug, I am reopening it.
*** Bug 122138 has been marked as a duplicate of this bug. ***
I missed this little note: "I've done some tests loading gnumeric against atk 1.3 and 1.4 (the gnumeric binary was built against atk 1.2.4)". I have run gnumeric (built against atk 1.3) with atk 1.3 and have none of these crashes. I would therefore suspect that there may be a small change in the atk API. (And considering the original problem in this bug report I would not be surprised.) So unless we can confirm these bugs with a gnumeric built against atk 1.3 and run with atk 1.3, we should probably ignore it.
I am seeing this with: atk-1.4.0-0.ximian.6.1 at-spi-1.3.6-0.ximian.6.1 gail-1.4.0-0.ximian.6.1 gnumeric-1.1.90-0.ximian.6.1 This is the latest stuff from xd2-unstable. So I don't think it's fixed. (My bug report was the dup above, Bug 122138.)
Yep, it's definitely ATK -- I disabled ATK and logged back in, and it fixed the problem (Bug 122138) for me, whereas I had 100% reproducibility before.
*** Bug 122245 has been marked as a duplicate of this bug. ***
The above dup is a seemingly related bug with a different backtrace -- perhaps it will help solve this. (For now I think I'm just going to leave ATK disabled!)
It would be interesting to know against which version of atk gnumeric-1.1.90-0.ximian.6.1 was in fact compiled. I do not have any problems with gnumeric compiled against and used with atk 1.3.0.
i | xd-unstable | atk-devel | 1.4.0-0.ximian.6.1
*** Bug 122545 has been marked as a duplicate of this bug. ***
*** Bug 123743 has been marked as a duplicate of this bug. ***
*** Bug 125225 has been marked as a duplicate of this bug. ***
This report is happening too frequently to ignore as bad packaging. Bill, any insight here? What version of atk should be used? Why is gnumeric being hit with this?
I can still reproduce this problem with Debian gnumeric 1.2.1-3 using at-spi 1.3.8-1, libatk1.0-0 1.4.1-1, libgail17 1.4.1-1. A backtrace, using debugging versions of relevant libraries:
+ Trace 41173
I am wondering: has this bug been filed against atk? We are crashing inside atk. While it could be something gnumeric does or doesn't do that makes it crash, it is definitely an atk bug to have this effect. Somebody familiar with the atk code might have a better chance to figure this out.
Andreas: haneman@sun.com has been CCed. He's Mr. ATK.
There is no haneman@sun.com, but there's a bill.haneman@sun.com; that's me. You should cc gnome-access-bugs@basso.sfbay.sun.com. Padraig will see this too, and he's in the ATK code more often than I am these days. However, it is not clear that this is an ATK bug or even an AT-SPI bug (much more likely at-spi than ATK, by the way). at-spi does indeed seem to trigger the bug, we aren't questioning that; however, we can't reproduce it since we don't run Debian and it doesn't seem to hit other distros.
>however we can't reproduce it since we don't run Debian and it
>doesn't seem to hit other distros.

You may want to re-read this report and the associated duplicates then. The duplicates include the following environments:
- systems using the Ximian desktop
- Red Hat Linux release 9 (Shrike)
- gargnome on Mandrake Linux release 9.2rc1 (ken)
- Slackware Slackware 9.1.0
I do not think that the stack trace provided is trustworthy. Looking at ORBit2/linc2 I see no evidence that link_servers_move_io_T() calls link_protocol_find_num(). I am building gnumeric from HEAD to see if I can reproduce the problem.
I have reproduced the problem with the stack trace below, which looks more sensible. I will see what I can figure out.

fe11e444 _lwp_kill (5, 6, a38b38, 0, 2, 0) + 8
fd5f4c98 g_logv (0, 4, fdc2b854, ffbfe820, 0, 0) + 610
fd5f4dd8 g_log (0, 4, fdc2b854, 0, 0, ffbfe860) + 60
fdc19420 link_server_handle_io (0, 0, 8f0910, 0, 0, 0) + 70
fdc1a4e8 link_source_dispatch (93ec98, fdc193b0, 8f0910, ffbfe970, 61696e00, ff3f86cc) + a0
fd5e2aa0 g_main_dispatch (8e8558, 0, 0, fffffff8, ffffffe0, a54b85) + 290
fd5e4a84 g_main_context_dispatch (8e8558, 0, a643a0, c, c, ffbfeaf4) + c4
fd5e5200 g_main_context_iterate (8e8558, 1, 1, 8cab60, 5, 0) + 6d0
fd5e621c g_main_loop_run (a54b70, a54b70, 2f, 0, 0, 0) + 5c4
fe6119cc gtk_main (8d8b38, 8d8b38, 0, 8a3880, 8a38ac, 8a38b8) + 1c4
002d7738 main (1, ffbfec8c, ffbfec94, 888000, 0, 0) + 508
00150f98 _start (0, 0, 0, 0, 0, 0) + 108
I think I have found the cause of this problem and it is in glib.
Created attachment 21032 [details] [review] Proposed patch
Can we have an explanation to go with the patch?
gnumeric calls gtk_events_pending() while the program is starting. This function calls g_main_context_pending(), which calls g_main_context_iterate() with dispatch set to FALSE. The function g_main_context_iterate() calls g_main_context_check(). If a "check" function called in this function returns TRUE, then n_ready is incremented and g_main_context_check() returns TRUE. The value returned by g_main_context_check() is not checked in g_main_context_iterate(), so g_main_context_iterate() returns FALSE even though there are events pending. When the event is eventually processed, the revents field in the GPollFD data structure has been reset to zero, and this is the immediate cause of the crash in the linc2 code.
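To make the return-value problem concrete, here is a toy model of the two behaviours. It is a sketch based only on the description in this thread, not the real gmain.c: `ToySource`, `toy_prepare`, `toy_check`, `toy_iterate_old` and `toy_iterate_fixed` are hypothetical names, and the real prepare/poll/check machinery is reduced to two booleans per source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a main-loop source; NOT the real GSource. */
typedef struct {
    bool ready_in_prepare; /* what the source's prepare() would report   */
    bool ready_in_check;   /* what its check() would report after a poll */
} ToySource;

/* Model of the prepare phase: true if any source is ready before polling. */
static bool toy_prepare(const ToySource *sources, size_t n)
{
    assert(sources != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sources[i].ready_in_prepare)
            return true;
    return false;
}

/* Model of the check phase: true if any source is ready after polling.
 * A source that was already ready in prepare counts as ready here too. */
static bool toy_check(const ToySource *sources, size_t n)
{
    assert(sources != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (sources[i].ready_in_prepare || sources[i].ready_in_check)
            return true;
    return false;
}

/* Behaviour as described in the report: the check result is computed
 * but dropped, so a source that only becomes ready in check() is not
 * reflected in the return value. */
static bool toy_iterate_old(const ToySource *sources, size_t n)
{
    bool some_ready = toy_prepare(sources, n);
    (void) toy_check(sources, n);
    return some_ready;
}

/* Behaviour after the proposed change: report the check result. */
static bool toy_iterate_fixed(const ToySource *sources, size_t n)
{
    (void) toy_prepare(sources, n);
    return toy_check(sources, n);
}
```

For a single source that is not ready in prepare but is ready in check (the case of a file descriptor with data waiting), `toy_iterate_old` reports false and `toy_iterate_fixed` reports true, which matches the "returns FALSE even though there are events pending" symptom described above.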
I've just tested the proposed patch. With an unpatched glib, the crash when a11y was enabled was 100% reproducible for me. With a patched glib installed, gnumeric starts up without any problems whatsoever. Given that the patch is very small and seems innocuous otherwise, it definitely has my vote to be incorporated ASAP. Thanks Padraig!
Created attachment 21113 [details] [review] Patch I'm applying to HEAD
I think your patch is essentially right (I've attached what I'm applying; the return value from check() includes any return value from prepare(), so it isn't necessary to look at them both). But:

A) I don't think it's the correct fix for the crash - it's perfectly legitimate to call g_main_context_iterate() with dispatch set to FALSE at any time and ignore the return value. So, any change that simply changes the return value *cannot* be a correct fix for a crash. If something can result in the read condition being cleared between the check() and the subsequent dispatch(), then linc must handle that. Once a source returns TRUE from its check() method, it *will* be dispatched.

B) I don't want to apply this change to the stable branch, because while it is correct, it's quite believable to me that the change might break some poorly written app. (And it's also just barely possible that I actually had some reason that it should be the prepare() value in mind when I wrote the code. It's a non-trivial change in any case.)
The patch applied to HEAD seems to also fix the problem. Anyone who wants to use gnumeric with accessibility enabled on GNOME 2.4 will have to take the risk of applying the patch to gtk-2-2 branch.
Padraig - *please* file a bug against linc and get the real problem fixed there. Imagine if gnumeric was simply ignoring the return value of g_main_context_iterate() - that would still be 100% legitimate code. This change to GLib just covers up a serious bug in linc.
Owen, I have looked at this some more and I am not sure that there is a bug in linc. The function handle_paint_events() in gnumeric/src/main-application.c contains the code while (!gtk_events_pending()) gtk_main_iteration_do (FALSE); With the new code in gmain.c gtk_events_pending() returns TRUE when an event is pending so gtk_main_iteration_do() is called. With the old code in gmain.c gtk_events_pending() returns FALSE when an event is pending. The next time g_main_context_iterate is called for that context the revents field in the GPollFd data structure is set to 0 in g_main_context_check. It looks like we ought not to poll a file descriptor if its GSource is in the context's pending_dispatches.
I see your argument, but I don't agree with it:

A) GLib doesn't work that way and there would be a significant run-time performance penalty for implementing it.

B) I don't see it as right. After all, if revents got cleared on a subsequent poll, that means that there is no data waiting any more. Why should linc be going ahead and doing something with the assumption that there is data waiting? linc must handle the case of revents being cleared between check() and dispatch().
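A handler that tolerates a cleared condition, as asked for here, might look roughly like the sketch below. This is an illustration only and not linc's code: the flag values, `ToyResult` and `toy_server_handle_io` are hypothetical, and the "accept" step is reduced to a counter. The point is that a condition of 0 is treated as "nothing to do" instead of a fatal error like the "error condition on server fd is 0" abort seen in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Toy condition flags; stand-ins for the poll()/GIOCondition bits. */
enum {
    TOY_IN  = 1 << 0, /* data (or a new connection) is waiting */
    TOY_ERR = 1 << 1, /* error on the descriptor               */
    TOY_HUP = 1 << 2  /* peer hung up                          */
};

typedef enum {
    TOY_NOTHING_TO_DO, /* spurious dispatch, keep the source alive */
    TOY_ACCEPTED,      /* handled one incoming connection          */
    TOY_CLOSED         /* genuine error, tear the source down      */
} ToyResult;

/* Hypothetical server-side I/O handler. `condition` plays the role of
 * the revents value seen at dispatch time. */
static ToyResult toy_server_handle_io(unsigned condition, int *accepted_count)
{
    assert(accepted_count != NULL);

    /* Only real error bits are treated as errors. */
    if (condition & (TOY_ERR | TOY_HUP))
        return TOY_CLOSED;

    /* revents was cleared between check() and dispatch(): whatever was
     * waiting is no longer there, so do nothing rather than abort. */
    if (!(condition & TOY_IN))
        return TOY_NOTHING_TO_DO;

    /* A real handler would accept the connection here. */
    (*accepted_count)++;
    return TOY_ACCEPTED;
}
```

The ordering is a design choice in this sketch: error bits are checked first so that a hang-up is never mistaken for a spurious wakeup, and the empty condition is handled before any I/O is attempted.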
Logged bug #126209 against linc.
Just to check - is there something we could do in linc-source.c to store the value of revents on the source itself between check and dispatch, such that other people clearing it later doesn't affect us ? or would that lead to other flakiness [ not had time to get my head back around that code ].
*** Bug 126681 has been marked as a duplicate of this bug. ***
You are certainly welcome to store that value. But just remember, if revents is cleared, that means that the data has already been read and there is nothing more to read.
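The idea of storing revents on the source between check and dispatch, together with the caveat just given, can be sketched as follows. This is a toy model under stated assumptions, not linc-source.c: `CachedSource`, `cached_check` and `cached_dispatch` are hypothetical names, and the file descriptor is replaced by a simple `bytes_available` counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_IN = 1 << 0 }; /* toy "readable" flag */

/* Hypothetical source that keeps its own snapshot of revents. */
typedef struct {
    unsigned revents;         /* live field, rewritten by every poll      */
    unsigned saved_revents;   /* snapshot taken in check()                */
    int      bytes_available; /* toy stand-in for data queued on the fd   */
} CachedSource;

/* check(): remember what the poll reported so that a later poll
 * resetting `revents` does not change what dispatch() sees. */
static bool cached_check(CachedSource *s)
{
    assert(s != NULL);
    s->saved_revents = s->revents;
    return s->saved_revents != 0;
}

/* dispatch(): act on the snapshot, then clear it. The snapshot may be
 * stale (someone else may already have consumed the data), so finding
 * nothing to read is a normal outcome, not an error. Returns the number
 * of bytes consumed. */
static int cached_dispatch(CachedSource *s)
{
    assert(s != NULL);
    unsigned cond = s->saved_revents;
    s->saved_revents = 0;

    if (!(cond & TOY_IN))
        return 0;

    /* A real handler would do a non-blocking read or accept here. */
    int consumed = s->bytes_available;
    s->bytes_available = 0;
    return consumed;
}
```

Even with the snapshot, the dispatch path still has to cope with an empty read, which is the same requirement as handling a cleared revents directly; the snapshot only protects against the live field being reset by an intervening poll.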
*** Bug 127121 has been marked as a duplicate of this bug. ***
*** Bug 127215 has been marked as a duplicate of this bug. ***
*** Bug 113020 has been marked as a duplicate of this bug. ***
*** Bug 127368 has been marked as a duplicate of this bug. ***
*** Bug 128420 has been marked as a duplicate of this bug. ***
*** Bug 128642 has been marked as a duplicate of this bug. ***
*** Bug 140726 has been marked as a duplicate of this bug. ***
*** Bug 146721 has been marked as a duplicate of this bug. ***
*** Bug 141955 has been marked as a duplicate of this bug. ***
*** Bug 137381 has been marked as a duplicate of this bug. ***
*** Bug 152983 has been marked as a duplicate of this bug. ***