GNOME Bugzilla – Bug 313703
Evolution crash - When running LDTP automation script continously
Last modified: 2006-09-15 14:23:58 UTC
Distribution/Version: SuSE 9.3

1. When running the Evolution automation script using LDTP, Evolution crashes.

Additional info: Judging from the stack trace, the crash happens because of the gail library. I checked with the Evolution accessibility developers, and they have also confirmed this.

Crash 1:
Backtrace was generated from '/opt/gnome/bin/evolution-2.4'
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 12294)]
[New Thread 1123031984 (LWP 12343)]
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 12294)]
[New Thread 1123031984 (LWP 12343)]
[New Thread 1120574384 (LWP 12318)]
[New Thread 1118473136 (LWP 12316)]
[New Thread 1115777968 (LWP 12314)]
[New Thread 1113676720 (LWP 12313)]
[New Thread 1110825904 (LWP 12305)]
[New Thread 1108638640 (LWP 12304)]
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 12294)]
[New Thread 1123031984 (LWP 12343)]
0xffffe410 in ?? ()
Trace 62487
Thread 1 (Thread 1098144576 (LWP 12294))
Crash 2:
Backtrace was generated from '/opt/gnome/bin/evolution-2.4'
Using host libthread_db library "/lib/tls/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 8371)]
[New Thread 1127631792 (LWP 8422)]
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 8371)]
[New Thread 1127631792 (LWP 8422)]
[New Thread 1125530544 (LWP 8420)]
[New Thread 1123031984 (LWP 8401)]
[New Thread 1120930736 (LWP 8400)]
[New Thread 1118829488 (LWP 8386)]
[New Thread 1115421616 (LWP 8385)]
[New Thread 1113320368 (LWP 8383)]
[New Thread 1110825904 (LWP 8382)]
[New Thread 1108638640 (LWP 8381)]
[Thread debugging using libthread_db enabled]
[New Thread 1098144576 (LWP 8371)]
[New Thread 1127631792 (LWP 8422)]
0xffffe410 in ?? ()
Trace 62488
Thread 4 (Thread 1123031984 (LWP 8401))
The second trace looks like a different bug (an Evolution bug). I don't know about the first; it could be in ATK or in Evolution.
When executing the Evolution automation scripts using LDTP, Evolution crashes. When the same events are performed manually, Evolution does not crash. The following trace was obtained with gdb.
Trace 62491
Thread 3 (Thread 1114348464 (LWP 26127))
Thread 1 (Thread 1097948832 (LWP 26120))
I can reproduce the crash using LDTP's test case. The trace is the same as the first one, and the third trace is almost the same as the first. The second trace looks like an Evolution-internal bug not related to a11y. The first one looks like a gailbutton problem: gail_button_init_textutil() in gailbutton.c connects to the "notify" signal without calling g_object_ref() on the gailbutton. gtk_widget_dispose() in gtkwidget.c calls gtk_widget_hide(), which emits the "notify" signal. By the time gail_button_notify_label_gtk() in gailbutton.c runs, its data (the button) might already have been disposed and be an invalid pointer, hence the crash. I added g_object_ref (button) before connecting to the signal in gail_button_init_textutil(), and the test case no longer crashes here. But my fix is surely not the proper one; the gail maintainers should have a cleaner and better way.
Now sometimes it crashes here:
Trace 62513
This looks like a gailmenuitem problem.
Confirming the bug.
I added g_object_ref (gail_menu_item) in gail_menu_item_do_action() before the line: gail_menu_item->action_idle_handler = g_idle_add (idle_do_action, gail_menu_item); Now Evolution doesn't crash anymore. So this has the same cause as the first crash: an invalid pointer to a GObject.
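The pattern described here (take a reference before handing the object to g_idle_add(), drop it when the idle callback has run) can be sketched without GLib. This is a toy model under stated assumptions: ToyMenuItem, toy_ref(), toy_unref(), toy_idle_add() and toy_run_idle() are hypothetical stand-ins, not gail or GLib code. The comment above only mentions the added g_object_ref(); placing the matching unref inside the idle callback is an assumption of this sketch. In real code the calls would be g_object_ref(), g_idle_add() and g_object_unref(), and the idle callback would return FALSE so the source is removed after one run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GObject refcounting and g_idle_add();
 * this is a toy model, not the GLib API. */
typedef struct {
    int ref_count;
} ToyMenuItem;

static int toy_items_freed = 0;   /* finalizations so far */
static int toy_actions_run = 0;   /* idle actions executed so far */

static ToyMenuItem *toy_item_new(void) {
    ToyMenuItem *item = calloc(1, sizeof *item);
    assert(item != NULL);
    item->ref_count = 1;
    return item;
}

static ToyMenuItem *toy_ref(ToyMenuItem *item) {
    item->ref_count++;
    return item;
}

static void toy_unref(ToyMenuItem *item) {
    assert(item->ref_count > 0);
    if (--item->ref_count == 0) {
        toy_items_freed++;
        free(item);
    }
}

/* One-slot "main loop": remembers a callback and runs it later. */
typedef void (*ToyIdleFunc)(void *data);
static ToyIdleFunc pending_func = NULL;
static void *pending_data = NULL;

static void toy_idle_add(ToyIdleFunc func, void *data) {
    pending_func = func;
    pending_data = data;
}

static void toy_run_idle(void) {
    ToyIdleFunc func = pending_func;
    pending_func = NULL;
    if (func != NULL)
        func(pending_data);
}

/* Runs later; owns the reference taken in toy_do_action(). */
static void idle_do_action(void *data) {
    ToyMenuItem *item = data;
    assert(item->ref_count > 0);   /* still alive thanks to our reference */
    toy_actions_run++;
    toy_unref(item);               /* drop the reference taken when scheduling */
}

static void toy_do_action(ToyMenuItem *item) {
    /* Take a reference *before* scheduling, so the item survives even if
     * its owner drops it before the idle callback runs. */
    toy_idle_add(idle_do_action, toy_ref(item));
}
```

If toy_do_action() did not take the extra reference, an owner that drops the item before the idle callback runs would leave idle_do_action() reading freed memory, which matches the invalid-pointer crash described in this comment.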
I think the fix for the menu action problem may be straightforward. Not so sure about the first problem (in gailbutton).
Created attachment 50918: Proposed patch
Created attachment 50919: With the proposed patch Evolution still crashes, but a little later
I tried pasting the crash into a comment, but Bugzilla won't allow it because the total text size exceeds the maximum allowed size. So I'm attaching the crash instead.
Created attachment 51125: Patch fixes the gailbutton crash
Patch from Varadhan <vvaradhan@novell.com>. Verified with LDTP. Bill: weak_unref has to be called.
Created attachment 51126: Patch fixes the gailmenuitem crash
Patch from Rodrigo <rodrigo@novell.com>.
Nagappan: what did your comment about weak_unref in #11 above mean? I see that the patch inserts a call to weak_ref; are you saying that the patch is incomplete, i.e. that weak_unref needs to be called at the appropriate point in the lifecycle?
Bill: I think he meant, with respect to Rodrigo's patch that uses g_object_ref, that doing g_object_ref doesn't actually solve the first crash. Anyway, I'll leave it to Nags to explain further. Nags: also, can you repost the patch in unified diff format? Use the -Nup switch when generating patches.
Bill: Yes we need to call weak_unref at the appropriate part :) Varadhan: Thanks for giving me a hint. I will attach the patch.
Created attachment 51127: gailbutton patch, diff with -Nup option
Created attachment 51128: gailmenuitem patch, diff with -Nup option
Nags, Bill: The patch doesn't call weak_unref because the intention of the fix is to "disconnect" the notify signal callback whenever the associated gailbutton object gets finalized. So, in this scenario, I don't think we would ever need to call weak_unref. Correct me if I am wrong. :)
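The argument here can be checked with a toy model that assumes a single "notify" handler on the label and a single weak-ref slot on the accessible. ToyLabel, ToyAccessible and the toy_* functions are hypothetical and not taken from the attached patch; in GLib the corresponding calls would be g_signal_connect(), g_object_weak_ref() and g_signal_handler_disconnect(). Because the weak notify fires exactly once, while the accessible is being finalized, nothing is left that would need weak_unref in this scenario. The sketch does not cover the case where the label is destroyed first; there the weak notify would be handed a stale label pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not GLib: a label that emits "notify" to at most
 * one handler, and an accessible object with a single weak-ref slot. */
typedef void (*ToyNotifyFunc)(void *data);
typedef void (*ToyWeakNotify)(void *data, void *where_the_object_was);

typedef struct {
    ToyNotifyFunc handler;   /* NULL when disconnected */
    void *handler_data;
} ToyLabel;

typedef struct {
    int ref_count;
    ToyWeakNotify weak_notify;
    void *weak_data;
} ToyAccessible;

static int toy_handler_calls = 0;

static void toy_label_emit_notify(ToyLabel *label) {
    if (label->handler != NULL)
        label->handler(label->handler_data);
}

static void toy_label_disconnect(ToyLabel *label) {
    label->handler = NULL;
    label->handler_data = NULL;
}

/* Like GObject, run the weak notify while finalizing, then free. */
static void toy_accessible_unref(ToyAccessible *acc) {
    assert(acc->ref_count > 0);
    if (--acc->ref_count == 0) {
        if (acc->weak_notify != NULL)
            acc->weak_notify(acc->weak_data, acc);
        free(acc);
    }
}

/* The "notify" handler dereferences its data, so data must still be alive. */
static void on_label_notify(void *data) {
    ToyAccessible *acc = data;
    assert(acc->ref_count > 0);
    toy_handler_calls++;
}

/* Weak notify: the accessible is going away, so cut the connection. */
static void on_accessible_finalized(void *data, void *where_the_object_was) {
    (void)where_the_object_was;   /* unused here */
    toy_label_disconnect((ToyLabel *)data);
}

static ToyAccessible *toy_accessible_new_for_label(ToyLabel *label) {
    ToyAccessible *acc = calloc(1, sizeof *acc);
    assert(acc != NULL);
    acc->ref_count = 1;
    label->handler = on_label_notify;            /* cf. g_signal_connect() */
    label->handler_data = acc;
    acc->weak_notify = on_accessible_finalized;  /* cf. g_object_weak_ref() */
    acc->weak_data = label;
    return acc;
}
```

After the accessible is finalized, a later "notify" from the label (for example during gtk_widget_dispose()) finds no handler, so no freed memory is touched.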
Veerapuram: I think you are right, there is no need to call weak_unref.
Bill: So are both these patches fine, or does anything need to be modified?
Created attachment 51189: Random crash in gail after applying the patch
Unable to reproduce the crash with the same steps.
Looks like the patch is not ready yet.
Nags, Bill: As per the log, the crash occurs in "GTK_IS_WIDGET (data)", which means that the "data" pointer (which is a GtkLabel in our case) has been disposed of before the GailButton object, which looks to me like memory corruption. This can easily be worked around by taking a weak_ref on that label object and handling it appropriately in the label's weak-ref notify callback. I have a patch for it, but I'm not sure whether we in gail are supposed to handle it. If it's OK, I will attach the patch shortly.
Created attachment 51213: Proposed patch that takes care of both scenarios (gailbutton before and after the initial patch)
OK, this is what it does. The *new* crash happened because the argument "data" carried a pointer to a "disposed" GtkLabel object. So this patch adds a weak_ref on the label widget as well as on the gailbutton object. When either weak ref's notify is called, the other one is removed with weak_unref, so it will not be called as well. Though this patch works fine, it looks like a very nasty workaround to me. Or is this the way it should be handled? I can think of another fix that would be a lot cleaner: if the GailButton structure had a member holding an active reference to the GtkLabel, which we use in gail_button_init_textutil(), then the finalize() function of GailButton could just call g_signal_handlers_disconnect_by_func() on that label and unref it. I will test this tomorrow and post a patch, if this is okay.
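The mutual weak-ref scheme described in this comment can be modelled the same way. In this hypothetical sketch (ToyObj, ToyPair and the toy_* functions are illustrative only, not the code in attachment 51213), each object carries a weak notify, and whichever object is finalized first removes the weak ref registered on the other one, so the survivor never calls back into freed memory. The real equivalents would be g_object_weak_ref() and g_object_weak_unref(); the "connected" flag merely stands in for the label's "notify" handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not GLib: a refcounted object with one weak-ref slot. */
typedef struct ToyObj ToyObj;
typedef void (*ToyWeakNotify)(void *data, ToyObj *where_the_object_was);

struct ToyObj {
    int ref_count;
    ToyWeakNotify weak_notify;   /* NULL when no weak ref is registered */
    void *weak_data;
};

static ToyObj *toy_new(void) {
    ToyObj *obj = calloc(1, sizeof *obj);
    assert(obj != NULL);
    obj->ref_count = 1;
    return obj;
}

static void toy_weak_ref(ToyObj *obj, ToyWeakNotify notify, void *data) {
    obj->weak_notify = notify;
    obj->weak_data = data;
}

static void toy_weak_unref(ToyObj *obj) {
    obj->weak_notify = NULL;
    obj->weak_data = NULL;
}

static void toy_unref(ToyObj *obj) {
    assert(obj->ref_count > 0);
    if (--obj->ref_count == 0) {
        if (obj->weak_notify != NULL)
            obj->weak_notify(obj->weak_data, obj);
        free(obj);
    }
}

/* Links a label to its button accessible. */
typedef struct {
    ToyObj *label;
    ToyObj *accessible;
    int connected;   /* stands in for the label's "notify" handler */
} ToyPair;

/* Label finalized first: remove the weak ref on the accessible, so the
 * accessible's finalization never touches the dead label. */
static void on_label_gone(void *data, ToyObj *where_the_object_was) {
    ToyPair *pair = data;
    (void)where_the_object_was;
    pair->connected = 0;
    toy_weak_unref(pair->accessible);
    pair->label = NULL;
}

/* Accessible finalized first: "disconnect" the handler and remove the weak
 * ref on the label, so the label's finalization never reaches this pair. */
static void on_accessible_gone(void *data, ToyObj *where_the_object_was) {
    ToyPair *pair = data;
    (void)where_the_object_was;
    pair->connected = 0;
    toy_weak_unref(pair->label);
    pair->accessible = NULL;
}

static void toy_pair_link(ToyPair *pair, ToyObj *label, ToyObj *accessible) {
    pair->label = label;
    pair->accessible = accessible;
    pair->connected = 1;
    toy_weak_ref(label, on_label_gone, pair);
    toy_weak_ref(accessible, on_accessible_gone, pair);
}
```

Either destruction order is then safe in the model: each weak notify runs at most once, and the one that runs disarms the other.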
I don't think the other solution mentioned in the latter part of #24 is a good idea, as the whole purpose of keeping an extra reference to hold that GtkLabel object alive would be only to disconnect its notify handler in the finalize method of GailButton. @Bill: Any suggestions/comments?
Without the patch from this bug installed, I'm getting this crash. The trace looks better, so I'm adding it to the bug.
Backtrace was generated from '/opt/gnome/bin/evolution'
Using host libthread_db library "/lib/tls/libthread_db.so.1".
`system-supplied DSO at 0xffffe000' has disappeared; keeping its symbols.
[Thread debugging using libthread_db enabled]
[New Thread 1096982240 (LWP 14353)]
[New Thread 1241340848 (LWP 14948)]
[Thread debugging using libthread_db enabled]
[New Thread 1096982240 (LWP 14353)]
[New Thread 1241340848 (LWP 14948)]
[Thread debugging using libthread_db enabled]
[New Thread 1096982240 (LWP 14353)]
[New Thread 1241340848 (LWP 14948)]
[New Thread 1241074608 (LWP 14947)]
[New Thread 1200626608 (LWP 14373)]
[New Thread 1198525360 (LWP 14369)]
[New Thread 1194191792 (LWP 14368)]
[New Thread 1192090544 (LWP 14366)]
[New Thread 1189989296 (LWP 14365)]
[New Thread 1187888048 (LWP 14363)]
[New Thread 1147308976 (LWP 14359)]
[New Thread 1144814512 (LWP 14358)]
0xffffe410 in __kernel_vsyscall ()
Trace 63102
Thread 1 (Thread 1096982240 (LWP 14353))
Created attachment 52660: Gedit crashed
Unable to paste the crash log into Bugzilla due to the size limit, so I'm attaching the stack trace.
Created attachment 53645: Gail button crash
Bill: Given the new commits Michael Meeks made to gailmenuitem.c on 2005-11-22, do I still need to take the proposed gailmenuitem patch from this bug?
Hi Nags, I really don't know; this bug has gotten too messy to make much sense of. Can you go through the patches and mark the obsolete ones, etc.? Not sure if I should be looking at more than the last one. I suppose you should re-gen the patch against HEAD as well, if the changes haven't made it in via Michael's commit.
Bug 319299 and bug 327159 do look like duplicates of this one, especially matching the last stacktrace. Can you please have a closer look at them?
Created attachment 62902: updated gailbutton patch based on latest gail
Created attachment 62903: updated gailmenuitem patch based on latest gail
The only comment I have on the gailbutton patch is that GAIL_BUTTON (ATK_OBJECT (data)) should be GAIL_BUTTON (data), and G_OBJECT (ATK_OBJECT (data)) should be G_OBJECT (data). Why is the gailmenuitem patch necessary?
Please take a look at the backtrace in comment #5; it may crash in gailmenuitem.
Created attachment 65813: patch to fix this bug
*** Bug 348225 has been marked as a duplicate of this bug. ***
Patch committed (http://cvs.gnome.org/viewcvs/gail/gail/gailbutton.c?r1=1.76&r2=1.77). Can we close this? Nags, Sun folks?
Andre: the gailmenuitem patch is still pending. Once that is fixed, we can close this bug.
*** Bug 356130 has been marked as a duplicate of this bug. ***