GNOME Bugzilla – Bug 340437
Stability issues (recent GNOME/Gtk+)
Last modified: 2006-12-15 20:03:22 UTC
BEAST has some stability issues on AMD64 with a recent GNOME/Gtk+ checkout. I've seen different types of problems:

1. problems in g_slice
2. problems in gobject:
   (beast:691): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed
3. problems in glibc:
   double free or corruption (out): 0x0000000002fc2300

Here is one example of a backtrace (created with a g_logv breakpoint):

(gdb) bt
[Trace 67986]
Continuing.
(beast-0.7.0:19226): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

However, the backtrace is not always the same. The stability problem can be reproduced reliably simply by using BEAST a little: loading some songs, closing them again, loading other songs, ... (it also works without loading songs; I think it's simply a matter of triggering enough allocations and frees that whatever corruption is going on actually affects the program).
Here is another stack trace (this one is in g_slice):

Program received signal SIGSEGV, Segmentation fault.
slab_allocator_free_chunk (chunk_size=<value optimized out>, mem=0x3ce68c0) at gslice.c:1020
1020	      next->prev = prev;
(gdb) bt
[Trace 67988]
Created attachment 64687
A valgrind log

This is a valgrind log of simply starting and quitting BEAST while telling it to load a song on the command line. Without valgrind, this results in a

(beast-0.7.0:7775): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

warning, whereas under valgrind the warning does not occur. I honestly don't know what's going on here.
please test this with the malloc hooks as described here: http://blogs.gnome.org/view/timj/2006/01/25/0
the valgrind log doesn't show any real bugs btw. it does indicate that you're using LADSPA plugins though; can you confirm that the instabilities also happen without LADSPA, i.e. when running beast with -:L?
Yes, the instabilities happen without LADSPA as well. The reason I excluded LADSPA from valgrind is that there are LADSPA-related bugs in valgrind, which I need to report upstream separately. But that is unrelated.
I found something. The asynchronous canvas handlers in BEAST work around a canvas bug.

Code executed while queueing the handler (bstcanvaslink.c):

GnomeCanvasItem*
bst_canvas_link_new (GnomeCanvasGroup *group)
{
  [...]
  g_object_ref (item->canvas); /* GnomeCanvasItem doesn't properly clear its pointers */

Code executed when the handler finishes (bstcanvaslink.c):

static gboolean
bst_canvas_link_build_async (gpointer data)
{
  [...]
  GnomeCanvas *canvas = item->canvas;
  g_object_unref (item);
  g_object_unref (canvas); /* canvases don't properly protect their items */

However, newer versions of libgnomecanvas *do* clear the item->canvas pointer (libgnomecanvas/gnome-canvas.c, CVS HEAD):

/* Standard object dispose function for canvas items */
static void
gnome_canvas_item_dispose (GObject *object)
{
  GnomeCanvasItem *item;
  [...]
  /* items should remove any reference to item->canvas after the first ::destroy */
  item->canvas = NULL;
}

So the effect is that g_object_unref is called on a NULL pointer, which explains the gobject errors I've seen. At least that's how I explain what happens, but I am not sure I've got the semantics of GObject's dispose method right. What I can see using debugging printfs is that item->canvas is definitely NULL, so there is a problem surrounding these handlers that needs to be fixed.
can you try changing the code to this:

  GnomeCanvas *canvas = item->canvas;
  g_object_unref (item);
  if (canvas)
    g_object_unref (canvas); /* some canvases don't properly protect their items */

this will leak but shouldn't crash.
A small remark on the last comment: what I use to trigger this is closing the BEAST window while a project is still performing drawing operations in the canvas via idle handlers. The idle handlers for canvas links are then executed while the canvas is already dead.
When adding the following modifications:

Index: bstcanvaslink.c
===================================================================
RCS file: /cvs/gnome/beast/beast-gtk/bstcanvaslink.c,v
retrieving revision 1.29
diff -u -p -u -r1.29 bstcanvaslink.c
--- bstcanvaslink.c	10 Dec 2004 11:19:09 -0000	1.29
+++ bstcanvaslink.c	6 May 2006 14:50:01 -0000
@@ -329,7 +329,8 @@ bst_canvas_link_build_async (gpointer da
     }
   GnomeCanvas *canvas = item->canvas;
   g_object_unref (item);
-  g_object_unref (canvas); /* canvases don't properly protect their items */
+  if (canvas)
+    g_object_unref (canvas); /* canvases don't properly protect their items */
   return FALSE;
 }
Index: bstcanvassource.c
===================================================================
RCS file: /cvs/gnome/beast/beast-gtk/bstcanvassource.c,v
retrieving revision 1.68
diff -u -p -u -r1.68 bstcanvassource.c
--- bstcanvassource.c	19 Dec 2004 01:22:38 -0000	1.68
+++ bstcanvassource.c	6 May 2006 14:50:02 -0000
@@ -874,7 +874,8 @@ bst_canvas_source_build_async (gpointer
     }
   GnomeCanvas *canvas = item->canvas;
   g_object_unref (item);
-  g_object_unref (canvas); /* canvases don't properly protect their items */
+  if (canvas)
+    g_object_unref (canvas); /* canvases don't properly protect their items */
   return FALSE;
 }

there are no more

(beast:691): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

messages. However, there are still stability problems with these modifications applied, like

***MEMORY-ERROR***: beast-0.7.0[23114]: GSlice: assertion failed: sinfo->n_allocated > 0
first, removing the _ref() destabilizes beast; there's a reason that reference is being held. second, the gslice message looks very much like memory corruption, which might be easier to debug with valgrind or the hints given here: http://blogs.gnome.org/view/timj/2006/01/25/0
I am not removing a _ref() here; I am just not calling _unref() if canvas is NULL. I tried this following your comment #7. The problem with the patch is not that it drops references that are needed, but that it never releases a reference (the one held on the canvas) that is no longer needed.
sorry, misread your patch, you're not removing _ref() calls. btw, can you please take care that pasting patches doesn't break lines in the future?

since the patch seems to fix the warnings for you, but has a slight leakage problem, can you rework it to use object data on items to store the canvas reference? i.e.:

- where we currently do:

    g_object_ref (item->canvas); /* GnomeCanvasItem doesn't properly clear its pointers */

  we should do:

    /* some GnomeCanvasItem versions don't properly clear their pointers */
    g_object_set_data_full (item, "BstWorkaround-canvas-ref", g_object_ref (item->canvas), g_object_unref);

- and where we currently unref item and item->canvas, we should do:

    GnomeCanvas *canvas = g_object_steal_data (item, "BstWorkaround-canvas-ref");
    g_object_unref (item);
    if (canvas)
      g_object_unref (canvas);

that should essentially do what your patch does, but not leak. if that still fixes things for you and seems to work, you can commit (some slight alterations may be necessary to please the compiler).
It turns out that the stability issues are probably not AMD64-related (or at least not all of them). I could reproduce them on the same system running in 32-bit mode with a 32-bit build of BEAST. So it *should* be reproducible on pretty much any Debian/unstable system. The steps are easy:

1. open BEAST
2. create a tear-off menu for Demo
3. click "Party Monster" quickly, 30 times or so; BEAST now floods your screen with 30 Party Monster songs
4. quickly close all the windows again

This usually results in a crash.
Stefan, any clue whether the alloc hooks mentioned in comment 3 make a difference? and can you please outline which versions of gtk+ and glib this is with, other than just "debian unstable", which is in flux and not readily available to everyone?
The last test (comment #13) was performed with:

stefan@lotrien:~$ beast --version
BEAST version 0.7.0 (ALPHA)
Libraries: GLib 2.10.2, SFI 0.7.0, BSE 0.7.0, Ogg/Vorbis, MAD 0.15.1 (beta), GTK+ 2.8.17, GXK 0.7.0
Compiled for x86_64-unknown-linux-gnu with SSE plugins.
Intrinsic code selected according to runtime CPU detection:
CPU Architecture: i686
CPU Vendor: AuthenticAMD
CPU Features: FPU TSC
CPU Integer SIMD: MMX MMXEXT
CPU Float SIMD: SSE SSE2 SSESYS
CPU Media SIMD: 3DNOW 3DNOWEXT

The crashes could no longer be reproduced using the above scheme (comment #13) when using:

stefan@lotrien:~$ G_SLICE=always-malloc G_DEBUG=gc-friendly beast -:L

as opposed to:

stefan@lotrien:~$ beast -:L

where the crashes *can* be reproduced. Both tests show

(beast-0.7.0:2960): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

warnings, which however have a known fix discussed earlier (comment #9).

What worries me a bit is that I am pretty sure I have seen crashes even with G_SLICE=always-malloc before. However, I didn't write down where, when and why at the time, and today I have not been able to reproduce it, although I tried quite a few things.
Created attachment 65883
Workaround for the gnomecanvas issue

Here is an updated patch, which should do what you suggested in comment #12, so it should work with both older and newer libgnomecanvas versions. Let me know if it's ok to commit.
in general looks good, except that you mixed up two consecutive comments at the end:

-  /* asyncronously rebuild contents */
-  g_object_ref (GNOME_CANVAS_ITEM (csource)->canvas); /* GnomeCanvasItem doesn't properly clear its pointers */
+  /* asynchronously rebuild contents */
+  /* some GnomeCanvasItem versions don't properly clear their pointers */
+  GnomeCanvasItem *csource_item = GNOME_CANVAS_ITEM (csource);
+  g_object_set_data_full (csource_item, "bst-workaround-canvas-ref", g_object_ref (csource_item->canvas), g_object_unref);

this should be:

+  /* asynchronously rebuild contents */
+  GnomeCanvasItem *csource_item = GNOME_CANVAS_ITEM (csource);
+  /* some GnomeCanvasItem versions don't properly clear their pointers */
+  g_object_set_data_full (csource_item, "bst-workaround-canvas-ref", g_object_ref (csource_item->canvas), g_object_unref);

although, with the changes you made, it's probably best to change all the old "some GnomeCanvasItem versions don't properly clear their pointers" comments into "work around stale canvas pointers, see #340437". provided that's fixed and tested, please commit.
ok, the canvas patch is now committed. and FYI, here's a bug that may or may not be related to the crashes you're seeing: http://bugzilla.gnome.org/show_bug.cgi?id=341327
(In reply to comment #18)
> and FYI, here's a bug that may or may not be related to the crashes you're
> seeing: http://bugzilla.gnome.org/show_bug.cgi?id=341327

I retested with a recent gtk, which has the fix for bug #341327:

$ beast --version
BEAST version 0.7.0 (ALPHA)
Libraries: GLib 2.12.1, SFI 0.7.0, BSE 0.7.0, Ogg/Vorbis, MAD 0.15.1 (beta), GTK+ 2.10.1, GXK 0.7.0
[...]

and the crash (described in comment #13) can still be reproduced.
since debugging this involves gslice and valgrind, here's the bug report about integrating the two properly: http://bugzilla.gnome.org/show_bug.cgi?id=335126
I'm using Linux on 32-bit Intel.
glibc: 2.3.5
libgtk: 1.2.10-18

Trying to open the quick start help after the demo song caused the program to crash. The first time it gave this:

OSS: underrun detected (diff=528), skipping input
***MEMORY-ERROR***: beast[3846]: GSlice: assertion failed: sinfo->n_allocated > 0

And the second time this:

*** glibc detected *** double free or corruption (out): 0x09a45780 ***
(In reply to comment #21)
> I'm using Linux on 32-bit Intel.
> glibc: 2.3.5
> libgtk: 1.2.10-18
> ***MEMORY-ERROR***: beast[3846]: GSlice: assertion failed: sinfo->n_allocated > 0
> *** glibc detected *** double free or corruption (out): 0x09a45780 ***

for the record, this crash occurred with 0.6.6 (found out on IRC).
Created attachment 77699
Reduced testcase

I tried to find an easier way to reproduce the crash:

1. load the attached file
2. remove the only SNet using Project -> Remove Song or Synthesizer
3. undo / redo this operation a few times quickly (using Ctrl+Y / Ctrl+Z)
4. wait some time (30 seconds or so)
5. go to 3.

That timing plays a role here (or so it seemed to me) may indicate that the GSlice timeouts affect whether or not a crash occurs. The error message is:

***MEMORY-ERROR***: beast-0.7.1[32419]: GSlice: assertion failed: sinfo->n_allocated > 0

BEAST from SVN. Version information:

stefan@lotrien:/usr/local/src/beast$ svn up
At revision 4129.
stefan@lotrien:/tmp$ beast --version
BEAST version 0.7.1 (ALPHA)
Libraries: GLib 2.12.3, SFI 0.7.1, BSE 0.7.1, Ogg/Vorbis I 20050304, MAD 0.15.1 (beta), GTK+ 2.10.3, GXK 0.7.1
Compiled for x86_64-unknown-linux-gnu with SSE plugins.
Intrinsic code selected according to runtime CPU detection:
CPU Architecture: AMD64
CPU Vendor: AuthenticAMD
CPU Features: FPU TSC
CPU Integer SIMD: MMX MMXEXT
CPU Float SIMD: SSE SSE2 SSESYS
CPU Media SIMD: 3DNOW 3DNOWEXT
Prefix: /usr/local/beast
Doc Path: /usr/local/beast/share/beast/v0.7.1/docs
Image Path: /usr/local/beast/share/beast/v0.7.1/images
Locale Path: /usr/local/beast/share/locale
Keyrc Path: /usr/local/beast/share/beast/v0.7.1/keys
Skin Path: /usr/local/beast/share/beast/v0.7.1/skins:~/.beast/skins/:~/.beast/skins/*/
Sample Path: /usr/local/beast/share/bse/samples:~/beast//samples
Script Path: /usr/local/beast/share/bse/v0.7.1/scripts:~/beast//scripts
Effect Path: /usr/local/beast/share/bse/v0.7.1/effects:~/beast//effects
Instrument Path: /usr/local/beast/share/bse/v0.7.1/instruments:~/beast//instruments
Demo Path: /usr/local/beast/share/bse/v0.7.1/demo
Plugin Path: /usr/local/beast/lib/bse/v0.7.1/plugins
LADSPA Path: /usr/local/beast/lib/ladspa:$LADSPA_PATH
BEAST comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of BEAST under the terms of the GNU General Public License which can be found in the BEAST source package. Sources, examples and contact information are available at http://beast.gtk.org/.
Created attachment 78173
Another valgrind log (+glib hints)

Here is a valgrind log using a patched version of glib, taken while reproducing the crash with the undo/redo method.
(In reply to comment #24)
> Created an attachment (id=78173)
> Another valgrind log (+glib hints)
>
> Here is a valgrind log using a patched version of glib, when reproducing the
> crash with the undo/redo method.

the log is useless, because:
a) it doesn't contain all the relevant program output, i.e. mem_error() is called, but its printouts are *not* provided.
b) a valgrind/memcheck assertion was triggered:
Memcheck: mc_leakcheck.c:817 (vgMemCheck_do_detect_memory_leaks): Assertion 'lc_shadows[i]->data + lc_shadows[i]->size <= lc_shadows[i+1]->data' failed.

(b) seems to indicate that either your valgrind version or the valgrind hints in your code are buggy. for future logs, (a) can be fixed by using something like:

valgrind programtodebug 2>&1 | tee logfile
Created attachment 78200
Complete valgrind log (+glib hints)

I can't find the cause of the valgrind assertion quickly, but here is a new valgrind log of the undo/redo reproduced crash, including all output.
(In reply to comment #26)
> Created an attachment (id=78200)
> Complete valgrind log (+glib hints)
>
> I can't find the cause for the valgrind assertion quickly, but here is a new
> valgrind log of the undo/redo reproduced crash, including all output.

thanks, can you please also provide the exact way to reproduce this log? i.e. beast --version, the command line args you used and environment variables (meaning G_SLICE, MALLOC_CHECK_ and the like).
(In reply to comment #27)
> thanks, can you please also provide the exact way to reproduce this log? i.e.
> beast --version, command line args you used and environment variables (meaning
> G_SLICE, MALLOC_CHECK_ and the like).

Sure; I was following the "undo/redo method" instructions from attachment 77699 ("Reduced testcase", 2006-12-05 00:40 UTC), with BEAST from SVN revision 4129 and the same beast --version output as posted there, without any extra settings (that is, G_SLICE="", MALLOC_CHECK_="").
Created attachment 78340
Script for reproducing the crash

1. install the script in ~/beast/scripts
2. run beast
3. create an empty SNet
4. run the script via context menu -> Utilities -> QA -> Canvas Stability Test

It takes some minutes until the crash occurs (so the "undo/redo method" described before can be quicker, but requires more manual interaction).

Timing /seems/ to matter, from all I know, so I made the timing configurable. The default (waiting 10 seconds between the steps) should be okay to trigger a crash; at least it works for me. I do not know if it's ideal, and what is ideal may also be machine-dependent.

I am also seeing warnings like:

(beast-0.7.1:15161): BEAST-WARNING **: Couldn't figure CanvasSource Item from BSE module "Summation-11"
(beast-0.7.1:15161): BEAST-WARNING **: Couldn't figure CanvasSource Item from BSE module "Summation-12"

when running the script. The beast SVN revision I used for testing this is revision 4151.
(In reply to comment #29)
> Created an attachment (id=78340)
> Script for reproducing the crash
> It takes some minutes until the crash occurs (so the "undo/redo method"
> described before can be quicker, but requires more manual interaction).

thanks, with this script the crasher could finally be triggered reliably. bug fixed:

Fri Dec 15 18:21:46 2006  Tim Janik  <timj@gtk.org>

	* beast-gtk/gxk/gxkutils.c (popup_menus_detach): fixed menu_list type
	which is a GList, not a GSList, and has to be released as a GList.
	this fixes creeping memory corruption in GSlice.