GNOME Bugzilla – Bug 570008
Segmentation fault on exit (kickstart changes)
Last modified: 2009-04-01 19:39:40 UTC
For about a week, ekiga built from trunk has segfaulted on exit. The stack trace below is from trunk as of 26 January 2009.
Created attachment 127645 [details] Stack trace
The backtrace is corrupt:
+ Trace 212072
Thread 14 (Thread 0x4301d950 (LWP 7865))
Is it possible to get debug symbols? It happens in glib, but I have no idea why...
Created attachment 127650 [details] Stack bt with additional debugging code
Julien, any idea? It happens when creating a glib thread.
There are several striking things:
- I don't think we create any glib thread explicitly -- that means a lib we depend on does it (what are your compile options?);
- especially so on shutdown!
- your debug output for the kickstart is wrong: since 20090126, that isn't what ekiga spits out.
Note that I have had regular assertion-failed crashes in opal for months (reported upstream), but no other crashes on exit.
With today's trunk I still get the segfault, and it is reproducible every time. I configure with:
  ./autogen.sh $(confflags) --prefix=/usr --disable-dbus --sysconfdir=/etc --disable-schemas-install --disable-scrollkeeper #--enable-gstreamer
(last time it was WITH --enable-gstreamer, I think). I think glib appears because of the build dependency libdbus-glib-1-dev or libavahi-glib-dev, most probably the former. Attached is a new bt for today's trunk. Does the crash appear in malloc_consolidate, i.e. in glibc?
Created attachment 127959 [details] New stack bt, 04 02 2009
Could you try to compile with fewer --disable switches? I generally compile with only --prefix=/usr, and never saw that :-/
It crashes even with "./autogen.sh" and executing src/ekiga! (Anyway, my previous switches were not harmful, and were used mainly during make install.) I attach a valgrind output with information about the crash. It shows many other errors, are they really errors?
Created attachment 128038 [details] Valgrind output
Created attachment 128039 [details] Valgrind output (uncompressed)
I don't understand the backtraces and the valgrind output doesn't look like a startup crash at all!? It looks like ekiga is shutting down instead... What glib version are you running?
ii libglib2.0-0 2.18.4-1 The GLib library of C routines
The crash appears in engine.cpp, function engine_stop(), at the lines:
  if (service_core)
    delete service_core;
It seems service_core is deleted twice, but I do not know where it is deleted the second time.
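A double delete of this kind can be made harmless by resetting the pointer once it has been deleted. The following is only a minimal sketch under stated assumptions: ServiceCoreStub, destroyed_count and engine_stop_sketch are hypothetical stand-ins, not the actual code in engine.cpp.

```cpp
#include <cassert>

// Hypothetical stand-in for the real service core; it only counts destructions.
static int destroyed_count = 0;

struct ServiceCoreStub {
  ~ServiceCoreStub () { ++destroyed_count; }
};

// Hypothetical global, mirroring the service_core pointer quoted above.
static ServiceCoreStub *service_core = nullptr;

// Sketch of an idempotent stop: a second call finds a null pointer
// and does nothing instead of deleting already-freed memory.
void engine_stop_sketch ()
{
  delete service_core;     // deleting a null pointer is a no-op
  service_core = nullptr;  // make any later call harmless
}
```

Note that this only masks the symptom: if the function is reached twice, the second caller is still worth finding.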
Uh... that looks even more like a shutdown!
Deleting twice a pointer is "more like a shutdown"?!
Calling "engine_stop" definitely doesn't look like a startup!
Oh! I have just understood your comment, Julien! In fact, the title should say "on exit", not "on startup"! I am changing this right now!
Sigh... reading your comments again, you've been pretty clear it was on exit since the beginning -- except in the title, and that's part of what was causing the incomprehension: that's ok now. The other reason for my incomprehension is that the stack trace shows a *starting* thread! And that is definitely not ok! What is starting a thread on shutdown!? I would like to have a backtrace of a smaller ekiga with parts forcibly shut down, like:
  ./src/ekiga --kickstart-disabled=AVAHI,AVAHIPUB,LDAP,GNOMESESSION
Can you provide one?
Created attachment 128893 [details] Backtrace of src/ekiga --kickstart-disabled=AVAHI,AVAHIPUB,LDAP,GNOMESESSION
I attach another stack backtrace with trunk as of today. With your previous patch, service_core is no longer deleted twice, but the crash now appears in another place. I suppose the kickstart change is buggy (but that is just a supposition). Also, isn't the valgrind output of interest to you?
Created attachment 129330 [details] As of 23 02 2009
That double deletion wasn't normal : I added things on what are essentially _error_ code paths! Could you test which of them was taken?
For info, svn 7603 is ok, 7614 has the bug.
I committed your s/pop_back/pop_first -- from the look of it, I would say the GUI doesn't take&release references to its deps correctly.
s/pop_first/pop_front !!! Now, the crash has moved forward:
  eugen3 deleting local-roster-bridge
  eugen3 refcount= 2
  eugen3 deleted
  eugen3 deleting local-cluster
  eugen3 refcount= 9
  eugen3 deleted
  eugen3 deleting gtk-frontend
  eugen3 refcount= 1
  eugen8 ~gtk-frontend0x224e000
  eugen8-- ~gtk-frontend0x224e000
  eugen82 ~gtk-frontend
  eugen83 ~gtk-frontend
  eugen3 deleted
  eugen3 deleting gtk-core
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting call-history-store
  eugen3 refcount= 8
  eugen3 deleted
  eugen3 deleting ldap-source
  eugen3 refcount= 8
  eugen3 deleted
  eugen3 deleting evolution-source
  eugen3 refcount= 8
  eugen3 deleted
  eugen3 deleting avahi-presence-publisher
  eugen3 refcount= 2
  eugen3 deleted
  eugen3 deleting avahi-core
  eugen3 refcount= 8
  eugen3 deleted
  eugen3 deleting opal-component
  eugen3 refcount= 3
  eugen3 deleted
  eugen3 deleting opal-account-store
  eugen3 refcount= 2
  eugen3 deleted
  eugen3 deleting ptlib-audio-output
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting ptlib-audio-input
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting ptlib-video-input
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting null-audio-input
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting presence-core
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting personal-details
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting call-core
  eugen3 refcount= 3
  eugen3 deleted
  eugen3 deleting hal-core
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting audiooutput-core
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting audioinput-core
  eugen3 refcount= 1
  eugen3 deleted
  eugen3 deleting videooutput-core
  eugen3 refcount= 1
  eugen ~videooutputcore
  eugen ~videooutputcore2
  eugen ~videooutputcore3
  *** glibc detected *** src/ekiga: corrupted double-linked list: 0x0000000001f60520 ***
So now, the crash is in the destructor in videooutput-core.cpp:
  VideoOutputCore::~VideoOutputCore ()
  {
    std::cout << "eugen ~videooutputcore" << std::endl;
    PWaitAndSignal m(core_mutex);
    std::cout << "eugen ~videooutputcore2" << std::endl;
    if (videooutput_core_conf_bridge)
      delete videooutput_core_conf_bridge;
    std::cout << "eugen ~videooutputcore3" << std::endl;
    for (std::set<VideoOutputManager *>::iterator iter = managers.begin ();
         iter != managers.end ();
         iter++)
      delete (*iter);
    std::cout << "eugen ~videooutputcore4" << std::endl;
    managers.clear();
    std::cout << "eugen ~videooutputcore5" << std::endl;
  }
Why all these crashes?! (Why do they appear now, i.e. since revision 7614?) Do you know how to get rid of this one? I will continue the investigation tomorrow.
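The pop_back versus pop_front change matters because it decides which service is released first at shutdown. The sketch below illustrates the idea under a loud assumption: it supposes that each newly registered service is pushed to the front of a list, so that dependents sit ahead of what they depend on. ServiceListSketch and its members are hypothetical and are not taken from ekiga's sources.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical registry: the newest service is kept at the front, so popping
// from the front releases dependents before the things they depend on.
struct ServiceListSketch {
  std::list<std::string> services;

  void add (const std::string &name) { services.push_front (name); }

  // Releases every service and returns the names in release order.
  std::vector<std::string> release_all ()
  {
    std::vector<std::string> order;
    while (!services.empty ()) {
      order.push_back (services.front ());
      services.pop_front ();  // pop_back would release the oldest entry first
    }
    return order;
  }
};
```

With registration order videooutput-core, gtk-core, gtk-frontend, the release order is the reverse, which matches the general shape of the log above, where the frontend goes before the cores.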
Sigh... I'll try to have a look. :-( Could you print something in the "delete (*iter);" loop so we know for which one the crash occurs?
Pfff, I think the memory deallocation should be revisited as a whole. With the pop_front patch and the patch below, I ran several tests, stopping ekiga once the ekiga.net account was registered.
  for (std::set<VideoOutputManager *>::iterator iter = managers.begin ();
       iter != managers.end ();
       iter++) {
    std::cout << "eugen ~videooutputcore3x" << std::endl;
    delete (*iter);
  }
Sometimes it stops ok. Sometimes it prints:
  eugen ~videooutputcore3
  eugen ~videooutputcore3x
  Segmentation fault
(or, instead of Segmentation fault:
  *** glibc detected *** src/ekiga: corrupted double-linked list: 0x000000000168f800 ***
)
Sometimes it stops at:
  eugen3 deleting contact-core
  eugen3 refcount= 1
  *** glibc detected *** src/ekiga: corrupted double-linked list: 0x0000000001312840 ***
so there is a problem in the contact-core destructor (contact-core comes after videooutput-core). I give up, hoping that someone who knows how memory is managed in ekiga will have a closer look. What I still do not understand is the link between all these destructors and changes made one month ago, when no crash appeared, as shown in comment #23.
The memory deallocation should be : do nothing, let gmref_ptr do everything. One of the problems is that a few of the stacks still don't use it... another problem is that we still have mindless threads in some places -- and those can also explain some of the crashes (though perhaps not that one). The fact that I don't get crashes on exit also doesn't help :-(
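The "do nothing, let the smart pointer do everything" approach can be illustrated with a short sketch. It uses std::shared_ptr purely as a stand-in: gmref_ptr is ekiga's own class and its interface is not shown in this report, so nothing below should be read as its API. ManagerStub, CoreStub and live_managers are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical manager type that tracks how many instances are alive.
static int live_managers = 0;

struct ManagerStub {
  ManagerStub () { ++live_managers; }
  ~ManagerStub () { --live_managers; }
};

// Hypothetical core holding shared references to its managers.
// There is deliberately no hand-written destructor: when the set goes
// away, each reference is dropped, and a manager is destroyed only
// when its last owner releases it.
struct CoreStub {
  std::set<std::shared_ptr<ManagerStub> > managers;

  void add_manager (std::shared_ptr<ManagerStub> m) { managers.insert (m); }
};
```

The design point is that no code path calls delete by hand, so a manager that is still referenced elsewhere survives the core, and one that is not referenced is freed exactly once.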
If I shut down ekiga 7-10 seconds after having registered to ekiga.net, then sometimes it does not crash.
There's something definitely fishy there. A simple memory management bug would give a very reproducible crash, both for you and everyone else. I really think something happens in threads.
Does having video preview on or off change anything?
No. Let's wait until someone else has this problem.
Created attachment 131116 [details] gdb trace
I'm seeing the same crash every time I quit a freshly built ekiga 3.2.0 (from tarballs) on Ubuntu 8.10 x86_64.
I reviewed all the backtraces, and I think I saw a pattern. I'm not 100% sure Howard really has the same problem as Eugen. Eugen's problem is that when some Evolution::Book is destroyed, something is done *by bonobo* in another thread, and we get the crash in that thread. Howard's problem is when some History::Contact is destroyed, but the crash is still in another thread where the trace looks similar; it's less clear because we don't have as many debug symbols. Eugen, could you try to see if you get the same crash with:
  ekiga --kickstart-disabled=EVOLUTION
My hope is that it will magically disappear. Howard, can you reproduce the problem easily? If so, could you also try the kickstart-disabled trick? If you could get me more debug symbols, that would be nice too.
Uh, I forgot to mark the bug as NEEDINFO, although I asked for more.
The crash still appears with the --kickstart-disabled trick :o(
With the same gdb trace?!
With --kickstart-disabled and the version from 26 03 2009:
+ Trace 213928
After putting cout instructions in VideoInputCore::~VideoInputCore and VideoOutputCore::~VideoOutputCore () like this:
  VideoInputCore::~VideoInputCore ()
  {
    PWaitAndSignal m(core_mutex);
    if (videoinput_core_conf_bridge)
      delete videoinput_core_conf_bridge;
    std::cout << "eugen ~videoinputcore" << std::endl;
    for (std::set<VideoInputManager *>::iterator iter = managers.begin ();
         iter != managers.end ();
         iter++) {
      std::cout << "eugen ~videoinputcore1" << std::endl;
      delete (*iter);
    }
    std::cout << "eugen ~videoinputcore2" << std::endl;
    managers.clear();
    std::cout << "eugen ~videoinputcore3" << std::endl;
  }

  VideoOutputCore::~VideoOutputCore ()
  {
    PWaitAndSignal m(core_mutex);
    if (videooutput_core_conf_bridge)
      delete videooutput_core_conf_bridge;
    std::cout << "eugen ~videooutputcore" << std::endl;
    for (std::set<VideoOutputManager *>::iterator iter = managers.begin ();
         iter != managers.end ();
         iter++) {
      std::cout << "eugen ~videooutputcore1" << std::endl;
      delete (*iter);
    }
    std::cout << "eugen ~videooutputcore2" << std::endl;
    managers.clear();
    std::cout << "eugen ~videooutputcore3" << std::endl;
  }
it prints:
  eugen3 deleting videooutput-core
  eugen3 deleted
  eugen3 deleting videoinput-core
  eugen ~videoinputcore
  eugen ~videoinputcore1
  eugen ~videoinputcore1
  eugen ~videoinputcore2
  eugen ~videoinputcore3
  eugen ~videooutputcore
  eugen ~videooutputcore1
  *** glibc detected *** src/ekiga: corrupted double-linked list: 0x00000000019d98e0 ***
It's 100% reproducible. In all cases, the video preview is off.
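Hand-numbered cout lines like the ones above are easy to get out of sync between runs. A small scope tracer can do the same job. This is only a debugging sketch: ScopedTrace is a hypothetical helper, not something that exists in ekiga.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: logs when a scope is entered and left.
// std::endl flushes each line, so the last message is visible
// even if the process crashes right afterwards.
class ScopedTrace {
public:
  ScopedTrace (std::ostream &out, const std::string &label)
    : out_ (out), label_ (label)
  {
    out_ << "enter " << label_ << std::endl;
  }

  ~ScopedTrace () { out_ << "leave " << label_ << std::endl; }

  // Numbered checkpoint inside the traced scope.
  void step (int n) { out_ << label_ << " step " << n << std::endl; }

private:
  std::ostream &out_;
  std::string label_;
};
```

Declared as the first statement of a destructor with std::cout as the stream, it prints the enter line immediately and the leave line only if the body completes, so a missing leave line points at the scope where the crash happened.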
Could you add g_print messages in runtime-glib.cpp (and more precisely in the Ekiga::Runtime::quit function) so we can see whether we are doing things after it has been shut down?
I put printf/g_print at the beginning and end of the functions init, finalize and quit. init is called and finishes. finalize and quit are *not* called when quitting ekiga. I also put g_print in run_in_main:
  void Ekiga::Runtime::run_in_main (sigc::slot0<void> action, unsigned int seconds)
  {
    g_print ("eugen runinmain1\n");
    g_async_queue_push (queue, (gpointer)(new struct message (action, seconds)));
    g_print ("eugen runinmain2\n");
  }
and they are not shown upon quitting ekiga => run_in_main is not called either. Still, gdb shows that this function is called upon quitting; I do not understand...
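The concern raised here, work being queued after the runtime has been shut down, can be guarded against with a stopped flag checked under the same lock as the queue. The sketch below uses only the C++ standard library. ActionQueueSketch is a hypothetical class and does not reproduce Ekiga::Runtime or GAsyncQueue behaviour.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>

// Hypothetical stand-in for a main-loop action queue.
class ActionQueueSketch {
public:
  // Queues an action; returns false if the queue was already shut down.
  bool push (std::function<void ()> action)
  {
    std::lock_guard<std::mutex> lock (mutex_);
    if (stopped_)
      return false;
    actions_.push (action);
    return true;
  }

  // Runs the pending actions in FIFO order; returns how many ran.
  int run_pending ()
  {
    std::queue<std::function<void ()> > pending;
    {
      std::lock_guard<std::mutex> lock (mutex_);
      pending.swap (actions_);  // run outside the lock
    }
    int count = 0;
    while (!pending.empty ()) {
      pending.front () ();
      pending.pop ();
      ++count;
    }
    return count;
  }

  // Drops pending actions and rejects any further push.
  void quit ()
  {
    std::lock_guard<std::mutex> lock (mutex_);
    stopped_ = true;
    std::queue<std::function<void ()> > empty;
    actions_.swap (empty);
  }

private:
  std::mutex mutex_;
  std::queue<std::function<void ()> > actions_;
  bool stopped_ = false;
};
```

A caller that gets false back from push knows it raced with shutdown and can log that fact, which is the kind of evidence the g_print experiment above is looking for.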
+ Trace 213937
Thread 4 (Thread 0x7fc503cae950 (LWP 29045))
I have had a revelation and compiled ekiga with -O0 -g. Here is a new bt with --kickstart-disabled=EVOLUTION, there is a problem in Contact (same as Howard it seems):
+ Trace 213939
Thread 1 (Thread 0x7faf1810c7b0 (LWP 24686))
Sigh... you seem to have the same as Howard, indeed :-/
I just committed a bunch of things which may help ; can you confirm?
I have just made svn up. Still crashes:
+ Trace 213977
1. Could you get me the full backtrace? 2. Are you sure this isn't another crash?
2. I do not know. 1. See below.
Created attachment 131804 [details] new gdb output from execution without any argument
Created attachment 131805 [details] new gdb output from execution without any argument (another one)
Created attachment 131806 [details] new gdb output from execution with argument: --kickstart-disabled=EVOLUTION
What glib version do you have? I'm not convinced the problem is with us :-/ Perhaps you could try running ekiga with G_SLICE=always-malloc and get me yet-another backtrace ?
snoopy:~$ dpkg -l libglib\*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name               Version            Description
+++-==================-==================-====================================================
ii  libglib-perl       1:1.221-1          Perl interface to the GLib and GObject libraries
un  libglib1.2         <none>             (no description available)
ii  libglib1.2ldbl     1.2.10-19          The GLib library of C routines
un  libglib1.3-dev     <none>             (no description available)
ii  libglib2.0-0       2.20.0-2           The GLib library of C routines
ii  libglib2.0-0-dbg   2.20.0-2           The GLib libraries and debugging symbols
un  libglib2.0-data    <none>             (no description available)
un  libglib2.0-dbg     <none>             (no description available)
ii  libglib2.0-dev     2.20.0-2           Development files for the GLib library
un  libglib2.0-doc     <none>             (no description available)
un  libglibmm-2.4-1    <none>             (no description available)
un  libglibmm-2.4-1c2  <none>             (no description available)
ii  libglibmm-2.4-1c2a 2.20.0-1           C++ wrapper for the GLib toolkit (shared libraries)
ii  libglibmm-2.4-dev  2.20.0-1           C++ wrapper for the GLib toolkit (development files)
un  libglibmm-2.4-doc  <none>             (no description available)
un  libglibmm2.3-dev   <none>             (no description available)
Created attachment 131809 [details] gdb avec G_SLICE=always-malloc
Sigh... I studied all the backtraces attached to this bug : - most of them are about a crash in glib's queue code (including Howard's) ; - and then there is (lately) a crash in History::Contact's destruction. So I'm pretty sure we have several issues all mixed in this bug report. The fact that I don't get that crash at all is very annoying : if we could pinpoint exactly what triggers the problem, that could help. Eugen, does the History::Contact crash happen even if your history is empty?
Created attachment 131845 [details] gdb with history list empty
Yes, it crashes too.
And this is the glib queue crash again...
Let me split that bug report : the GAsyncQueue problem is now bug #577640 ; and I'm closing that bug as a duplicate of bug #576226, because the crash in the C++ code looks similar (RefLister issue?). *** This bug has been marked as a duplicate of 576226 ***