GNOME Bugzilla – Bug 479929
bug-buddy's gnome-breakpad causes many sun-jdk/sun-jre based java applications to crash when they wouldn't normally
Last modified: 2007-10-09 14:12:42 UTC
Please describe the problem:
Eclipse-3.3 was continually crashing a few seconds after startup. This was eventually tracked down to the fact that Sun's java can sometimes raise SIGSEGV signals that it catches internally; however, bug-buddy's gnome-breakpad installs a SIGSEGV handler which interferes and causes java to actually crash. This is not related to google-breakpad, since I used the patch found on another bug here to disable google's breakpad and the crashes still occurred. The best diagnosis of this is in an opensolaris bug (link provided below). There's also a workaround, which is to ensure that GTK_MODULES doesn't contain gnomebreakpad. That disables loading of the SIGSEGV handler, after which the programs load perfectly (as they had previously).

Steps to reproduce:
1. Install bug-buddy 2.19.0+
2. Start eclipse-3.3
3. Wait a few seconds.

Actual results:
Bug-buddy pops up reporting a crash.

Expected results:
The program should continue loading normally.

Does this happen every time?
Yes, although the time before the crash varies.

Other information:
Please also see:
http://bugs.gentoo.org/show_bug.cgi?id=192310
http://bugs.opensolaris.org/view_bug.do?bug_id=6600538

If there's any further information that I can provide, or tests I can run, please let me know... 5:)
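As an illustration of the GTK_MODULES workaround, here is a minimal sketch of how a launcher might drop one module from the list before starting the application. It assumes the colon-separated form of GTK_MODULES used on Unix; `strip_gtk_module` is a hypothetical helper, not part of GTK+ or bug-buddy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the colon-separated list `modules` into `out`,
 * leaving out every entry equal to `unwanted` (and any empty entries).
 * Returns 0 on success, -1 if `out` is too small. */
static int strip_gtk_module(const char *modules, const char *unwanted,
                            char *out, size_t out_size)
{
    size_t unwanted_len = strlen(unwanted);
    size_t used = 0;
    const char *entry = modules;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*entry != '\0') {
        size_t len = strcspn(entry, ":");   /* length of this entry */
        int keep = len > 0 &&
                   !(len == unwanted_len && strncmp(entry, unwanted, len) == 0);

        if (keep) {
            size_t need = len + (used > 0 ? 1 : 0);   /* entry plus separator */
            if (used + need + 1 > out_size)
                return -1;
            if (used > 0)
                out[used++] = ':';
            memcpy(out + used, entry, len);
            used += len;
            out[used] = '\0';
        }

        entry += len;
        if (*entry == ':')
            entry++;                        /* skip the separator */
    }
    return 0;
}
```

A wrapper could pass the result to setenv("GTK_MODULES", ...) before exec'ing eclipse; simply unsetting the variable for that one process should have the same effect when gnomebreakpad is the only module listed.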
Hi, thank you very much for the report. I was already aware of it and tried to debug it. The problem is that the crash is in the java VM, and in its current closed-source, binary-only incarnation I was not able to debug it. Also, it seems that the java vm is (was) segfaulting anyway with some combination of gtk 2.11 + some theme. Before bug-buddy 2.20 / libgnomeui 2.20 we were installing the segfault handler only for applications linking to libgnomeui, and that's why it was not visible. Now that we are imposing this segfault handler on any gtk application running under the GNOME desktop, we can see it. I don't really like the idea of hiding a bug by catching segfaults and ignoring them. The good news is that I'm not able to reproduce this bug anymore running the latest gtk+ + theme bits from the GNOME 2.20 release. Can you still reproduce the problem with these latest releases?
OK, I spoke too soon. I'm able to reproduce it using jre1.6.0_02 and azureus. I'm attaching an interesting backtrace from it. The pattern here (azureus, eclipse) seems to be the SWT libs. Do you know of any non-SWT java application getting this segfault?
Created attachment 96131 [details] backtrace from crashing application
Hi, I agree, catching your own segfaults does seem a bit crazy. Has anyone been able to test this on openjdk? Is openjdk developed enough to be able to run eclipse etc? Sadly I am still able to recreate this with the following relevant packages: gtk+-2.12.0 libgnomeui-2.20.0 gtk-engines-2.12.1 gnome-themes-2.20.0 and the standard Clearlooks theme. I'm afraid the only program I have to test is eclipse. I don't really use any other java programs, however I've only heard of it affecting SWT programs too...
I forgot to mention, I'm also using the exact same version of the JRE/JDK. I can attach one of my backtraces (although, I compile everything with omit-frame-pointers, so it may not be all that useful) if you think it might help?
I'm trying to compile swt here to get some more debug info.
Well, the java vm is definitely installing its own signal handler and then ignoring some of the signals:

  strings ./lib/amd64/server/libjvm.so | grep -i segv
  SIGSEGV
  recursive segv. expanding stack.
  unable to find SEGVing vtable stub

Also, this file looks interesting:

  nm ./lib/amd64/libjsig.so
  [...]
  00000000001014c4 d jvm_signal_installed
  00000000001014c0 d jvm_signal_installing
  [...]

So until java is open sourced and we can make it stop catching segfaults :) I think we could work around this in two ways:
a) java vm specific: if dlopening libjsig.so succeeds, refuse to install handlers at all
b) generic, for apps installing their own handlers: use the last parameter of sigaction (the previously installed action) to detect them

(I guess that in b), if they install their own handler after our gtk_module_init, it will overwrite our handler and things would work anyway.)
Created attachment 96138 [details] [review] Proposed patch not installing the handler for signals that already have a non-default one
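For readers without access to the attachment, the idea described here (skip installation when a signal already has a non-default disposition) might look roughly like the sketch below. This is not the code from attachment 96138: `crash_handler`, `has_custom_handler` and `install_if_unclaimed` are hypothetical names, and only the POSIX `sigaction` call is real API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bug-buddy's real crash handler. */
static void crash_handler(int signum)
{
    (void)signum;
}

/* Returns 1 if `signum` already has a non-default disposition (a handler
 * or SIG_IGN), 0 if it is still SIG_DFL, -1 on error. */
static int has_custom_handler(int signum)
{
    struct sigaction current;

    assert(signum > 0);

    /* Passing NULL as the new action turns sigaction() into a pure query;
     * the last parameter receives the currently installed action. */
    if (sigaction(signum, NULL, &current) != 0)
        return -1;

    if (current.sa_flags & SA_SIGINFO)
        return 1;                       /* three-argument handler present */
    return current.sa_handler != SIG_DFL;
}

/* Installs crash_handler only when nobody else has claimed the signal.
 * Returns 1 if installed, 0 if skipped, -1 on error. */
static int install_if_unclaimed(int signum)
{
    struct sigaction action;
    int claimed = has_custom_handler(signum);

    if (claimed != 0)
        return claimed < 0 ? -1 : 0;

    memset(&action, 0, sizeof action);
    action.sa_handler = crash_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(signum, &action, NULL) == 0 ? 1 : -1;
}
```

A module init function could call install_if_unclaimed(SIGSEGV) and carry on regardless of the result. Note that this only detects handlers installed before the check runs; an application that installs its handler later simply replaces ours.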
Mental note: this only fixes the non-google-breakpad path. google-breakpad actually executes all the handlers in a stacked way.
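To make "stacked" concrete, here is a minimal sketch of handler chaining in general: the new handler remembers whatever was installed before it and calls it after doing its own work. This is an illustration of the technique only, not google-breakpad's actual implementation; `app_handler`, `chaining_handler` and `install_chained` are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction previous_action;     /* whoever was there before us */
static volatile sig_atomic_t chained_calls;  /* times our handler ran       */
static volatile sig_atomic_t app_calls;      /* times the app's handler ran */

/* Hypothetical stand-in for a handler the application installed first. */
static void app_handler(int signum)
{
    (void)signum;
    app_calls++;
}

/* Our handler: do our own work, then pass the signal down the stack. */
static void chaining_handler(int signum, siginfo_t *info, void *context)
{
    chained_calls++;

    if (previous_action.sa_flags & SA_SIGINFO) {
        if (previous_action.sa_sigaction != NULL)
            previous_action.sa_sigaction(signum, info, context);
    } else if (previous_action.sa_handler != SIG_DFL &&
               previous_action.sa_handler != SIG_IGN) {
        previous_action.sa_handler(signum);
    }
}

/* Installs chaining_handler and saves the old action for later chaining.
 * Returns 0 on success, -1 on error (same as sigaction). */
static int install_chained(int signum)
{
    struct sigaction action;

    assert(signum > 0);

    memset(&action, 0, sizeof action);
    action.sa_sigaction = chaining_handler;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(signum, &action, &previous_action);
}
```

With this arrangement both handlers run for every delivery, which would explain why skipping installation in the plain path is not enough when the stacked path is active: the crash reporter still sees the signals the application meant to handle privately.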
Hiya Fernando, Thanks for the patch, I just tried it out and sure enough it works perfectly (as long as the google-breakpad's disabled, otherwise the crash still occurs). 5:) I'm not sure how to deal with google-breakpad's issues, perhaps it's still worth trying to locate the cause in SWT and see if we can provide a patch upstream? Either that, or try integrating the patch from bug 475507? I've been using that to try out my non-google-breakpad tests and it seems to work flawlessly...
Which patch? I think that the best "workaround" (and all workarounds are ugly, but let's try to figure out the SWT problem later) for the google-breakpad case is, again, not installing the ExceptionHandler if there is already a handler. Attaching patch.
Created attachment 96139 [details] [review] Patch covering also the google-breakpad path
Sorry, I must've gotten completely the wrong bug number there, off by a digit. I meant the patch from bug 479507. (http://bugzilla.gnome.org/attachment.cgi?id=96055&action=view) I just tried applying your second patch and got warnings around the returns not returning an int. I haven't looked at the code and I'm about to head to bed, I'm afraid, but I'll be happy to give it a try tomorrow. Sounds like we're nearly there! 5:)
(In reply to comment #2)
> The pattern here (azureus, eclipse) seems to be the SWT libs. Do you know of any
> non-SWT java application getting this segfault?

well, i have experienced this with java webstart apps too
# strace -o javatest -ff javaws Polymnia.jnlp

i canceled the download, so the app does not start (only javaws running)

# grep SIGSEGV javatest.*
javatest.2279:rt_sigaction(SIGSEGV, NULL, {SIG_DFL}, 8) = 0
javatest.2279:rt_sigaction(SIGSEGV, {0x6309d50, ~[RTMIN RT_1], SA_RESTART|SA_SIGINFO}, {SIG_DFL}, 8) = 0
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2318:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2506:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2506:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2507:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2507:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2508:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2509:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2510:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2510:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2511:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2511:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2512:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2512:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2513:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2513:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2514:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2514:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2515:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2515:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2516:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2516:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2518:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2518:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2519:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2519:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2521:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2521:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2522:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2522:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2523:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2523:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2524:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2524:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2525:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2525:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2526:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2526:--- SIGSEGV (Segmentation fault) @ 0 (0) ---
javatest.2527:--- SIGSEGV (Segmentation fault) @ 0 (0) ---

so, javaws itself throws segfaults and ignores them ... i dont think swt is involved in this.
I committed the last patch. It's an ugly workaround, but on the other hand we don't want to override low-level stuff done by applications even before gtk_init.

2007-09-25  Fernando Herrera  <fherrera@onirica.com>

	* gnome-breakpad/gnome-breakpad.cc: Don't install any handler if
	application has set any of them already (that is before gtk_init).
	It was causing crashes (exposing bugs?) with SWT applications.
	Fixes bug #479929
The patch doesn't compile with gcc 4.2.1: the return type of gtk_module_init is an integer, so the return statements must return an integer value.
wops, good catch.

2007-09-25  Fernando Herrera  <fherrera@onirica.com>

	* gnome-breakpad/gnome-breakpad.cc: Return always 0 on
	gtk_module_init function.
Just FYI, java isn't hiding bugs (on purpose) :-) by catching SIGSEGV. The memory manager and garbage collector use mprotect and SIGSEGV to know when it's time to allocate more memory, extend the heap, etc. I'd also imagine that Java uses the handler to detect null pointer accesses, which are then thrown as exceptions that can be caught in Java code.
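To show the general mechanism (not the JVM's actual code), here is a minimal sketch of a program that deliberately faults on a protected page and recovers in its own SIGSEGV handler. All names (`guarded_page`, `on_segv`, `demo_guarded_write`) are hypothetical. It assumes Linux, where calling mprotect from a signal handler works in practice even though POSIX does not list it as async-signal-safe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                    /* starts out inaccessible */
static size_t page_size;
static volatile sig_atomic_t faults_handled;  /* faults we recovered from */

static void on_segv(int signum, siginfo_t *info, void *context)
{
    char *addr = (char *)info->si_addr;
    (void)signum;
    (void)context;

    /* A fault inside our guarded page is expected: make the page usable
     * and return, so the faulting instruction is retried and succeeds. */
    if (addr >= guarded_page && addr < guarded_page + page_size &&
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE) == 0) {
        faults_handled++;
        return;
    }

    /* Anything else is a real crash: restore the default action so the
     * retried instruction terminates the process as usual. */
    signal(SIGSEGV, SIG_DFL);
}

/* Touches a PROT_NONE page once. Returns the number of faults that were
 * handled along the way, or -1 on error. */
static int demo_guarded_write(void)
{
    struct sigaction action, old_action;
    volatile char *p;
    int result;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    guarded_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    memset(&action, 0, sizeof action);
    action.sa_sigaction = on_segv;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGSEGV, &action, &old_action) != 0)
        return -1;

    p = guarded_page;
    p[0] = 42;                 /* faults once, then succeeds after recovery */

    sigaction(SIGSEGV, &old_action, NULL);
    result = (p[0] == 42) ? (int)faults_handled : -1;
    munmap(guarded_page, page_size);
    return result;
}
```

A crash reporter that runs before (or instead of) `on_segv` would treat that first, fully intentional fault as a fatal crash, which matches the behaviour reported in this bug.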
*** Bug 482840 has been marked as a duplicate of this bug. ***
*** Bug 485053 has been marked as a duplicate of this bug. ***