GNOME Bugzilla – Bug 594872
Support storing assertion messages into core dump
Last modified: 2010-01-26 13:31:52 UTC
Created attachment 142973 [details] [review] store assert message into glibc Hello! In the current Ubuntu release we extended the Apport crash processing system to also catch assertion messages (which result in SIGABRT). Those are only really useful if you can get the actual message into the bug reports, which is usually just printed to stderr. For that we got a patch into glibc (http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=48dcd0ba) which stores the message of assert() into glibc's __abort_msg global variable (a char*). This works fine for programs which actually use assert(), like pulseaudio or gnome-do (like bug https://launchpad.net/bugs/421696), but not yet for GNOMEish programs. I created a patch for glib's assertion functions to do the same now. With this, the assertion message is kept and can be retrieved from the core dump easily: $ cat test.c #include <glib.h> int main() { g_assert(1 < 0); return 0; } $ ./test **ERROR:test.c:5:main: assertion failed: (1 < 0) Aborted (Core dumped) $ gdb --batch --ex 'print (char*) __abort_msg' ./test core [...] $1 = 0x93bf028 "ERROR:test.c:5:main: assertion failed: (1 < 0)" Since the glibc patch isn't in a released glibc version yet, I added a configure switch --enable-assert-messages, since otherwise glib will fail to build (due to the unresolved symbol on linking).
Review of attachment 142973 [details] [review]: Hi Martin. Thanks for the patch. I can't imagine that there will be too much difficulty getting this (or something like it) included. I've made my comments below. These are just my opinions -- Matthias gets final say. :) ::: glib2.0-2.21.6.orig/configure.in @@ +228,3 @@ [don't use ELF visibility attributes])],, [enable_visibility=yes]) +AC_ARG_ENABLE(assert-messages, i'd personally prefer a feature test for the existence of the feature in glibc and unconditional enabling if available. this may be as simple as a glibc version check if you know for sure that all versions after a certain point have the feature. ::: glib2.0-2.21.6.orig/glib/gtestutils.c @@ +43,3 @@ +#ifdef USE_ABORT_MSG +extern char *__abort_msg; is this supported usage? would it maybe be better just to have our own variable for this purpose? then we could have the feature even if glibc doesn't. @@ +1306,3 @@ + if (__abort_msg) + /* free the old one */ + g_free (__abort_msg); under what circumstances might __abort_msg already be non-NULL? and in that case (since this is a libc variable), are you sure that whoever assigned to it before got their memory from g_malloc() and not the system malloc? i guess the only case to see this be non-NULL is if we are catching SIGABRT... @@ +1314,1 @@ g_free (s); it actually seems slightly silly that we ever free this in the first place... i appreciate that your patch tries not to change anything in the #ifndef case, though :)
Hi Martin. Any updates?
(In reply to comment #1) > i'd personally prefer a feature test for the existence of the feature in glibc > and unconditional enabling if available. Agreed, I'll change the patch accordingly. > this may be as simple as a glibc version check if you know for sure that all versions after a certain point have the feature. 2.11 has it, but it's still relatively new. I think directly testing for its presence is better. > ::: glib2.0-2.21.6.orig/glib/gtestutils.c > @@ +43,3 @@ > > +#ifdef USE_ABORT_MSG > +extern char *__abort_msg; > > is this supported usage? would it maybe be better just to have our own > variable for this purpose? Well, it is not "supported" in the sense of being a public API, it's just a debugging helper. Of course glib could define its own __glib_assert_msg, but then you'd have to try all possible variables when you examine a crash. From a developer's perspective it would be easier if there is just one variable to look at. > then we could have the feature even if glibc doesn't. We could define/use __glib_assert_msg _only_ if glibc doesn't? > > @@ +1306,3 @@ > + if (__abort_msg) > + /* free the old one */ > + g_free (__abort_msg); > > under what circumstances might __abort_msg already be non-NULL? If we caught a previously thrown SIGABRT, right. > and in that > case (since this is a libc variable), are you sure that whoever assigned to it > before got their memory from g_malloc() and not the system malloc? Ah, good catch. Will change to free(). > @@ +1314,1 @@ > g_free (s); > > it actually seems slightly silly that we ever free this in the first place... If the program really terminates on SIGABRT, it doesn't matter of course. It would be a memory leak in the corner case that a program throws and intercepts thousands of SIGABRT. Thanks for the review, will work on an updated patch now.
(In reply to comment #3) > > is this supported usage? would it maybe be better just to have our own > > variable for this purpose? > > Well, it is not "supported" in the sense of being a public API To clarify this: glibc defines it as extern char *__abort_msg; libc_hidden_proto (__abort_msg) in stdlib.h.
Created attachment 150221 [details] [review] git formatted patch Round two, with the following improvements: - Uses proper configure-time feature test instead of --with option - Consistently uses standard C alloc/free on __abort_msg - If libc does not support __abort_msg, use own variable __glib_assert_msg, so that this can be used on all platforms - Add a test script, integrated into "make check" (needs gdb; if gdb is not installed, test is skipped)
Review of attachment 150221 [details] [review]: This is great stuff. Please commit. I've filed https://bugzilla.redhat.com/show_bug.cgi?id=549735 for the abrt side of the story.
I've pushed Martin's patch as commit da66897
The __abort_msg variable that glibc introduced has a symbol version of GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface, and future versions of glibc can remove this symbol. If libglib links against this symbol, libglib.so will declare a dependency on GLIBC_PRIVATE (as you can see using "objdump -x"), which indicates that using it is the wrong thing to do. With future versions of glibc you are in danger of getting an error like this: /lib/libglib-2.0.so.0: symbol __abort_msg, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference __abort_msg was not intended to be written by other libraries. It is intended to be read from a core dump by a debugger; it is only supposed to be written to by libc. See the commit where it was added: http://sourceware.org/git/?p=glibc.git;a=commit;h=48dcd0ba84c5a0fa08a0bd000b24af07d20dce44
(In reply to comment #8) > The __abort_msg variable that glibc introduced has a symbol version of > GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface Right, see comment 3. That's why configure explicitly checks for it, and if it's not there, it uses its own symbol. > , and future versions of glibc can remove this symbol. In theory yes, but I don't see that actually happen soon. > If libglib links against this > symbol, libglib.so will declare a dependency on GLIBC_PRIVATE (as you can see > using "objdump -x"), which indicates that using it is the wrong thing to do. If this is a concern for upstream, I'm fine with sending a followup patch which adds an explicit --with-glibc-abort-msg or similar. Would that suit you better? > __abort_msg was not intended to be written by other libraries. It is intended > to be read from a core dump by a debugger Indeed, and this patch just makes glib work the same way, so that you don't have to check for two or more variables. This entire thing is just a feature for debugging.
(In reply to comment #8) > With future versions of glibc you are in danger of getting an error like this: > /lib/libglib-2.0.so.0: symbol __abort_msg, version GLIBC_PRIVATE not defined in > file libc.so.6 with link time reference That's where "weak" symbols attribute can become handy, iirc.
(In reply to comment #9) > > The __abort_msg variable that glibc introduced has a symbol version of > > GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface > > Right, see comment 3. That's why configure explicitly checks for it, and if > it's not there, it uses its own symbol. That doesn't help, because the version of glibc you're building against might not be the same as the version you link against at runtime, which can be newer. > > , and future versions of glibc can remove this symbol. > > In theory yes, but I don't see that actually happen soon. GLIBC_PRIVATE symbols have been removed before. For example, _dl_tls_get_addr_soft was removed in glibc 2.8. _nss_nisplus_parse_grent, _nss_nisplus_parse_pwent and _nss_nisplus_parse_spent were removed in glibc 2.5. The following were all removed in glibc 2.4: - __libc_open; __libc_close; __libc_read; __libc_write; - __libc_lseek; __libc_fcntl; __libc_open64; __libc_lseek64; - __libc_fsync; __libc_msync; If __abort_msg were intended to be used this way, it would have a symbol version of GLIBC_2.11. Upstream glibc are *very* careful about preserving compatibility of these symbols. > > If libglib links against this symbol, libglib.so will declare a > > dependency on GLIBC_PRIVATE (as you can see using "objdump -x"), > > which indicates that using it is the wrong thing to do. > > > If this is a concern for upstream, I'm fine with sending a followup > patch which adds an explicit --with-glibc-abort-msg or > similar. Would that suit you better? Having an explicit --with-glibc-abort-msg in glib doesn't help because it's not possible to use this option in a correct way that preserves ABI compatibility with future glibc versions. A better way to do this would be to use dlopen()/dlsym() to find __abort_msg. Then you can handle the absence of the symbol. If you don't want to do dlopen()/dlsym() on the error path, you could do it at startup. When you write to __abort_msg you'll be using an unsupported interface but at least this will only be on the error path, not at startup. Or you could try asking upstream glibc to make this a public interface. __libc_stack_end was moved from GLIBC_PRIVATE to GLIBC_2.1 at the request of people using it in garbage collectors. > Indeed, and this patch just makes glib work the same way, so that you don't > have to check for two or more variables. This entire thing is just a feature > for debugging. I know, and it's not worth breaking ABI compatibility with future glibcs just because of this.
(In reply to comment #10) > > With future versions of glibc you are in danger of getting an error > > like this: /lib/libglib-2.0.so.0: symbol __abort_msg, version > > GLIBC_PRIVATE not defined in file libc.so.6 with link time reference > > That's where "weak" symbols attribute can become handy, iirc. Weak symbols don't help here. The dynamic linker will still require that __abort_msg is defined. Using weak symbols with dynamic linking is not very well supported. Also your library will still get a dependency on the symbol version GLIBC_PRIVATE. In principle glibc could remove or rename GLIBC_PRIVATE in the future.
Indeed, trying to build this in our build system gets me a dependency on GLIBC_PRIVATE, which our glibc package doesn't provide, so I end up with an uninstallable glib... Too bad that the configure check happily reports that __abort_msg is available when in practise, it isn't... We probably need to add a configure flag to force the use of __glib_abort_msg, for cases like this. Martin, can you look into that ? While you are at it, I was also seeing your abort msg test fail (only in make distcheck, curiously)
Seems this causing way more grief than it's worth it. I propose I just change the patch to stop using __abort_msg and always use __glib_abort_msg. It's less convenient for developers, but Apport and ABRT can easily be extended to look for both, and it should avoid all those linking/compatibility problems. > While you are at it, I was also seeing your abort msg test fail (only in make distcheck, curiously) Running now, to see whether I can reproduce. So a simple "make check" worked? Is it possible that it was a side effect of using glibc's __abort_msg symbol?
Created attachment 152295 [details] [review] always use our own internal assertion store As discussed, this changes it to always use __glib_assert_msg. It should also fix "make distcheck", the problem was that the expected output didn't work for the different source locations for out of tree builds (which happens with make distcheck). Thus I also re-enabled the test case. Unfortunately distcheck hangs indefinitely at /unix-streams/pipe-io-test (tried it twice), so I could only cancel that and then cd glib-2.23.3/_build/tests and run ../../tests/run-assert-msg-test.sh from there. That previously failed, and works now. Does that look/work ok for you now?
Comment on attachment 152295 [details] [review] always use our own internal assertion store thanks, please commit. thats essentially what I did in the Fedora rpm, anyway.
Committed.