After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 594872 - Support storing assertion messages into core dump
Support storing assertion messages into core dump
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
2.21.x
Other Linux
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2009-09-11 11:22 UTC by Martin Pitt
Modified: 2010-01-26 13:31 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
store assert message into glibc (2.27 KB, patch)
2009-09-11 11:22 UTC, Martin Pitt
needs-work Details | Review
git formatted patch (6.90 KB, patch)
2009-12-22 10:18 UTC, Martin Pitt
committed Details | Review
always use our own internal assertion store (5.54 KB, patch)
2010-01-26 10:24 UTC, Martin Pitt
accepted-commit_now Details | Review

Description Martin Pitt 2009-09-11 11:22:45 UTC
Created attachment 142973 [details] [review]
store assert message into glibc

Hello!

In the current Ubuntu release we extended the Apport crash processing system to
also catch assertion messages (which result in SIGABRT). Those are only really
useful if you can get the actual message into the bug reports, which is usually
just printed to stderr.

For that we got a patch into glibc
(http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=48dcd0ba) which stores
the message of assert() into glibc's __abort_msg global variable (a char*).
This works fine for programs which actually use assert(), like pulseaudio or
gnome-do (like bug https://launchpad.net/bugs/421696), but not yet for GNOMEish
programs.

I created a patch for glib's assertion functions to do the same now. With this,
the assertion message is kept and can be retrieved from the core dump easily:

$ cat test.c 
#include <glib.h>

int main()
{
    g_assert(1 < 0);
    return 0;
}

$ ./test
**ERROR:test.c:5:main: assertion failed: (1 < 0)
Aborted (Core dumped)

$ gdb --batch --ex 'print (char*) __abort_msg' ./test core
[...]
$1 = 0x93bf028 "ERROR:test.c:5:main: assertion failed: (1 < 0)"


Since the glibc patch isn't in a released glibc version yet, I added a
configure switch --enable-assert-messages, since otherwise glib will fail to
build (due to the unresolved symbol on linking).
Comment 1 Allison Karlitskaya (desrt) 2009-12-01 15:24:44 UTC
Review of attachment 142973 [details] [review]:

Hi Martin.

Thanks for the patch.  I can't imagine that there will be too much difficulty getting this (or something like it) included.  I've made my comments below.  These are just my opinions -- Matthias gets final say. :)

::: glib2.0-2.21.6.orig/configure.in
@@ +228,3 @@
                               [don't use ELF visibility attributes])],,
               [enable_visibility=yes])
+AC_ARG_ENABLE(assert-messages,

i'd personally prefer a feature test for the existence of the feature in glibc and unconditional enabling if available.  this may be as simple as a glibc version check if you know for sure that all versions after a certain point have the feature.

::: glib2.0-2.21.6.orig/glib/gtestutils.c
@@ +43,3 @@
 
+#ifdef USE_ABORT_MSG
+extern char *__abort_msg;

is this supported usage?  would it maybe be better just to have our own variable for this purpose?

then we could have the feature even if glibc doesn't.

@@ +1306,3 @@
+  if (__abort_msg)
+      /* free the old one */
+      g_free (__abort_msg);

under what circumstances might __abort_msg already be non-NULL?  and in that case (since this is a libc variable), are you sure that whoever assigned to it before got their memory from g_malloc() and not the system malloc?

i guess the only case to see this be non-NULL is if we are catching SIGABRT...

@@ +1314,1 @@
   g_free (s);

it actually seems slightly silly that we ever free this in the first place...   i appreciate that your patch tries not to change anything in the #ifndef case, though :)
Comment 2 Allison Karlitskaya (desrt) 2009-12-21 02:42:59 UTC
Hi Martin.  Any updates?
Comment 3 Martin Pitt 2009-12-22 08:32:12 UTC
(In reply to comment #1)

> i'd personally prefer a feature test for the existence of the feature in glibc
> and unconditional enabling if available.

Agreed, I'll change the patch accordingly.

> this may be as simple as a glibc version check if you know for sure that all versions after a certain point have the feature.

2.11 has it, but it's still relatively new. I think directly testing for its presence is better.
 
> ::: glib2.0-2.21.6.orig/glib/gtestutils.c
> @@ +43,3 @@
> 
> +#ifdef USE_ABORT_MSG
> +extern char *__abort_msg;
> 
> is this supported usage?  would it maybe be better just to have our own
> variable for this purpose?

Well, it is not "supported" in the sense of being a public API, it's just a debugging helper. Of course glib could define its own __glib_assert_msg, but then you'd have to try all possible variables when you examine a crash. From a developer's perspective it would be easier if there is just one variable to look at.

> then we could have the feature even if glibc doesn't.

We could define/use __glib_assert_msg _only_ if glibc doesn't?
> 
> @@ +1306,3 @@
> +  if (__abort_msg)
> +      /* free the old one */
> +      g_free (__abort_msg);
> 
> under what circumstances might __abort_msg already be non-NULL?

If we caught a previously thrown SIGABRT, right.

> and in that
> case (since this is a libc variable), are you sure that whoever assigned to it
> before got their memory from g_malloc() and not the system malloc?

Ah, good catch. Will change to free().

> @@ +1314,1 @@
>    g_free (s);
> 
> it actually seems slightly silly that we ever free this in the first place...  

If the program really terminates on SIGABRT, it doesn't matter of course. It would be a memory leak in the corner case that a program throws and intercepts thousands of SIGABRT.

Thanks for the review, will work on an updated patch now.
Comment 4 Martin Pitt 2009-12-22 09:47:50 UTC
(In reply to comment #3)
> > is this supported usage?  would it maybe be better just to have our own
> > variable for this purpose?
> 
> Well, it is not "supported" in the sense of being a public API

To clarify this: glibc defines it as

  extern char *__abort_msg;
  libc_hidden_proto (__abort_msg)

in stdlib.h.
Comment 5 Martin Pitt 2009-12-22 10:18:40 UTC
Created attachment 150221 [details] [review]
git formatted patch

Round two, with the following improvements:

- Uses proper configure-time feature test instead of --with option
- Consistently uses standard C alloc/free on __abort_msg
- If libc does not support __abort_msg, use own variable __glib_assert_msg, so that this can be used on all platforms
- Add a test script, integrated into "make check" (needs gdb; if gdb is not installed, test is skipped)
Comment 6 Matthias Clasen 2009-12-22 14:32:18 UTC
Review of attachment 150221 [details] [review]:

This is great stuff. Please commit.
I've filed https://bugzilla.redhat.com/show_bug.cgi?id=549735 for the abrt side of the story.
Comment 7 Chris Coulson 2009-12-23 15:58:26 UTC
I've pushed Martin's patch as commit da66897
Comment 8 Mark Seaborn 2009-12-29 12:39:14 UTC
The __abort_msg variable that glibc introduced has a symbol version of GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface, and future versions of glibc can remove this symbol.  If libglib links against this symbol, libglib.so will declare a dependency on GLIBC_PRIVATE (as you can see using "objdump -x"), which indicates that using it is the wrong thing to do.

With future versions of glibc you are in danger of getting an error like this:
/lib/libglib-2.0.so.0: symbol __abort_msg, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

__abort_msg was not intended to be written by other libraries.  It is intended to be read from a core dump by a debugger; it is only supposed to be written to by libc.  See the commit where it was added:
http://sourceware.org/git/?p=glibc.git;a=commit;h=48dcd0ba84c5a0fa08a0bd000b24af07d20dce44
Comment 9 Martin Pitt 2009-12-29 16:53:54 UTC
(In reply to comment #8)
> The __abort_msg variable that glibc introduced has a symbol version of
> GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface

Right, see comment 3. That's why configure explicitly checks for it, and if it's not there, it uses its own symbol.

> , and future versions of glibc can remove this symbol.

In theory yes, but I don't see that actually happen soon.

> If libglib links against this
> symbol, libglib.so will declare a dependency on GLIBC_PRIVATE (as you can see
> using "objdump -x"), which indicates that using it is the wrong thing to do.

If this is a concern for upstream, I'm fine with sending a followup patch which adds an explicit --with-glibc-abort-msg or similar. Would that suit you better?

> __abort_msg was not intended to be written by other libraries.  It is intended
> to be read from a core dump by a debugger

Indeed, and this patch just makes glib work the same way, so that you don't have to check for two or more variables. This entire thing is just a feature for debugging.
Comment 10 Marc-Andre Lureau 2009-12-29 19:41:35 UTC
(In reply to comment #8)

> With future versions of glibc you are in danger of getting an error like this:
> /lib/libglib-2.0.so.0: symbol __abort_msg, version GLIBC_PRIVATE not defined in
> file libc.so.6 with link time reference

That's where "weak" symbols attribute can become handy, iirc.
Comment 11 Mark Seaborn 2009-12-30 11:46:33 UTC
(In reply to comment #9)
> > The __abort_msg variable that glibc introduced has a symbol version of
> > GLIBC_PRIVATE, which means it is *not* part of libc.so's public interface
> 
> Right, see comment 3. That's why configure explicitly checks for it, and if
> it's not there, it uses its own symbol.

That doesn't help, because the version of glibc you're building
against might not be the same as the version you link against at
runtime, which can be newer.

> > , and future versions of glibc can remove this symbol.
> 
> In theory yes, but I don't see that actually happen soon.

GLIBC_PRIVATE symbols have been removed before.

For example, _dl_tls_get_addr_soft was removed in glibc 2.8.

_nss_nisplus_parse_grent, _nss_nisplus_parse_pwent and
_nss_nisplus_parse_spent were removed in glibc 2.5.

The following were all removed in glibc 2.4:
-    __libc_open; __libc_close; __libc_read; __libc_write;
-    __libc_lseek; __libc_fcntl; __libc_open64; __libc_lseek64;
-    __libc_fsync; __libc_msync;

If __abort_msg were intended to be used this way, it would have a
symbol version of GLIBC_2.11.  Upstream glibc are *very* careful about
preserving compatibility of these symbols.

> > If libglib links against this symbol, libglib.so will declare a
> > dependency on GLIBC_PRIVATE (as you can see using "objdump -x"),
> > which indicates that using it is the wrong thing to do.  >
>
> If this is a concern for upstream, I'm fine with sending a followup
> patch which adds an explicit --with-glibc-abort-msg or
> similar. Would that suit you better?

Having an explicit --with-glibc-abort-msg in glib doesn't help because
it's not possible to use this option in a correct way that preserves
ABI compatibility with future glibc versions.

A better way to do this would be to use dlopen()/dlsym() to find
__abort_msg.  Then you can handle the absence of the symbol.  If you
don't want to do dlopen()/dlsym() on the error path, you could do it
at startup.  When you write to __abort_msg you'll be using an
unsupported interface but at least this will only be on the error
path, not at startup.

Or you could try asking upstream glibc to make this a public
interface.  __libc_stack_end was moved from GLIBC_PRIVATE to GLIBC_2.1
at the request of people using it in garbage collectors.

> Indeed, and this patch just makes glib work the same way, so that you don't
> have to check for two or more variables. This entire thing is just a feature
> for debugging.

I know, and it's not worth breaking ABI compatibility with future
glibcs just because of this.
Comment 12 Mark Seaborn 2009-12-30 12:04:15 UTC
(In reply to comment #10)
> > With future versions of glibc you are in danger of getting an error
> > like this: /lib/libglib-2.0.so.0: symbol __abort_msg, version
> > GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
> 
> That's where "weak" symbols attribute can become handy, iirc.

Weak symbols don't help here.  The dynamic linker will still require
that __abort_msg is defined.  Using weak symbols with dynamic linking
is not very well supported.  Also your library will still get a
dependency on the symbol version GLIBC_PRIVATE.  In principle glibc
could remove or rename GLIBC_PRIVATE in the future.
Comment 13 Matthias Clasen 2010-01-25 23:23:23 UTC
Indeed, trying to build this in our build system gets me a dependency on GLIBC_PRIVATE, which our glibc package doesn't provide, so I end up with an uninstallable glib... 

Too bad that the configure check happily reports that __abort_msg is available when in practise, it isn't...
 
We probably need to add a configure flag to force the use of __glib_abort_msg, for cases like this. Martin, can you look into that ? 

While you are at it, I was also seeing your abort msg test fail (only in make distcheck, curiously)
Comment 14 Martin Pitt 2010-01-26 09:35:18 UTC
Seems this causing way more grief than it's worth it. I propose I just change the patch to stop using __abort_msg and always use __glib_abort_msg. It's less convenient for developers, but Apport and ABRT can easily be extended to look for both, and it should avoid all those linking/compatibility problems.

> While you are at it, I was also seeing your abort msg test fail (only in make
distcheck, curiously)

Running now, to see whether I can reproduce. So a simple "make check" worked? Is it possible that it was a side effect of using glibc's __abort_msg symbol?
Comment 15 Martin Pitt 2010-01-26 10:24:11 UTC
Created attachment 152295 [details] [review]
always use our own internal assertion store

As discussed, this changes it to always use __glib_assert_msg. It should also fix "make distcheck", the problem was that the expected output didn't work for the different source locations for out of tree builds (which happens with make distcheck). Thus I also re-enabled the test case.

Unfortunately distcheck hangs indefinitely at /unix-streams/pipe-io-test (tried it twice), so I could only cancel that and then cd glib-2.23.3/_build/tests and run ../../tests/run-assert-msg-test.sh from there. That previously failed, and works now.

Does that look/work ok for you now?
Comment 16 Matthias Clasen 2010-01-26 13:26:49 UTC
Comment on attachment 152295 [details] [review]
always use our own internal assertion store

thanks, please commit. thats essentially what I did in the Fedora rpm, anyway.
Comment 17 Martin Pitt 2010-01-26 13:31:52 UTC
Committed.