GNOME Bugzilla – Bug 724590
GSlice slab_stack corruption
Last modified: 2018-05-24 16:18:14 UTC
This is rather an odd corner case: When building on OS X 10.9 (Mavericks) with Xcode3's gcc-4.2 (that's the real gcc, not llvm-gcc), I encountered a segfault: Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0xa47fe801 0x00400c4f in slab_allocator_alloc_chunk (chunk_size=56) at gslice.c:1312 1312 if (!allocator->slab_stack[ix] || !allocator->slab_stack[ix]->chunks)
+ Trace 233183
which I eventually traced to gcc interpreting this line: g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]); (which is line gslice.c:719, in magazine_cache_pop_magazine) in an unfortunate way. In particular, it's the &allocator->contention_counters[ix] statement: Instead of dereferencing allocator->contention_counters, finding the ix offset, and returning the address thereof, it's getting the address of allocator->contention_counters and offsetting that by ix; when ix is 6, that happens to be allocator->slab_stack. Since g_mutex_lock_a does predictable things with a counter, having that counter actually be a pointer to somewhere in the heap has predictable results.
Created attachment 269483 [details] [review] Dereference allocator->contention_counters before trying to index the array. Rather than trying to screw around with forcing nesting and casting and such, I just dereference allocator->contention_counters into a new variable and index that to get the address.
Comment on attachment 269483 [details] [review] Dereference allocator->contention_counters before trying to index the array. Silence means assent. Pushed to master: c49ec3c8d
this got reverted: https://git.gnome.org/browse/glib/commit/?id=d93458d97d74ef12a08de20c1f27ea1cfa54447f because it's completely equivalent to the code that was changed, so there must be more to the story. Normally a crash in an allocator function is a sign of heap corruption (caused by a double free or so somewhere else in the code). Can you: 1) run your program with G_SLICE=always-malloc to disable the slice allocator and see if the crash still hapens with your program 2) run your program through valgrind (assuming valgrind works on OS X) to have it report any double free's or other memory errors in your program It's also possible that this is a compiler bug, but pretty unlikely, given how common an idiom &foo[i] is.
Well, for one thing it's not *my* program, it's g-ir-compiler: GICOMP gir/DBus-1.0.gir GICOMP gir/DBusGLib-1.0.gir /bin/sh: line 1: 98873 Segmentation fault: 11 env PATH=".libs:/Users/john/Development/gtk-build/gtk-unstable-10.5-i386/inst/bin:/Users/john/.local/bin:/Volumes/SSD/XCode3/usr/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin" ./g-ir-compiler --includedir=. --includedir=./gir --includedir=. --includedir=. --includedir=. gir/DBusGLib-1.0.gir -o gir/DBusGLib-1.0.typelib make[2]: *** [gir/DBusGLib-1.0.typelib] Error 139 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0x838fe800 0x00350a1f in slab_allocator_alloc_chunk () at gslice.c:1296 1296 if (!allocator->slab_stack[ix] || !allocator->slab_stack[ix]->chunks) (gdb) bt
+ Trace 233484
For another, I'm pretty confident that it's not a stack-smash, and I know from stepping through in the debugger that in this case gslice is using the heap. I was mistaken, though, about which compiler was to blame. It is in fact llvm-gcc-4.2, not gcc-4.2. Here's the problem, when glib is built with llvm-gcc-4.2: Breakpoint 1, magazine_cache_pop_magazine (ix=6, countp=0xabac3c) at gslice.c:718 718 g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]); (gdb) p ix $1 = 6 (gdb) p &allocater->contention_counters[ix] No symbol "allocater" in current context. (gdb) p &allocator->contention_counters[ix] $2 = (guint *) 0x64f478 (gdb) p allocator->contention_counters $3 = (guint *) 0x64f460 (gdb) p &allocator->contention_counters $5 = (guint **) 0x4e8f0c (gdb) s g_mutex_lock_a (mutex=0x4e8f00, contention_counter=0x4e8f24) at gslice.c:478 478 gboolean contention = FALSE; This pretty clearly shows that the compiler is applying the '&' operator *before* the '[]' operator, even though the '[]' operator has higher precedence. Here's the same bit, compiled with gcc-4.2: Breakpoint 1, magazine_cache_pop_magazine (ix=6, countp=0xafde3c) at gslice.c:718 718 g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]); (gdb) p allocator->contention_counters $1 = (guint *) 0x67ae60 (gdb) p &allocator->contention_counters $2 = (guint **) 0x475d4c (gdb) s g_mutex_lock_a (mutex=0x475d40, contention_counter=0x67ae78) at gslice.c:478 478 gboolean contention = FALSE; So while breaking out guint* counters = allocator->contention_counters; *should be* equivalent, in this case it isn't. Unfortunately no amount of fiddling with parentheses convinces llvm-gcc to interpret it differently; only creating the local counters works. If you don't want to work around the compiler bug, configure should test for it.
llvm-gcc is known to be terribly buggy. Can't you use clang instead?
(In reply to comment #5) > llvm-gcc is known to be terribly buggy. Can't you use clang instead? I take it that's a vote for a configure check to block building GLib on compilers with this bug rather than work around it.
Yes.
Created attachment 277093 [details] [review] Test that the array reference is correctly interpreted in configure. Getting the test to fail was a bit more convoluted than I expected, though I suppose I should have known that the compiler would handle the more obvious constructs correctly. The test fails with i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5646) (LLVM build 2207.5) and passes with i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) and Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn).
Review of attachment 277093 [details] [review]: the configure-time check is definitely better. some minor remarks. ::: configure.ac @@ +170,3 @@ AC_SUBST(LIB_EXE_MACHINE_FLAG) +AC_MSG_CHECKING([For waldo = &foo->bar; &foo->bar[[baz]] == waldo[[baz]] ]) this should probably be something more like: AC_MSG_CHECKING([For reference access to array element]) @@ +172,3 @@ +AC_MSG_CHECKING([For waldo = &foo->bar; &foo->bar[[baz]] == waldo[[baz]] ]) + +llvm_gcc_bug="yes" we're testing for a specific compiler bug, not for a specific compiler. better to use the bug id for the variable, so we can easily have a reference: bug_724590="yes" also, leaving a comment above with a URL to the bug will reduce the chances of this check getting dropped without proper QA. if we wanted to be on the safe side, we could even call the variable glib_cv_bug_724590, cache it, and let cross-compilers or auto-builders override it when building GLib. not strictly necessary, tho.
Created attachment 285521 [details] [review] Revised patch to address Emmanuele's comments. All good comments, thanks. Revised patch attached. I'm particularly concerned that I got the AC_CACHE_CHECK part right, it's the first time I've tried that.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/831.