After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 724590 - GSlice slab_stack corruption
GSlice slab_stack corruption
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gslice
2.39.x
Other Mac OS
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2014-02-17 23:51 UTC by John Ralls
Modified: 2018-05-24 16:18 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Dereference allocator->contention_counters before trying to index the array. (957 bytes, patch)
2014-02-17 23:55 UTC, John Ralls
rejected Details | Review
Test that the array reference is correctly interpreted in configure. (1.37 KB, patch)
2014-05-24 01:04 UTC, John Ralls
reviewed Details | Review
Revised patch to address Emmanuele's comments. (1.65 KB, patch)
2014-09-05 20:10 UTC, John Ralls
none Details | Review

Description John Ralls 2014-02-17 23:51:05 UTC
This is rather an odd corner case: When building on OS X 10.9 (Mavericks) with Xcode3's gcc-4.2 (that's the real gcc, not llvm-gcc), I encountered a segfault:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xa47fe801
0x00400c4f in slab_allocator_alloc_chunk (chunk_size=56) at gslice.c:1312
1312	  if (!allocator->slab_stack[ix] || !allocator->slab_stack[ix]->chunks)
  • #0 slab_allocator_alloc_chunk
    at gslice.c line 1312
  • #1 magazine_cache_pop_magazine
    at gslice.c line 732
  • #2 thread_memory_magazine1_reload
    at gslice.c line 807
  • #3 g_slice_alloc
    at gslice.c line 1005

which I eventually traced to gcc interpreting this line:
  g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]);
(which is line gslice.c:719, in magazine_cache_pop_magazine) in an unfortunate way. In particular, it's the &allocator->contention_counters[ix] statement: Instead of dereferencing allocator->contention_counters, finding the ix offset, and returning the address thereof, it's getting the address of allocator->contention_counters and offsetting that by ix; when ix is 6, that happens to be allocator->slab_stack. Since g_mutex_lock_a does predictable things with a counter, having that counter actually be a pointer to somewhere in the heap has predictable results.
Comment 1 John Ralls 2014-02-17 23:55:30 UTC
Created attachment 269483 [details] [review]
Dereference allocator->contention_counters before trying to index the array.

Rather than trying to screw around with forcing nesting and casting and such, I just dereference allocator->contention_counters into a new variable and index that to get the address.
Comment 2 John Ralls 2014-04-16 00:28:27 UTC
Comment on attachment 269483 [details] [review]
Dereference allocator->contention_counters before trying to index the array.

Silence means assent. Pushed to master: c49ec3c8d
Comment 3 Ray Strode [halfline] 2014-04-16 12:12:51 UTC
this got reverted:

https://git.gnome.org/browse/glib/commit/?id=d93458d97d74ef12a08de20c1f27ea1cfa54447f

because it's completely equivalent to the code that was changed, so there must be more to the story. Normally a crash in an allocator function is a sign of heap corruption (caused by a double free or so somewhere else in the code).  Can you:

1) run your program with G_SLICE=always-malloc to disable the slice allocator and see if the crash still hapens with your program

2) run your program through valgrind (assuming valgrind works on OS X) to have it report any double free's or other memory errors in your program

It's also possible that this is a compiler bug, but pretty unlikely, given how common an idiom &foo[i] is.
Comment 4 John Ralls 2014-04-17 18:22:54 UTC
Well, for one thing it's not *my* program, it's g-ir-compiler:
  GICOMP   gir/DBus-1.0.gir
  GICOMP   gir/DBusGLib-1.0.gir
/bin/sh: line 1: 98873 Segmentation fault: 11  env PATH=".libs:/Users/john/Development/gtk-build/gtk-unstable-10.5-i386/inst/bin:/Users/john/.local/bin:/Volumes/SSD/XCode3/usr/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin" ./g-ir-compiler --includedir=. --includedir=./gir --includedir=. --includedir=. --includedir=. gir/DBusGLib-1.0.gir -o gir/DBusGLib-1.0.typelib
make[2]: *** [gir/DBusGLib-1.0.typelib] Error 139
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x838fe800
0x00350a1f in slab_allocator_alloc_chunk () at gslice.c:1296
1296	  if (!allocator->slab_stack[ix] || !allocator->slab_stack[ix]->chunks)
(gdb) bt
  • #0 slab_allocator_alloc_chunk
    at gslice.c line 1296
  • #1 g_slice_alloc
    at gslice.c line 726
  • #2 g_hash_table_new_full
    at ghash.c line 656
  • #3 _g_ir_node_new
    at girnode.c line 198
  • #4 start_function
    at girparser.c line 871
  • #5 start_element_handler
    at girparser.c line 2849
  • #6 emit_start_element
    at gmarkup.c line 1048
  • #7 g_markup_parse_context_parse
    at gmarkup.c line 1143
  • #8 _g_ir_parser_parse_string
    at girparser.c line 3495
  • #9 parse_include
    at girparser.c line 2653
  • #10 start_element_handler
    at girparser.c line 2810
  • #11 emit_start_element
    at gmarkup.c line 1048
  • #12 g_markup_parse_context_parse
    at gmarkup.c line 1143
  • #13 _g_ir_parser_parse_string
    at girparser.c line 3495
  • #14 parse_include
    at girparser.c line 2653
  • #15 start_element_handler
    at girparser.c line 2810
  • #16 emit_start_element
    at gmarkup.c line 1048
  • #17 g_markup_parse_context_parse
    at gmarkup.c line 1143
  • #18 _g_ir_parser_parse_string
    at girparser.c line 3495
  • #19 _g_ir_parser_parse_file
    at girparser.c line 3581
  • #20 main
    at compiler.c line 183

For another, I'm pretty confident that it's not a stack-smash, and I know from stepping through in the debugger that in this case gslice is using the heap. 

I was mistaken, though, about which compiler was to blame. It is in fact llvm-gcc-4.2, not gcc-4.2.

Here's the problem, when glib is built with llvm-gcc-4.2:
Breakpoint 1, magazine_cache_pop_magazine (ix=6, countp=0xabac3c) at gslice.c:718
718	  g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]);
(gdb) p ix
$1 = 6
(gdb) p &allocater->contention_counters[ix]
No symbol "allocater" in current context.
(gdb) p &allocator->contention_counters[ix]
$2 = (guint *) 0x64f478
(gdb) p allocator->contention_counters
$3 = (guint *) 0x64f460
(gdb) p &allocator->contention_counters
$5 = (guint **) 0x4e8f0c
(gdb) s
g_mutex_lock_a (mutex=0x4e8f00, contention_counter=0x4e8f24) at gslice.c:478
478	  gboolean contention = FALSE;

This pretty clearly shows that the compiler is applying the '&' operator *before* the '[]' operator, even though the '[]' operator has higher precedence.

Here's the same bit, compiled with gcc-4.2:
Breakpoint 1, magazine_cache_pop_magazine (ix=6, countp=0xafde3c) at gslice.c:718
718	  g_mutex_lock_a (&allocator->magazine_mutex, &allocator->contention_counters[ix]);
(gdb) p allocator->contention_counters
$1 = (guint *) 0x67ae60
(gdb) p &allocator->contention_counters
$2 = (guint **) 0x475d4c
(gdb) s
g_mutex_lock_a (mutex=0x475d40, contention_counter=0x67ae78) at gslice.c:478
478	  gboolean contention = FALSE;

So while breaking out
  guint* counters = allocator->contention_counters;
*should be* equivalent, in this case it isn't. Unfortunately no amount of fiddling with parentheses convinces llvm-gcc to interpret it differently; only creating the local counters works.

If you don't want to work around the compiler bug, configure should test for it.
Comment 5 Jasper St. Pierre (not reading bugmail) 2014-04-20 22:38:08 UTC
llvm-gcc is known to be terribly buggy. Can't you use clang instead?
Comment 6 John Ralls 2014-04-21 01:02:16 UTC
(In reply to comment #5)
> llvm-gcc is known to be terribly buggy. Can't you use clang instead?

I take it that's a vote for a configure check to block building GLib on compilers with this bug rather than work around it.
Comment 7 Jasper St. Pierre (not reading bugmail) 2014-04-21 01:03:07 UTC
Yes.
Comment 8 John Ralls 2014-05-24 01:04:34 UTC
Created attachment 277093 [details] [review]
Test that the array reference is correctly interpreted in configure.

Getting the test to fail was a bit more convoluted than I expected, though I suppose I should have known that the compiler would handle the more obvious constructs correctly.

The test fails with i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5646) (LLVM build 2207.5) and passes with i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) and Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn).
Comment 9 Emmanuele Bassi (:ebassi) 2014-09-03 12:51:56 UTC
Review of attachment 277093 [details] [review]:

the configure-time check is definitely better. some minor remarks.

::: configure.ac
@@ +170,3 @@
 AC_SUBST(LIB_EXE_MACHINE_FLAG)
 
+AC_MSG_CHECKING([For waldo = &foo->bar; &foo->bar[[baz]] == waldo[[baz]] ])

this should probably be something more like:

  AC_MSG_CHECKING([For reference access to array element])

@@ +172,3 @@
+AC_MSG_CHECKING([For waldo = &foo->bar; &foo->bar[[baz]] == waldo[[baz]] ])
+
+llvm_gcc_bug="yes"

we're testing for a specific compiler bug, not for a specific compiler.

better to use the bug id for the variable, so we can easily have a reference:

  bug_724590="yes"

also, leaving a comment above with a URL to the bug will reduce the chances of this check getting dropped without proper QA.

if we wanted to be on the safe side, we could even call the variable glib_cv_bug_724590, cache it, and let cross-compilers or auto-builders override it when building GLib. not strictly necessary, tho.
Comment 10 John Ralls 2014-09-05 20:10:49 UTC
Created attachment 285521 [details] [review]
Revised patch to address Emmanuele's comments.

All good comments, thanks. Revised patch attached. I'm particularly concerned that I got the AC_CACHE_CHECK part right, it's the first time I've tried that.
Comment 11 GNOME Infrastructure Team 2018-05-24 16:18:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/831.