GNOME Bugzilla – Bug 331367
gslice requires more POSIX-like semantics for GPrivate destructors
Last modified: 2006-02-20 03:30:08 UTC
Unfortunately I wasn't able to reproduce this problem with any small test program. I see it in evolution-data-server which is a rather complicated application... The problem is that eds's main() causes various GLib functions that allocate memory through gslice to be called before g_thread_init(NULL) finally gets called through gnome_program_init(). Before g_thread_init() has been called thread_memory_from_self() in gslice.c will use the non-threaded fallback implementations of g_private_get() and g_private_set(). After g_thread_init() has been called, the real g_private_get() and _set() implementations are used, and a different ThreadMemory struct is then taken into use for the main thread. This apparently mixes up gslice's memory management. (It should probably be noted that on Win32 there obviously is no posix_memalign(), and not memalign() or valloc() either, so gslice uses the "simplistic non-freeing page allocator". Reproducing this problem on Unix might require manually forcing it to be used. Also to be noted is that if I set G_SLICE=always-malloc the problem doesn't appear.) The below log excerpt has debugging output I added to gslice.c and gthread-win32.c and shows that in the same thread (the main thread), thread_memory_from_self() returns different values before and after g_thread_init(). 00000748:00000D88 is the process id and thread id. 00000748:00000D88: thread_memory_from_self: tmem=000366A0 evolution-data-server-Message: Starting server 00000748:00000D88: g_thread_impl_init 00000748:00000D88: g_thread_self_win32_impl:401 00000748:00000D88: _g_slice_thread_init_nomessage: private_thread_memory=00000001 00000748:00000D88: thread_memory_from_self: tmem=028A4C90 and then later eds eventually crashes in magazine_chain_pop_head():
+ Trace 66223
So, what to do? For evolution-data-server, the obvious solution is to call g_thread_init(NULL) right at the start of main(). But it would be nice if gslice would detect this problem and call g_warning(), or even g_error(). Or maybe gslice should be fixed to cope with the situation. A fix might be to store the ThreadMemory pointer for the main thread separately and not in a thread-private slot. thread_memory_from_self() should be able to recognize when being called in the main thread, and whether g_thread_init() has been called or not. I will try to come up with a patch.
Hmm, doing a quick test with a gslice.c hacked to use the same ThreadMemory for the main thread both before and after g_thread_init() didn't help, unfortunately. Maybe there isn't any problem in gslice after all, but some "normal" heap corruption somewhere else. Sigh, I will have to spend more time on this. Unless TJ immediately gets a revelation that what I have tried to explain above indeed could be a problem in gslice, feel free to ignore this bug until I have more information... Setting to NEEDINFO.
Ha! I think I realize what the problem is. It's really quite simple: _g_slice_thread_init_nomessage() registers private_thread_memory_cleanup() as the destructor for the GPrivate private_thread_memory. However, nothing guarantees that private_thread_memory_cleanup() will be the last gslice function called in the thread. There might be other GPrivates registered elsewhere in the process that also call functions that eventually call for instance g_slice_free1(). Boom. In my case, it's ORBit2's giop.c that registers a destructor giop_thread_free() for an own GPrivate. giop_thread_free() calls g_cond_free() which on Win32 is implemented as calling g_ptr_array_free(), which calls g_slice_free1(). When the crash happens, private_thread_memory_cleanup() has already been called for that thread, and then giop_thread_free() causes g_slice_free1() to be called. The specification for pthread_key_create() (http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_key_create.html) says that "The order of destructor calls is unspecified if more than one destructor exists for a thread when it exits." As it happens, the Win32 GThreads implementation calls the destructors in the same order as the GPrivates were allocated, so giop_thread_free() will be called after private_thread_memory_cleanup(). But even reversing the order in g_thread_exit_win32_impl() doesn't help. The problem is then g_thread_cleanup() in gthread.c. That function is registered as a GPrivate destructor in g_thread_init_glib() before _g_slice_thread_init_nomessage() is called, so if private_thread_memory_cleanup() is called before g_thread_cleanup() we have an even more obvious problem involving just GLib functions. I have no idea how to fix this problem. I will try to write a minimal test program that demonstrates it on both Linux and Win32. At least I assume it is a cross-platform problem.
Would it work to make g_slice_alloc and g_slice_free1 treat tmem == NULL like acat == 2 ?
This is after all a Win32-specific bug. The code in g_thread_exit_win32_impl() that calls the GPrivate destructors doesn't obey the implied requirement to have the same semantics as in POSIX pthreads: It should set each GPrivate value to NULL before calling its destructor (with the original non-NULL value as argument). Also, it should loop as long as some of the GPrivate values are non-NULL (in case some destructor causes a GPrivate value (its own or another) to be set back to non-NULL). With these changes, I no longer get any crash in evolution-data-server. Fixed in HEAD and glib-2-8: 2006-02-20 Tor Lillqvist <tml@novell.com> * gthread-win32.c (g_thread_exit_win32_impl): Make the implementation of GPrivate behave more closely as in POSIX threads: The value associacted with a GPrivate must be set to NULL before calling the destructor. (The destructor gets the original value as argument.) A destructor might re-associate a non-NULL value with some GPrivate. To deal with this, if after all destructors have been called, there still are some non-NULL values, the process is repeated. (#331367)