Bug 331367 – gslice requires more POSIX-like semantics for GPrivate destructors

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 331367 - gslice requires more POSIX-like semantics for GPrivate destructors


Summary:	gslice requires more POSIX-like semantics for GPrivate destructors


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	win32
Version:	2.8.x
Hardware:	Other Windows

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtk-win32 maintainers
QA Contact:	gtk-win32 maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2006-02-16 04:19 UTC by Tor Lillqvist
Modified:	2006-02-20 03:30 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Tor Lillqvist 2006-02-16 04:19:30 UTC

Unfortunately I wasn't able to reproduce this problem with any small test program. I see it in evolution-data-server which is a rather complicated application...

The problem is that eds's main() causes various GLib functions that allocate memory through gslice to be called before g_thread_init(NULL) finally gets called through gnome_program_init(). 

Before g_thread_init() has been called thread_memory_from_self() in gslice.c will use the non-threaded fallback implementations of g_private_get() and g_private_set().

After g_thread_init() has been called, the real g_private_get() and _set() implementations are used, and a different ThreadMemory struct is then taken into use for the main thread. This apparently mixes up gslice's memory management.

(It should probably be noted that on Win32 there obviously is no posix_memalign(), and not memalign() or valloc() either, so gslice uses the "simplistic non-freeing page allocator". Reproducing this problem on Unix might require manually forcing it to be used. Also to be noted is that if I set G_SLICE=always-malloc the problem doesn't appear.)

The below log excerpt has debugging output I added to gslice.c and gthread-win32.c and shows that in the same thread (the main thread), thread_memory_from_self() returns different values before and after g_thread_init(). 00000748:00000D88 is the process id and thread id.

00000748:00000D88: thread_memory_from_self: tmem=000366A0
evolution-data-server-Message: Starting server
00000748:00000D88: g_thread_impl_init
00000748:00000D88: g_thread_self_win32_impl:401
00000748:00000D88: _g_slice_thread_init_nomessage: private_thread_memory=00000001
00000748:00000D88: thread_memory_from_self: tmem=028A4C90


and then later eds eventually crashes in magazine_chain_pop_head():

+ Trace 66223

#0 magazine_chain_pop_head
at gslice.c line 425
#1 magazine_chain_prepare_fields
at gslice.c line 507
#2 magazine_cache_push_magazine
at gslice.c line 581
#3 thread_memory_magazine2_unload
at gslice.c line 700
#4 g_slice_free1
at gslice.c line 809
#5 g_ptr_array_free
at garray.c line 400
#6 g_cond_free_win32_impl
at gthread-win32.c line 325
#7 giop_thread_free
at giop.c line 354
#8 g_thread_exit_win32_impl
at gthread-win32.c line 435
#9 g_thread_proxy
at gthread-win32.c line 475


So, what to do? For evolution-data-server, the obvious solution is to call g_thread_init(NULL) right at the start of main().

But it would be nice if gslice would detect this problem and call g_warning(), or even g_error().

Or maybe gslice should be fixed to cope with the situation. A fix might be to store the ThreadMemory pointer for the main thread separately and not in a thread-private slot. thread_memory_from_self() should be able to recognize when being called in the main thread, and whether g_thread_init() has been called or not. I will try to come up with a patch.

Comment 1 Tor Lillqvist 2006-02-16 11:53:05 UTC

Hmm, doing a quick test with a gslice.c hacked to use the same ThreadMemory for the main thread both before and after g_thread_init() didn't help, unfortunately. Maybe there isn't any problem in gslice after all, but some "normal" heap corruption somewhere else. Sigh, I will have to spend more time on this. Unless TJ immediately gets a revelation that what I have tried to explain above indeed could be a problem in gslice, feel free to ignore this bug until I have more information... Setting to NEEDINFO.

Comment 2 Tor Lillqvist 2006-02-20 01:21:17 UTC

Ha! I think I realize what the problem is. It's really quite simple: _g_slice_thread_init_nomessage() registers private_thread_memory_cleanup() as the destructor for the GPrivate private_thread_memory. However, nothing guarantees that private_thread_memory_cleanup() will be the last gslice function called in the thread. There might be other GPrivates registered elsewhere in the process that also call functions that eventually call for instance g_slice_free1(). Boom.

In my case, it's ORBit2's giop.c that registers a destructor giop_thread_free() for an own GPrivate. giop_thread_free() calls g_cond_free() which on Win32 is implemented as calling g_ptr_array_free(), which calls g_slice_free1(). When the crash happens, private_thread_memory_cleanup() has already been called for that thread, and then giop_thread_free() causes g_slice_free1() to be called.

The specification for pthread_key_create() (http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_key_create.html) says that "The order of destructor calls is unspecified if more than one destructor exists for a thread when it exits." As it happens, the Win32 GThreads implementation calls the destructors in the same order as the GPrivates were allocated, so giop_thread_free() will be called after private_thread_memory_cleanup().

But even reversing the order in g_thread_exit_win32_impl() doesn't help. The problem is then g_thread_cleanup() in gthread.c. That function is registered as a GPrivate destructor in g_thread_init_glib() before _g_slice_thread_init_nomessage() is called, so if private_thread_memory_cleanup() is called before g_thread_cleanup() we have an even more obvious problem involving just GLib functions.

I have no idea how to fix this problem. I will try to write a minimal test program that demonstrates it on both Linux and Win32. At least I assume it is a cross-platform problem.

Comment 3 Matthias Clasen 2006-02-20 02:35:56 UTC

Would it work to make g_slice_alloc and g_slice_free1 treat tmem == NULL
like acat == 2 ?

Comment 4 Tor Lillqvist 2006-02-20 02:51:22 UTC

This is after all a Win32-specific bug. The code in g_thread_exit_win32_impl() that calls the GPrivate destructors doesn't obey the implied requirement to have the same semantics as in POSIX pthreads: It should set each GPrivate value to NULL before calling its destructor (with the original non-NULL value as argument). Also, it should loop as long as some of the GPrivate values are non-NULL (in case some destructor causes a GPrivate value (its own or another) to be set back to non-NULL).

With these changes, I no longer get any crash in evolution-data-server. Fixed in HEAD and glib-2-8:

2006-02-20  Tor Lillqvist  <tml@novell.com>

	* gthread-win32.c (g_thread_exit_win32_impl): Make the
	implementation of GPrivate behave more closely as in POSIX
	threads: The value associacted with a GPrivate must be set to NULL
	before calling the destructor. (The destructor gets the original
	value as argument.)  A destructor might re-associate a non-NULL
	value with some GPrivate. To deal with this, if after all
	destructors have been called, there still are some non-NULL
	values, the process is repeated. (#331367)