Bug 583511 – race condition means g_main_loop_quit() does not work to exit loop

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 583511 - race condition means g_main_loop_quit() does not work to exit loop


Summary:	race condition means g_main_loop_quit() does not work to exit loop


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	mainloop
Version:	2.14.x
Hardware:	Other All

Importance:	Normal major
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Duplicates:	632301 (view as bug list)
Depends on:
Blocks:

Reported:	2009-05-22 00:27 UTC by Eric House
Modified:	2011-09-10 02:32 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Eric House 2009-05-22 00:27:10 UTC

Please describe the problem:
If one thread calls g_main_loop_run(), and another calls g_main_loop_quit(), it is possible that g_main_loop_run() will never return, being stuck in poll().

Steps to reproduce:
There is a small window of opportunity for glib functions g_main_loop_quit and g_main_loop_wakeup to fail to function.

** Steps To Reproduce:
NOTE: it takes 36 hours to reproduce this problem using stress testing, but the addition of a sleep() to the glib source makes it happen every time.
1) In g_main_context_iterate(), put a sleep(10); statement between UNLOCK_CONTEXT, and g_main_context_prepare().
2) In one thread, do g_main_loop_run(), In a second thread, do g_main_loop_quit() such that it's called 5 seconds later (after g_main_loop_run() was called in the first thread.)

The variable poll_waiting will be false when g_main_loop_quit / g_main_loop_wakeup executes and therefore no write() will be done to the wake_up_pipe.  When the main loop finally reaches poll(), it will be stuck forever.

Actual results:
Unless there are other sources, the poll() in the main loop never returns in spite of g_main_loop_quit() having been called.  The app is hung.

Expected results:
g_main_loop_quit() always causes the main loop to exit.

Does this happen every time?
yes (provided the sleep() call is added to exacerbate the race condition.)

Other information:

Comment 1 Colin Walters 2009-05-22 16:11:37 UTC

Isn't it a programmer error to call g_main_loop_quit without having ensured that g_main_loop_run was already called?

Comment 2 Eric House 2009-05-22 16:29:21 UTC

g_main_loop_run() has been called; it hasn't necessarily returned yet, but it's been called in this scenario.  Since it's being called from a different thread than is calling g_main_loop_quit() it's hard to guarantee the order.  If there's to be synchronization required, does it belong in glib or in calling apps?

Comment 3 Eric House 2009-05-22 17:57:44 UTC

Just to be clear, in the scenario that demonstrates this bug Thread_A calls g_main_loop_run() *before* Thread_B calls g_main_loop_quit().  Thread_A is
inside g_main_context_iterate() and has not yet reached the poll() call; instead it's about to call g_main_context_prepare() when it gets swapped out and Thread_B starts executing inside g_main_loop_quit().  This is a race condition that you won't see unless you call g_main_loop_quit() fairly soon after another thread calls g_main_loop_run().  It's not something the programmer can fix.

Comment 4 Eric House 2009-05-22 22:44:30 UTC

Just to be clear, in the scenario that demonstrates this bug Thread_A calls g_main_loop_run() *before* Thread_B calls g_main_loop_quit().  Thread_A is
inside g_main_context_iterate() and has not yet reached the poll() call; instead it's about to call g_main_context_prepare() when it gets swapped out and Thread_B starts executing inside g_main_loop_quit().  This is a race condition that you won't see unless you call g_main_loop_quit() fairly soon after another thread calls g_main_loop_run().  It's not something the programmer can fix.

Comment 5 Colin Walters 2009-05-26 16:19:42 UTC

(Note I'm not a glib developer, just an observer)  I see, that sounds plausible.

When I write threaded code interacting with a mainloop, pretty much the only function I use is g_idle_add (or the more lowlevel g_source_attach that it wraps) from threads other than the one running the loop.

In this case a possible workaround would be to add an idle to the mainloop, and inside that idle call g_main_loop_quit.  

But it would be good to fix this bug.  I wonder if removing the guard on poll_waiting for the write to wake_up_pipe would work; possibly would result in multiple wakeups in some scenarios?

Comment 6 Ed Wei 2009-05-26 17:42:18 UTC

Note that g_idle_add() also will sometimes fail because g_idle_add invokes g_source_attach(), which then calls g_main_context_wakeup().

Any code that calls g_main_context_wakeup() may fail in a multi-threaded applications.  This is why g_main_loop_quit() also fails.

Comment 7 Owen Taylor 2009-05-26 21:02:27 UTC

There's no race condition with the idle - the wakeup isn't needed if we haven't run the source prepare functions yet.

I think you are right about the possibility of a race condition in the case of quit(). Probably the easiest way to handle it would be to get rid of poll_waiting. (Some discussion at the end of bug 320888.) There are also possibilities for making poll_waiting a count/serial instead of a boolean.

If poll_waiting is eliminated, care must be taken that adding a source (say a timeout) *from the main thread* does not cause the next poll to wake up immediately. (Probably OK to require that g_main_context_acquire() is called *before any loop where you do while (!condition) g_main_context_iterate()))

Comment 8 Allison Karlitskaya (desrt) 2011-09-10 00:46:04 UTC

*** Bug 632301 has been marked as a duplicate of this bug. ***

Comment 9 Allison Karlitskaya (desrt) 2011-09-10 01:50:02 UTC

Bug 320888 has a patch that should fix this issue too.