After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 583511 - race condition means g_main_loop_quit() does not work to exit loop
race condition means g_main_loop_quit() does not work to exit loop
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: mainloop
2.14.x
Other All
: Normal major
: ---
Assigned To: gtkdev
gtkdev
: 632301 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2009-05-22 00:27 UTC by Eric House
Modified: 2011-09-10 02:32 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Eric House 2009-05-22 00:27:10 UTC
Please describe the problem:
If one thread calls g_main_loop_run(), and another calls g_main_loop_quit(), it is possible that g_main_loop_run() will never return, being stuck in poll().

Steps to reproduce:
There is a small window of opportunity for glib functions g_main_loop_quit and g_main_loop_wakeup to fail to function.

** Steps To Reproduce:
NOTE: it takes 36 hours to reproduce this problem using stress testing, but the addition of a sleep() to the glib source makes it happen every time.
1) In g_main_context_iterate(), put a sleep(10); statement between UNLOCK_CONTEXT, and g_main_context_prepare().
2) In one thread, do g_main_loop_run(), In a second thread, do g_main_loop_quit() such that it's called 5 seconds later (after g_main_loop_run() was called in the first thread.)

The variable poll_waiting will be false when g_main_loop_quit / g_main_loop_wakeup executes and therefore no write() will be done to the wake_up_pipe.  When the main loop finally reaches poll(), it will be stuck forever.

Actual results:
Unless there are other sources, the poll() in the main loop never returns in spite of g_main_loop_quit() having been called.  The app is hung.

Expected results:
g_main_loop_quit() always causes the main loop to exit.

Does this happen every time?
yes (provided the sleep() call is added to exacerbate the race condition.)

Other information:
Comment 1 Colin Walters 2009-05-22 16:11:37 UTC
Isn't it a programmer error to call g_main_loop_quit without having ensured that g_main_loop_run was already called?
Comment 2 Eric House 2009-05-22 16:29:21 UTC
g_main_loop_run() has been called; it hasn't necessarily returned yet, but it's been called in this scenario.  Since it's being called from a different thread than is calling g_main_loop_quit() it's hard to guarantee the order.  If there's to be synchronization required, does it belong in glib or in calling apps?
Comment 3 Eric House 2009-05-22 17:57:44 UTC
Just to be clear, in the scenario that demonstrates this bug Thread_A calls g_main_loop_run() *before* Thread_B calls g_main_loop_quit().  Thread_A is
inside g_main_context_iterate() and has not yet reached the poll() call; instead it's about to call g_main_context_prepare() when it gets swapped out and Thread_B starts executing inside g_main_loop_quit().  This is a race condition that you won't see unless you call g_main_loop_quit() fairly soon after another thread calls g_main_loop_run().  It's not something the programmer can fix.
Comment 4 Eric House 2009-05-22 22:44:30 UTC
Just to be clear, in the scenario that demonstrates this bug Thread_A calls g_main_loop_run() *before* Thread_B calls g_main_loop_quit().  Thread_A is
inside g_main_context_iterate() and has not yet reached the poll() call; instead it's about to call g_main_context_prepare() when it gets swapped out and Thread_B starts executing inside g_main_loop_quit().  This is a race condition that you won't see unless you call g_main_loop_quit() fairly soon after another thread calls g_main_loop_run().  It's not something the programmer can fix.
Comment 5 Colin Walters 2009-05-26 16:19:42 UTC
(Note I'm not a glib developer, just an observer)  I see, that sounds plausible.

When I write threaded code interacting with a mainloop, pretty much the only function I use is g_idle_add (or the more lowlevel g_source_attach that it wraps) from threads other than the one running the loop.

In this case a possible workaround would be to add an idle to the mainloop, and inside that idle call g_main_loop_quit.  

But it would be good to fix this bug.  I wonder if removing the guard on poll_waiting for the write to wake_up_pipe would work; possibly would result in multiple wakeups in some scenarios?  
Comment 6 Ed Wei 2009-05-26 17:42:18 UTC
Note that g_idle_add() also will sometimes fail because g_idle_add invokes g_source_attach(), which then calls g_main_context_wakeup().

Any code that calls g_main_context_wakeup() may fail in a multi-threaded applications.  This is why g_main_loop_quit() also fails.
Comment 7 Owen Taylor 2009-05-26 21:02:27 UTC
There's no race condition with the idle - the wakeup isn't needed if we haven't run the source prepare functions yet.

I think you are right about the possibility of a race condition in the case of quit(). Probably the easiest way to handle it would be to get rid of poll_waiting. (Some discussion at the end of bug 320888.) There are also possibilities for making poll_waiting a count/serial instead of a boolean.

If poll_waiting is eliminated, care must be taken that adding a source (say a timeout) *from the main thread* does not cause the next poll to wake up immediately. (Probably OK to require that g_main_context_acquire() is called *before any loop where you do while (!condition) g_main_context_iterate()))
Comment 8 Allison Karlitskaya (desrt) 2011-09-10 00:46:04 UTC
*** Bug 632301 has been marked as a duplicate of this bug. ***
Comment 9 Allison Karlitskaya (desrt) 2011-09-10 01:50:02 UTC
Bug 320888 has a patch that should fix this issue too.