After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 720186 - g_main_context_unref() versus g_source_unref() race
g_main_context_unref() versus g_source_unref() race
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: mainloop
2.36.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2013-12-10 11:16 UTC by Ognyan Tonchev (redstar_)
Modified: 2018-05-24 16:09 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
example program showing the problem (1.58 KB, text/plain)
2013-12-10 11:16 UTC, Ognyan Tonchev (redstar_)
Details

Description Ognyan Tonchev (redstar_) 2013-12-10 11:16:56 UTC
Created attachment 263908 [details]
example program showing the problem

The sample program creates an idle source and attaches it to a main loop context. Then it unrefs the idle source while the main loop is quitting and unreffing its context which leads to the context being freed along with its mutex lock. At the very same time the source unref tries to lock the mutex (which no longer exists).

A normal execution of the program give the following log output:

main loop started
unreffing main loop            1
source unreffed               2
we are now done!

But when run for a long time either the program will hang, or
it segfaults like this:

main loop started
unreffing main loop            1
Segmentation fault (core dumped)

or libc will detect that the mutex is invalid and glib will abort:

main loop started
unreffing main loop            1
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument.  Aborting.

I think that one of the problems could be that GSource has a pointer to GMainContext, but it doesn't have a reference to it. Is adding a reference enough to fix this problem in your opinion? If so I can attach a proposed patch.
Comment 1 Dan Winship 2014-02-16 15:10:31 UTC
(In reply to comment #0)
> I think that one of the problems could be that GSource has a pointer to
> GMainContext, but it doesn't have a reference to it. Is adding a reference
> enough to fix this problem in your opinion? If so I can attach a proposed
> patch.

That would create a reference-counting loop and would mean that you couldn't destroy a GMainContext while there were still GSources attached to it.

But I think we have to do something like that... basically treat each attached source as a sort of "ref". g_main_context_unref() would only fully destroy the context if there were no sources attached to it at the end. Otherwise, it would have to leave context and context->mutex allocated until the last source was freed, at which point we could finish destroying the context.
Comment 2 Allison Karlitskaya (desrt) 2014-02-22 05:29:26 UTC
We could also simplify the relationship of sources and contexts.

It seems like the context's lock is used mostly because the source doesn't have atomic integer operations in use for its refcounting operations.

If we changed the source to use atomics, then we could just have the maincontext own a normal ref on the source which could be atomically dropped when the context was no longer interested.  Destroying a source with the last reference gone would then have absolutely nothing to do with a context.

This would require another simplification: GMainContext keeps a source around in a quasi-dead state, even after it was destroyed until the last reference is actually dropped.  It would be much easier if the source were just completely disassociated with the context at the time it was destroyed.
Comment 3 Allison Karlitskaya (desrt) 2014-02-22 14:13:21 UTC
So after some attempts to untangle this, I start to understand why the situation is how it is.

It comes down to the fact that in a great many cases, GMainContext wants to drop references on a GSource while holding its own lock -- and we can't call the source's finalize in that case, so we must pass in the context lock so that it can be unlocked before callbacks are done.

The reason for this is that GMainContext drops the lock while making prepare/check/dispatch calls to the GSource.  During the time that the lock is dropped, the source could be destroyed -- possibly from another thread that's not aware what's going on.  The prepare/check/dispatch code must obviously be able to assume that it has a reference, so this is why references are held in the sourceiter and dispatch list.

Trying to drop the reference on the sources before reacquiring the lock is difficult because some internal state manipulations need to be performed on the source in pretty much all cases, and you would possibly be dropping the last reference just before you reacquire the lock to do those manipulations.

You might get the idea to check if the source is destroyed and decide how to act but this is not really possible either.  If you determine destroyed state and drop the reference before entering the locked section again there is a race whereby the source can be destroyed after you check its state, but before you acquire the lock.

So here are some ideas on how we can move to a "only unref sources while the context is not locked" world.  I try to split the ideas into dealing with the dispatch list and dealing with iterating over the sources for check/prepare separately.

First assumption:

 - destroying a source should, under lock, remove all references that the
   "core" of the main context has to a source by transferring these
   references to the calling thread, where they will be dropped once the
   lock has been dropped, before returning to the user.  Other references
   may still be held by non-"core" parts such as the dispatch list or a
   source iter in used in another thread.

Ideas for dealing with dispatching -- will probably want some mix of these:

 - returning FALSE from a dispatch should just be handled by a direct call
   to g_source_destroy() at the end of the dispatch, before reacquiring the
   lock -- this may result in the last reference being dropped

 - blocking and unblocking could also be turned into a 'private external'
   operation in the sense that it's not a public API, but it acquires locks
   as if it were

 - don't hold the lock at any time during the dispatch process except in
   the case that we need to manipulate the actual maincontext state (ie:
   changing lists of polled fds in response to blocking/unblocking of a
   non-recursive source).  This could be safe if we ensure that the
   dispatch list is guarded by the main context having been acquired (as
   I believe is already the case).

Ideas for dealing with check/prepare:

 - hold locks while calling prepare/check callbacks.  For "nice" sources
   that only do things like check fds and timeouts, this wouldn't be a
   problem.  For sources that do more complicated things that may risk
   indirectly re-entering the maincontext, this could be a problem.  This
   would prevent a source from being destroyed during check/prepare and
   would mean that we don't need to keep an extra ref around.

 - do a lock-then-maybe-unlock-again scheme whereby we do allow refs to
   be held by the iters but then before we do the unref we check if the
   source has been destroyed.  If it has, we release the lock to do the
   unref, knowing that it may result in a finalize callback.  If we hold
   the lock and the source has not been destroyed then we can be certain
   that there are at least two refs: the one held by the iter and the one
   held by the context itself -- so releasing the one held by the iter is
   "safe".  This lock-then-maybe-unlock would only occur in cases where
   the source is destroyed (probably from another thread) during check or
   prepare, so it would be quite rare.  We'd want to test it...

The lock-and-unlock scheme (which could also apply in dispatch) sounds a bit inefficient but it ends up, in practice, being almost exactly the same thing as the current situation: we have an unref operation that drops the locks of the context in case of the last ref already...
Comment 4 Allison Karlitskaya (desrt) 2014-02-22 15:07:46 UTC
One more idea: since the only thing that we really need the lock for during dispatch is blocking/unblocking, let's just stop doing that.

It's exceptionally unlikely that for a given dispatch we will re-enter the mainloop so why are we doing all of this work up front?

We have the currently-dispatching source in a thread local variable, so in the case that we actually do mainloop recursion, we can check if we have a current source that's non-recursive and block it at that point...
Comment 5 GNOME Infrastructure Team 2018-05-24 16:09:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/803.