Bug 736683 – Thread safety issues with g_main_context_find_source_by_id

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 736683 - Thread safety issues with g_main_context_find_source_by_id


Summary:	Thread safety issues with g_main_context_find_source_by_id


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	mainloop
Version:	2.41.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-09-15 15:43 UTC by Kalev Lember
Modified:	2014-09-19 17:39 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gsource: clarify restrictions on non-existant IDs (2.97 KB, patch) 2014-09-16 23:50 UTC, Allison Karlitskaya (desrt)	reviewed	Details \| Review
gmain: improve g_source_set_name thread safety (2.26 KB, patch) 2014-09-16 23:50 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review
gsource: clarify restrictions on non-existant IDs (3.96 KB, patch) 2014-09-19 17:05 UTC, Allison Karlitskaya (desrt)	committed	Details \| Review

Description Kalev Lember 2014-09-15 15:43:20 UTC

I'm seeing valgrind warnings about invalid reads in GSource code. If my understanding is right, the source gets destroyed in another thread after g_main_context_find_source_by_id() releases the context mutext and before the SOURCE_DESTROYED (source) test, which then accesses freed memory.

==3731== Thread 11 pool:
==3731== Invalid read of size 4
==3731==    at 0x7DBD7A2: g_main_context_find_source_by_id (gmain.c:2063)
==3731==    by 0x7DBD3F5: g_source_set_name_by_id (gmain.c:1916)
==3731==    by 0x415E0F: gs_app_queue_notify (gs-app.c:335)
==3731==    by 0x41630A: gs_app_set_state (gs-app.c:492)
==3731==    by 0x1C8DBBDF: gs_plugin_packagekit_add_results (packagekit-common.c:159)
==3731==    by 0x1E115730: gs_plugin_add_installed (gs-plugin-packagekit.c:134)
==3731==    by 0x43BBF6: gs_plugin_loader_run_results_plugin (gs-plugin-loader.c:351)
==3731==    by 0x43BE91: gs_plugin_loader_run_results (gs-plugin-loader.c:399)
==3731==    by 0x43D7A2: gs_plugin_loader_get_installed_thread_cb (gs-plugin-loader.c:1187)
==3731==    by 0x77F8A52: g_task_thread_pool_thread (gtask.c:1215)
==3731==    by 0x7DED552: g_thread_pool_thread_proxy (gthreadpool.c:307)
==3731==    by 0x7DECF74: g_thread_proxy (gthread.c:764)
==3731==  Address 0x22251b1c is 44 bytes inside a block of size 96 free'd
==3731==    at 0x4C2CCE9: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3731==    by 0x7DC6A65: g_free (gmem.c:190)
==3731==    by 0x7DBD677: g_source_unref_internal (gmain.c:2005)
==3731==    by 0x7DBE526: g_main_dispatch (gmain.c:3091)
==3731==    by 0x7DBF17D: g_main_context_dispatch (gmain.c:3663)
==3731==    by 0x7DBF36F: g_main_context_iterate (gmain.c:3734)
==3731==    by 0x7DBF433: g_main_context_iteration (gmain.c:3795)
==3731==    by 0x7827E09: g_application_run (gapplication.c:2282)
==3731==    by 0x449C15: main (gs-main.c:49)

Comment 1 Kalev Lember 2014-09-15 20:30:39 UTC

Even worse, g_source_set_name doesn't even attempt to be threadsafe, leading to crashes when called in a thread:

+ Trace 234095

#0 raise
from /lib64/libc.so.6
#1 abort
from /lib64/libc.so.6
#2 __libc_message
from /lib64/libc.so.6
#3 free_check
from /lib64/libc.so.6
#4 free
from /lib64/libc.so.6
#5 g_free
at gmem.c line 190
#6 g_source_set_name
at gmain.c line 1873
#7 g_source_set_name_by_id
at gmain.c line 1920
#8 gs_app_queue_notify
at gs-app.c line 335
#9 gs_app_set_kind
at gs-app.c line 568
#10 gs_plugin_refine_item
at gs-plugin-appstream.c line 434
#11 gs_plugin_refine_from_pkgname
at gs-plugin-appstream.c line 682
#12 gs_plugin_refine
at gs-plugin-appstream.c line 721
#13 gs_plugin_loader_run_refine_plugin
at gs-plugin-loader.c line 184
#14 gs_plugin_loader_run_refine
at gs-plugin-loader.c line 247
#15 gs_plugin_loader_run_results
at gs-plugin-loader.c line 413
#16 gs_plugin_loader_get_installed_thread_cb
at gs-plugin-loader.c line 1187
#17 g_task_thread_pool_thread
at gtask.c line 1215
#18 g_thread_pool_thread_proxy
at gthreadpool.c line 307
#19 g_thread_proxy
at gthread.c line 764
#20 start_thread
from /lib64/libpthread.so.0
#21 clone
from /lib64/libc.so.6

Comment 2 Matthias Clasen 2014-09-16 10:43:29 UTC

For g_source_set_name, I would say: just call it before you attach the source. At that point its yours, exclusively, and thread-safety is not a concern. The docs should make that clearer.

Comment 3 Richard Hughes 2014-09-16 14:56:23 UTC

I think basically the issue is that we can't attach things like g_timeout_add_seconds() and g_idle_add() manually.

Comment 4 Matthias Clasen 2014-09-16 15:35:53 UTC

Why not ?

Comment 5 Richard Hughes 2014-09-16 15:54:51 UTC

I suppose I could call g_timeout_source_new_seconds() myself, and attach the source, but then g_source_set_name() needs a big fat warning in the docs "DO NOT USE THIS MULTITHEADED".

Comment 6 Allison Karlitskaya (desrt) 2014-09-16 23:16:10 UTC

I think we've been pretty clear about this point so far: GSource is not threadsafe until you attach it to a maincontext, at which point it is.

We must therefore either:

 1) disallow use of g_source_set_name() after attach

or

 2) make it threadsafe


I prefer 1 but I'm okay with 2 if we've already gotten ourselves into trouble here.

Comment 7 Allison Karlitskaya (desrt) 2014-09-16 23:22:39 UTC

About the original report: I think this problem is actually impossible to fix except by way of documentation.  We're returning a reference to a source here when the source may destroyed as the source is being returned.

We must therefore only use this function if we know that the source cannot possibly be destroyed (ie: we know a ref is held).  That makes this function a bit useless in multi-thread situations, admittedly (since if you have the ref you could use it directly).  I guess we could document that.

We may also consider adding a full-strength thread-safe version that returns a ref if someone feels that they need it.

Comment 8 Allison Karlitskaya (desrt) 2014-09-16 23:27:59 UTC

One more thought: we get ourselves into a bit more "culpable" trouble when we look at APIs like g_source_remove() which call g_main_context_find_source_by_id() internally.  These APIs are not forced to operate in the unsafe manner illustrated above (since they could easily do everything under a lock).

Consider the docs already on g_source_remove, however:

 * It is a programmer error to attempt to remove a non-existent source.


This restriction was added when we decided that we needed to recycle source ID numbers.  Accessing a source by ID when that ID may have been recycled cannot possibly be safe anymore because the ID may now belong to a different source.

Extending this concept not much further it becomes clear that all ID-based source operations should only be valid as long as the source is still living.  This is equivalent to the main context holding a ref on the source, which makes g_main_context_find_source_by_id() safe again.

So to put it another way: it's only safe to use source IDs at all under exactly the circumstance that g_main_context_find_source_by_id() is guaranteed to be threadsafe.

Comment 9 Matthias Clasen 2014-09-16 23:40:45 UTC

    I guess the one-line summary: don't use the by-id convenience apis from
    multiple threads. The only safe thing to do with the id returned by g_idle_add
    in a multi-threaded situation is to throw it away unseen.

Comment 10 Allison Karlitskaya (desrt) 2014-09-16 23:50:15 UTC

Created attachment 286329 [details] [review]
gsource: clarify restrictions on non-existant IDs

Document that one must not use the "by id" source APIs with non-existent
IDs.  The real justification behind this restriction is that the reuse
of source ids makes it unsafe to call these functions unless you're
absolutely sure that the source exists and it belongs to you.  If you
call one of these functions on a source that may already have been
removed then you run the risk of finding someone else's source (with
your reused id).

This also bails us out of a slightly tricky situation with respect to
the threadsafety of g_main_context_find_source_by_id().  The fact that
this function doesn't return a reference implies that its return value
cannot be safely accessed unless we already know for sure that a
reference is being held elsewhere (by example, by the main context
itself if we know that the source has not been removed).  The function
itself, however, performs an access to the value, which could result in
a crash.

If we mandate that it is only valid to call this function on
known-to-exist source IDs then we dodge this problem.

Comment 11 Allison Karlitskaya (desrt) 2014-09-16 23:50:19 UTC

Created attachment 286330 [details] [review]
gmain: improve g_source_set_name thread safety

Step up thread safety on g_source_set_name() to the same standard as all
other GSource functions: after we are attached to a main context, this
function should be threadsafe.

Comment 12 Matthias Clasen 2014-09-17 00:01:41 UTC

Review of attachment 286329 [details] [review]:

sure, although the wording seems slightly legalistic: '...use in such a way that...'. I'll also point out that 'non-existent source IDs' is not 100% clear - what is non-existent here, the source or the id ? Do non-existing sources have IDs ? And how does this fit the case that a source for the ID exists, only it is no longer the one we mean to refer to, but a different one ?

Comment 13 Allison Karlitskaya (desrt) 2014-09-17 00:02:19 UTC

We may also want to add some g_critical() in case of using these APIs with non-existent source IDs, but I'd prefer to save that until after the release.  Addition of new criticals at this point in the game is not a very friendly proposition.

Comment 14 Matthias Clasen 2014-09-17 00:05:47 UTC

Review of attachment 286330 [details] [review]:

::: glib/gmain.c
@@ +1862,3 @@
+ * the value, and changing the value will free it while the other thread
+ * may be attempting to use it.
+ *

Should we recommend the best practice of setting the name before attaching the source to a context ?

@@ +1876,3 @@
+
+  if (context)
+    LOCK_CONTEXT (context);

Does this change the semantics of the function ? It can block now... maybe a harmless change.

Comment 15 Allison Karlitskaya (desrt) 2014-09-17 00:07:19 UTC

(In reply to comment #12)
> Review of attachment 286329 [details] [review]:
> 
> sure, although the wording seems slightly legalistic: '...use in such a way
> that...'. I'll also point out that 'non-existent source IDs' is not 100% clear
> - what is non-existent here, the source or the id ? Do non-existing sources
> have IDs ? 
A non-existent source ID is one that no longer exists in the main context:

 - was never registered

or

 - was registered, but has since been destroyed.


Obviously it's the second case that we care about more.

Perhaps more interesting (although not really at issue here) is that "exists in the main context" is slightly confusing, since the source can stick around after it has been destroyed but not finalized -- which is the purpose for the extra "is destroyed" check before return.  I consider that to be strictly an implementation detail, however.

> And how does this fit the case that a source for the ID exists, only
> it is no longer the one we mean to refer to, but a different one ?

This really goes to the crux of my argument: the real restriction is that you cannot use a source ID unless you are absolutely sure that you still own it.  The uncertainty comes specifically because the two cases that you mention cannot be reliably separated: if my source ID has been freed then I have no way to know that it hasn't been reissued to someone else.

Maybe we could change the wording to focus more on this point: that it is no longer permitted to refer to my source by its ID after my source may have been destroyed.  Perhaps we even mentioning that the reason behind this is that someone else may have added a new source which reused the ID?

Comment 16 Allison Karlitskaya (desrt) 2014-09-17 00:10:28 UTC

(In reply to comment #14)
> Review of attachment 286330 [details] [review]:
> 
> ::: glib/gmain.c
> @@ +1862,3 @@
> + * the value, and changing the value will free it while the other thread
> + * may be attempting to use it.
> + *
> 
> Should we recommend the best practice of setting the name before attaching the
> source to a context ?

That would sort of remove the entire purpose of set_name_by_id, which is to be able to set names after adding.

It is probably worth adding a specific mention there that it is unsafe to use the function in the case of attaching a source which is destined to be run (and destroyed-by-return-value) in another thread.

> @@ +1876,3 @@
> +
> +  if (context)
> +    LOCK_CONTEXT (context);
> 
> Does this change the semantics of the function ? It can block now... maybe a
> harmless change.

This is not really blocking.  This is not the context "acquire"; it's only a short-held mutex to ensure atomicity of operations.

Comment 17 Matthias Clasen 2014-09-17 00:30:42 UTC

(In reply to comment #15)

> Maybe we could change the wording to focus more on this point: that it is no
> longer permitted to refer to my source by its ID after my source may have been
> destroyed.  Perhaps we even mentioning that the reason behind this is that
> someone else may have added a new source which reused the ID?

I think that would be clearer, indeed.

Comment 18 Richard Hughes 2014-09-17 09:38:20 UTC

(In reply to comment #16)
> That would sort of remove the entire purpose of set_name_by_id, which is to be
> able to set names after adding.

Right.

> It is probably worth adding a specific mention there that it is unsafe to use
> the function in the case of attaching a source which is destined to be run (and
> destroyed-by-return-value) in another thread.

Agreed, this would have avoided this bug if there was anything in the docs. In hindsight it is obvious, but a big fat red warning would have helped immensely.

Richard

Comment 19 Allison Karlitskaya (desrt) 2014-09-19 17:05:54 UTC

Created attachment 286650 [details] [review]
gsource: clarify restrictions on non-existant IDs

Document that one must not use the "by id" source APIs with non-existent
IDs.  The real justification behind this restriction is that the reuse
of source ids makes it unsafe to call these functions unless you're
absolutely sure that the source exists and it belongs to you.  If you
call one of these functions on a source that may already have been
removed then you run the risk of finding someone else's source (with
your reused id).

This also bails us out of a slightly tricky situation with respect to
the threadsafety of g_main_context_find_source_by_id().  The fact that
this function doesn't return a reference implies that its return value
cannot be safely accessed unless we already know for sure that a
reference is being held elsewhere (by example, by the main context
itself if we know that the source has not been removed).  The function
itself, however, performs an access to the value, which could result in
a crash.

If we mandate that it is only valid to call this function on
known-to-exist source IDs then we dodge this problem.

Comment 20 Kalev Lember 2014-09-19 17:25:22 UTC

Thanks for the patches, these look like a nice improvement.

We should probably come up with some recommendations how to best set up idles to marshal stuff between threads. As a side note, gnome-software was only naming sources because it seemed to be the best practice, not because we actually needed this for anything.

Currently most gnome apps seem to be using the following pattern:

id = g_idle_add (run_in_idle_cb, data);
g_source_set_name_by_id (id, "[gnome-foobar-app] run_in_idle_cb");

Would be nice to have something threadsafe for doing the above. I suppose it would involve deprecating g_source_set_name_by_id and maybe creating a convenience function that creates a new source and names it and attaches it to the main context. Might also make sense to deprecate the guint-returning functions and replace them with something that returns GSource * instead.

Something to think about for the next cycle. I'm happy to help implement things too.

In any case, I've "fixed" this in gnome-software by just killing all the g_source_set_name_by_id calls, so gnome-software should be fine for 3.14.

Comment 21 Allison Karlitskaya (desrt) 2014-09-19 17:39:35 UTC

Attachment 286330 [details] pushed as dceff8f - gmain: improve g_source_set_name thread safety
Attachment 286650 [details] pushed as 1cbdbef - gsource: clarify restrictions on non-existant IDs