GNOME Bugzilla – Bug 736683
Thread safety issues with g_main_context_find_source_by_id
Last modified: 2014-09-19 17:39:43 UTC
I'm seeing valgrind warnings about invalid reads in GSource code. If my understanding is right, the source gets destroyed in another thread after g_main_context_find_source_by_id() releases the context mutext and before the SOURCE_DESTROYED (source) test, which then accesses freed memory. ==3731== Thread 11 pool: ==3731== Invalid read of size 4 ==3731== at 0x7DBD7A2: g_main_context_find_source_by_id (gmain.c:2063) ==3731== by 0x7DBD3F5: g_source_set_name_by_id (gmain.c:1916) ==3731== by 0x415E0F: gs_app_queue_notify (gs-app.c:335) ==3731== by 0x41630A: gs_app_set_state (gs-app.c:492) ==3731== by 0x1C8DBBDF: gs_plugin_packagekit_add_results (packagekit-common.c:159) ==3731== by 0x1E115730: gs_plugin_add_installed (gs-plugin-packagekit.c:134) ==3731== by 0x43BBF6: gs_plugin_loader_run_results_plugin (gs-plugin-loader.c:351) ==3731== by 0x43BE91: gs_plugin_loader_run_results (gs-plugin-loader.c:399) ==3731== by 0x43D7A2: gs_plugin_loader_get_installed_thread_cb (gs-plugin-loader.c:1187) ==3731== by 0x77F8A52: g_task_thread_pool_thread (gtask.c:1215) ==3731== by 0x7DED552: g_thread_pool_thread_proxy (gthreadpool.c:307) ==3731== by 0x7DECF74: g_thread_proxy (gthread.c:764) ==3731== Address 0x22251b1c is 44 bytes inside a block of size 96 free'd ==3731== at 0x4C2CCE9: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==3731== by 0x7DC6A65: g_free (gmem.c:190) ==3731== by 0x7DBD677: g_source_unref_internal (gmain.c:2005) ==3731== by 0x7DBE526: g_main_dispatch (gmain.c:3091) ==3731== by 0x7DBF17D: g_main_context_dispatch (gmain.c:3663) ==3731== by 0x7DBF36F: g_main_context_iterate (gmain.c:3734) ==3731== by 0x7DBF433: g_main_context_iteration (gmain.c:3795) ==3731== by 0x7827E09: g_application_run (gapplication.c:2282) ==3731== by 0x449C15: main (gs-main.c:49)
Even worse, g_source_set_name doesn't even attempt to be threadsafe, leading to crashes when called in a thread:
+ Trace 234095
For g_source_set_name, I would say: just call it before you attach the source. At that point its yours, exclusively, and thread-safety is not a concern. The docs should make that clearer.
I think basically the issue is that we can't attach things like g_timeout_add_seconds() and g_idle_add() manually.
Why not ?
I suppose I could call g_timeout_source_new_seconds() myself, and attach the source, but then g_source_set_name() needs a big fat warning in the docs "DO NOT USE THIS MULTITHEADED".
I think we've been pretty clear about this point so far: GSource is not threadsafe until you attach it to a maincontext, at which point it is. We must therefore either: 1) disallow use of g_source_set_name() after attach or 2) make it threadsafe I prefer 1 but I'm okay with 2 if we've already gotten ourselves into trouble here.
About the original report: I think this problem is actually impossible to fix except by way of documentation. We're returning a reference to a source here when the source may destroyed as the source is being returned. We must therefore only use this function if we know that the source cannot possibly be destroyed (ie: we know a ref is held). That makes this function a bit useless in multi-thread situations, admittedly (since if you have the ref you could use it directly). I guess we could document that. We may also consider adding a full-strength thread-safe version that returns a ref if someone feels that they need it.
One more thought: we get ourselves into a bit more "culpable" trouble when we look at APIs like g_source_remove() which call g_main_context_find_source_by_id() internally. These APIs are not forced to operate in the unsafe manner illustrated above (since they could easily do everything under a lock). Consider the docs already on g_source_remove, however: * It is a programmer error to attempt to remove a non-existent source. This restriction was added when we decided that we needed to recycle source ID numbers. Accessing a source by ID when that ID may have been recycled cannot possibly be safe anymore because the ID may now belong to a different source. Extending this concept not much further it becomes clear that all ID-based source operations should only be valid as long as the source is still living. This is equivalent to the main context holding a ref on the source, which makes g_main_context_find_source_by_id() safe again. So to put it another way: it's only safe to use source IDs at all under exactly the circumstance that g_main_context_find_source_by_id() is guaranteed to be threadsafe.
I guess the one-line summary: don't use the by-id convenience apis from multiple threads. The only safe thing to do with the id returned by g_idle_add in a multi-threaded situation is to throw it away unseen.
Created attachment 286329 [details] [review] gsource: clarify restrictions on non-existant IDs Document that one must not use the "by id" source APIs with non-existent IDs. The real justification behind this restriction is that the reuse of source ids makes it unsafe to call these functions unless you're absolutely sure that the source exists and it belongs to you. If you call one of these functions on a source that may already have been removed then you run the risk of finding someone else's source (with your reused id). This also bails us out of a slightly tricky situation with respect to the threadsafety of g_main_context_find_source_by_id(). The fact that this function doesn't return a reference implies that its return value cannot be safely accessed unless we already know for sure that a reference is being held elsewhere (by example, by the main context itself if we know that the source has not been removed). The function itself, however, performs an access to the value, which could result in a crash. If we mandate that it is only valid to call this function on known-to-exist source IDs then we dodge this problem.
Created attachment 286330 [details] [review] gmain: improve g_source_set_name thread safety Step up thread safety on g_source_set_name() to the same standard as all other GSource functions: after we are attached to a main context, this function should be threadsafe.
Review of attachment 286329 [details] [review]: sure, although the wording seems slightly legalistic: '...use in such a way that...'. I'll also point out that 'non-existent source IDs' is not 100% clear - what is non-existent here, the source or the id ? Do non-existing sources have IDs ? And how does this fit the case that a source for the ID exists, only it is no longer the one we mean to refer to, but a different one ?
We may also want to add some g_critical() in case of using these APIs with non-existent source IDs, but I'd prefer to save that until after the release. Addition of new criticals at this point in the game is not a very friendly proposition.
Review of attachment 286330 [details] [review]: ::: glib/gmain.c @@ +1862,3 @@ + * the value, and changing the value will free it while the other thread + * may be attempting to use it. + * Should we recommend the best practice of setting the name before attaching the source to a context ? @@ +1876,3 @@ + + if (context) + LOCK_CONTEXT (context); Does this change the semantics of the function ? It can block now... maybe a harmless change.
(In reply to comment #12) > Review of attachment 286329 [details] [review]: > > sure, although the wording seems slightly legalistic: '...use in such a way > that...'. I'll also point out that 'non-existent source IDs' is not 100% clear > - what is non-existent here, the source or the id ? Do non-existing sources > have IDs ? A non-existent source ID is one that no longer exists in the main context: - was never registered or - was registered, but has since been destroyed. Obviously it's the second case that we care about more. Perhaps more interesting (although not really at issue here) is that "exists in the main context" is slightly confusing, since the source can stick around after it has been destroyed but not finalized -- which is the purpose for the extra "is destroyed" check before return. I consider that to be strictly an implementation detail, however. > And how does this fit the case that a source for the ID exists, only > it is no longer the one we mean to refer to, but a different one ? This really goes to the crux of my argument: the real restriction is that you cannot use a source ID unless you are absolutely sure that you still own it. The uncertainty comes specifically because the two cases that you mention cannot be reliably separated: if my source ID has been freed then I have no way to know that it hasn't been reissued to someone else. Maybe we could change the wording to focus more on this point: that it is no longer permitted to refer to my source by its ID after my source may have been destroyed. Perhaps we even mentioning that the reason behind this is that someone else may have added a new source which reused the ID?
(In reply to comment #14) > Review of attachment 286330 [details] [review]: > > ::: glib/gmain.c > @@ +1862,3 @@ > + * the value, and changing the value will free it while the other thread > + * may be attempting to use it. > + * > > Should we recommend the best practice of setting the name before attaching the > source to a context ? That would sort of remove the entire purpose of set_name_by_id, which is to be able to set names after adding. It is probably worth adding a specific mention there that it is unsafe to use the function in the case of attaching a source which is destined to be run (and destroyed-by-return-value) in another thread. > @@ +1876,3 @@ > + > + if (context) > + LOCK_CONTEXT (context); > > Does this change the semantics of the function ? It can block now... maybe a > harmless change. This is not really blocking. This is not the context "acquire"; it's only a short-held mutex to ensure atomicity of operations.
(In reply to comment #15) > Maybe we could change the wording to focus more on this point: that it is no > longer permitted to refer to my source by its ID after my source may have been > destroyed. Perhaps we even mentioning that the reason behind this is that > someone else may have added a new source which reused the ID? I think that would be clearer, indeed.
(In reply to comment #16) > That would sort of remove the entire purpose of set_name_by_id, which is to be > able to set names after adding. Right. > It is probably worth adding a specific mention there that it is unsafe to use > the function in the case of attaching a source which is destined to be run (and > destroyed-by-return-value) in another thread. Agreed, this would have avoided this bug if there was anything in the docs. In hindsight it is obvious, but a big fat red warning would have helped immensely. Richard
Created attachment 286650 [details] [review] gsource: clarify restrictions on non-existant IDs Document that one must not use the "by id" source APIs with non-existent IDs. The real justification behind this restriction is that the reuse of source ids makes it unsafe to call these functions unless you're absolutely sure that the source exists and it belongs to you. If you call one of these functions on a source that may already have been removed then you run the risk of finding someone else's source (with your reused id). This also bails us out of a slightly tricky situation with respect to the threadsafety of g_main_context_find_source_by_id(). The fact that this function doesn't return a reference implies that its return value cannot be safely accessed unless we already know for sure that a reference is being held elsewhere (by example, by the main context itself if we know that the source has not been removed). The function itself, however, performs an access to the value, which could result in a crash. If we mandate that it is only valid to call this function on known-to-exist source IDs then we dodge this problem.
Thanks for the patches, these look like a nice improvement. We should probably come up with some recommendations how to best set up idles to marshal stuff between threads. As a side note, gnome-software was only naming sources because it seemed to be the best practice, not because we actually needed this for anything. Currently most gnome apps seem to be using the following pattern: id = g_idle_add (run_in_idle_cb, data); g_source_set_name_by_id (id, "[gnome-foobar-app] run_in_idle_cb"); Would be nice to have something threadsafe for doing the above. I suppose it would involve deprecating g_source_set_name_by_id and maybe creating a convenience function that creates a new source and names it and attaches it to the main context. Might also make sense to deprecate the guint-returning functions and replace them with something that returns GSource * instead. Something to think about for the next cycle. I'm happy to help implement things too. In any case, I've "fixed" this in gnome-software by just killing all the g_source_set_name_by_id calls, so gnome-software should be fine for 3.14.
Attachment 286330 [details] pushed as dceff8f - gmain: improve g_source_set_name thread safety Attachment 286650 [details] pushed as 1cbdbef - gsource: clarify restrictions on non-existant IDs