GNOME Bugzilla – Bug 579984
alternate GMainContext support
Last modified: 2009-07-03 12:57:26 UTC
There needs to be a way to make gio methods running in other threads use a non-default GMainContext. This was originally discussed in the gnio bug (bug 515973).
The initial suggestion was to have a stack of GMainContexts per thread. I later suggested having just g_io_set_context_for_thread() which could only be called once (per thread), to avoid the problem of different parts of the stack wanting to use different contexts at the same time. However, this wouldn't allow for porting libsoup to use gnio, because adding a requirement that the app call g_io_set_context_for_thread() would be an API break. Maybe the best thing to do would be to have the GMainContext stack, but make it very clear in the docs that only the creator of a thread can create a new GMainContext for that thread, and other code has to simply accept whatever GMainContext is given to it (either explicitly, as in libsoup, or implicitly, via the context stack, for gio/gnio).
Step 2 is figuring out exactly what methods should use the thread context rather than the main context. Particularly with the weird stuff like gasynchelper and gioscheduler.
In libsoup I ended up adding:
GSource *soup_add_io_watch (GMainContext *async_context,
GSource *soup_add_idle (GMainContext *async_context,
GSource *soup_add_completion (GMainContext *async_context,
GSource *soup_add_timeout (GMainContext *async_context,
(_add_completion is _add_idle except with G_PRIORITY_DEFAULT instead of G_PRIORITY_DEFAULT_IDLE)
We'll probably want some similar convenience methods in gio, though exactly what depends on what it's currently using. (eg, we might need to expose the _full() versions).
Created attachment 135534 [details] [review]
patch with suggested API and some patches that probably don't work yet
Here's a suggested API. I put it at the glib level rather than gio, because it fits in cleanly there, and because there don't currently seem to be any libglib APIs that would need to take it into account anyway, so adding it there doesn't create any additional ABI issues/API bloat.
In gio, it turns out that just fixing g_simple_async_result_complete_in_idle() handles a lot of the cases.
The docs for g_main_context_push_thread_default() note:
+ * Beware that libraries that predate this function may not correctly
+ * handle being run in a thread with a thread-default context, and may
+ * erroneously send their results to the main context instead.
This is not really an ABI break; no existing program would be calling g_main_context_push_thread_default(), so no existing programs would break. It just means that some programs may not be able to use the new API until the libraries they depend on have been updated to be aware of it.
+ * some libraries may do some of their event handling in the main
+ * context even for operations started from another context, in which
+ * case operations started in other threads may not complete if the
+ * main thread is not running.
Eg, several of the implementations of GFileMonitor have a single global monitoring object, and this patch leaves those in the main GMainContext, with individual file monitor objects emitting their own signals in whatever thread they're created in.
Does this overall approach look right?
I didn't review the code in general, but it looks good so far.
However, its missing one of the things mentioned in bug 517137, comment 11.
Basically, we want to make it somewhat defined exactly what APIs support this. APIs come in two forms here.
Either they are just a direct set of calls that run a static piece of code. These are documented to support g_main_context_push_thread_default() or not, and once we have a release with this we can never change it (as it would be a unsafe for apps calling it with the context set and assuming the old behaviour).
However, another form of API is the open-ended classes and functionallity that have multiple implementations. For instance streams and GFile ops. What we want to do for these is define which ones allow support for alternative contexts (most of them) and then add a class member for these classes/interfaces that are set to zero by default, but classes handling alternative main contexts set this to non-zero. We then make the basic function calls (e.g. g_input_stream_read_async) verify that if an alternative main context is set then the implementation must handle it, otherwise we return an error. This way we can safely extend the set of implementations that support this and be backwards compat.
Since we introduce this functionallity now we need to ensure that we get good coverage of this in all the APIs we have, since its incompatible to add this later. Although by "add" we may just mean error out if an alternative context is set for now.
(In reply to comment #2)
> We then make the basic function calls (e.g.
> g_input_stream_read_async) verify that if an alternative main context is set
> then the implementation must handle it, otherwise we return an error. This way
> we can safely extend the set of implementations that support this and be
> backwards compat.
Hm... that works for the case where someone is subclassing a gio class, but it doesn't work in the case where you have your own class with its own async ops, which use gio async ops inside. Eg, I call foo_bar_do_something_async(), and it creates a GFile and does some file/stream ops, and then returns an answer. FooBar doesn't know anything at all about alternative contexts, so it doesn't know to do anything special, and doesn't bail out with an error. And when it calls the GFile ops, they don't know that they're being called from naive code, so they *do* do the special alternative main context stuff, and you end up with a mess.
Although... the naive code still has to create a GAsyncResult at some point. GAsyncResultIface doesn't have any expansion slots, but we could add stuff to GSimpleAsyncResult at least; have there be some way of working with it that indicates that you know about alternative contexts (g_simple_async_result_new_for_context(), or g_simple_async_result_bind_to_context(), or g_simple_async_result_complete_in_context() or whatever), and if the caller tries to complete a GSimpleAsyncResult incorrectly while there's a non-default context active, we return an error.
(Well, that only works if they create the GSimpleAsyncResult from the do_something_async() method, rather than, say, queueing an idle handler and then creating the result from there. But in general, most code probably does create the result right away, because otherwise you need to create some *other* object to store the callback pointer and user_data, which is just silly.)
Either way, we need to keep the warning about "libraries that predate this function may not correctly handle being run in a thread with a thread-default context"
GAsyncResultIface is a GInterface, so it is safe to expand without breaking API. (This is debatable, but debates has been had and this is the current standpoint, it works fine for GObject itself, but may be a bit weird for e.g. c++ bindings)
Hm... I didn't remember bug 517137. Should it be marked a dup of this now? (Or vice versa.)
(This is kind of a brain-dumpy comment; I was figuring this all out as I wrote it, and then rewriting and re-rewriting.)
If the sanity-checking happens in GSimpleAsyncResult as suggested in comment #3, then we can only g_return_if_fail(); we can't cleanly return a GError as suggested in bug 517137.
OTOH, if the sanity-checking happens in the wrapper functions as suggested in bug 517137, then we have the problem with async-ops-that-wrap-other-async-ops mentioned in comment #3.
> However, another form of API is the open-ended classes and functionality that
> have multiple implementations. For instance streams and GFile ops.
But even here there are two different cases; if you call g_unix_input_stream_new(), you know you're getting back a GUnixInputStream, and so you know whether or not it supports thread-default contexts. It's only when you do things like g_file_read() that you get back a "mystery" object that may or may support thread-default contexts. But while *you* don't know what kind of stream you're getting back, *the GFile* does know. So we don't need to add supports_thread_default_contexts checks to every class, just the ones that could be returned directly from a call involving an extension point. So if you have a GFile, and g_file_supports_thread_default_contexts(file) returns TRUE, then that means that the GFile guarantees that the streams it returns for read/append/replace/etc support thread default contexts, and likewise with the GFileEnumerators it returns. (But not necessarily its GFileMonitors, since there's another extension point there where it could end up returning some strange third-party file monitor.)
Using that rule, I think all we need is g_file_supports_thread_default_contexts(), g_file_monitor_supports_thread_default_contexts(), and g_volume_monitor_supports_thread_default_contexts().
(If anyone has a good way of saying _supports_thread_default_contexts in fewer words, that would be great :)
We can also do some of the GSimpleAsyncResult-level sanity checking suggested before, so we can spew g_warnings if the app messes up and calls non-thread-default-context-aware APIs from within a thread-default context. (But this would just be to make it easier to debug; fundamentally, the programmer has committed an error, equivalent to calling a non-thread-safe method from a random thread without doing any locking, and so we would just be pointing out that the code is buggy, not trying to actually make it work.)
(In reply to comment #5)
> Using that rule, I think all we need is
> g_file_monitor_supports_thread_default_contexts(), and
As discussed in IRC the other day, we don't actually need a test for GFileMonitor, because only GLocalFileMonitor is extensible, but glocalfilemonitor.h isn't installed, so all extensions have to be in-tree.
Also, I didn't bother with GVolumeMonitor; for now I just left it documented that it's not thread-default-context aware. We could change it later.
About to attach patches...
Created attachment 137034 [details] [review]
Remove some unused code
Created attachment 137035 [details] [review]
Use low-level GSource methods in a few more places
(in preparation for thread-default context support)
Created attachment 137036 [details] [review]
Add g_main_context_push_thread_default() and friends
This allows applications to use async methods from other threads, or
in multiple independent main loops.
Created attachment 137037 [details] [review]
GFile allows for the possibility that external implementations may not
support thread-default contexts yet, via
g_file_supports_thread_contexts(). GVolumeMonitor is not yet
Add a test program to verify that basic gio async ops work correctly
in non-default contexts.
*** Bug 517137 has been marked as a duplicate of this bug. ***
About the conversion to GSource, generally you unref the source after attaching it and then destroy it when done. This is the pattern used when using tags. Is this safe in the GSource case though? I mean, you're keeping a pointer but have no ref. What if the source is removed from the mainloop for some other reason (I dunno, can this happen?) then you will access the freed source.
In g_source_remove() we protect by this by looking for the tag and only destroying if its found.
Anyway, not saying its wrong, just needs some consideration.
I think there are some timeouts in the inotify code too.
I didn't see any warning code in GSimpleAsync for the "mismatched API" case.
(In reply to comment #12)
> About the conversion to GSource, generally you unref the source after attaching
> it and then destroy it when done. This is the pattern used when using tags. Is
> this safe in the GSource case though?
Actually, I originally wrote it to keep the extra ref, and then saw that the existing GSource-using code in gio didn't do that, so I changed my code to match.
> I mean, you're keeping a pointer but have
> no ref. What if the source is removed from the mainloop for some other reason
> (I dunno, can this happen?)
Hm... someone could use g_main_context_find_source_by_user_data(), although they sort of deserve to lose then. The sources don't take a ref on their GMainContext though, so if someone started an operation in an alternate context and then destroyed that context before the op finished, that could unref the source. That also seems like a weird thing to do though.
Still, keeping the extra ref is safer. I'll change everything to do that.
(In reply to comment #13)
> I think there are some timeouts in the inotify code too.
Right, but there's only one fd and corresponding GSource receiving notifications from the kernel, which gets shared by all GInotifyFileMonitors. So that runs in the global default context and eventually bubbles events up to ih_event_callback() which passes them to g_file_monitor_emit_event(), which does an emit_in_idle(), which emits the signal in the correct context. The other file monitor implementations work similarly, and the GFileMonitor docs warn about this:
* (though if the global default main context is blocked, this may
* cause notifications to be blocked even if the thread-default
* context is still running).
> I didn't see any warning code in GSimpleAsync for the "mismatched API" case.
Yeah, we didn't end up needing any API additions to GSimpleAsyncResult, so when the app calls g_simple_async_result_complete_in_idle(), there's no way to know for sure if it's using it "correctly" or not. So if we want to emit a warning in only the bad cases, we'd either have to deprecate complete_in_idle() in favor of "complete_in_thread_context()" or something, or else we'd have to add a g_simple_async_result_I_know_what_I'm_doing_so_don't_warn() method to set a flag to suppress the warning. And that didn't seem worth it, since this wouldn't actually make the bad code work, it would just make it spew a warning before crashing/hanging.
However, we *can* easily emit a warning in the case where the app calls complete() directly from one of its own idle/timeout/watch handlers, since in that case we'd be able to tell if it was doing it from the wrong context.
Created attachment 137321 [details] [review]
Keep a ref on GSources that we might want to explicitly destroy later
Ah, my bad; the existing code was consistent about keeping the ref on
sources that it was keeping a pointer to, and was only unreffing after
attaching in the case that it wasn't keeping track of the source after
Created attachment 137323 [details] [review]
Warn when calling g_simple_async_result_complete() from wrong thread
Is that warning really right?
I mean, you do need to run g_simple_async_result_complete from some kind of callback on the specified context. However, on that thread g_main_context_get_thread_default() may not return anything, since the thread specific context need only be set in the thread that called the initial async operation.
Maybe something like g_source_get_context (g_main_current_source()) instead?
Otherwise this looks good to me, and we should get it in.
Created attachment 137659 [details] [review]
Use low-level GSource methods in a few more places
Now with fixed refcounting...
Created attachment 137661 [details] [review]
Add g_main_context_push_thread_default() and friends
Main difference from before is that g_main_context_push_thread_default()
now does a g_main_context_acquire(), and pop does a release; having a
thread-default context that was running in another thread could cause
the mainloop-based bits of an async operation to start running before
the foo_async() method had returned, which makes things pointlessly
complicated, so just forbid that.
Created attachment 137663 [details] [review]
Includes fixed g_simple_async_result_complete() warning, and a heavy
rewrite of GUnixResolver; the old code needed to be reorganized to ensure
that the libasyncns *_done() methods would be called right away even when
we weren't calling g_simple_async_result_complete() right away.
Except as noted, the patches are basically the same as the obsoleted versions. I think this is ready to go in.
I'm not sure about the g_main_context_acquire() call. If you just temporary push/pop around your async call this will work, but if you set a default for a whole thread it will make it impossible for another thread to service that mainloop.
For instance, if you call g_main_context_push_thread_default(NULL) in a thread you will make the main thread block on entering the main context.
I understand the problem though. In a case like the above with a thread default main context the result can start running in the other thread before the async operation has returned. Not sure what a good solution would be though...
(In reply to comment #22)
> if you set a default for a
> whole thread it will make it impossible for another thread to service that
Yes, that's intentional.
> For instance, if you call g_main_context_push_thread_default(NULL) in a thread
> you will make the main thread block on entering the main context.
Right, but would you expect that call to do anything useful? Having a NULL thread-default context is the same as being in the pre-thread-default-context glib/gio, and there's currently no expectation that you're allowed to start a gio async method from any random thread and have it always work. You have to start async ops in the thread that they're going to run and finish in, because (unless the op itself does g_simple_async_result_run_in_thread() or something), the op is not going to be written thread-safely and might fail arbitrarily if you run different parts in different threads.
g_main_context_push_thread_default() solves the problem of "I want to run async ops in a main loop in another thread", not "I want to be able to send async ops to run in arbitrary threads". (If you want to do that, you can just add an idle handler to that context and start the async op from there...)
Actually, this could be another warning: if you're starting an async op in a thread other than the main thread, and g_main_context_get_thread_default() returns NULL, then the caller messed up.
Ah, yeah. Then this looks ok to me. Please commit.
(In reply to comment #23)
Very good explanation, esp. this catch:
> Actually, this could be another warning: if you're starting an async op in a
> thread other than the main thread, and g_main_context_get_thread_default()
> returns NULL, then the caller messed up.
It would be good if this made it into the documentation.
Thanks Dan & Alex !