Bug 704374 – [PATCH] Main loop cannot handle recursion in poll



Summary:	[PATCH] Main loop cannot handle recursion in poll


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	mainloop
Version:	2.37.x
Hardware:	Other Mac OS

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:	701571

Reported:	2013-07-17 08:38 UTC by Kristian Rietveld
Modified:	2018-05-24 15:31 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed patch (2.55 KB, patch) 2013-07-17 08:38 UTC, Kristian Rietveld	needs-work

Description Kristian Rietveld 2013-07-17 08:38:14 UTC

Created attachment 249367
Proposed patch

g_main_context_iterate() creates a cached_poll_array that is passed to the various main loop functions. One of the problems is that g_main_context_iterate() keeps the address of this array in a local variable. As a result, the function cannot handle recursion in g_main_context_poll(), which happens on OS X.
The recursion might re-allocate the cached poll array, so the call after g_main_context_poll() at the base level will access freed memory.

The proposed solution is to always use context->cached_poll_array. This is done in the attached patch. I chose to continue to use nfds instead of context->cached_poll_array_size, so that the number of monitored FDs does not suddenly increase in the base level invocation of g_main_context_iterate().
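
To make the hazard and the proposed approach concrete, here is a minimal stand-alone sketch in plain C. It does not use GLib: ToyContext, toy_grow_poll_array(), toy_poll() and toy_iterate() are hypothetical stand-ins for GMainContext, the reallocation of the cached array, g_main_context_poll() and g_main_context_iterate(). The nested iteration is simulated by growing the array inside the poll step. The sketch only covers the caller's side; a poll function that was handed the old address before the array moved would still need separate care.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a poll record (not GLib's GPollFD). */
typedef struct { int fd; short events; short revents; } ToyPollFD;

/* Hypothetical stand-in for the part of GMainContext discussed here. */
typedef struct {
  ToyPollFD *cached_poll_array;
  size_t     cached_poll_array_size;
} ToyContext;

/* Grow the cached array. realloc() may move it, which invalidates any
 * pointer to the old block that a caller saved earlier. */
static void toy_grow_poll_array(ToyContext *ctx, size_t new_size)
{
  ToyPollFD *p;

  if (new_size <= ctx->cached_poll_array_size)
    return;                               /* the array never shrinks */
  p = realloc(ctx->cached_poll_array, new_size * sizeof *p);
  assert(p != NULL);
  memset(p + ctx->cached_poll_array_size, 0,
         (new_size - ctx->cached_poll_array_size) * sizeof *p);
  ctx->cached_poll_array = p;
  ctx->cached_poll_array_size = new_size;
}

/* Simulated poll step that re-enters the loop. */
static void toy_poll(ToyContext *ctx, size_t nfds)
{
  size_t i;

  /* Simulated nested iteration: fds were added, so the array grows. */
  toy_grow_poll_array(ctx, ctx->cached_poll_array_size * 4);

  /* Results are written through ctx, i.e. into the current block. */
  for (i = 0; i < nfds; i++)
    ctx->cached_poll_array[i].revents = 1;
}

/* One iteration: nfds stays local, but the array is always reached
 * through ctx after the poll step, never through a saved pointer. */
int toy_iterate(ToyContext *ctx, size_t nfds)
{
  size_t i;
  int ready = 0;

  toy_poll(ctx, nfds);
  for (i = 0; i < nfds; i++)
    if (ctx->cached_poll_array[i].revents)
      ready++;
  return ready;
}

/* Demo: two monitored fds, array grows from 2 to 8 slots mid-poll. */
int toy_demo(void)
{
  ToyContext ctx = { NULL, 0 };
  int ready;

  toy_grow_poll_array(&ctx, 2);
  ready = toy_iterate(&ctx, 2);
  free(ctx.cached_poll_array);
  return ready;
}
```

Keeping nfds local while re-reading the array address mirrors the choice described above: the base level keeps looking at the same number of records even though the array behind them may have been replaced.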

Pitfall: what happens if FDs are removed from monitoring in a recursing poll? We do know that the cached_poll_array never shrinks, so that's good. But what does this mean for the usage of nfds at the base level of recursion?

Comment 1 Dan Winship 2013-07-18 12:15:13 UTC

Why does g_main_context_poll() get recursively entered on OS X?

Comment 2 Kristian Rietveld 2013-07-18 12:41:02 UTC

GDK's OS X backend has special code, written by Owen Taylor, to integrate the glib and Cocoa main loops (see gtk+/gdk/quartz/gdkeventloop-quartz.c). The poll function is replaced with a custom poll function that runs the Cocoa main loop by asking for the next event within a specified timeout. When an OS X menu item in the global menu bar is activated, the callback for that menu item activation is run while still inside the Cocoa main loop, that is, before the "next event" function has returned. From within this callback, you are free to run the main loop (for instance, GIMP does this), in which case g_main_context_poll() will be entered recursively.

So the call stack more or less looks like:

 + g_main_context_iterate
   + g_main_context_poll
     + custom poll func
       + Cocoa get next event method
          + handling menu bar events, activating menu item
             + menu item activation callback
                + iterate main loop: g_main_context_iterate

Comment 3 Allison Karlitskaya (desrt) 2013-07-24 02:43:54 UTC

Can we dispatch the menubar events via idles instead of risking the possibility of reentering the mainloop like this?

Comment 4 Michael Natterer 2013-07-24 06:58:03 UTC

Sure, and this is what e.g. GIMP and Ardour are doing currently in order
to avoid the crash, but GLib should not explode in your face when you
forget to idle-dispatch the menu callback ;)

Also, it's not only menubar stuff: any native API that may run a
recursive native main loop under the hood is at risk of crashing
unpredictably.

Comment 5 Dan Winship 2013-07-24 14:39:47 UTC

(In reply to comment #4)
> Sure, and this is what e.g. GIMP and Ardour are doing currently in order
> to avoid the crash, but GLib should not explode in your face when you
> forget to idle-dispatch the menu callback ;)

I think he meant, can't gtk idle-dispatch the callbacks itself, in the integrating-with-the-Cocoa-menubar code?

> Also, it's not only menubar stuff: any native API that may run a
> recursive native main loop under the hood is at risk of crashing
> unpredictably.

Recursive *native* main loop? Isn't it only a problem if it runs a recursive glib main loop? And why would a native API do that? Can't the gtk/cocoa integration just make this all disappear somehow?


(In reply to comment #0)
> Pitfall: what happens if FDs are removed for monitoring in a recursing poll? We
> do know that the cached_poll_array never shrinks, so that's good. But what does
> this mean for the usage of nfds at the base level of recursion?

If fds are removed in the inner iteration, then when the outer g_main_context_poll() returns, it may end up writing the revents to the wrong pollfds. If fds are added in the inner iteration, then cached_poll_array may be reallocated, in which case when the outer g_main_context_poll() returns, it will write revents to freed memory.
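
The first failure mode (results attributed to the wrong fd after a removal) can be shown in a few lines of plain C. This is a sketch, not GLib code: ToyContext, its generation counter and the toy_* functions are hypothetical. The counter is just one conceivable way for the outer level to notice that the array was repacked underneath it; nothing in this report says GLib does this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

typedef struct { int fd; short revents; } ToyPollFD;

typedef struct {
  ToyPollFD fds[TOY_MAX_FDS];
  size_t    n_fds;
  unsigned  generation;   /* hypothetical: bumped on every repack */
} ToyContext;

/* Remove one fd and close the gap, as a nested iteration might. */
void toy_remove_fd(ToyContext *ctx, int fd)
{
  size_t i, j;

  for (i = 0; i < ctx->n_fds; i++)
    if (ctx->fds[i].fd == fd)
      break;
  if (i == ctx->n_fds)
    return;                             /* fd was not monitored */
  for (j = i + 1; j < ctx->n_fds; j++)
    ctx->fds[j - 1] = ctx->fds[j];
  ctx->n_fds--;
  ctx->generation++;
}

/* Outer level: "polls", then looks at the record in `slot`.
 * If removed_fd >= 0, the poll re-enters the loop and removes that fd.
 * Returns the fd now stored in the slot, or -1 if the generation check
 * shows that the slot may no longer describe the fd polled for. */
int toy_outer_poll(ToyContext *ctx, size_t slot, int removed_fd)
{
  unsigned gen_before = ctx->generation;

  if (removed_fd >= 0)
    toy_remove_fd(ctx, removed_fd);     /* simulated nested iteration */

  if (ctx->generation != gen_before)
    return -1;                          /* stale: discard the result */
  return ctx->fds[slot].fd;
}

/* Demo: slot 1 holds fd 5; once fd 5 is removed, slot 1 names fd 7. */
int toy_demo(void)
{
  ToyContext ctx = { { {3, 0}, {5, 0}, {7, 0} }, 3, 0 };
  int undisturbed = toy_outer_poll(&ctx, 1, -1);
  int disturbed   = toy_outer_poll(&ctx, 1, 5);

  return undisturbed == 5 && disturbed == -1 && ctx.fds[1].fd == 7;
}
```

Without the check, the outer level would have reported activity for fd 7 using a result that was gathered for fd 5.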

Comment 6 Michael Natterer 2013-07-24 14:57:08 UTC

(In reply to comment #5)
> (In reply to comment #4)
> > Sure, and this is what e.g. GIMP and Ardour are doing currently in order
> > to avoid the crash, but GLib should not explode in your face when you
> > forget to idle-dispatch the menu callback ;)
> 
> I think he meant, can't gtk idle-dispatch the callbacks itself, in the
> integrating-with-the-Cocoa-menubar code?

This is all non-GTK+ code. It either lives in GIMP, or in an external
integration package. All I'm saying is: yes this is a good idea, but
even in the absence of that idle, GTK+ should not crash.

> > Also, it's not only menubar stuff: any native API that may run a
> > recursive native main loop under the hood is at risk of crashing
> > unpredictably.
> 
> Recursive *native* main loop? Isn't it only a problem if it runs a recursive
> glib main loop?

Actually, both variants are a problem.

> And why would a native API do that?

To wait for some return values, just like GTK+ runs main loops. This
happens, IIRC, with DND operations.

> Can't the gtk/cocoa
> integration just make this all disappear somehow?

This is precisely the purpose of Kris' patch :)

I think I'd better let Kris answer the last paragraph.

Comment 8 Kristian Rietveld 2013-07-24 20:09:06 UTC

(In reply to comment #5)
> (In reply to comment #4)
> > Sure, and this is what e.g. GIMP and Ardour are doing currently in order
> > to avoid the crash, but GLib should not explode in your face when you
> > forget to idle-dispatch the menu callback ;)
> 
> I think he meant, can't gtk idle-dispatch the callbacks itself, in the
> integrating-with-the-Cocoa-menubar code?

Idle-dispatch does solve the problems reliably; GIMP is currently using it. The integration code, however, is not part of GTK+. It lives in a separate repository, but it is also being copied into other projects (e.g. GIMP).

Furthermore, everybody is free to implement their own Cocoa menu handling. And there may be other cases in which an idle-dispatch would have to be added manually: drag-and-drop, dock icon and menu bar status item interaction all come to mind as potential candidates.

I have to concur with Mitch that GLib should be able to handle this kind of recursion.


> > Also, it's not only menubar stuff: any native API that may run a
> > recursive native main loop under the hood is at risk of crashing
> > unpredictably.
> 
> Recursive *native* main loop? Isn't it only a problem if it runs a recursive
> glib main loop?

I *think* only the case where the glib mainloop is run recursively from a native callback is problematic at the moment.

> Can't the gtk/cocoa
> integration just make this all disappear somehow?

As far as I know, there is no other common layer where this can be fixed apart from GLib and GDK. If we resort to idle-dispatch, people will have to patch up their own projects manually.

Comment 9 Allison Karlitskaya (desrt) 2013-07-25 02:21:23 UTC

Gtk has code for making menubars on MacOS these days and it's improving lately.  I'd prefer if we just dealt with it there and put up a sign post warning anyone else playing around with this sort of stuff that they have to be very very careful not to recurse the mainloop or to make callbacks to code that might.

Comment 10 Michael Natterer 2013-07-25 09:41:44 UTC

I have to disagree here. You cannot know which native APIs you call
have this property. Moreover, the problems do not always occur
reproducibly but vary based on timing and whatnot.

IMO we really need to fix this generically in GLib, because it's
not fixable otherwise.

Also, what is the problem with simply fixing it?

Comment 11 Kristian Rietveld 2013-07-25 09:46:52 UTC

(In reply to comment #9)
> Gtk has code for making menubars on MacOS these days and it's improving lately.

That doesn't help GTK+2, and the majority of GTK+ applications deployed on OS X are based on GTK+2. Secondly, there are cases besides native menus where problems could turn up, and we have not identified all of them yet. So, not all possible cases can be fixed within this integration layer.


>  I'd prefer if we just dealt with it there and put up a sign post warning
> anyone else playing around with this sort of stuff that they have to be very
> very careful not to recurse the mainloop or to make callbacks to code that
> might.

We are not playing around here; we have Quartz main loop code carefully engineered by Owen Taylor that overrides the poll func. If this is indeed too troublesome to fix, the GLib main loop documentation will need a fat warning that poll func overrides must not lead to recursive invocation of the main loop.

Comment 12 Dan Winship 2013-07-25 12:30:27 UTC

we probably ought to get Owen's take on this...

Comment 13 Dan Winship 2013-07-25 12:34:41 UTC

Comment on attachment 249367
Proposed patch

also, as mentioned above, this particular patch is wrong, because context->cached_poll_array might change during the call to g_main_context_poll(), causing the revents to be written to the wrong location when it returns.

(and if the behavior change is going to go in, we should mention in g_main_context_set_poll_func() that it's safe to recurse like this, and we should test it in glib/tests/mainloop.c to ensure it doesn't get broken in the future)

Comment 14 Kristian Rietveld 2013-08-10 06:09:38 UTC

Yes, I felt the patch was not 100% right. But I want to get word from the maintainers that this patch is going to be accepted, before spending any more time on it. I hope we get Owen's take on it soonish.

Comment 15 Kristian Rietveld 2013-12-08 16:13:11 UTC

Recently, I have very closely studied Owen's original code. It turns out that Owen's original code already noted that you cannot iterate the GLib main loop from a callback from the OS X run loop while GLib is polling:

 * The main known limitation of this code is that if a callback is triggered
 * via the OS X run loop while we are "polling" (in either case described
 * above), iteration of the GLib main loop is not possible from within
 * that callback. If the programmer tries to do so explicitly, then they
 * will get a warning from GLib "main loop already active in another thread".

That warning, however, was removed from GLib on 9 Sep 2011. So I *think* we have been under the false impression that, because this warning did not occur, something else must be up :)

So I assume Owen's take on it would be: this is not supported. I guess we have to see if there's a generic mechanism that can be used to delay GLib callbacks until after the polling phase.

Comment 16 John Ralls 2013-12-08 17:07:39 UTC

This doesn't surprise me at all, and I think it's the root cause of Bug 701571, Bug 674108, and probably many others.

The underlying cause is that nextEventMatchingMask: doesn't just retrieve the event, it dispatches it. That's where our recursion in the poll function is coming from.

The way most of the gtk callback code works is to queue an idle event that will fire after the poll function returns. Mitch did that when he wrote the original ige-mac-menu code, and Paul and I copied that when we moved it from Carbon to Cocoa. That's unfortunately not an obvious thing to do when handling a dialog box, particularly a modal one with a nested mainloop, especially if one is primarily writing for Linux and not experienced with integrating the mainloop with the native event loops on other platforms.
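
The idle-dispatch idea can be modelled without GLib. In the sketch below every name (ToyLoop, toy_defer(), toy_iterate() and the demo helpers) is hypothetical; in real code the queueing step would be a call such as g_idle_add(). The native callback only records work, and the loop runs that work after the poll step has returned, so application code never runs inside poll.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED 16

typedef void (*ToyCallback)(void *data);

typedef struct {
  ToyCallback func[TOY_MAX_DEFERRED];
  void       *data[TOY_MAX_DEFERRED];
  size_t      n_deferred;
  int         in_poll;      /* nonzero while the poll step runs */
} ToyLoop;

/* Queue work instead of running it; returns 0 if the queue is full. */
int toy_defer(ToyLoop *loop, ToyCallback func, void *data)
{
  if (loop->n_deferred == TOY_MAX_DEFERRED)
    return 0;
  loop->func[loop->n_deferred] = func;
  loop->data[loop->n_deferred] = data;
  loop->n_deferred++;
  return 1;
}

/* One iteration: the poll step may trigger native callbacks, which are
 * expected to call toy_defer() rather than do the real work. */
void toy_iterate(ToyLoop *loop, void (*poll_step)(ToyLoop *))
{
  size_t i, n;

  loop->in_poll = 1;
  poll_step(loop);
  loop->in_poll = 0;

  /* Dispatch phase: run only what was queued up to this point. */
  n = loop->n_deferred;
  for (i = 0; i < n; i++)
    loop->func[i](loop->data[i]);

  /* Entries queued during dispatch are kept for the next iteration. */
  for (i = n; i < loop->n_deferred; i++) {
    loop->func[i - n] = loop->func[i];
    loop->data[i - n] = loop->data[i];
  }
  loop->n_deferred -= n;
}

/* Demo: records whether the menu handler ran inside the poll step. */
static int demo_ran_in_poll = -1;

static void demo_menu_activated(void *data)
{
  ToyLoop *loop = data;
  demo_ran_in_poll = loop->in_poll;
}

static void demo_poll_step(ToyLoop *loop)
{
  /* The native event loop "fires" the menu item during poll. */
  toy_defer(loop, demo_menu_activated, loop);
}

int toy_demo(void)
{
  ToyLoop loop = { {0}, {0}, 0, 0 };

  toy_iterate(&loop, demo_poll_step);
  return demo_ran_in_poll;   /* 0 means: ran after poll returned */
}
```

Because the handler runs in the dispatch phase, it could start a nested loop there without re-entering the poll step of the outer iteration.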

ISTM there are three ways to go: 

* Restore the warning, but make it obvious that the issue is recursion in the poll function rather than another thread. IIRC it was removed when GTK was made single-threaded. This is the simplest.

* Find a way to detect an available NSEvent without using nextEventMatchingMask: or interrupt the dispatch in Cocoa or CF so that it can be deferred to gdk_event_dispatch. I don't know if this is possible, but it's the way that best matches the way mainloop works.

* Move nextEventMatchingMask: to gdk_event_dispatch() and run that function on every iteration. This may well have other consequences and would require thorough testing.
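
The first option could look roughly like the following plain-C sketch. ToyContext, its in_poll flag and toy_iterate() are hypothetical names, and the message text is invented for illustration; in GLib itself the warning would presumably be emitted with g_warning().

```c
#include <assert.h>
#include <stdio.h>

typedef struct ToyContext ToyContext;
typedef void (*ToyPollFunc)(ToyContext *ctx);

struct ToyContext {
  int         in_poll;     /* hypothetical flag, set around the poll call */
  int         warnings;    /* counts refused re-entries, for the demo */
  ToyPollFunc poll_func;
};

/* Returns 1 if an iteration ran, 0 if it was refused as re-entrant. */
int toy_iterate(ToyContext *ctx)
{
  if (ctx->in_poll) {
    fprintf(stderr, "main loop iterated recursively from within "
                    "the poll function; ignoring\n");
    ctx->warnings++;
    return 0;
  }

  ctx->in_poll = 1;
  ctx->poll_func(ctx);     /* may call back into toy_iterate() */
  ctx->in_poll = 0;
  return 1;
}

/* Demo poll function that misbehaves by iterating the loop again. */
void toy_recursing_poll(ToyContext *ctx)
{
  toy_iterate(ctx);
}

/* Demo: the outer iteration runs, the nested one is refused once. */
int toy_demo(void)
{
  ToyContext ctx = { 0, 0, toy_recursing_poll };
  int ran = toy_iterate(&ctx);

  return ran == 1 && ctx.warnings == 1;
}
```

A guard like this only reports the problem and refuses the nested iteration; it does not make the recursion work.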


Comment 17 Allison Karlitskaya (desrt) 2013-12-18 12:57:14 UTC

(In reply to comment #16)
> * Find a way to detect an available NSEvent without using
> nextEventMatchingMask: or interrupt the dispatch in Cocoa or CF so that it can
> be deferred to gdk_event_dispatch. I don't know if this is possible, but it's
> the way that best matches the way mainloop works.
> 
> * Move nextEventMatchingMask: to gdk_event_dispatch() and run that function on
> every iteration. This may well have other consequences and would require
> thorough testing.

fwiw, I have a very strong preference for solving it this way and doing away with the custom poll function altogether.  After a bit of research, it seems like the NSRunLoop is using kqueue internally, so it could theoretically be possible to just get their fd (and expected minimum timeout) and poll on it, but I can't find a public API to do this and I don't think this is something that we should be too hacky with...

Comment 18 Allison Karlitskaya (desrt) 2013-12-18 14:15:54 UTC

Another interesting possibility would be to move a bit of the quartz hacks into GMainContext in a generic way and turn the entire thing inside out.

It's something that I've considered for a long time: in our epoll() future, we will be able to provide a single fd that we can hand to people to poll on, and that will let them know "GMainContext is ready to handle events".  For the non-epoll() case we'd need to emulate this with a thread that did the polling and an eventfd or pipe to communicate this ready state back to whoever was doing the watching.  This would look pretty similar to the thread that the quartz backend code creates now (which is what I mean by moving some of the quartz hacks into GMainContext).  No change so far, really.

With a generic interface for getting a single pollable fd from GMainContext we could then provide a generic mechanism to allow GMainContext to become subservient to other mainloops.  (aside: this was broached in bug 699132 but the approach there involved adding every single fd and timer separately to the other mainloop, discarding priorities in the process).  Every call to g_main_context_iteration() would then be passed off to a callback where (at least for the Quartz backend) we would simply iterate the NSApplication in the usual way except that when we saw activity on the GMainContext fd, we'd call back into GMainContext, telling it to handle it.

Looking toward the future, if we ever write a kqueue-based GMainContext, we could use its single fd as the one that we return and skip the thread entirely.
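
The "single pollable fd" property is something epoll already offers on Linux: an epoll descriptor can itself be passed to poll(), and it reports readable when any descriptor registered with it is ready. The Linux-only sketch below shows just that property, with a pipe standing in for an event source. toy_single_fd_ready() is a hypothetical name and nothing here is GMainContext API.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 1 if the outer poll() sees the epoll fd become readable after
 * data arrives on the inner pipe, 0 if it does not, and -1 if setup
 * fails or the fd is unexpectedly ready before anything was written. */
int toy_single_fd_ready(void)
{
  int pipefd[2];
  int epfd;
  int result = -1;
  struct epoll_event ev;
  struct pollfd outer;

  if (pipe(pipefd) != 0)
    return -1;
  epfd = epoll_create1(0);
  if (epfd < 0)
    goto out_pipe;

  /* Register the pipe's read end inside the epoll set. */
  ev.events = EPOLLIN;
  ev.data.fd = pipefd[0];
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
    goto out_epoll;

  /* An outside loop only ever watches the one epoll fd. */
  outer.fd = epfd;
  outer.events = POLLIN;
  outer.revents = 0;

  /* Nothing is ready yet, so a zero-timeout poll reports nothing. */
  if (poll(&outer, 1, 0) != 0)
    goto out_epoll;

  /* Make the inner source ready, then poll the single fd again. */
  if (write(pipefd[1], "x", 1) != 1)
    goto out_epoll;
  result = (poll(&outer, 1, 1000) == 1 && (outer.revents & POLLIN)) ? 1 : 0;

out_epoll:
  close(epfd);
out_pipe:
  close(pipefd[0]);
  close(pipefd[1]);
  return result;
}
```

The outer poller learns only that something inside is ready; finding out which source it was, and in what priority order to handle it, stays with whoever owns the epoll set.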

Comment 19 Sebastian Dröge (slomo) 2014-12-12 12:39:18 UTC

The last part is something I'm currently looking into. For GStreamer we're going to need the default main context when running on the main thread to dispatch any Cocoa events, and when it is running on another thread (or not at all, e.g. native Cocoa app) not do anything.

Currently we have a lot of hacks implemented for this, which all cause deadlocks and other problems. Having GMainContext do all this would allow us to get rid of the hacks and would actually allow implementing it in a safe way.


Was there any work started on this already? My current idea would be to move most of gdkeventloop-quartz.c into GMainContext but only replace the poll function if the context is acquired on the application's main thread.

Main open problem seems to be that GDK wants to get all the NS events by itself to translate them into GDK events if required. For that I could imagine either having a GSource (like the UNIX signal source), that would allow GDK to subscribe to those events... or just allowing to set a "handle native events" function globally, which when unset would just dispatch the events and could be replaced by GDK with the current magic.

The other open problem is that this would dramatically change the behaviour of GMainContext on OSX. That it requires changes in GDK to not break GDK seems bad already. I wonder how much other code out there would break, and do we care?

Comment 20 Paul Davis 2016-05-24 14:29:04 UTC

Most of the comments above have sadly missed the key point:

Owen's design exists because at the time he wrote it, CFRunLoopSource could not deal with file descriptors. This changed in OS X 10.5. So there's no need to even try to maintain his design at all.

I opened #766840 as a way to track a redesign and reimplementation of the GDK event loop on Quartz. Sebastian already created a patch for glib that copies Owen's design into glib itself, which is a step in the right direction, but alas retains the dual thread design inherited from the lack of file descriptor support.

Comment 21 Owen Taylor 2016-06-22 16:20:18 UTC

(In reply to Paul Davis from comment #20)
> Most of the comments above have sadly missed the key point:
> 
> Owen's design exists because at the time he wrote it, CFRunLoopSource could
> not deal with file descriptors. This changed in OS X 10.5. So there's no
> need to even try to maintain his design at all. 
> 
> I opened #766840 as a way to track a redesign and reimplementation of the
> GDK event loop on Quartz. Sebastian already created a patch for glib that
> copies Owen's design into glib itself, which is a step in the right
> direction, but alas retains the dual thread design inherited from the lack
> of file descriptor support.

I basically don't think the dual thread design is related to the issue discussed here at all - if you use the Quartz API to wait for events, you can get this unexpected recursion.

If the event loop was moved to GLib and GMainLoop or g_main_context_iteration() was virtualized in some fashion, then possibly you could mostly run in the "Quartz in control" mode of gdkeventloop-quartz.c, which is possibly less susceptible to this, since Quartz callbacks are dispatched outside of the scope of the GLib main loop. My general feeling is that the quartz-in-control mode is less compatible with arbitrary GLib code, but I don't have a lot of specific things to point to 10 years later.

          /* If we fail to acquire the main context, that means someone is iterating
           * the main context in a different thread; we simply wait until this loop
           * exits and then try again at next entry. In general, iterating the loop
           * from a different thread is rare: it is only possible when GDK threading
           * is initialized and is not frequently used even then. So, we hope that
           * having GLib main loop iteration blocked in the combination of that and
           * a native modal operation is a minimal problem. We could imagine using a
           * thread that does g_main_context_wait() and then wakes us back up, but
           * the gain doesn't seem worth the complexity.
           */

Isn't particularly serious, I think - GDK threading is no longer even supported.

I also wrote that the recursive problem occurred "In either case described above" probably meaning that it also applied to quartz-in-control but I don't immediately see that being the case.

One last thing to think about is that for anything other than the default main context you'd be better off using the cross-platform code and not involving the Quartz run loop.

Comment 22 Paul Davis 2016-06-22 19:21:20 UTC

Owen, your design notes explicitly cite the lack of a way to wait for file descriptors as the reason for the design.

My suggestion is a design that doesn't create opposing "glib-in-control" or "quartz-in-control" but instead integrates them, which I believe can be done. Right now, you have 2 modes and 2 threads

Threads:
   * regular event loop, running either glib or quartz run loop
   * select thread waiting on file descriptors, injecting events into the quartz
       run loop so that it returns to the glib "level"

Modes:
   * glib-in-control : so we can wait for all kinds of events
   * quartz-in-control : nothing to do at the glib level, let quartz do its stuff.

These can be integrated into 1 thread and 1 mode:

   * thread runs glib event loop
   * when glib event loop goes idle, falls back to quartz run loop, which
       will do redrawing and then wait for events.
   * when an event (including file descriptors) occurs, the quartz run loop
       will wake up and return to the glib level

It is true that inside the quartz run loop, glib is not "directly" in control. And it is true that drawing (exposes/draws) will take place from inside the quartz code, rather than from within glib code. But I think this is still a much cleaner design than the dual-threaded one that exists today.

I don't believe you will get any recursion from using the Quartz run loop, other than when the run loop is run recursively via application code choice.

Comment 23 Owen Taylor 2016-06-28 20:07:05 UTC

(In reply to Paul Davis from comment #22)
> Owen, your design notes explicitly cite the lack of a way to wait for file
> descriptors as the reason for the design.

The lack of a way to wait for file descriptors is the reason for the *two threads* - this is only part of the design.

> My suggestion is a design that doesn't create opposing "glib-in-control" or
> "quartz-in-control" but instead integrates them, which I believe can be
> done. Right now, you have 2 modes and 2 threads
> 
> Threads:
>    * regular event loop, running either glib or quartz run loop
>    * select thread waiting on file descriptors, injecting events into the
> quartz
>        run loop so that it returns to the glib "level"
> 
> Modes:
>    * glib-in-control : so we can wait for all kinds of events
>    * quartz-in-control : nothing to do at the glib level, let quartz do its
> stuff.

This is not a correct interpretation of quartz-in-control - GLib sources are still checked and dispatched in quartz-in-control mode.

> These can be integrated into 1 thread and 1 mode:
> 
>    * thread runs glib event loop
>    * when glib event loop goes idle, falls back to quartz run loop, which
>        will do redrawing and then wait for events.
>    * when an event (including file descriptors) occurs, the quartz run loop
>        will wake up and return to the glib level

This would require "MainLoop or g_main_context_iteration() was virtualized in some fashion" as I described, but what you describe would not result in events being fired at the right time with the right ordering; depending on priority, GLib sources need to preempt native events or vice versa - the current code gets that right in both modes.

> It is true that inside the quartz run loop, glib is not "directly" in
> control. And it is true that drawing (exposes/draws) will take place from
> inside the quartz code, rather than from within glib code. But I think this
> is still a much cleaner design than the dual-threaded one that exists today.

Given the native ability to wait for file descriptors, the dual threading can be removed. That doesn't change nearly as much about how the two run loops are integrated as you say here.
 
> I don't believe you will get any recursion from using the Quartz run loop,
> other than when the run loop is run recursively via application code choice.

That is what this bug is about - when the run loop is run recursively via application code choice.

Comment 24 Paul Davis 2016-06-28 20:43:14 UTC

(In reply to Owen Taylor from comment #23)
> (In reply to Paul Davis from comment #22)

> > Modes:
> >    * glib-in-control : so we can wait for all kinds of events
> >    * quartz-in-control : nothing to do at the glib level, let quartz do its
> > stuff.
> 
> This is not a correct interpretation of quartz-in-control - GLib sources are
> still checked and dispatched in quartz-in-control mode.

I don't think it is accurate to say they are "checked and dispatched in quartz-in-control mode". They are checked, and the quartz-in-control mode exits, and we return to glib-in-control mode (maybe ... see final links in this reply).

> 
> > These can be integrated into 1 thread and 1 mode:
> > 
> >    * thread runs glib event loop
> >    * when glib event loop goes idle, falls back to quartz run loop, which
> >        will do redrawing and then wait for events.
> >    * when an event (including file descriptors) occurs, the quartz run loop
> >        will wake up and return to the glib level
> 
> This would require "MainLoop or g_main_context_iteration() was virtualized
> in some fashion" as I described,

I think it is already sufficiently virtualized. The pollfunc can be supplied (as now). It simply calls into the existing "hey quartz do your stuff and return to me when a source is ready", BUT without the additional select thread, since all fd Glib sources are just CFRunLoopSources as well. What more is needed?

> but what you describe would not result in
> events being fired at the right time with the right ordering; depending on
> priority, GLib sources need to preempt native events or vice versa - the
> current code gets that right in both modes.

Sources for NSEventLoops have priorities (at least Version 0 sources, which is what we use and would use). I don't see a reason why we could not get ordering correct.

> 
> > It is true that inside the quartz run loop, glib is not "directly" in
> > control. And it is true that drawing (exposes/draws) will take place from
> > inside the quartz code, rather than from within glib code. But I think this
> > is still a much cleaner design than the dual-threaded one that exists today.
> 
> Given the native ability to wait for file descriptors, the dual threading
> can be removed. That doesn't change nearly as much about how the two run
> loops are integrated as you say here.
>  
> > I don't believe you will get any recursion from using the Quartz run loop,
> > other than when the run loop is run recursively via application code choice.
> 
> That is what this bug is about - when the run loop is run recursively via
> application code choice.

Fair enough. My comments are attacking this from a different angle, as described in https://mail.gnome.org/archives/gtk-devel-list/2016-May/msg00013.html and referenced as https://bugzilla.gnome.org/show_bug.cgi?id=766840

Comment 25 Owen Taylor 2016-06-29 15:06:25 UTC

(In reply to Paul Davis from comment #24)
> (In reply to Owen Taylor from comment #23)
> > (In reply to Paul Davis from comment #22)
> 
> > > Modes:
> > >    * glib-in-control : so we can wait for all kinds of events
> > >    * quartz-in-control : nothing to do at the glib level, let quartz do its
> > > stuff.
> > 
> > This is not a correct interpretation of quartz-in-control - GLib sources are
> > still checked and dispatched in quartz-in-control mode.
> 
> I don't think it is accurate to say they are "checked and dispatched in
> quartz-in-control mode". They are checked, and the quartz-in-control mode
> exits, and we return to glib-in-control mode (maybe ... see final links in
> this reply).

See calls to g_main_context_dispatch() in run_loop_before_sources() and run_loop_after_waiting().

> > > These can be integrated into 1 thread and 1 mode:
> > > 
> > >    * thread runs glib event loop
> > >    * when glib event loop goes idle, falls back to quartz run loop, which
> > >        will do redrawing and then wait for events.
> > >    * when an event (including file descriptors) occurs, the quartz run loop
> > >        will wake up and return to the glib level
> > 
> > This would require "MainLoop or g_main_context_iteration() was virtualized
> > in some fashion" as I described,
> 
> I think it is already sufficiently virtualized. The pollfunc can be supplied
> (as now). It simply calls into the existing "hey quartz do your stuff and
> return to me when a source is ready", BUT without the additional select
> thread, since all fd Glib sources are just CFRunLoopSources as well. What
> more is needed?

The poll() function is *supposed* to simply wait and do nothing, and then sources are dispatched afterwards. The fact that user callbacks can be called within the poll() function on Quartz causes problems currently. Moving things so that *all* sources will be dispatched within the poll() function is going to cause far more problems.

I'm also pretty skeptical that it is possible to move to a world where there is a CFRunLoopSource per GSource in a way that is compatible with the current GSource API. It would certainly be a huge project in API migration.

> Fair enough. My comments are attacking this from a different angle, as
> described in
> https://mail.gnome.org/archives/gtk-devel-list/2016-May/msg00013.html and
> referenced as https://bugzilla.gnome.org/show_bug.cgi?id=766840

In the quartz-in-control mode, you could tweak the priority check in run_loop_before_sources() to prevent events and redrawing from being preempted by quartz sources.

Then for a particular application (if not GLib generically) you could just run a Quartz run loop instead of g_main_loop_run().

Comment 26 GNOME Infrastructure Team 2018-05-24 15:31:37 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/732.