GNOME Bugzilla – Bug 763852
gdk/wayland: event source is not multi-thread aware
Last modified: 2016-05-18 03:07:07 UTC
The current usage of the Wayland dispatching API is not conducive to multi-threaded usage and as a result, using Wayland from multiple threads can lead to deadlocks. e.g. GStreamer and/or EGL usage off the main thread will hit Wayland from multiple threads. From the Wayland man pages, the solution is to use the wl_display_prepare_read(), wl_display_read_events() and wl_display_cancel_read() family of functions.
Created attachment 324232 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API
Review of attachment 324232 [details] [review]: Indeed. The patch looks mostly good, except the inline comment and that you don't cancel on finalize, if we ended up having prepared and not dispatched. FWIW, the wl_display_dispatch(_queue)() API is multi-thread safe if used correctly (i.e. don't mix poll() and dispatch() more or less), so should probably update the commit message to reflect that. ::: gdk/wayland/gdkeventsource.c @@ +63,3 @@ + g_error ("Error dispatching display: %s", g_strerror (errno)); + } + You should set source->reading to TRUE here.
Created attachment 324237 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API (In reply to Jonas Ådahl from comment #2) > Indeed. The patch looks mostly good, except the inline comment and that you > don't cancel on finalize, if we ended up having prepared and not dispatched. The reentrant thing? I've changed it to mention balancing prepare/cancel/read instead which makes more sense. cancel on finalize() is fixed. > FWIW, the wl_display_dispatch(_queue)() API is multi-thread safe if used > correctly (i.e. don't mix poll() and dispatch() more or less), so should > probably update the commit message to reflect that. Something like this? > ::: gdk/wayland/gdkeventsource.c > @@ +63,3 @@ > + g_error ("Error dispatching display: %s", g_strerror (errno)); > + } > + > > You should set source->reading to TRUE here. Fixed
Review of attachment 324237 [details] [review]: Sorry, missed another issue in the first version. ::: gdk/wayland/gdkeventsource.c @@ +69,1 @@ if (wl_display_flush (display->wl_display) < 0) Out of scope for this patch, but maybe in scope for the bug, wl_display_flush() may return EAGAIN if it didn't manage to flush everything, meaning we may theoretically still dead-lock if we end up waiting for some reply to some request idling in our queue. @@ +81,2 @@ if (source->display->event_pause_count > 0) return _gdk_event_queue_find_first (source->display) != NULL; Oops. Need to call cancel() here as well, since we prepared but didn't read.
Created attachment 324239 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API (In reply to Jonas Ådahl from comment #4) > Review of attachment 324237 [details] [review] [review]: > > Sorry, missed another issue in the first version. > > ::: gdk/wayland/gdkeventsource.c > @@ +69,1 @@ > if (wl_display_flush (display->wl_display) < 0) > > Out of scope for this patch, but maybe in scope for the bug, > wl_display_flush() may return EAGAIN if it didn't manage to flush > everything, meaning we may theoretically still dead-lock if we end up > waiting for some reply to some request idling in our queue. Fixing that would require polling with G_IO_OUT as well to check writability.\ and then state tracking for that. > @@ +81,2 @@ > if (source->display->event_pause_count > 0) > return _gdk_event_queue_find_first (source->display) != NULL; > > Oops. Need to call cancel() here as well, since we prepared but didn't read. Fixed.
How is this relevant to GTK+s use of libwayland ? How do you end up with multiple threads sharing the wayland socket that GDK is using ?
(In reply to Matthias Clasen from comment #6) > How is this relevant to GTK+s use of libwayland ? How do you end up with > multiple threads sharing the wayland socket that GDK is using ? EGL and/or GStreamer and/or possibly Vulkan. 1. mesa internally creates a wl_event_queue for all it's winsys operations and polls/dispatches that as necessary from the calling thread. The moment one performs EGL operations off the main thread, you're prone to deadlocks. gtkglarea itself isn't affected because everything is performed on the main thread. 2. GStreamer both with or without OpenGL/EGL - The GStreamer elements waylandsink (using subsurfaces), glimagesink (using subsurfaces and EGL), or gtkglsink (using EGL) have their own rendering threads and OpenGL contexts that they're manipulating off the main thread which will poll and dispatch the wayland fd. Side note, we have to do a similar thing on X11 in GStreamer's gtkglsink (at the very least on Intel hardware) as sharing OpenGL contexts does not work across multiple X11 display connections like it does on some other hardware/drivers. Using XReparentWindow works across X11 display connections but produces rendering artifacts when resizing. gtkglsink doesn't have these artifacts as the frame updates are integrated into Gtk+'s draw cycle.
Created attachment 324334 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API This contains a small change to fix a deadlock explained below. The scenario is two GMainLoops with separate GSource's polling the same wayland fd. Source 1 in Thread 1 can be woken up from another GSource (timeout, idle, frame clock, ...) becoming ready calling _check(). There has been no write/activity on the wayland fd so revents is 0. Thread 2, Source 2 is still in poll() as it has not been woken up. Source 1 calling wl_display_read_events() in _check() in this case will block on a condition waiting for Source 2 to exit poll() and call wl_display_read_events() or wl_display_cancel_read() which will not occur until e.g. the user provides some input and causes Thread 2's poll() to wake up. Fix is to only wl_display_read_events() on actual fd activity (signalled through (revents & G_IO_IN)) and _cancel_read() in all other cases.
Review of attachment 324334 [details] [review]: ::: gdk/wayland/gdkeventsource.c @@ +58,3 @@ + * wl_display_cancel_read() */ + if (source->reading) + wl_display_cancel_read (display->wl_display); When will this actually happen? Anyway, if it is needed, we should cancel the read the first thing in this function, so that we don't leave this function without later polling the fd. @@ +60,3 @@ + wl_display_cancel_read (display->wl_display); + + while (wl_display_prepare_read (display->wl_display) != 0) I think we can just do if (wl_display_prepare_read (display->wl_display) != 0) return TRUE; as it means we can already dispatch without polling. We also need to wl_display_prepare_read() on the potential return FALSE; path above (the first if statement in this function), so that we always have prepared for reading when we return FALSE. @@ +81,1 @@ + if (source->display->event_pause_count > 0) { nit: { sholud go on its own line. @@ +90,3 @@ + { + if (wl_display_read_events (display_wayland->wl_display) < 0) + g_error ("Error reading events from display: %s", g_strerror (errno)); I think it might be a good idea to either g_assert (source->reading) or g_return_if_fail (source->reading) before actually reading here.
Created attachment 325118 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API (In reply to Jonas Ådahl from comment #9) > Review of attachment 324334 [details] [review] [review]: > ::: gdk/wayland/gdkeventsource.c > @@ +58,3 @@ > + * wl_display_cancel_read() */ > + if (source->reading) > + wl_display_cancel_read (display->wl_display); > > When will this actually happen? Anyway, if it is needed, we should cancel > the read the first thing in this function, so that we don't leave this > function without later polling the fd. It used to happen in earlier patch versions. I've replaced it with an assert instead. > @@ +60,3 @@ > + wl_display_cancel_read (display->wl_display); > + > + while (wl_display_prepare_read (display->wl_display) != 0) > > I think we can just do if (wl_display_prepare_read (display->wl_display) != > 0) return TRUE; as it means we can already dispatch without polling. > > We also need to wl_display_prepare_read() on the potential return FALSE; > path above (the first if statement in this function), so that we always have > prepared for reading when we return FALSE. As in when one of the previous conditions returns FALSE? I took a different approach by only reading in check() when source->reading is TRUE. Also, I wasn't sure if the revents & IN condition before dispatch_pending() would always succeed when preemptively returning TRUE from prepare_read() so I removed that as well so we always dispatch regardless of revents state. I think that's still safe with the other gdk conditions possibly returning TRUE from prepare(). > @@ +81,1 @@ > + if (source->display->event_pause_count > 0) { > > nit: { sholud go on its own line. Fixed > @@ +90,3 @@ > + { > + if (wl_display_read_events (display_wayland->wl_display) < 0) > + g_error ("Error reading events from display: %s", g_strerror > (errno)); > > I think it might be a good idea to either g_assert (source->reading) or > g_return_if_fail (source->reading) before actually reading here. That fails as prepare can return FALSE without reading (as you pointed out above). I took the opposite approach of only reading when source->reading is TRUE as that's simpler and involves less state tracking.
Review of attachment 325118 [details] [review]: I think it looks fine except for one detail which I haven't managed to find clear documentation about. ::: gdk/wayland/gdkeventsource.c @@ +56,3 @@ + * wl_display_read_events() or wl_display_cancel_read() + * (in gdk_event_source_check() */ + g_assert (source->reading == FALSE); I can't find any documentation saying a non-mainloop caller may never prepare() only once before calling check(), so I think this assert is a bit dangerous. In other words, I think we should maybe change this assert to if (source->reading) return FALSE;
Created attachment 327362 [details] [review] gdk/wayland: use the multi-thread safe wayland dispatching API (In reply to Jonas Ådahl from comment #11) > Review of attachment 325118 [details] [review] [review]: > > I think it looks fine except for one detail which I haven't managed to find > clear documentation about. > > ::: gdk/wayland/gdkeventsource.c > @@ +56,3 @@ > + * wl_display_read_events() or wl_display_cancel_read() > + * (in gdk_event_source_check() */ > + g_assert (source->reading == FALSE); > > I can't find any documentation saying a non-mainloop caller may never > prepare() only once before calling check(), so I think this assert is a bit > dangerous. In other words, I think we should maybe change this assert to if > (source->reading) return FALSE; Now uses if (source->reading) return FALSE;
Review of attachment 327362 [details] [review]: Looks good to me now.