GNOME Bugzilla – Bug 755490
Segmentation fault on mpv segfault
Last modified: 2015-10-12 19:32:56 UTC
While trying to play video in mpv, which utilizes the GLX OpenGL backend through XWayland, it would crash and take gnome-shell wayland compositor along with it. Stack trace captured by systemd-coredump: Coredump: /var/lib/systemd/coredump/core.gnome-shell.1000.e109f832e546428f95e5d794761d670c.600.1443031788000000.xz Message: Process 600 (gnome-shell) of user 1000 dumped core. Stack trace of thread 600: #0 0x00007f324f1984e8 __GI_raise (libc.so.6) #1 0x00007f324f19993a __GI_abort (libc.so.6) #2 0x00007f3250d60495 g_assertion_message (libglib-2.0.so.0) #3 0x00007f3250d6052a g_assertion_message_expr (libglib-2.0.so.0) #4 0x00007f32548cff78 toplevel_surface_commit (libmutter.so.0) #5 0x00007f32548cd524 meta_wayland_surface_role_commit (libmutter.so.0) #6 0x00007f324d67d080 ffi_call_unix64 (libffi.so.6) #7 0x00007f324d67caeb ffi_call (libffi.so.6) #8 0x00007f3250497b58 wl_closure_invoke (libwayland-server.so.0) #9 0x00007f3250494196 wl_client_connection_data (libwayland-server.so.0) #10 0x00007f3250495f22 wl_event_loop_dispatch (libwayland-server.so.0) #11 0x00007f32548c3e57 wayland_event_source_dispatch (libmutter.so.0) #12 0x00007f3250d3aa77 g_main_dispatch (libglib-2.0.so.0) #13 0x00007f3250d3acd0 g_main_context_iterate (libglib-2.0.so.0) #14 0x00007f3250d3aff2 g_main_loop_run (libglib-2.0.so.0) #15 0x00007f32548975b6 meta_run (libmutter.so.0) #16 0x000000000040242f main (/usr/bin/gnome-shell (deleted)) #17 0x00007f324f1855e0 __libc_start_main (libc.so.6) #18 0x0000000000402539 _start (/usr/bin/gnome-shell (deleted)) Stack trace of thread 603: #0 0x00007f324f24505d poll (libc.so.6) #1 0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0) #2 0x00007f3250d3aff2 g_main_loop_run (libglib-2.0.so.0) #3 0x00007f325218d276 gdbus_shared_thread_func (libgio-2.0.so.0) #4 0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0) #5 0x00007f324f50f434 start_thread (libpthread.so.0) #6 0x00007f324f24da0d __clone (libc.so.6) Stack trace of thread 640: #0 0x00007f324f514fbf pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0) #1 0x00007f324ce4f2a0 PR_WaitCondVar (libnspr4.so) #2 0x00007f3257b86786 n/a (libmozjs-24.so) #3 0x00007f324ce55da4 n/a (libnspr4.so) #4 0x00007f324f50f434 start_thread (libpthread.so.0) #5 0x00007f324f24da0d __clone (libc.so.6) Stack trace of thread 605: #0 0x00007f324f24505d poll (libc.so.6) #1 0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0) #2 0x00007f3250d3ad7c g_main_context_iteration (libglib-2.0.so.0) #3 0x00007f324082423d n/a (libdconfsettings.so) #4 0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0) #5 0x00007f324f50f434 start_thread (libpthread.so.0) #6 0x00007f324f24da0d __clone (libc.so.6) Stack trace of thread 639: #0 0x00007f324f514fbf pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0) #1 0x00007f324ce4f2a0 PR_WaitCondVar (libnspr4.so) #2 0x00007f3257b1107e n/a (libmozjs-24.so) #3 0x00007f324ce55da4 n/a (libnspr4.so) #4 0x00007f324f50f434 start_thread (libpthread.so.0) #5 0x00007f324f24da0d __clone (libc.so.6) Stack trace of thread 602: #0 0x00007f324f24505d poll (libc.so.6) #1 0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0) #2 0x00007f3250d3ad7c g_main_context_iteration (libglib-2.0.so.0) #3 0x00007f3250d3adb9 glib_worker_main (libglib-2.0.so.0) #4 0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0) #5 0x00007f324f50f434 start_thread (libpthread.so.0) #6 0x00007f324f24da0d __clone (libc.so.6) Stack trace of thread 638: #0 0x00007f324f24505d poll (libc.so.6) #1 0x00007f3258af0871 n/a (libpulse.so.0) #2 0x00007f3258ae1ee1 pa_mainloop_poll (libpulse.so.0) #3 0x00007f3258ae257e pa_mainloop_iterate (libpulse.so.0) #4 0x00007f3258ae2630 pa_mainloop_run (libpulse.so.0) #5 0x00007f3258af0806 n/a (libpulse.so.0) #6 0x00007f324ef391b8 n/a (libpulsecommon-6.99.so) #7 0x00007f324f50f434 start_thread (libpthread.so.0) #8 0x00007f324f24da0d __clone (libc.so.6) It's worth noting that mpv segfaults in i965_dri.so, so it's probably a graphics driver bug. Still, it shouldn't take the compositor down. GNOME 3.18, Mesa-11.0.0, Xorg-Server-1.17.99.901 (XWayland), Linux-4.3-rc2.
Backtrace from gdb:
+ Trace 235489
Created attachment 311992 [details] [review] MetaWaylandSurface: Don't assume a toplevel always have a MetaWindow When committing a toplevel surface we might no longer have a MetaWindow associated with it. The reason may vary but some are: a popup was dismissed, the client attached and committed a NULL buffer to a wl_surface with the wl_shell_surface role, the client committed a buffer to a wl_surface which previously had an toplevel window role which extension object was destroyed.
After applying the patch, neither mpv nor gnome-shell/mutter crash anymore.
Review of attachment 311992 [details] [review]: So, I thought of doing basically the same as this patch in bug 753237 but then I decided not to and just add the specific case for xdg_popups since that's a legitimate race. Is there a legitimate race here too or are we just going to paper over client misuse of the protocols?
(In reply to Rui Matos from comment #4) > Review of attachment 311992 [details] [review] [review]: > > So, I thought of doing basically the same as this patch in bug 753237 but > then I decided not to and just add the specific case for xdg_popups since > that's a legitimate race. > > Is there a legitimate race here too or are we just going to paper over > client misuse of the protocols? The client in question AFAIK uses wl_shell_surface which is a bit under specified. We could paper over wl_shell_surface a bit more if you prefer that, but then again its not really specified whatever happens to commits of an xdg_surface/popup after the extension object is destroyed. Currently we keep the role as is (because we need to remember the role), and we should handle that better, but that's a different issue from this.
(In reply to Jonas Ådahl from comment #5) > its not really specified whatever happens to commits of > an xdg_surface/popup after the extension object is destroyed. What I've always imagined is that commits on surfaces without roles are no-ops. I had that in mind when writing xdg_surface / xdg_popup. Later, I realized that some ways that people used cursor roles actually worked if you committed and then called set_cursor, which was strange to me, but I had to support it because it was never specified, *sigh*.
(In reply to Jasper St. Pierre from comment #6) > (In reply to Jonas Ådahl from comment #5) > > its not really specified whatever happens to commits of > > an xdg_surface/popup after the extension object is destroyed. > > What I've always imagined is that commits on surfaces without roles are > no-ops. I had that in mind when writing xdg_surface / xdg_popup. Later, I > realized that some ways that people used cursor roles actually worked if you > committed and then called set_cursor, which was strange to me, but I had to > support it because it was never specified, *sigh*. So should we go with the patch for now?
Yes.
Attachment 311992 [details] pushed as 8b0b0cf - MetaWaylandSurface: Don't assume a toplevel always have a MetaWindow
*** Bug 756343 has been marked as a duplicate of this bug. ***