After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 755490 - Segmentation fault on mpv segfault
Segmentation fault on mpv segfault
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: wayland
3.18.x
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
: 756343 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-09-23 18:15 UTC by Armin K.
Modified: 2015-10-12 19:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
MetaWaylandSurface: Don't assume a toplevel always have a MetaWindow (1.92 KB, patch)
2015-09-24 00:28 UTC, Jonas Ådahl
committed Details | Review

Description Armin K. 2015-09-23 18:15:47 UTC
While trying to play video in mpv, which utilizes the GLX OpenGL backend through XWayland, it would crash and take gnome-shell wayland compositor along with it.

Stack trace captured by systemd-coredump:

      Coredump: /var/lib/systemd/coredump/core.gnome-shell.1000.e109f832e546428f95e5d794761d670c.600.1443031788000000.xz
       Message: Process 600 (gnome-shell) of user 1000 dumped core.
                
                Stack trace of thread 600:
                #0  0x00007f324f1984e8 __GI_raise (libc.so.6)
                #1  0x00007f324f19993a __GI_abort (libc.so.6)
                #2  0x00007f3250d60495 g_assertion_message (libglib-2.0.so.0)
                #3  0x00007f3250d6052a g_assertion_message_expr (libglib-2.0.so.0)
                #4  0x00007f32548cff78 toplevel_surface_commit (libmutter.so.0)
                #5  0x00007f32548cd524 meta_wayland_surface_role_commit (libmutter.so.0)
                #6  0x00007f324d67d080 ffi_call_unix64 (libffi.so.6)
                #7  0x00007f324d67caeb ffi_call (libffi.so.6)
                #8  0x00007f3250497b58 wl_closure_invoke (libwayland-server.so.0)
                #9  0x00007f3250494196 wl_client_connection_data (libwayland-server.so.0)
                #10 0x00007f3250495f22 wl_event_loop_dispatch (libwayland-server.so.0)
                #11 0x00007f32548c3e57 wayland_event_source_dispatch (libmutter.so.0)
                #12 0x00007f3250d3aa77 g_main_dispatch (libglib-2.0.so.0)
                #13 0x00007f3250d3acd0 g_main_context_iterate (libglib-2.0.so.0)
                #14 0x00007f3250d3aff2 g_main_loop_run (libglib-2.0.so.0)
                #15 0x00007f32548975b6 meta_run (libmutter.so.0)
                #16 0x000000000040242f main (/usr/bin/gnome-shell (deleted))
                #17 0x00007f324f1855e0 __libc_start_main (libc.so.6)
                #18 0x0000000000402539 _start (/usr/bin/gnome-shell (deleted))
                
                Stack trace of thread 603:
                #0  0x00007f324f24505d poll (libc.so.6)
                #1  0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f3250d3aff2 g_main_loop_run (libglib-2.0.so.0)
                #3  0x00007f325218d276 gdbus_shared_thread_func (libgio-2.0.so.0)
                #4  0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f324f50f434 start_thread (libpthread.so.0)
                #6  0x00007f324f24da0d __clone (libc.so.6)
                
                Stack trace of thread 640:
                #0  0x00007f324f514fbf pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f324ce4f2a0 PR_WaitCondVar (libnspr4.so)
                #2  0x00007f3257b86786 n/a (libmozjs-24.so)
                #3  0x00007f324ce55da4 n/a (libnspr4.so)
                #4  0x00007f324f50f434 start_thread (libpthread.so.0)
                #5  0x00007f324f24da0d __clone (libc.so.6)
                
                Stack trace of thread 605:
                #0  0x00007f324f24505d poll (libc.so.6)
                #1  0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f3250d3ad7c g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007f324082423d n/a (libdconfsettings.so)
                #4  0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f324f50f434 start_thread (libpthread.so.0)
                #6  0x00007f324f24da0d __clone (libc.so.6)
                
                Stack trace of thread 639:
                #0  0x00007f324f514fbf pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x00007f324ce4f2a0 PR_WaitCondVar (libnspr4.so)
                #2  0x00007f3257b1107e n/a (libmozjs-24.so)
                #3  0x00007f324ce55da4 n/a (libnspr4.so)
                #4  0x00007f324f50f434 start_thread (libpthread.so.0)
                #5  0x00007f324f24da0d __clone (libc.so.6)
                
                Stack trace of thread 602:
                #0  0x00007f324f24505d poll (libc.so.6)
                #1  0x00007f3250d3ac6c g_main_context_poll (libglib-2.0.so.0)
                #2  0x00007f3250d3ad7c g_main_context_iteration (libglib-2.0.so.0)
                #3  0x00007f3250d3adb9 glib_worker_main (libglib-2.0.so.0)
                #4  0x00007f3250d61285 g_thread_proxy (libglib-2.0.so.0)
                #5  0x00007f324f50f434 start_thread (libpthread.so.0)
                #6  0x00007f324f24da0d __clone (libc.so.6)
                
                Stack trace of thread 638:
                #0  0x00007f324f24505d poll (libc.so.6)
                #1  0x00007f3258af0871 n/a (libpulse.so.0)
                #2  0x00007f3258ae1ee1 pa_mainloop_poll (libpulse.so.0)
                #3  0x00007f3258ae257e pa_mainloop_iterate (libpulse.so.0)
                #4  0x00007f3258ae2630 pa_mainloop_run (libpulse.so.0)
                #5  0x00007f3258af0806 n/a (libpulse.so.0)
                #6  0x00007f324ef391b8 n/a (libpulsecommon-6.99.so)
                #7  0x00007f324f50f434 start_thread (libpthread.so.0)
                #8  0x00007f324f24da0d __clone (libc.so.6)



It's worth noting that mpv segfaults in i965_dri.so, so it's probably a graphics driver bug. Still, it shouldn't take the compositor down.

GNOME 3.18, Mesa-11.0.0, Xorg-Server-1.17.99.901 (XWayland), Linux-4.3-rc2.
Comment 1 Armin K. 2015-09-23 18:22:59 UTC
Backtrace from gdb:

  • #0 __GI_raise
    at ../sysdeps/unix/sysv/linux/raise.c line 55
  • #1 __GI_abort
    at abort.c line 89
  • #2 g_assertion_message
  • #3 g_assertion_message_expr
    at gtestutils.c line 2444
  • #4 toplevel_surface_commit
    at wayland/meta-wayland-surface.c line 380
  • #5 meta_wayland_surface_role_commit
    at wayland/meta-wayland-surface.c line 2442
  • #6 apply_pending_state
    at wayland/meta-wayland-surface.c line 650
  • #7 ffi_call_unix64
    from /usr/lib/libffi.so.6
  • #8 ffi_call
    from /usr/lib/libffi.so.6
  • #9 wl_closure_invoke
    at src/connection.c line 945
  • #10 wl_client_connection_data
    at src/wayland-server.c line 339
  • #11 wl_event_loop_dispatch
    at src/event-loop.c line 422
  • #12 wayland_event_source_dispatch
    at wayland/meta-wayland.c line 85
  • #13 g_main_dispatch
    at gmain.c line 3154
  • #14 g_main_context_dispatch
    at gmain.c line 3769
  • #15 g_main_context_iterate
    at gmain.c line 3840
  • #16 g_main_loop_run
    at gmain.c line 4034
  • #17 meta_run
    at core/main.c line 437
  • #18 main
    at main.c line 463

Comment 2 Jonas Ådahl 2015-09-24 00:28:43 UTC
Created attachment 311992 [details] [review]
MetaWaylandSurface: Don't assume a toplevel always have a MetaWindow

When committing a toplevel surface we might no longer have a MetaWindow
associated with it. The reason may vary but some are: a popup was
dismissed, the client attached and committed a NULL buffer to a
wl_surface with the wl_shell_surface role, the client committed a
buffer to a wl_surface which previously had an toplevel window role
which extension object was destroyed.
Comment 3 Armin K. 2015-09-24 09:36:34 UTC
After applying the patch, neither mpv nor gnome-shell/mutter crash anymore.
Comment 4 Rui Matos 2015-09-24 13:43:56 UTC
Review of attachment 311992 [details] [review]:

So, I thought of doing basically the same as this patch in bug 753237 but then I decided not to and just add the specific case for xdg_popups since that's a legitimate race.

Is there a legitimate race here too or are we just going to paper over client misuse of the protocols?
Comment 5 Jonas Ådahl 2015-09-24 14:23:57 UTC
(In reply to Rui Matos from comment #4)
> Review of attachment 311992 [details] [review] [review]:
> 
> So, I thought of doing basically the same as this patch in bug 753237 but
> then I decided not to and just add the specific case for xdg_popups since
> that's a legitimate race.
> 
> Is there a legitimate race here too or are we just going to paper over
> client misuse of the protocols?

The client in question AFAIK uses wl_shell_surface which is a bit under specified. We could paper over wl_shell_surface a bit more if you prefer that, but then again its not really specified whatever happens to commits of an xdg_surface/popup after the extension object is destroyed. Currently we keep the role as is (because we need to remember the role), and we should handle that better, but that's a different issue from this.
Comment 6 Jasper St. Pierre (not reading bugmail) 2015-09-24 16:08:03 UTC
(In reply to Jonas Ådahl from comment #5)
> its not really specified whatever happens to commits of
> an xdg_surface/popup after the extension object is destroyed.

What I've always imagined is that commits on surfaces without roles are no-ops. I had that in mind when writing xdg_surface / xdg_popup. Later, I realized that some ways that people used cursor roles actually worked if you committed and then called set_cursor, which was strange to me, but I had to support it because it was never specified, *sigh*.
Comment 7 Jonas Ådahl 2015-09-29 00:07:01 UTC
(In reply to Jasper St. Pierre from comment #6)
> (In reply to Jonas Ådahl from comment #5)
> > its not really specified whatever happens to commits of
> > an xdg_surface/popup after the extension object is destroyed.
> 
> What I've always imagined is that commits on surfaces without roles are
> no-ops. I had that in mind when writing xdg_surface / xdg_popup. Later, I
> realized that some ways that people used cursor roles actually worked if you
> committed and then called set_cursor, which was strange to me, but I had to
> support it because it was never specified, *sigh*.

So should we go with the patch for now?
Comment 8 Jasper St. Pierre (not reading bugmail) 2015-09-29 00:45:53 UTC
Yes.
Comment 9 Jonas Ådahl 2015-09-29 01:10:32 UTC
Attachment 311992 [details] pushed as 8b0b0cf - MetaWaylandSurface: Don't assume a toplevel always have a MetaWindow
Comment 10 Jonas Ådahl 2015-10-10 14:04:52 UTC
*** Bug 756343 has been marked as a duplicate of this bug. ***