GNOME Bugzilla – Bug 765062
Wayland: Inspector GtkLabel drag'n'drop freezes gnome-shell session
Last modified: 2016-04-26 00:24:29 UTC
I can't really test this much, since it freezes the session and I have to restart my PC :( I think the first time it happened I saw some console output, but I can't remember what it was anymore. The easy way to reproduce: open the inspector, select text in a selectable GtkLabel (like the ones in the General tab), then try to drag that text. This happens with 3.20.3 as well as with master.
I can't reproduce this in plain weston, so reassigning to mutter.
I didn't manage to reproduce this in bare mutter. There, trying to reproduce what you describe kills the client because it sends bogus requests (it tries to pass an object with the wrong type). When trying on gnome-shell, it does indeed freeze the session, and the reason it freezes is that it gets into a never-ending loop in block_source() in gmain.c, where for some reason tmp_list == tmp_list->next == tmp_list->data. Anyhow, the reason things break seems to be that when the inspector is open, there are two connections open to the Wayland compositor, and when selecting text and dragging, gtk creates a buffer on one connection, then tries to attach it to a surface on the other connection. We first need to fix this on the gtk side by not mixing objects from different connections, but we should also look into why gnome-shell ends up messing up the sources. I will first continue investigating where mutter fails to handle invalid input.
After further digging, it seems that the messed-up source that has entered an endless loop (due to poll_fds == poll_fds->next == poll_fds->data) comes from gjs; more precisely, it is the automatic garbage collection idle source (with the callback trigger_gc_if_needed()).
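To make the hang concrete: as far as I can tell from gmain.c, block_source() walks the source's poll_fds list by following next pointers until it reaches NULL, so a node whose next pointer refers back to itself keeps that walk spinning forever. A minimal sketch, assuming a simplified GSList-like node (struct node and list_has_cycle() are hypothetical, not GLib API), of how such a self-loop can be detected in bounded time with Floyd's tortoise-and-hare cycle check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GSList-style node (data + next link);
 * this is not GLib's actual GSList type. */
struct node {
    void *data;
    struct node *next;
};

/* A plain walk such as `for (l = head; l; l = l->next)` never terminates
 * once some node's next pointer refers back to that node. Floyd's
 * tortoise-and-hare check detects the cycle in bounded time instead. */
static bool list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;       /* advances one node per step */
        fast = fast->next->next; /* advances two nodes per step */
        if (slow == fast)
            return true;         /* the pointers met: the list loops */
    }
    return false;                /* reached the NULL terminator */
}
```

On a healthy two-node list the check returns false; after setting the second node's next and data to point at itself, mirroring the corrupted state described above, it returns true.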
It was a memory corruption bug related to drag-and-drop. Attaching a patch.
Created attachment 326672 (wayland: Handle wl_data_device being destroyed while focused) A wl_data_device object may be destroyed while it is focused, either because the client destroyed it or because the client itself was destroyed. Handle this by returning early from the focus handler vfuncs when it has been destroyed, so that we don't corrupt memory and/or cause a segmentation fault.
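A minimal sketch of the early-out pattern the patch description outlines, using hypothetical types: struct data_device_focus, data_device_focus_out() and the other names below are illustrative only, not mutter's actual code or any libwayland API, and I haven't checked the exact shape of the pushed commit. The destroy path forgets the stored pointer, and the focus-out handler returns immediately when the device is already gone instead of touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a client-side object; not a real wl_resource. */
struct client_resource {
    bool alive;
};

/* Hypothetical per-seat focus state; not mutter's real struct. */
struct data_device_focus {
    struct client_resource *resource; /* NULL once the client object is gone */
    int handled_focus_outs;           /* counts focus-outs that did real work */
};

/* Called from a (hypothetical) destroy listener when the client destroys
 * its wl_data_device or disconnects: drop the pointer instead of keeping
 * a dangling reference around. */
static void data_device_focus_on_resource_destroyed(struct data_device_focus *focus)
{
    focus->resource = NULL;
}

/* Focus-out handler with the early out: if the device was already
 * destroyed there is nothing left to notify or unlink, and touching it
 * would read or write memory that no longer belongs to us. */
static void data_device_focus_out(struct data_device_focus *focus)
{
    if (focus->resource == NULL)
        return;

    assert(focus->resource->alive);
    /* ... send leave events, unlink listeners, etc. ... */
    focus->handled_focus_outs++;
    focus->resource = NULL;
}
```

With a live resource the handler does its work once; after the destroy notification it becomes a no-op, which is the property the patch relies on.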
Comment on attachment 326672 (wayland: Handle wl_data_device being destroyed while focused) Indeed, makes sense. Funky that I wasn't able to reproduce it (I only got the protocol error); perhaps I was moving the cursor outside the client altogether.
(In reply to Carlos Garnacho from comment #6) > Funky that I wasn't able to reproduce (I only got the protocol error), perhaps I was moving the cursor outside the client altogether. I only got the protocol error when running mutter nested. Maybe we were "lucky" and it corrupted some irrelevant memory. The reason for the freeze was this: because the client was destroyed due to the error, the focus was lost, and the wl_list_remove() in the focus_out handler, operating on memory that no longer belonged to it, happened to turn the poll_fds list of the gjs GC idle source into an endless loop. I guess that when running nested mutter, it just happened to poke around in some memory without causing any visible errors.
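A simplified model of that mechanism, assuming a wl_list-style intrusive doubly linked list: demo_list and demo_list_remove() are hypothetical stand-ins modeled on the unlinking steps of libwayland's wl_list_remove(). This is not a reconstruction of the real memory layout; to keep the example well-defined, the stale element's prev and next are pointed at a live node standing in for memory that now belongs to some other object. Unlinking it writes through those pointers and leaves the victim pointing at itself, the same shape as poll_fds == poll_fds->next.

```c
#include <assert.h>
#include <stddef.h>

/* Intrusive list link shaped like libwayland's struct wl_list
 * (hypothetical copy, for illustration only). */
struct demo_list {
    struct demo_list *prev;
    struct demo_list *next;
};

/* An empty list head points at itself in both directions. */
static void demo_list_init(struct demo_list *list)
{
    list->prev = list;
    list->next = list;
}

/* Unlinks elm by writing through its prev/next pointers, trusting them
 * unconditionally; it cannot tell a valid link from a stale one. */
static void demo_list_remove(struct demo_list *elm)
{
    elm->prev->next = elm->next;
    elm->next->prev = elm->prev;
    elm->next = NULL;
    elm->prev = NULL;
}

/* Simulates removing a stale link whose neighbour pointers now refer to
 * an unrelated live node (`victim`). The removal "succeeds" but rewrites
 * the victim's pointers so that it refers to itself. */
static void corrupt_via_stale_remove(struct demo_list *stale, struct demo_list *victim)
{
    stale->prev = victim;
    stale->next = victim;
    demo_list_remove(stale);
}
```

For a wl_list head, pointing at itself just means "empty", which is fine; for a NULL-terminated GSList walked by block_source(), the same self-reference is an endless loop. That is why the guard has to live in the caller, as in the patch, rather than in the list code.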
Attachment 326672 pushed as 08ac192 - wayland: Handle wl_data_device being destroyed while focused
Opened bug 765565 for the gtk+ side.