After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 783113 - Clicking with mice on wayland client crashes gnome-client
Clicking with mice on wayland client crashes gnome-client
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: wayland
3.22.x
Other Linux
: Normal critical
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2017-05-26 07:47 UTC by Naman Dixit
Modified: 2017-06-01 06:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Minimal source file to reproduce the bug (20.24 KB, text/x-csrc)
2017-05-26 08:52 UTC, Naman Dixit
  Details
wayland/pointer: Use glib signals tracking focus surface (3.79 KB, patch)
2017-05-31 09:30 UTC, Jonas Ådahl
committed Details | Review
wayland/pointer: Track lifetime of current surface (3.61 KB, patch)
2017-05-31 09:30 UTC, Jonas Ådahl
committed Details | Review

Description Naman Dixit 2017-05-26 07:47:36 UTC
(This is the first time I am filing a GNOME bug. If any more details/logs are needed, please tell me.)

I am using Fedora 25 with GNOME 3.22.2 on Dell Vostro 14 3446 laptop.

In a wayland client (window) that has the following function implementing the wl_pointer::button event, 

    void waylandPointerHandleButton (void *data, struct wl_pointer *pointer,
                                     Uint32 serial, Uint32 time, Uint32 button,
                                     Uint32 state)
    {
        if (button == BTN_LEFT && state == WL_POINTER_BUTTON_STATE_PRESSED) {
            game_is_running = false;
        }
    }

where "game_is_running" is a global variable that acts as control for a loop similar to below,

    while (game_is_running) {
        {
            wl_display_dispatch_pending(wayland_display);
        }

        {
            glClearColor(rs, 0.0, 0.0, 1.0);
            glClear(GL_COLOR_BUFFER_BIT);
            glFlush();
        }
        {
            eglSwapBuffers(egl_display, egl_surface);
        }

    }

clicking the left button closes the loop ending the program as expected. However, if I change WL_POINTER_BUTTON_STATE_PRESSED to WL_POINTER_BUTTON_STATE_RELEASED and then click the left button, gnome-shell crashes closing all my open applications and landing me on GDM login page.

If I change BTN_LEFT to BTN_RIGHT keeping WL_POINTER_BUTTON_STATE_PRESSED as it is, then right-clicking with a physical mouse or the physical button on touchpad closes the window cleanly. However, right-clicking with touchpad through two-finger multitouch again crashes gnome-shell. If I also change WL_POINTER_BUTTON_STATE_PRESSED to WL_POINTER_BUTTON_STATE_RELEASED, then right-clicking in all three above way result in crash.

Also, I was able to figure out that my own program actually finishes and returns cleanly (both by attaching GDB to it from virtual terminal and by printing something to a file right before returning). The program also works as intended on Weston 1.12.0 on the same aforementioned machine.

I am using the pointer-constraint and relative-pointer wayland extensions, but they seem to work properly.
Comment 1 Jonas Ådahl 2017-05-26 07:57:51 UTC
Could you get a backtrace of the gnome-shell process? Best would be if you have the debug symbols for both the mutter and gnome-shell packages.
Comment 2 Naman Dixit 2017-05-26 08:21:34 UTC
It seems that the bug in fact is in pointer-constraint extension (pointer-constraints-unstable-v1, specifically). Adding either this:

wayland_confined_pointer = zwp_pointer_constraints_v1_confine_pointer(
                                wayland_pointer_constraint,
                                wayland_surface,
                                wayland_pointer,
                                NULL,
                                ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT);
                                                                              zwp_confined_pointer_v1_add_listener(wayland_confined_pointer,
                                     &wayland_confined_pointer_listener,
                                     NULL);

or this

wayland_locked_pointer = zwp_pointer_constraints_v1_lock_pointer(
                                wayland_pointer_constraint,
                                wayland_surface,
                                wayland_pointer,
                                NULL,
                                ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT);

zwp_locked_pointer_v1_add_listener(wayland_locked_pointer,
                                   &wayland_locked_pointer_listener,
                                   NULL);
creates this issue. Changing ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT to ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_ONESHOT does NOT change anything and still result in a crash. relative-pointer extension doesn't seem to effect the crashes.

(In reply to Jonas Ådahl from comment #1)
> Could you get a backtrace of the gnome-shell process? Best would be if you
> have the debug symbols for both the mutter and gnome-shell packages.

Sorry, the debuginfo packages for gnome-shell and dependencies will require 2 GB download; way higher that my remaining data pack limits (third world internet FTW).
Comment 3 Jonas Ådahl 2017-05-26 08:32:27 UTC
(In reply to ndixit26 from comment #2)
> It seems that the bug in fact is in pointer-constraint extension
> (pointer-constraints-unstable-v1, specifically). Adding either this:
> 
> wayland_confined_pointer = zwp_pointer_constraints_v1_confine_pointer(
>                                 wayland_pointer_constraint,
>                                 wayland_surface,
>                                 wayland_pointer,
>                                 NULL,
>                                
> ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT);
>                                                                             
> zwp_confined_pointer_v1_add_listener(wayland_confined_pointer,
>                                      &wayland_confined_pointer_listener,
>                                      NULL);
> 
> or this
> 
> wayland_locked_pointer = zwp_pointer_constraints_v1_lock_pointer(
>                                 wayland_pointer_constraint,
>                                 wayland_surface,
>                                 wayland_pointer,
>                                 NULL,
>                                
> ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT);
> 
> zwp_locked_pointer_v1_add_listener(wayland_locked_pointer,
>                                    &wayland_locked_pointer_listener,
>                                    NULL);
> creates this issue. Changing ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_PERSISTENT
> to ZWP_POINTER_CONSTRAINTS_V1_LIFETIME_ONESHOT does NOT change anything and
> still result in a crash. relative-pointer extension doesn't seem to effect
> the crashes.

These are good hints. Any chance to get a reproducer? I can't reproduce with the pointer-locking example client I have here.
Comment 4 Naman Dixit 2017-05-26 08:52:08 UTC
Created attachment 352631 [details]
Minimal source file to reproduce the bug

The file is ~600 LOC but most of it is just boilerplate. To compile, run

gcc mystery-crashes-lite.c pointer-constraints-unstable-v1-code.c relative-pointer-unstable-v1-code.c -lwayland-client -lwayland-cursor -lwayland-egl -lEGL -lGLESv2

You will need the client-headers and code file generated by wayland-scanner for pointer-constraint and relative-pointer.

In the compiled program, right-clicking with physical button on touchpad closes the program while two-finger right click crashes everything.
Comment 5 Jonas Ådahl 2017-05-26 09:33:54 UTC
Thanks. I can reproduce the issue by changing PRESSED to RELEASED in the attached client.
Comment 6 Naman Dixit 2017-05-26 10:09:06 UTC
On my end, with PRESSED, only two-finger multitouch right-clicking crashes gnome-shell. With RELEASED, both physical button on touchpad and multitouch right-click results in crash.
Comment 7 Naman Dixit 2017-05-27 09:58:16 UTC
After some more experiments, it seems that interacting with the touchpad in most way (clicking, moving, scrolling) and then quitting the application within a certain (very short) timeframe results in gnome-shell crashing.
Comment 8 Jonas Ådahl 2017-05-31 09:30:41 UTC
Created attachment 352927 [details] [review]
wayland/pointer: Use glib signals tracking focus surface

Use the "destroy" MetaWaylandSurface signal instead of the wl_resource
destroy signal for tracking the lifetime of the surface with pointer
focus.

As unsetting the focus may have side effects due to handlers of the
"focus-surface-changed" signal, connect the signal after the default
handler to make sure other clean up facilities have the chance deal with
the surface destruction before we try to unset the focus.
Comment 9 Jonas Ådahl 2017-05-31 09:30:48 UTC
Created attachment 352928 [details] [review]
wayland/pointer: Track lifetime of current surface

Clear the pointer->current when the surface is destroyed.
Comment 10 Rui Matos 2017-05-31 16:33:50 UTC
Review of attachment 352927 [details] [review]:

looks fine, hopefully we don't have more instances of issues like this
Comment 11 Rui Matos 2017-05-31 16:36:35 UTC
Review of attachment 352928 [details] [review]:

looks good, but does this fix anything right now or is it just a correctness issue you noticed?
Comment 12 Jonas Ådahl 2017-06-01 03:12:03 UTC
(In reply to Rui Matos from comment #11)
> Review of attachment 352928 [details] [review] [review]:
> 
> looks good, but does this fix anything right now or is it just a correctness
> issue you noticed?

The crash was because we set the new focus to what pointer->current was pointing to, as a side effoct of wl_client_destroy() destroying all the clients resources. MetaWaylandPointer listened on the wl_resource destroy signal, which happens before the "destroy" GSignal, but we wouldn't set a new pointer->current until clutter sent a synthesized crossing event triggering a repick.

So in a way, the crash was caused by us not clearing pointer->current on surface destroy before trying to set a new focus, but just adding a "destroy" listener that did that is not enough, as the set_focus() would happen before that.

Changing MetaWaylandPointer to use g_signal_connect_after() and adding the "destroy" handler for pointer->current would make set_focus() happen after pointer->current was cleared, fixing the issue, but being called after other handlers already fixes the issue, as MetaWaylandPointerConstraints has its own MetawaylandSurface::destroy handler destroying the constraint instance, *also* fixing the issue (as no new constraint would be enabled now)

So the issue is double fixed :P
Comment 13 Jonas Ådahl 2017-06-01 06:39:20 UTC
Attachment 352927 [details] pushed as b19e459 - wayland/pointer: Use glib signals tracking focus surface
Attachment 352928 [details] pushed as b4120a7 - wayland/pointer: Track lifetime of current surface
Comment 14 Jonas Ådahl 2017-06-01 06:40:52 UTC
Pushed to gnome-3-24.