After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 759299 - wayland: possible infinite loop in gdkwindow-wayland
wayland: possible infinite loop in gdkwindow-wayland
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: Backend: Wayland
3.19.x
Other Linux
: Normal normal
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2015-12-10 13:08 UTC by Olivier Fourdan
Modified: 2015-12-16 13:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Reproducer program (1.07 KB, text/plain)
2015-12-10 13:08 UTC, Olivier Fourdan
  Details
wayland: Check transient depth (2.87 KB, patch)
2015-12-14 15:46 UTC, Olivier Fourdan
reviewed Details | Review
wayland: Check transient loop (2.18 KB, patch)
2015-12-15 14:19 UTC, Olivier Fourdan
none Details | Review
A slightly more elaborate reproducer (1.01 KB, text/plain)
2015-12-15 14:30 UTC, Olivier Fourdan
  Details
Updated patch (1.93 KB, patch)
2015-12-16 08:01 UTC, Olivier Fourdan
committed Details | Review

Description Olivier Fourdan 2015-12-10 13:08:29 UTC
Created attachment 317111 [details]
Reproducer program

Summary:

The Wayland backend will enter an infinite loop in case of a cycle in transient relationship (a transient for B and B transient for A).

Steps to reproduce:

1. Save and build the attached program
  $ gcc -o cycling-transient cycling-transient.c `pkg-config --libs --cflags gtk+-3.0`

2. Execute the program under Wayland

Actual result:

The program wil lenter an infinite loop and take 100% CPU

Expected result:

The windows appear normally as in X11

Additional data:

This is because of the following code construct in gdkwindow-wayland.c

	  while (window)
	    {
	      GdkWindowImplWayland *impl;
	      impl = GDK_WINDOW_IMPL_WAYLAND (window->impl);
	      window = impl->transient_for;
	    }

As found in find_grab_input_seat(), get_popup_parent() or gdk_wayland_window_get_fake_root_coords()
Comment 1 Matthias Clasen 2015-12-11 12:50:00 UTC
WE should be robust against this sort of application error. We could maintain a 'transient depth' for each window, for example.
Comment 2 Jonas Ådahl 2015-12-11 13:26:11 UTC
(In reply to Matthias Clasen from comment #1)
> WE should be robust against this sort of application error. We could
> maintain a 'transient depth' for each window, for example.

Or we check for loops on xdg_surface.set_parent and just disconnect a client that tries to set a loop.
Comment 3 Olivier Fourdan 2015-12-14 15:46:46 UTC
Created attachment 317380 [details] [review]
wayland: Check transient depth

Gdk Wayland backend traverses up the window tree looking for transiency parents, but does not check for cycles when doing so.
    
As a result, if two or more windows are transient to each other, the Wayland gdk backend will enter an infinite loop.
    
While this is clearly a bug in the application, gtk+/gdk should be more robust and handle such errors more gracefully.
    
To avoid looping infinitely, add a transient depth and check that the parent window in on a lower transient depth and exit the loop otherwise.
Comment 4 Matthias Clasen 2015-12-14 15:58:29 UTC
Review of attachment 317380 [details] [review]:

::: gdk/wayland/gdkwindow-wayland.c
@@ +1976,3 @@
+    }
+  else
+    impl->transient_depth = 0;

Sadly, I don't think this works - don't you have to update the transient_depth field for all transients of this window here ? I think we probably do have to walk up the tree, after all
Comment 5 Olivier Fourdan 2015-12-15 09:29:38 UTC
Right.

Actually I think it's even worse than that, because by walking up the transient tree, we may run into the cycle loop.

And checking for transient depth here won't help, because the window could be made transient for a parent of lower depth or hight depth.

E.g.


[ 0 ] -> [ 1 ] -> [ 2 ]
          /        /
    /----/   /----/
   /        /
[ 0 ] -> [ 1 ]

In this case, this is fine 0 < 1

But:

[ 0 ] -> [ 1 ] -> [ 2 ]
   \                 \
    \------------\    \------------\
                  \                 \
[ 0 ] -> [ 1 ]...[ 2 ] -> [ 3 ] -> [ 4 ] 

In this case, the next depth in the tree is actually lesser (or equal) than the current one, and yet this does not mean there is a loop, so testing the transient depth is meaningless here precisely because we are adjusting its value.

Beside, if we can detect a loop in the set_transient() function and refuse to apply the transient relationship in such a case, we can guarantee that no loop is created and we don't need the transient depth at all.

Means we do all the job in set_transient() (possibly using a more costly algorithm) but we don't have to do it again every time we walk up the transient list (and this is done very often, for example gdk_wayland_window_get_fake_root_coords() is invoked at each pointer move from pointer_handle_motion()).
Comment 6 Matthias Clasen 2015-12-15 13:11:00 UTC
(In reply to Olivier Fourdan from comment #5)
> Right.
> 
> Actually I think it's even worse than that, because by walking up the
> transient tree, we may run into the cycle loop.

Well, walking up the tree, you check for either hitting NULL (good) or yourself (bad).
Comment 7 Olivier Fourdan 2015-12-15 14:03:41 UTC
(In reply to Matthias Clasen from comment #6)
> Well, walking up the tree, you check for either hitting NULL (good) or
> yourself (bad).

Yes, exactly, we don't need a transient depth, we just need to be careful when setting the transient relationship at first.

Unless I am missing something obvious, a trivial patch is to follow.
Comment 8 Olivier Fourdan 2015-12-15 14:19:04 UTC
Created attachment 317426 [details] [review]
wayland: Check transient loop

Note, this patch tries to avoid the problem by setting the transient relationship before checking for a possible loop. If such a loop is found, the previous transient value is restored.

Yet, because unmap_subsurface() is called first and also uses transient_for, we need to proceed till the end of the function gdk_wayland_window_set_transient_for() as if we modified the transient, even if the value is not changed so that the subsurface has a chance to be recreated if needed.
Comment 9 Olivier Fourdan 2015-12-15 14:30:49 UTC
Created attachment 317427 [details]
A slightly more elaborate reproducer

This reproducer will hang mutter/gnome-shell with an unpatched gtk+, i.e. the infinite loop will occur in mutter this time.

So do not run this reproducer if your your main session is GNOME on Wayland.

Yet it means a broken client can easily cause a denial of service in mutter, means we also need to fix mutter...
Comment 10 Matthias Clasen 2015-12-15 15:37:02 UTC
Review of attachment 317426 [details] [review]:

::: gdk/wayland/gdkwindow-wayland.c
@@ +1961,1 @@
   impl->transient_for = parent;

I don't see why you even need to set transient_for first. Just start walking up from the parent - if you hit the window, setting transient_for would create a cycle. No ?
Comment 11 Olivier Fourdan 2015-12-15 15:48:47 UTC
I reckon you need to set the transient_for first to walk up the tree and detect all possibilities.

Consider this case, two valid and distinct transient trees (where "->" means "is transient for"):

[ A ] -> [ B ] -> [ C ]

And 

[ D ] -> [ E ] -> [ A ]

Now the program tries to set [ C ] -> [ D ] (ie C becomes transient for D)

If you start walking up the tree starting from C, you won't hit D in the first tree, neither you will hit C walking up the second tree starting from D, and yet, if  C becomes transient for D, you'll have a loop:

[ A ] -> [ B ] -> [ C ] -> [ D ] -> [ E ] -> [ A ]
Comment 12 Matthias Clasen 2015-12-15 18:36:49 UTC
(In reply to Olivier Fourdan from comment #11)
> I reckon you need to set the transient_for first to walk up the tree and
> detect all possibilities.
> 
> Consider this case, two valid and distinct transient trees (where "->" means
> "is transient for"):
> 
> [ A ] -> [ B ] -> [ C ]
> 
> And 
> 
> [ D ] -> [ E ] -> [ A ]
> 

How is this two trees, and not just d -> e -> a -> b -> c ?
Comment 13 Olivier Fourdan 2015-12-16 07:35:10 UTC
Yeah, you're right of course, it's obvious... Let's blame that on a temporary (or so I hope ^_~) mental fog on my side :-)
Comment 14 Olivier Fourdan 2015-12-16 08:01:52 UTC
Created attachment 317476 [details] [review]
Updated patch
Comment 15 Matthias Clasen 2015-12-16 11:37:36 UTC
you could even call it transient mental fog
Comment 16 Matthias Clasen 2015-12-16 11:38:09 UTC
Review of attachment 317476 [details] [review]:

looks good now
Comment 17 Olivier Fourdan 2015-12-16 13:13:30 UTC
Review of attachment 317476 [details] [review]:

attachment 317476 [details] [review] pushed as commit b456db8 wayland: Check transient loop
Comment 18 Olivier Fourdan 2015-12-16 13:14:01 UTC
Closing.