Bug 761613 – crash with xwayland glamor

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 761613 - crash with xwayland glamor


Summary:	crash with xwayland glamor


Status:	RESOLVED FIXED

Product:	mutter
Classification:	Core
Component:	wayland
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	mutter-maint
QA Contact:	mutter-maint

URL:
Whiteboard:

Duplicates:	761899 (view as bug list)
Depends on:
Blocks:

Reported:	2016-02-05 22:56 UTC by Ray Strode [halfline]
Modified:	2016-02-19 14:01 UTC

See Also:
GNOME target:	3.20
GNOME version:	---

Attachments
wayland: survive client attaching a buffer twice w/o intermediate release (5.20 KB, patch) 2016-02-05 22:57 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: use glib function for fetching timestamp (4.68 KB, patch) 2016-02-05 22:57 UTC, Ray Strode [halfline]	committed	Details \| Review
wayland: consolidate frame callback code (19.52 KB, patch) 2016-02-05 22:57 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: send error to client if it tries to use unavailable buffer (10.21 KB, patch) 2016-02-05 22:57 UTC, Ray Strode [halfline]	rejected	Details \| Review
wayland: examine pending->buffer when prepping toplevel commit (4.58 KB, patch) 2016-02-08 22:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: only release buffer early for toplevel surfaces (5.01 KB, patch) 2016-02-08 22:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: get rid of buffer->copied_data boolean (7.53 KB, patch) 2016-02-08 22:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: rework buffer handling one more time (12.33 KB, patch) 2016-02-08 22:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: handle client attaching already attached buffer better (3.10 KB, patch) 2016-02-08 22:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: return from toplevel commit early if no new buffer (3.01 KB, patch) 2016-02-16 21:03 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: examine pending->buffer when prepping toplevel commit (4.61 KB, patch) 2016-02-16 21:03 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: move toplevel surface geometry update to helper function (7.71 KB, patch) 2016-02-16 21:03 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: only release buffer early for toplevel surfaces (4.63 KB, patch) 2016-02-16 21:03 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: get rid of buffer->copied_data boolean (7.26 KB, patch) 2016-02-16 21:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: rework buffer handling one more time (12.01 KB, patch) 2016-02-16 21:04 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: don't depending on surface->buffer to know if a window is showable (8.96 KB, patch) 2016-02-17 14:57 UTC, Ray Strode [halfline]	none	Details \| Review
wayland: get rid of buffer->copied_data boolean (8.24 KB, patch) 2016-02-18 19:27 UTC, Ray Strode [halfline]	committed	Details \| Review
wayland: change accessible boolean to use_count counter (15.08 KB, patch) 2016-02-18 19:30 UTC, Ray Strode [halfline]	committed	Details \| Review

Description Ray Strode [halfline] 2016-02-05 22:56:01 UTC

So I haven't seen this crash yet (not sure why), but rtcm pointed out on irc that the meta_fatal in meta_wayland_buffer_take_control in commit 0165cb697466ba0843b993416e00d4f768c00d45 was triggering a crash for him because of this code in Xwayland:

http://cgit.freedesktop.org/xorg/xserver/tree/hw/xwayland/xwayland.c#n367

I'll attach a patches that I think should address the issue, but I'm not seeing the problem yet so I can't be sure the patch fixes it.

I'll also attach a work-in-progess patchset to try to notice when a client is doing things to the buffer when it shouldn't be. not sure if this should go in or not. definitely not without me carefully exercising the new code paths to make sure they work, which I haven't yet.

Comment 1 Ray Strode [halfline] 2016-02-05 22:57:13 UTC

Created attachment 320516 [details] [review]
wayland: survive client attaching a buffer twice w/o intermediate release

Right now we abort if a client attaches and commits the same buffer to
a surface twice, without waiting for a release in between. That's wrong,
the compositor should be more robust to client misbehavior than that.

Xwayland, for instance, reattaches the already attached buffer every
time it posts damage so we need to accomodate that.

Comment 2 Ray Strode [halfline] 2016-02-05 22:57:17 UTC

Created attachment 320517 [details] [review]
wayland: use glib function for fetching timestamp

The code currently implements a function, get_time, that
fetches a timestamp.  That duplicates code already in glib,
and the glib implementation is better, anyway, since it doesn't
skew backward when the system clock is changed.

This commit changes the code to use g_get_monotonic_time and
drop the get_time function.

Comment 3 Ray Strode [halfline] 2016-02-05 22:57:21 UTC

Created attachment 320518 [details] [review]
wayland: consolidate frame callback code

At the moment frame callbacks are instantiated in
the surface code and finished in the compositor code.

This commit consolidates both halves of functionality
into the surface code, by generalizing MetaWaylandFrameCallback
to complete a specified closure instead of handling wl_callback
resources specifically.

Comment 4 Ray Strode [halfline] 2016-02-05 22:57:26 UTC

Created attachment 320519 [details] [review]
wayland: send error to client if it tries to use unavailable buffer

If a client tries to use a buffer before it's available for use then
we should send back an error so it knows it's doing something
with undefined behavior.

This commit tracks when damage is posted to a buffer waits until the
next frame callback before allowing damage to be posted to the buffer
again.

The exceptions, of course, are shm buffers which we cede control
of immediately after copying the pixels out of, so they can be reused
right away.

Comment 5 Ray Strode [halfline] 2016-02-05 23:00:03 UTC

Rui, if you could give attachment 320516 [details] [review] a try, i'd appreciate it!

what's your reproducer ? just running an x client (say glxgears ?) ?

Comment 6 Jonas Ådahl 2016-02-06 02:21:55 UTC

Review of attachment 320518 [details] [review]:

::: src/wayland/meta-wayland-buffer.h
@@ +39,3 @@
 
+  MetaWaylandFrameCallback frame_callback;
+

It feels generally wrong to associate a wl_buffer with a frame callback. The same buffer may be attached to any number of surfaces, and there is nothing incorrect about that AFAIU. When a frame callback should be invoked or dropped should be a detail of the surface role, as its the role that determines how a surface is drawn.

Comment 7 Jonas Ådahl 2016-02-06 02:28:47 UTC

Review of attachment 320519 [details] [review]:

A buffer is not ever "unavailable", it's just "in use by the compositor". I don't think we should kill clients just because they post damage after having drawn to a live buffer. We should just let them potentially get glitches, since that is what they are asking for.

This seems like it could also potentially kill Xwayland if it looses the release race, since it (at least did) commit the same buffer over and over again.

Comment 8 Jonas Ådahl 2016-02-06 02:36:19 UTC

Review of attachment 320516 [details] [review]:

::: src/wayland/meta-wayland-surface.c
@@ +597,3 @@
+   should_attach_buffer = FALSE;
+
+  if (should_attach_buffer)

Couldn't this just be

  if (pending->newly_attached && pending->buffer != surface->buffer)
    {
       ...

? It'd cover the case when the client attaches the same buffer twice by ignoring the non-update. We'd also ignore attaching a second NULL buffer.

Why does surface->buffer->accessible make any difference here?

Comment 9 Jonas Ådahl 2016-02-06 02:37:43 UTC

Review of attachment 320517 [details] [review]:

Looks good to me.

Comment 10 Jonas Ådahl 2016-02-06 02:40:38 UTC

Shouldn't it be enough to just don't try to re-take control of a attached buffer if that buffer is already the active one? Adding anything related to frame callbacks to MetaWaylandBuffer seems wrong to me anyhow, since its on the wrong level.

The meta_fatal() could IMHO as well be a g_assert() or g_return_if_fail() to detect bugs in mutter, but it shouldn't be triggerable by a client.

Comment 11 Ray Strode [halfline] 2016-02-06 04:33:55 UTC

(In reply to Jonas Ådahl from comment #6)
> It feels generally wrong to associate a wl_buffer with a frame callback. The
> same buffer may be attached to any number of surfaces, and there is nothing
> incorrect about that AFAIU. When a frame callback should be invoked or
> dropped should be a detail of the surface role, as its the role that
> determines how a surface is drawn.

makes sense.

(In reply to Jonas Ådahl from comment #7)
> Review of attachment 320519 [details] [review] [review]:
> 
> A buffer is not ever "unavailable", it's just "in use by the compositor". I
> don't think we should kill clients just because they post damage after
> having drawn to a live buffer. We should just let them potentially get
> glitches, since that is what they are asking for.
Well, one of the original premises of wayland was "every frame is perfect", so I don't entirely agree that we should be okay with clients doing things that can lead to graphical glitches, but I also see the point that killing the client is a strong reaction. Given your opinion here and the problems you've pointed out in comment 6 I think we can just drop this patch.  I wasn't 100% sure the patch should go in anyway.  In the future, I guess, if we want to provide better accountability to clients we can do it in the role specific code and, perhaps do it in a less fatal way. 
 
> This seems like it could also potentially kill Xwayland if it looses the
> release race, since it (at least did) commit the same buffer over and over
> again.
I don't think so.  Xwayland doesn't reuse the buffer until the frame completes, and the patch explicitly allows the buffer to be reused after the frame completes (which isn't strictly correct, but should keep xwayland working).  Anyway, that's moot, let's just drop this for now!

Comment 12 Ray Strode [halfline] 2016-02-06 04:46:32 UTC

(In reply to Jonas Ådahl from comment #8)
> Why does surface->buffer->accessible make any difference here?

the compositor releases control of shm buffers as soon as the pixel data is uploaded to the gpu, and so those buffers aren't accessible for posting future damage requests without first reattaching the buffer.  If we don't have the accessible check there, then a shm buffer that reattaches won't actually reattach.  Put another way: if the buffer isn't accessible to the compositor because the compositor ceded control of it, then the compositor needs to retake control of the buffer before it can use it again.

Comment 13 Ray Strode [halfline] 2016-02-06 04:59:58 UTC

(In reply to Jonas Ådahl from comment #10)
> Shouldn't it be enough to just don't try to re-take control of a attached
> buffer if that buffer is already the active one?
Not following you here.  If it's an egl buffer we don't release control of the buffer until a new one is attached to the surface, so under what scenario are you thinking we should re-take control of the buffer?  Are you talking about shm buffers?

> Adding anything related to
> frame callbacks to MetaWaylandBuffer seems wrong to me anyhow, since its on
> the wrong level.
Yup I think we're on the same page here.

> The meta_fatal() could IMHO as well be a g_assert() or g_return_if_fail() to
> detect bugs in mutter, but it shouldn't be triggerable by a client.
i guess g_return_if_fail is a little bit kinder to the user if don't have the state machine quite right.

Comment 14 Jonas Ådahl 2016-02-06 13:55:17 UTC

(In reply to Ray Strode [halfline] from comment #11)
> (In reply to Jonas Ådahl from comment #7)
> > Review of attachment 320519 [details] [review] [review] [review]:
> > 
> > A buffer is not ever "unavailable", it's just "in use by the compositor". I
> > don't think we should kill clients just because they post damage after
> > having drawn to a live buffer. We should just let them potentially get
> > glitches, since that is what they are asking for.
> Well, one of the original premises of wayland was "every frame is perfect",
> so I don't entirely agree that we should be okay with clients doing things
> that can lead to graphical glitches, but I also see the point that killing
> the client is a strong reaction. Given your opinion here and the problems
> you've pointed out in comment 6 I think we can just drop this patch.  I
> wasn't 100% sure the patch should go in anyway.  In the future, I guess, if
> we want to provide better accountability to clients we can do it in the role
> specific code and, perhaps do it in a less fatal way. 

Wayland enables the possibility of "every frame is perfect", but the compositor has no way to enforce it. A client can draw funky buffers as much as it likes and there is nothing we can do about it.

(In reply to Ray Strode [halfline] from comment #12)
> (In reply to Jonas Ådahl from comment #8)
> > Why does surface->buffer->accessible make any difference here?
> 
> the compositor releases control of shm buffers as soon as the pixel data is
> uploaded to the gpu, and so those buffers aren't accessible for posting
> future damage requests without first reattaching the buffer.  If we don't
> have the accessible check there, then a shm buffer that reattaches won't
> actually reattach.  Put another way: if the buffer isn't accessible to the
> compositor because the compositor ceded control of it, then the compositor
> needs to retake control of the buffer before it can use it again.

Ah I see. Yea, that makes sense. Also, I wonder if we actually should make the "accessible" bool a ref-count style int, since there might be several surfaces that all are actively using the same buffer.

Also, might I suggest that we rename "accessible" to "is_in_use" or something like that? A buffer is always accessible until destroyed, but what we are doing is tracking when the compositor is *using* the buffer.

Comment 15 Ray Strode [halfline] 2016-02-06 15:19:25 UTC

(In reply to Jonas Ådahl from comment #14)
> (In reply to Ray Strode [halfline] from comment #11)
> > (In reply to Jonas Ådahl from comment #7)
> > > Review of attachment 320519 [details] [review] [review] [review] [review]:
> > > 
> > > A buffer is not ever "unavailable", it's just "in use by the compositor". I
> > > don't think we should kill clients just because they post damage after
> > > having drawn to a live buffer. We should just let them potentially get
> > > glitches, since that is what they are asking for.
> > Well, one of the original premises of wayland was "every frame is perfect",
> > so I don't entirely agree that we should be okay with clients doing things
> > that can lead to graphical glitches, but I also see the point that killing
> > the client is a strong reaction. Given your opinion here and the problems
> > you've pointed out in comment 6 I think we can just drop this patch.  I
> > wasn't 100% sure the patch should go in anyway.  In the future, I guess, if
> > we want to provide better accountability to clients we can do it in the role
> > specific code and, perhaps do it in a less fatal way. 
> 
> Wayland enables the possibility of "every frame is perfect", but the
> compositor has no way to enforce it. A client can draw funky buffers as much
> as it likes and there is nothing we can do about it.
true. I do think, at some point, we could help clients achieve "every frame is perfect" by somehow alerting the user a developer about a misbehaving client when we can detect it. maybe only if an environment variable is set, not sure.  Anyway, we can do that down the road.

> Ah I see. Yea, that makes sense. Also, I wonder if we actually should make
> the "accessible" bool a ref-count style int, since there might be several
> surfaces that all are actively using the same buffer.
> 
> Also, might I suggest that we rename "accessible" to "is_in_use" or
> something like that? A buffer is always accessible until destroyed, but what
> we are doing is tracking when the compositor is *using* the buffer.
Yea, I think a ref count is necessary indeed.  will look into it.

Comment 16 Ray Strode [halfline] 2016-02-08 22:03:54 UTC

Comment on attachment 320517 [details] [review]
wayland: use glib function for fetching timestamp

Attachment 320517 [details] pushed as 50099c4 - wayland: use glib function for fetching timestamp

Comment 17 Ray Strode [halfline] 2016-02-08 22:04:32 UTC

Created attachment 320662 [details] [review]
wayland: examine pending->buffer when prepping toplevel commit

toplevel_surface_commit looks at surface->buffer to decide how to
proceed.  It should instead look at pending->buffer, since pending
represents the new changes to be committed.

This commit changes the function to look at pending->buffer instead.
At this point, the change may seem a bit academic, since they both
hold the same value, but it's actually important prep work for a future
change dealing with buffer management.

Comment 18 Ray Strode [halfline] 2016-02-08 22:04:36 UTC

Created attachment 320663 [details] [review]
wayland: only release buffer early for toplevel surfaces

It's important we don't prematurely release a surface used by the
pointer, since the pointer references the surface on every
mouse update.

This commit moves the buffer release to toplevel_surface_commit

Comment 19 Ray Strode [halfline] 2016-02-08 22:04:40 UTC

Created attachment 320664 [details] [review]
wayland: get rid of buffer->copied_data boolean

We currently track whether or not a buffer can be released early
by looking at the copied_data boolean on the buffer.  This boolean
is, practically speaking, always set to TRUE for shm buffers and is
always false otherwise.

We can just as easily check if the buffer is a shm buffer to decide
whether or not to do an early release.  That's better from a
theoretical point of view since copied_data assumes a 1-to-1
relationship between surface and buffer, which may not actually hold.

This commit drops copied_data and changes the check to instead see
if the buffer is shm.

Comment 20 Ray Strode [halfline] 2016-02-08 22:04:44 UTC

Created attachment 320665 [details] [review]
wayland: rework buffer handling one more time

commit 0165cb6974 changed mutter to release shm buffers back to the
compositor as early as possible.  In order to track whether or not
a buffer can be released, the commit introduced an "accessible" flag
on the buffer structure (to mean "still being used, don't release
yet")

The problem, though, is that a buffer can theoretically be used by more
than one surface at a time.  If we have multiple surfaces delaying the
release of the buffer, then we need to know when every surface is done
with the buffer before doing the release.  A single boolean is
inappropriate for tracking whether or not multiple surfaces are delaying
release of the buffer, a usage counter is needed instead.

Furthermore, the surface objects don't need to access a
MetaWaylandBuffer object that's been released early—the texture created
from the buffer is referenced separately.  Since we don't need the
buffer after release, we can use the buffer references on the surface
objects directly as our usage counter to know when to release.

This commit drops meta_wayland_buffer_{take,release}_control and the
associated "accessible" struct member, and instead relies on the ref
count of the MetaWaylandBuffer itself to know when to do the release.

Comment 21 Ray Strode [halfline] 2016-02-08 22:04:49 UTC

Created attachment 320666 [details] [review]
wayland: handle client attaching already attached buffer better

If a client attaches the same buffer that it previously attached,
we end up doing a number of function calls that ultimately won't
do anything.

This commit adds a check up front to avoid the function calls
entirely.  The check makes the code clearer.

Comment 22 Olivier Fourdan 2016-02-16 13:55:57 UTC

bug 761899 is likely a dupe of this one, commit 7adbb58 seems to be the cause of the fatal error and there are a couple of reproducer in bug 761899 (namely either xvidtune or vlc)

Comment 23 Ray Strode [halfline] 2016-02-16 14:30:55 UTC

*** Bug 761899 has been marked as a duplicate of this bug. ***

Comment 24 Olivier Fourdan 2016-02-16 15:08:20 UTC

attachment 320665 [details] [review] and attachment 320666 [details] [review] do not apply cleanly anymore on head of git master apparently (git am -3 doesn't help are the blobs are missing) but the rebase is trivial.

Once applied, it solves the issue with vlc and xvidtune but now mutter crashes with gnome-terminal (reverting to the origin/master heads doesn't have that issue).

I tried twice to be sure.

Comment 25 Olivier Fourdan 2016-02-16 15:10:54 UTC

backtrace is:

+ Trace 235977

#0 cogl_texture_get_width
at cogl-texture.c line 210
#1 toplevel_surface_commit
at wayland/meta-wayland-surface.c line 396
#2 meta_wayland_surface_role_commit
at wayland/meta-wayland-surface.c line 2600
#3 apply_pending_state
at wayland/meta-wayland-surface.c line 700
#4 ffi_call_unix64
from /lib64/libffi.so.6
#5 ffi_call
from /lib64/libffi.so.6
#6 wl_closure_invoke
at src/connection.c line 949
#7 wl_client_connection_data
at src/wayland-server.c line 337
#8 wl_event_loop_dispatch
at src/event-loop.c line 421
#9 wayland_event_source_dispatch
at wayland/meta-wayland.c line 77
#10 g_main_dispatch
at gmain.c line 3154
#11 g_main_context_dispatch
at gmain.c line 3769
#12 g_main_context_iterate
at gmain.c line 3840
#13 g_main_loop_run
at gmain.c line 4034
#14 meta_run
at core/main.c line 521
#15 main
at main.c line 471

Comment 26 Ray Strode [halfline] 2016-02-16 21:03:25 UTC

Created attachment 321442 [details] [review]
wayland: return from toplevel commit early if no new buffer

meta_wayland_surface_toplevel_commit has a lot of logic to handle
a new buffer getting attached as part of the commit.  None of
that code needs to run if there is no new buffer attached.

This commit short-circuits that case.

Comment 27 Ray Strode [halfline] 2016-02-16 21:03:36 UTC

Created attachment 321443 [details] [review]
wayland: examine pending->buffer when prepping toplevel commit

toplevel_surface_commit looks at surface->buffer to decide how to
proceed.  It should instead look at pending->buffer, since pending
represents the new changes from the client to be committed.

At this point, the change may seem a bit academic, since surface->buffer
reflects changes made to the surface by the client, but in the future
we're going to nullify surface->buffer when the compositor is finished
with it, so this commit is important prep work for when the invariant
will no longer hold.

Comment 28 Ray Strode [halfline] 2016-02-16 21:03:46 UTC

Created attachment 321444 [details] [review]
wayland: move toplevel surface geometry update to helper function

toplevel_surface_commit has some fairly complicated code to deal
with syncing the window geometry with the buffer surface geometry.

For clarity, this commit moves that code to a helper function.

Comment 29 Ray Strode [halfline] 2016-02-16 21:03:57 UTC

Created attachment 321445 [details] [review]
wayland: only release buffer early for toplevel surfaces

It's important we don't prematurely release a surface used by the
pointer, since the pointer references the surface on every
mouse update.

This commit moves the buffer release to toplevel_surface_commit

Comment 30 Ray Strode [halfline] 2016-02-16 21:04:04 UTC

Created attachment 321446 [details] [review]
wayland: get rid of buffer->copied_data boolean

We currently track whether or not a buffer can be released early
by looking at the copied_data boolean on the buffer.  This boolean
is, practically speaking, always set to TRUE for shm buffers and is
always false otherwise.

We can just as easily check if the buffer is a shm buffer to decide
whether or not to do an early release.  That's better from a
theoretical point of view since copied_data assumes a 1-to-1
relationship between surface and buffer, which may not actually hold.

This commit drops copied_data and changes the check to instead see
if the buffer is shm.

Comment 31 Ray Strode [halfline] 2016-02-16 21:04:13 UTC

Created attachment 321447 [details] [review]
wayland: rework buffer handling one more time

commit 0165cb6974 changed mutter to release shm buffers back to the
compositor as early as possible.  In order to track whether or not
a buffer can be released, the commit introduced an "accessible" flag
on the buffer structure (to mean "still being used, don't release
yet")

The problem, though, is that a buffer can theoretically be used by more
than one surface at a time.  If we have multiple surfaces delaying the
release of the buffer, then we need to know when every surface is done
with the buffer before doing the release.  A single boolean is
inappropriate for tracking whether or not multiple surfaces are delaying
release of the buffer, a usage counter is needed instead.

Furthermore, the surface objects don't need to access a
MetaWaylandBuffer object that's been released early—the texture created
from the buffer is referenced separately.  Since we don't need the
buffer after release, we can use the buffer references on the surface
objects directly as our usage counter to know when to release.

This commit drops meta_wayland_buffer_{take,release}_control and the
associated "accessible" struct member, and instead relies on the ref
count of the MetaWaylandBuffer itself to know when to do the release.

Comment 32 Olivier Fourdan 2016-02-17 07:55:17 UTC

Still not good I think, now it doesn't crash anymore with gnome-terminal, vlc nor xvidtune but starting gnome-terminal, nothing happens, the window never gets mapped.

It's kinda "funny", the gnome-terminal shows in the shell's window picker, you can even click on it, but it never gets mapped on screen.

Comment 33 Ray Strode [halfline] 2016-02-17 14:55:59 UTC

oh sorry, i guess i screwed up my git-bz incantation and dropped the first patch

Comment 34 Ray Strode [halfline] 2016-02-17 14:57:04 UTC

Created attachment 321498 [details] [review]
wayland: don't depending on surface->buffer to know if a window is showable

The core window code defers showing a window until there's a buffer on
the surface associated with the window.

A surface may release its buffer early, though, so checking for the
existance of a buffer isn't a good check.

This commit changes the check to use a boolean instead.

Comment 35 Ray Strode [halfline] 2016-02-17 14:57:36 UTC

^ this one is supposed to be first on the branch

Comment 36 Olivier Fourdan 2016-02-17 15:06:49 UTC

Works for me! thanks!

Comment 37 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:46:51 UTC

Review of attachment 321498 [details] [review]:

This doesn't handle the case that if someone explicitly attaches NULL with wl_surface.attach on a wl_shell_surface, it should become unshowable.

The issue seems to be that we're using surface->buffer for "the surface that should be shown" and some object refcounting thing. I'm not quite sure I get it -- want to chat on IRC?

Comment 38 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:48:58 UTC

Review of attachment 321442 [details] [review]:

Looks OK.

Comment 39 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:49:32 UTC

Review of attachment 321443 [details] [review]:

I don't like changing the meaning of surface->buffer, as discussed earlier.

Comment 40 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:51:20 UTC

Review of attachment 321444 [details] [review]:

I don't see the point, I'm +0 on it.

Comment 41 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:53:36 UTC

Review of attachment 321445 [details] [review]:

I'm very confused why we need this. More explanation is definitely needed.

Comment 42 Jasper St. Pierre (not reading bugmail) 2016-02-17 16:59:52 UTC

Review of attachment 321447 [details] [review]:

So we're going back to the old refcount system I had -- heh.

I don't like the approach, though. I think the MetaWaylandBuffer and the CoglTexture should be associated together and have the same lifetime. Let's chat and see what's happening here. I'm definitely confused why "if SHM, copy as soon as possible and send a buffer release in ensure_cogl_texture" apparently doesn't work. For all buffers, not just toplevels.

Comment 43 Ray Strode [halfline] 2016-02-17 18:28:47 UTC

(In reply to Jasper St. Pierre from comment #37)
> Review of attachment 321498 [details] [review] [review]:
> 
> This doesn't handle the case that if someone explicitly attaches NULL with
> wl_surface.attach on a wl_shell_surface, it should become unshowable.
It doesn't actually need to handle that case, since we meta_window_unmanage the window, and dissociate the surface from the window.

We could set is_showable to FALSE in that case, but I don't think it matters in practice. Still I don't mind adding a line to that effect.

> The issue seems to be that we're using surface->buffer for "the surface that
> should be shown" and some object refcounting thing. I'm not quite sure I get
> it
So the problem is that, in theory, a buffer can be associated with more than one surface at once.  That means the is_accessible boolean on the buffer isn't right if we care about multiple surfaces using the same buffer.  Just because one surface copied the pixels from the buffer and is done with it doesn't mean they all have.   So now we need to track when all surfaces are using the buffer and only send a release when all surfaces are finished. 

You could accomplish this with a use_count variable that gets incremented by every surface that attaches the buffer, and then only release the buffer when the use_count drops to 0 (see comment 14/comment 15), but there's a complication there: we have to be careful to not bump the use_count if a surface attaches the same buffer that it's already attached.  This is because clients sometimes attach the same buffer when they don't need to, and we need to treat that as a no-op.  But, of course, the attachment shouldn't be a no-op if the buffer was previously released and is getting reused. And a surface wouldn't be able to tell if the buffer is getting recyled, with just a use_count variable.  So we would have to introduce more state on each surface object to detect that case.

It's all doable, but a question popped into my head: "Do the surfaces actually need to keep the MetaWaylandBuffer around when it has 0 use_count ?" I think the answer is no. Once we've released the buffer, the surfaces shouldn't need to access the object anymore.  So having two counts, use_count and ref_count is complexity we can trim out.

If we're willing to no longer assume surface->buffer != NULL means the same thing as "surface has valid texture", and instead treat it to mean "surface has a buffer that isn't released yet, since the surface depends on the buffer", then we can go back to the original spirit of surface->buffer from before bug 761312 and simplify how it all works.  That's what the patchset does, but of course, it has to weed out:

1) users of the code that assume surface->buffer != NULL means "surface has valid texture".  attachment 321498 [details] [review] and attachment 321443 [details] [review] do that

2) the assumptions in the code that assume only one surface will use a buffer at once. attachment 321446 [details] [review] and attachment 321447 [details] [review] does that

Comment 44 Ray Strode [halfline] 2016-02-17 18:34:42 UTC

(In reply to Jasper St. Pierre from comment #40)
> Review of attachment 321444 [details] [review] [review]:
> 
> I don't see the point, I'm +0 on it.

Right, it's a lateral organizational change in the code. It's only important because the patch after it depends on running some code at the bottom of the toplevel_commit function, and the code getting moved has early returns.  If we want to keep the code in the same function we need to change the early returns to "goto outs" or drop the returns and make them skip over the parts of the code they were trying to avoid by returning early.  Moving it to a subroutine seemed like the cleanest approach to me (especially since that chunk of moved code is pretty self contained)

Comment 45 Ray Strode [halfline] 2016-02-17 18:52:03 UTC

(In reply to Jasper St. Pierre from comment #41)
> Review of attachment 321445 [details] [review] [review]:
> 
> I'm very confused why we need this. More explanation is definitely needed.

So it's confusing because the commit message is wrong. backstory is I noticed there may be a problem with cursor code, but put a pin in doing the actual investigation until later then forgot to follow up.

The problem is wl_pointer::set_cursor needs to access the buffer associated with the cursor surface, which won't be available if the buffer is already released.

Comment 46 Ray Strode [halfline] 2016-02-17 20:02:06 UTC

So Jasper and I and #wayland had a discussion about buffer semantics, and we ultimately settled on one problem with destroying the MetaWaylandBuffer at release time.  It means we have to recreate the texture and do a full buffer copy if there's attach/damage after the buffer has been released. It would be good to be able to optimize that case, such that only the damaged region is copied.

Given that, I think it makes sense to rework the patchset again, to use the more complicated use_count / ref_count split.

Comment 47 Ray Strode [halfline] 2016-02-17 21:46:33 UTC

Comment on attachment 321442 [details] [review]
wayland: return from toplevel commit early if no new buffer

Attachment 321442 [details] pushed as acd5050 - wayland: return from toplevel commit early if no new buffer

Comment 48 Jonas Ådahl 2016-02-18 01:00:58 UTC

(In reply to Ray Strode [halfline] from comment #46)
> So Jasper and I and #wayland had a discussion about buffer semantics, and we
> ultimately settled on one problem with destroying the MetaWaylandBuffer at
> release time.  It means we have to recreate the texture and do a full buffer
> copy if there's attach/damage after the buffer has been released. It would
> be good to be able to optimize that case, such that only the damaged region
> is copied.
> 
> Given that, I think it makes sense to rework the patchset again, to use the
> more complicated use_count / ref_count split.

Sorry for not looking at the patches earlier. Two separate ref counters is what I assumed was needed, yes, and that the MetaWaylandBuffer would indeed stay referenced by the surface attaches another buffer.

Comment 49 Ray Strode [halfline] 2016-02-18 14:42:20 UTC

right, I was just hoping we could simplify the code and bring it back to the original spirit of the way Jasper wrote it originally, but it didn't pan out.  Will do an update soon.

Comment 50 Ray Strode [halfline] 2016-02-18 19:27:57 UTC

Created attachment 321611 [details] [review]
wayland: get rid of buffer->copied_data boolean

We currently track whether or not a buffer can be released early
by looking at the copied_data boolean on the buffer.  This boolean
is, practically speaking, always set to TRUE for shm buffers and is
always false otherwise.

We can just as easily check if the buffer is a shm buffer to decide
whether or not to do an early release.  That's better from a
theoretical point of view since copied_data assumes a 1-to-1
relationship between surface and buffer, which may not actually hold.

This commit drops copied_data and changes the check to instead see
if the buffer is shm.

Comment 51 Ray Strode [halfline] 2016-02-18 19:30:25 UTC

Created attachment 321612 [details] [review]
wayland: change accessible boolean to use_count counter

Since a buffer can be used by multiple surfaces at once,
we need to release the buffer only after all surfaces
are finished with it.  Currently we track whether or
not to release the buffer based on the accessible boolean.
This commit changes it to a counter to accomodate multiple
users.

Also, each surface needs to know whether not it is done with
the buffer, so this commit adds a buffer_used boolean to the
surface state.

--

So we still have a problem with surfaces getting used
for pointers potentially getting released early.  It's kind
of a side issue, though, and a minor one at that, so I think
we can address it in a different bug.

Comment 52 Jonas Ådahl 2016-02-19 02:42:33 UTC

Review of attachment 321611 [details] [review]:

Looks good to me.

Comment 53 Jasper St. Pierre (not reading bugmail) 2016-02-19 02:58:31 UTC

Review of attachment 321612 [details] [review]:

::: src/wayland/meta-wayland-surface.c
@@ +178,3 @@
+surface_use_buffer (MetaWaylandSurface *surface)
+{
+  if (surface->using_buffer)

When would these ever hit? I feel like any case this would happen would also be done by "surface->buffer == buffer", no?

Comment 54 Jonas Ådahl 2016-02-19 02:59:11 UTC

Review of attachment 321612 [details] [review]:

Looks good to me.

Comment 55 Jonas Ådahl 2016-02-19 03:04:16 UTC

Review of attachment 321612 [details] [review]:

::: src/wayland/meta-wayland-surface.c
@@ +178,3 @@
+surface_use_buffer (MetaWaylandSurface *surface)
+{
+  if (surface->using_buffer)

The surface->buffer == buffer check is in surface_set_buffer(), while this is called from apply_pending_state().

I assume it's needed when

wl_surface.attach(eglbufferA);
wl_surface.damage();
wl_surface.commit();
(note: no wl_buffer.release() on eglbufferA
wl_surface.attach(eglbufferA);
wl_surface.damage();
wl_surface.commit();

While this means the client attached an in-use buffer with damage which is silly, we are already using it, and shouldn't add increase the use count.

Comment 56 Ray Strode [halfline] 2016-02-19 13:48:15 UTC

Review of attachment 321612 [details] [review]:

::: src/wayland/meta-wayland-surface.c
@@ +178,3 @@
+surface_use_buffer (MetaWaylandSurface *surface)
+{
+  if (surface->using_buffer)

there's a bug here though.  I shouldn't decrement the use_count by setting release_new_buffer = TRUE if I don't increment the use_count in surface_use_buffer.

Comment 57 Ray Strode [halfline] 2016-02-19 14:00:57 UTC

pushed with that change:

Attachment 321611 [details] pushed as e097bc8 - wayland: get rid of buffer->copied_data boolean
Attachment 321612 [details] pushed as bed8242 - wayland: change accessible boolean to use_count counter