GNOME Bugzilla – Bug 727452
Wayland windows with non-trivial alpha get new content drawn over old
Last modified: 2015-11-17 20:36:01 UTC
GtkWindowImplWayland tries to optimize by doing the Gtk drawing directly into the same buffer it gives to Wayland, as long as Wayland is not using that buffer right now. Normally, this would be fine, because we clear that buffer to the window's background pattern with gdk_window_clear_backing_region() before painting it. However, if you're making a fancy composited shell with semitransparent bits, the background pattern might itself have an alpha channel. If it does, at the moment we just paint it over the top of whatever happens to be left in the buffer from the previous frame, which will shine through any areas with non-trivial alpha.
Created attachment 273425 [details] [review] gdk_window_impl_wayland_begin_paint_region: always use an intermediate buffer GtkWindowImplWayland tries to optimize by doing the Gtk drawing directly into the same buffer it gives to Wayland, as long as Wayland is not using that buffer right now. Normally, this would be fine, because we clear that buffer to the window's background pattern with gdk_window_clear_backing_region() before painting it. However, if you're making a fancy composited shell with semitransparent bits, the background pattern might itself have an alpha channel. If it does, at the moment we just paint it over the top of whatever happens to be left in the buffer from the previous frame, which will shine through any areas with non-trivial alpha. As a quick hack around this, return TRUE, which means Gtk will allocate a new, blank (rgba(0,0,0,0)) image surface with gdk_wayland_window_create_similar_image_surface(), do the Gtk-level drawing into that, and copy its contents into impl->cairo_surface when painting has finished. --- I know this is a hack, but it works, and might point someone towards a better-quality solution.
Created attachment 273427 [details] [review] [doesn't work] gdk_window_clear_backing_region: clear to rgba(0,0,0,0) before filling I hoped this would make the previous patch unnecessary, but it doesn't seem to work. I always get confused by alpha channels in Cairo; perhaps someone who understands this stuff could salvage it.
I suppose explicitly memset()ing the entire buffer to rgba(0,0,0,0) in gdk_window_impl_wayland_begin_paint_region() might also do the trick.
Review of attachment 273427 [details] [review]: ::: gdk/gdkwindow.c @@ +3064,3 @@ + cairo_save (cr); + cairo_set_source_rgb (cr, 0, 0, 0); + * the Wayland backend. Does CAIRO_OPERATOR_SOURCE work better?
Actually, all that should be unnecessary - try removing, well, all of it, and just changing the operator to CAIRO_OPERATOR_SOURCE (i.e. memcpy) before the cairo_fill(). No sense blending with undefined content (as the default operator is OVER).
a testcase demonstrating the problem would be great
(In reply to comment #6) > a testcase demonstrating the problem would be great What I've seen is that if you switch between focused and non-focused, the entire window gets repainted, which illustrates the problem described above as the drop shadows gets darker and darker.
(In reply to comment #5) > Actually, all that should be unnecessary - try removing, well, all of it, and > just changing the operator to CAIRO_OPERATOR_SOURCE (i.e. memcpy) before the > cairo_fill(). I feel as though that ought to work, so I tried this patch, but it doesn't seem to work: --- a/gdk/gdkwindow.c +++ b/gdk/gdkwindow.c @@ -3055,6 +3055,12 @@ gdk_window_clear_backing_region (GdkWindow *window, cairo_region_intersect (clip, region); gdk_cairo_region (cr, clip); + /* If the background pattern is not fully opaque, the default OVER operator + * would leave old content visible through the non-opaque regions. In + * practice the X11 backend is unaffected because it uses a new surface + * every time, but the Wayland backend is susceptible to this since + * it re-uses the same Cairo surface as much as possible (#727452). */ + cairo_set_operator (cr, CAIRO_OPERATOR_SOURCE); cairo_fill (cr); cairo_destroy (cr); (In reply to comment #6) > a testcase demonstrating the problem would be great Sorry, I'm doing drive-by bugfixing in some code that is not currently public, and I don't know enough about Wayland to construct a more minimal test case. I believe drawing a GtkProgressBar in gtk_progress_bar_pulse() mode, on a transparent toplevel window, would illustrate the bug? The darkening drop-shadow that Kristian described sounds like it might well be another symptom of the same thing. Kristian, does Attachment #273425 [details] fix that for you?
Created attachment 273458 [details] [review] gdk_window_impl_wayland_begin_paint_region: clear the buffer for each frame GtkWindowImplWayland tries to optimize by doing the Gtk drawing directly into the same buffer it gives to Wayland, as long as Wayland is not using that buffer right now. Normally, this would be fine, because we clear that buffer to the window's background pattern with gdk_window_clear_backing_region() before painting it. However, if you're making a fancy composited shell with semitransparent bits, the background pattern might itself have an alpha channel. If it does, at the moment we just paint it over the top of whatever happens to be left in the buffer from the previous frame, which will shine through any areas with non-trivial alpha. --- This also works, and might be a bit nicer for performance than Attachment #273425 [details].
Hmm, I see the shadow problem with weston, but not with mutter-wayland. I guess the buffer handling is slightly different between the two ?
Our environment has Weston; I'd be somewhat surprised if that makes a difference, but you never know...
In fact, weston --use-pixman does not exhibit the blackening shadow issue either.
That's because the behavior is specific to reusing surfaces. And in mutter and pixman-weston, the compositor doesn't release the surface immediately so GDK marks it as busy and falls back to the way we do it in X by using a similar surface. I believe regular weston releases the buffer pretty much immediately (as its contents are uploaded to a GL texture). So GDK reuses the buffer and because it isn't cleared, the bug happens. I'm not sure memsetting the whole buffer is correcct for the cases where we only redraw parts of the window. Also, if we wonder about performance, CAIRO_OPERATOR_CLEAR'ing its cairo surface should be faster as cairo has optimizations for cleared surfaces. I think the problem is that we have no definition inside GDK if begin_paint clears the cairo surface or returns a cairo surface with undefined contents.
(In reply to comment #13) > That's because the behavior is specific to reusing surfaces. And in mutter and > pixman-weston, the compositor doesn't release the surface immediately so GDK > marks it as busy and falls back to the way we do it in X by using a similar > surface. That makes sense - so Mutter and pixman-weston are basically in the same situation as when I patched out the fast-path altogether (Attachment #273425 [details]). If the slow-path is the only thing that gets tested in practice, perhaps it would be better to use that as a short-term solution. > I'm not sure memsetting the whole buffer is correcct for the cases where we > only redraw parts of the window. Hmm, good point. I did think "it's fine, if anything its semantics are closer to the slow path", but I think that may have been wrong reasoning - the slow path only copies the redrawn region from the similar surface (overwriting the corresponding part of the reusable surface), not the entire similar surface. > I think the problem is that we have no definition inside GDK if begin_paint > clears the cairo surface or returns a cairo surface with undefined contents. The comments, and the implementation in GdkWindow, suggest that it's meant to be OK for it to have undefined contents - so the patch from Comment #8 would be a more correct approach, if it worked (unfortunately it doesn't, and I still can't see why not).
(In reply to comment #14) > The comments, and the implementation in GdkWindow, suggest that it's meant to > be OK for it to have undefined contents - so the patch from Comment #8 would be > a more correct approach, if it worked (unfortunately it doesn't, and I still > can't see why not). > I think this is the approach we should take at least in the stable branch. But why doesn't this work? Your comment above says you don't use it because you feared it'd be slow.
(In reply to comment #15) > (In reply to comment #14) > > The comments, and the implementation in GdkWindow, suggest that it's meant to > > be OK for it to have undefined contents - so the patch from Comment #8 would be > > a more correct approach, if it worked (unfortunately it doesn't, and I still > > can't see why not). > > > I think this is the approach we should take at least in the stable branch. > > But why doesn't this work? Your comment above says you don't use it because you > feared it'd be slow. Sorry, I might have been unclear. I've put four patches on this bug: Attachment #273425 [details] ("#if 0" the fast-path to make it work like the X11 backend) works, but I worried that it would hurt performance by adding a copy into the most common case. On the other hand, it sounds as though in practice there's always a copy on Mutter-Wayland anyway. I think this is the only one so far that both works, and is theoretically correct. Attachment #273427 [details] doesn't actually seem to prevent the symptom for me, and I haven't been able to work out why not. The pseudo-patch in the text of Comment #8 is a more refined version of Attachment #273427 [details] in response to Daniel's comments, but also doesn't seem to work. Again, I haven't been able to work out why it doesn't do what I think it should. Attachment #273458 [details] prevented the symptom when I tried it, but you pointed out that it was wrong for the cases where we only redraw parts of the window, and I agree.
The mysterious shell where we had this problem is <http://blog.barisione.org/2014-04/maynard/>. We went, as a temporary measure, for Simon's first patch.
Review of attachment 273458 [details] [review]: Marking this patch as rejected. Benjamin Otte pointed out that it is incorrect, and I agree with his analysis.
Comment on attachment 273425 [details] [review] gdk_window_impl_wayland_begin_paint_region: always use an intermediate buffer Un-obsoleting this patch. It's the one we ended up using for Maynard, and it works.
I am not seeing any problems with the alpha window example in testgtk with todays gtk, either with mutter or with weston.