GNOME Bugzilla – Bug 773274
[Wayland] Crash under gdk_wayland_window_attach_image()
Last modified: 2018-02-20 09:42:32 UTC
When I try to forward a message as an attachment in Evolution 3.22.1 on Wayland it crashes. Other methods of forwarding (inline, quoted) work fine. It happens with any message (both plain text and HTML). Backtrace:
+ Trace 236755
Thanks for a bug report. I tried to reproduce it here, but no luck. The backtrace shows the evolution only partly, then main() at the bottom is it, and then between the composer_load_signature_cb() and e_util_invoke_g_dbus_proxy_call_sync_wrapper_full(). The later iterates on the main context (either default or the thread default), while waiting for a response from the D-Bus call. Could that be a reason?
do you see this message: g_critical ("The window %p already has a drawing context. You cannot "• "call gdk_window_begin_draw_frame() without calling "• "gdk_window_end_draw_frame() first.", window);• in the log ?
*** Bug 773411 has been marked as a duplicate of this bug. ***
No I don't see this message. All that is printed in the console is Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))
Interestingly, when running evolution with --gtk-debug=all, the crash (otherwise reproducible 100% of the time) doesn't happen.
*** Bug 773624 has been marked as a duplicate of this bug. ***
I managed to reproduce the crash even with the quoted method of forwarding when there was an attached file present in the original message. The quoted method doesn't crash if you forward a message without an attachment. I get the same error: Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface)) No other error. Backtrace:
+ Trace 236797
Created attachment 338945 [details] [review] [PATCH] wayland: check valid pending cairo surface drop_cairo_surfaces() will nullify the drop_cairo_surfaces() (to avoid reusing the buffer later), but gdk_window_impl_wayland_end_paint() doesn't check if the drop_cairo_surfaces is still valid prior to calling gdk_wayland_window_attach_image(). As gdk_wayland_window_attach_image() assumes there is a valid pending cairo surface, make sure the staging cairo surface is stil lvalid prior to attaching the image, just like it's done in gdk_wayland_window_show() as well.
Can you try with this patch?
Review of attachment 338945 [details] [review]: Works for me
Jirko, here's a test build for Fedora 25 with the patch included: http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829
but under what circumstance is drop_cairo_surfaces getting called before end_paint ?
(In reply to Ray Strode [halfline] from comment #12) > but under what circumstance is drop_cairo_surfaces getting called before > end_paint ? When the client is resized, from gdk_wayland_window_update_size(), which explains the reproducer from bug 773307 (which I used to investigate this one, so I reckon both are the same issue) Adding traces in gdk_window_impl_wayland_end_paint(), drop_cairo_surfaces(), gdk_wayland_window_update_size(), and gdk_wayland_window_destroy() (the last tow being the only caller of drop_cairo_surfaces()) shows: gdk_wayland_window_update_size() window 0x1c473440 surface (nil) drop_cairo_surfaces() window 0x1c473440 surface (nil) gdk_wayland_window_update_size() window 0x1c473440 surface (nil) drop_cairo_surfaces() window 0x1c473440 surface (nil) gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590 gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590 gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c61e590 drop_cairo_surfaces() window 0x1c473440 surface 0x1c61e590 gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c75e430 drop_cairo_surfaces() window 0x1c473440 surface 0x1c75e430 gdk_wayland_window_update_size() window 0x1c473440 surface (nil) drop_cairo_surfaces() window 0x1c473440 surface (nil) gdk_wayland_window_update_size() window 0x1c473440 surface (nil) drop_cairo_surfaces() window 0x1c473440 surface (nil) gdk_window_impl_wayland_end_paint() window 0x1c473440 surface (nil) Segmentation fault (core dumped)
i'm still confused, can you put a trace in begin_paint too ? somehow update_size is getting called after begin_paint but before end_paint ? Do we even rev the main loop in between those calls ever?
(In reply to Milan Crha from comment #11) > Jirko, here's a test build for Fedora 25 with the patch included: > http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829 It seems to have fixed the issue, at least for me. I can't reproduce it any more.
(In reply to Ray Strode [halfline] from comment #14) > i'm still confused, can you put a trace in begin_paint too ? somehow > update_size is getting called after begin_paint but before end_paint ? Do we > even rev the main loop in between those calls ever? Sure thing, there is an update_size() in between begin_paint() and end_paint() : gdk_wayland_window_update_size() window 0x1c473be0 surface (nil) drop_cairo_surfaces() window 0x1c473be0 surface (nil) gdk_wayland_window_update_size() window 0x1c473be0 surface (nil) drop_cairo_surfaces() window 0x1c473be0 surface (nil) gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil) gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0 gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface 0x1c61f1b0 gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0 gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c61f1b0 drop_cairo_surfaces() window 0x1c473be0 surface 0x1c61f1b0 gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil) gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c736260 drop_cairo_surfaces() window 0x1c473be0 surface 0x1c736260 gdk_wayland_window_update_size() window 0x1c473be0 surface (nil) drop_cairo_surfaces() window 0x1c473be0 surface (nil) gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface (nil) Segmentation fault (core dumped)
Created attachment 338965 [details] Backtrace leading to update_size() being called between begin_paint() and end_paint() FWIW, it's the call to gdk_flush() that invokes a wl_display_roundtrip_queue() which in turn may call xdg_surface_configure() which will resize the window.
ohhhhh okay, where's the source to the example you're running? I wonder why its doing gdk_flush. If it continues to draw after the gdk_flush it's going to write to a cairo surface that won't ultimately get committed but I guess that's okay. As an aside, I think maybe we should have: g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy); in drop_cairo_surfaces since the staged updates just got dropped.
Review of attachment 338945 [details] [review]: ::: gdk/wayland/gdkwindow-wayland.c @@ +906,3 @@ + if (impl->staging_cairo_surface && + _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) && i don't think the is_shm_surface check is necessary ?
(In reply to Ray Strode [halfline] from comment #18) > ohhhhh okay, where's the source to the example you're running? I wonder why > its doing gdk_flush. The reproducer is the one attached in bug 773307. > If it continues to draw after the gdk_flush it's going to write to a cairo > surface that won't ultimately get committed but I guess that's okay. > > As an aside, I think maybe we should have: > > g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy); > > in drop_cairo_surfaces since the staged updates just got dropped. Possibly but I'm not sure it's necessary for this particular bug, is it? (but I have no problem amending the patch if needed. or even add a separate patch for that) (In reply to Ray Strode [halfline] from comment #19) > Review of attachment 338945 [details] [review] [review]: > > ::: gdk/wayland/gdkwindow-wayland.c > @@ +906,3 @@ > > + if (impl->staging_cairo_surface && > + _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) && > > i don't think the is_shm_surface check is necessary ? In theory, I agree, but it's how it's done elsewhere in gdk_wayland_window_ensure_cairo_surface(): https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n824 Besides, there is an assert() in gdk_wayland_window_attach_image() which would fail precisely if not a shm_surface(): https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n733
Hi, > The reproducer is the one attached in bug 773307. Oh right, stransky explains the problem there nicely, too. > Possibly but I'm not sure it's necessary for this particular bug, is it? > (but I have no problem amending the patch if needed. or even add a separate > patch for that) Nope, that was just a side point i noticed when i was sniffing around the code with the mindset of "end_paint might get bypassed now after your patch" > In theory, I agree, but it's how it's done elsewhere in > gdk_wayland_window_ensure_cairo_surface(): okay fair enough.
Review of attachment 338945 [details] [review]: You write "doesn't check if the drop_cairo_surfaces" but I think you mean the "staging_cairo_surface". But also, I wish the commit message went into the specific problem a little bit. Maybe something like "gdk_wayland_window_attach_image is normally called from gdk_window_end_paint to notify the compositor of newly staged drawing. If any of the drawing code inadvertently dispatches the wayland event loop (for instance with a gdk_flush call), then it's possible that by the time gdk_window_end_paint is called, the staged drawing is already destroyed. This commit bypasses the attach_image call in scenarios where the staged drawing is prematurely dropped"
Comment on attachment 338945 [details] [review] [PATCH] wayland: check valid pending cairo surface attachment 338945 [details] [review] pushed to git master with changes from comment 22 as commit 2324b96 - wayland: check valid pending cairo surface attachment 338945 [details] [review] pushed to branch gtk-3-22 with changes from comment 22 as commit 209e01f - wayland: check valid pending cairo surface
*** Bug 773307 has been marked as a duplicate of this bug. ***
*** Bug 772843 has been marked as a duplicate of this bug. ***
Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is it supposed to be fixed here? We're going to deploy Firefox 59 with native Wayland backend enabled, would be great to have this one fixed here.
There's a backtrace:
+ Trace 238372
It comes from expose event at gtkmain.c: 1832 case GDK_EXPOSE: 1833 if (event->any.window) 1834 gtk_widget_render (event_widget, event->any.window, event->expose.region); 1835 break;
(In reply to Martin Stransky from comment #26) > Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is > it supposed to be fixed here? Well, there could be more than one cause to the same crash (comment 10 and comment 15 both confirmed the fix) The last backtrace differs from the other ones, trace 236755 and trace 236797 show and assertion failed (g_assert (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))) whereas in trace 238372, the wl_surface is NULL and it's a segfault.
Thanks Olivier. I'm preparing Firefox 59 packages with Wayland backend for Fedora so you may be able to test those by yourself.
Created attachment 367641 [details] [review] [PATCH] wayland: bail out if wl_surface is invalid I guess that patch would avoid the problem from comment 27, can you try it?
Sure, will test that, thanks.
If that helps (and even if it doesn't), I reckon best would be to continue in a separate (new) bug, unlikely people will monitor an old (closed) bug... Also because I think it's a different issue/backtrace.
(In reply to Olivier Fourdan from comment #32) > If that helps (and even if it doesn't), I reckon best would be to continue > in a separate (new) bug, unlikely people will monitor an old (closed) bug... > Also because I think it's a different issue/backtrace. It seems to be fixed - filed a new Bug 793062. Please backport to F27 at least to enable Wayland browser run there.
Review of attachment 367641 [details] [review]: Seems it can still crash for the same reason in gdk_window_impl_wayland_end_paint()
Created attachment 368610 [details] [review] [PATCH] wayland: Don't paint if the window is unmapped That patch is much better imho as it doesn't interfere with the gdk processing (so we don't get gdk warnings), it avoids the crash by checking in the gdkwayland backend instead where we would crash (because on Wayland, if the window is unmapped, the Wayland surface is gone)
Comment on attachment 368610 [details] [review] [PATCH] wayland: Don't paint if the window is unmapped Blimey, wrong bug!