Bug 773274 – [Wayland] Crash under gdk_wayland_window_attach_image()

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 773274 - [Wayland] Crash under gdk_wayland_window_attach_image()


Summary:	[Wayland] Crash under gdk_wayland_window_attach_image()


Status:	RESOLVED FIXED

Product:	gtk+
Classification:	Platform
Component:	Backend: Wayland
Version:	3.22.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	gtk-bugs
QA Contact:	gtk-bugs

URL:
Whiteboard:

Duplicates:	772843 773307 773411 773624 (view as bug list)
Depends on:
Blocks:	WaylandRelated

Reported:	2016-10-20 13:27 UTC by Jiri Eischmann
Modified:	2018-02-20 09:42 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] wayland: check valid pending cairo surface (1.42 KB, patch) 2016-11-02 11:00 UTC, Olivier Fourdan	committed	Details \| Review
Backtrace leading to update_size() being called between begin_paint() and end_paint() (20.78 KB, text/plain) 2016-11-02 16:57 UTC, Olivier Fourdan		Details
[PATCH] wayland: bail out if wl_surface is invalid (899 bytes, patch) 2018-01-30 13:08 UTC, Olivier Fourdan	needs-work	Details \| Review
[PATCH] wayland: Don't paint if the window is unmapped (1.19 KB, patch) 2018-02-20 09:41 UTC, Olivier Fourdan	none	Details \| Review

Description Jiri Eischmann 2016-10-20 13:27:52 UTC

When I try to forward a message as an attachment in Evolution 3.22.1 on Wayland it crashes. Other methods of forwarding (inline, quoted) work fine.
It happens with any message (both plain text and HTML).

Backtrace:

+ Trace 236755

#0 raise
#1 abort
#2 g_assertion_message
#3 g_assertion_message_expr
at gtestutils.c line 2452
#4 gdk_wayland_window_attach_image
#5 gdk_window_impl_wayland_end_paint
#6 gdk_window_end_paint_internal
#7 gdk_window_end_draw_frame
#8 gtk_widget_render
#9 gtk_main_do_event
#10 _gdk_event_emit
#11 _gdk_window_process_updates_recurse_helper
#12 gdk_window_process_updates_internal
#13 gdk_window_process_updates_with_mode
#17 <emit signal ??? on instance 0x7fff28299370 [GdkFrameClockIdle]>
at gsignal.c line 3447
#18 gdk_frame_clock_paint_idle
#19 gdk_threads_dispatch
#20 g_timeout_dispatch
at gmain.c line 4674
#21 g_main_dispatch
at gmain.c line 3203
#22 g_main_context_dispatch
at gmain.c line 3856
#23 g_main_context_iterate
at gmain.c line 3929
#24 g_main_context_iteration
at gmain.c line 3990
#25 e_util_invoke_g_dbus_proxy_call_sync_wrapper_full
at e-misc-utils.c line 3812
#26 e_util_invoke_g_dbus_proxy_call_sync_wrapper_with_error_check
at e-misc-utils.c line 3744
#27 webkit_editor_insert_signature
at e-webkit-editor.c line 2316
#28 composer_load_signature_cb
at e-composer-private.c line 695
#29 g_simple_async_result_complete
at gsimpleasyncresult.c line 801
#30 complete_in_idle_cb
at gsimpleasyncresult.c line 813
#31 g_idle_dispatch
at gmain.c line 5545
#32 g_main_dispatch
at gmain.c line 3203
#33 g_main_context_dispatch
at gmain.c line 3856
#34 g_main_context_iterate
at gmain.c line 3929
#35 g_main_loop_run
at gmain.c line 4125
#36 gtk_main
#37 main
at main.c line 662

Comment 1 Milan Crha 2016-10-20 16:23:20 UTC

Thanks for a bug report. I tried to reproduce it here, but no luck. The backtrace shows the evolution only partly, then main() at the bottom is it, and then between the composer_load_signature_cb() and e_util_invoke_g_dbus_proxy_call_sync_wrapper_full(). The later iterates on the main context (either default or the thread default), while waiting for a response from the D-Bus call. Could that be a reason?

Comment 2 Ray Strode [halfline] 2016-10-21 18:49:25 UTC

do you see this message:

      g_critical ("The window %p already has a drawing context. You cannot "•
                  "call gdk_window_begin_draw_frame() without calling "•     
                  "gdk_window_end_draw_frame() first.", window);•            

in the log ?

Comment 3 Milan Crha 2016-10-24 20:54:59 UTC

*** Bug 773411 has been marked as a duplicate of this bug. ***

Comment 4 François Guerraz 2016-10-25 04:16:38 UTC

No I don't see this message. All that is printed in the console is

Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))

Comment 5 François Guerraz 2016-10-25 05:14:52 UTC

Interestingly, when running evolution with --gtk-debug=all, the crash (otherwise reproducible 100% of the time) doesn't happen.

Comment 6 Milan Crha 2016-10-31 16:06:16 UTC

*** Bug 773624 has been marked as a duplicate of this bug. ***

Comment 7 Jiri Eischmann 2016-11-01 08:46:34 UTC

I managed to reproduce the crash even with the quoted method of forwarding when there was an attached file present in the original message. The quoted method doesn't crash if you forward a message without an attachment.

I get the same error:
Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))

No other error.

Backtrace:

+ Trace 236797

#0 raise
#1 abort
#2 g_assertion_message
#3 g_assertion_message_expr
at gtestutils.c line 2452
#4 gdk_wayland_window_attach_image
#5 gdk_window_impl_wayland_end_paint
#6 gdk_window_end_paint_internal
#7 gdk_window_end_draw_frame
#8 gtk_widget_render
#9 gtk_main_do_event
#10 _gdk_event_emit
#11 _gdk_window_process_updates_recurse_helper
#12 gdk_window_process_updates_internal
#13 gdk_window_process_updates_with_mode
#17 <emit signal ??? on instance 0x55555580c7a0 [GdkFrameClockIdle]>
at gsignal.c line 3447
#18 gdk_frame_clock_paint_idle
#19 gdk_threads_dispatch
#20 g_timeout_dispatch
#21 g_main_dispatch
at gmain.c line 3203
#22 g_main_context_dispatch
at gmain.c line 3856
#23 g_main_context_iterate
at gmain.c line 3929
#24 g_main_loop_run
at gmain.c line 4125
#25 gtk_main
#26 main
at main.c line 662

Comment 8 Olivier Fourdan 2016-11-02 11:00:18 UTC

Created attachment 338945 [details] [review]
[PATCH] wayland: check valid pending cairo surface

drop_cairo_surfaces() will nullify the drop_cairo_surfaces() (to avoid
reusing the buffer later), but gdk_window_impl_wayland_end_paint()
doesn't check if the drop_cairo_surfaces is still valid prior to calling
gdk_wayland_window_attach_image().

As gdk_wayland_window_attach_image() assumes there is a valid pending
cairo surface, make sure the staging cairo surface is stil lvalid prior
to attaching the image, just like it's done in gdk_wayland_window_show()
as well.

Comment 9 Olivier Fourdan 2016-11-02 11:00:53 UTC

Can you try with this patch?

Comment 10 François Guerraz 2016-11-02 11:47:43 UTC

Review of attachment 338945 [details] [review]:

Works for me

Comment 11 Milan Crha 2016-11-02 13:30:33 UTC

Jirko, here's a test build for Fedora 25 with the patch included:
http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829

Comment 12 Ray Strode [halfline] 2016-11-02 14:42:36 UTC

but under what circumstance is drop_cairo_surfaces getting called before end_paint ?

Comment 13 Olivier Fourdan 2016-11-02 15:13:31 UTC

(In reply to Ray Strode [halfline] from comment #12)
> but under what circumstance is drop_cairo_surfaces getting called before
> end_paint ?

When the client is resized, from gdk_wayland_window_update_size(), which explains the reproducer from bug 773307 (which I used to investigate this one, so I reckon both are the same issue)

Adding traces in gdk_window_impl_wayland_end_paint(), drop_cairo_surfaces(), gdk_wayland_window_update_size(), and gdk_wayland_window_destroy() (the last tow being the only caller of drop_cairo_surfaces()) shows:

gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590
gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c61e590
drop_cairo_surfaces() window 0x1c473440 surface 0x1c61e590
gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c75e430
drop_cairo_surfaces() window 0x1c473440 surface 0x1c75e430
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface (nil)
Segmentation fault (core dumped)

Comment 14 Ray Strode [halfline] 2016-11-02 15:19:51 UTC

i'm still confused, can you put a trace in begin_paint too ? somehow update_size is getting called after begin_paint but before end_paint ? Do we even rev the main loop in between those calls ever?

Comment 15 Jiri Eischmann 2016-11-02 15:50:58 UTC

(In reply to Milan Crha from comment #11)
> Jirko, here's a test build for Fedora 25 with the patch included:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829

It seems to have fixed the issue, at least for me. I can't reproduce it any more.

Comment 16 Olivier Fourdan 2016-11-02 16:23:28 UTC

(In reply to Ray Strode [halfline] from comment #14)
> i'm still confused, can you put a trace in begin_paint too ? somehow
> update_size is getting called after begin_paint but before end_paint ? Do we
> even rev the main loop in between those calls ever?

Sure thing, there is an update_size() in between begin_paint() and end_paint() :

gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c61f1b0
drop_cairo_surfaces() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil)
gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c736260
drop_cairo_surfaces() window 0x1c473be0 surface 0x1c736260
gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface (nil)
Segmentation fault (core dumped)

Comment 17 Olivier Fourdan 2016-11-02 16:57:40 UTC

Created attachment 338965 [details]
Backtrace leading to update_size() being called between begin_paint() and end_paint()

FWIW, it's the call to gdk_flush() that invokes a wl_display_roundtrip_queue() which in turn may call xdg_surface_configure() which will resize the window.

Comment 18 Ray Strode [halfline] 2016-11-02 17:40:29 UTC

ohhhhh okay, where's the source to the example you're running?  I wonder why its doing gdk_flush.

If it continues to draw after the gdk_flush it's going to write to a cairo surface that won't ultimately get committed but I guess that's okay.

As an aside, I think maybe we should have:

  g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy);

in drop_cairo_surfaces since the staged updates just got dropped.

Comment 19 Ray Strode [halfline] 2016-11-02 17:42:31 UTC

Review of attachment 338945 [details] [review]:

::: gdk/wayland/gdkwindow-wayland.c
@@ +906,3 @@
 
+  if (impl->staging_cairo_surface &&
+      _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) &&

i don't think the is_shm_surface check is necessary ?

Comment 20 Olivier Fourdan 2016-11-02 19:18:23 UTC

(In reply to Ray Strode [halfline] from comment #18)
> ohhhhh okay, where's the source to the example you're running?  I wonder why
> its doing gdk_flush.

The reproducer is the one attached in bug 773307.
 
> If it continues to draw after the gdk_flush it's going to write to a cairo
> surface that won't ultimately get committed but I guess that's okay.
> 
> As an aside, I think maybe we should have:
> 
>   g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy);
> 
> in drop_cairo_surfaces since the staged updates just got dropped.

Possibly but I'm not sure it's necessary for this particular bug, is it? (but I have no problem amending the patch if needed. or even add a separate patch for that)

(In reply to Ray Strode [halfline] from comment #19)
> Review of attachment 338945 [details] [review] [review]:
> 
> ::: gdk/wayland/gdkwindow-wayland.c
> @@ +906,3 @@
>  
> +  if (impl->staging_cairo_surface &&
> +      _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) &&
> 
> i don't think the is_shm_surface check is necessary ?

In theory, I agree, but it's how it's done elsewhere in gdk_wayland_window_ensure_cairo_surface():

https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n824

Besides, there is an assert() in gdk_wayland_window_attach_image() which would fail precisely if not a shm_surface():

https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n733

Comment 21 Ray Strode [halfline] 2016-11-02 20:36:14 UTC

Hi,
> The reproducer is the one attached in bug 773307.
Oh right, stransky explains the problem there nicely, too.

> Possibly but I'm not sure it's necessary for this particular bug, is it?
> (but I have no problem amending the patch if needed. or even add a separate
> patch for that)
Nope, that was just a side point i noticed when i was sniffing around the code with the mindset of "end_paint might get bypassed now after your patch"

> In theory, I agree, but it's how it's done elsewhere in
> gdk_wayland_window_ensure_cairo_surface():
okay fair enough.

Comment 22 Ray Strode [halfline] 2016-11-02 20:48:36 UTC

Review of attachment 338945 [details] [review]:

You write "doesn't check if the drop_cairo_surfaces" but I think you mean the "staging_cairo_surface".  But also, I wish the commit message went into the specific problem a little bit. Maybe something like "gdk_wayland_window_attach_image is normally called from gdk_window_end_paint to notify the compositor of newly staged drawing.  If any of the drawing code inadvertently dispatches the wayland event loop (for instance with a gdk_flush call), then it's possible that by the time gdk_window_end_paint is called, the staged drawing is already destroyed.  This commit bypasses the attach_image call in scenarios where the staged drawing is prematurely dropped"

Comment 23 Olivier Fourdan 2016-11-03 07:53:56 UTC

Comment on attachment 338945 [details] [review]
[PATCH] wayland: check valid pending cairo surface

attachment 338945 [details] [review] pushed to git master with changes from comment 22 as commit 2324b96 - wayland: check valid pending cairo surface

attachment 338945 [details] [review] pushed to branch gtk-3-22 with changes from comment 22 as commit 209e01f - wayland: check valid pending cairo surface

Comment 24 Olivier Fourdan 2016-11-03 07:57:18 UTC

*** Bug 773307 has been marked as a duplicate of this bug. ***

Comment 25 Michael Catanzaro 2016-12-13 15:42:04 UTC

*** Bug 772843 has been marked as a duplicate of this bug. ***

Comment 26 Martin Stransky 2018-01-30 11:29:44 UTC

Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is it supposed to be fixed here?

We're going to deploy Firefox 59 with native Wayland backend enabled, would be great to have this one fixed here.

Comment 27 Martin Stransky 2018-01-30 11:33:04 UTC

There's a backtrace:

+ Trace 238372

#0 __GI___nanosleep
at ../sysdeps/unix/sysv/linux/nanosleep.c line 27
#1 __sleep
at ../sysdeps/posix/sleep.c line 55
#2 ah_crap_handler(int)
at /home/komat/tmp676-trunk-gtk3/src1/toolkit/xre/nsSigHandlers.cpp line 102
#3 nsProfileLock::FatalSignalHandler(int, siginfo_t*, void*)
at /home/komat/tmp676-trunk-gtk3/src1/toolkit/profile/nsProfileLock.cpp line 187
#4 js::UnixExceptionHandler(int, siginfo_t*, void*)
at /home/komat/tmp676-trunk-gtk3/src1/js/src/ds/MemoryProtectionExceptionHandler.cpp line 263
#5 WasmFaultHandler(int, siginfo_t*, void*)
at /home/komat/tmp676-trunk-gtk3/src1/js/src/wasm/WasmSignalHandlers.cpp line 1459
#6 <signal handler called>
#7 wl_proxy_marshal
#8 wl_surface_attach
at /usr/include/wayland-client-protocol.h line 3446
#9 gdk_wayland_window_attach_image
at gdkwindow-wayland.c line 755
#10 gdk_window_impl_wayland_end_paint
at gdkwindow-wayland.c line 925
#11 gdk_window_end_paint_internal
at gdkwindow.c line 3021
#12 gdk_window_end_draw_frame
at gdkwindow.c line 3289
#13 gtk_widget_render
at gtkwidget.c line 17522
#14 gtk_main_do_event
at gtkmain.c line 1834
#15 _gdk_event_emit
at gdkevents.c line 73
#16 _gdk_window_process_updates_recurse_helper
at gdkwindow.c line 3852
#17 _gdk_window_process_updates_recurse
at gdkwindow.c line 3909
#18 gdk_window_impl_process_updates_recurse
at gdkwindowimpl.c line 333
#19 gdk_window_process_updates_internal
at gdkwindow.c line 3998
#20 gdk_window_process_updates_with_mode
at gdkwindow.c line 4192
#21 gdk_window_paint_on_clock
at gdkwindow.c line 11699
#25 <emit signal ??? on instance 0x7f1ffbc93a90 [GdkFrameClockIdle]>
at gsignal.c line 3447
#26 _gdk_frame_clock_emit_paint
at gdkframeclock.c line 640
#27 gdk_frame_clock_paint_idle
at gdkframeclockidle.c line 430
#28 gdk_threads_dispatch
at gdk.c line 743
#29 g_timeout_dispatch
at gmain.c line 4615
#30 g_main_dispatch
at gmain.c line 3142
#31 g_main_context_dispatch
at gmain.c line 3795
#32 g_main_context_iterate
at gmain.c line 3868
#33 g_main_context_iteration
at gmain.c line 3929


It comes from expose event at gtkmain.c:

1832	    case GDK_EXPOSE:
1833	      if (event->any.window)
1834	        gtk_widget_render (event_widget, event->any.window, event->expose.region);
1835	      break;

Comment 28 Olivier Fourdan 2018-01-30 12:51:51 UTC

(In reply to Martin Stransky from comment #26)
> Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is
> it supposed to be fixed here?

Well, there could be more than one cause to the same crash (comment 10 and comment 15 both confirmed the fix)

The last backtrace differs from the other ones, trace 236755 and trace 236797 show and assertion failed (g_assert (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))) whereas in trace 238372, the wl_surface is NULL and it's a segfault.

Comment 29 Martin Stransky 2018-01-30 12:57:03 UTC

Thanks Olivier. I'm preparing Firefox 59 packages with Wayland backend for Fedora so you may be able to test those by yourself.

Comment 30 Olivier Fourdan 2018-01-30 13:08:55 UTC

Created attachment 367641 [details] [review]
[PATCH] wayland: bail out if wl_surface is invalid

I guess that patch would avoid the problem from comment 27, can you try it?

Comment 31 Martin Stransky 2018-01-30 13:14:52 UTC

Sure, will test that, thanks.

Comment 32 Olivier Fourdan 2018-01-30 13:33:03 UTC

If that helps (and even if it doesn't), I reckon best would be to continue in a separate (new) bug, unlikely people will monitor an old (closed) bug... Also because I think it's a different issue/backtrace.

Comment 33 Martin Stransky 2018-01-31 12:27:35 UTC

(In reply to Olivier Fourdan from comment #32)
> If that helps (and even if it doesn't), I reckon best would be to continue
> in a separate (new) bug, unlikely people will monitor an old (closed) bug...
> Also because I think it's a different issue/backtrace.

It seems to be fixed - filed a new Bug 793062. Please backport to F27 at least to enable Wayland browser run there.

Comment 34 Olivier Fourdan 2018-01-31 13:43:03 UTC

Review of attachment 367641 [details] [review]:

Seems it can still crash for the same reason in gdk_window_impl_wayland_end_paint()

Comment 35 Olivier Fourdan 2018-02-20 09:41:33 UTC

Created attachment 368610 [details] [review]
[PATCH] wayland: Don't paint if the window is unmapped

That patch is much better imho as it doesn't interfere with the gdk processing (so we don't get gdk warnings), it avoids the crash by checking in the gdkwayland backend instead where we would crash (because on Wayland, if the window is unmapped, the Wayland surface is gone)

Comment 36 Olivier Fourdan 2018-02-20 09:42:32 UTC

Comment on attachment 368610 [details] [review]
[PATCH] wayland: Don't paint if the window is unmapped

Blimey, wrong bug!