After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 773274 - [Wayland] Crash under gdk_wayland_window_attach_image()
[Wayland] Crash under gdk_wayland_window_attach_image()
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: Backend: Wayland
3.22.x
Other Linux
: Normal major
: ---
Assigned To: gtk-bugs
gtk-bugs
: 772843 773307 773411 773624 (view as bug list)
Depends on:
Blocks: WaylandRelated
 
 
Reported: 2016-10-20 13:27 UTC by Jiri Eischmann
Modified: 2018-02-20 09:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] wayland: check valid pending cairo surface (1.42 KB, patch)
2016-11-02 11:00 UTC, Olivier Fourdan
committed Details | Review
Backtrace leading to update_size() being called between begin_paint() and end_paint() (20.78 KB, text/plain)
2016-11-02 16:57 UTC, Olivier Fourdan
  Details
[PATCH] wayland: bail out if wl_surface is invalid (899 bytes, patch)
2018-01-30 13:08 UTC, Olivier Fourdan
needs-work Details | Review
[PATCH] wayland: Don't paint if the window is unmapped (1.19 KB, patch)
2018-02-20 09:41 UTC, Olivier Fourdan
none Details | Review

Description Jiri Eischmann 2016-10-20 13:27:52 UTC
When I try to forward a message as an attachment in Evolution 3.22.1 on Wayland it crashes. Other methods of forwarding (inline, quoted) work fine.
It happens with any message (both plain text and HTML).

Backtrace:

  • #0 raise
  • #1 abort
  • #2 g_assertion_message
  • #3 g_assertion_message_expr
    at gtestutils.c line 2452
  • #4 gdk_wayland_window_attach_image
  • #5 gdk_window_impl_wayland_end_paint
  • #6 gdk_window_end_paint_internal
  • #7 gdk_window_end_draw_frame
  • #8 gtk_widget_render
  • #9 gtk_main_do_event
  • #10 _gdk_event_emit
  • #11 _gdk_window_process_updates_recurse_helper
  • #12 gdk_window_process_updates_internal
  • #13 gdk_window_process_updates_with_mode
  • #17 <emit signal ??? on instance 0x7fff28299370 [GdkFrameClockIdle]>
    at gsignal.c line 3447
  • #18 gdk_frame_clock_paint_idle
  • #19 gdk_threads_dispatch
  • #20 g_timeout_dispatch
    at gmain.c line 4674
  • #21 g_main_dispatch
    at gmain.c line 3203
  • #22 g_main_context_dispatch
    at gmain.c line 3856
  • #23 g_main_context_iterate
    at gmain.c line 3929
  • #24 g_main_context_iteration
    at gmain.c line 3990
  • #25 e_util_invoke_g_dbus_proxy_call_sync_wrapper_full
    at e-misc-utils.c line 3812
  • #26 e_util_invoke_g_dbus_proxy_call_sync_wrapper_with_error_check
    at e-misc-utils.c line 3744
  • #27 webkit_editor_insert_signature
    at e-webkit-editor.c line 2316
  • #28 composer_load_signature_cb
    at e-composer-private.c line 695
  • #29 g_simple_async_result_complete
    at gsimpleasyncresult.c line 801
  • #30 complete_in_idle_cb
    at gsimpleasyncresult.c line 813
  • #31 g_idle_dispatch
    at gmain.c line 5545
  • #32 g_main_dispatch
    at gmain.c line 3203
  • #33 g_main_context_dispatch
    at gmain.c line 3856
  • #34 g_main_context_iterate
    at gmain.c line 3929
  • #35 g_main_loop_run
    at gmain.c line 4125
  • #36 gtk_main
  • #37 main
    at main.c line 662

Comment 1 Milan Crha 2016-10-20 16:23:20 UTC
Thanks for a bug report. I tried to reproduce it here, but no luck. The backtrace shows the evolution only partly, then main() at the bottom is it, and then between the composer_load_signature_cb() and e_util_invoke_g_dbus_proxy_call_sync_wrapper_full(). The later iterates on the main context (either default or the thread default), while waiting for a response from the D-Bus call. Could that be a reason?
Comment 2 Ray Strode [halfline] 2016-10-21 18:49:25 UTC
do you see this message:

      g_critical ("The window %p already has a drawing context. You cannot "•
                  "call gdk_window_begin_draw_frame() without calling "•     
                  "gdk_window_end_draw_frame() first.", window);•            

in the log ?
Comment 3 Milan Crha 2016-10-24 20:54:59 UTC
*** Bug 773411 has been marked as a duplicate of this bug. ***
Comment 4 François Guerraz 2016-10-25 04:16:38 UTC
No I don't see this message. All that is printed in the console is

Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))
Comment 5 François Guerraz 2016-10-25 05:14:52 UTC
Interestingly, when running evolution with --gtk-debug=all, the crash (otherwise reproducible 100% of the time) doesn't happen.
Comment 6 Milan Crha 2016-10-31 16:06:16 UTC
*** Bug 773624 has been marked as a duplicate of this bug. ***
Comment 7 Jiri Eischmann 2016-11-01 08:46:34 UTC
I managed to reproduce the crash even with the quoted method of forwarding when there was an attached file present in the original message. The quoted method doesn't crash if you forward a message without an attachment.

I get the same error:
Gdk:ERROR:gdkwindow-wayland.c:742:gdk_wayland_window_attach_image: assertion failed: (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))

No other error.

Backtrace:

  • #0 raise
  • #1 abort
  • #2 g_assertion_message
  • #3 g_assertion_message_expr
    at gtestutils.c line 2452
  • #4 gdk_wayland_window_attach_image
  • #5 gdk_window_impl_wayland_end_paint
  • #6 gdk_window_end_paint_internal
  • #7 gdk_window_end_draw_frame
  • #8 gtk_widget_render
  • #9 gtk_main_do_event
  • #10 _gdk_event_emit
  • #11 _gdk_window_process_updates_recurse_helper
  • #12 gdk_window_process_updates_internal
  • #13 gdk_window_process_updates_with_mode
  • #17 <emit signal ??? on instance 0x55555580c7a0 [GdkFrameClockIdle]>
    at gsignal.c line 3447
  • #18 gdk_frame_clock_paint_idle
  • #19 gdk_threads_dispatch
  • #20 g_timeout_dispatch
  • #21 g_main_dispatch
    at gmain.c line 3203
  • #22 g_main_context_dispatch
    at gmain.c line 3856
  • #23 g_main_context_iterate
    at gmain.c line 3929
  • #24 g_main_loop_run
    at gmain.c line 4125
  • #25 gtk_main
  • #26 main
    at main.c line 662

Comment 8 Olivier Fourdan 2016-11-02 11:00:18 UTC
Created attachment 338945 [details] [review]
[PATCH] wayland: check valid pending cairo surface

drop_cairo_surfaces() will nullify the drop_cairo_surfaces() (to avoid
reusing the buffer later), but gdk_window_impl_wayland_end_paint()
doesn't check if the drop_cairo_surfaces is still valid prior to calling
gdk_wayland_window_attach_image().

As gdk_wayland_window_attach_image() assumes there is a valid pending
cairo surface, make sure the staging cairo surface is stil lvalid prior
to attaching the image, just like it's done in gdk_wayland_window_show()
as well.
Comment 9 Olivier Fourdan 2016-11-02 11:00:53 UTC
Can you try with this patch?
Comment 10 François Guerraz 2016-11-02 11:47:43 UTC
Review of attachment 338945 [details] [review]:

Works for me
Comment 11 Milan Crha 2016-11-02 13:30:33 UTC
Jirko, here's a test build for Fedora 25 with the patch included:
http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829
Comment 12 Ray Strode [halfline] 2016-11-02 14:42:36 UTC
but under what circumstance is drop_cairo_surfaces getting called before end_paint ?
Comment 13 Olivier Fourdan 2016-11-02 15:13:31 UTC
(In reply to Ray Strode [halfline] from comment #12)
> but under what circumstance is drop_cairo_surfaces getting called before
> end_paint ?

When the client is resized, from gdk_wayland_window_update_size(), which explains the reproducer from bug 773307 (which I used to investigate this one, so I reckon both are the same issue)

Adding traces in gdk_window_impl_wayland_end_paint(), drop_cairo_surfaces(), gdk_wayland_window_update_size(), and gdk_wayland_window_destroy() (the last tow being the only caller of drop_cairo_surfaces()) shows:

gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface 0x1c61e590
gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c61e590
drop_cairo_surfaces() window 0x1c473440 surface 0x1c61e590
gdk_wayland_window_update_size() window 0x1c473440 surface 0x1c75e430
drop_cairo_surfaces() window 0x1c473440 surface 0x1c75e430
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_wayland_window_update_size() window 0x1c473440 surface (nil)
drop_cairo_surfaces() window 0x1c473440 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473440 surface (nil)
Segmentation fault (core dumped)
Comment 14 Ray Strode [halfline] 2016-11-02 15:19:51 UTC
i'm still confused, can you put a trace in begin_paint too ? somehow update_size is getting called after begin_paint but before end_paint ? Do we even rev the main loop in between those calls ever?
Comment 15 Jiri Eischmann 2016-11-02 15:50:58 UTC
(In reply to Milan Crha from comment #11)
> Jirko, here's a test build for Fedora 25 with the patch included:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=16278829

It seems to have fixed the issue, at least for me. I can't reproduce it any more.
Comment 16 Olivier Fourdan 2016-11-02 16:23:28 UTC
(In reply to Ray Strode [halfline] from comment #14)
> i'm still confused, can you put a trace in begin_paint too ? somehow
> update_size is getting called after begin_paint but before end_paint ? Do we
> even rev the main loop in between those calls ever?

Sure thing, there is an update_size() in between begin_paint() and end_paint() :

gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface 0x1c61f1b0
gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c61f1b0
drop_cairo_surfaces() window 0x1c473be0 surface 0x1c61f1b0
gdk_window_impl_wayland_begin_paint() window 0x1c473be0 surface (nil)
gdk_wayland_window_update_size() window 0x1c473be0 surface 0x1c736260
drop_cairo_surfaces() window 0x1c473be0 surface 0x1c736260
gdk_wayland_window_update_size() window 0x1c473be0 surface (nil)
drop_cairo_surfaces() window 0x1c473be0 surface (nil)
gdk_window_impl_wayland_end_paint() window 0x1c473be0 surface (nil)
Segmentation fault (core dumped)
Comment 17 Olivier Fourdan 2016-11-02 16:57:40 UTC
Created attachment 338965 [details]
Backtrace leading to update_size() being called between begin_paint() and end_paint()

FWIW, it's the call to gdk_flush() that invokes a wl_display_roundtrip_queue() which in turn may call xdg_surface_configure() which will resize the window.
Comment 18 Ray Strode [halfline] 2016-11-02 17:40:29 UTC
ohhhhh okay, where's the source to the example you're running?  I wonder why its doing gdk_flush.

If it continues to draw after the gdk_flush it's going to write to a cairo surface that won't ultimately get committed but I guess that's okay.

As an aside, I think maybe we should have:

  g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy);

in drop_cairo_surfaces since the staged updates just got dropped.
Comment 19 Ray Strode [halfline] 2016-11-02 17:42:31 UTC
Review of attachment 338945 [details] [review]:

::: gdk/wayland/gdkwindow-wayland.c
@@ +906,3 @@
 
+  if (impl->staging_cairo_surface &&
+      _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) &&

i don't think the is_shm_surface check is necessary ?
Comment 20 Olivier Fourdan 2016-11-02 19:18:23 UTC
(In reply to Ray Strode [halfline] from comment #18)
> ohhhhh okay, where's the source to the example you're running?  I wonder why
> its doing gdk_flush.

The reproducer is the one attached in bug 773307.
 
> If it continues to draw after the gdk_flush it's going to write to a cairo
> surface that won't ultimately get committed but I guess that's okay.
> 
> As an aside, I think maybe we should have:
> 
>   g_clear_pointer (&impl->staged_updates_region, cairo_region_destroy);
> 
> in drop_cairo_surfaces since the staged updates just got dropped.

Possibly but I'm not sure it's necessary for this particular bug, is it? (but I have no problem amending the patch if needed. or even add a separate patch for that)

(In reply to Ray Strode [halfline] from comment #19)
> Review of attachment 338945 [details] [review] [review]:
> 
> ::: gdk/wayland/gdkwindow-wayland.c
> @@ +906,3 @@
>  
> +  if (impl->staging_cairo_surface &&
> +      _gdk_wayland_is_shm_surface (impl->staging_cairo_surface) &&
> 
> i don't think the is_shm_surface check is necessary ?

In theory, I agree, but it's how it's done elsewhere in gdk_wayland_window_ensure_cairo_surface():

https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n824

Besides, there is an assert() in gdk_wayland_window_attach_image() which would fail precisely if not a shm_surface():

https://git.gnome.org/browse/gtk+/tree/gdk/wayland/gdkwindow-wayland.c?h=gtk-3-22#n733
Comment 21 Ray Strode [halfline] 2016-11-02 20:36:14 UTC
Hi,
> The reproducer is the one attached in bug 773307.
Oh right, stransky explains the problem there nicely, too.

> Possibly but I'm not sure it's necessary for this particular bug, is it?
> (but I have no problem amending the patch if needed. or even add a separate
> patch for that)
Nope, that was just a side point i noticed when i was sniffing around the code with the mindset of "end_paint might get bypassed now after your patch"

> In theory, I agree, but it's how it's done elsewhere in
> gdk_wayland_window_ensure_cairo_surface():
okay fair enough.
Comment 22 Ray Strode [halfline] 2016-11-02 20:48:36 UTC
Review of attachment 338945 [details] [review]:

You write "doesn't check if the drop_cairo_surfaces" but I think you mean the "staging_cairo_surface".  But also, I wish the commit message went into the specific problem a little bit. Maybe something like "gdk_wayland_window_attach_image is normally called from gdk_window_end_paint to notify the compositor of newly staged drawing.  If any of the drawing code inadvertently dispatches the wayland event loop (for instance with a gdk_flush call), then it's possible that by the time gdk_window_end_paint is called, the staged drawing is already destroyed.  This commit bypasses the attach_image call in scenarios where the staged drawing is prematurely dropped"
Comment 23 Olivier Fourdan 2016-11-03 07:53:56 UTC
Comment on attachment 338945 [details] [review]
[PATCH] wayland: check valid pending cairo surface

attachment 338945 [details] [review] pushed to git master with changes from comment 22 as commit 2324b96 - wayland: check valid pending cairo surface

attachment 338945 [details] [review] pushed to branch gtk-3-22 with changes from comment 22 as commit 209e01f - wayland: check valid pending cairo surface
Comment 24 Olivier Fourdan 2016-11-03 07:57:18 UTC
*** Bug 773307 has been marked as a duplicate of this bug. ***
Comment 25 Michael Catanzaro 2016-12-13 15:42:04 UTC
*** Bug 772843 has been marked as a duplicate of this bug. ***
Comment 26 Martin Stransky 2018-01-30 11:29:44 UTC
Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is it supposed to be fixed here?

We're going to deploy Firefox 59 with native Wayland backend enabled, would be great to have this one fixed here.
Comment 27 Martin Stransky 2018-01-30 11:33:04 UTC
There's a backtrace:

  • #0 __GI___nanosleep
    at ../sysdeps/unix/sysv/linux/nanosleep.c line 27
  • #1 __sleep
    at ../sysdeps/posix/sleep.c line 55
  • #2 ah_crap_handler(int)
    at /home/komat/tmp676-trunk-gtk3/src1/toolkit/xre/nsSigHandlers.cpp line 102
  • #3 nsProfileLock::FatalSignalHandler(int, siginfo_t*, void*)
    at /home/komat/tmp676-trunk-gtk3/src1/toolkit/profile/nsProfileLock.cpp line 187
  • #4 js::UnixExceptionHandler(int, siginfo_t*, void*)
    at /home/komat/tmp676-trunk-gtk3/src1/js/src/ds/MemoryProtectionExceptionHandler.cpp line 263
  • #5 WasmFaultHandler(int, siginfo_t*, void*)
    at /home/komat/tmp676-trunk-gtk3/src1/js/src/wasm/WasmSignalHandlers.cpp line 1459
  • #6 <signal handler called>
  • #7 wl_proxy_marshal
  • #8 wl_surface_attach
    at /usr/include/wayland-client-protocol.h line 3446
  • #9 gdk_wayland_window_attach_image
    at gdkwindow-wayland.c line 755
  • #10 gdk_window_impl_wayland_end_paint
    at gdkwindow-wayland.c line 925
  • #11 gdk_window_end_paint_internal
    at gdkwindow.c line 3021
  • #12 gdk_window_end_draw_frame
    at gdkwindow.c line 3289
  • #13 gtk_widget_render
    at gtkwidget.c line 17522
  • #14 gtk_main_do_event
    at gtkmain.c line 1834
  • #15 _gdk_event_emit
    at gdkevents.c line 73
  • #16 _gdk_window_process_updates_recurse_helper
    at gdkwindow.c line 3852
  • #17 _gdk_window_process_updates_recurse
    at gdkwindow.c line 3909
  • #18 gdk_window_impl_process_updates_recurse
    at gdkwindowimpl.c line 333
  • #19 gdk_window_process_updates_internal
    at gdkwindow.c line 3998
  • #20 gdk_window_process_updates_with_mode
    at gdkwindow.c line 4192
  • #21 gdk_window_paint_on_clock
    at gdkwindow.c line 11699
  • #25 <emit signal ??? on instance 0x7f1ffbc93a90 [GdkFrameClockIdle]>
    at gsignal.c line 3447
  • #26 _gdk_frame_clock_emit_paint
    at gdkframeclock.c line 640
  • #27 gdk_frame_clock_paint_idle
    at gdkframeclockidle.c line 430
  • #28 gdk_threads_dispatch
    at gdk.c line 743
  • #29 g_timeout_dispatch
    at gmain.c line 4615
  • #30 g_main_dispatch
    at gmain.c line 3142
  • #31 g_main_context_dispatch
    at gmain.c line 3795
  • #32 g_main_context_iterate
    at gmain.c line 3868
  • #33 g_main_context_iteration
    at gmain.c line 3929

It comes from expose event at gtkmain.c:

1832	    case GDK_EXPOSE:
1833	      if (event->any.window)
1834	        gtk_widget_render (event_widget, event->any.window, event->expose.region);
1835	      break;
Comment 28 Olivier Fourdan 2018-01-30 12:51:51 UTC
(In reply to Martin Stransky from comment #26)
> Guys, I still see this crash on Fedora 27 / gtk3-3.22.26-2.fc27.x86_64. Is
> it supposed to be fixed here?

Well, there could be more than one cause to the same crash (comment 10 and comment 15 both confirmed the fix)

The last backtrace differs from the other ones, trace 236755 and trace 236797 show and assertion failed (g_assert (_gdk_wayland_is_shm_surface (impl->staging_cairo_surface))) whereas in trace 238372, the wl_surface is NULL and it's a segfault.
Comment 29 Martin Stransky 2018-01-30 12:57:03 UTC
Thanks Olivier. I'm preparing Firefox 59 packages with Wayland backend for Fedora so you may be able to test those by yourself.
Comment 30 Olivier Fourdan 2018-01-30 13:08:55 UTC
Created attachment 367641 [details] [review]
[PATCH] wayland: bail out if wl_surface is invalid

I guess that patch would avoid the problem from comment 27, can you try it?
Comment 31 Martin Stransky 2018-01-30 13:14:52 UTC
Sure, will test that, thanks.
Comment 32 Olivier Fourdan 2018-01-30 13:33:03 UTC
If that helps (and even if it doesn't), I reckon best would be to continue in a separate (new) bug, unlikely people will monitor an old (closed) bug... Also because I think it's a different issue/backtrace.
Comment 33 Martin Stransky 2018-01-31 12:27:35 UTC
(In reply to Olivier Fourdan from comment #32)
> If that helps (and even if it doesn't), I reckon best would be to continue
> in a separate (new) bug, unlikely people will monitor an old (closed) bug...
> Also because I think it's a different issue/backtrace.

It seems to be fixed - filed a new Bug 793062. Please backport to F27 at least to enable Wayland browser run there.
Comment 34 Olivier Fourdan 2018-01-31 13:43:03 UTC
Review of attachment 367641 [details] [review]:

Seems it can still crash for the same reason in gdk_window_impl_wayland_end_paint()
Comment 35 Olivier Fourdan 2018-02-20 09:41:33 UTC
Created attachment 368610 [details] [review]
[PATCH] wayland: Don't paint if the window is unmapped

That patch is much better imho as it doesn't interfere with the gdk processing (so we don't get gdk warnings), it avoids the crash by checking in the gdkwayland backend instead where we would crash (because on Wayland, if the window is unmapped, the Wayland surface is gone)
Comment 36 Olivier Fourdan 2018-02-20 09:42:32 UTC
Comment on attachment 368610 [details] [review]
[PATCH] wayland: Don't paint if the window is unmapped

Blimey, wrong bug!