GNOME Bugzilla – Bug 784319
Sometimes tooltip_popup_timeout() in gtk/gtktooltip.c causes crash after closing a window
Last modified: 2017-07-12 10:17:18 UTC
Created attachment 354667 [details] [review] A patch to avoid the crash I'm now porting Firefox to Wayland/Weston on Renesas RZ/G1E. * Renesas RZ/G1E: http://elinux.org/RZ-G/Boards/SK-RZG1E * How to build BSP for it: http://elinux.org/RZ-G/Boards/Yocto_2.0 * Additional Yocto recipes to build Firefox: https://github.com/mozilla-japan/meta-browser/tree/firefox-52.1esr https://github.com/mozilla-japan/meta-gecko-embedded The version of GTK+ is 3.20.9. Sometimes tooltip_popup_timeout() in gtk/gtktooltip.c causes crash after closing a browser window. Although I'm not sure whether it's Firefox's fault or GTK+'s fault, I think GTK+ should ensure to avoid the crash. It seems that the code in gtk/gtktooltip.c doesn't observe destroying GdkWindow after registering a timeout function, it may be the cause of the bug. The attached patch seems work fine for me, it can avoid crash on my environment.
Created attachment 354668 [details] Stack trace of the crash
Looks like a dupe of bug 782283, unlikely this will be fixed in gtk+-3.20 though... Can you try with a current gtk+-3.22 ?
> Can you try with a current gtk+-3.22 ? Sorry, upgrading gtk+ for the board is hard due to the limitation of the GPU driver. gtk+-3.22 requires a later version of wayland although the driver is tied with the specific version of it (1.9 or 1.10). In addition I can't reproduce the bug on PC. Instead of using gtk+-3.22, I'll try the patch in bug 782283.
Or, alternatively, please provide a simple reproducer which exhibit the issue... Oddly, it's not obvious, based on the backtrace (attachment 354668 [details]), why this would affect Wayland alone, this occurs in gtk+ code, not gdk backend.
> Instead of using gtk+-3.22, I'll try the patch in bug 782283. Hmm, it's not effective for this bug, still reproduces. > Or, alternatively, please provide a simple reproducer which exhibit the issue... I'll try to find it...
It seems that the following issue seems same with this: https://bugzilla.redhat.com/show_bug.cgi?id=1467104
It may be a variant of https://bugzilla.gnome.org/show_bug.cgi?id=781391#c5 but for tooltips perhaps? AFAIK Jonas said that Firefox draw to already deleted window...maybe we can call gdk_window_is_destroyed() from Firefox before wl_surface.commit?
Yet nothing in either backtraces points toward gdk wayland backend, so how can we be sure this really is Wayland related? AFAICS, it looks like a race...
Ok, found a way to reproduce easily and that's related to Dnd I reckon. 1. Run Firefox profile manager 2. Create a new profile 3. initiate a fake DnD on the "Finish" button, i.e. press, move, release... -> firefox-wayland crashes with the exact same backtrace. Oddly, I see no tooltip being involved in this from a UI standpoint, the button never shows a tooltip there.
OK, I've just reproduced with the x11 backend so this is _not_ Wayland specific, but rather related to the use of CSD. To reproduce with x11, run: $ GDK_BACKEND=x11 GTK_CSD=1 firefox-wayland -ProfileManager And follow the reproducing steps from comment 9.
I tried to reproduce this with a simple gtk+ program, but failed to reproduce, so it could be /something/ that firefox does. From what I can see, we get an additional motion event for the gdk window which is about to be destroyed, so we end up in gtk_tooltip_handle_event_internal() with the event window being from the widget that has just been unrealized, so it will start a gtk_tooltip_start_delay() that will end up crashing. Obviously, attachment 354667 [details] [review] will avoid the crash, but I cannot really tell whether or not this is correct unless I understand what mechanism leads to this issue (i.e. I have the feeling that the patch is hiding the problem more than solving it, and I'm not even sure the problem is in gtk+ rather than firefox in the first place).
Ah! Using the gtk inspector shows the widget in question is the "MozContainer". In the backtrace at step #7:
+ Trace 237624
That Widget there 0x7f4252a8fe50 is the MozContainer according to the gtk+ inspector.
Olivier, thanks for looking at it. MozContainer is a custom Firefox widget, it's nested in a toplevel window and it holds live Firefox GdkWindow on Wayland or when toplevel window uses CSD (so it's complicated to draw to toplevel GdkWindow). The code is here: https://github.com/stransky/gecko-dev/blob/master/widget/gtk/mozcontainer.cpp
(In reply to Martin Stransky from comment #13) > Olivier, thanks for looking at it. MozContainer is a custom Firefox widget, > it's nested in a toplevel window and it holds live Firefox GdkWindow on > Wayland or when toplevel window uses CSD (so it's complicated to draw to > toplevel GdkWindow). Yeap, and this crash is with both Wayland and x11 with CSD.
Looking at that MozContainer, I find it weird that there is a moz_container_unrealize() for Wayland alone, and even there, it doesn't seem to do much: https://github.com/stransky/gecko-dev/blob/master/widget/gtk/mozcontainer.cpp#L383 Looking at gtk_window_unrealize() it will destroy and unregister its windows: https://git.gnome.org/browse/gtk+/tree/gtk/gtkwindow.c?h=gtk-3-22#n7448 I could be wrong, but considering that the issue is caused by a motion event coming from a MozContainer widget while the toplevel is being destroyed, I wonder if there might be some code missing in MozContainer.
Hmm, mozcontainer doesn't chain up the unrealize func like this: https://git.gnome.org/browse/gtk+/tree/gtk/gtkwindow.c?h=gtk-3-22#n7505
When the widget doesn't have its own GdkWindow (by not calling gtk_widget_set_has_window), a GdkWindow is created & destroyed by GtkWidget class: https://git.gnome.org/browse/gtk+/tree/gtk/gtkwidget.c#n10594 But when the child class doesn't chain up the unrealize func, the above code doesn't called so the window isn't destroyed. I think it's firefox's fault. I'll close this bug after I confirm it on my environment.
If fixing this in Firefox, please make sure the fix also works on x11/CSD as well.
moz_container_realize() calls gtk_widget_set_window(), do we need to extra call gtk_widget_set_has_window()? https://github.com/stransky/gecko-dev/blob/master/widget/gtk/mozcontainer.cpp#L373 Also the gtk_widget_set_has_window(container, FALSE) is called only when the toplevel GdkWindow is used for rendering: https://dxr.mozilla.org/mozilla-central/source/widget/gtk/nsWindow.cpp#3750
I see that the CSD/Wayland code path is under gtk_widget_get_has_window (widget) condition which looks like Gtk is aware about window owned by container: https://github.com/stransky/gecko-dev/blob/master/widget/gtk/mozcontainer.cpp#L334
(In reply to Martin Stransky from comment #20) > I see that the CSD/Wayland code path is under gtk_widget_get_has_window > (widget) condition which looks like Gtk is aware about window owned by > container: Sorry, you are right. My explanation about gtk_widget_get_has_window() is incorrect. gtk_widget_get_has_window() returns TRUE by default: https://git.gnome.org/browse/gtk+/tree/gtk/gtkwidget.c#n8202 (priv->no_window == 0 by default) In addition, when gtk_widget_get_has_window() is FALSE, a parent GdkWindow is used for the widget's GdkWindow, it doesn't create its own GdkWindow. On the other hand the explanation about "unrealize" is correct. I confirmed that it's not called so that no one destroy the GdkWindow of MozContainer. When MOZ_WAYLAND isn't defined, it's destroyed by the following line: https://git.gnome.org/browse/gtk+/tree/gtk/gtkwidget.c#n10589 But when I address it, sometimes another similar crash occurs. It's more rare than before... We should continue it at https://bugzilla.redhat.com/show_bug.cgi?id=1467104.
Now I confirmed that it's firefox-wayland's issue, not GTK+. (See https://bugzilla.redhat.com/show_bug.cgi?id=1467104) (In reply to Olivier Fourdan from comment #18) > If fixing this in Firefox, please make sure the fix also works on x11/CSD as > well. I confirmed it. Thank you for your effort.
Well, and why is the "unrealize" handler call missing here? I think the issue here is that the unrealize handler does not remove the GdkWindow created in realize handler, right? With this patch we'll leak the wayland surfaces here.
I added the GdkWindow delete to unrealized handler - please check if that fixes for you. https://github.com/stransky/gecko-dev/commit/dba7baee43fd9f75ae70c76f5ec4850f392cf0b9
moz_container_unrealize() is if defined(MOZ_WAYLAND) only, that won't work with x11 on CSD which is affected as well by this bug (comment 18).
(In reply to Olivier Fourdan from comment #25) > moz_container_unrealize() is if defined(MOZ_WAYLAND) only, that won't work > with x11 on CSD which is affected as well by this bug (comment 18). MOZ_WAYLAND is a build-time option - moz_container_unrealize() used when wayland enabled build is running on X11.
(In reply to Martin Stransky from comment #23) https://bugzilla.redhat.com/show_bug.cgi?id=1467104#c10