GNOME Bugzilla – Bug 678607
Defer widget unallocation until all get_style_context/get_pango_context references are gone
Last modified: 2018-01-10 20:17:52 UTC
The following code causes a segfault in pygobject-introspection 3.3.2 in Ubuntu oneiric nightly CD images: $ python >>> import gi >>> from gi.repository import Gtk >>> Gtk.MenuBar().get_style_context().get_color(Gtk.StateFlags.NORMAL) This was originally reported as a bug against gtimelog in Ubuntu, you can see the full output of Gtk+ and Python warnings as well as a stack trace with debug symbols at https://bugs.launchpad.net/gtimelog/+bug/1016212 Some background: I'm the author of GTimeLog. The crashing one-liner is my poor attempt to determine the color (dark/light) of the Gtk+ theme so that I could choose an appropriate image for an GtkStatusIcon when the user is running gnome-panel or Unity desktops.
I can confirm the same crash on Fedora Rawhide with pygobject3-3.3.2-1.fc18.x86_64.
It seems to be depending on the reference to the menu. This crashes: $ python3 -c 'from gi.repository import Gtk; print(Gtk.MenuBar().get_style_context().get_color(Gtk.StateFlags.NORMAL))' But this works: $ python3 -c 'from gi.repository import Gtk; m = Gtk.MenuBar(); print(m.get_style_context().get_color(Gtk.StateFlags.NORMAL))' <Gdk.Color(red=0.298039, green=0.298039, blue=0.298039, alpha=1.000000)>
I confirm that downgrading GTK from 3.5.4 to 3.4.2 fixes this. It does not seem dependent on the theme, it happens as well with Adwaita.
That sure smells like a reference counting bug.
git bisect determines that this commit triggers it: http://git.gnome.org/browse/gtk+/commit/?id=7f511f2b33f3dbf7e41b1e5dd467956bd6427b32 Will analyze further on Monday, whether it's a bug in GTK or pygobject.
When I revert this bit, it works again: r --- a/gtk/gtkwidget.c +++ b/gtk/gtkwidget.c @@ -13982,11 +13982,15 @@ GtkStyleContext * gtk_widget_get_style_context (GtkWidget *widget) { GtkWidgetPrivate *priv; + GtkWidgetPath *path; g_return_val_if_fail (GTK_IS_WIDGET (widget), NULL); priv = widget->priv; + /* updates style context if it exists already */ + path = gtk_widget_get_path (widget); + if (G_UNLIKELY (priv->context == NULL)) { GdkScreen *screen; @@ -13999,6 +14003,7 @@ gtk_widget_get_style_context (GtkWidget *widget) if (screen) gtk_style_context_set_screen (priv->context, screen); + gtk_style_context_set_path (priv->context, path); if (priv->parent) gtk_style_context_set_parent (priv->context, gtk_widget_get_style_context (priv->parent)); Benjamin, would you have an off-hand idea what goes wrong here? The relevant top of the stack trace:
+ Trace 230419
I was unclear in my previous comment:; the pasted patch already was the reversion. I. e. applying this makes it work again.
So what happens here is that gtk_style_context_get_property() stumbles over this assertion: g_return_if_fail (priv->widget != NULL || priv->widget_path != NULL); (-c:32551): Gtk-CRITICAL **: gtk_style_context_get_property: assertion `priv->widget != NULL || priv->widget_path != NULL' failed and thus the GValue value in gtk_style_context_get_valist() is still blank, which makes G_VALUE_LCOPY choke. Thus I am now fairly sure that this is a real regression from GTK commit 7f511f2b33f and not a pygobject ref counting bug.
Created attachment 217171 [details] C test program Hmm, the equivalent program in C (attached) works fine. Reassigning back to pygobject.
As a data point, with this monkey patch it works without a crash: --- a/gtk/gtkstylecontext.c +++ b/gtk/gtkstylecontext.c @@ -967,7 +967,7 @@ style_data_lookup (GtkStyleContext *context) if (info->data) return info->data; - g_assert (priv->widget != NULL || priv->widget_path != NULL); + //g_assert (priv->widget != NULL || priv->widget_path != NULL); data = g_hash_table_lookup (priv->style_data, info); if (data) @@ -1343,7 +1343,7 @@ gtk_style_context_get_property (GtkStyleContext *context, g_return_if_fail (value != NULL); priv = context->priv; - g_return_if_fail (priv->widget != NULL || priv->widget_path != NULL); + //g_return_if_fail (priv->widget != NULL || priv->widget_path != NULL); prop = _gtk_style_property_lookup (property); if (prop == NULL) $ LD_LIBRARY_PATH=./gtk/.libs python3 -c 'from gi.repository import Gtk; print(Gtk.MenuBar().get_style_context().get_color(Gtk.StateFlags.NORMAL))' Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "overlay-scrollbar" (-c:14872): Gtk-CRITICAL **: gtk_widget_path_copy: assertion `path != NULL' failed (-c:14872): Gtk-CRITICAL **: gtk_widget_path_length: assertion `path != NULL' failed (-c:14872): Gtk-CRITICAL **: gtk_widget_path_iter_add_class: assertion `path != NULL' failed (-c:14872): Gtk-CRITICAL **: gtk_widget_path_length: assertion `path != NULL' failed (-c:14872): Gtk-CRITICAL **: gtk_widget_path_free: assertion `path != NULL' failed <Gdk.Color(red=1.000000, green=1.000000, blue=1.000000, alpha=1.000000)> So there is no memory error here, just assertions. However, it could still be that it is accessing already unref'ed objects.
I added a few debug statements to see when the style context's priv->widget gets set and when the MenuBar destroyed: ---------------- 8< -------------- $ make -j4 -C gtk libgtk-3.la && LD_LIBRARY_PATH=./gtk/.libs python3 -c 'from gi.repository import Gtk; m = Gtk.MenuBar(); print(m.get_style_context().get_color(Gtk.StateFlags.NORMAL))'; echo; echo '----------------------'; echo; LD_LIBRARY_PATH=./gtk/.libs python3 -c 'from gi.repository import Gtk; print(Gtk.MenuBar().get_style_context().get_color(Gtk.StateFlags.NORMAL))' make: Gehe in Verzeichnis '/home/martin/upstream/gtk+/gtk' CC libgtk_3_la-gtkwidget.lo CCLD libgtk-3.la make: Verlasse Verzeichnis '/home/martin/upstream/gtk+/gtk' Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "overlay-scrollbar" (-c:26600): Gtk-WARNING **: XXX gtk_style_context_set_path 0x27e3a90 (-c:26600): Gtk-WARNING **: XXX gtk_style_context_set_widget 0x27c0480 <Gdk.Color(red=0.000000, green=0.000000, blue=0.000000, alpha=1.000000)> (-c:26600): Gtk-WARNING **: XXX gtk_widget_finalize 0x27c0480 (-c:26600): Gtk-WARNING **: XXX gtk_style_context_set_widget (nil) (-c:26600): Gtk-WARNING **: XXX gtk_style_context_finalize ---------------------- Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "canberra-gtk-module" Gtk-Message: Failed to load module "overlay-scrollbar" (-c:26601): Gtk-WARNING **: XXX gtk_style_context_set_path 0x24f5aa0 (-c:26601): Gtk-WARNING **: XXX gtk_style_context_set_widget 0x24d2480 (-c:26601): Gtk-WARNING **: XXX gtk_widget_finalize 0x24d2480 (-c:26601): Gtk-WARNING **: XXX gtk_style_context_set_widget (nil) (-c:26601): Gtk-CRITICAL **: gtk_style_context_get_property: assertion `priv->widget != NULL || priv->widget_path != NULL' failed /usr/lib/python3/dist-packages/gi/types.py:47: Warning: /build/buildd/glib2.0-2.33.2/./gobject/gtype.c:4206: type id `0' is invalid return info.invoke(*args, **kwargs) /usr/lib/python3/dist-packages/gi/types.py:47: Warning: can't peek value table for type `<invalid>' which is not currently referenced return info.invoke(*args, **kwargs) Speicherzugriffsfehler (Speicherabzug geschrieben) ---------------- 8< -------------- This shows that with the first run (when we hold an explicit reference to the MenuBar) the MenuBar gets destroyed (finalized) and gtk_style_context_set_widget(NULL) called after the property gets read. In the second run, which triggers the bug, the menu bar is destroyed immediately after the get_style_context() call, as at that point there is nothing that references the MenuBar any more. This causes its finalization and gtk_style_context_set_widget(NULL), and thus the get_property() calls on the GtkStyleContext fail because its priv->widget is invalid.
Created attachment 217184 [details] [review] GtkStyleContext: Do not lose widget reference This GTK patch fixes the crash. Would a patch like this be acceptable, or is it prone to cause reference loops and thus memory leaks? In the latter case, is that a case of "just don't do that then"?
Review of attachment 217184 [details] [review]: the Widget is already keeping a reference on the StyleContext, considering that is the Widget that is creating it; this change would introduce a reference cycle. for starters, the _gtk_style_context_set_widget (style_context, NULL) call should be moved from gtk_widget_finalize() to gtk_widget_dispose(). what I want to understand, tho, is how the style context can exist outside of the widget's life time, given that the widget owns the style context at all times. is python keeping a reference on a GtkStyleContext instance? ::: gtk/gtkstylecontext.c @@ +1090,3 @@ + if (context->priv->widget != NULL) + g_object_unref (context->priv->widget); this should be replaced by g_clear_object(). @@ +1094,1 @@ widget can be NULL, so this ref has to be checked against that.
(In reply to comment #13) > the Widget is already keeping a reference on the StyleContext, considering that > is the Widget that is creating it; this change would introduce a reference > cycle. Right, that's what I am concerned about. Indeed with that patch pygobject does not finalize the MenuBar object any more, so it's not sufficient either way. > for starters, the _gtk_style_context_set_widget (style_context, NULL) > call should be moved from gtk_widget_finalize() to gtk_widget_dispose(). _dispose() runs earlier than _finalize(), right? It doesn't make a difference for this crash, but that might mitigate the ref cycles. > what I want to understand, tho, is how the style context can exist outside of > the widget's life time, given that the widget owns the style context at all > times. is python keeping a reference on a GtkStyleContext instance? pygobject does not really have an understanding of API specific contracts like "this object is only useful while that other object exists". It just sees a chain like my_value = ctor().method1().method2() and after calling method1() it notices that nothing else refers to the object generated by ctor() any more, and thus unrefs it, causing it to get finalized. method1() is returning a newly allocated object which just happens to hold an internal reference to the ctor() generated object, but there is no way to express "do not unallocate that object while this object is stilll around" in annotations. I. e. in this case: Gtk.MenuBar().get_style_context().get_color() the following happens: * a MenuBar object gets created * its get_style_context() method is called, and the return value has a temporary reference * no refs to that MenuBar object any more, so Python unallocates it, causing its dispose and then finalize methods to be called * calls get_color() on the returned StyleContext, which now asserts because the priv->widget reference is not valid any more. With the proposed workaround m = Gtk.MenuBar(); m.get_style_context().get_color() the early cleaning of MenuBar does not happen any more, as "m" still refers to it. This implements the contract "StyleContext objects are only valid as long as their widget is valid" in Python. So as I said, this might very well be a case of just documenting this implicit assumption in gtk_widget_get_style_context(), and a "then just don't do that". > this should be replaced by g_clear_object(). Ah, I keep forgetting about this one. Done locally now. > widget can be NULL, so this ref has to be checked against that. Thanks, done locally. But either way the patch is incomplete yet, as it now causes a ref cycle/leak, so not attaching a new one for now.
Gtk.Widget.get_style_context() is already annotated as (transfer none), as it returns a "pointer", not a fully referenced object; so, in a sense, the API is already telling you that the widget must stay alive if you want to use its style context.
That only tells python that it must not free the returned GtkStyleContext itself. So from a human POV I agree, but this doesn't generalize to all APIs. I think for now I'll reassign it back to pygobject as a minor bug to see whether something can be done about this case, and otherwise go with the "don't do that then" approach. Thanks for your input, Emmanuele!
I think a sane fix would be to add a fallback to GtkStyleContext so that we use the default values for properties when looking them up. This would also help with uninitialized style contexts as returned from gtk_Style_context_new().
Created attachment 217200 [details] [review] stylecontext: Handle uninitialized case When neither a widget nor a widget path is set on the style context, just use the default values everywhere. This avoids crashes in careless code.
Created attachment 217202 [details] [review] stylecontext: Handle uninitialized case When neither a widget nor a widget path is set on the style context, just use the default values everywhere. This avoids crashes in careless code.
What are "default values" in this context -- i.e. the colors specified by the user's Gtk+ theme, or the default Gtk+ grey? (Because a crash might be a better indication of "you're doing this wrong" than returning plausible-but-incorrect values.)
As a general rule, it should never be possible to crash the Python interpreter from Python. Or IOW, it should always be considered a bug. It would be much better if possible to raise an exception when someone does something the wrong way.
It is trivial to crash the Python interpreter from any gi-using application if you really want to. GTK (or GNOME in general) APIs were never designed with this crash-safety in mind. I'd like this to change, but so far that idea hasn't gotten a lot of traction inside the larger GNOME community (see shell plugins for an example of this issue).
(In reply to comment #20) > What are "default values" in this context -- i.e. the colors specified by the > user's Gtk+ theme, or the default Gtk+ grey? > > (Because a crash might be a better indication of "you're doing this wrong" than > returning plausible-but-incorrect values.) > The default CSS color - ie black for backgrounds and white for text. No theme will be consulted I think in querying that value. The style context for a widget is bound to that widget and is only guaranteed to return correct values when the widget needs to be rendered or layouted.
My gut feeling is that it is more useful to throw a proper exception to point out the programming error than returning the default black/white values which just silently break your logic.
*** Bug 675332 has been marked as a duplicate of this bug. ***
*** Bug 688249 has been marked as a duplicate of this bug. ***
Comment on attachment 217202 [details] [review] stylecontext: Handle uninitialized case > My gut feeling is that it is more useful to throw a proper exception to point > out the programming error than returning the default black/white values which > just silently break your logic. For this reason, and because this has been stalled for a while, I'm going to mark that GTK patch as rejected. Benjamin, if you disagree please feel free to push it, but I wouldn't do it on my own accord. Thanks!
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/pygobject/issues/28.