GNOME Bugzilla – Bug 682933
Add GenericTreeModel from pygtk
Last modified: 2013-02-18 11:06:23 UTC
PyGtk included the very useful utility class GenericTreeModel. This is unfortunately missing form the pygobject implementation. While this is very Gtk specific, it has also been one of the main hold outs for myself and I'm sure a lot of others when trying to upgrade from pygtk. Renaming stuff is easy compared to trying to figure out how to implement the Gtk.TreeModel interface without GenericTreeModel. There have been a few bug fixes recently to support this and a preliminary patch is finally ready.
Created attachment 222733 [details] [review] Initial attempt supporting GenericTreeModel This is an initial pure python implementation of the GenericTreeModel which is fully compatible with what was available in PyGtk. The class lives in its own gi/generictreemodel.py module and can be imported directly or it gets pulled into the gtk namespace through the pygtkcompat layer automatically when enable_gtk() is used. I think this utility class is one of the more difficult incompatibilities to deal with when trying to upgrade from PyGtk. Therefor potentially important if a goal is getting apps off of PyGtk and into gi.
While I agree with you that the GenericTreeModel is a blocker in porting, one of the issues I had with the GenericTreeModel was that it is an inherently broken API due to the expectation of static pointers for the iter user_data. Since python has to generate them it becomes a source of either memory leaks or if the GC does manage to clean it up, a potential dangling pointer access since there is no lifecycle management of the iter or its user_data raw pointers. I will leave it up to the current maintainers to sort this out and decide if they are willing to take that tradeoff. However, know if you do go this route you will propagate its usage and there will be less momentum to fix the issues lower down the stack. I would suggest printing a warning about the potential leak.
Thanks for the input. Early on when looking at this I was hoping there would be some kind of ref/unref callback that could be implemented for TreeModels to manage the user_data lifetime. I did find gtk_tree_model_ref/unref_node but the docs make me think it is not what we want. However, I did find some discussion in the following ticket which made me think that it might actually help: https://bugzilla.gnome.org/show_bug.cgi?id=326852 This issue seems like it will always be a problem trying to implement the TreeModel interface from within python (or any language). Even if something like numeric indices are used as the user_data, it will still leak (at least in python). If I was implementing an app purely in C and want to store GObject pointers in the user_data, it seems I would run into the same problems. In my specific use cases I can use leak-references=False as the objects being reffed are stored in a list on the model anyhow. It is a little tricky because you must notify the model of deletions before de-referencing the stored objects to ensure safety and is less then ideal. I will do a little more research to see if gtk_tree_model_ref/unref_node can work or if something can be implemented in gtk+ to fix this up as I totally agree it is a problem.
So in C the iters are usually opaque unless you are doing something complicated. They are also assumed to be static, meaning you don't refcount whatever you put in it because you are almost assured that once you exit the calling function it will no longer be referenced. Iters are pretty useless to keep around anyway since once the model is updated the iter is invalid anyway. One thought is to special case all iter returns in pygobject to add them to a cleanup pool but then again the issue isn't the iter iteself but the data inside. How do we know if the user_data is a C pointer or a Python object? That is a contract that is decided by the application. There are no flags to indicate the type of data stored. You can't frob it either (use Py_IsObject or whatever the API is) because inevitably you will end up reading beyond an allocated memory block which will cause crashes on most systems. One could keep track of all existing iter memory locations and check that table during cleanup. This would mean you would have to make sure the generic tree model nulls out all of the unused user data and that all iters must come from python. Or perhaps you flag user_data with a false pointer that is unlikely to ever be assigned (like 0x4F - J + 5 ;) ) to indicate that user_data2 has a python object that can be unreffed during destroy. That would actually be an elegant solution since you can read the pointer as an int without fear of reading out of bounds, and all of that can be done with ctypes without touching core pygobject I think. It might fail 0.000001% of the time that 0x4F gets mapped to user_data and user_data doesn't get set but I don't think that is ever the case inside of GTK+ and it can be mitigated by using g_malloc0/g_slice0 for caller allocated structs in pygobject if we don't already.
> How do we know if the user_data is a C pointer or a Python object? Yes, generally we cannot. But in the context of GenericTreeModel, we can assume user_data is always a python object. Since GenericTreeModel should handle all management of TreeIters and user_data assignment. > Or perhaps you flag user_data with a false pointer that is unlikely to ever be assigned (like 0x4F - J + 5 ;) ) to indicate that user_data2 has a python object that can be unreffed during destroy. This would be great but I cannot find a way to get notifications when TreeIters are destroyed, the gtk_tree_model_ref/unref_node will not work out. I think I'm missing something here, by "unreffed during destroy" what object are you talking about being destroyed? I think the idea of pooling the actual user_data during assignment to TreeIter.user_data is the right direction. By not reffing the user_data during assignment, but adding it to a "set" stored on the model, we can guarantee a maximum of 1 reference will ever leak. Whereas with the original implementation it could leak like a sieve. This has the added benefits of these refs will be decremented when the model itself is destroyed and there would then be no leaks in this case. Furthermore, the "row_deleted" method on GenericTreeModel could be extended to help manage leaks by accepting an optional user_data parameter which would then be pulled out of the local pool and remove the models cached reference. The protocol for GenericTreeModel will need to be followed closely for subclasses to have a safe and leak free model, but it does seem possible. It will provide something which is easier and safer then someone unknowingly trying to figure out how to use Gtk.TreeModel and Gtk.TreeIter without it.
I have used the GenericTreeModel in pygtk. Is it possible to re-implement methods of an interface in pygobject? It would be useful to be able to sub-class a ListStore and re-implement methods in the TreeModel interface. I have tried it but couldn't get it to work.
Hi Nick, I'm not sure if this is possible. Did you try overriding the methods by prefixing them with "do_"? Depending on which method you override, you might need very specific knowledge of how the ListStore is implemented in order for it to continue working. For instance, if the ListStore uses the user_data fields to store opaque pointers to internal objects, trying to access this in python would surely crash as (currently) pygobject would try to read them as python object pointers.
My idea was to provide a ListStore or TreeStore with virtual columns, by re-implementing the get_value method. If the column number was greater than the number of columns in the store, I could get a value from a database by retrieving a key value from the store. I tried the following: from gi.repository import Gtk class MyListStore(Gtk.ListStore): def __init__(self, *args): Gtk.ListStore.__init__(self, *args) def do_get_value(self, iter, col): return 'abc' l = MyListStore(str) r = l.append(['row 1']) print l.get_value(r, 0) but the do_get_value method doesn't seem to get called.
Nick, A composition pattern might work better than inheritance for this type of thing. Basically implement your own TreeModel which uses a ListStore as an instance variable. Forward methods for columns within the list store to the instance variable and anything else, use the db: def do_get_value(self, iter, col): if col < self.list_store.get_column_count(): return self.list_store.get_value(iter, col) else: return db access...
When thinking about memory in PyGobject you have to realize you are dealing with the lifecycle of 2 objects - the glib struct and the PyObject wrapper for that glib struct. With iters you have to take in account a third object - the userdata. In normal operations using C your iter is lifecycle managed by allocating on the stack and being reclaimed when the frame is exited. In python we need to instantiate the wrapper which then dynamically allocates the Iter using g_slice or g_malloc. When the wrapper goes out of scope and is garbage collected we free that allocated Iter because we own the ref. What we don't own is the user_data since it is a generic pointer without type information. This is fine in C because lifecycle is within one frame so you are sure that userdata object will not be freed and in most cases I think we are just shoving an integer in there. You can get crazy with them if you want but in that case we can just blame the developer and they own the broken pieces. In the PyGObject case we have to shove object and do complicated lifecycle management ourselves so we are to blame if anything goes wrong. Those objects have refcounts on the python layer and also on the C layer if it is a wrapper. Lets assume we are using pure python to avoid too many complications here. You could shove the object in the iter and not ref it and it will work most of the time. But there will be that time when the user holds onto the iter but the underlying user data get reclaimed. To fix this the old code would ref and leak because you can't generically unref the userdata when the iter's wrapper is destroyed. What I propose you do is fix this by doing the ref but instead of assigning the data to userdata you assign userdata some integer tag that is used as metadata. userdata2 would then contain the python object. On destroy the wrappers for iters would be special cased in the overrides to check the integer value of the pointer in userdata and if it sees the flag PY_DECREF's the object in userdata2. This would generically solve the memory issues as whatever type of python object the userdata2 is pointing to, it will correctly handle the memory lifecycle of its children.
(In reply to comment #10) > When thinking about memory in PyGobject you have to realize you are dealing > with the lifecycle of 2 objects - the glib struct and the PyObject wrapper for > that glib struct. With iters you have to take in account a third object - the > userdata. > > In normal operations using C your iter is lifecycle managed by allocating on > the stack and being reclaimed when the frame is exited. > > In python we need to instantiate the wrapper which then dynamically allocates > the Iter using g_slice or g_malloc. When the wrapper goes out of scope and is > garbage collected we free that allocated Iter because we own the ref. What we > don't own is the user_data since it is a generic pointer without type > information. > > This is fine in C because lifecycle is within one frame so you are sure that > userdata object will not be freed and in most cases I think we are just shoving > an integer in there. You can get crazy with them if you want but in that case > we can just blame the developer and they own the broken pieces. > > In the PyGObject case we have to shove object and do complicated lifecycle > management ourselves so we are to blame if anything goes wrong. Those objects > have refcounts on the python layer and also on the C layer if it is a wrapper. > Lets assume we are using pure python to avoid too many complications here. Thanks for the info. Reading parts of gtktreeview helped me to understand a lot of how the C side of things work with iters. But there is a more generic problem that I think needs to be prefaced here before moving on: https://bugzilla.gnome.org/show_bug.cgi?id=683599 The issue is gpointer struct fields are interpreted as python object pointers in all cases as the default behaivour. I think restricting the interpretation of gpointer struct fields to int values in the bindings is the only way to safely move forward as a default. It still leaves the option to interpret these ints as PyObject pointers using ctypes or other methods in specific struct overrides. But I think this should happen explicity on a per-struct basis in the overrides. Interpretation of the gpointer fields can also be left as a specific implementation detail for the larger system which uses the struct in most cases. For instance, using the id of the object as a key for retrieving the actual object from a dictionary. The dictionary can be stored in the higher level structures which makes use of the struct, e.g. GenericTreeModel being the higher level structure. > You could shove the object in the iter and not ref it and it will work most of > the time. But there will be that time when the user holds onto the iter but > the underlying user data get reclaimed. > > To fix this the old code would ref and leak because you can't generically unref > the userdata when the iter's wrapper is destroyed. This is what the GenericTreeModel leak-references property was all about. Likewise not refing the data and it working most of the time can also be specific to certain view implementations. Looking at gtktreeview and gtkcombobox, I didn't see anything that stuck out that would be unsafe if the pyobject was not refed. But there is a lot code there and I could have easily missed something. > What I propose you do is fix this by doing the ref but instead of assigning the > data to userdata you assign userdata some integer tag that is used as metadata. > userdata2 would then contain the python object. > > On destroy the wrappers for iters would be special cased in the overrides to > check the integer value of the pointer in userdata and if it sees the flag > PY_DECREF's the object in userdata2. This would generically solve the memory > issues as whatever type of python object the userdata2 is pointing to, it will > correctly handle the memory lifecycle of its children. Regardless of the lifetime of the wrapper, the struct it holds will be copied around which would have a pointer to the python user data. In a vfunc callback like "do_get_iter", the function creates a wrapper which manages the lifetime of the struct. This struct is copied into the caller allocated memory upon return marshalling and then both the wrapper and its held struct should be destroyed. So the strategy of decrefing userdata2 if userdata is some key during destroy seems like it would suffer the same problems as not refing the pyobject at all. I stepped through the calling of get_iter on a basic implementation of Gtk.TreeModel to verify this. This was very helpful in understanding how this all works and a few leaks were also reveiled along the way. While I know you wrote a lot of the code involved, I think documenting this for the sake of the conversation here is useful. The following example marshals a new iter from a python vfunc to C back to the calling python: from gi.repository import GObject, Gtk class Model(GObject.GObject, Gtk.TreeModel): def do_get_iter(self, path): it = Gtk.TreeIter() it.user_data = 123 return (True, it) model = Model() it = model.get_iter(0) 1. The last line "model.get_iter(0)" allocs a GtkTreeIter as a caller allocated struct and uses it as the iter out arg for gtk_tree_model_get_iter. 2. gtk_tree_model_get_iter calls the vfunc python implementation do_get_iter. This method creates a python wrapper holding new memory representing a GtkTreeIter struct, the lifetime of which is managed by the wrapper. 3. The pointer address of the 123 PyObject is then assigned and increfed. This will also change depending on: https://bugzilla.gnome.org/show_bug.cgi?id=683599 4. The vfunc returns, and the out argument marshaler (_pygi_closure_set_out_arguments) does a memcpy of the struct created in do_create_iter to the caller allocated struct passed in. After marshalling the iter returned from the python vfunc, the wrapper along with the struct it owns should be freed, but actually it ended up leaking: https://bugzilla.gnome.org/show_bug.cgi?id=686140 5. Finally after returning from gtk_tree_model_get_iter, the out argument marshaler wraps the initial caller allocated struct from step 1 with a PyGBoxed and returns it back to python. Unfortunitally I found this to also be leaking the struct memory (https://bugzilla.gnome.org/show_bug.cgi?id=686205) Given the above, any strategy of using iters to store a pyobject pointer will result in either a reference leak or an unrefed pyobject pointer potentially held in a view (even with the bugs fixed). Of course the pyobject used in the user data would still be valid if it is refed elsewhere in the python app. But I don't think it is something that should be relied upon and the strategy of using a pool ensures it will be available explicitly.
Comment on attachment 222733 [details] [review] Initial attempt supporting GenericTreeModel Patch needs work to implement pooling idea and more unittests.
Created attachment 230148 [details] [review] Work in progress Patch still needs work. Here's what is left: * More unittests. Specifically ensuring exceptions raised in do_* virtual methods behave the same as they did in pygtk.GenericTreeModel. * The handle_exception usage could be removed based on the results of the above. * An example of a tree model with hierarchy. * Real world tests, update an existing pygtk app that uses GenericTreeModel to use pygtkcompat and this GenericTreeModel implementation.
Thanks for the new patch Simon. I'm seeing the following 3 problems: A)
+ Trace 231241
titer = self.create_tree_iter(i)
self.set_user_data(iter, user_data)
self._held_refs[user_data_id] = user_data
self.data[key] = KeyedRef(value, self._remove, key)
self = ref.__new__(type, ob, callback)
If I disable the weak reference dictionary I get: B) File "/home/johan/dev/stoq/master/kiwi/kiwi/ui/lazyobjectlist.py", line 266, in load_items_from_results self.row_changed(path, titer) File "/usr/lib/python2.7/dist-packages/gi/types.py", line 43, in function return info.invoke(*args, **kwargs) TypeError: Expected Gtk.TreePath, but got StructMeta path = (i, ) titer = self.create_tree_iter(i) If I override row_changed() to do nothing I get: C) File "/home/johan/dev/stoq/master/kiwi/kiwi/ui/lazyobjectlist.py", line 110, in on_get_iter return self._iters[path[0]] TypeError: 'TreePath' object does not support indexing self._iters is a list path is the first received in on_get_iter() If I implement GtkTreePath.__getitem__ as: def gtk_tree_path_getitem(path, item): return path.get_indices()[item] There are no more warnings, but nothing is rendered in the GtkTreeView. Source for my custom tree view can be found here: http://bazaar.launchpad.net/~stoq-dev/kiwi/master/view/head:/kiwi/ui/lazyobjectlist.py or via: bzr branch lp:kiwi There are no easy ways to test this outside of my large application, but I can spend some time writing one if that's helpful for you.
Created attachment 232439 [details] [review] Add pygtk compatible GenericTreeModel implementation Thanks for trying this out. A) This issue is a design flaw in the implementation. The new patch now uses ctypes when leak-references=False B) Another patch will follow which overrides TreeModel signal emission methods to accept lists/tuples. C) TreePath.__getitem__ was already added in bug 680353
Created attachment 232440 [details] [review] Add signal emission methods to TreeModel which coerce the path argument
Comment on attachment 232440 [details] [review] Add signal emission methods to TreeModel which coerce the path argument This looks good, thanks! For the record, rows_reordered will not work until fixing bug 684558, but the override doesn't hurt until then. Applied after some unfuzzing for current trunk.
Comment on attachment 232439 [details] [review] Add pygtk compatible GenericTreeModel implementation Simon says that this patch needs some more work and testing, so marking accordingly to clean up the patch review queue.
Okay, I tested this patch, but I'm not quite able to use it yet because I'm also implementing GtkTreeSortable. The problems mentioned above went away though. My current implementation is sort of like this: class Model(gtk.GenericTreeModel, gtk.TreeSortable): def do_get_sort_column_id(self, *args): return self._sort_column_id, self._sort_order def do_set_sort_column_id(self, sort_column_id, sort_order): self._sort_column_id = sort_column_id self._sort_order = sort_order PyGObject segfaults when calling into do_get_sort_column_id(), I can write up a test case for this later. It should probably be returning True as the first time, but even so it crashes, just in a different place.
Thanks for trying this again. When implementing a TreeSortable interface I noticed the following problems: 1. do_get_sort_column_id needs to return three things (bool, column_id, order). There does not seem to be a way around this with pygi due to the vfunc already being wired in (overrides or monkey patches won't help). So you could switch things out as follows in your implementation (if you want to maintain compatibility between pygtk and pygi). def do_get_sort_column_id(self): if 'pygtkcompat' in sys.modules: # Or something similar return self._sort_order >= 0, self._sort_column_id, self._sort_order else: return self._sort_column_id, self._sort_order 2. The Python vfunc for the same method will also smash the callers stack frame. I found and re-opened bug 637832.
Created attachment 236519 [details] [review] Add pygtk compatible GenericTreeModel implementation. Add python implementation of the GenericTreeModel that was available in pygtk. The implementation attempts a better job than the original at ref counting by guaranteeing no leaks upon deletion of the model itself. Or by using the extra "node" argument to the row_deleted signal. Add file list and tree demos making use of GenericTreeModel to gtk-demo. Auto-expand gtk-demo app tree to give a better overview of the demos available. Add support for iterables besides tuple for TreePath creation.
Comment on attachment 236519 [details] [review] Add pygtk compatible GenericTreeModel implementation. Thanks for this! This is quite some extra code to carry, but I'm generally not too concerned about pygtkcompat additions as at some point we can drop it all, and at least the pygtk interface is not ever going to change any more. Some remarks: +++ b/demos/gtk-demo/demos/Tree View/treemodel_filelist.py Please avoid files and directories with spaces, it's going to create confusion or breakage somewhere. --- a/gi/overrides/Gtk.py +++ b/gi/overrides/Gtk.py @@ -1160,7 +1160,7 @@ class TreePath(Gtk.TreePath): def __new__(cls, path=0): if isinstance(path, int): path = str(path) - elif isinstance(path, tuple): + elif not isinstance(path, _basestring): This doesn't seem related to the addition of pygtkcompat, but rather seems to be a separate fix. Can you please commit it separately, with an appropriate description? +# Copyright (C) 2012 Simon Feltman You might want to bump the year? (Two places) Please ensure that this passes distcheck for both PYTHON=python and PYTHON=python3, then please go ahead after above fixes. Thanks!
Comment on attachment 236519 [details] [review] Add pygtk compatible GenericTreeModel implementation. Committed with suggested changes.