After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 693402 - Memory leaks in array marshaling
Memory leaks in array marshaling
Status: RESOLVED FIXED
Product: pygobject
Classification: Bindings
Component: general
3.2.x
Other Linux
: Normal normal
: ---
Assigned To: Martin Pitt
Python bindings maintainers
Depends on: 691501
Blocks: 693111 703662 709700
 
 
Reported: 2013-02-08 07:42 UTC by niktar
Modified: 2013-10-14 22:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
bug demonstration (470 bytes, text/x-java)
2013-02-08 07:42 UTC, niktar
  Details
demonstration without overrides.py (742 bytes, text/plain)
2013-02-27 17:59 UTC, Martin Pitt
  Details
insert_with_valuesv() C test (714 bytes, text/plain)
2013-02-27 19:12 UTC, Martin Pitt
  Details
Fix GValue array marshaling leaks and crash fallout (26.96 KB, patch)
2013-10-06 11:55 UTC, Simon Feltman
none Details | Review
Cleanup per-item array marshaling code for flat arrays (11.08 KB, patch)
2013-10-06 11:55 UTC, Simon Feltman
committed Details | Review
Fix GValue array marshaling leaks and crash fallout (26.29 KB, patch)
2013-10-07 02:39 UTC, Simon Feltman
none Details | Review
Fix to Python marshaling leaks for arrays holding GVariants (4.96 KB, patch)
2013-10-07 05:30 UTC, Simon Feltman
committed Details | Review
Fix GValue array marshaling leaks and crash fallout (26.30 KB, patch)
2013-10-07 09:54 UTC, Simon Feltman
committed Details | Review
Add cleanup_data argument used for Python to C marshaler cleanup (39.38 KB, patch)
2013-10-09 06:54 UTC, Simon Feltman
committed Details | Review
Fix GArray, GList, GSList, and GHashTable marshaling leaks (9.94 KB, patch)
2013-10-11 03:56 UTC, Simon Feltman
none Details | Review
Fix GArray, GList, GSList, and GHashTable marshaling leaks (8.90 KB, patch)
2013-10-11 06:04 UTC, Simon Feltman
committed Details | Review

Description niktar 2013-02-08 07:42:43 UTC
Created attachment 235486 [details]
bug demonstration

After the removing all references to Gtk.ListStore memory is not released.
Demonstration of the bug in the attachment.
Each iteration increases the memory of the process.
Comment 1 Martin Pitt 2013-02-27 17:59:58 UTC
Created attachment 237550 [details]
demonstration without overrides.py

Confirmed. I temporarily disabled the ListStore overrides and changed boo.py to use the plain Gtk API without overrides. With that there is no leak, and the liststore also uses a lot less memory.

So the leak is in the overrides somewhere.
Comment 2 Martin Pitt 2013-02-27 18:47:48 UTC
The disappears if I disable the GObject.Value conversion here:

    def _convert_value(self, column, value):
        '''Convert value to a GObject.Value of the expected type'''

        if isinstance(value, GObject.Value):
            return value
        return GObject.Value(self.get_column_type(column), value)

and instead just "return value", and using the non-overridden append() method. However, creating and destroying many Values by themselves does not leak any memory.

Adding an explicit call to self.clear() into ListStore.__del__ does not help.

So we have a leak with _convert_value(), and another leak with the overridden append(), i. e. when specifying a row.
Comment 3 Martin Pitt 2013-02-27 19:00:12 UTC
I can reproduce the leak when merely calling this as the body of boo.py:

cols = [0, 1, 2, 3, 4]
vals = ['Aaaaaaa', 'Bbbbbbb', 'Ccccccc', 'Ddddddd', 'Eeeeeee']

def gen_store(): 
    st = Gtk.ListStore.new([GObject.TYPE_STRING]*5)
    for r in range(10000):
        st.insert_with_valuesv(-1, cols, vals)
    return st

So st.set_value(i, 0, 'Aaaaaa') etc. doesn't leak at all, but insert_with_valuesv() does leak.
Comment 4 Martin Pitt 2013-02-27 19:12:39 UTC
Created attachment 237558 [details]
insert_with_valuesv() C test

Just to make sure that it's not a bug in GTK itself, I wrote a small test for insert_with_valuesv(), and that does not leak memory.

So I suppose somewhere we don't clean up GValues during marshalling.
Comment 5 Martin Pitt 2013-02-27 20:38:50 UTC
This could be a dupe of bug 691501.
Comment 6 Martin Pitt 2013-02-28 11:37:00 UTC
With bug 691501 fixed, the remaining thing that leaks here is the construction of explicit GObject.Value objects as arguments. When short-circuiting _convert_value() in gi/overrides/Gtk.py to return the original value, the boo.py example by and large stays constant. There is some jitter in the first three iterations, but it's definitively stopping to leak constantly.

Incidentally I noticed that _convert_values() has a huge performance penalty. I filed that as bug 694857.
Comment 7 Martin Pitt 2013-03-01 13:30:40 UTC
I updated the bug title as current master has fixed the other memory leaks which are relevant to this. I'm now trying to find a reduced case for this for valgrinding.


The closest existing test case that we have is this:


    def test_gvalue_flat_array_in(self):
        # the function already asserts the correct values
        GIMarshallingTests.gvalue_flat_array([42, "42", True])

but that doesn't leak.

For the record, test_gi.TestGValue.test_gvalue_flat_array_out leaks, but that's a different bug than the one affecting ListStore:

==6163== 72 bytes in 3 blocks are definitely lost in loss record 2,253 of 4,729
==6163==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6163==    by 0xAC4FCD0: g_malloc (gmem.c:159)
==6163==    by 0xA559F17: _pygi_marshal_to_py_array (pygi-marshal-to-py.c:439)
==6163==    by 0xA54B8C6: _invoke_marshal_out_args (pygi-invoke.c:553)
==6163==    by 0xA54BF0A: pygi_callable_info_invoke (pygi-invoke.c:667)
==6163==    by 0xA54C055: _wrap_g_callable_info_invoke (pygi-invoke.c:685)
==6163==    by 0x4814AD: PyEval_EvalFrameEx (ceval.c:4374)
==6163==    by 0x47A90D: PyEval_EvalCodeEx (ceval.c:3433)
==6163==    by 0x4818A6: PyEval_EvalFrameEx (ceval.c:4160)
==6163==    by 0x481BF1: PyEval_EvalFrameEx (ceval.c:4150)
==6163==    by 0x47A90D: PyEval_EvalCodeEx (ceval.c:3433)
==6163==    by 0x4818A6: PyEval_EvalFrameEx (ceval.c:4160)
Comment 8 Martin Pitt 2013-03-01 14:03:29 UTC
I fixed the previously mentioned leak with (out) arrays in http://git.gnome.org/browse/pygobject/commit/?id=b388c3.
Comment 9 Simon Feltman 2013-06-17 14:32:50 UTC
It is unclear if this the following is related but it looks like ListStore does some amount of caching which in certain cases might look like a memory leak:

https://mail.gnome.org/archives/gtk-app-devel-list/2013-June/msg00104.html
Comment 10 Simon Feltman 2013-10-05 10:55:28 UTC
I've verified that the following still leaks:

## Python
st = Gtk.ListStore.new([GObject.TYPE_STRING] * 5)
cols = [0, 1, 2, 3, 4]
for r in range(100):
    vals = [GObject.Value(GObject.TYPE_STRING, 'Aaaaaaa'),
            GObject.Value(GObject.TYPE_STRING, 'Bbbbbbb'),
            GObject.Value(GObject.TYPE_STRING, 'Ccccccc'),
            GObject.Value(GObject.TYPE_STRING, 'Ddddddd'),
            GObject.Value(GObject.TYPE_STRING, 'Eeeeeee')]
    st.clear()
    st.insert_with_valuesv(-1, cols, vals)
## end

And this crashes:

## Python
st = Gtk.ListStore.new([GObject.TYPE_STRING] * 5)
cols = [0, 1, 2, 3, 4]
vals = [GObject.Value(GObject.TYPE_STRING, 'Aaaaaaa'),
        GObject.Value(GObject.TYPE_STRING, 'Bbbbbbb'),
        GObject.Value(GObject.TYPE_STRING, 'Ccccccc'),
        GObject.Value(GObject.TYPE_STRING, 'Ddddddd'),
        GObject.Value(GObject.TYPE_STRING, 'Eeeeeee')]

for r in range(100):
    st.clear()
    st.insert_with_valuesv(-1, cols, vals)
## end
Comment 11 Simon Feltman 2013-10-05 11:15:07 UTC
Ah, I think the crash is bug 703662
Comment 12 Simon Feltman 2013-10-06 03:40:54 UTC
The problem turns out to be that we use PySequence_GetItem for marshaling the individual py items into the array and never call Py_DECREF on the item (needed for PySequence_GetItem). However, fixing that causes all kinds of havoc because it turns out we rely on the leak to keep the memory around and valid for lists containing GValue and GVariant. In short, this is a large can of worms that will take some work to get right. Fixing any one problem causes another one to popup and it might need to be fixed in combination with bug 703662.

https://git.gnome.org/browse/pygobject/tree/gi/pygi-marshal-from-py.c?id=3.10.0#n791
Comment 13 Simon Feltman 2013-10-06 11:55:40 UTC
Created attachment 256561 [details] [review]
Fix GValue array marshaling leaks and crash fallout

* Decrement references for results of PySequence_GetItem. There were a few
places we were not decrementing the Python reference, leaking the value.
* Add tracking of Python arguments with recursive marshaling cleanup. This
allows arrays of GValues which have been coerced from Python types to be
properly free'd (also fixes bug 703662).
* Use g_variant_ref for variant arguments marked as transfer everything.
This fixes double free's caused by the decrementing of PySequence_GetItem
results.
Comment 14 Simon Feltman 2013-10-06 11:55:42 UTC
Created attachment 256562 [details] [review]
Cleanup per-item array marshaling code for flat arrays

Add an early per-item check which tests if the item being marshaled is a
pointer and simply copies the pointer into the array. This takes care of the
GdkAtom and GVariant special cases because these items are always reported
as pointers.
Fix error condition cleanup code when an item fails marshaling in the middle
of an array.
Comment 15 Simon Feltman 2013-10-07 02:39:18 UTC
Created attachment 256595 [details] [review]
Fix GValue array marshaling leaks and crash fallout

Cleaned up un-needed auxilary changes.
Comment 16 Simon Feltman 2013-10-07 03:34:50 UTC
Notes on the main patch:

I was not able get valgrind to recognize the memory leak or fix but it is very obvious with a test script (and in the code). It is possible this is due to something like what is mentioned in https://wiki.gnome.org/Valgrind regarding "resident-modules" and it looks like GI uses GModule. I will retest with "resident-modules" set.

valgrind actually reports a new leak with the patch regarding arrays of GVariant:

86 bytes, 1 blocks
malloc
g_malloc
g_slice_alloc
g_variant_alloc
g_variant_new_from_bytes
g_variant_new_from_trusted
g_variant_new_string
ffi_call_unix64
ffi_call
g_callable_info_invoke
g_function_info_invoke
_invoke_callable

This is coming from test_gi.TestArrayGVariant.test_array_gvariant_full_in (gi_marshalling_tests_array_gvariant_full_in). This is a side effect of our out array marshaling leaking because the test method returns one of the variants passed in as new array without unrefing it. Out marshaling of arrays containing GVariant show obvious leaks in the other tests which exist before and after the patch.
Comment 17 Simon Feltman 2013-10-07 05:30:40 UTC
Created attachment 256596 [details] [review]
Fix to Python marshaling leaks for arrays holding GVariants

Add early check for array items holding pointers and simply assign the
pointer to GIArgument.v_pointer prior giving it to the per-item marshaler.
This simplifies marshaling and fixes leaks regarding arrays of GVariants by
removing the unneeded g_variant_ref_sink (variants are always pointers).
Conditionalize the use of g_variant_ref_sink based on transfer mode in the
per-item marshaler. This fixes a reference leak where we are given ownership
of the variant (transfer full) but added a new ref anyway.

Notes:
I've verified this fixes both the "new" leak which was showing up as a side 
effect of from Python array marshaling fixes in the prior patch. This also 
fixes older to Python variant leak that was showing up prior to all these 
fixes. Still seems like some amount of cleanup can be done in the to Python 
array marshaling for structs and boxed types in general though.
Comment 18 Simon Feltman 2013-10-07 09:54:46 UTC
Created attachment 256607 [details] [review]
Fix GValue array marshaling leaks and crash fallout

Fix missing assignement in array cleanup.
Comment 19 Martin Pitt 2013-10-07 10:17:31 UTC
Review of attachment 256562 [details] [review]:

Thanks for this! That makes the code simpler, too.
Comment 20 Martin Pitt 2013-10-07 10:19:22 UTC
Review of attachment 256596 [details] [review]:

LGTM.
Comment 21 Martin Pitt 2013-10-07 10:23:20 UTC
Review of attachment 256607 [details] [review]:

Thanks! Unless you feel really sure about those, I think we should perhaps let these patches mature in 3.11.1/trunk for a bit, and cherry-pick them into 3.10.2 perhaps?
Comment 22 Simon Feltman 2013-10-08 01:20:33 UTC
Attachment 256562 [details] pushed as c9580ce - Cleanup per-item array marshaling code for flat arrays
Attachment 256596 [details] pushed as 31263ac - Fix to Python marshaling leaks for arrays holding GVariants
Attachment 256607 [details] pushed as 4623caa - Fix GValue array marshaling leaks and crash fallout

Leaving this open because there are still leaks with array inout args.
Comment 23 Simon Feltman 2013-10-08 01:39:50 UTC
(In reply to comment #21)
> Review of attachment 256607 [details] [review]:
> 
> Thanks! Unless you feel really sure about those, I think we should perhaps let
> these patches mature in 3.11.1/trunk for a bit, and cherry-pick them into
> 3.10.2 perhaps?

I'm fairly confident with the larger patch "Fix GValue array marshaling leaks and crash fallout", because it is actually quite simple. It's just big because of the function signature change for marshaling cleanup functions. The rest of it is ensuring proper reference counting for the PyObjects and GVariant reference. The passing of the py_arg to cleanup functions is only used in one case:
https://git.gnome.org/browse/pygobject/tree/gi/pygi-marshal-cleanup.c?id=3.10.0#n262

And this case was incorrectly used prior to the patch causing the crash in bug 703662.

It is a bit harder to consolidate the side effects of the array cleanup and leak fixes and hence a more possibility of regression. We have very thorough tests for a lot of cases here with the exception of tests for marshaling flat arrays of boxed, and pointer arrays of simple structs, from C to Python. I'm not sure if those are real world cases but our code paths allow for it and these patches definitely affect this.
Comment 24 Simon Feltman 2013-10-09 05:40:12 UTC
Pushed: https://git.gnome.org/browse/pygobject/commit/?id=d866d422

Which cleans up some of the following valgrind output:

---
234 bytes, 3 blocks
malloc
g_malloc
g_slice_alloc
g_array_sized_new
_pygi_marshal_from_py_array
_invoke_marshal_in_args
pygi_callable_info_invoke
_wrap_g_callable_info_invoke
_callable_info_call
_function_info_call
PyObject_Call
do_call

Is now:
156 bytes, 2 blocks

---
498 bytes, 3 blocks
malloc
g_malloc
g_slice_alloc
g_ptr_array_sized_new
g_ptr_array_new
_pygi_marshal_from_py_array
_invoke_marshal_in_args
pygi_callable_info_invoke
_wrap_g_callable_info_invoke
_callable_info_call
_function_info_call
PyObject_Call

Is now:
332 bytes, 2 blocks
Comment 25 Simon Feltman 2013-10-09 06:54:58 UTC
Created attachment 256786 [details] [review]
Add cleanup_data argument used for Python to C marshaler cleanup

Add a new output argument to all from_py marshalers which is used for
keeping track of marshaling data that later needs cleanup. Previously most
marshalers would rely on the GIArgument->v_pointer as the means for data
cleanup. However, this pointer would get clobbered in the case of
bi-directional arguments (inout) and the memory lost.
Use the new cleanup_data for storing temporarily wrapped C arrays so we
don't need to re-calculate the length argument during cleanup.

Additionally delay the from_py marshaling cleanup function until after
_invoke_marshal_out_args is called. This gives inout arguments which don't
modify the pointer sufficient time to exist until they marshaled back to
Python (gi_marshalling_tests_gvalue_inout).

Notes:
This ends up clearing out a bunch of various memory leaks due to inout 
arguments always having a chance to be cleaned up properly.

Testing with test_gi shows the following leak fixes:
_pygi_marshal_from_py_ghash (801 bytes, 3 blocks) to (267 bytes, 1 blocks) 
_pygi_marshal_from_py_utf8 (15 bytes, 1 blocks) is now gone
_pygi_marshal_from_py_glist (234 bytes, 3 blocks) to (78 bytes, 1 blocks) 
_pygi_marshal_from_py_gvalue (27 bytes, 1 blocks) is now gone
_pygi_marshal_from_py_gslist (162 bytes, 3 blocks) to (54 bytes, 1 blocks)
Comment 26 Simon Feltman 2013-10-11 03:56:51 UTC
Created attachment 256967 [details] [review]
Fix GArray, GList, GSList, and GHashTable marshaling leaks

Remove calling of cleanup code for transfer-everything modes by ensuring
cleanup_data is set to NULL in from_py marshalers. Use array and hash
table ref/unref functions for container transfer mode to ensure we have a
valid container ref after invoke and during from_py cleanup of contents.
Rework restrictions with to_py marshaling cleanup so we always unref the
container for transfer-everything and transfer-container modes.

Notes:
This cleans up quite a few leaks in test_gi:

_pygi_marshal_from_py_ghash (38 bytes, 16 blocks) to (19 bytes, 8 blocks)
ghashtable_utf8_container_inout (248 bytes, 1 blocks) gone
ghashtable_utf8_container_out (248 bytes, 1 blocks) gone
ghashtable_utf8_container_return (248 bytes, 1 blocks) gone
glist_utf8_container_out (72 bytes, 1 blocks) gone
glist_utf8_container_return (72 bytes, 1 blocks) gone
glist_utf8_container_inout (96 bytes, 1 blocks) gone
gslist_utf8_container_inout (64 bytes, 1 blocks) gone
gslist_utf8_container_out (48 bytes, 1 blocks) gone
gslist_utf8_container_return (48 bytes, 1 blocks) gone

We are left with a few loose fragments going through standard invoke 
marshaling paths. After that, the remaining leaks are coming from closure 
args and property get/sets which will be a whole different can of worms...
Comment 27 Simon Feltman 2013-10-11 05:55:19 UTC
I've created bug 709880 for tracking the remaining element leakage with transfer everything of GArray, GPtrArray, and GHashTable, since it is a somewhat separate issue and will require new architecture. If the remaining patches here are okay, I think we can call this one fixed!
Comment 28 Simon Feltman 2013-10-11 06:04:14 UTC
Created attachment 256968 [details] [review]
Fix GArray, GList, GSList, and GHashTable marshaling leaks

Wrong patch submitted before, here is the full version.
Comment 29 Martin Pitt 2013-10-14 08:19:39 UTC
Comment on attachment 256786 [details] [review]
Add cleanup_data argument used for Python to C marshaler cleanup

Good one. I remember stumbling over this when trying to fix some memleaks. Thanks!
Comment 30 Simon Feltman 2013-10-14 22:10:12 UTC
Attachment 256786 [details] pushed as 7407367 - Add cleanup_data argument used for Python to C marshaler cleanup
Attachment 256968 [details] pushed as fe217e0 - Fix GArray, GList, GSList, and GHashTable marshaling leaks