GNOME Bugzilla – Bug 696757
Do not use dynamic keys in g_object_set_data()
Last modified: 2013-04-12 12:36:46 UTC
This leak showed up when switching SyncEvolution nightly testing from something based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly after 3.6.4). Not sure whether really any change inbetween caused this or whether it simply didn't show up before. Valgrind reports: ==30822== 32,768 bytes in 1 blocks are definitely lost in loss record 4,033 of 4,039 ==30822== at 0x4C28BED: malloc (vg_replace_malloc.c:263) ==30822== by 0x8AF8D40: g_malloc (gmem.c:159) ==30822== by 0x8B01AF5: g_quark_from_string (gquark.c:278) ==30822== by 0x8673FFC: g_object_set_data (gobject.c:3318) ==30822== by 0x735F1C9: e_gdbus_proxy_call_sync (e-gdbus-templates.c:1524) ==30822== by 0x73627B7: e_gdbus_proxy_call_sync_boolean__void (e-gdbus-templates.c:1724) ==30822== by 0x735A350: e_client_proxy_call_sync_boolean__void (e-client.c:2622) ==30822== by 0x7356841: e_client_open_sync (e-client.c:1383) And: ==30822== 49,152 bytes in 1 blocks are definitely lost in loss record 4,035 of 4,039 ==30822== at 0x4C28BED: malloc (vg_replace_malloc.c:263) ==30822== by 0x8AF8D40: g_malloc (gmem.c:159) ==30822== by 0x8B01AF5: g_quark_from_string (gquark.c:278) ==30822== by 0x8673FFC: g_object_set_data (gobject.c:3318) ==30822== by 0x735F1C9: e_gdbus_proxy_call_sync (e-gdbus-templates.c:1524) ==30822== by 0x7362B68: e_gdbus_proxy_call_sync_string__string (e-gdbus-templates.c:1811) ==30822== by 0x5A1C7FA: e_gdbus_cal_call_get_view_sync (e-gdbus-cal.c:1131) ==30822== by 0x735B1FB: e_client_proxy_call_sync_string__string (e-client.c:2725) ==30822== by 0x5A00956: e_cal_client_get_view_sync (e-cal-client.c:4706) The reason is a known leak in quark_new(): /* This leaks the old quarks array. Its unfortunate, but it allows * us to do lockless lookup of the arrays, and there shouldn't be that * many quarks in an app */ g_atomic_pointer_set (&quarks, quarks_new); However, I think the real reason is that EDS requests quarks for an ever increasing number of strings: e_gdbus_proxy_call_sync: sync_opid = g_atomic_int_add (&sync_op_counter, 1); sync_opid_ident = g_strdup_printf ("EGdbusTemplates-SyncOp-%d", sync_opid); g_object_set_data (G_OBJECT (proxy), sync_opid_ident, &sync_data); This sync_opid_ident is the string which gets passed to q_quark_new_from_string(), where it gets copied (and never freed!) and eventually causes the quarks array to get reallocated (the leak above). Even without that leak, memory will get exhausted eventually. The same code still exists in master.
(In reply to comment #0) > This leak showed up when switching SyncEvolution nightly testing from something > based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly > after 3.6.4). Not sure whether really any change inbetween caused this or > whether it simply didn't show up before. "git annotate" points towards: commit 7c52fbd81093c1264e3d8aa6cdcf5c8bdc7b1772 Author: Milan Crha <mcrha@redhat.com> Date: Fri Jan 25 17:27:55 2013 +0100 EGdbusTemplates: Address crash on operation cancel In situations when a synchronous operation was cancelled, but the response was already piled in main context the client could receive two replies, one from the reply, the other from the cancelled operation, effectively accessing invalid memory in the second response. This may address also other similar situations caused by cancelled operations. 7c52fbd8 (Milan Crha 2013-01-25 17:27:55 +0100 1522) sync_opid = g_atomic_int_add (&sync_op_counte 7c52fbd8 (Milan Crha 2013-01-25 17:27:55 +0100 1523) sync_opid_ident = g_strdup_printf ("EGdbusTem 7c52fbd8 (Milan Crha 2013-01-25 17:27:55 +0100 1524) g_object_set_data (G_OBJECT (proxy), sync_opi
(In reply to comment #1) > (In reply to comment #0) > > This leak showed up when switching SyncEvolution nightly testing from something > > based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly > > after 3.6.4). Not sure whether really any change inbetween caused this or > > whether it simply didn't show up before. > > "git annotate" points towards: > > commit 7c52fbd81093c1264e3d8aa6cdcf5c8bdc7b1772 > Author: Milan Crha <mcrha@redhat.com> > Date: Fri Jan 25 17:27:55 2013 +0100 ... and that was indeed added after the 3.6.3 release.
The 7c52f has another downside: after merging the 3.6.x branch into openismus-work, I got EDB_OPENING_ERROR = "Cannot process, book backend is opening" errors from e_client_open when called directly after e_book_client_new. It's timing dependent and may also depend on another client talking to the same address book at the same time. I could reproduce the problem reliably in SyncEvolution's "testpim.py TestContacts.testFilterStartup" test. After reverting 7c52f the test passes again. I am unsure how this patch can have this effect; I'm merely reporting my observation here so that wiser men can figure that out ;-} One question, though: is this EDB_OPENING_ERROR something that a user of e_client_open() is expected to handle, for example by retrying? That seems very counter-intuitive and I doubt that any client handles the error. So the real culprit might not be 7c52f at all; perhaps it just makes the race condition more likely to go wrong.
(In reply to comment #3) > The 7c52f has another downside: after merging the 3.6.x branch into > openismus-work, I got EDB_OPENING_ERROR = "Cannot process, book backend is > opening" errors from e_client_open when called directly after > e_book_client_new. It's timing dependent and may also depend on another client > talking to the same address book at the same time. > > I could reproduce the problem reliably in SyncEvolution's "testpim.py > TestContacts.testFilterStartup" test. After reverting 7c52f the test passes > again. The problem in DRA occurred again later, so it is not certain whether reverting 7c52f really helped: https://nightly.syncevolution.org/2013-03-28-09-44_eds_client_amd64_dbus_eds/edsclient-amd64/8-pim/Contacts_FilterStartupRefineSmart.log.html [WARNING syncevo-dbus-server 00:00:18] libedata-book: (e-book-backend-sqlitedb.c:3702):book_backend_sqlitedb_search_query: runtime check failed: (vcard_data == NULL) [WARNING syncevo-dbus-server 00:00:18] libebookbackend: Failed to fetch contact ids: database disk image is malformed [DEBUG syncevo-dbus-server 00:00:18] edfs pim-manager-test-dbus-foo: reading failed: Cannot get contact list: database disk image is malformed
As we spoke on IRC, the commit was meant to fix cancellations, which could crash application. These two side-effects are unexpected. The memory leak is also caused by GObject, not by the newly allocated string in eds (which is later freed) being used for an "internal" data value name on a GObject descendant, which we agreed on too. I was told that Tristan is looking for a reproducer with our eds tests, thus let's wait whether he'll find it.
(In reply to comment #5) > As we spoke on IRC, the commit was meant to fix cancellations, which could > crash application. These two side-effects are unexpected. Agreed. Let's ignore the problem with open failing with E_CLIENT_ERROR_BUSY (the equivalent of EDB_OPENING_ERROR = E_DATA_BOOK_STATUS_BUSY) in this bug report. > The memory leak is also caused by GObject, not by the newly allocated string in > eds (which is later freed) being used for an "internal" data value name on a > GObject descendant, which we agreed on too. However, unless someone convinces the glib devs to change their implementation *and* EDS can depend on a glib with that change, a fix needs to go into EDS to work with a glib that works as described above. > I was told that Tristan is looking for a reproducer with our eds tests, thus > let's wait whether he'll find it. Fixing the leak is independent of that.
Created attachment 240619 [details] [review] proposed eds patch for evolution-data-server; I could not think of anything better than replacing the dynamically created key with a static key which is holding a GHashTable of sync operation IDs. It also does less memory allocations on eds side (I've no idea how on the GHashTable side). Could you try with this patch, please? If you find it working, I'll commit to gnome-3-6, gnome-3-8 and master branches.
(In reply to comment #7) > Created an attachment (id=240619) [details] [review] > proposed eds patch > > for evolution-data-server; > > I could not think of anything better than replacing the dynamically created key > with a static key which is holding a GHashTable of sync operation IDs. It also > does less memory allocations on eds side (I've no idea how on the GHashTable > side). > > Could you try with this patch, please? I've applied it and run tests at the moment. From looking at the patch I wonder: you protect the creation of the GHashTable with a mutex, but not the actual insert/lookup/remove of values inside it. If one needs to be thread-safe, doesn't the other need the same protection? Or is GHashTable itself thread-safe?
(In reply to comment #8) > (In reply to comment #7) > > Could you try with this patch, please? > > I've applied it and run tests at the moment. Passes my tests without leaks. > From looking at the patch I wonder: you protect the creation of the GHashTable > with a mutex, but not the actual insert/lookup/remove of values inside it. If > one needs to be thread-safe, doesn't the other need the same protection? Or is > GHashTable itself thread-safe? I still wonder about that.
(In reply to comment #8) > From looking at the patch I wonder: you protect the creation of the GHashTable > with a mutex, but not the actual insert/lookup/remove of values inside it. If > one needs to be thread-safe, doesn't the other need the same protection? Or is > GHashTable itself thread-safe? My idea behind the lock is to not create two hash tables if there is one already. The global lock makes sure that one GDBusProxy descendant doesn't get created two hash tables if there will be two concurrent threads with sync calls on one proxy object. With a "good" thread interleaving one would create a hash, push it into the object and immediately after that the second call in the second thread would overwrite the hash table in the proxy, together with a free of the previous one. I do not expect GHashTable being thread safe, and I didn't consider the manipulation of the items in it as an issue, but maybe you are right. I'm updating the patch.
Created attachment 240706 [details] [review] proposed eds patch ][ for evolution-data-server; With more added locking around sync_data_hash table.
(In reply to comment #11) > Created an attachment (id=240706) [details] [review] > proposed eds patch ][ > > for evolution-data-server; > > With more added locking around sync_data_hash table. I've tested this with EDS 3.8 and it works. I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or is that something else?
(In reply to comment #12) > I've tested this with EDS 3.8 and it works. Thanks, I'll commit it shortly. > I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in > 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or > is that something else? You are right, the code is obsolete in 3.8.0+, I guess Matthew left it there only in case of it being used by someone else (unlikely). The files might be dropped in 3.9.x, from my point of view, but better to ensure with Matthew.
Created commit 5da2c8f in eds master (3.9.1+) Created commit 29694b0 in eds gnome-3-8 (3.8.1+) Created commit 319d8d9 in eds gnome-3-6 (long after 3.6.4 release)
(In reply to comment #13) > (In reply to comment #12) > > I've tested this with EDS 3.8 and it works. > > Thanks, I'll commit it shortly. > > > I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in > > 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or > > is that something else? > > You are right, the code is obsolete in 3.8.0+, Then my testing of the revised patch with 3.8 was not very useful... but the patched looked sane and the earlier version worked in 3.6.
(In reply to comment #13) > (In reply to comment #12) > > I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in > > 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or > > is that something else? > > You are right, the code is obsolete in 3.8.0+, I guess Matthew left it there > only in case of it being used by someone else (unlikely). The files might be > dropped in 3.9.x, from my point of view, but better to ensure with Matthew. I think there may be some miscellaneous bits in e-gdbus-templates.c still being used by view objects. The view objects are kind of a separate project which I hope to finish by 3.10. I suppose we could start stripping out the unused bits from e-gdbus-templates.c though.