After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 696757 - Do not use dynamic keys in g_object_set_data()
Do not use dynamic keys in g_object_set_data()
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: general
3.6.x (obsolete)
Other Linux
: Normal normal
: ---
Assigned To: Evolution Shell Maintainers Team
Evolution QA team
Depends on: 697114
Blocks:
 
 
Reported: 2013-03-28 09:24 UTC by Patrick Ohly
Modified: 2013-04-12 12:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed eds patch (3.18 KB, patch)
2013-04-04 14:52 UTC, Milan Crha
none Details | Review
proposed eds patch ][ (3.30 KB, patch)
2013-04-05 08:50 UTC, Milan Crha
committed Details | Review

Description Patrick Ohly 2013-03-28 09:24:47 UTC
This leak showed up when switching SyncEvolution nightly testing from something based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly after 3.6.4). Not sure whether really any change inbetween caused this or whether it simply didn't show up before.

Valgrind reports:

==30822== 32,768 bytes in 1 blocks are definitely lost in loss record 4,033 of 4,039
==30822==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
==30822==    by 0x8AF8D40: g_malloc (gmem.c:159)
==30822==    by 0x8B01AF5: g_quark_from_string (gquark.c:278)
==30822==    by 0x8673FFC: g_object_set_data (gobject.c:3318)
==30822==    by 0x735F1C9: e_gdbus_proxy_call_sync (e-gdbus-templates.c:1524)
==30822==    by 0x73627B7: e_gdbus_proxy_call_sync_boolean__void (e-gdbus-templates.c:1724)
==30822==    by 0x735A350: e_client_proxy_call_sync_boolean__void (e-client.c:2622)
==30822==    by 0x7356841: e_client_open_sync (e-client.c:1383)

And:


==30822== 49,152 bytes in 1 blocks are definitely lost in loss record 4,035 of 4,039
==30822==    at 0x4C28BED: malloc (vg_replace_malloc.c:263)
==30822==    by 0x8AF8D40: g_malloc (gmem.c:159)
==30822==    by 0x8B01AF5: g_quark_from_string (gquark.c:278)
==30822==    by 0x8673FFC: g_object_set_data (gobject.c:3318)
==30822==    by 0x735F1C9: e_gdbus_proxy_call_sync (e-gdbus-templates.c:1524)
==30822==    by 0x7362B68: e_gdbus_proxy_call_sync_string__string (e-gdbus-templates.c:1811)
==30822==    by 0x5A1C7FA: e_gdbus_cal_call_get_view_sync (e-gdbus-cal.c:1131)
==30822==    by 0x735B1FB: e_client_proxy_call_sync_string__string (e-client.c:2725)
==30822==    by 0x5A00956: e_cal_client_get_view_sync (e-cal-client.c:4706)

The reason is a known leak in quark_new():
      /* This leaks the old quarks array. Its unfortunate, but it allows
       * us to do lockless lookup of the arrays, and there shouldn't be that
       * many quarks in an app
       */
      g_atomic_pointer_set (&quarks, quarks_new);

However, I think the real reason is that EDS requests quarks for an ever increasing number of strings:

e_gdbus_proxy_call_sync:
	sync_opid = g_atomic_int_add (&sync_op_counter, 1);
	sync_opid_ident = g_strdup_printf ("EGdbusTemplates-SyncOp-%d", sync_opid);
	g_object_set_data (G_OBJECT (proxy), sync_opid_ident, &sync_data);

This sync_opid_ident is the string which gets passed to q_quark_new_from_string(), where it gets copied (and never freed!) and eventually causes the quarks array to get reallocated (the leak above).

Even without that leak, memory will get exhausted eventually.

The same code still exists in master.
Comment 1 Patrick Ohly 2013-03-28 09:27:55 UTC
(In reply to comment #0)
> This leak showed up when switching SyncEvolution nightly testing from something
> based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly
> after 3.6.4). Not sure whether really any change inbetween caused this or
> whether it simply didn't show up before.

"git annotate" points towards:

commit 7c52fbd81093c1264e3d8aa6cdcf5c8bdc7b1772
Author: Milan Crha <mcrha@redhat.com>
Date:   Fri Jan 25 17:27:55 2013 +0100

    EGdbusTemplates: Address crash on operation cancel
    
    In situations when a synchronous operation was cancelled, but the response
    was already piled in main context the client could receive two replies, one
    from the reply, the other from the cancelled operation, effectively accessing
    invalid memory in the second response. This may address also other similar
    situations caused by cancelled operations.


7c52fbd8        (Milan Crha     2013-01-25 17:27:55 +0100       1522)   sync_opid = g_atomic_int_add (&sync_op_counte
7c52fbd8        (Milan Crha     2013-01-25 17:27:55 +0100       1523)   sync_opid_ident = g_strdup_printf ("EGdbusTem
7c52fbd8        (Milan Crha     2013-01-25 17:27:55 +0100       1524)   g_object_set_data (G_OBJECT (proxy), sync_opi
Comment 2 Patrick Ohly 2013-03-28 09:28:24 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > This leak showed up when switching SyncEvolution nightly testing from something
> > based on 4250cb (shortly before 3.6.3) to something based on 7babb (shortly
> > after 3.6.4). Not sure whether really any change inbetween caused this or
> > whether it simply didn't show up before.
> 
> "git annotate" points towards:
> 
> commit 7c52fbd81093c1264e3d8aa6cdcf5c8bdc7b1772
> Author: Milan Crha <mcrha@redhat.com>
> Date:   Fri Jan 25 17:27:55 2013 +0100

... and that was indeed added after the 3.6.3 release.
Comment 3 Patrick Ohly 2013-03-28 14:55:30 UTC
The 7c52f has another downside: after merging the 3.6.x branch into openismus-work, I got EDB_OPENING_ERROR = "Cannot process, book backend is
 opening" errors from e_client_open when called directly after e_book_client_new. It's timing dependent and may also depend on another client talking to the same address book at the same time.

I could reproduce the problem reliably in SyncEvolution's "testpim.py TestContacts.testFilterStartup" test. After reverting 7c52f the test passes again.

I am unsure how this patch can have this effect; I'm merely reporting my observation here so that wiser men can figure that out ;-}

One question, though: is this EDB_OPENING_ERROR something that a user of e_client_open() is expected to handle, for example by retrying? That seems very counter-intuitive and I doubt that any client handles the error.

So the real culprit might not be 7c52f at all; perhaps it just makes the race condition more likely to go wrong.
Comment 4 Patrick Ohly 2013-03-28 20:16:13 UTC
(In reply to comment #3)
> The 7c52f has another downside: after merging the 3.6.x branch into
> openismus-work, I got EDB_OPENING_ERROR = "Cannot process, book backend is
>  opening" errors from e_client_open when called directly after
> e_book_client_new. It's timing dependent and may also depend on another client
> talking to the same address book at the same time.
> 
> I could reproduce the problem reliably in SyncEvolution's "testpim.py
> TestContacts.testFilterStartup" test. After reverting 7c52f the test passes
> again.

The problem in DRA occurred again later, so it is not certain whether reverting 7c52f really helped:

https://nightly.syncevolution.org/2013-03-28-09-44_eds_client_amd64_dbus_eds/edsclient-amd64/8-pim/Contacts_FilterStartupRefineSmart.log.html

[WARNING syncevo-dbus-server 00:00:18] libedata-book: (e-book-backend-sqlitedb.c:3702):book_backend_sqlitedb_search_query: runtime check failed: (vcard_data == NULL)
[WARNING syncevo-dbus-server 00:00:18] libebookbackend: Failed to fetch contact ids: database disk image is malformed
[DEBUG syncevo-dbus-server 00:00:18] edfs pim-manager-test-dbus-foo: reading failed: Cannot get contact list: database disk image is malformed
Comment 5 Milan Crha 2013-04-02 10:07:46 UTC
As we spoke on IRC, the commit was meant to fix cancellations, which could crash application. These two side-effects are unexpected.

The memory leak is also caused by GObject, not by the newly allocated string in eds (which is later freed) being used for an "internal" data value name on a GObject descendant, which we agreed on too.

I was told that Tristan is looking for a reproducer with our eds tests, thus let's wait whether he'll find it.
Comment 6 Patrick Ohly 2013-04-02 10:40:06 UTC
(In reply to comment #5)
> As we spoke on IRC, the commit was meant to fix cancellations, which could
> crash application. These two side-effects are unexpected.

Agreed. Let's ignore the problem with open failing with E_CLIENT_ERROR_BUSY (the equivalent of EDB_OPENING_ERROR = E_DATA_BOOK_STATUS_BUSY) in this bug report.

> The memory leak is also caused by GObject, not by the newly allocated string in
> eds (which is later freed) being used for an "internal" data value name on a
> GObject descendant, which we agreed on too.

However, unless someone convinces the glib devs to change their implementation *and* EDS can depend on a glib with that change, a fix needs to go into EDS to work with a glib that works as described above.

> I was told that Tristan is looking for a reproducer with our eds tests, thus
> let's wait whether he'll find it.

Fixing the leak is independent of that.
Comment 7 Milan Crha 2013-04-04 14:52:55 UTC
Created attachment 240619 [details] [review]
proposed eds patch

for evolution-data-server;

I could not think of anything better than replacing the dynamically created key with a static key which is holding a GHashTable of sync operation IDs. It also does less memory allocations on eds side (I've no idea how on the GHashTable side).

Could you try with this patch, please? If you find it working, I'll commit to gnome-3-6, gnome-3-8 and master branches.
Comment 8 Patrick Ohly 2013-04-05 07:36:00 UTC
(In reply to comment #7)
> Created an attachment (id=240619) [details] [review]
> proposed eds patch
> 
> for evolution-data-server;
> 
> I could not think of anything better than replacing the dynamically created key
> with a static key which is holding a GHashTable of sync operation IDs. It also
> does less memory allocations on eds side (I've no idea how on the GHashTable
> side).
> 
> Could you try with this patch, please?

I've applied it and run tests at the moment.

From looking at the patch I wonder: you protect the creation of the GHashTable with a mutex, but not the actual insert/lookup/remove of values inside it. If one needs to be thread-safe, doesn't the other need the same protection? Or is GHashTable itself thread-safe?
Comment 9 Patrick Ohly 2013-04-05 08:04:19 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Could you try with this patch, please?
> 
> I've applied it and run tests at the moment.

Passes my tests without leaks.

> From looking at the patch I wonder: you protect the creation of the GHashTable
> with a mutex, but not the actual insert/lookup/remove of values inside it. If
> one needs to be thread-safe, doesn't the other need the same protection? Or is
> GHashTable itself thread-safe?

I still wonder about that.
Comment 10 Milan Crha 2013-04-05 08:43:20 UTC
(In reply to comment #8)
> From looking at the patch I wonder: you protect the creation of the GHashTable
> with a mutex, but not the actual insert/lookup/remove of values inside it. If
> one needs to be thread-safe, doesn't the other need the same protection? Or is
> GHashTable itself thread-safe?

My idea behind the lock is to not create two hash tables if there is one already. The global lock makes sure that one GDBusProxy descendant doesn't get created two hash tables if there will be two concurrent threads with sync calls on one proxy object. With a "good" thread interleaving one would create a hash, push it into the object and immediately after that the second call in the second thread would overwrite the hash table in the proxy, together with a free of the previous one.

I do not expect GHashTable being thread safe, and I didn't consider the manipulation of the items in it as an issue, but maybe you are right. I'm updating the patch.
Comment 11 Milan Crha 2013-04-05 08:50:20 UTC
Created attachment 240706 [details] [review]
proposed eds patch ][

for evolution-data-server;

With more added locking around sync_data_hash table.
Comment 12 Patrick Ohly 2013-04-12 06:40:18 UTC
(In reply to comment #11)
> Created an attachment (id=240706) [details] [review]
> proposed eds patch ][
> 
> for evolution-data-server;
> 
> With more added locking around sync_data_hash table.

I've tested this with EDS 3.8 and it works.

I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or is that something else?
Comment 13 Milan Crha 2013-04-12 07:35:35 UTC
(In reply to comment #12)
> I've tested this with EDS 3.8 and it works.

Thanks, I'll commit it shortly.

> I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in
> 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or
> is that something else?

You are right, the code is obsolete in 3.8.0+, I guess Matthew left it there only in case of it being used by someone else (unlikely). The files might be dropped in 3.9.x, from my point of view, but better to ensure with Matthew.
Comment 14 Milan Crha 2013-04-12 08:42:29 UTC
Created commit 5da2c8f in eds master (3.9.1+)
Created commit 29694b0 in eds gnome-3-8 (3.8.1+)
Created commit 319d8d9 in eds gnome-3-6 (long after 3.6.4 release)
Comment 15 Patrick Ohly 2013-04-12 09:06:24 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > I've tested this with EDS 3.8 and it works.
> 
> Thanks, I'll commit it shortly.
> 
> > I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in
> > 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or
> > is that something else?
> 
> You are right, the code is obsolete in 3.8.0+,

Then my testing of the revised patch with 3.8 was not very useful... but the patched looked sane and the earlier version worked in 3.6.
Comment 16 Matthew Barnes 2013-04-12 12:36:46 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > I'm just curious - why is libedataserver/e-gdbus-templates.c still relevant in
> > 3.8? Didn't that version switch to automatically generated D-Bus bindings? Or
> > is that something else?
> 
> You are right, the code is obsolete in 3.8.0+, I guess Matthew left it there
> only in case of it being used by someone else (unlikely). The files might be
> dropped in 3.9.x, from my point of view, but better to ensure with Matthew.

I think there may be some miscellaneous bits in e-gdbus-templates.c still being used by view objects.  The view objects are kind of a separate project which I hope to finish by 3.10.  I suppose we could start stripping out the unused bits from e-gdbus-templates.c though.