After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 695960 - Leaked GSources in user's GMainContext after EBookClient finalizes
Leaked GSources in user's GMainContext after EBookClient finalizes
Status: RESOLVED NOTABUG
Product: evolution-data-server
Classification: Platform
Component: Contacts
3.8.x (obsolete)
Other Linux
: Normal critical
: ---
Assigned To: evolution-addressbook-maintainers
Evolution QA team
Depends on:
Blocks:
 
 
Reported: 2013-03-16 10:43 UTC by Tristan Van Berkom
Modified: 2013-09-14 16:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
stack traces from test-client-self (4.10 KB, application/octet-stream)
2013-03-16 10:43 UTC, Tristan Van Berkom
  Details
Synchronize removal of watcher in D-Bus thread in dispose (3.62 KB, patch)
2013-03-17 10:57 UTC, Tristan Van Berkom
none Details | Review

Description Tristan Van Berkom 2013-03-16 10:43:45 UTC
Created attachment 239022 [details]
stack traces from test-client-self

Here is where the fun starts when implementing this manual threading system
on top of D-Bus...

The problem can be found by running tests/libebook/client/test-client-self

Run the test 20 times in a row, I can usually reproduce the failure/crash
after about 6 or 7 times.

There are a two notable problems here:
  a.) e-book-client.c needs to use g_weak_ref_init()/g_weak_ref_clear()
      to be safe, since it allocates the weak refs on the heap and not
      statically.

  b.) When the EBookClient finalizes, it can leave GSources remaining
      in the callers' GMainContext.

While 'a' is easily fixed, 'b' causes more problems (and can be seen
easily with test-client-self)

To fix 'b', you must track all of the source ids added to the GMainContext
which belongs to the caller (it's not true that unreffing the main context
from EClient->dispose() will cause any sources to be removed from the context,
the EClient implementations do not own the caller's context and thus is
responsible for removing any sources added to it at finalize time).

I tried hacking an AyncQueue into EBookClient for this but gave up, a better
solution would be add api to EClient which would automatically track the
added source ids which it is responsible for removing later (and probably
simplifying the code in EBookClient which adds idle callbacks).

An even better and safer solution would be to just give up on the whole
complex dedicated D-Bus thread, and just use normal GDBusProxy semantics
(IMO, the whole complexity around this threading is still not worth whatever
benefit it seems to bring).
Comment 1 Tristan Van Berkom 2013-03-16 10:47:42 UTC
Somehow I just don't know how to attach a stack trace properly, instead
I'll just paste it here.

This is what happens when you run the second iteration of test-client-self,
while the first iteration of test-client-self has completed, but irresponsibly
leaked a GSource into the caller's main context which immediately fires
as soon as the main loop restarts.

It will happen in any case that the EBookClient manages to finalize
before the source had a chance to get triggered in the callers' main context.

  • #0 g_logv
    at gmessages.c line 981
  • #0 g_logv
    at gmessages.c line 981
  • #1 g_log
    at gmessages.c line 1010
  • #2 g_object_ref
    at gobject.c line 2884
  • #3 g_weak_ref_get
    at gobject.c line 4067
  • #4 book_client_name_vanished_cb
    at e-book-client.c line 464
  • #5 ??
  • #6 ??
  • #7 g_main_dispatch
    at gmain.c line 3054
  • #8 g_main_context_dispatch
    at gmain.c line 3630
  • #9 g_main_context_iterate
    at gmain.c line 3701
  • #10 g_main_context_iterate
    at gmain.c line 3638
  • #11 g_main_loop_run
    at gmain.c line 3895
  • #12 e_test_server_utils_bootstrap_idle
    at e-test-server-utils.c line 225
  • #13 ??
  • #14 ??
  • #15 test_case_run
    at gtestutils.c line 1713
  • #16 g_test_run_suite_internal
    at gtestutils.c line 1767
  • #17 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #18 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #19 g_test_run_suite
    at gtestutils.c line 1823
  • #20 setup_environment
    at e-test-server-utils.c line 75
  • #21 e_test_server_utils_run
    at e-test-server-utils.c line 452
  • #22 main
    at test-client-self.c line 82

Comment 2 Matthew Barnes 2013-03-16 11:23:25 UTC
(In reply to comment #0)
> There are a two notable problems here:
>   a.) e-book-client.c needs to use g_weak_ref_init()/g_weak_ref_clear()
>       to be safe, since it allocates the weak refs on the heap and not
>       statically.

g_weak_ref_init() is only needed when allocating on a stack frame where there's garbage in memory.  Zero-filled memory is sufficient.  I asked Ryan exactly this awhile ago:

Apr 16 16:41:39 <mbarnes>»······desrt: quick question: a GWeakRef embedded in zero-filled memory is sufficiently initialized to not have to call g_weak_ref_init(..., NULL)... correct?
Apr 16 16:59:12 <desrt>»mbarnes: correct
Apr 16 16:59:17 <desrt>»mbarnes: that's one of the nice features of the API, indeed

Similarly, g_weak_ref_set(weakref, NULL) is sufficient to release the weak reference.


>   b.) When the EBookClient finalizes, it can leave GSources remaining
>       in the callers' GMainContext.

Which the GMainContext frees when its own reference count hits zero.  The callback functions attached to those GSources are essentially no-ops when the GWeakRef carried in the payload for the callback turns out to be empty.  I intended it to be unnecessary to track all the GSources we attach.

Although the name vanished case does seem to be racing here.  I'll look at fixing that.
Comment 4 Tristan Van Berkom 2013-03-16 12:08:02 UTC
(In reply to comment #2)
> (In reply to comment #0)
> >   b.) When the EBookClient finalizes, it can leave GSources remaining
> >       in the callers' GMainContext.
> 
> Which the GMainContext frees when its own reference count hits zero.  The
> callback functions attached to those GSources are essentially no-ops when the
> GWeakRef carried in the payload for the callback turns out to be empty.  I
> intended it to be unnecessary to track all the GSources we attach.

Actually I thought that too but no.

Remember that the GMainContext which is referred to by EClient is *not*
created by the EClient, it is not owned by the EClient and when the
EClient finalizes... the GMainContext (which belongs to the user)
reference count does *not* reach zero.

So it is not finalized, and any sources added to it by the EClient are
not automatically removed.

Rather, the EClient calls g_main_context_ref_thread_default() when it
starts up... adding an additional ref to the built-in GLib default
GMainContext... any sources added to the user's GMainContext must be
removed by the EClient before finalizing... because the GSources added
to the client's mainloop belong to the EClient, while the GMainContext
does not.
Comment 5 Tristan Van Berkom 2013-03-16 12:19:14 UTC
Applied patch and recompiled.

As I suspected, I can still reproduce the race, new stack trace here.

Mostly the same, only now book_client_name_vanished_cb() is on line 488
instead of 464.


  • #0 g_logv
    at gmessages.c line 981
  • #1 g_log
    at gmessages.c line 1010
  • #2 g_object_ref
    at gobject.c line 2884
  • #3 g_weak_ref_get
    at gobject.c line 4067
  • #4 book_client_name_vanished_cb
    at e-book-client.c line 488
  • #5 g_main_dispatch
    at gmain.c line 3054
  • #6 g_main_context_dispatch
    at gmain.c line 3630
  • #7 g_main_context_iterate
    at gmain.c line 3701
  • #8 g_main_context_iterate
    at gmain.c line 3638
  • #9 g_main_loop_run
    at gmain.c line 3895
  • #10 e_test_server_utils_bootstrap_idle
    at e-test-server-utils.c line 225
  • #11 ??
  • #12 ??
  • #13 test_case_run
    at gtestutils.c line 1713
  • #14 g_test_run_suite_internal
    at gtestutils.c line 1767
  • #15 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #16 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #17 g_test_run_suite
    at gtestutils.c line 1823
  • #18 setup_environment
    at e-test-server-utils.c line 75
  • #19 e_test_server_utils_run
    at e-test-server-utils.c line 452
  • #20 main
    at test-client-self.c line 82

Comment 6 Matthew Barnes 2013-03-16 12:20:18 UTC
I never said the GMainContext is disposed at the same time as the EClient.

The GSources we attach are always removed after they are dispatched, since the idle callbacks always return FALSE.

If the idle callback is dispatched after the EClient is disposed, then the callback's attempt to acquire a strong reference on the EClient fails and it simply returns FALSE.  It's a no-op, in other words.

Any undispatched GSources are cleaned up when the GMainContext itself is finalized.

So I still don't see any need to track them.
Comment 7 Tristan Van Berkom 2013-03-16 12:24:38 UTC
(In reply to comment #6)
> I never said the GMainContext is disposed at the same time as the EClient.
> 
> The GSources we attach are always removed after they are dispatched, since the
> idle callbacks always return FALSE.
> 
> If the idle callback is dispatched after the EClient is disposed, then the
> callback's attempt to acquire a strong reference on the EClient fails and it
> simply returns FALSE.  It's a no-op, in other words.
> 
> Any undispatched GSources are cleaned up when the GMainContext itself is
> finalized.

Right, however the GMainContext is _not_ finalized at the time that
the EClient is finalized.

Which means that at any time that the EBookClient is unreffed and there
remains an undispatched GSource, once the main loop continues and dispatches
the GSource (which the EBookClient created)... we get this crash.

This will happen regardless of whether:

  a.) The EBookClient is unreffed in the middle of a program with a running
      main loop

  b.) The EBookClient is unreffed and then the main loop quit, and then
      resumed later on (which happens to be the case in our tests)

> 
> So I still don't see any need to track them.

So long as the EBookClient leaves GSources leaked into the calling code's
GMainContext after the EBookClient finalizes, we will see this crash
Comment 8 Tristan Van Berkom 2013-03-16 12:28:35 UTC
Alternatively, one would think that the GWeakRef would make this code
safe, and that g_weak_ref_get() in the latent GSource would simply
return NULL.

If that's the expectation of the GWeakRef API, then I would venture to
say that it's the GWeakRef code which is broken (however it makes more
sense to me that GWeakRef simply does not work like that, and that it's
up to us to ensure we don't access a finalized client after it was
finalized).
Comment 9 Matthew Barnes 2013-03-16 12:36:07 UTC
The whole purpose of GWeakRef is to be thread-safe when the object it's referencing is finalized.
Comment 10 Tristan Van Berkom 2013-03-16 13:07:26 UTC
Indeed it's possible we're looking at a bug in GWeakRef then.

Seems pretty clear in the docs that it should be working:

"A structure containing a weak reference to a GObject. It can either be empty (i.e. point to NULL), or point to an object for as long as at least one "strong" reference to that object exists. Before the object's GObjectClass.dispose method is called, every GWeakRef associated with becomes empty (i.e. points to NULL)"

However, I find it odd that in the stack trace we find that
g_weak_ref_get() calls g_object_ref(), causing the G_IS_OBJECT() assertion
to raise.

Perhaps I should fashion a test case against glib for this as well.
Comment 11 Tristan Van Berkom 2013-03-16 13:21:00 UTC
Just updated glib to fresh master.

Bug still is reproducible, just do:

 a.) ulimit -c unlimited
 b.) ./test-client-self

Repeat 'b' many times... it will crash eventually, 'a' will of course
ensure that there is a 'core' file for gdb...

I'll look into creating a test case reproducing this for glib tomorrow,
then I'll open a bug against glib and make this one block on it
(perhaps it would be better to transfer this bug to glib, but then
it would be gone from the EDS buglist, and as a result we could risk
forgetting about it... so better make a new bug I think).
Comment 12 Tristan Van Berkom 2013-03-16 13:48:33 UTC
I have a theory.

In the D-Bus dedicated thread... the EBookClient is given directly to
the SignalClosure using g_weak_ref_set().

Now... when g_weak_ref_set() enters... it does:

   g_return_if_fail (G_IS_OBJECT ());

And we don't see any assertions... but *then* it goes and acquires the
GRWLock for 'weak_ref_locations'.

Simultaneously, g_object_unref() is being called in the main thread/context,
once g_object_unref() releases the lock... the D-Bus worker thread unlocks
and proceeds to store a finalized/invalid pointer in the 'weak_ref_locations'
list.

THAT is where the bug starts to show.

Solution:
  o The worker thread must also store a thread-wide GWeakRef on the EClient
    pointer
  o When creating a SignalClosure in e-book-client.c, we should call
    g_weak_ref_get() on the thread-wide GWeakRef, this will give us a
    temporary reference which we can carry until the GWeakRef is given
    to the SignalClosure
  o Then unref the temporary strong reference after creating the
    SignalClosure

I'm pretty convinced that this is where things are actually going wrong.
Comment 13 Tristan Van Berkom 2013-03-16 13:57:39 UTC
Then again... it might be safer all around to just bookkeep any GSources
which are added to the client's main loop.

Consider that; to do this reliably:
  o The worker thread must hold a strong reference at least temporarily
    in order to create the GWeakRef
  o Because the thread holds a temporary strong reference, it's possible
    for the client to finalize *from the D-Bus worker thread*, probably
    quite an undesirable can of worms open from there
  o We couldn't properly assert that EClient classes are finalized
    directly after removing the last reference (as we currently do
    in test cases).

An interesting alternative, would be to use something like
 'g_weak_ref_copy (&thread_ref, &closure_ref);', in order to safely
copy the weak reference without ever mandating a strong reference.

However currently no such function exists.
Comment 14 Tristan Van Berkom 2013-03-17 10:57:03 UTC
Created attachment 239060 [details] [review]
Synchronize removal of watcher in D-Bus thread in dispose

WARNING: This patch does not work, it's meant more as an example

I'm sure I can see a part of what is going wrong here, and the attached
patch is an attempt to fix it.

NOTE: I did also try the basic queueing of the removal of the watcher_id
from the appropriate D-Bus thread, but that also had problems, so I tried
to synchronize it with the client (user) thread, still has problems.

Now, the explanation of the previously attached stack traces is as follows:

  o When e_book_client_finalize() runs, it calls g_bus_unwatch_name() on
    the watcher_id directly... from "whatever thread" the client is
    finalized in.

  o This causes the GWeakRef attached to the watcher_id to be cleared
    and freed "immediately"

  o Consequently, in the already attached stack traces, the 
    book_client_name_vanished_cb() is already preparing to run
    in it's D-Bus thread... theres nothing about g_bus_unwatch_name()
    being called from another thread which protects it from being
    run.

  o book_client_name_vanished_cb() proceeds to access the GWeakRef which
    was "immediately freed" by e_book_client_finalize() in the main thread.


The patch I'm attaching here, is an _attempt_ to address that, actually
it does address that problem but then unmasks further problems related
to the D-Bus threading (which I'll post stack traces for in further comments).
Comment 15 Tristan Van Berkom 2013-03-17 11:07:36 UTC
Now first problem I encountered is a lockup, while the code attached
in the example patch seems very straight forward (no way that that code
will cause a lockup so long as the book client is finalized in the
main thread as expected), there seems to be a lockup anyway.

This lockup shows that for some reason, the EBookClient is *not* finalized
in the main client thread, but rather in a deep D-Bus thread as a
consequence of a D-Bus object property change notification.

Here are the stack traces for all active threads at the time of
the lockup (with the afore mentioned example patch applied):

Thread 1:

  • #0 __GI___poll
    at ../sysdeps/unix/sysv/linux/poll.c line 87
  • #1 g_main_context_poll
    at gmain.c line 3995
  • #2 g_main_context_iterate
    at gmain.c line 3696
  • #3 g_main_context_iterate
    at gmain.c line 3638
  • #4 g_main_loop_run
    at gmain.c line 3895
  • #5 e_async_closure_wait
    at e-data-server-util.c line 1483
  • #6 book_client_initable_init
    at e-book-client.c line 918
  • #7 e_book_client_connect_sync
    at e-book-client.c line 1040
  • #8 test_set_self
    at test-client-self.c line 45
  • #9 test_case_run
    at gtestutils.c line 1714
  • #10 g_test_run_suite_internal
    at gtestutils.c line 1767
  • #11 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #12 g_test_run_suite_internal
    at gtestutils.c line 1778
  • #13 g_test_run_suite
    at gtestutils.c line 1823
  • #14 e_test_server_utils_run
    at e-test-server-utils.c line 468
  • #15 main
    at test-client-self.c line 95
  • #0 __GI___poll
    at ../sysdeps/unix/sysv/linux/poll.c line 87
  • #1 g_main_context_poll
    at gmain.c line 3995
  • #2 g_main_context_iterate
    at gmain.c line 3696
  • #3 g_main_context_iterate
    at gmain.c line 3638
  • #4 g_main_loop_run
    at gmain.c line 3895
  • #5 source_registry_object_manager_thread
    at e-source-registry.c line 868
  • #6 g_thread_proxy
    at gthread.c line 798
  • #7 start_thread
    at pthread_create.c line 308
  • #8 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #9 ??
  • #0 pthread_cond_wait
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S line 162
  • #1 g_cond_wait
    at gthread-posix.c line 750
  • #2 book_client_dispose
    at e-book-client.c line 617
  • #3 g_object_unref
    at gobject.c line 2987
  • #4 closure_invoke_notifiers
    at gclosure.c line 285
  • #5 g_closure_invoke
    at gclosure.c line 783
  • #6 signal_emit_unlocked_R
    at gsignal.c line 3584
  • #7 g_signal_emit_valist
    at gsignal.c line 3328
  • #8 g_signal_emit
    at gsignal.c line 3384
  • #9 g_object_dispatch_properties_changed
    at gobject.c line 1042
  • #10 g_object_notify_by_spec_internal
    at gobject.c line 1136
  • #11 g_object_notify
    at gobject.c line 1178
  • #12 e_dbus_address_book_proxy_g_properties_changed
    at e-dbus-address-book.c line 2894
  • #13 ffi_call_unix64
    from /usr/lib/x86_64-linux-gnu/libffi.so.6
  • #14 ffi_call
    from /usr/lib/x86_64-linux-gnu/libffi.so.6
  • #15 g_cclosure_marshal_generic
    at gclosure.c line 1454
  • #16 g_closure_invoke
    at gclosure.c line 777
  • #17 signal_emit_unlocked_R
    at gsignal.c line 3622
  • #18 g_signal_emit_valist
    at gsignal.c line 3328
  • #19 g_signal_emit
    at gsignal.c line 3384
  • #20 on_properties_changed
    at gdbusproxy.c line 1149
  • #21 emit_signal_instance_in_idle_cb
    at gdbusconnection.c line 3715
  • #22 g_main_dispatch
    at gmain.c line 3054
  • #23 g_main_context_dispatch
    at gmain.c line 3630
  • #24 g_main_context_iterate
    at gmain.c line 3701
  • #25 g_main_context_iterate
    at gmain.c line 3638
  • #26 g_main_loop_run
    at gmain.c line 3895
  • #27 book_client_dbus_thread
    at e-book-client.c line 197
  • #28 g_thread_proxy
    at gthread.c line 798
  • #29 start_thread
    at pthread_create.c line 308
  • #30 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #31 ??
  • #0 __GI___poll
    at ../sysdeps/unix/sysv/linux/poll.c line 87
  • #1 g_main_context_poll
    at gmain.c line 3995
  • #2 g_main_context_iterate
    at gmain.c line 3696
  • #3 g_main_context_iterate
    at gmain.c line 3638
  • #4 g_main_loop_run
    at gmain.c line 3895
  • #5 gdbus_shared_thread_func
    at gdbusprivate.c line 278
  • #6 g_thread_proxy
    at gthread.c line 798
  • #7 start_thread
    at pthread_create.c line 308
  • #8 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #9 ??
  • #0 __GI___poll
    at ../sysdeps/unix/sysv/linux/poll.c line 87
  • #1 g_main_context_poll
    at gmain.c line 3995
  • #2 g_main_context_iterate
    at gmain.c line 3696
  • #3 g_main_context_iterate
    at gmain.c line 3638
  • #4 g_main_context_iteration
    at gmain.c line 3762
  • #5 glib_worker_main
    at gmain.c line 5427
  • #6 g_thread_proxy
    at gthread.c line 798
  • #7 start_thread
    at pthread_create.c line 308
  • #8 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 112
  • #9 ??
  • #0 g_logv
    at gmessages.c line 981
  • #1 g_log
    at gmessages.c line 1010
  • #2 g_object_ref
    at gobject.c line 2884
  • #3 g_weak_ref_get
    at gobject.c line 4067
  • #4 e_book_client_remove_contact_by_uid
    at e-book-client.c line 2376
  • #5 ??
  • #6 ??
    at gmain.c line 671
  • #7 ??
  • #8 ??
  • #9 ??

Comment 16 Tristan Van Berkom 2013-03-17 11:12:09 UTC
Gah, buzilla ate up and mangled my stack trace... view
the raw stack trace for a more comprehensive log.

Also, the rest of the comments which bugzilla happily hides
from the above comment are these:

In most cases the lockup does not occur and everything passes,
but in some other cases we also get a crash (also with the above
patch applied), this crash (which I've seen a couple times, same
crash) occurs deep inside e_book_client_remove_contact_by_uid().

Again related to the GWeakRef:

  • #0 g_logv
    at gmessages.c line 981
  • #1 g_log
    at gmessages.c line 1010
  • #2 g_object_ref
    at gobject.c line 2884
  • #3 g_weak_ref_get
    at gobject.c line 4067
  • #4 e_book_client_remove_contact_by_uid
    at e-book-client.c line 2376
  • #5 ??
  • #6 ??
    at gmain.c line 671
  • #7 ??
  • #8 ??
  • #9 ??

Comment 17 Tristan Van Berkom 2013-03-17 11:21:49 UTC
Here are some thoughts on how we can simplify things and make
it a bit safer (if we still insist on keeping this threading
model instead of simply writing out Async calls, which I can't
stress enough, would be much safer).

Basically, we could simplify things by creating one D-Bus thread
per EBookClient:

  o Creating an EBookClient would create it's own D-Bus thread,
    the async/sync initable code could block until the newly
    created thread finishes creating the underlying GDBusProxy
    and connecting to signals in it's local thread.

  o The EBookClient's thread data pointer would belong to
    the thread and it would include a single GWeakRef to
    the EBookClient.

  o Since the thread data created by the EBookClient is
    created before starting the thread, and the GWeakRef
    also created before starting the thread, the GWeakRef
    becomes safer (no races in the creation of the GWeakRef
    since it's created safely in the client-side main thread).

    The thread-side functions would only ever access EBookClient
    via it's own GWeakRef.

  o When the EBookClient finalizes, it would schedule an idle
    in the D-Bus related thread which would *exit* the D-Bus
    thread... in that case the client would be finalized
    and the GWeakRef invalidated... and the thread would
    asynchronously disconnect it's signals from the GDBusProxy,
    unref the proxy, quit it's internal mainloop and return.
Comment 18 Milan Crha 2013-03-18 11:05:23 UTC
(In reply to comment #17)
> Basically, we could simplify things by creating one D-Bus thread
> per EBookClient:

Please please please, NO!

If you ever happen to read any crash reports, or simply backtraces, it's always a pain to see all unrelated threads in the backtrace. It was done in a similar way with CamelFolderSummary, and it was a mistake, which I fixed (after introduced myself), after some releases when I found the db-summary threads being too awful for seeing them in backtraces, adding just a noise there.

That said, using thread per EClient would be just a poor workaround for a wrong design/usage/.... And, of course, this would be one more good reason why
not do [1], which Matthew reintroduced for Contacts view [2], while the [1] gives good reasons why not do that.

[1] https://git.gnome.org/browse/evolution/commit/?id=662ed02cc773a4fd2d90ed
[2] https://git.gnome.org/browse/evolution/commit/?id=7bd9b878a383628dcebda2
Comment 19 Tristan Van Berkom 2013-03-18 12:46:52 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > Basically, we could simplify things by creating one D-Bus thread
> > per EBookClient:
> 
> Please please please, NO!

I think you know that my opinion is that the threads are altogether
an added complexity in execution that we don't need at all in the
first place (async calls and signal notifications can be made directly
in the caller's GMainContext, no need for threads at all).

But that idea aside, to make things less complex while using threads, either

 a.) One thread per client ... or at least

 b.) One thread data per client all belonging to the single "D-Bus thread"

Either way... the GWeakRef to the client should be created at the time
that the client is created, in the main calling thread context, where
a strong reference to the client exists.

After that, the threaded functions should only ever try to access the
EBookClient via the GWeakRef which was created in the calling context.

At dispose time, a message should be queued into the D-Bus thread context
to displose of the thread-side data for a given client... the GWeakRef
which was created in the user's thread context (where a strong reference
existed) will eventually be destroyed in the D-Bus thread's main context,
perhaps after the underlying GDBusProxy is safely finalized (as I understand
it, the GDBusProxy belongs to the thread data related to a Client... but
belongs to the thread... along with the GWeakRef... those need to be
cleaned up in the D-Bus thread).
Comment 20 Matthew Barnes 2013-05-03 04:14:24 UTC
Closing this as NOTABUG since the description mis-characterizes the behavior as a leak, when in fact all GSources are accounted for at all times and free either after dispatching or when the GMainContext is finalized.
Comment 21 Tristan Van Berkom 2013-05-03 07:12:05 UTC
Woah wait a minute.

Does ./test-client-self have no race condition any more ?

Can we close this with a reference to the commit which causes
the EBookClient finalization to be safe ?
Comment 22 Tristan Van Berkom 2013-05-03 08:46:27 UTC
The good news is that I cannot get test-client-self to crash anymore.

It took me a while, but I bisected it down to this commit:

https://git.gnome.org/browse/evolution-data-server/commit/?id=35645dc4b8f80e0a398ae395c6c21bfc6f70807d

Which probably fixes the crash combined with the commit Matthew mentions in comment 3

It worries me however that I can't properly explain how the above
commit would fix the race at finalize, the one which I see (and still
seems to present a risk in today's code) is as follows:

  o e_book_client_name_vanished_cb is scheduled but does not yet run

  o e_book_client_finalize() calls g_bus_unwatch_name(), which does
    two things:
      a.) Prevents further invocations of e_book_client_name_vanished_cb(),
          but it's too late at this point, cause it's already scheduled
          in the D-Bus thread
      b.) Frees the GWeakRef which the name_vanished_cb() still needs

  o e_book_client_name_vanished_cb() runs in the D-Bus thread and accesses
    the GWeakRef which has recently been freed by the main thread.

I still have a hard time trusting this D-Bus thread, but I suppose it will
take further testing to track down the races.