Bug 726973 – most e-d-s tests fail

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 726973 - most e-d-s tests fail


Summary:	most e-d-s tests fail


Status:	RESOLVED FIXED

Product:	folks
Classification:	Platform
Component:	e-d-s backend
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	Unset
Assigned To:	folks-maint
QA Contact:	folks-maint

URL:
Whiteboard:

Depends on:	729150 729152
Blocks:

Reported:	2014-03-24 15:58 UTC by Simon McVittie
Modified:	2014-06-21 16:39 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
FolksTestDBus: flush and close connection before disposing of it (1.35 KB, patch) 2014-03-24 16:00 UTC, Simon McVittie	committed	Details \| Review
Make warnings non-fatal during "final teardown" (2.08 KB, patch) 2014-03-24 16:13 UTC, Simon McVittie	committed	Details \| Review
tests: avoid using dconf (1.17 KB, patch) 2014-03-24 16:41 UTC, Simon McVittie	committed	Details \| Review
tests: Add an assertion that the main context is empty on tear_down (1.99 KB, patch) 2014-04-04 07:32 UTC, Philip Withnall	none	Details \| Review
tests: Add an assertion that the main context is empty on tear_down (1.99 KB, patch) 2014-04-04 07:33 UTC, Philip Withnall	committed	Details \| Review
tests: Manually iterate main context on tear_down in various tests (5.19 KB, patch) 2014-04-04 07:34 UTC, Philip Withnall	committed	Details \| Review
bluez: Ignore missing sessions when removing sessions (1.61 KB, patch) 2014-04-04 07:34 UTC, Philip Withnall	committed	Details \| Review
tests: Fix enable-disable-stores EDS test (5.38 KB, patch) 2014-04-07 23:18 UTC, Philip Withnall	committed	Details \| Review
gdb script to run until crash (171 bytes, text/plain) 2014-04-21 21:05 UTC, Travis Reitter		Details
(SLIGHTLY BROKEN) GLib patch to resolve the "no handler with id" issue (3.26 KB, patch) 2014-04-25 00:17 UTC, Travis Reitter	none	Details \| Review
eds: Ignore errors from unimplemented EDS methods (1.17 KB, patch) 2014-06-20 19:38 UTC, Philip Withnall	committed	Details \| Review
eds: Bypass test failures on GTestDBus teardown (2.76 KB, patch) 2014-06-20 19:38 UTC, Philip Withnall	committed	Details \| Review

Description Simon McVittie 2014-03-24 15:58:08 UTC

Now that the tests all use (Folks' local fork of) GTestDBus, most of the e-d-s tests fail.

The main failure mode is the segmentation fault whose backtrace can be seen in <https://bugzilla.gnome.org/show_bug.cgi?id=711807#c9>, but there are others.

Comment 1 Simon McVittie 2014-03-24 16:00:28 UTC

Created attachment 272791 [details] [review]
FolksTestDBus: flush and close connection before disposing of  it

g_dbus_connection_dispose() sets connection->worker = NULL, which
means any attempt to send a message will crash with a segfault.
If anything still has a ref to the connection - in Folks, that's
often a backend reffed by the IndividualAggregator - then it'll
tend to assume that it's still usable.

Attempts to send messages are guarded by a check whether the
connection is closed, which we can use to avoid this.

---

This fixes many of the e-d-s tests: I now get 31/34 passing.

Comment 2 Simon McVittie 2014-03-24 16:02:31 UTC

(In reply to comment #1)
> This fixes many of the e-d-s tests: I now get 31/34 passing.

Sorry, that was a lie. I had an additional workaround to get that far; I'll attach that once I've composed a suitable commit message.

Comment 3 Simon McVittie 2014-03-24 16:13:27 UTC

Created attachment 272796 [details] [review]
Make warnings non-fatal during "final teardown"

Historically, the Folks tests have not been expected to cope gracefully
with dropping off the session bus. This frequently results in warnings,
which default to being fatal. For instance, enable-disable-stores
fails with this fatal warning:

folks-WARNING **: Error preparing persona store 'eds:test': Couldn't get view for address book ‘test’: The connection is closed

The long-term solution would be to make sure that disconnection from
the bus is an identifiable error domain/code that gets propagated
through the stack, but that would have to be done individually
per-backend.

---

I don't much like this either, but it seems better than "*shrug* I have no idea whether e-d-s regressed or not" :-)

Comment 4 Simon McVittie 2014-03-24 16:41:53 UTC

Created attachment 272802 [details] [review]
tests: avoid using dconf

tests/eds/perf reliably fails for me with this warning:

dconf-WARNING **: failed to commit changes to dconf: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name ca.desrt.dconf was not provided by any .service files

and in any case we shouldn't be changing the user's real configuration.

---

I'm not sure whether this really helps this bug or not, but it seems like a good thing to do.

Comment 5 Simon McVittie 2014-03-24 16:42:26 UTC

Again, those patches *do not* close this bug, but they might be useful as part of a full solution.

Comment 6 Travis Reitter 2014-03-27 22:30:17 UTC

Review of attachment 272791 [details] [review]:

Looks good

Comment 7 Travis Reitter 2014-03-27 22:32:32 UTC

Review of attachment 272796 [details] [review]:

Ideally, I'd rather fix the real problems but given the amount of active maintainership on the project, this is probably the best option for now.

Comment 8 Travis Reitter 2014-03-27 22:33:16 UTC

Review of attachment 272802 [details] [review]:

Looks good

Comment 9 Travis Reitter 2014-03-31 23:35:41 UTC

I pushed these to master but I'll keep this bug open since there are still some failures.


commit 71f6f79f7a9e95078d2e5a0ad2a48652197339a9
Author: Simon McVittie <simon.mcvittie@collabora.co.uk>
Date:   Mon Mar 24 14:16:11 2014 +0000

    tests: avoid using dconf
    
    tests/eds/perf reliably fails for me with this warning:
    
    dconf-WARNING **: failed to commit changes to dconf: GDBus.Error:org.freedes
    
    and in any case we shouldn't be changing the user's real configuration.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/test-case.vala | 3 +++
 1 file changed, 3 insertions(+)

commit b16bd87240424632d7881172e87b2571d3df57be
Author: Simon McVittie <simon.mcvittie@collabora.co.uk>
Date:   Mon Mar 24 16:12:23 2014 +0000

    Make warnings non-fatal during "final teardown"
    
    Historically, the Folks tests have not been expected to cope gracefully
    with dropping off the session bus. This frequently results in warnings,
    which default to being fatal. For instance, enable-disable-stores
    fails with this fatal warning:
    
    folks-WARNING **: Error preparing persona store 'eds:test': Couldn't get vie
    
    The long-term solution would be to make sure that disconnection from
    the bus is an identifiable error domain/code that gets propagated
    through the stack, but that would have to be done individually
    per-backend.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/test-case.vala | 10 ++++++++++
 1 file changed, 10 insertions(+)

commit ad92c2a3a6408db165df59a0cc183643ff3f33a3
Author: Simon McVittie <simon.mcvittie@collabora.co.uk>
Date:   Mon Mar 24 12:37:53 2014 +0000

    FolksTestDBus: flush and close connection before disposing of it
    
    g_dbus_connection_dispose() sets connection->worker = NULL, which
    means any attempt to send a message will crash with a segfault.
    If anything still has a ref to the connection - in Folks, that's
    often a backend reffed by the IndividualAggregator - then it'll
    tend to assume that it's still usable.
    
    Attempts to send messages are guarded by a check whether the
    connection is closed, which we can use to avoid this.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/gtestdbus.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 10 Philip Withnall 2014-04-04 07:32:11 UTC

Created attachment 273570 [details] [review]
tests: Add an assertion that the main context is empty on tear_down

Leaving the global default main context with pending events is a common
cause of cross-test interference which can cause spurious test failures.
Try and ensure there are no pending events at the end of each test’s
tear_down function.

Comment 11 Philip Withnall 2014-04-04 07:33:56 UTC

Created attachment 273571 [details] [review]
tests: Add an assertion that the main context is empty on tear_down

Leaving the global default main context with pending events is a common
cause of cross-test interference which can cause spurious test failures.
Try and ensure there are no pending events at the end of each test’s
tear_down function.

Comment 12 Philip Withnall 2014-04-04 07:34:00 UTC

Created attachment 273572 [details] [review]
tests: Manually iterate main context on tear_down in various tests

The code under test in these cases doesn’t (yet) correctly ensure there
are no pending events before returning. To avoid assertion failure,
manually iterate the global default main context until all pending
events have been handled, then continue to tear_down.

In future, it would be better to fix the tests and backends themselves
to not leak main context events after unprepare() has been called.

Comment 13 Philip Withnall 2014-04-04 07:34:05 UTC

Created attachment 273573 [details] [review]
bluez: Ignore missing sessions when removing sessions

This occasionally happens when running the unit tests. The check only
applies to unit tests, so will have no effect in production use.

Comment 14 Philip Withnall 2014-04-04 07:35:13 UTC

These three patches don’t completely fix the EDS tests, but do seem to reduce the frequency of failures, and are definitely The Right Thing to do.

Comment 15 Travis Reitter 2014-04-05 19:30:40 UTC

Review of attachment 273571 [details] [review]:

Looks good

Comment 16 Travis Reitter 2014-04-05 19:32:06 UTC

Review of attachment 273572 [details] [review]:

This needs the same change for the Dummy backend's test lib.

I'd also prefer you file a bug for the remaining issue and update the FIXMEs to reference it.

Comment 17 Travis Reitter 2014-04-05 19:32:26 UTC

Review of attachment 273573 [details] [review]:

Looks good.

Comment 18 Philip Withnall 2014-04-06 09:02:42 UTC

Pushed with the addition of a reference to bug #727700. The lack of main context iteration in the dummy backend test library is deliberate, since I haven't been able to cause a test failure in the dummy backend due to leaking main context events.

Attachment 273571 [details] pushed as 5c0c1c6 - tests: Add an assertion that the main context is empty on tear_down
Attachment 273572 [details] pushed as b2bdc30 - tests: Manually iterate main context on tear_down in various tests
Attachment 273573 [details] pushed as 89ba19d - bluez: Ignore missing sessions when removing sessions

Comment 19 Travis Reitter 2014-04-07 16:41:13 UTC

I pushed the same changes to the Dummy backend that I suggested above (since the Dummy tests otherwise were failing for me).

Split as two tests on account of forgetting to refresh the FIXME to point out the new bug:


commit 12b7985e8f28e66420ec522cb5fcb621e8e07a78
Author: Travis Reitter <travis.reitter@collabora.co.uk>
Date:   Mon Apr 7 09:38:36 2014 -0700

    tests: Manually iterate main context on tear_down in Dummy tests
    
    The code under test in these cases doesn’t (yet) correctly ensure there
    are no pending events before returning. To avoid assertion failure,
    manually iterate the global default main context until all pending
    events have been handled, then continue to tear_down.
    
    In future, it would be better to fix the tests and backends themselves
    to not leak main context events after unprepare() has been called.
    See https://bugzilla.gnome.org/show_bug.cgi?id=727700.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/dummy/test-case.vala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

commit 45e1af8c28747154278ee5058a3b144899050abb
Author: Travis Reitter <travis.reitter@collabora.co.uk>
Date:   Mon Apr 7 09:12:02 2014 -0700

    tests: Manually iterate main context on tear_down in Dummy tests
    
    The code under test in these cases doesn’t (yet) correctly ensure there
    are no pending events before returning. To avoid assertion failure,
    manually iterate the global default main context until all pending
    events have been handled, then continue to tear_down.
    
    In future, it would be better to fix the tests and backends themselves
    to not leak main context events after unprepare() has been called.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/dummy/test-case.vala | 8 ++++++++
 1 file changed, 8 insertions(+)

Comment 20 Philip Withnall 2014-04-07 23:18:18 UTC

Created attachment 273754 [details] [review]
tests: Fix enable-disable-stores EDS test

The test was occasionally spuriously failing because the persona stores
were being prepare()d after being re-enabled, and these prepare() calls
were being suspended part-way through when the test finished and killed
its main loop. They were being resumed when the main loop was re-started
during tear down — it’s re-started to wait for the singleton D-Bus
connection to be destroyed, so the prepare() calls ended up consistently
having their D-Bus connection drop out from beneath them, resulting in
warnings. These warnings would normally be useful to have, but not in
the middle of a unit test.

Fix this by rearchitecting the test slightly to force each persona store
to be prepared after being re-enabled. Also simplify how this is
tracked.

Comment 21 Philip Withnall 2014-04-07 23:21:41 UTC

This patch fixes the only failure I found in our code (and it was a problem with the unit test, not with folks itself, although folks should provide a much better unprepare() API; that’s a separate bug).

This makes `make check -C tests/eds` pass in its entirety about 80% of the time for me. The remaining failures are characterised by:

> (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): folks-DEBUG: individual-aggregator.vala:521: Destroying IndividualAggregator 0x1190980
> (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): eds-DEBUG: eds-backend.vala:350: Removing address book 'test'.
> (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): folks-DEBUG: backend.vala:45: Destroying Backend ‘telepathy’ (0x1197cc0)
> …
> (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): GLib-GObject-WARNING **: gsignal.c:2580: instance '0x7f5b1400a0a0' has no handler with id '112'

I can’t work this one out at all. As far as I can tell, the GLib code for that signal is solid, and none of this is happening during a very late run of the main loop. I guess tired eyes aren’t helping. I’ll take a better look at this another time.

Comment 22 Travis Reitter 2014-04-08 00:44:21 UTC

Review of attachment 273754 [details] [review]:

Looks good.

I even had all EDS tests pass 3/5 tries. When I did get failures, they looked like this:

(/home/treitter/checkout/gnome/folks/tests/eds/.libs/lt-store-removed:27913): libebook-WARNING **: book_client_view_dispose_cb: The connection is closed
g_dbus_connection_real_closed: Remote peer vanished with error: Error receiving message: Connection reset by peer (g-io-error-quark, 0). Exiting.
g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
cleaning up pid 27928

Comment 23 Travis Reitter 2014-04-08 00:48:07 UTC

(In reply to comment #21)

> > (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): GLib-GObject-WARNING **: gsignal.c:2580: instance '0x7f5b1400a0a0' has no handler with id '112'

I also got this a separate time (with avatar-details).

Comment 24 Philip Withnall 2014-04-08 06:52:40 UTC

Comment on attachment 273754 [details] [review]
tests: Fix enable-disable-stores EDS test

Pushed.

Attachment 273754 [details] pushed as 7b41098 - tests: Fix enable-disable-stores EDS test

Comment 25 Travis Reitter 2014-04-21 21:05:28 UTC

Created attachment 274837 [details]
gdb script to run until crash

Here is a hacky GDB script which will run a test until it crashes, letting us grab the stack trace.

To reproduce the bug we've hit above, just pick an EDS test and run them like:

make -C tests/eds add-persona.gdb
(gdb) source run-until-crash.gdb
run

One major caveat: you will need to either run this with G_DEBUG=fatal-warnings (which currently causes a break much earlier than we'd like on some seemingly unrelated issue) OR you can change the g_warning() in gsignal.c:g_signal_handler_disconnect() to a g_error() and build and run against that version of GLib.

Eventually, gdb will break on the fatal runtime error and you can get the backtrace

Comment 26 Travis Reitter 2014-04-21 21:12:39 UTC

Using the script and technique above, I got two similar, seemingly-meaningful backtraces:

+ Trace 233505

#0 g_signal_handler_disconnect
at gsignal.c line 2572
#1 book_client_dispose
from /opt/gnome/lib64/libebook-1.2.so.14
#2 g_object_unref
at gobject.c line 3075
#3 edsf_persona_store_finalize
from /home/treitter/checkout/gnome/folks/backends/eds/lib/.libs/libfolks-eds.so.25
#4 g_object_unref
at gobject.c line 3112
#5 gee_hash_map_real_clear
from /opt/gnome/lib64/libgee-0.8.so.2
#6 gee_abstract_map_clear
from /opt/gnome/lib64/libgee-0.8.so.2
#7 gee_hash_map_finalize
from /opt/gnome/lib64/libgee-0.8.so.2
#8 g_object_unref
at gobject.c line 3112
#9 folks_individual_aggregator_finalize
from /home/treitter/checkout/gnome/folks/folks/.libs/libfolks.so.25
#10 g_object_unref
at gobject.c line 3112
#11 updating_contacts_tests_finalize
#12 g_object_unref
at gobject.c line 3112
#13 _vala_main
#14 main
#0 _g_log_abort
at gmessages.c line 308
#1 g_logv
at gmessages.c line 1033
#2 g_log
at gmessages.c line 1071
#3 g_signal_handler_disconnect
at gsignal.c line 2581
#4 client_unref
at gdbusnamewatching.c line 100
#5 g_bus_unwatch_name
at gdbusnamewatching.c line 849
#6 book_client_finalize
from /opt/gnome/lib64/libebook-1.2.so.14
#7 g_object_unref
at gobject.c line 3112
#8 edsf_persona_store_finalize
from /home/treitter/checkout/gnome/folks/backends/eds/lib/.libs/libfolks-eds.so.25
#9 g_object_unref
at gobject.c line 3112
#10 add_persona_tests_finalize
#11 g_object_unref
at gobject.c line 3112
#12 _vala_main
#13 main


Both point to an excessive signal handler disconnect related to the D-Bus proxy which makes me think we're doing something wrong in our special instance of the D-Bus setup for our EDS tests. I haven't had time to check that out, so, until I say otherwise, could you please check that area out, Philip?

Comment 27 Travis Reitter 2014-04-22 21:43:22 UTC

(In reply to comment #23)
> (In reply to comment #21)
> 
> > > (/opt/gnome3/source/folks/tests/eds/.libs/lt-set-is-favourite:4524): GLib-GObject-WARNING **: gsignal.c:2580: instance '0x7f5b1400a0a0' has no handler with id '112'
> 
> I also got this a separate time (with avatar-details).

Some extra digging indicates this error happens just after handlers with IDs n+1 and n+2 (eg, 113 and 114) get disconnected in book_client_dispose().

Not sure how important this is but could be worth digging into (given that, if there aren't any other hints, I think we're basically going have to study D-Bus and/or GTestDBus in great detail to figure out what's going wrong).

Comment 28 Travis Reitter 2014-04-24 04:16:54 UTC

So far, the EBookClient signal that corresponds to the failing ID in each case seems to be "closed"

Comment 29 Travis Reitter 2014-04-25 00:17:59 UTC

Created attachment 275090 [details] [review]
(SLIGHTLY BROKEN) GLib patch to resolve the "no handler with id" issue

With some strategic debugging, I've worked out that this excessive GDBusConnection::closed signal handler disconnection that's happening in gio/gdbusnamewatching.c:client_unref() is due to the connection being disposed before client->disconnected_signal_handler_id gets cleared. Thus, when the EBookClient is finalized in addressbook/libebook/e-book-client.c:book_client_finalize() and unwatches its (D-Bus) addressbook proxy, there's an excessive call to g_signal_handler_disconnect() for a handler that's already been disconnected (during dispose).

This patch tries to highlight a potential route for a fix though I clearly get something wrong because it has no change on the tests (I'm not familiar with GWeakRefs) but should highlight approximately what would fix this issue.

And it's possible this isn't the best fix but it is bogus that GLib fails to track the disposal of the connection (then again, you could argue that it's a lot of busywork and excessive disconnections should maybe be handled silently...).

Philip, based on these details, do you have an idea for a solution here?

Comment 30 Simon McVittie 2014-04-25 10:49:13 UTC

Number 73 in an ongoing series of common GLib traps: order of operations during teardown. It's worth looking at gobject.c to see how things actually work here.

In my current git checkout, assuming MyDerivedObject is-a MyBaseObject is-a GObject, you get:

* begin unref with refcount == 1 - it's going to die unless a callback
  or another thread manages to resurrect it in time
* clear thread-safe weak refs (GWeakRef)
* MyDerivedObject dispose() vfunc
* MyBaseObject dispose() vfunc
* GObject dispose() vfunc
  - disconnect all its signals
  - destroy closures with it as object
  - notify non-thread-safe weak refs (g_object_weak_ref(),
    g_object_add_weak_pointer())
* decide that we are probably finalizing it
  - destroy closures with it as object, if any were added during dispose
  - disconnect all its signals, if any were added during dispose
  - notify non-thread-safe weak refs, if any were added during dispose
* decide that we are definitely finalizing it (point of no return)
* MyDerivedObject finalize() vfunc
* MyBaseObject finalize() vfunc
* GObject finalize() vfunc
  - run the destroy-notifiers for its qdata
* actually free it

If you use g_object_run_dispose(), then you just get:

* MyDerivedObject dispose() vfunc
* MyBaseObject dispose() vfunc
* GObject dispose() vfunc
  - disconnect all its signals
  - destroy closures that have it as their object
  - notify non-thread-safe weak refs (g_object_weak_ref(),
    g_object_add_weak_pointer())

and GWeakRef-style weak refs are *not* cleared. That's probably a bug - g_object_run_dispose() should probably trash those weak refs before running the dispose vfunc. Probably my fault, I wrote the initial GWeakRef implementation. Cc'ing desrt since he also contributed heavily to it.

(In reply to comment #29)
> And it's possible this isn't the best fix but it is bogus that GLib fails to
> track the disposal of the connection (then again, you could argue that it's a
> lot of busywork and excessive disconnections should maybe be handled
> silently...).

The problem with that is that if an object is resurrected by being reffed in a dispose callback, it becomes valid to connect to its signals again, and old signal-connection IDs (small integers) could be reused, leading to you disconnecting someone else's signal when you thought you were disconnecting your own signal.

You could maybe use the non-thread-safe weak refs as an alternative - zero out the signal ID when you are notified?

Comment 31 Simon McVittie 2014-04-25 10:52:57 UTC

(In reply to comment #30)
> If you use g_object_run_dispose(), then you just get:
> 
> * MyDerivedObject dispose() vfunc
> * MyBaseObject dispose() vfunc
> * GObject dispose() vfunc
>   - disconnect all its signals
>   - destroy closures that have it as their object
>   - notify non-thread-safe weak refs (g_object_weak_ref(),
>     g_object_add_weak_pointer())
> 
> and GWeakRef-style weak refs are *not* cleared. That's probably a bug -
> g_object_run_dispose() should probably trash those weak refs
> before running the dispose vfunc.

Indeed, looking at its docs, the combination of GWeakRef and g_object_run_dispose() has never behaved as documented. GWeakRef says:

>  ... Before the
>  * object's #GObjectClass.dispose method is called, every #GWeakRef
>  * associated with becomes empty (i.e. points to %NULL).

but that isn't true when g_object_run_dispose() is used.

Then again, g_object_run_dispose() says

> * This functions should only be called from object system implementations.

so arguably GTestDBus shouldn't be using it. *shrug*

Comment 32 Travis Reitter 2014-04-25 19:35:55 UTC

More details:

In the passing case, gdbusnamewatching correctly announces the vanishing of the service from the bus and EBookClient correctly handles that (by not trying to unwatch later, which cases the eventual warning -> error).

In the failing case, gdbusnamewatching never announces the vanishing service.

So, maybe our copy of GTestDBus (or the way we're using it) somehow kills the service so abruptly that gdbusnamewatching doesn't catch it (in time)?

Comment 33 Travis Reitter 2014-04-28 22:19:23 UTC

Comment on attachment 275090 [details] [review]
(SLIGHTLY BROKEN) GLib patch to resolve the "no handler with id" issue

Replaced by patch on https://bugzilla.gnome.org/show_bug.cgi?id=729152

Comment 34 Travis Reitter 2014-04-28 22:28:48 UTC

(In reply to comment #31)
> (In reply to comment #30)
> > If you use g_object_run_dispose(), then you just get:
> > 
> > * MyDerivedObject dispose() vfunc
> > * MyBaseObject dispose() vfunc
> > * GObject dispose() vfunc
> >   - disconnect all its signals
> >   - destroy closures that have it as their object
> >   - notify non-thread-safe weak refs (g_object_weak_ref(),
> >     g_object_add_weak_pointer())
> > 
> > and GWeakRef-style weak refs are *not* cleared. That's probably a bug -
> > g_object_run_dispose() should probably trash those weak refs
> > before running the dispose vfunc.
> 
> Indeed, looking at its docs, the combination of GWeakRef and
> g_object_run_dispose() has never behaved as documented. GWeakRef says:
> 
> >  ... Before the
> >  * object's #GObjectClass.dispose method is called, every #GWeakRef
> >  * associated with becomes empty (i.e. points to %NULL).
> 
> but that isn't true when g_object_run_dispose() is used.

You called it. I created a patch that makes it clear out the weak refs (see bug #729152) and that, combined with the patch to bug #729152, the full set of Folks EDS tests pass 88 to 93% of the time; individual test cases pass 99.8% of the time in my observations.

There are some very rare failures which seem to be specific to either Folks or the specific test cases themselves. I think we can reasonably defer those to their own bugs and close this (once we get the GLib patches merged).

> Then again, g_object_run_dispose() says
> 
> > * This functions should only be called from object system implementations.
> 
> so arguably GTestDBus shouldn't be using it. *shrug*

Ideally, yeah. I'm actually surprised g_object_run_dispose() isn't deprecated. But, thanks to some debug output, it looks like the place where we use it in our copy of GTestDBus, the object in question has a refcount of 54. So relying on g_object_unref() looks like it might not be feasible (and it's not totally unreasonable to be a little forceful in teardown for tests).

Comment 35 Travis Reitter 2014-04-28 22:30:23 UTC

(In reply to comment #34)

> There are some very rare failures which seem to be specific to either Folks or
> the specific test cases themselves. I think we can reasonably defer those to
> their own bugs and close this (once we get the GLib patches merged).

Actually, they seem to mostly end like this:

> g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.

That still might be fair to defer to its own bug but I'll give it a little time while waiting on feedback for the GLib bugs.

Comment 36 Philip Withnall 2014-06-20 18:12:42 UTC

With the patch in bug #731981, all the EDS tests pass consistently for me with ~stock GLib 2.40 (no patches which touch GTestDBus or GDBusConnection).

Comment 37 Philip Withnall 2014-06-20 19:38:19 UTC

Created attachment 278863 [details] [review]
eds: Ignore errors from unimplemented EDS methods

We can assume the methods are deliberately not implemented, so we
shouldn’t warn about them.

This complements the patch in
https://bugzilla.gnome.org/show_bug.cgi?id=731981.

Comment 38 Philip Withnall 2014-06-20 19:38:24 UTC

Created attachment 278864 [details] [review]
eds: Bypass test failures on GTestDBus teardown

GTestDBus can sometimes fail on teardown, causing our EDS tests to
spuriously fail, and this has become a real thorn in our side. While we
wait for the proper solution to the relevant GLib bugs to be hashed out,
let’s simply exit early from the tests to ignore failures in teardown.

This will not affect (non-teardown) test failures from true positives,
as it only happens on final teardown.

Comment 39 Philip Withnall 2014-06-20 19:39:27 UTC

(In reply to comment #36)
> With the patch in bug #731981, all the EDS tests pass consistently for me with
> ~stock GLib 2.40 (no patches which touch GTestDBus or GDBusConnection).

I lied. You also need these two patches, which I think are reasonable for now. The exit() patch can be reverted once the GTestDBus situation is sorted out (and any necessary changes have been copied from upstream GTestDBus to our copy).

Comment 40 Travis Reitter 2014-06-21 01:47:03 UTC

Review of attachment 278863 [details] [review]:

Looks good!

Comment 41 Travis Reitter 2014-06-21 01:47:08 UTC

Review of attachment 278864 [details] [review]:

Looks good!

Comment 42 Travis Reitter 2014-06-21 01:49:56 UTC

The EDS tests pass for me with these patches now as well! Great way to start a
weekend :)

I think we're ready for a release now (whichever one of us gets to it first).
It may require temporarily disabling the Tracker backend (all those tests fail
for me now due possibly to some version mis-matching; I did an uninstall and
fresh install of 0.16.4 with no luck) but I don't see any problem with that.
Literally no one uses it. I don't think any (major?) distros build it.

Comment 43 Philip Withnall 2014-06-21 16:39:07 UTC

Pushed. Let’s mark this as fixed. We can catch up with GTestDBus changes whenever they happen, and then revert commit 0d894ff621680587a3e16d636ca3b517cfe7d1ab.


commit 0d894ff621680587a3e16d636ca3b517cfe7d1ab
Author: Philip Withnall <philip.withnall@collabora.co.uk>
Date:   Fri Jun 20 20:35:28 2014 +0100

    eds: Bypass test failures on GTestDBus teardown
    
    GTestDBus can sometimes fail on teardown, causing our EDS tests to
    spuriously fail, and this has become a real thorn in our side. While we
    wait for the proper solution to the relevant GLib bugs to be hashed out,
    let’s simply exit early from the tests to ignore failures in teardown.
    
    This will not affect (non-teardown) test failures from true positives,
    as it only happens on final teardown.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 tests/lib/Makefile.am    |  1 +
 tests/lib/test-case.vala | 25 +++++++++++++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

commit ae1f281855d2f47102a61b56188f841807571537
Author: Philip Withnall <philip.withnall@collabora.co.uk>
Date:   Fri Jun 20 20:34:34 2014 +0100

    eds: Ignore errors from unimplemented EDS methods
    
    We can assume the methods are deliberately not implemented, so we
    shouldn’t warn about them.
    
    This complements the patch in
    https://bugzilla.gnome.org/show_bug.cgi?id=731981.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=726973

 backends/eds/lib/edsf-persona-store.vala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)