Bug 721921 - GDBusConnection API presents impossible situations when exit-on-close is FALSE
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gdbus
Version: 2.39.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: David Zeuthen (not reading bugmail)
QA Contact: gtkdev
Reported: 2014-01-10 10:23 UTC by Iain Lane
Modified: 2018-05-24 16:12 UTC
GNOME target: ---
GNOME version: ---



Description Iain Lane 2014-01-10 10:23:41 UTC
We started seeing our runs of glib's installed tests failing in Ubuntu lately, e.g. with:

  https://jenkins.qa.ubuntu.com/view/Trusty/view/AutoPkgTest/job/trusty-adt-glib2.0/ARCH=amd64,label=adt/45/console

 Running test: /usr/share/installed-tests/glib/gdbus-names.test
 /gdbus/validate-names: OK
 /gdbus/bus-own-name: OK
 /gdbus/bus-watch-name: 
 (/usr/lib/glib2.0/installed-tests/glib/gdbus-names:12323): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
 Test glib/gdbus-names.test failed: Child process killed by signal 5

And more recently package builds started failing with a failure in the same test:

  PASS: gdbus-names 1 /gdbus/validate-names
  PASS: gdbus-names 2 /gdbus/bus-own-name
  ERROR: gdbus-names - missing test plan
  ERROR: gdbus-names - exited with status 133 (terminated by signal 5?)

(I can only assume it's the same problem -- can I get more helpful errors somehow?)

Ryan tried it out and said he could reproduce the failure occasionally too. I can get it if I run the test in parallel with itself, like so (this is `parallel' from `moreutils', not GNU parallel):

  # parallel -j4 sh -c "gnome-desktop-testing-runner glib/gdbus-names.test" -- 1 2 3 4
  Running test: /usr/share/installed-tests/glib/gdbus-names.test
  Running test: /usr/share/installed-tests/glib/gdbus-names.test
  Running test: /usr/share/installed-tests/glib/gdbus-names.test
  Running test: /usr/share/installed-tests/glib/gdbus-names.test
  /gdbus/validate-names: OK
  /gdbus/bus-own-name: /gdbus/validate-names: OK
  /gdbus/bus-own-name: /gdbus/validate-names: OK
  /gdbus/bus-own-name: /gdbus/validate-names: OK
  /gdbus/bus-own-name: OK
  /gdbus/bus-watch-name: OK
  /gdbus/bus-watch-name: OK
  /gdbus/bus-watch-name: 
  (/usr/lib/glib2.0/installed-tests/glib/gdbus-names:8978): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
  OK
  /gdbus/bus-watch-name: OK
  PASS: glib/gdbus-names.test
  SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
  OK
  PASS: glib/gdbus-names.test
  SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
  OK
  PASS: glib/gdbus-names.test
  SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
  Test glib/gdbus-names.test failed: Child process killed by signal 5
  SUMMARY: total: 1 passed: 0 skipped: 0 failed: 1
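
The reproduction above relies on `parallel' from `moreutils'. As a rough alternative, here is a minimal Python sketch that runs several copies of a command concurrently and tallies how they ended. The helper names run_copies and summarize are made up for this illustration; it assumes only the Python standard library, and it assumes (not verified against gnome-desktop-testing-runner itself) that a nonzero exit status from the runner means a failed test.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_copies(cmd, copies):
    """Run `copies` instances of `cmd` concurrently and return their return codes."""
    def run_one(_):
        # subprocess reports "killed by signal N" as a return code of -N.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode

    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(run_one, range(copies)))


def summarize(returncodes):
    """Count clean passes, ordinary failures, and deaths by signal."""
    passed = sum(1 for rc in returncodes if rc == 0)
    signalled = sum(1 for rc in returncodes if rc < 0)
    failed = len(returncodes) - passed - signalled
    return {"passed": passed, "failed": failed, "signalled": signalled}
```

For the case in this report the call would be summarize(run_copies(["gnome-desktop-testing-runner", "glib/gdbus-names.test"], 4)), which needs the runner and the installed tests to be present. The "signalled" count is mostly useful when the test binary is run directly rather than through the runner.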
Comment 1 Simon McVittie 2014-02-11 15:34:02 UTC
It's possible to get something similar in bus-own-name, afaics:

ERROR: gdbus-names
==================

# random seed: R02S3c13026d806d9e6265f2c182c7bae9f1
# Start of gdbus tests
ok 1 /gdbus/validate-names
PASS: gdbus-names 1 /gdbus/validate-names

(/home/smcv/build/glib/opt/gio/tests/.libs/lt-gdbus-names:1116): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
ok 2 /gdbus/bus-own-name
PASS: gdbus-names 2 /gdbus/bus-own-name
Trace/breakpoint trap (core dumped)
# GLib-GIO-FATAL-CRITICAL: Error while sending AddMatch() message: The connection is closed
cleaning up pid 1146
ERROR: gdbus-names - missing test plan
ERROR: gdbus-names - exited with status 133 (terminated by signal 5?)
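
On the "status 133 (terminated by signal 5?)" line: shells conventionally report death by signal N as exit status 128 + N, so 133 corresponds to signal 5. On Linux that is SIGTRAP, which matches the "Trace/breakpoint trap" message above and the critical being treated as fatal. A small Python sketch of that decoding follows; the function name describe_status is invented here, and the signal names it prints come from the platform it runs on.

```python
import signal


def describe_status(status):
    """Explain a shell-style exit status (values above 128 encode 128 + signal)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number this platform knows about.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {status}"


print(describe_status(133))
```

On a Linux system this should report signal 5 as SIGTRAP.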
Comment 2 Simon McVittie 2014-02-11 15:35:31 UTC
(In reply to comment #0)
> And more recently package builds started failing with a failure in the same
> test

This might be triggered by running the tests in parallel (make -j$N check, where $N is the number of cores), which is a recent change in Debian and might have been merged into Ubuntu.
Comment 3 Iain Lane 2014-02-11 15:43:58 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > And more recently package builds started failing with a failure in the same
> > test
> 
> This might be triggered by running the tests in parallel (make -j$N check,
> where $N is the number of cores), which is a recent change in Debian and might
> have been merged into Ubuntu.

I tried with -j1 and it still happened; IIRC that's why I wrote "wrt. other tests" in the bug title here. Also, the installed tests I referred to are simple runs of `gnome-desktop-testing-runner glib' which run sequentially unless I'm mistaken.
Comment 4 Dan Winship 2014-02-11 15:55:22 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > And more recently package builds started failing with a failure in the same
> > test
> 
> This might be triggered by running the tests in parallel (make -j$N check,
> where $N is the number of cores), which is a recent change in Debian and might
> have been merged into Ubuntu.

The gio tests definitely don't pass reliably under "make -j check", because the various dbus tests interfere with each other.
Comment 5 Colin Walters 2014-02-11 16:05:10 UTC
(In reply to comment #3)
> `gnome-desktop-testing-runner glib' which run sequentially
> unless I'm mistaken.

The Continuous builder runs:

gnome-desktop-testing-runner -p 0

which auto-parallelizes to the number of cores.
Comment 6 Emilio Pozuelo Monfort 2014-02-16 10:16:36 UTC
This happens with -j1 as well.
Comment 7 Allison Karlitskaya (desrt) 2014-02-17 22:43:31 UTC
So this is some kind of an API failure on GDBusConnection.

If _set_exit_on_close(FALSE) was called then the connection can transition to the closed state more or less automatically, at any time (since it happens in the worker thread).

Meanwhile, we have APIs, such as g_dbus_connection_signal_subscribe(), that throw criticals if you call them on a closed connection.

We either need to go around and collect the 'invalid on closed' cases and turn them into 'defined as no-op on closed', or figure out something else.
Comment 8 Allison Karlitskaya (desrt) 2014-02-17 23:08:57 UTC
So it turns out that there are two bugs here.  The most immediate cause of this bug actually appears to be bug 711807 (which I will soon reopen).

Long story short, the patch introduced in bug 711807 causes g_test_dbus_down() to exit immediately.  This results in the /gdbus/bus-watch-name testcase sharing the already-used-and-closed bus of the /gdbus/bus-own-name case if it's fast enough.
Comment 9 Emilio Pozuelo Monfort 2014-02-18 09:42:58 UTC
FTR, gapplication also fails with connection closed:

ERROR: gapplication
===================

Failed to register: The connection is closed
**
GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.39.90/./gio/tests/gapplication.c:564:test_quit: assertion failed: (activated)
Aborted (core dumped)
Comment 10 Simon McVittie 2014-02-18 12:24:08 UTC
(In reply to comment #7)
> If _set_exit_on_close(FALSE) was called then the connection can transition to
> the closed state more or less automatically, at any time (since it happens in
> the worker thread).

With bad enough timing, can it do this? (pseudocode)

if (connection.is_still_open())                           # returns TRUE
  {
    [at this point the connection is closed by the other thread]
    connection.do_thing_that_is_invalid_while_closed()    # critical warning
  }
Comment 11 Allison Karlitskaya (desrt) 2014-02-18 13:15:49 UTC
Yes.  That's entirely the point.  Unless we give people the ability to hold the connection lock from outside, it's possible that even if they're "careful", some other operation will intervene from another thread.

I do not want to give people the ability to hold the connection lock from outside.

I therefore think that the only reasonable thing to do is to turn these situations into defined no-ops.
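
To make the race concrete, here is a toy model in Python. It is not GDBus code: ToyConnection, ClosedError, close_from_worker and careful_caller are hypothetical names for this sketch, and an exception stands in for a critical. The lock is private to the object, as proposed above, so the caller's check and the following call cannot be made atomic; the sketch forces the unlucky interleaving deterministically and compares the "invalid on closed" and "no-op on closed" behaviours.

```python
import threading


class ClosedError(Exception):
    """Stands in for a GLib critical in this toy model."""


class ToyConnection:
    """Hypothetical connection that a worker thread may close at any time."""

    def __init__(self, noop_when_closed):
        self._lock = threading.Lock()   # private: callers cannot hold it
        self._closed = False
        self._noop_when_closed = noop_when_closed

    def is_closed(self):
        with self._lock:
            return self._closed

    def close_from_worker(self):
        with self._lock:
            self._closed = True

    def subscribe(self):
        with self._lock:
            if self._closed:
                if self._noop_when_closed:
                    return 0            # defined no-op: nothing subscribed
                raise ClosedError("The connection is closed")
            return 1                    # pretend subscription id


def careful_caller(conn):
    """Check-then-act, with the worker closing the connection in between."""
    if conn.is_closed():
        return None
    # Deterministically reproduce the bad timing: the worker thread closes
    # the connection after the check above and before the call below.
    worker = threading.Thread(target=conn.close_from_worker)
    worker.start()
    worker.join()
    return conn.subscribe()
```

With noop_when_closed=False the "careful" caller still hits ClosedError; with noop_when_closed=True the same interleaving returns 0 and nothing is raised.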
Comment 12 David Zeuthen (not reading bugmail) 2014-02-18 18:42:41 UTC
(Reassigning to the proper component as I only see bugs filed under 'gdbus' and this was filed on 'gio'.)
Comment 13 Allison Karlitskaya (desrt) 2014-02-18 18:43:53 UTC
I talked to David about this just now and he proposed what I believe to be a workable solution:

 - we add a distinction between explicit close() via API and a close done in
   response to external connection loss

 - we keep the criticals for cases where the user called close()

 - we tell people that they should not close() on a shared singleton bus (which
   has already been a 'rule' for a while now anyway)

 - if you close() your own private connection and then call some API on it,
   it's definitely your own fault and you deserve to see criticals
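
The rules above could be modelled roughly as follows. This is a Python sketch under stated assumptions, not the GDBus implementation: TrackedConnection, CloseReason, ProgrammerError and on_remote_loss are names invented for illustration, and the exception again stands in for a critical.

```python
import enum


class CloseReason(enum.Enum):
    LOCAL_CLOSE = "close() was called through the API"
    REMOTE_LOSS = "the connection was lost externally"


class ProgrammerError(Exception):
    """Stands in for a GLib critical in this toy model."""


class TrackedConnection:
    """Hypothetical connection that remembers why it was closed."""

    def __init__(self):
        self.close_reason = None        # None while the connection is open

    def close(self):
        # Explicit close by the user; later API calls are the user's fault.
        self.close_reason = CloseReason.LOCAL_CLOSE

    def on_remote_loss(self):
        # External loss only matters if the user has not already closed.
        if self.close_reason is None:
            self.close_reason = CloseReason.REMOTE_LOSS

    def subscribe(self):
        if self.close_reason is CloseReason.LOCAL_CLOSE:
            raise ProgrammerError("connection used after close()")
        if self.close_reason is CloseReason.REMOTE_LOSS:
            return 0                    # defined no-op after external loss
        return 1                        # pretend subscription id
```

In this model only the user's own close() can lead to an error, so code that never calls close() on a shared bus cannot trip it through bad timing.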
Comment 14 Simon McVittie 2018-04-03 10:27:30 UTC
(In reply to Emilio Pozuelo Monfort from comment #9)
> FTR, gapplication also fails with connection closed:
> 
> ERROR: gapplication
> ===================
> 
> Failed to register: The connection is closed
> **
> GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.39.90/./gio/tests/gapplication.c:564:
> test_quit: assertion failed: (activated)
> Aborted (core dumped)

This was later filed as Bug #768996. It might have the same root cause, but I'm not sure.
Comment 15 GNOME Infrastructure Team 2018-05-24 16:12:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/812.