GNOME Bugzilla – Bug 721921
GDBusConnection API presents impossible situations when exit-on-close is FALSE
Last modified: 2018-05-24 16:12:05 UTC
We started seeing our runs of glib's installed tests failing in Ubuntu lately, e.g. with:

https://jenkins.qa.ubuntu.com/view/Trusty/view/AutoPkgTest/job/trusty-adt-glib2.0/ARCH=amd64,label=adt/45/console

Running test: /usr/share/installed-tests/glib/gdbus-names.test
/gdbus/validate-names: OK
/gdbus/bus-own-name: OK
/gdbus/bus-watch-name:
(/usr/lib/glib2.0/installed-tests/glib/gdbus-names:12323): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Test glib/gdbus-names.test failed: Child process killed by signal 5

And more recently package builds started failing with a failure in the same test:

PASS: gdbus-names 1 /gdbus/validate-names
PASS: gdbus-names 2 /gdbus/bus-own-name
ERROR: gdbus-names - missing test plan
ERROR: gdbus-names - exited with status 133 (terminated by signal 5?)

(I can only assume it's the same problem -- can I get more helpful errors somehow?)

Ryan tried it out and said he could reproduce the failure occasionally too. I can get it if I run the test in parallel with itself, like so (this is `parallel' from `moreutils', not GNU parallel):

# parallel -j4 sh -c "gnome-desktop-testing-runner glib/gdbus-names.test" -- 1 2 3 4
Running test: /usr/share/installed-tests/glib/gdbus-names.test
Running test: /usr/share/installed-tests/glib/gdbus-names.test
Running test: /usr/share/installed-tests/glib/gdbus-names.test
Running test: /usr/share/installed-tests/glib/gdbus-names.test
/gdbus/validate-names: OK
/gdbus/bus-own-name: /gdbus/validate-names: OK
/gdbus/bus-own-name: /gdbus/validate-names: OK
/gdbus/bus-own-name: /gdbus/validate-names: OK
/gdbus/bus-own-name: OK
/gdbus/bus-watch-name: OK
/gdbus/bus-watch-name: OK
/gdbus/bus-watch-name:
(/usr/lib/glib2.0/installed-tests/glib/gdbus-names:8978): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
OK
/gdbus/bus-watch-name: OK
PASS: glib/gdbus-names.test
SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
OK
PASS: glib/gdbus-names.test
SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
OK
PASS: glib/gdbus-names.test
SUMMARY: total: 1 passed: 1 skipped: 0 failed: 0
Test glib/gdbus-names.test failed: Child process killed by signal 5
SUMMARY: total: 1 passed: 0 skipped: 0 failed: 1
It's possible to get something similar in bus-own-name, afaics:

ERROR: gdbus-names
==================
# random seed: R02S3c13026d806d9e6265f2c182c7bae9f1
# Start of gdbus tests
ok 1 /gdbus/validate-names
PASS: gdbus-names 1 /gdbus/validate-names
(/home/smcv/build/glib/opt/gio/tests/.libs/lt-gdbus-names:1116): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
ok 2 /gdbus/bus-own-name
PASS: gdbus-names 2 /gdbus/bus-own-name
Trace/breakpoint trap (core dumped)
# GLib-GIO-FATAL-CRITICAL: Error while sending AddMatch() message: The connection is closed
cleaning up pid 1146
ERROR: gdbus-names - missing test plan
ERROR: gdbus-names - exited with status 133 (terminated by signal 5?)
(In reply to comment #0)
> And more recently package builds started failing with a failure in the same
> test

This might be triggered by running the tests in parallel (make -j$N check, where $N is the number of cores), which is a recent change in Debian and might have been merged into Ubuntu.
(In reply to comment #2)
> (In reply to comment #0)
> > And more recently package builds started failing with a failure in the same
> > test
>
> This might be triggered by running the tests in parallel (make -j$N check,
> where $N is the number of cores), which is a recent change in Debian and might
> have been merged into Ubuntu.

I tried with -j1 and it still happened; IIRC that's why I wrote "wrt. other tests" in the bug title here. Also, the installed tests I referred to are simple runs of `gnome-desktop-testing-runner glib', which run sequentially unless I'm mistaken.
(In reply to comment #2)
> (In reply to comment #0)
> > And more recently package builds started failing with a failure in the same
> > test
>
> This might be triggered by running the tests in parallel (make -j$N check,
> where $N is the number of cores), which is a recent change in Debian and might
> have been merged into Ubuntu.

The gio tests definitely don't pass reliably under "make -j check", because the various dbus tests interfere with each other.
(In reply to comment #3)
> `gnome-desktop-testing-runner glib' which run sequentially
> unless I'm mistaken.

The Continuous builder runs:

gnome-desktop-testing-runner -p 0

which auto-parallelizes to the number of cores.
This happens with -j1 as well.
So this is some kind of an API failure on GDBusConnection. If _set_exit_on_close(FALSE) was called then the connection can transition to the closed state more or less automatically, at any time (since it happens in the worker thread). Meanwhile, we have APIs, such as g_dbus_connection_signal_subscribe() that throw criticals if you call them on a closed connection. We either need to go around and collect the 'invalid on closed' cases and turn them into 'defined as no-op on closed', or figure out something else.
So it turns out that there are two bugs here. The most immediate cause of this bug actually appears to be bug 711807 (which I will soon reopen). Long story short, the patch introduced in bug 711807 causes g_test_dbus_down() to exit immediately. This results in the /gdbus/bus-watch-name testcase sharing the already-used-and-closed bus of the /gdbus/bus-own-name case if it's fast enough.
FTR, gapplication also fails with connection closed:

ERROR: gapplication
===================
Failed to register: The connection is closed
**
GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.39.90/./gio/tests/gapplication.c:564:test_quit: assertion failed: (activated)
Aborted (core dumped)
(In reply to comment #7)
> If _set_exit_on_close(FALSE) was called then the connection can transition to
> the closed state more or less automatically, at any time (since it happens in
> the worker thread).

With bad enough timing, can it do this? (pseudocode)

if (connection.is_still_open()) # returns TRUE
{
    [at this point the connection is closed by the other thread]
    connection.do_thing_that_is_invalid_while_closed() # critical warning
}
Yes. That's entirely the point. Unless we give people the ability to hold the connection lock from outside, it's possible that even if they're "careful", some other operation will intercede from another thread between their check and their call. I do not want to give people the ability to hold the connection lock from outside. I therefore think that the only reasonable thing to do is to turn these situations into defined no-ops.
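To illustrate what "defined no-op on closed" could look like, here is a minimal sketch on a toy model, not on GDBusConnection itself. ModelConnection, model_connection_mark_closed() and model_connection_subscribe() are hypothetical names invented for this example. The only point is that the closed check and the operation happen under the same internal lock, so the caller never needs to hold that lock and a concurrent close simply yields a documented "nothing happened" result (id 0 here) instead of a critical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection; this is NOT GDBusConnection. */
typedef struct {
    pthread_mutex_t lock;  /* internal lock, never exposed to callers */
    bool closed;           /* may be set from a worker thread at any time */
    unsigned next_id;      /* next subscription id to hand out */
} ModelConnection;

#define MODEL_CONNECTION_INIT { PTHREAD_MUTEX_INITIALIZER, false, 1 }

/* What a (hypothetical) worker thread would call when the peer goes away. */
static void model_connection_mark_closed(ModelConnection *c)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    c->closed = true;
    pthread_mutex_unlock(&c->lock);
}

/* Returns a non-zero subscription id, or 0 as a defined no-op when the
 * connection is closed. The check and the action share one critical
 * section, so no close can slip in between them. */
static unsigned model_connection_subscribe(ModelConnection *c)
{
    unsigned id = 0;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    if (!c->closed)
        id = c->next_id++;
    pthread_mutex_unlock(&c->lock);
    return id;
}
```

In this model a caller just calls model_connection_subscribe() and treats 0 as "connection was already closed"; there is no separate is-closed check to race against.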
(Reassigning to the proper component as I only see bugs filed under 'gdbus' and this was filed on 'gio'.)
I talked to David about this just now and he proposed what I believe to be a workable solution:

- we add a distinction between explicit close() via API and a close done in response to external connection loss
- we keep the criticals for cases where the user called close()
- we tell people that they should not close() on a shared singleton bus (which has already been a 'rule' for a while now anyway)
- if you close() your own private connection and then call some API on it, it's definitely your own fault and you deserve to see criticals
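The proposal above could be modelled roughly as follows. This is only a sketch under assumptions: ModelState, ModelConn and model_conn_send() are hypothetical names made up for illustration, the "critical" is just a counter plus a message on stderr, and locking is left out to keep the example short. It is not how GDBusConnection is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: record WHY the connection is closed. */
typedef enum {
    MODEL_OPEN,
    MODEL_CLOSED_BY_CALLER,  /* the application called close() itself */
    MODEL_CLOSED_BY_PEER     /* connection lost; not the caller's fault */
} ModelState;

typedef struct {
    ModelState state;
    unsigned criticals;      /* number of programmer-error warnings emitted */
} ModelConn;

/* Returns true if the operation was carried out. On a closed connection it
 * always returns false, but only complains when the caller closed it. */
static bool model_conn_send(ModelConn *c)
{
    assert(c != NULL);
    switch (c->state) {
    case MODEL_OPEN:
        return true;
    case MODEL_CLOSED_BY_CALLER:
        /* Explicit close() followed by use: keep the critical. */
        c->criticals++;
        fprintf(stderr, "CRITICAL: operation on a connection the caller closed\n");
        return false;
    case MODEL_CLOSED_BY_PEER:
        /* External connection loss can happen at any time: silent no-op. */
        return false;
    }
    return false;
}
```

With this split, a connection that dropped on its own never produces a critical, while misuse after an explicit close() is still reported.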
(In reply to Emilio Pozuelo Monfort from comment #9)
> FTR, gapplication also fails with connection closed:
>
> ERROR: gapplication
> ===================
>
> Failed to register: The connection is closed
> **
> GLib-GIO:ERROR:/tmp/buildd/glib2.0-2.39.90/./gio/tests/gapplication.c:564:
> test_quit: assertion failed: (activated)
> Aborted (core dumped)

This was later filed as Bug #768996. It might have the same root cause, I'm not sure.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/812.