After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 662100 - regression: g_dbus_connection_close() triggers exit-on-close logic
regression: g_dbus_connection_close() triggers exit-on-close logic
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
2.28.x
Other Linux
: Normal normal
: ---
Assigned To: David Zeuthen (not reading bugmail)
gtkdev
Depends on:
Blocks: 661689 661992
 
 
Reported: 2011-10-18 14:59 UTC by Simon McVittie
Modified: 2011-11-11 16:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[1/4] Revert "Disable two GDBus tests" (1.49 KB, patch)
2011-10-18 15:36 UTC, Simon McVittie
accepted-commit_now Details | Review
[2/4] GDBusWorker: if a read was cancelled it means we closed the connection (1.40 KB, patch)
2011-10-18 15:37 UTC, Simon McVittie
accepted-commit_now Details | Review
[3/4] gdbus-exit-on-close test: cover more possibilities (6.58 KB, patch)
2011-10-18 15:38 UTC, Simon McVittie
accepted-commit_now Details | Review
[4/4] gdbus-non-socket test: avoid use of a GMainContext across a fork (3.89 KB, patch)
2011-10-18 15:39 UTC, Simon McVittie
accepted-commit_now Details | Review
[1/4] gdbus-exit-on-close test: optionally be more verbose for debugging (2.29 KB, patch)
2011-11-03 19:33 UTC, Simon McVittie
none Details | Review
[2/4] gdbus-exit-on-close test: don't leak a variant (828 bytes, patch)
2011-11-03 19:34 UTC, Simon McVittie
none Details | Review
[3/4] GDBusWorker: debug on read errors if transport debugging is enabled (1.34 KB, patch)
2011-11-03 19:34 UTC, Simon McVittie
none Details | Review
[4/4] GDBusWorker: on read error, check whether we were cancelled, not the error (2.20 KB, patch)
2011-11-03 19:38 UTC, Simon McVittie
none Details | Review
[1/5] GDBusWorker: backport explanatory comment about G_IO_ERROR_CANCELLED from 2.30 (1.36 KB, patch)
2011-11-10 19:17 UTC, Simon McVittie
none Details | Review
[2/5] gdbus-exit-on-close test: optionally be more verbose for debugging (2.42 KB, patch)
2011-11-10 19:18 UTC, Simon McVittie
none Details | Review
[3/5] gdbus-exit-on-close test: don't leak a variant (964 bytes, patch)
2011-11-10 19:18 UTC, Simon McVittie
none Details | Review
[4/5] GDBusWorker: debug on read errors if transport debugging is enabled (1.54 KB, patch)
2011-11-10 19:19 UTC, Simon McVittie
none Details | Review
[5/5] GDBusWorker: tolerate read errors while closing, and try not to read more (3.99 KB, patch)
2011-11-10 19:19 UTC, Simon McVittie
none Details | Review
[5/5 v2] GDBusWorker: tolerate read errors while closing (4.03 KB, patch)
2011-11-11 15:40 UTC, Simon McVittie
none Details | Review

Description Simon McVittie 2011-10-18 14:59:19 UTC
My previous work on GDBusWorker thread-safety for Bug #651268 caused a regression: a GDBusConnection with exit-on-close = TRUE would kill itself if you call g_dbus_connection_close().

Also, the exit-on-close test was disabled for a while due to questionable use of fork(), but we should resurrect it: it detects real bugs.

The test I'll attach here also works if compiled against commit 05ef173466e, just before my thread-safety changes (i.e. it proves that we have the old behaviour back).

I haven't tried the 2.28 branch yet, but the regression almost certainly also exists there.
Comment 1 Simon McVittie 2011-10-18 15:36:27 UTC
Created attachment 199338 [details] [review]
[1/4] Revert "Disable two GDBus tests"

---

Only needed on master. They were disabled for Bug #658999, but I'm about to fix the underlying problems by only using main contexts after forking.
Comment 2 Simon McVittie 2011-10-18 15:37:33 UTC
Created attachment 199339 [details] [review]
[2/4] GDBusWorker: if a read was cancelled it means we closed the connection

This was a regression caused by my previous work on GDBusWorker thread-safety
(Bug #651268). The symptom is that if you disconnect a GDBusConnection
locally, the default implementation of GDBusConnection::closed
terminates your process, even though it shouldn't do that for
locally-closed connections; this is because GDBusWorker didn't think a
cancelled read was a local close.

---

This is the actual bugfix. It applies cleanly to glib-2-30 and master.
Comment 3 Simon McVittie 2011-10-18 15:38:31 UTC
Created attachment 199341 [details] [review]
[3/4] gdbus-exit-on-close test: cover more possibilities

We didn't previously test anything except the implicit default of TRUE.
Now we test implicit TRUE, explicit TRUE, explicit FALSE, and
disconnecting at the local end (which regressed while fixing Bug #651268).

Also avoid some questionable use of a main context, which fell foul of
Bug #658999 and caused this test to be disabled in master.

---

Applies cleanly to glib-2-30, and to master if you apply Attachment #199338 [details] first.
Comment 4 Simon McVittie 2011-10-18 15:39:06 UTC
Created attachment 199342 [details] [review]
[4/4] gdbus-non-socket test: avoid use of a GMainContext across a fork

See https://bugzilla.gnome.org/show_bug.cgi?id=658999 for why this would
be bad.
Comment 5 Simon McVittie 2011-10-18 16:15:42 UTC
Patches 2-4 seem to be equally good for 2.28, although an unrelated test (gsettings) fails there. Cherry-picking commit 7654a848e2b57 (a fix to the test code, from 2.30.0) fixes that test for me.

I've also verified that the version of gdbus-exit-on-close produced by these patches passes when compiled against 2.28.8, which is the latest version in which Bug #651268 was unfixed.
Comment 6 David Zeuthen (not reading bugmail) 2011-10-18 18:18:52 UTC
Comment on attachment 199338 [details] [review]
[1/4] Revert "Disable two GDBus tests"

Sure.
Comment 7 David Zeuthen (not reading bugmail) 2011-10-18 18:22:12 UTC
Comment on attachment 199339 [details] [review]
[2/4] GDBusWorker: if a read was cancelled it means we closed the connection

I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from closing the connection on the client-side. We could make it explicit by setting a flag in the worker struct but I don't think that's needed...
Comment 8 David Zeuthen (not reading bugmail) 2011-10-18 18:23:14 UTC
Comment on attachment 199341 [details] [review]
[3/4] gdbus-exit-on-close test: cover more possibilities

Looks good to me. Thanks.
Comment 9 David Zeuthen (not reading bugmail) 2011-10-18 18:24:34 UTC
Comment on attachment 199342 [details] [review]
[4/4] gdbus-non-socket test: avoid use of a GMainContext across a fork

OK, I don't think we'd ever run into any problems like this but I agree it's better to be safe than sorry. Please commit. Thanks!
Comment 10 David Zeuthen (not reading bugmail) 2011-10-18 18:26:14 UTC
(In reply to comment #5)
> Patches 2-4 seem to be equally good for 2.28, although an unrelated test
> (gsettings) fails there. Cherry-picking commit 7654a848e2b57 (a fix to the test
> code, from 2.30.0) fixes that test for me.

I agree it would be nice to cherry-pick as much as possible but please coordinate with Ryan and/or Matthias for cherry-picking - normally I just ask them on IRC. Thanks.
Comment 11 Simon McVittie 2011-10-19 09:49:49 UTC
(In reply to comment #7)
> (From update of attachment 199339 [details] [review])
> I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from
> closing the connection on the client-side. We could make it explicit by setting
> a flag in the worker struct but I don't think that's needed...

I couldn't quite parse this (some words missing?), but you said commit_now, so I will :-)

I did add this comment to explain why it's OK:

+      /* Every async read that uses this callback uses worker->cancellable
+       * as its GCancellable. worker->cancellable gets cancelled if and only
+       * if the GDBusConnection tells us to close (either via
+       * _g_dbus_worker_stop, which is called on last-unref, or directly),
+       * so a cancelled read must mean our connection was closed locally.
+       */

Using a flag in the worker struct would mean I had to think about thread-safety (again) - close() is conceptually a "write" operation at the moment, but this callback is on the "read" side, which has a separate lock - so I think using the cancellable is strictly better.
Comment 12 Simon McVittie 2011-10-19 10:01:34 UTC
All committed to master for 2.31.0. I'll talk to desrt/mclasen about stable branches, please leave this open for now.
Comment 13 Simon McVittie 2011-10-19 11:08:02 UTC
Also applied to 2.28 for 2.28.9 (if it ever happens), and 2.30 for 2.30.2, based on IRC feedback from Benjamin Otte.
Comment 14 David Zeuthen (not reading bugmail) 2011-10-19 12:50:13 UTC
(In reply to comment #11)
> (In reply to comment #7)
> > (From update of attachment 199339 [details] [review] [details])
> > I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from
> > closing the connection on the client-side. We could make it explicit by setting
> > a flag in the worker struct but I don't think that's needed...
> 
> I couldn't quite parse this (some words missing?), 

Sorry, yea, I meant to put "is correct" at the end of the first sentence... Thanks for adding the extra comment
Comment 15 Simon McVittie 2011-11-03 18:53:08 UTC
Sorry, turns out there test still isn't reliable - reopening.

On my laptop the test given here is reliable, but on an embedded device it fails when the async read fails with G_IO_ERROR_CLOSED, not G_IO_ERROR_CANCELLED.

Checking the cancellable explicitly seems a good way to make this not matter while not requiring a flag.
Comment 16 Simon McVittie 2011-11-03 19:33:46 UTC
Created attachment 200638 [details] [review]
[1/4] gdbus-exit-on-close test: optionally be more verbose for  debugging

---

Not required, but helpful.
Comment 17 Simon McVittie 2011-11-03 19:34:13 UTC
Created attachment 200639 [details] [review]
[2/4] gdbus-exit-on-close test: don't leak a variant

---

Not really related, I just happened to notice this bug.
Comment 18 Simon McVittie 2011-11-03 19:34:34 UTC
Created attachment 200640 [details] [review]
[3/4] GDBusWorker: debug on read errors if transport debugging  is enabled

---

Not needed, but helpful.
Comment 19 Simon McVittie 2011-11-03 19:38:26 UTC
Created attachment 200642 [details] [review]
[4/4] GDBusWorker: on read error, check whether we were  cancelled, not the error

My previous fix for GNOME#662100 was incomplete: it seems that with some
timings (I've seen it on an embedded device, but not on my laptop),
the stream can be closed before a read starts. This makes the read
fail immediately with G_IO_ERROR_CLOSED instead of becoming cancelled.

It seems as though what we really want is to ignore read errors, once
we've established that we want to close the connection anyway - this
means that after asking to close, you're immune to exit-on-close,
which seems like a good rule.

---

Unfinished but included here for comment...

This is the actual bugfix, for the 2.28 branch. On 2.30 and master the patch should be basically the same, but would also include deleting the comment quoted in Comment #11, which I accidentally omitted from 2.28.

> Using a flag in the worker struct would mean I had to think about
> thread-safety (again)

This is essentially a flag in the worker struct - and I thought GCancellable was thread-safe, but it looks as though it might not be (unless taking the lock in g_cancellable_cancel is a memory barrier). So this may or may not actually be safe. Any ideas?
Comment 20 Simon McVittie 2011-11-03 19:39:07 UTC
The other thing we could do instead of 4/4 would be to treat G_IO_ERROR_CLOSED as equivalent to G_IO_ERROR_CANCELLED.
Comment 21 Simon McVittie 2011-11-10 19:17:47 UTC
Created attachment 201175 [details] [review]
[1/5] GDBusWorker: backport explanatory comment about  G_IO_ERROR_CANCELLED from 2.30

This was mistakenly omitted from the 2.28 version.

(cherry picked from commit d65b80fb547ef3eed80ae970f858e7d110d90b40)
Comment 22 Simon McVittie 2011-11-10 19:18:15 UTC
Created attachment 201176 [details] [review]
[2/5] gdbus-exit-on-close test: optionally be more verbose for  debugging
Comment 23 Simon McVittie 2011-11-10 19:18:44 UTC
Created attachment 201177 [details] [review]
[3/5] gdbus-exit-on-close test: don't leak a variant
Comment 24 Simon McVittie 2011-11-10 19:19:05 UTC
Created attachment 201178 [details] [review]
[4/5] GDBusWorker: debug on read errors if transport debugging  is enabled
Comment 25 Simon McVittie 2011-11-10 19:19:36 UTC
Created attachment 201179 [details] [review]
[5/5] GDBusWorker: tolerate read errors while closing, and try  not to read more

My previous fix for GNOME#662100 was incomplete: it seems that with some
timings, the stream can be closed with an async read in-flight. This
can make the read fail immediately with G_IO_ERROR_CLOSED instead of
becoming cancelled.
 
This happens reliably on an embedded device, and rarely on my laptop;
repeating the test 100 times in quick succession reliably reproduces
the bug on my laptop.

It seems as though what we really want is to ignore read errors, once
we've established that we want to close the connection anyway - this
means that after asking to close, you're immune to exit-on-close,
which seems like a good rule.
Comment 26 Colin Walters 2011-11-11 15:39:28 UTC
These patches all look good.
Comment 27 Simon McVittie 2011-11-11 15:40:58 UTC
Created attachment 201237 [details] [review]
[5/5 v2] GDBusWorker: tolerate read errors while closing

My previous fix for GNOME#662100 was incomplete: it seems that with some
timings, the stream can be closed with an async read in-flight. This
can make the read fail immediately with G_IO_ERROR_CLOSED instead of
becoming cancelled.

This happens reliably on an embedded device, and rarely on my laptop;
repeating the test 100 times in quick succession reliably reproduces
the bug on my laptop.

It seems as though what we really want is to ignore read errors, once
we've established that we want to close the connection anyway - this
means that after asking to close, you're immune to exit-on-close,
which seems like a good rule.

An additional subtlety is that continuing to read after we know we
want to close is still required, otherwise we'll never emit ::closed.

---

Sigh. The "never emit ::closed" bug is another one that happens when you run a test several hundred times in quick succession.
Comment 28 Simon McVittie 2011-11-11 16:07:27 UTC
Updated patch acked on IRC by walters. Fixed in git for 2.28.9, 2.30.2, 2.31.1