GNOME Bugzilla – Bug 662100
regression: g_dbus_connection_close() triggers exit-on-close logic
Last modified: 2011-11-11 16:39:54 UTC
My previous work on GDBusWorker thread-safety for Bug #651268 caused a regression: a GDBusConnection with exit-on-close = TRUE would kill itself if you call g_dbus_connection_close(). Also, the exit-on-close test was disabled for a while due to questionable use of fork(), but we should resurrect it: it detects real bugs. The test I'll attach here also works if compiled against commit 05ef173466e, just before my thread-safety changes (i.e. it proves that we have the old behaviour back). I haven't tried the 2.28 branch yet, but the regression almost certainly also exists there.
Created attachment 199338 [details] [review] [1/4] Revert "Disable two GDBus tests" --- Only needed on master. They were disabled for Bug #658999, but I'm about to fix the underlying problems by only using main contexts after forking.
Created attachment 199339 [details] [review] [2/4] GDBusWorker: if a read was cancelled it means we closed the connection This was a regression caused by my previous work on GDBusWorker thread-safety (Bug #651268). The symptom is that if you disconnect a GDBusConnection locally, the default implementation of GDBusConnection::closed terminates your process, even though it shouldn't do that for locally-closed connections; this is because GDBusWorker didn't think a cancelled read was a local close. --- This is the actual bugfix. It applies cleanly to glib-2-30 and master.
Created attachment 199341 [details] [review] [3/4] gdbus-exit-on-close test: cover more possibilities We didn't previously test anything except the implicit default of TRUE. Now we test implicit TRUE, explicit TRUE, explicit FALSE, and disconnecting at the local end (which regressed while fixing Bug #651268). Also avoid some questionable use of a main context, which fell foul of Bug #658999 and caused this test to be disabled in master. --- Applies cleanly to glib-2-30, and to master if you apply Attachment #199338 [details] first.
Created attachment 199342 [details] [review] [4/4] gdbus-non-socket test: avoid use of a GMainContext across a fork See https://bugzilla.gnome.org/show_bug.cgi?id=658999 for why this would be bad.
Patches 2-4 seem to be equally good for 2.28, although an unrelated test (gsettings) fails there. Cherry-picking commit 7654a848e2b57 (a fix to the test code, from 2.30.0) fixes that test for me. I've also verified that the version of gdbus-exit-on-close produced by these patches passes when compiled against 2.28.8, which is the latest version in which Bug #651268 was unfixed.
Comment on attachment 199338 [details] [review] [1/4] Revert "Disable two GDBus tests" Sure.
Comment on attachment 199339 [details] [review] [2/4] GDBusWorker: if a read was cancelled it means we closed the connection I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from closing the connection on the client-side. We could make it explicit by setting a flag in the worker struct but I don't think that's needed...
Comment on attachment 199341 [details] [review] [3/4] gdbus-exit-on-close test: cover more possibilities Looks good to me. Thanks.
Comment on attachment 199342 [details] [review] [4/4] gdbus-non-socket test: avoid use of a GMainContext across a fork OK, I don't think we'd ever run into any problems like this but I agree it's better to be safe than sorry. Please commit. Thanks!
(In reply to comment #5) > Patches 2-4 seem to be equally good for 2.28, although an unrelated test > (gsettings) fails there. Cherry-picking commit 7654a848e2b57 (a fix to the test > code, from 2.30.0) fixes that test for me. I agree it would be nice to cherry-pick as much as possible but please coordinate with Ryan and/or Matthias for cherry-picking - normally I just ask them on IRC. Thanks.
(In reply to comment #7) > (From update of attachment 199339 [details] [review]) > I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from > closing the connection on the client-side. We could make it explicit by setting > a flag in the worker struct but I don't think that's needed... I couldn't quite parse this (some words missing?), but you said commit_now, so I will :-) I did add this comment to explain why it's OK: + /* Every async read that uses this callback uses worker->cancellable + * as its GCancellable. worker->cancellable gets cancelled if and only + * if the GDBusConnection tells us to close (either via + * _g_dbus_worker_stop, which is called on last-unref, or directly), + * so a cancelled read must mean our connection was closed locally. + */ Using a flag in the worker struct would mean I had to think about thread-safety (again) - close() is conceptually a "write" operation at the moment, but this callback is on the "read" side, which has a separate lock - so I think using the cancellable is strictly better.
All committed to master for 2.31.0. I'll talk to desrt/mclasen about stable branches, please leave this open for now.
Also applied to 2.28 for 2.28.9 (if it ever happens), and 2.30 for 2.30.2, based on IRC feedback from Benjamin Otte.
(In reply to comment #11) > (In reply to comment #7) > > (From update of attachment 199339 [details] [review] [details]) > > I think the assumption that any error that is G_IO_ERROR_CANCELLED stems from > > closing the connection on the client-side. We could make it explicit by setting > > a flag in the worker struct but I don't think that's needed... > > I couldn't quite parse this (some words missing?), Sorry, yea, I meant to put "is correct" at the end of the first sentence... Thanks for adding the extra comment
Sorry, turns out there test still isn't reliable - reopening. On my laptop the test given here is reliable, but on an embedded device it fails when the async read fails with G_IO_ERROR_CLOSED, not G_IO_ERROR_CANCELLED. Checking the cancellable explicitly seems a good way to make this not matter while not requiring a flag.
Created attachment 200638 [details] [review] [1/4] gdbus-exit-on-close test: optionally be more verbose for debugging --- Not required, but helpful.
Created attachment 200639 [details] [review] [2/4] gdbus-exit-on-close test: don't leak a variant --- Not really related, I just happened to notice this bug.
Created attachment 200640 [details] [review] [3/4] GDBusWorker: debug on read errors if transport debugging is enabled --- Not needed, but helpful.
Created attachment 200642 [details] [review] [4/4] GDBusWorker: on read error, check whether we were cancelled, not the error My previous fix for GNOME#662100 was incomplete: it seems that with some timings (I've seen it on an embedded device, but not on my laptop), the stream can be closed before a read starts. This makes the read fail immediately with G_IO_ERROR_CLOSED instead of becoming cancelled. It seems as though what we really want is to ignore read errors, once we've established that we want to close the connection anyway - this means that after asking to close, you're immune to exit-on-close, which seems like a good rule. --- Unfinished but included here for comment... This is the actual bugfix, for the 2.28 branch. On 2.30 and master the patch should be basically the same, but would also include deleting the comment quoted in Comment #11, which I accidentally omitted from 2.28. > Using a flag in the worker struct would mean I had to think about > thread-safety (again) This is essentially a flag in the worker struct - and I thought GCancellable was thread-safe, but it looks as though it might not be (unless taking the lock in g_cancellable_cancel is a memory barrier). So this may or may not actually be safe. Any ideas?
The other thing we could do instead of 4/4 would be to treat G_IO_ERROR_CLOSED as equivalent to G_IO_ERROR_CANCELLED.
Created attachment 201175 [details] [review] [1/5] GDBusWorker: backport explanatory comment about G_IO_ERROR_CANCELLED from 2.30 This was mistakenly omitted from the 2.28 version. (cherry picked from commit d65b80fb547ef3eed80ae970f858e7d110d90b40)
Created attachment 201176 [details] [review] [2/5] gdbus-exit-on-close test: optionally be more verbose for debugging
Created attachment 201177 [details] [review] [3/5] gdbus-exit-on-close test: don't leak a variant
Created attachment 201178 [details] [review] [4/5] GDBusWorker: debug on read errors if transport debugging is enabled
Created attachment 201179 [details] [review] [5/5] GDBusWorker: tolerate read errors while closing, and try not to read more My previous fix for GNOME#662100 was incomplete: it seems that with some timings, the stream can be closed with an async read in-flight. This can make the read fail immediately with G_IO_ERROR_CLOSED instead of becoming cancelled. This happens reliably on an embedded device, and rarely on my laptop; repeating the test 100 times in quick succession reliably reproduces the bug on my laptop. It seems as though what we really want is to ignore read errors, once we've established that we want to close the connection anyway - this means that after asking to close, you're immune to exit-on-close, which seems like a good rule.
These patches all look good.
Created attachment 201237 [details] [review] [5/5 v2] GDBusWorker: tolerate read errors while closing My previous fix for GNOME#662100 was incomplete: it seems that with some timings, the stream can be closed with an async read in-flight. This can make the read fail immediately with G_IO_ERROR_CLOSED instead of becoming cancelled. This happens reliably on an embedded device, and rarely on my laptop; repeating the test 100 times in quick succession reliably reproduces the bug on my laptop. It seems as though what we really want is to ignore read errors, once we've established that we want to close the connection anyway - this means that after asking to close, you're immune to exit-on-close, which seems like a good rule. An additional subtlety is that continuing to read after we know we want to close is still required, otherwise we'll never emit ::closed. --- Sigh. The "never emit ::closed" bug is another one that happens when you run a test several hundred times in quick succession.
Updated patch acked on IRC by walters. Fixed in git for 2.28.9, 2.30.2, 2.31.1