After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 651268 - assertion failed: (!g_output_stream_has_pending (ostream)) in GDBus worker thread
assertion failed: (!g_output_stream_has_pending (ostream)) in GDBus worker th...
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gdbus
2.29.x
Other Linux
: Normal critical
: ---
Assigned To: David Zeuthen (not reading bugmail)
gtkdev
: 659085 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-05-27 16:42 UTC by Pedro Villavicencio
Modified: 2017-11-15 12:58 UTC
See Also:
GNOME target: ---
GNOME version: 2.31/2.32


Attachments
GDBus test case (12.99 KB, patch)
2011-09-07 19:23 UTC, Simon McVittie
none Details | Review
[1/5] GDBusWorker: combine num_writes_pending with flush_pending (5.36 KB, patch)
2011-09-08 18:30 UTC, Simon McVittie
accepted-commit_now Details | Review
[2/5] GDBusConnection: delegate to the worker to close the stream (14.98 KB, patch)
2011-09-08 18:31 UTC, Simon McVittie
reviewed Details | Review
[3/5] GDBusWorker: annotate more functions with lock/pending state (5.64 KB, patch)
2011-09-08 18:32 UTC, Simon McVittie
accepted-commit_now Details | Review
[4/5] Regression test for closing a stream, slowly, while a message is queued (13.32 KB, patch)
2011-09-08 18:32 UTC, Simon McVittie
reviewed Details | Review
[5/5] GDBusWorker: as a precaution, access 'stopped' boolean atomically (2.77 KB, patch)
2011-09-08 18:36 UTC, Simon McVittie
accepted-commit_now Details | Review
[1/6 v2 for 2.28] GDBusWorker: combine num_writes_pending with flush_pending (5.69 KB, patch)
2011-09-13 16:11 UTC, Simon McVittie
committed Details | Review
[2/6 v2 for 2.28] GDBusConnection: delegate to the worker to close the stream (15.40 KB, patch)
2011-09-13 16:19 UTC, Simon McVittie
committed Details | Review
[3/6 v2 for 2.28] GDBusWorker: annotate more functions with lock/pending state (6.40 KB, patch)
2011-09-13 16:20 UTC, Simon McVittie
committed Details | Review
[4/6 v2 for 2.28] Regression test for closing a stream, slowly, while a message is queued (13.25 KB, patch)
2011-09-13 16:21 UTC, Simon McVittie
committed Details | Review
[5/6 v2 for 2.28] GDBusWorker: as a precaution, access 'stopped' boolean atomically (2.77 KB, patch)
2011-09-13 16:21 UTC, Simon McVittie
committed Details | Review
[6/6 for 2.28] _g_dbus_worker_new: ref worker until after _g_dbus_worker_thread_begin_func (1.41 KB, patch)
2011-09-13 16:24 UTC, Simon McVittie
committed Details | Review
[1/6 for 2.30] GDBusWorker: combine num_writes_pending with flush_pending (5.69 KB, patch)
2011-09-13 16:41 UTC, Simon McVittie
committed Details | Review
[2/6 for 2.30] GDBusConnection: delegate to the worker to close the stream (16.21 KB, patch)
2011-09-13 16:43 UTC, Simon McVittie
committed Details | Review
[3/6 for 2.30] GDBusWorker: annotate more functions with lock/pending state (6.40 KB, patch)
2011-09-13 16:45 UTC, Simon McVittie
committed Details | Review
[4/6 v2 for 2.30] Regression test for closing a stream, slowly, while a message is queued (13.37 KB, patch)
2011-09-13 16:46 UTC, Simon McVittie
committed Details | Review
[5/6 for 2.30] GDBusWorker: as a precaution, access 'stopped' boolean atomically (2.78 KB, patch)
2011-09-13 16:47 UTC, Simon McVittie
committed Details | Review
[6/6 for 2.30] _g_dbus_worker_new: hold a ref until we have done the initial read (1.30 KB, patch)
2011-09-13 16:48 UTC, Simon McVittie
committed Details | Review

Description Pedro Villavicencio 2011-05-27 16:42:43 UTC
this report has been filed here:

https://bugs.launchpad.net/ubuntu/+source/gnome-power-manager/+bug/740878

gpm seems to be crashing for users after log in, backtrace is:

".

Thread 2 (Thread 1227)

  • #0 g_output_stream_close
    at /build/buildd/glib2.0-2.28.4/./gio/goutputstream.c line 529
  • #1 g_socket_connection_close
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 297
  • #2 g_socket_connection_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 332
  • #3 g_io_stream_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/giostream.c line 510
  • #4 g_dbus_connection_finalize
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusconnection.c line 518
  • #5 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2734
  • #6 app_indicator_dispose
    at /build/buildd/libappindicator-0.2.99/./src/app-indicator.c line 662
  • #7 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2697
  • #8 gpm_tray_icon_finalize
    at gpm-tray-icon.c line 483
  • #9 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2734
  • #10 gpm_manager_finalize
    at gpm-manager.c line 2089
  • #11 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2734
  • #12 main
    at gpm-main.c line 282

Comment 1 Simon McVittie 2011-09-05 18:11:58 UTC
I've seen automated crash reports resembling this backtrace in several non-public apps, and it's in GDBus internal code, so I think it's likely to be a GDBus bug.

Could someone with Bugzilla permissions please retitle to something like:

    assertion failed: (!g_output_stream_has_pending (ostream)) in GDBus
    worker thread

and reassign to Platform/glib/GDBus?

So far, I've been unable to reproduce it in a synthetic stress-test, or spot what the problem is from examination of the source code.

The interesting part of the stack trace is thread 1:

Thread 1 (Thread 1234)

  • #4 g_assertion_message_expr
  • #5 write_message_continue_writing
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1035
  • #6 maybe_write_next_message
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1343
  • #7 write_message_in_idle_cb
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1356
  • #8 g_idle_dispatch
    at /build/buildd/glib2.0-2.28.4/./glib/gmain.c line 4545

Comment 2 Simon McVittie 2011-09-05 18:19:01 UTC
[For those working on the appropriate project, this is NB#271520.]

One theory I had was:

> /* called in private thread shared by all GDBusConnection instances (without
> write-lock held) */
> static gboolean
> write_message_in_idle_cb (gpointer user_data)
> {
>   GDBusWorker *worker = user_data;
>   if (worker->num_writes_pending == 0 && !worker->flush_pending)
>     maybe_write_next_message (worker);
>   return FALSE;
> }
> 
> It checks num_writes_pending and flush_pending without holding a lock: they
> might be updated from another thread.

However, on closer examination, this doesn't look like a bug. The code is subtle, but seems correct:

* the relevant threads come in two categories: the shared GDBus worker thread
  (I'll call it "the worker thread") and all other threads, including the
  main thread (I'll call them "user threads")

* user threads sometimes look at num_writes_pending and flush_pending,
  but do not modify them; they take the write_lock while looking at those
  variables

* various functions in the worker thread alter num_writes_pending and
  flush_pending; they always take the write_lock while altering those
  variables

* write_message_in_idle_cb reads those variables without taking the lock,
  but this is safe, because user threads never alter them, and
  write_message_in_idle_cb runs in the worker thread, so the worker thread
  can't be doing anything else (like concurrent alteration) at the same time

Based on drawing a diagram of the call-graph and how the pending state and those variables relate, I can't yet see how this assertion failure can happen, except under pathological circumstances [1], but there must be something I haven't spotted. I'll have another look tomorrow...

--------

[1] These are the pathological circumstances I mean:

* someone has passed a GIOStream (which is a paired GInputStream and
  GOutputStream) to g_dbus_connection_new(), and

* a user thread is concurrently sending data out of the GOutputStream,
  bypassing GDBus

That would be a silly thing to do, and I don't think it's something we want to
support. I doubt gnome-power-manager actually does this, though!
Comment 3 Simon McVittie 2011-09-05 18:25:26 UTC
(In reply to comment #2)
> Based on drawing a diagram of the call-graph and how the pending state and
> those variables relate, I can't yet see how this assertion failure can happen

Of course, just after writing that, I spotted a potential problem, then found the fix in git. I believe this is a duplicate of Bug #635626.
Comment 4 Simon McVittie 2011-09-06 10:09:02 UTC
(In reply to comment #3)
> I believe this is a duplicate of Bug #635626.

In fact it can't be the same thing - the reporter of the original Launchpad bug here was running GLib 2.28.4, the systems where I've seen this bug are also based on GLib 2.28.4, and Bug #635626 was fixed in 2.27.5. Perhaps the fix was incomplete?
Comment 5 Simon McVittie 2011-09-07 17:12:33 UTC
The crash looks like it's caused by what's happening in the other thread (my backtraces from non-public apps all have the same pattern).

The GDBus worker thread is sending a message:

Thread 2 (Thread 1227)

  • #0 g_output_stream_close
    at /build/buildd/glib2.0-2.28.4/./gio/goutputstream.c line 529
  • #1 g_socket_connection_close
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 297
  • #2 g_socket_connection_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 332
  • #3 g_io_stream_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/giostream.c line 510
  • #4 g_dbus_connection_finalize
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusconnection.c line 518
  • #5 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2734
  • #6 app_indicator_dispose
    at /build/buildd/libappindicator-0.2.99/./src/app-indicator.c line 662

GDBusWorker guards itself against pending write and flush operations (assuming you don't mix data into the same output stream, which would be silly), but it doesn't guard against a pending close.

I'm trying to write a test-case for this.
Comment 6 Simon McVittie 2011-09-07 17:14:17 UTC
Ugh, Bugzilla's "pretty" stacktrace display broke my comment. What I meant to say (I'll turn it into two separate comments) is:

The GDBus worker thread is sending a message:

  • #4 g_assertion_message_expr
  • #5 write_message_continue_writing
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1035
  • #6 maybe_write_next_message
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1343
  • #7 write_message_in_idle_cb
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusprivate.c line 1356
  • #8 g_idle_dispatch
    at /build/buildd/glib2.0-2.28.4/./glib/gmain.c line 4545

Comment 7 Simon McVittie 2011-09-07 17:15:31 UTC
... and meanwhile, the main thread is finalizing a private GDBusConnection, which results in it closing its IO stream (the actual close is synchronous and relatively fast, although the result won't be reported until the main loop is re-entered):

  • #0 g_output_stream_close
    at /build/buildd/glib2.0-2.28.4/./gio/goutputstream.c line 529
  • #1 g_socket_connection_close
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 297
  • #2 g_socket_connection_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/gsocketconnection.c line 332
  • #3 g_io_stream_close_async
    at /build/buildd/glib2.0-2.28.4/./gio/giostream.c line 510
  • #4 g_dbus_connection_finalize
    at /build/buildd/glib2.0-2.28.4/./gio/gdbusconnection.c line 518
  • #5 g_object_unref
    at /build/buildd/glib2.0-2.28.4/./gobject/gobject.c line 2734
  • #6 app_indicator_dispose
    at /build/buildd/libappindicator-0.2.99/./src/app-indicator.c line 662

Comment 8 Simon McVittie 2011-09-07 19:23:18 UTC
Created attachment 195924 [details] [review]
GDBus test case

Both cases in this test fail in the same way as gnome-power-manager and my non-public apps: one case closes the connection just after sending a message, and the other last-unrefs the connection. To reproduce the bug it was necessary to make close() take long enough that the worker thread gets scheduled (I used 1 millisecond), and repeat the test a few times (repeating 100 times reliably triggers the bug on my laptop).

I'm working towards a fix, but that's harder...

(This test-case demonstrates that there's definitely a GDBus bug - nothing gnome-power-manager-specific here.)
Comment 9 Simon McVittie 2011-09-08 18:30:51 UTC
Created attachment 196024 [details] [review]
[1/5] GDBusWorker: combine num_writes_pending with  flush_pending

num_writes_pending was a counter, but it only took values 0 or 1, so make
it a boolean: it would never make sense to be trying to write out two
messages at the same time (they'd get interleaved).

Similarly, we can never be writing and flushing at the same time (that'd
mean we were flushing halfway through a message, which would be pointless)
so combine it with flush_pending too, calling the result output_pending.

Also assert that it takes the expected value whenever we change it,
and document the locking discipline used for it, including a subtle
case in write_message_in_idle_cb where it's not obvious at first glance
why we don't need the lock.
Comment 10 Simon McVittie 2011-09-08 18:31:46 UTC
Created attachment 196025 [details] [review]
[2/5] GDBusConnection: delegate to the worker to close the  stream

We can't safely close the output part of the I/O stream until any 
pending write or flush has been completed. In the worst case, this could
lead to an assertion failure in the worker (when the close wins the
race) or not closing the stream at all (when the write wins the race).

---

This is the fix for this bug as reported.
Comment 11 Simon McVittie 2011-09-08 18:32:28 UTC
Created attachment 196027 [details] [review]
[3/5] GDBusWorker: annotate more functions with lock/pending  state
Comment 12 Simon McVittie 2011-09-08 18:32:56 UTC
Created attachment 196028 [details] [review]
[4/5] Regression test for closing a stream, slowly, while a  message is queued
Comment 13 Simon McVittie 2011-09-08 18:36:55 UTC
Created attachment 196029 [details] [review]
[5/5] GDBusWorker: as a precaution, access 'stopped' boolean  atomically

This member is written in _g_dbus_worker_stop from arbitrary threads, and
read by the worker thread, so it should be accessed atomically.

---

That's all the patches for now; they seem to fix the test in 4/5, at least. Developed against 2.28, but should hopefully apply to 2.30 or master.
Comment 14 Cosimo Alfarano 2011-09-09 15:06:10 UTC
I've no gdbus/glib reviewer hat, but all patches seem OK to me, though.
Comment 15 Colin Walters 2011-09-09 20:10:01 UTC
Review of attachment 196025 [details] [review]:

Minor comments, looks generally right to me.

::: gio/gdbusconnection.c
@@ -520,3 @@
-                               NULL,  /* GCancellable */
-                               NULL,  /* GAsyncReadyCallback */
-                               NULL); /* userdata */

So...what is responsible for ensuring the stream is closed now?  Just the refcount on the socket?  If that's the case, why were we explicitly closing it before?  

Looks like we cancel/unref the worker in dispose(), so this close_async() call should have always been superfluous.

::: gio/gdbusprivate.c
@@ +1390,3 @@
+
+  /* if we want to close the connection, that takes precedence */
+  if (worker->close_list != NULL)

Hm, "close_list" is a weird name to me.  How about "pending_close_data"?

Same goes for "flush_list".

@@ +1489,3 @@
+/*
+ * @write_data: (transfer full) (allow-none):
+ * @close_data: (transfer full) (allow-none):

If you're going to make it look like gtk-doc, you might as well add schedule_write_in_worker_thread: to the top too.

@@ +1519,3 @@
+                             (GDestroyNotify) _g_dbus_worker_unref);
+      g_source_attach (idle_source, shared_thread_data->context);
+      g_source_unref (idle_source);

Hmm, I thought we had centralized the idle creation in a utility function earlier, but I guess not.
Comment 16 Colin Walters 2011-09-09 20:11:18 UTC
Review of attachment 196024 [details] [review]:

Makes sense.
Comment 17 Colin Walters 2011-09-09 20:12:10 UTC
Review of attachment 196027 [details] [review]:

We clearly lack a nice semi-formal style for lock annotations, but this looks like an improvement at least.
Comment 18 Colin Walters 2011-09-09 20:18:43 UTC
Review of attachment 196028 [details] [review]:

::: gio/tests/gdbus-close-pending.c
@@ +304,3 @@
+
+  g_assert_cmpint (pipe (f->server_to_client), ==, 0);
+  g_assert_cmpint (pipe (f->client_to_server), ==, 0);

g_unix_open_pipe()

Also I definitely prefer expressions with side effects to be outside assertions.

@@ +382,3 @@
+                 gconstpointer  context)
+{
+  g_message ("skipped (FIXME: use a Win32 pipe or something)");

Seems like it'd be better to only compile/run this at all on unix via automake conditional rather than having 90% of the file under #ifdef G_OS_UNIX.
Comment 19 Colin Walters 2011-09-09 20:19:11 UTC
Review of attachment 196029 [details] [review]:

Sure.
Comment 20 Matthias Clasen 2011-09-09 21:39:19 UTC
Review of attachment 196025 [details] [review]:

::: gio/gdbusprivate.c
@@ +1519,3 @@
+                             (GDestroyNotify) _g_dbus_worker_unref);
+      g_source_attach (idle_source, shared_thread_data->context);
+      g_source_unref (idle_source);

Ryan added g_main_context_invoke() for this purpose.
Comment 21 Simon McVittie 2011-09-12 08:54:05 UTC
(In reply to comment #15)
> So...what is responsible for ensuring the stream is closed now?  Just the
> refcount on the socket?

_g_dbus_worker_stop() in dispose (which was already called) now schedules a close on the stream, in addition to what it already did - so finalize no longer needs to.

> Looks like we cancel/unref the worker in dispose(), so this close_async() call
> should have always been superfluous.

Yes, but relying on the refcount would mean that the stream wasn't actually closed if there was a refleak.

> Hm, "close_list" is a weird name to me.  How about "pending_close_data"?

Perhaps close_attempts?

> Hmm, I thought we had centralized the idle creation in a utility function
> earlier, but I guess not.

Perhaps you did on 2.30, but I'm still targeting 2.28 at the moment. I'll check that everything applies nicely to 2.30 - there'll probably be an additional patch to clean up things that can be cleaned up.

(Unless you're no longer fixing 2.28 bugs - but I need a version of these patches that can apply to 2.28 anyway, whether you still want those or not.)

(In reply to comment #18)
> g_unix_open_pipe()

Not on 2.28, but yes.

> Also I definitely prefer expressions with side effects to be outside
> assertions.

Yes; I should fix gdbus-non-socket.c (on which I based this test), too.

> Seems like it'd be better to only compile/run this at all on unix via automake
> conditional rather than having 90% of the file under #ifdef G_OS_UNIX.

The reason I didn't is that there's nothing inherently Unix-specific in this test - it just needs a pipe-to-self implementation, and I don't know the using-the-right-C-runtime subtleties of writing one of those on Windows. gdbus-non-socket.c is the same.
Comment 22 David Zeuthen (not reading bugmail) 2011-09-12 15:06:45 UTC
Hey, sorry for not responding sooner - I was at the Plumbers conference last week.

Thanks for the patches and also thanks to Colin for the review. Patches generally look good and I'm happy to see how they improve the quality of the code in gio/gdbusprivate.c - the code I originally wrote is pretty cryptic because everything in async such that multiple connections can share the same worker thread - sorry about that!

I think it sounds like a great idea to get this into both 2.28 and 2.30. I just tried applying the patches on 2.30 but they didn't cleanly apply... FWIW, I'm not too concerned about using the new glib-unix stuff or g_main_context_invoke() only in 2.29.* from the get go - I think it's more important to get the bug-fixes (and test cases) into the tree and then later we can add patches to make the code 2.29.* only by using e.g. g_main_context_invoke().

Thanks again for the great work!
Comment 23 Simon McVittie 2011-09-12 18:49:27 UTC
OK, I now have two branches, for 2.28 and 2.30/master. I'll attach them after some testing.

(In reply to comment #18)
> g_unix_open_pipe()

Fixed in the 2.30 version.

> Also I definitely prefer expressions with side effects to be outside
> assertions.

Fixed in both versions for the test I added, but this seems to be a general problem in GLib tests, which is rather out-of-scope for this bug. You could argue that when the tests report failure as an assertion, it's not a problem if they don't really work with assertions disabled?

I have about half a branch to fix the other instances in 2.28, but it got rather long, so I shelved it; I'll revisit it when the changes from this bug have landed. Do people want this fixed in 2.28, 2.30, or only in master? (I suspect only-in-master is best.)

> Seems like it'd be better to only compile/run this at all on unix via automake
> conditional rather than having 90% of the file under #ifdef G_OS_UNIX.

The GDBus tests are Unix-only at the moment anyway, because most of them run under a temporary session bus using dbus-launch.

For the moment I made it #error if used on non-Unix; I have half a branch to make it work on Windows, but in my mingw-w64 + Wine environment it still fails, possibly because it's picking up a GIO DLL with this bug not fixed. I have no idea whether it works on real Windows.

(In reply to comment #20)
> Ryan added g_main_context_invoke() for this purpose.

I don't think this is an appropriate change for 2.28 (although the function does exist there), but I'll add a patch to do this in the 2.30 branch. I would personally not apply this to 2.30 either, only to master after merging 2.30, but it's up to you whether you want to apply it anyway.

How do stable branches work in GLib? It looks to me as though fixes get applied or cherry-picked separately to all applicable branches, rather than merging stable into master like we do in dbus?
Comment 24 David Zeuthen (not reading bugmail) 2011-09-12 20:34:13 UTC
(In reply to comment #23)
> OK, I now have two branches, for 2.28 and 2.30/master. I'll attach them after
> some testing.

Excellent, thanks!

> > Also I definitely prefer expressions with side effects to be outside
> > assertions.
> 
> Fixed in both versions for the test I added, but this seems to be a general
> problem in GLib tests, which is rather out-of-scope for this bug. You could
> argue that when the tests report failure as an assertion, it's not a problem if
> they don't really work with assertions disabled?
> 
> I have about half a branch to fix the other instances in 2.28, but it got
> rather long, so I shelved it; I'll revisit it when the changes from this bug
> have landed. Do people want this fixed in 2.28, 2.30, or only in master? (I
> suspect only-in-master is best.)

I think that it's fine to have side-effects in asserts in the tests - the whole idea is that the g_assert*() family of functions are used in tests so the test program won't work correctly if asserts are disabled.

> The GDBus tests are Unix-only at the moment anyway, because most of them run
> under a temporary session bus using dbus-launch.

Yeah. I would love to see gtester support something like that natively though...

> For the moment I made it #error if used on non-Unix; I have half a branch to
> make it work on Windows, but in my mingw-w64 + Wine environment it still fails,
> possibly because it's picking up a GIO DLL with this bug not fixed. I have no
> idea whether it works on real Windows.

That's fine - if someone is so inclined they can make it work on Windows.

> How do stable branches work in GLib? It looks to me as though fixes get applied
> or cherry-picked separately to all applicable branches, rather than merging
> stable into master like we do in dbus?

Yeah, I think we only cherry-pick only the really important changes / bug-fixes.
Comment 25 Simon McVittie 2011-09-13 16:08:58 UTC
(In reply to comment #24)
> > The GDBus tests are Unix-only at the moment anyway, because most of them run
> > under a temporary session bus using dbus-launch.
> 
> Yeah. I would love to see gtester support something like that natively
> though...

For that sort of thing, you probably want have gtester support a G_TEST_TRAP_DBUS_DAEMON flag that runs something like 'dbus-daemon --print-address=3 --no-fork' directly, rather than relying on dbus-launch.

dbus-launch isn't really the right tool for the job: see <https://bugs.freedesktop.org/show_bug.cgi?id=39196> (where review would also be appreciated).
Comment 26 Simon McVittie 2011-09-13 16:11:16 UTC
Created attachment 196410 [details] [review]
[1/6 v2 for 2.28] GDBusWorker: combine num_writes_pending with flush_pending

num_writes_pending was a counter, but it only took values 0 or 1, so make
it a boolean: it would never make sense to be trying to write out two
messages at the same time (they'd get interleaved).

Similarly, we can never be writing and flushing at the same time (that'd
mean we were flushing halfway through a message, which would be pointless)
so combine it with flush_pending too, calling the result output_pending.

Also assert that it takes the expected value whenever we change it,
and document the locking discipline used for it, including a subtle
case in write_message_in_idle_cb where it's not obvious at first glance
why we don't need the lock.

(Having the combined boolean at the top of the block of write-related
struct members improves struct packing on 64-bit platforms, by packing
read_num_ancillary_messages and output_pending into one word.)
Comment 27 Simon McVittie 2011-09-13 16:19:05 UTC
Created attachment 196411 [details] [review]
[2/6 v2 for 2.28] GDBusConnection: delegate to the worker to close the  stream

We can't safely close the output part of the I/O stream until any
pending write or flush has been completed. In the worst case, this could
lead to an assertion failure in the worker (when the close wins the
race) or not closing the stream at all (when the write wins the race).

---

I think this incorporates all of Colin's review feedback that is applicable to 2.28:

- variables renamed: GList *pending_close_attempts, *pending_flush_attempts;
- add "schedule_write_in_worker_thread:" heading to its pseudo-gtkdoc comment

Using g_main_context_invoke_full() is not a trivial change, because it calls the callback inline if it can acquire the thread-default main context on behalf of the calling thread. That's not what we want here: it seems we genuinely do need an idle to be used.

I did try using g_main_context_invoke_full on the 2.30 branch, and once I'd fixed the obvious deadlock (must not hold worker->write_lock across the g_main_context_invoke_full call), I still saw deadlocks in my gdbus-proxy-threads stress-test. One notable thing about that test is that it uses GDBusProxy synchronously from threads that don't have their own thread-default main context - which seems to be something that's done "in real life", notably in Tracker.
Comment 28 Simon McVittie 2011-09-13 16:20:24 UTC
Created attachment 196412 [details] [review]
[3/6 v2 for 2.28] GDBusWorker: annotate more functions with lock/pending  state

Now with locking-discipline annotations on the writing-related struct members, too.
Comment 29 Simon McVittie 2011-09-13 16:21:08 UTC
Created attachment 196413 [details] [review]
[4/6 v2 for 2.28] Regression test for closing a stream, slowly, while a  message is queued
Comment 30 Simon McVittie 2011-09-13 16:21:38 UTC
Created attachment 196414 [details] [review]
[5/6 v2 for 2.28] GDBusWorker: as a precaution, access 'stopped' boolean  atomically

This member is written in _g_dbus_worker_stop from arbitrary threads, and
read by the worker thread, so it should be accessed atomically.
Comment 31 Simon McVittie 2011-09-13 16:23:14 UTC
(In reply to comment #29)
> [4/6 v2 for 2.28] Regression test for closing a stream, slowly, while a 
> message is queued

I think this addresses Colin's review feedback, except for g_unix_open_pipe() which isn't available in 2.28.
Comment 32 Simon McVittie 2011-09-13 16:24:26 UTC
Created attachment 196416 [details] [review]
[6/6 for 2.28] _g_dbus_worker_new: ref worker until after  _g_dbus_worker_thread_begin_func

Otherwise, if the caller of _g_dbus_worker_new immediately unrefs the 
worker, calling _g_dbus_worker_thread_begin_func in the worker thread
could be a use-after-free.

---

New patch: I noticed this problem while rebasing to 2.30. This is a backport.
Comment 33 Simon McVittie 2011-09-13 16:41:34 UTC
Created attachment 196418 [details] [review]
[1/6 for 2.30] GDBusWorker: combine num_writes_pending with flush_pending

The same as the corresponding 2.28 patch, but it deletes num_writes_pending from a different location (the struct packing had already been improved).
Comment 34 Simon McVittie 2011-09-13 16:43:54 UTC
Created attachment 196419 [details] [review]
[2/6 for 2.30] GDBusConnection: delegate to the worker to close the   stream

Non-trivial changes relative to the 2.28 version:

- _g_dbus_worker_stop no longer needs to unref in an idle, because
  it now calls _g_dbus_worker_close, which unrefs in an idle anyway:
  so comit 322e25b35 can essentially be reverted
Comment 35 Simon McVittie 2011-09-13 16:45:13 UTC
Created attachment 196420 [details] [review]
[3/6 for 2.30] GDBusWorker: annotate more functions with lock/pending state

Almost identical to the 2.28 version. I've checked that it's all still relevant.
Comment 36 Simon McVittie 2011-09-13 16:46:22 UTC
Created attachment 196422 [details] [review]
[4/6 v2 for 2.30] Regression test for closing a stream, slowly, while a  message is queued

Changes relative to 2.28: the Makefile.am fixes some conflicts, and the test itself uses g_unix_open_pipe().
Comment 37 Simon McVittie 2011-09-13 16:47:14 UTC
Created attachment 196423 [details] [review]
[5/6 for 2.30] GDBusWorker: as a precaution, access 'stopped' boolean atomically

Essentially the same as the 2.28 version; it just fixes some conflicts.
Comment 38 Simon McVittie 2011-09-13 16:48:54 UTC
Created attachment 196424 [details] [review]
[6/6 for 2.30] _g_dbus_worker_new: hold a ref until we have done the initial read

Otherwise, we could use-after-free the GDBusWorker, if its last-unref
is immediately after _g_dbus_worker_new returns (before the worker thread
does its initial read).

---

The code structure in 2.30 differs significantly from 2.28, but the bug and the fix are basically the same as Attachment #196416 [details].
Comment 39 Simon McVittie 2011-09-13 17:29:32 UTC
The 2.30 patches apply cleanly to master and still pass tests. The 2.28 patches will conflict badly.
Comment 40 David Zeuthen (not reading bugmail) 2011-09-16 16:14:31 UTC
OK, I've committed the 2.30 patches to master and glib-2-30 branches and the 2.28 patches to the glib-2-28 branch. Thanks!
Comment 41 Philip Withnall 2017-11-15 12:58:55 UTC
*** Bug 659085 has been marked as a duplicate of this bug. ***