After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 723168 - [review] take reference in NMRemoteConnection before calling DBUS
[review] take reference in NMRemoteConnection before calling DBUS
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: API
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2014-01-28 14:40 UTC by Thomas Haller
Modified: 2014-02-21 15:31 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Thomas Haller 2014-01-28 14:40:09 UTC
Take a reference on NMRemoteConnection before calling any of the async DBUS messages.

Please see branch th/ref_NMRemoteConnection
Comment 1 Thomas Haller 2014-01-28 14:41:26 UTC
[Paste here a first review by dcbw:]

[[quoting]]

The one case where things might go wrong is if the D-Bus connection is disconnected for some reason; then the DBusGProxy may not call the callback and the remote calls may not get cleaned up.  Perhaps look at dbus-glib and verify what happens with the proxy callback if the connection gets dropped?  We can discuss more over IRC if you like.

also, nm_remote_connection_get_secrets() *does* require a callback per the documentation, and it would be odd to call it without one, so I think the g_return_if_fail () should be restored there.

if the dbus connection is dropped, or the service that is  being called goes away, the DBusGProxy object will emit the GObject  'destroyed' signal which we can use to clean up all the RemoteCall  structures from NMRemoteSettingsPrivate->calls.
Comment 2 Thomas Haller 2014-01-28 14:46:19 UTC
Looking at dbus-glib (dbus/dbus-gproxy.c), the DESTROY signal gets only emitted in the dbus_g_proxy_dispose().

Since NMRemoteConnection holds a reference to it's proxy, this signal will not get invoked (before NMRemoteConnection's dispose() method).

So, I think this is save as is... (?)
Comment 3 Dan Williams 2014-01-29 18:00:17 UTC
Except that dbus-gproxy.c calls "dbus_g_proxy_destroy()" in two places, and that function actually runs the 'dispose()' gobject handler, and effectively destroys/deinits the proxy outside of recounting.  Which is technically valid, because since the owner is gone, the proxy cannot communicate with the owner anymore, and the thing that owns the proxy should stop using it via the 'destroy' signal.

Two cases:

1) if the proxy was created with dbus_g_proxy_new_for_name_owner() then when that specific process goes away, the proxy will be force-destroyed.  NetworkManager doesn't do this anywhere, so we don't care about it.

2) If the client gets disconnected by D-Bus, like if the client does something bad or tries to send non-UTF8 as a string (both of which are bugs, but shouldn't cause crashes) then the proxy will be force-destroyed.  The client might be badly written, and the error might be nowhere near libnm-glib code, but libnm-glib will still have to deal with the disconnection.

So in both these cases, the proxy will be destroyed/dinitialized and no longer usable, but will *not* be freed because its reference count is still > 0.  So using the proxy won't result in a crash, but its callback will never be called, and libnm-glib will never release the reference to it.

To handle this, we could abuse "proxy" argument to the result_cb() and get_secrets_cb() callbacks by using NULL to mean "dbus disconnected us, send an error to the callback".  RemoteCall would need a new "proxy_callback" member which would store the callback passed to dbus_g_proxy_begin_call().

So, NMRemoteConnection would attach to the DBusGProxy's destroy signal, and when received, iterate through priv->calls, calling each call->proxy_callback (NULL, NULL, error); and result_cb() and get_secrets_cb() could look like this:

	if (proxy)
		dbus_g_proxy_end_call (proxy, proxy_call, &error, G_TYPE_INVALID);
	else {
		error = g_error_new_literal (NM_REMOTE_CONNECTION_ERROR,
		                             NM_REMOTE_CONNECTION_ERROR_DISCONNECTED,
		                             "Disconnected by D-Bus");
	}
	if (func)
		(*func) (call->self, error, call->user_data);
	g_clear_error (&error);
	remote_call_complete (call);

or something like that.  If it's clearer, we could instead of call->proxy_callback, have a call->destroy_callback(RemoteCall *call, GError *error); that would pass an already-created error in so that it didn't have to be allocated every time.  Then the destroy_callback would be something like:

static void
result_destroy_cb (RemoteCall *call, GError *error)
{
	NMRemoteConnectionResultFunc func = (NMRemoteConnectionResultFunc) call->callback;

	if (func)
		(*func) (call->self, error, call->user_data);
	remote_call_complete (call);
}

and the same for get_secrets_destroy_cb().

In both cases, the signal handler for proxy::destroy must use:

while (priv->calls) {
    RemoteCall *call;

    call->destroy_callback (or call->proxy_callback);
}

because the callbacks both call remote_call_complete() which removes the item from priv->calls, and thus we can use iteration when destroying the calls from the proxy destroy handler.
Comment 4 Thomas Haller 2014-01-29 20:58:42 UTC
(In reply to comment #3)

You are correct, the proxy can become destroyed. If we want to care about that (which the code did not up to now), then we also have to check for is_destroyed before any dbus_g_proxy_begin_call() call.

I did a rework, and pushed a !fixup on top of it.

Notably, get_secrets_cb() now frees the "secrets" hash. I think, before this was a leak (wasn't it?).
Comment 5 Dan Williams 2014-01-30 00:25:09 UTC
The 'error' handling is a little odd, but it works...  the rest looks OK to me, but get danw's thoughts on it too if you don't mind.

And yes, it does appear that get_secrets_cb() was leaking the hash.  Would you mind backporting that little bit to NM_0_9_8 too?
Comment 6 Thomas Haller 2014-01-30 10:15:19 UTC
(In reply to comment #5)
> The 'error' handling is a little odd, but it works...  the rest looks OK to me,
> but get danw's thoughts on it too if you don't mind.
> 
> And yes, it does appear that get_secrets_cb() was leaking the hash.  Would you
> mind backporting that little bit to NM_0_9_8 too?

Pushed two more !fixups.
Comment 7 Dan Winship 2014-01-30 12:26:50 UTC
(In reply to comment #4)
> You are correct, the proxy can become destroyed. If we want to care about that
> (which the code did not up to now)

Right, and we don't deal with this anywhere else do we?

If it hasn't been a problem up until now, and it can only happen in the case where the program is buggy (and the problem will go away when we eventually port to GDBus), then I'd say there's no reason to complicate the code to deal with it.

Actually, what is the problem here? Nothing anywhere describes any actual bug that this patch is fixing...
Comment 8 Thomas Haller 2014-01-30 12:40:43 UTC
(In reply to comment #7)

The motivation of the patch was not that that the proxy might be destroyed early, but that async methods are called, without having a ref on NMRemoteConnection. So, either the callback was never invoked in some cases or it crashed.

You already fixed this for nm_remote_connection_delete (commit ffa012f3cef96daead331c6f573e5357d7f0ad85) but not for the other async calls. Arguably, it is not likely that this happens, because it would need that somebody deletes the connection at a very specific moment, but it's still a lingering bug.
Comment 9 Dan Williams 2014-02-14 23:19:22 UTC
proxy_set_destroyed() and proxy_set_cb() both have the function starting brace on the same line as the function prototype.
Comment 10 Dan Williams 2014-02-14 23:35:37 UTC
Other than that, I can't think of a better way to ensure that when the proxy gets destroyed, any in-flight calls get an error returned.  To ensure that, we have to wrap all the dbus calls so that the proxy destroy signal calls all their callback functions with the error.
Comment 11 Thomas Haller 2014-02-18 19:06:25 UTC
Fixed style error and rebased to master.
Comment 12 Dan Williams 2014-02-20 02:28:53 UTC
Looks fine to me now; +1 to merge unless danw has any further suggestions?
Comment 13 Dan Winship 2014-02-21 14:11:19 UTC
> platform: refactor address_to_string() to return device as numeric if ifname is unknown

I would have _to_string_dev() return just the ifname/index, and put the " dev " part into the g_snprintf() format string in nm_platform_ip4_address_to_string(), etc, which I think would be more consistent with how the other fields are handled (in addition to simplifying things).


> libnm-glib: take reference in NMRemoteConnection before calling DBUS

>+typedef void (*RemoteCallFetchResultCb) (RemoteCall *call, DBusGProxyCall *proxy_call, GError **error);

You're passing the error *to* this function, not getting it back from the function, so it should be "GError *" not "GError **".

>+		error = g_error_new_literal (NM_REMOTE_CONNECTION_ERROR,
>+		                             NM_REMOTE_CONNECTION_ERROR_DISCONNECTED,
>+		                             "Disconnected by D-Bus");

error message should be _()'ed

>+		(*func)(call->self, *error, call->user_data);

space between ")(" (in two places)
Comment 14 Thomas Haller 2014-02-21 14:52:27 UTC
(In reply to comment #13)
> > platform: refactor address_to_string() to return device as numeric if ifname is unknown
> 
> I would have _to_string_dev() return just the ifname/index, and put the " dev "
> part into the g_snprintf() format string in
> nm_platform_ip4_address_to_string(), etc, which I think would be more
> consistent with how the other fields are handled (in addition to simplifying
> things).

Said commit a6767f215e0b7ab55949e4410d399f315a225578 was not part of this branch, it's already on master. The motivation for returning " dev %s" is that in the _to_string() functions, the entire " dev " part should be omited. So, if _dev_to_string would only return the previous part it would need:

-    g_snprintf (buffer, sizeof (buffer), "%s/%d lft %u pref %u time %u%s%s src %s",
+    g_snprintf (buffer, sizeof (buffer), "%s/%d lft %u pref %u time %u%s%s%s src %s",
                 s_address, address->plen, (guint)address->lifetime, (guint)address->preferred,
                 (guint)address->timestamp,
                 str_peer ? str_peer : "",
+                *str_dev ? " dev " : "",
                 str_dev,
                 source_to_string (address->source));

at 4 places (or something similar). Yes, would be possible too...



> > libnm-glib: take reference in NMRemoteConnection before calling DBUS
> 
> >+typedef void (*RemoteCallFetchResultCb) (RemoteCall *call, DBusGProxyCall *proxy_call, GError **error);
> 
> You're passing the error *to* this function, not getting it back from the
> function, so it should be "GError *" not "GError **".

I was aware of that. I pushed a fixup!, please see if you like that approach better.


> >+		error = g_error_new_literal (NM_REMOTE_CONNECTION_ERROR,
> >+		                             NM_REMOTE_CONNECTION_ERROR_DISCONNECTED,
> >+		                             "Disconnected by D-Bus");
> 
> error message should be _()'ed
> 
> >+		(*func)(call->self, *error, call->user_data);
> 
> space between ")(" (in two places)

fixed
Comment 15 Dan Winship 2014-02-21 15:10:12 UTC
(In reply to comment #14)
> > I would have _to_string_dev() return just the ifname/index

> Said commit a6767f215e0b7ab55949e4410d399f315a225578 was not part of this
> branch, it's already on master.

Yeah, noticed that belatedly. Anyway, I missed that it still returns empty string in some cases, so yes, the way it is makes sense.

> > You're passing the error *to* this function, not getting it back from the
> > function, so it should be "GError *" not "GError **".
> 
> I was aware of that. I pushed a fixup!, please see if you like that approach
> better.

Yeah, the fact that you have to use "local_error" is a bit unfortunate, but I think it's clearer than the previous version where sometimes an error gets passed in and sometimes it gets passed out.

> > >+		error = g_error_new_literal (NM_REMOTE_CONNECTION_ERROR,
> > >+		                             NM_REMOTE_CONNECTION_ERROR_DISCONNECTED,
> > >+		                             "Disconnected by D-Bus");
> > 
> > error message should be _()'ed

You didn't address this. But once you do I'd say it's good to commit.
Comment 16 Thomas Haller 2014-02-21 15:31:50 UTC
(In reply to comment #15)
> > > error message should be _()'ed
> 
> You didn't address this. But once you do I'd say it's good to commit.

Oh yes. Now I did, and merged to master.

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=f8dcab53d9ccdb9203d2f63872926c83413add52
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=9968895e19690edbc2485a398d5d913ba5df91fb