GNOME Bugzilla – Bug 795940
libgdm uses weird weak refs where it doesn't need them
Last modified: 2018-05-16 15:04:11 UTC
See https://gitlab.gnome.org/GNOME/gnome-shell/issues/227 Iain Lane @iainl commented 3 days ago Right, I think this is a GDM problem... Here's my results. I think @3v1n0 is also having a look, but if this triggers any insight let us know. I ran with G_DEBUG=fatal-criticals (actually this is difficult to get working with gdm easily, so I set this in code and called g_debug_init() again myself to make it take effect...), and also added a breakpoint on g_dbus_connection_dispose to see where it was actually being disposed. It's disposed like this: (gdb) bt
+ Trace 238605
but of course we just disposed it. So it was disposed by the weak refs. That code seems a bit suspicious to me - the GDBusProxy instances handle reffing and unreffing the connection, so I'm not sure what it is trying to do? I prepped a test commit removing all of those weak refs and a few attempts to reproduce didn't manage to. Perhaps I just got lucky, or perhaps those weak refs are needed for something, but this seems to be some indication at least.
Created attachment 371816 [details] [review] libgdm: Drop weak refs on the GDBusConnection The GDBusProxies hold a strong reference to the connection themselves. We don't need to do this, and we avoid the potential of operating on a disposed GDBusConnection if we leave them alone.
Review of attachment 371816 [details] [review]: I agree with Iain that no further unref should be performed on the proxy disposition. However probably you should also use g_object_add_weak_pointer to nullify the connection's client instance on connection disposition. ::: libgdm/gdm-client.c @@ +698,3 @@ error); + g_object_unref (connection); What about using g_autoptr(GDBusConnection) instead?
note that the client->priv->connection gets ref'd each time it's opened, too.
also the manager weak ref stuff is just as wonky as the connection weak ref stuff
(In reply to Ray Strode [halfline] from comment #3) > note that the client->priv->connection gets ref'd each time it's opened, too. Yep, I noticed, forgot to mention it in the review though... If we want get rid of the weak ref's at all, for sure we need to remove this. And in that case not sure we want to do it for the manager too; and as you said I think we can get rid both; as in any case if there's a pointer around we can be sure we will reuse it without all the ref/unreffing that is currently done with weak pointers.
Mhmh... I was getting rid of the weak refs, but then I've noticed again something that also was taking my attention today when I was looking into a solution that wasn't removing these weak refs... In fact, while they're clearly something not really something easily maintainable (and this bug is clearly an example :)), at the same time even removing them and just unreffing when needed, won't change the result as I think the problem could be somewhere else: When we try to initialize a GConnection async, but we've already defined it (in gdm_client_open_connection), we just reuse that instance, passing it to the task with g_task_return_pointer. When then calling gdm_client_open_connection_finish we always unref that instance, unless in the case that we set it as client->priv->connection (when it wasn't defined before). But, what if that was already the case? In such case, we just eat a reference because in the subsequent calls when we do the weak-ref, we expect that the the connection has been ref'ed and needs to be unreffed once we've done. Thus a such change is enough for fixing the things https://paste.ubuntu.com/p/XRdnD9WHYm/ While I don't see leaks coming. There are still some changes and cleanups, and I'm still down with removing the weak refs, and I was thinking to add something like https://paste.ubuntu.com/p/kg7NdnCPp3/ on top of Laney's change. Any thoughts?
(In reply to Marco Trevisan (Treviño) from comment #6) > In fact, while they're clearly something not really something easily > maintainable (and this bug is clearly an example :)), at the same time even > removing them and just unreffing when needed, won't change the result as I > think the problem could be somewhere else: > > When we try to initialize a GConnection async, but we've already defined it > (in gdm_client_open_connection), we just reuse that instance, passing it to > the task with g_task_return_pointer. > > When then calling gdm_client_open_connection_finish we always unref that > instance, unless in the case that we set it as client->priv->connection > (when it wasn't defined before). But, what if that was already the case? > In such case, we just eat a reference because in the subsequent calls when > we do the weak-ref, we expect that the the connection has been ref'ed and > needs to be unreffed once we've done. > > Thus a such change is enough for fixing the things > https://paste.ubuntu.com/p/XRdnD9WHYm/ > While I don't see leaks coming. Right, because of the autoptr. I don't understand though - the pointer that this comes from should itself be reffed shouldn't it? I think it comes from if (client->priv->connection != NULL) { g_task_return_pointer (task, g_object_ref (client->priv->connection), (GDestroyNotify) g_object_unref); g_object_unref (task); return; } in gdm_client_open_connection(). So I'm obviously missing something, because it seems that unreffing in the case where you got the same connection back should work even in the current code. Anyway... > There are still some changes and cleanups, and I'm still down with removing > the weak refs, and I was thinking to add something like > https://paste.ubuntu.com/p/kg7NdnCPp3/ on top of Laney's change. > > Any thoughts? I'm going to attach a patch in a second that is sort of similar to yours. Except that I don't ref priv->connection when handing it out and unref when finshing with it each time. The GdmClient keeps one reference and internally we always use just that one. To me this is a bit simpler to understand. When we hand that connection out of our control, that code should handle the ref/unref itself. GDBusProxy does, which is what we really care about. When we're finalized, that reference is dropped but after this point nobody is going to be making calls on us anyway. And yeah, a similar exercise for the manager would be good too I think.
Created attachment 371831 [details] [review] libgdm: Drop weak refs on the GDBusConnection The GDBusProxies hold a strong reference to the connection themselves. We don't need to do this, and we avoid the potential of operating on a disposed GDBusConnection if we leave them alone. The GdmClient itself keeps one strong reference to the connection to ensure that it is there when anything wants to dispatch an operation.
Review of attachment 371831 [details] [review]: (In reply to Iain Lane from comment #7) > Right, because of the autoptr. I don't understand though - the pointer that > this comes from should itself be reffed shouldn't it? I think it comes from > > if (client->priv->connection != NULL) { > g_task_return_pointer (task, > g_object_ref (client->priv->connection), > (GDestroyNotify) g_object_unref); > g_object_unref (task); > return; > } > > in gdm_client_open_connection(). Yes exactly, it can come from there or from on_connected, then while in the `on_connected` case we've just created a new connection and thus is fine to steal the ref if we use for the shared private connection, in case we reuse something we already have indeed we've to ref when passing it, but in case we reuse it we don't have to unref it in the current scenario where weak_references are used. > So I'm obviously missing something, because > it seems that unreffing in the case where you got the same connection back > should work even in the current code. Anyway... Yeah, I'm not sure if I said it clearly before, or I just thought, but the current weak-ref system, although it's a bit tricky and not easy to matain is actually correct in the way it does everything. It's just wrong at this point, as if an async call reuses the connection, we've to ref again before using or when its last async customer will be unreffed, the ref/unref pair won't match. > > There are still some changes and cleanups, and I'm still down with removing > > the weak refs, and I was thinking to add something like > > https://paste.ubuntu.com/p/kg7NdnCPp3/ on top of Laney's change. > > > > Any thoughts? > > I'm going to attach a patch in a second that is sort of similar to yours. > Except that I don't ref priv->connection when handing it out and unref when > finshing with it each time. The GdmClient keeps one reference and internally > we always use just that one. To me this is a bit simpler to understand. When > we hand that connection out of our control, that code should handle the > ref/unref itself. GDBusProxy does, which is what we really care about. When > we're finalized, that reference is dropped but after this point nobody is > going to be making calls on us anyway. That's another option, however somewhere having references is needed if we don't want to unef and then recreate connections for async tasks coming closly. PS: in any case I think we're still trying to get this for micro-optimizations, especially thinking that on top of this there's JS and... We know that it doesn't really has this attention when it comes to create proxies, connections or anything it needs to be fancy. ::: libgdm/gdm-client.c @@ +486,1 @@ (GDestroyNotify) g_object_unref); I think reffing unreffing was a bit easier to read here, while I'd be down to use the autoptr anyway for cleanup @@ +581,3 @@ + * holding onto! */ + if (client->priv->connection == NULL || + client->priv->connection == connection) { I was doing this too, but the reason I changed this is because in my final change what should be done at this level is something like: if (client->priv->connection == NULL) { g_print("STEALING CONNECTION\n"); client->priv->connection = g_steal_pointer (&connection); g_object_add_weak_pointer (G_OBJECT (client->priv->connection), (gpointer *) &client->priv->connection); } else if (client->priv->connection == connection) { g_steal_pointer (&connection); } (and the same should be done in gdm_client_open_connection_sync too. So that's why I did it differently. @@ +607,3 @@ g_task_return_pointer (task, + client->priv->connection, + NULL); Mh, I'd keep the reference too, so in case the task happens just a bit after that last connection has been unreffed, we just don't recreate one.
So to be clear, a part from the refactory, I think the full fix for the bug we have is just https://paste.ubuntu.com/p/BTVPjvXsvH/ Then we can proceed with removal of weak refs too if we want and just use one ref that is kept by the proxies, and a pointer to track it (which still should be unset when connection is disposed, though)
(In reply to Marco Trevisan (Treviño) from comment #9) > It's just wrong at this point, as if an async call reuses the connection, > we've to ref again before using or when its last async customer will be > unreffed, the ref/unref pair won't match. Why ref again? We just did that before returning it to the GTask. > That's another option, however somewhere having references is needed if we > don't want to unef and then recreate connections for async tasks coming > closly. The GdmClient always holds a reference, and while that is around we will have the connection. I think you only need this complicated weak ref stuff if you want to tear down the connection when it's not actively being used, like if holding the thing open is very costly. > I was doing this too, but the reason I changed this is because in my final > change what should be done at this level is something like: > > if (client->priv->connection == NULL) { > g_print("STEALING CONNECTION\n"); > client->priv->connection = g_steal_pointer (&connection); > g_object_add_weak_pointer (G_OBJECT > (client->priv->connection), > (gpointer *) > &client->priv->connection); > } else if (client->priv->connection == connection) { > g_steal_pointer (&connection); > } Apart from the weak pointer bit, which I'm arguing to remove, that's exactly the same as what I did. If you steal a pointer without assigning it to anything it's the same as just making it NULL, so all that's happening is that in this case the autoptr leaves it alone. > @@ +607,3 @@ > g_task_return_pointer (task, > + client->priv->connection, > + NULL); > > Mh, I'd keep the reference too, so in case the task happens just a bit after > that last connection has been unreffed, we just don't recreate one. It's only going to be unreferenced if the GdmClient is destroyed. I think if you're making calls on a GdmClient after you've dropped your reference to it, that is programmer error. (In reply to Marco Trevisan (Treviño) from comment #10) > So to be clear, a part from the refactory, I think the full fix for the bug > we have is just https://paste.ubuntu.com/p/BTVPjvXsvH/ > > Then we can proceed with removal of weak refs too if we want and just use > one ref that is kept by the proxies, and a pointer to track it (which still > should be unset when connection is disposed, though) Right, it's clear that this always was a reference counting bug and we could fix that directly. I thought we had decided (by this bug's title) to fix the confusing weak ref system instead of just patching that bug directly. I'll let you and Ray decide what's best from here. Personally I vote for scrapping the whole system and tying the connection's lifecycle to the client object, even if it means keeping the connection around when it could technically be dropped.
Created attachment 371850 [details] [review] client: Don't unref the connection we're corrently using in tasks If an async task tries to reuse the connection we already have, it unrefs it while the assumption in the whole system is that we always ref the connection before using it, and its up to the weak references to unref once their customers are done.
(In reply to Iain Lane from comment #11) > (In reply to Marco Trevisan (Treviño) from comment #9) > > It's just wrong at this point, as if an async call reuses the connection, > > we've to ref again before using or when its last async customer will be > > unreffed, the ref/unref pair won't match. > > Why ref again? We just did that before returning it to the GTask. Wait we're talking of two different scenarios... That was referred to the case where we keep the current weak refs system, and thus in this case it's needed. > Apart from the weak pointer bit, which I'm arguing to remove, that's exactly > the same as what I did. If you steal a pointer without assigning it to > anything it's the same as just making it NULL, so all that's happening is > that in this case the autoptr leaves it alone. Yeah, indeed... I was referring to the fact why I used an else statement for that, as I'm keen to still add a weak pointer for that connection. > > @@ +607,3 @@ > > g_task_return_pointer (task, > > + client->priv->connection, > > + NULL); > > > > Mh, I'd keep the reference too, so in case the task happens just a bit after > > that last connection has been unreffed, we just don't recreate one. > > It's only going to be unreferenced if the GdmClient is destroyed. I think if > you're making calls on a GdmClient after you've dropped your reference to > it, that is programmer error. Again, I was referring to the case were we still want to make sure that when the last connection is not used we destroy it... In such case I'd prefer this way. Not that it would change the result, it's just a bit optimized. > (In reply to Marco Trevisan (Treviño) from comment #10) > > So to be clear, a part from the refactory, I think the full fix for the bug > > we have is just https://paste.ubuntu.com/p/BTVPjvXsvH/ > > > > Then we can proceed with removal of weak refs too if we want and just use > > one ref that is kept by the proxies, and a pointer to track it (which still > > should be unset when connection is disposed, though) > > Right, it's clear that this always was a reference counting bug and we could > fix that directly. I thought we had decided (by this bug's title) to fix the > confusing weak ref system instead of just patching that bug directly. Yes, what I'd prefer to do here, is first to push a commit with the actual fix (which will still apply to both cases), then do the refactory if we want (and I'm down with it). > I'll let you and Ray decide what's best from here. Personally I vote for > scrapping the whole system and tying the connection's lifecycle to the > client object, even if it means keeping the connection around when it could > technically be dropped. I agree, we can still have the dropping with a simplified method though.
Created attachment 372091 [details] [review] client: Don't unref a connection that's in use If an async task tries to reuse an open connection, it erroneously explicitly unrefs it. That is incorrect, because there are weak references in use to handle disposing the connection when its no longer in use. This commit makes sure the local connection object in open_connection is nullified so the connection doesn't get autofree'd.
Created attachment 372092 [details] [review] client: add weak pointer for connection object At the moment we fail to nullify GdmClient's connection to GDM when the connection is disposed. This commit adds a weak pointer to correct that mistake.
Created attachment 372093 [details] [review] libgdm: drop support for serializing multiple opens Right now libgdm tries to handle multiple simultaneous open calls at the same time by serializing the requests and giving them all the same connection. It's broken, though. - The pending_opens list is never populated, so we end up just doing multiple simultaneous open operations at a time anyway. - The finish code ends up calling g_task_return_error (task, NULL) instead of g_task_return_pointer in the non-error case. Since the feature doesn't work, drop it for now.
Created attachment 372094 [details] [review] libgdm: fix pointer/boolean task confusion The manager fetching code in GdmClient treats its task return value as boolean, but it's actually a pointer (the manager) This commit corrects the confusion.
Created attachment 372095 [details] [review] libgdm: don't keep manager proxy around longer than we need it Right now we keep the manager proxy alive long after we need it. It doesn't get cleared until one of the other proxies go away. That is not only unnecessary but illogical and confusing. This commit changes the manager proxy to be transient—only alive long enough to get what we need from it.
Created attachment 372096 [details] [review] libgdm: use g_object_unref instead of g_clear_object for weakrefs At the moment we add a weakref on each proxy to the connection object. For the _sync variant functions, When the weakref fires, they call g_clear_object, clearing the connection, even if other proxies still have a reference. This commit changes that weak ref code to use g_object_unref instead.
Created attachment 372097 [details] [review] libgdm: get connection explicitly At the moment we call gdm_client_open_connection and when it finishes, assume client->priv->connection is implicitly initialized. This commit makes the operation more explicit by changing gdm_client_open_connection to gdm_client_get_connection and returning the GDBusConnection object directly, instead of returning a boolean.
Created attachment 372098 [details] [review] libgdm: Drop weak refs on the GDBusConnection The GDBusProxies hold a strong reference to the connection themselves, so maintaining separate weak references is unnecessary. This commit drops those extraneous weak references.
so i need to hammer on that more tomorrow ⬆ (also available here: https://git.gnome.org/browse/gdm/log/?h=wip/weak-ref-removal ) but i'll probably end up pushing and doing a release after that unless there's feedback for you guys.
Review of attachment 372098 [details] [review]: LGTM
Review of attachment 372096 [details] [review]: LGTM
Review of attachment 372095 [details] [review]: LGTM
Review of attachment 372094 [details] [review]: LGTM
Review of attachment 372093 [details] [review]: LGTM
I think overall this is a good refactor with fixes. I'll actually do proper tests tomorrow morning, but code side all makes sense.
Ah, commit-message side I've noticed that mine have the `client:` prefix while others are `libgdm:` ones, so I guess we should be consisten with these, weather it will be just `libgdm:`, `client:` or even `libgdm, client:` :)
It all looks good to me, with one question. Thanks for the work. I'm not sure why there's still this internal refcounting of client->priv->connection; can you explain? In my patch I had lazily initialised it, and kept one reference around which was tied to the lifetime of the GdmClient. Is it so you can drop the connection as soon as it's not used? If so, is that better than potentially destroying and creating a connection multiple times?
(In reply to Iain Lane from comment #31) > I'm not sure why there's still this internal refcounting of > client->priv->connection; can you explain? In my patch I had lazily > initialised it, and kept one reference around which was tied to the lifetime > of the GdmClient. Is it so you can drop the connection as soon as it's not > used? Right, I think it's better not to have a connection open when just sitting at the user list. I think it might leave a worker process running as a side effect (though I didn't confirm that). > If so, is that better than potentially destroying and creating a > connection multiple times? Well, we have two proxies potentially using the connection concurrently (UserVerifier for authentication, and Greeter session selection and starting the session). They need to use the same connection, since the session needs to be started from the same worker process the user authenticated with.
Attachment 372093 [details] pushed as 6f8498c - libgdm: drop support for serializing multiple opens Attachment 372094 [details] pushed as f57e11b - libgdm: fix pointer/boolean task confusion Attachment 372095 [details] pushed as 4317394 - libgdm: don't keep manager proxy around longer than we need it Attachment 372096 [details] pushed as 85232f3 - libgdm: use g_object_unref instead of g_clear_object for weakrefs Attachment 372097 [details] pushed as dc66c87 - libgdm: get connection explicitly Attachment 372098 [details] pushed as a1b4916 - libgdm: Drop weak refs on the GDBusConnection