GNOME Bugzilla – Bug 674473
Crash when restarting NetworkManager
Last modified: 2012-04-23 22:51:53 UTC
I got this while updating NetworkManager; note that the daemon is restarted upon upgrade Core was generated by `/usr/bin/gnome-shell'. Program terminated with signal 11, Segmentation fault.
+ Trace 230110
Thread 1 (Thread 0x7f3ccc89e9c0 (LWP 1000))
Found the bug (also reported as https://bugzilla.redhat.com/show_bug.cgi?id=806649), it is in libnm-glib, so reassigning. Btw, I found multiple ways to fix it, each with different problems. Up to dcbw to decide.
Created attachment 212438 [details] [review] NMClient: don't force dispose active connections when NM restarts Disposing of objects breaks expectations of API users (it removes signal connections, for example), and may leave them with disposed (and thus totally unusable) objects that won't go away until finalized. ---- This is the simplest fix possible. It leaves a disposed NMActiveConnection around, though, so we get criticals trying to disconnect signals afterwards, and the object is kept in cache, despite not getting updates from DBus.
Created attachment 212439 [details] [review] NMActiveConnection: clear out devices on dispose The devices property may still be accessed after dispose (since that is forced by NMClient when NM is restarted), and not clearing causes an invalid memory access and a segfault. ----- This is the simplest fix possible. It leaves a disposed NMActiveConnection around, though, so we get criticals trying to disconnect signals afterwards, and the object is kept in cache, despite not getting updates from DBus. ----- Obviously, this is the patch I meant.
Created attachment 212440 [details] [review] NMObjectCache: don't reuse disposed objects If a object is disposed, it still exists in the cache until finalized (since GDatalist members in GObject are cleared in finalize()), but cannot be used. So use a weak ref instead, as those are cleared in dispose(). ------ This is another possibility. It ensures NMActiveConnection is out of cache after disposal (so nm_active_connection_get_devices() is back to work and the icon updates itself correctly), but apparently doesn't prevent the object from being used for a short while (we still get the criticals).
Created attachment 212441 [details] [review] NMClient: don't force dispose active connections when NM restarts Disposing of objects breaks expectations of API users (it removes signal connections, for example), and may leave them with disposed (and thus totally unusable) objects that won't go away until finalized. ------ This is the third, and most invasive, fix. It simply avoids disposing the objects at all, leaving that to the GC or to manual memory management from the user. It fixes all noticeable problems, including criticals, but could cause leaks.
The problem with #3 is that NMDevice and NMActiveConnection both reference each other, so they'll never be freed if we don't break that. The change to break circular ref and ensure that we correctly disposed things was made in 1138f136f8c9b126b5090906362db1c7519d39d7. Before that change, objects weren't being disposed, and thus weren't being removed from the object cache, which caused duplicate devices to be returned in nm_client_get_devices() for a variety of reasons, and also caused crashes because the old object (which should have been disposed when NM quit) would have the same object path as a new object from the newly started NM. I'm pretty sure I tried clearing out the object cache while developing 1138f136 and that had some issues too. In reality the only cached top-level object is NMClient, and when the NMClient goes away everything else should go away as well. Not quite sure how to fix this yet...
I think at least we need to ensure the cache is cleaned so that internally we don't get old objects. In reality the cache is probably too simplistic with it's removal of objects when they're destroyed; that behavior fails if clients have refcounting problems and NM subsequently restarts. So I guess we should re-investigate (a) making sure the ref cycle is broken between NMDevice and NMActiveConnection, and (b) dropping the entire cache on the floor when NM restarts.
Comment on attachment 212439 [details] [review] NMActiveConnection: clear out devices on dispose NULLing things on dispose is probably a good idea regardless of what else we do. We'd want to fix more than just that one specific field though.
Created attachment 212616 [details] [review] libnm-glib: NULL out priv fields on dispose() In some situations, objects might get used after being disposed, so clear out their various priv fields so we don't try to access unreffed objects, freed strings, etc.
Created attachment 212617 [details] [review] libnm-glib: discard devices and active connections when NM goes down When NMClient changes state to "not running", don't just unref all the devices and connections: emit notify::active-connections and device-removed signals too, so the app will drop its copies of them. ====== another approach, which I think should work...
Review of attachment 212616 [details] [review]: Looks good.
Review of attachment 212617 [details] [review]: Looks good too.
still needs another patch to empty out the NMObject cache
Created attachment 212636 [details] [review] Clear object cache when NM stops
Comment on attachment 212636 [details] [review] Clear object cache when NM stops looks right
Review of attachment 212636 [details] [review]: 48981a6166208152890b6c57af19cc3c5db5837f
With latest git libnm-glib, I can restart NM all day and it doesn't appear to affect the shell or nm-applet. I do get some clutter warnings in ~/.xsession-errors that might be related to the Shell indicator but I can't be certain since they aren't prefixed with a process ID or name.