GNOME Bugzilla – Bug 738941
[review] fixes for losing DBUS connection and ignore SIGPIPE
Last modified: 2014-10-27 19:51:32 UTC
NM has several issues when stopping the DBUS service. Especially, once restarting the DBUS service, NM does not recover (which is however not fixed by these patches).
th/bgo738941_dbus_stop The last commit "wifi: avoid assertion about dupplicate..." is not optimal. I did not find the reason for the assertion.
> settings: pass valid error domain to g_set_error() in load_plugins() You should pass a valid error code too (NM_SETTINGS_ERROR_FAILED). > core: unset "exit-on-close" of dbus connections If, as you say above, NM does not recover after dbus-daemon restarts, then it seems like letting it crash is preferable, since then it's more obvious why networking is broken.
(In reply to comment #2) > > settings: pass valid error domain to g_set_error() in load_plugins() > > You should pass a valid error code too (NM_SETTINGS_ERROR_FAILED). That commit was not part of the branch, it's already on master. I added commit: >> settings: pass valid error code to g_set_error() in load_plugins() > > > > core: unset "exit-on-close" of dbus connections > > If, as you say above, NM does not recover after dbus-daemon restarts, then it > seems like letting it crash is preferable, since then it's more obvious why > networking is broken. NM is supposed to work without DBUS daemon running and in consequence, it should handle a DBUS restart. I noticed further issues, that require investigation. But for now, this is an actual bug that should be fixed.
(In reply to comment #1) > th/bgo738941_dbus_stop > > > The last commit "wifi: avoid assertion about dupplicate..." is not optimal. I > did not find the reason for the assertion. "dupplicate" -> "duplicate" in the git shortlog But this happens because when the supplicant stops (due to the name-owner-change, which is triggered because dbus-daemon quit) the supplicant manager sets all supplicant interfaces to the DOWN state so they can be cleaned up. That does two things: 1) calls supplicant_interface_acquire() to attempt to re-launch wpa_supplicant in case wpa_supplicant segfaulted 2) moves the NMDevicWifi to UNAVAILABLE state because the supplicant is gone, the device is no longer usable and we must terminate the connection and wait for the supplicant to come back But #2 also ends up calling supplicant_interface_acquire(), because that's what we want to do when the NMDeviceWifi is first managed (at startup) and when the supplicant dies. The code just doesn't differentiate between the two cases. So the solutions here are either to: a) like your patch does, allow duplicates, which I think is probably fine because this operation doesn't really need to be nested or anything like that b) in nm-device-wifi.c::device_state_changed() don't call supplicant_interface_acquire() if the old_state > NM_DEVICE_STATE_DISCONNECTED, because if the device was ever available that meant it used to have a supplicant interface, and somehow that supplicant interface is gone, which implies that we already called supplicant_interface_acquire(). But I vote for (a) because I'd rather try to acquire a supplicant interface more often, than miss some edge-case that causes the nMDeviceWifi to be stuck in UNAVAILABLE state.
And the rest of the branch looks OK to me, I agree that if the bus daemon quits we shouldn't quit NM since we don't require the bus daemon any more. Yeah, in the past we used to just say "tough, dbus doesn't die", and I guess with kdbus it won't be an issue either, but eh.
(In reply to comment #3) > NM is supposed to work without DBUS daemon running and in consequence, it > should handle a DBUS restart. No, those are two separate running modes; it doesn't make any attempt to do a dbus-not-running to dbus-running transition if that happens while NM is running either. And at any rate, even if is NM was supposed to handle D-Bus restarts, my point is that *it still doesn't* even with your patch, and the patch actually makes things worse, because it makes NM switch from obviously failing, to silently failing.
(In reply to comment #6) > > And at any rate, even if is NM was supposed to handle D-Bus restarts, my point > is that *it still doesn't* even with your patch, and the patch actually makes > things worse, because it makes NM switch from obviously failing, to silently > failing. I dropped this patch. The remaining patches were merged to master as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d0535cafe504ef23a2da6af241900b7bda69386e
Created attachment 289479 [details] [review] [patch] unsed "exit-on-close" on DBUS connection Attach the dropped patch here for reference or later use.