After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 738941 - [review] fixes for losing DBUS connection and ignore SIGPIPE
[review] fixes for losing DBUS connection and ignore SIGPIPE
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2014-10-21 13:42 UTC by Thomas Haller
Modified: 2014-10-27 19:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[patch] unsed "exit-on-close" on DBUS connection (3.25 KB, patch)
2014-10-27 19:51 UTC, Thomas Haller
rejected Details | Review

Description Thomas Haller 2014-10-21 13:42:34 UTC
NM has several issues when stopping the DBUS service.

Especially, once restarting the DBUS service, NM does not recover (which is however not fixed by these patches).
Comment 1 Thomas Haller 2014-10-21 13:45:11 UTC
th/bgo738941_dbus_stop


The last commit "wifi: avoid assertion about dupplicate..." is not optimal. I did not find the reason for the assertion.
Comment 2 Dan Winship 2014-10-23 13:05:51 UTC
> settings: pass valid error domain to g_set_error() in load_plugins()

You should pass a valid error code too (NM_SETTINGS_ERROR_FAILED).


> core: unset "exit-on-close" of dbus connections

If, as you say above, NM does not recover after dbus-daemon restarts, then it seems like letting it crash is preferable, since then it's more obvious why networking is broken.
Comment 3 Thomas Haller 2014-10-27 15:21:50 UTC
(In reply to comment #2)
> > settings: pass valid error domain to g_set_error() in load_plugins()
> 
> You should pass a valid error code too (NM_SETTINGS_ERROR_FAILED).

That commit was not part of the branch, it's already on master.

I added commit:
>> settings: pass valid error code to g_set_error() in load_plugins()


> 
> 
> > core: unset "exit-on-close" of dbus connections
> 
> If, as you say above, NM does not recover after dbus-daemon restarts, then it
> seems like letting it crash is preferable, since then it's more obvious why
> networking is broken.

NM is supposed to work without DBUS daemon running and in consequence, it should handle a DBUS restart.

I noticed further issues, that require investigation. But for now, this is an actual bug that should be fixed.
Comment 4 Dan Williams 2014-10-27 17:32:01 UTC
(In reply to comment #1)
> th/bgo738941_dbus_stop
> 
> 
> The last commit "wifi: avoid assertion about dupplicate..." is not optimal. I
> did not find the reason for the assertion.

"dupplicate" -> "duplicate" in the git shortlog

But this happens because when the supplicant stops (due to the name-owner-change, which is triggered because dbus-daemon quit) the supplicant manager sets all supplicant interfaces to the DOWN state so they can be cleaned up.  That does two things:

1) calls supplicant_interface_acquire() to attempt to re-launch wpa_supplicant in case wpa_supplicant segfaulted

2) moves the NMDevicWifi to UNAVAILABLE state because the supplicant is gone, the device is no longer usable and we must terminate the connection and wait for the supplicant to come back

But #2 also ends up calling supplicant_interface_acquire(), because that's what we want to do when the NMDeviceWifi is first managed (at startup) and when the supplicant dies.  The code just doesn't differentiate between the two cases.

So the solutions here are either to:

a) like your patch does, allow duplicates, which I think is probably fine because this operation doesn't really need to be nested or anything like that

b) in nm-device-wifi.c::device_state_changed() don't call supplicant_interface_acquire() if the old_state > NM_DEVICE_STATE_DISCONNECTED, because if the device was ever available that meant it used to have a supplicant interface, and somehow that supplicant interface is gone, which implies that we already called supplicant_interface_acquire().

But I vote for (a) because I'd rather try to acquire a supplicant interface more often, than miss some edge-case that causes the nMDeviceWifi to be stuck in UNAVAILABLE state.
Comment 5 Dan Williams 2014-10-27 17:41:06 UTC
And the rest of the branch looks OK to me, I agree that if the bus daemon quits we shouldn't quit NM since we don't require the bus daemon any more.  Yeah, in the past we used to just say "tough, dbus doesn't die", and I guess with kdbus it won't be an issue either, but eh.
Comment 6 Dan Winship 2014-10-27 17:44:43 UTC
(In reply to comment #3)
> NM is supposed to work without DBUS daemon running and in consequence, it
> should handle a DBUS restart.

No, those are two separate running modes; it doesn't make any attempt to do a dbus-not-running to dbus-running transition if that happens while NM is running either.

And at any rate, even if is NM was supposed to handle D-Bus restarts, my point is that *it still doesn't* even with your patch, and the patch actually makes things worse, because it makes NM switch from obviously failing, to silently failing.
Comment 7 Thomas Haller 2014-10-27 19:50:24 UTC
(In reply to comment #6)
> 
> And at any rate, even if is NM was supposed to handle D-Bus restarts, my point
> is that *it still doesn't* even with your patch, and the patch actually makes
> things worse, because it makes NM switch from obviously failing, to silently
> failing.

I dropped this patch.

The remaining patches were merged to master as

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d0535cafe504ef23a2da6af241900b7bda69386e
Comment 8 Thomas Haller 2014-10-27 19:51:32 UTC
Created attachment 289479 [details] [review]
[patch] unsed "exit-on-close" on DBUS connection

Attach the dropped patch here for reference or later use.