GNOME Bugzilla – Bug 741347
[review] bg/ip-method-fail-fixes-bgo741347 - Fail the IP method, not the entire activation
Last modified: 2016-05-02 16:36:34 UTC
Various places in NM that handle IP-config related failures set the device state to FAILED, but instead they should be setting only the IP method to failed and then leave the determination of overall device failure to code that takes both IPv4, IPv6, and layer2 decisions into account. Those places that should be updated to do this are: nm-device-modem.c::ppp_failed() - the second case during IP config nm-device-modem.c::modem_ip4_config_result() nm-device-adsl.c::ppp_state_changed() - both cases (?) nm-device-bt.c::ppp_failed() - same as nm-device-modem.c nm-device-bt.c::modem_ip4_config_result() nm-device-ethernet.c::ppp_state_changed() nm-device.c::update_for_ip_ifname_change() - both dhcp4 and dhcp6 cases nm-device.c::dnsmasq_state_changed_cb() nm-device.c::nm_device_handle_autoip4_event() nm-device.c::aipd_watch_cb() nm-device.c::nm_device_activate_stage3_ip4_start() nm-device.c::nm_device_activate_stage3_ip6_start() nm-device.c::nm_device_activate_ip4_config_commit() nm-device.c::nm_device_activate_ip6_config_commit() nm-device-wimax.c::wmx_media_status_cb() The following cases should also be updated, but require special handling to ensure we don't break existing fixes for DHCP servers that go away for a short time: nm-device.c::nm_device_activate_ip4_config_timeout() nm-device.c::nm_device_activate_ip6_config_timeout() nm-device.c::dhcp4_lease_change() nm-device.c::dhcp4_fail() nm-device.c::dhcp4_state_changed() nm-device.c::dhcp6_lease_change() nm-device.c::dhcp6_fail() nm-device.c::dhcp6_state_changed() For DHCP, instead of always failing the IP method (like above), we should only fail the IP method if DHCP has not yet succeeded (eg, device still in initial activation, and priv->ipX_state == IP_CONF). If the device has previously completed DHCP, but the lease expires, then NM should periodically retry DHCP to regain the lease. After some grace period (10 minutes?) if may-fail=no, then NM should fail the IP method.
Pushed branch bg/ip-method-fail-fixes-bgo741347: 49b294a device: get rid of ipv4ll_timeout_remove() 1f6e3d9 device: don't fail the activation when a IP method fails 6e8e3eb device: retry DHCP when a lease expires b8a5518 device: don't disconnect after DHCP timeout when there are static IPs which also addresses the issue reported in: https://bugzilla.redhat.com/show_bug.cgi?id=1168388 The branch introduces nm_device_ipx_method_failed() functions to check the value of the ipvx.may-fail setting and the outcome of other methods in order to decide if the activation must fail. It also changes the behavior after DHCP renewal failures, so that DHCP is restarted for a predefined number of times (or forever, in case there are static addresses configured on the interface).
(In reply to Beniamino Galvani from comment #1) > Pushed branch bg/ip-method-fail-fixes-bgo741347: > > 49b294a device: get rid of ipv4ll_timeout_remove() > 1f6e3d9 device: don't fail the activation when a IP method fails > 6e8e3eb device: retry DHCP when a lease expires > b8a5518 device: don't disconnect after DHCP timeout when there are static > IPs seems this changed in the meantime? >> device: retry DHCP when a lease expires When we fail (nm_device_ip4_method_failed()), will we continue to retry? It seems that if nm_device_ip4_method_failed() decides not to disconnect -- because of may-fail=yes) -- we should also retry indefinitely. OTOH, if we retry also if a method failed, why do we even need to retry on lease-expiry? Can we not just fail right away?
(In reply to Thomas Haller from comment #2) > seems this changed in the meantime? Reworked and rebased on master > >> device: retry DHCP when a lease expires > > When we fail (nm_device_ip4_method_failed()), will we continue to retry? It > seems that if nm_device_ip4_method_failed() decides not to disconnect -- > because of may-fail=yes) -- we should also retry indefinitely. We keep trying to renew the lease if the server does not respond, but after 5 tries the method fails and we never try DHCP again (even if the device remains active because of may-fail=yes and the other method succeeded). I'm not sure we should retry forever if the user specified that the method can fail.
> device: don't fail the activation when a IP method fails So this isn't going to work well with IPv6 for Bluetooth and WWAN (or really anything that does PPPv6 like PPPoE either) since PPP failure can be for IPv4 or IPv6, since both methods are run through the same pppd process. I think in the PREPARE/CONFIG/NEED_AUTH cases we should still fail the entire activation because at that point we haven't start IP config. But for the IP_CHECK/IP_CONFIG/SECONDARIES/ACITVATED we need the logic to handle both IPv4 and IPv6. Maybe something like: if (nm_device_activate_ip4_state_in_conf (device)) nm_device_activate_schedule_ip4_config_timeout (device); else if (nm_device_activate_ip6_state_in_conf (device)) nm_device_activate_schedule_ip6_config_timeout (device); else { if <IPv4 == IP_DONE> nm_device_ip4_method_failed (device, NM_DEVICE_STATE_REASON_IP_CONFIG_UNAVAILABLE); if <ipv6 == IP_DONE> nm_device_ip6_method_failed (device, NM_DEVICE_STATE_REASON_IP_CONFIG_UNAVAILABLE); } > device: retry DHCP when a lease expires Instead of having a second variable dhcpX_restarted, maybe just check if dhcpX_num_restarts_left == DHCP_NUM_RESTARTS_MAX? Those should be the same thing right?
(In reply to Dan Williams from comment #4) > But for the IP_CHECK/IP_CONFIG/SECONDARIES/ACITVATED we need the logic to > handle both IPv4 and IPv6. Maybe something like: > [...] Should be fixed now. > Instead of having a second variable dhcpX_restarted, maybe just check if > dhcpX_num_restarts_left == DHCP_NUM_RESTARTS_MAX? Those should be the same > thing right? Yes, fixed and repushed.
>> device: fail activation immediately only when may-fail=no src/devices/bluetooth/nm-device-bt.c: + else if (nm_device_activate_ip4_state_in_conf (device)) ^^^ + nm_device_activate_schedule_ip6_config_timeout (device); Other then that, lgtm
(In reply to Thomas Haller from comment #6) > src/devices/bluetooth/nm-device-bt.c: > + else if (nm_device_activate_ip4_state_in_conf (device)) > ^^^ > + nm_device_activate_schedule_ip6_config_timeout (device); Fixed, thanks.
Looks all good to me
Merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=904f840c2072062a22f82024563bf0caf2a062cf