After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 741347 - [review] bg/ip-method-fail-fixes-bgo741347 - Fail the IP method, not the entire activation
[review] bg/ip-method-fail-fixes-bgo741347 - Fail the IP method, not the enti...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
unspecified
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: nm-review
 
 
Reported: 2014-12-10 17:08 UTC by Dan Williams
Modified: 2016-05-02 16:36 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Williams 2014-12-10 17:08:52 UTC
Various places in NM that handle IP-config related failures set the device state to FAILED, but instead they should be setting only the IP method to failed and then leave the determination of overall device failure to code that takes both IPv4, IPv6, and layer2 decisions into account.  Those places that should be updated to do this are:

nm-device-modem.c::ppp_failed() - the second case during IP config
nm-device-modem.c::modem_ip4_config_result()
nm-device-adsl.c::ppp_state_changed() - both cases (?)
nm-device-bt.c::ppp_failed() - same as nm-device-modem.c
nm-device-bt.c::modem_ip4_config_result()
nm-device-ethernet.c::ppp_state_changed()
nm-device.c::update_for_ip_ifname_change() - both dhcp4 and dhcp6 cases
nm-device.c::dnsmasq_state_changed_cb()
nm-device.c::nm_device_handle_autoip4_event()
nm-device.c::aipd_watch_cb()
nm-device.c::nm_device_activate_stage3_ip4_start()
nm-device.c::nm_device_activate_stage3_ip6_start()
nm-device.c::nm_device_activate_ip4_config_commit()
nm-device.c::nm_device_activate_ip6_config_commit()
nm-device-wimax.c::wmx_media_status_cb()

The following cases should also be updated, but require special handling to ensure we don't break existing fixes for DHCP servers that go away for a short time:

nm-device.c::nm_device_activate_ip4_config_timeout()
nm-device.c::nm_device_activate_ip6_config_timeout()
nm-device.c::dhcp4_lease_change()
nm-device.c::dhcp4_fail()
nm-device.c::dhcp4_state_changed()
nm-device.c::dhcp6_lease_change()
nm-device.c::dhcp6_fail()
nm-device.c::dhcp6_state_changed()

For DHCP, instead of always failing the IP method (like above), we should only fail the IP method if DHCP has not yet succeeded (eg, device still in initial activation, and priv->ipX_state == IP_CONF).  If the device has previously completed DHCP, but the lease expires, then NM should periodically retry DHCP to regain the lease.  After some grace period (10 minutes?) if may-fail=no, then NM should fail the IP method.
Comment 1 Beniamino Galvani 2015-08-25 12:52:44 UTC
Pushed branch bg/ip-method-fail-fixes-bgo741347:

  49b294a device: get rid of ipv4ll_timeout_remove()
  1f6e3d9 device: don't fail the activation when a IP method fails
  6e8e3eb device: retry DHCP when a lease expires
  b8a5518 device: don't disconnect after DHCP timeout when there are static IPs

which also addresses the issue reported in:

  https://bugzilla.redhat.com/show_bug.cgi?id=1168388

The branch introduces nm_device_ipx_method_failed() functions to check
the value of the ipvx.may-fail setting and the outcome of other
methods in order to decide if the activation must fail.

It also changes the behavior after DHCP renewal failures, so that DHCP
is restarted for a predefined number of times (or forever, in case
there are static addresses configured on the interface).
Comment 2 Thomas Haller 2015-09-09 12:53:40 UTC
(In reply to Beniamino Galvani from comment #1)
> Pushed branch bg/ip-method-fail-fixes-bgo741347:
> 
>   49b294a device: get rid of ipv4ll_timeout_remove()
>   1f6e3d9 device: don't fail the activation when a IP method fails
>   6e8e3eb device: retry DHCP when a lease expires
>   b8a5518 device: don't disconnect after DHCP timeout when there are static
> IPs

seems this changed in the meantime?


>> device: retry DHCP when a lease expires

When we fail (nm_device_ip4_method_failed()), will we continue to retry? It seems that if nm_device_ip4_method_failed() decides not to disconnect -- because of may-fail=yes) -- we should also retry indefinitely.

OTOH, if we retry also if a method failed, why do we even need to retry on lease-expiry? Can we not just fail right away?
Comment 3 Beniamino Galvani 2015-12-18 14:20:00 UTC
(In reply to Thomas Haller from comment #2)
> seems this changed in the meantime?

Reworked and rebased on master

> >> device: retry DHCP when a lease expires
>
> When we fail (nm_device_ip4_method_failed()), will we continue to retry? It
> seems that if nm_device_ip4_method_failed() decides not to disconnect --
> because of may-fail=yes) -- we should also retry indefinitely.

We keep trying to renew the lease if the server does not respond, but
after 5 tries the method fails and we never try DHCP again (even if
the device remains active because of may-fail=yes and the other method
succeeded).

I'm not sure we should retry forever if the user specified that
the method can fail.
Comment 4 Dan Williams 2016-02-05 23:46:35 UTC
> device: don't fail the activation when a IP method fails

So this isn't going to work well with IPv6 for Bluetooth and WWAN (or really anything that does PPPv6 like PPPoE either) since PPP failure can be for IPv4 or IPv6, since both methods are run through the same pppd process.  I think in the PREPARE/CONFIG/NEED_AUTH cases we should still fail the entire activation because at that point we haven't start IP config.

But for the IP_CHECK/IP_CONFIG/SECONDARIES/ACITVATED we need the logic to handle both IPv4 and IPv6.  Maybe something like:

if (nm_device_activate_ip4_state_in_conf (device))
	nm_device_activate_schedule_ip4_config_timeout (device);
else if (nm_device_activate_ip6_state_in_conf (device))
	nm_device_activate_schedule_ip6_config_timeout (device);
else {
	if <IPv4 == IP_DONE>
		nm_device_ip4_method_failed (device, NM_DEVICE_STATE_REASON_IP_CONFIG_UNAVAILABLE);
        if <ipv6 == IP_DONE>
		nm_device_ip6_method_failed (device, NM_DEVICE_STATE_REASON_IP_CONFIG_UNAVAILABLE);
}


> device: retry DHCP when a lease expires

Instead of having a second variable dhcpX_restarted, maybe just check if dhcpX_num_restarts_left == DHCP_NUM_RESTARTS_MAX?  Those should be the same thing right?
Comment 5 Beniamino Galvani 2016-04-09 12:24:43 UTC
(In reply to Dan Williams from comment #4)
> But for the IP_CHECK/IP_CONFIG/SECONDARIES/ACITVATED we need the logic to
> handle both IPv4 and IPv6.  Maybe something like:
> [...]

Should be fixed now.

> Instead of having a second variable dhcpX_restarted, maybe just check if
> dhcpX_num_restarts_left == DHCP_NUM_RESTARTS_MAX?  Those should be the same
> thing right?

Yes, fixed and repushed.
Comment 6 Thomas Haller 2016-04-26 12:53:43 UTC
>> device: fail activation immediately only when may-fail=no
    


src/devices/bluetooth/nm-device-bt.c:
+         else if (nm_device_activate_ip4_state_in_conf (device))
                                       ^^^
+              nm_device_activate_schedule_ip6_config_timeout (device);




Other then that, lgtm
Comment 7 Beniamino Galvani 2016-04-28 08:58:49 UTC
(In reply to Thomas Haller from comment #6)
> src/devices/bluetooth/nm-device-bt.c:
> +         else if (nm_device_activate_ip4_state_in_conf (device))
>                                        ^^^
> +              nm_device_activate_schedule_ip6_config_timeout (device);

Fixed, thanks.
Comment 8 Lubomir Rintel 2016-05-02 16:19:40 UTC
Looks all good to me