Bug 792745 – DHCP6 does not retry on failure


Summary:	DHCP6 does not retry on failure


Status:	RESOLVED OBSOLETE

Product:	NetworkManager
Classification:	Platform
Component:	IP and DNS config
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-01-21 09:00 UTC by Scott Shambarger
Modified:	2020-11-12 14:29 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Scott Shambarger 2018-01-21 09:00:26 UTC

If ipv6.method=auto and the network sends IPv6 RAs with the managed flag set, dhclient will acquire an address (as expected).  If the DHCPv6 server then fails and the lease expires, the connection fails and no retries are attempted.  Even if the DHCPv6 server returns, no network addresses are configured.

Here are a few log lines from after the lease expired:

dhcp6 (lan0): DHCP reason 'EXPIRE6' -> state 'expire'
dhcp6 (lan0): state changed unknown -> expire
device[0x55e8a746f080] (lan0): new DHCPv6 client state 4
device[0x55e8a746f080] (lan0): DHCPv6 failed: timeout 0, num tries left 3
kill child process 'dhcp-client-lan0' (4269): waiting up to 500 milliseconds for proces
kill child process 'dhcp-client-lan0' (4269): after sending SIGTERM (15), process 4269 
dhcp6 (lan0): canceled DHCP transaction, DHCP client pid 4269
dhcp6 (lan0): state changed expire -> done

...and then nothing.  The code appears to want to retry at least 3 times, but doesn't.

The bug appears to be in src/devices/nm-device.c, between dhcp6_cleanup() and dhcp6_fail().

dhcp6_fail() calls dhcp6_cleanup(), and then checks if priv->dhcp6.mode==LEVEL_MANAGED, and retries if true (with additional checks).

The problem is that dhcp6_cleanup() sets priv->dhcp6.mode=LEVEL_NONE, so the LEVEL_MANAGED check that follows it in dhcp6_fail() can never succeed.  The LEVEL_MANAGED test was added around commit 784d2631, with the retry code added later, but it seems like it's always been broken.

The simple solution would be to remove priv->dhcp6.mode=LEVEL_NONE from dhcp6_cleanup()... "mode" should be ok to keep after dhcp client cleanup, but I'm not familiar enough with the logic here to know for sure.

Ideally, I'd love to have a way to have dhclient retry forever... this would be especially useful for machines whose DHCP server failed for a while (say overnight) and, although it was fixed by morning, the network doesn't "just work"... (I tried playing with may-fail=no, but that doesn't seem to affect this situation).

Comment 1 Thomas Haller 2018-01-22 10:13:34 UTC

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1503587

Comment 2 Scott Shambarger 2018-01-24 23:35:17 UTC

I read through the related bug and added my thoughts there.  It's fairly independent of this issue, though it does deal with my "option needed for dhclient to retry forever" :)

The issue here is just that IPv6 retries never happen, even if, in the end, the config still fails and retrying only delays the inevitable...

Comment 3 Francesco Giudici 2018-02-08 15:19:52 UTC

Moving the discussion back here.
In particular:

> 
> Since NetworkManager is intentionally overriding the quite robust retry
> logic in the dhclient to implement may-fail (or not as it appears :), the
> least it could do is have a setting to restore it.
> 
> How do I tell NetworkManager that I want the address family to retry forever
> (dhclient logic), and that I'm not using may-fail/autoconnect? (especially
> since it affects other address families which have their own lifecycle, or
> are static and have none!).  Perhaps ipX.dhcp-retry=-1 (infinity) so that
> the address family can return without affecting the whole connection? ...
> Ideally with no delays between retries... if the dhcp server returns, the
> host should get an address immediately;  dhclient does this! :)
> 
> In short, how about dhcp-retry=-1 means don't kill dhclient, let it do its
> thing!

By ipX.dhcp-retry=-1, are you suggesting a new property to control the number of DHCP retries?
IPv4 already has "ipv4.dhcp-timeout", which lets the DHCP client keep trying to get an IP address; IPv6, however, has nothing similar.

Comment 4 Francesco Giudici 2018-02-22 11:50:11 UTC

Fix has landed on master (and nm-1-10 too):
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=1289450146e

So, DHCPv6 will now be retried.

The only way to keep DHCPv6 retrying forever to renew a previously successful lease, regardless of the "ipvx.may-fail" properties, is to also add a static IPv6 address to the connection (even if it is not used): this way, the internal IPv6 configuration will be (partially) present, keeping the connection up. NetworkManager will then retry DHCPv6 every 120 seconds.

Marking the bug as resolved.
Please reopen if needed.

Comment 5 Scott Shambarger 2018-02-27 10:39:23 UTC

Finally had time to test this... bug is still present.  Here's what I found:

1) dhcp expires (dhcp server offline)
2) NM gets expire, calls dhcp6_fail, which calls dhcp6_cleanup
3) dhcp retries and times out
4) NM gets timeout, calls dhcp6_timeout...

Here's where things fall apart; dhcp6_cleanup in #2 above sets priv->dhcp6.mode = NM_NDISC_DHCP_LEVEL_NONE, so dhcp6_timeout doesn't think ipv6 is managed and skips the call to dhcp6_fail (and just goes to IP_DONE).

The problem is that priv->dhcp6.mode is only set to MANAGED if (a) method is dhcp, or (b) ndisc state changes and includes the RA managed flag.  Of course, after the ip6 has started managed dhcp the RA flags won't change, so the mode never gets set correctly after dhcp6_cleanup clears it.

I think the correct solution is to not have dhcp6_cleanup change the mode.  There's already a system in place to update the mode (ndisc or config change), so all the mode clear does is break retries.

(I've tried commenting out the mode clear in dhcp6_cleanup, and things work as expected :)

BTW, while I was debugging this, I also found another bug in the logic... if I restart NetworkManager, and it "assumes" the existing connection, there's logic in ensure_con_ip4_config to ignore configured ip addresses, so the call to nm_ip6_config_get_num_addresses in dhcp6_fail will forever return 0 after that (and the "device has IP addresses" dhcp restart condition breaks)

Setting a static ip to make dhcp work seems pretty hacky anyway... there must be a better solution :)

Comment 6 Beniamino Galvani 2018-03-13 09:35:07 UTC

(In reply to Scott Shambarger from comment #5)
> Finally had time to test this... bug is still present.  Here's what I found:
> 
> 1) dhcp expires (dhcp server offline)
> 2) NM gets expire, calls dhcp6_fail, which calls dhcp6_cleanup
> 3) dhcp retries and times out
> 4) NM gets timeout, calls dhcp6_timeout...
> 
> Here's where things fall apart; dhcp6_cleanup in #2 above sets
> priv->dhcp6.mode = NM_NDISC_DHCP_LEVEL_NONE, so dhcp6_timeout doesn't think
> ipv6 is managed and skips the call to dhcp6_fail (and just goes to IP_DONE).

Can you try the patch at https://bugzilla.gnome.org/show_bug.cgi?id=783391#c10 ?

I think it should resolve the problem because it keeps the DHCPv6 client running if IPv4 is up.

Comment 7 André Klapper 2020-11-12 14:29:00 UTC

bugzilla.gnome.org is being shut down in favor of a GitLab instance. 
We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time.

If you still use NetworkManager and if you still see this bug / want this feature in a recent and supported version of NetworkManager, then please feel free to report it at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/

Thank you for creating this report and we are sorry it could not be implemented (workforce and time are unfortunately limited).