GNOME Bugzilla – Bug 748268
DHCP6: dhclient exits when old lease file is found
Last modified: 2018-03-15 13:22:49 UTC
Created attachment 302110 [details] 01_nm+dhclient-shell-log.txt: Testing NM+dhclient with (un)expired lease files in a shell Given a network using IPv6 with DHCP6 and an expired DHCP6 lease file, dhclient started by NetworkManager will retrieve an IPv6 address, but then instantly exit. Thus the address will expire after a lifetime of e.g. 10min without renewal. The expected result would be to let dhclient running and renew the lease on a regular basis. If instead the lease file states it is still valid, then dhclient will continue running and renewing the lease. Both can be simulated for testing by editing the "starts" entries in the lease file as shown in attachment 01 [details]_nm+dhclient-shell-log.txt. If the lease file is (modified to be) expired, dhclient can be run manually and will continue running. For a log see 02_dhclient-in-shell-log.txt. In case of an expired lease file, the NetworkManager seems to kill the dhclient. At least this can be assumed, based on the above observations and the system log extract which can be found at line 60ff. of 03_unsuccessfull-journal.txt. For comparison see 04_successfull-journal.txt where the lease file is not expired and dhclient keeps running. (Created with NetworkManager.conf with logging domains=DHCP6:DEBUG) Im using Arch Linux with the following package versions: networkmanager 1.0.0-2 dhclient 4.3.2-1
Created attachment 302111 [details] 02_dhclient-in-shell-log.txt: Running dhclient manually in a shell
Created attachment 302112 [details] 03_unsuccessfull-journal.txt: System log with expired lease
Created attachment 302113 [details] 04_successfull-journal.txt: System log with unexpired lease file
Created attachment 325057 [details] TRACE debug log showing bound > expire with DEPREF6 I've been investigating this issue on Ubuntu. My in-depth analysis and notes are in the bug report at: https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1533631 In summary, when the state transitions from "expire" to "done" Network Manager kills the 'dhclient' process before it has chance to request and bind a fresh lease, If 'dhclient' is run manually with the same command-line options and allowed to continue running it correctly gains a new lease. The code path is: nm_dhcp_client_set_state () fires a NM_DHCP_STATE_EXPIRE signal dhcp6_state_changed () receives NM_DHCP_STATE_EXPIRE signal and calls dhcp6_fail (self, FALSE) dhcp6_cleanup(self, CLEANUP_TYPE_DECONFIGURE, FALSE) nm_dhcp_client_stop () NM_DHCP_CLIENT_GET_CLASS (self)->stop (self, release, priv->duid); I've attached a Network Manager TRACE log_level from syslog generated using: /etc/NetworkManager/conf.d/logging.conf: --- [logging] level=TRACE --- I've removed any extraneous messages from processes other than NM and dhclient. I've also removed IPv4 related messages to make it easier to follow. The debug log contains: Mar 31 01:34:59 hephaestion NetworkManager[9095]: <debug> [1459384499.752259] [dhcp-manager/nm-dhcp-client.c:222] reason_to_state(): (wlp3s0): unmapped DHCP state 'DEPREF6' Mar 31 01:34:59 hephaestion NetworkManager[9095]: <debug> [1459384499.758516] [dhcp-manager/nm-dhcp-client.c:757] nm_dhcp_client_handle_event(): (wlp3s0): DHCP reason 'DEPREF6' -> state 'unknown' Network Manager doesn't know how to handle "DEPREF6", which is sent from isc-dhcp dhclient to the helper script (set by "-sf" option): client/dhc6.c::do_depref() { ... if (addr->starts + addr->preferred_life <= cur_time) { script_init(client, "DEPREF6", NULL); dhc6_marshall_values("cur_", client, lease, ia, addr); script_write_requested6(client); script_go(client); ... } So it seems that to correctly solve this issue Network Manager must be taught how to handle DEPREF6.
So, the problem is that dhclient first sends a BOUND event for the expired lease and then immediately deprefers and expires the address. Since NM considers EXPIRED to mean that the client was unable to renew an existing lease, it terminates the client to start a new instance. I don't think that the handling of DEPREFERRED is relevant here, NM leaves the lease renewal logic to dhclient and is interested only in when the lease expires. What dhclient is doing doesn't look correct to me, but probably we should try to work around that. BTW, I found something similar in an old bug [1]. [1] https://bugzilla.redhat.com/show_bug.cgi?id=585418#c8
I'm currently working up a patch to handle it, so far so good. Will report back once its working.
Tested and working, so up for discussion. The following changes since commit 28e9ec499a7517f35cc5093e9b545302094e55d4: device: renew dhcp leases on awake for software devices (2016-03-31 16:31:34 +0200) are available in the git repository at: git://iam.tj/NetworkManager.git bug748268-DEPREF6 for you to fetch changes up to 84ca52aeca9210cde1af379d4f11c75d62ae78ad: DHCPv6: don't kill dhclient when address is depreferred (2016-03-31 16:45:30 +0100) ---------------------------------------------------------------- TJ (1): DHCPv6: don't kill dhclient when address is depreferred src/devices/nm-device.c | 7 ++++++- src/dhcp-manager/nm-dhcp-client.c | 4 ++++ src/dhcp-manager/nm-dhcp-client.h | 1 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c index 5085833..b2f3c3e 100644 --- a/src/devices/nm-device.c +++ b/src/devices/nm-device.c @@ -5263,12 +5263,17 @@ dhcp6_state_changed (NMDhcpClient *client, } else if (priv->ip6_state == IP_DONE) dhcp6_lease_change (self); break; + case NM_DHCP_STATE_DEPREFER: + if (priv->ip6_state == IP_DONE) + priv->ip6_state = IP_WAIT; + break; case NM_DHCP_STATE_TIMEOUT: dhcp6_timeout (self, client); break; case NM_DHCP_STATE_EXPIRE: /* Ignore expiry before we even have a lease (NAK, old lease, etc) */ - if (priv->ip6_state != IP_CONF) + /* Don't fail when a DEPREF6 has caused the EXPIRE6; wait for the new lease */ + if (priv->ip6_state != IP_CONF && priv->ip6_state != IP_WAIT) dhcp6_fail (self, FALSE); break; case NM_DHCP_STATE_DONE: diff --git a/src/dhcp-manager/nm-dhcp-client.c b/src/dhcp-manager/nm-dhcp-client.c index b8161ab..6c4fe3e 100644 --- a/src/dhcp-manager/nm-dhcp-client.c +++ b/src/dhcp-manager/nm-dhcp-client.c @@ -191,6 +191,7 @@ nm_dhcp_client_get_fqdn (NMDhcpClient *self) static const char *state_table[NM_DHCP_STATE_MAX + 1] = { [NM_DHCP_STATE_UNKNOWN] = "unknown", [NM_DHCP_STATE_BOUND] = "bound", + [NM_DHCP_STATE_DEPREFER] = "deprefer", [NM_DHCP_STATE_TIMEOUT] = "timeout", [NM_DHCP_STATE_EXPIRE] = "expire", [NM_DHCP_STATE_DONE] = "done", @@ -216,6 +217,9 @@ reason_to_state (NMDhcpClient *self, const char *iface, const char *reason) g_ascii_strcasecmp (reason, "rebind") == 0 || g_ascii_strcasecmp (reason, "rebind6") == 0) return NM_DHCP_STATE_BOUND; + else if (g_ascii_strcasecmp (reason, "depref") == 0 || + g_ascii_strcasecmp (reason, "depref6") == 0) + return NM_DHCP_STATE_DEPREFER; else if (g_ascii_strcasecmp (reason, "timeout") == 0) return NM_DHCP_STATE_TIMEOUT; else if (g_ascii_strcasecmp (reason, "nak") == 0 || diff --git a/src/dhcp-manager/nm-dhcp-client.h b/src/dhcp-manager/nm-dhcp-client.h index 1c78c5b..9fbc83c 100644 --- a/src/dhcp-manager/nm-dhcp-client.h +++ b/src/dhcp-manager/nm-dhcp-client.h @@ -46,6 +46,7 @@ typedef enum { NM_DHCP_STATE_UNKNOWN = 0, NM_DHCP_STATE_BOUND, /* new lease or lease changed */ + NM_DHCP_STATE_DEPREFER, /* lease out-of-date and will be replaced */ NM_DHCP_STATE_TIMEOUT, /* timed out contacting server */ NM_DHCP_STATE_DONE, /* client quit or stopped */ NM_DHCP_STATE_EXPIRE, /* lease expired or NAKed */
I think I'd rather see NMDhcpClient grow a "priv->prev_state" that we could check instead of playing around with IP_XXXX. Then for EXPIRE we check the prev_state and if it's DEPREFER we ignore the EXPIRE?
Could even send the prev_state in the DhcpClient state-changed signal to make it trivial for nm-device.c.
Something like this? (do I need to g_free(event_id) ? ) The following changes since commit 28e9ec499a7517f35cc5093e9b545302094e55d4: device: renew dhcp leases on awake for software devices (2016-03-31 16:31:34 +0200) are available in the git repository at: git://iam.tj/NetworkManager.git bug748268-DEPREF6-v2 for you to fetch changes up to 14f05c5e852b7f0677520a8a5ce68319e69e1396: DHCPv6: don't kill dhclient when address is depreferred (2016-03-31 23:43:55 +0100) ---------------------------------------------------------------- TJ (1): DHCPv6: don't kill dhclient when address is depreferred src/devices/nm-device.c | 5 ++++- src/dhcp-manager/nm-dhcp-client.c | 9 +++++++++ src/dhcp-manager/nm-dhcp-client.h | 1 + 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c index 5085833..1e879c5 100644 --- a/src/devices/nm-device.c +++ b/src/devices/nm-device.c @@ -5263,12 +5263,15 @@ dhcp6_state_changed (NMDhcpClient *client, } else if (priv->ip6_state == IP_DONE) dhcp6_lease_change (self); break; + case NM_DHCP_STATE_DEPREFER: + break; case NM_DHCP_STATE_TIMEOUT: dhcp6_timeout (self, client); break; case NM_DHCP_STATE_EXPIRE: /* Ignore expiry before we even have a lease (NAK, old lease, etc) */ - if (priv->ip6_state != IP_CONF) + /* Don't fail when a DEPREF6 has caused the EXPIRE6; wait for the new lease */ + if (priv->ip6_state != IP_CONF && !strcmp (event_id, "deprefer")) dhcp6_fail (self, FALSE); break; case NM_DHCP_STATE_DONE: diff --git a/src/dhcp-manager/nm-dhcp-client.c b/src/dhcp-manager/nm-dhcp-client.c index b8161ab..3905dcb 100644 --- a/src/dhcp-manager/nm-dhcp-client.c +++ b/src/dhcp-manager/nm-dhcp-client.c @@ -191,6 +191,7 @@ nm_dhcp_client_get_fqdn (NMDhcpClient *self) static const char *state_table[NM_DHCP_STATE_MAX + 1] = { [NM_DHCP_STATE_UNKNOWN] = "unknown", [NM_DHCP_STATE_BOUND] = "bound", + [NM_DHCP_STATE_DEPREFER] = "deprefer", [NM_DHCP_STATE_TIMEOUT] = "timeout", [NM_DHCP_STATE_EXPIRE] = "expire", [NM_DHCP_STATE_DONE] = "done", @@ -216,6 +217,9 @@ reason_to_state (NMDhcpClient *self, const char *iface, const char *reason) g_ascii_strcasecmp (reason, "rebind") == 0 || g_ascii_strcasecmp (reason, "rebind6") == 0) return NM_DHCP_STATE_BOUND; + else if (g_ascii_strcasecmp (reason, "depref") == 0 || + g_ascii_strcasecmp (reason, "depref6") == 0) + return NM_DHCP_STATE_DEPREFER; else if (g_ascii_strcasecmp (reason, "timeout") == 0) return NM_DHCP_STATE_TIMEOUT; else if (g_ascii_strcasecmp (reason, "nak") == 0 || @@ -320,6 +324,11 @@ nm_dhcp_client_set_state (NMDhcpClient *self, if (iaid && start) event_id = g_strdup_printf ("%s|%s", iaid, start); } + if (priv->ipv6 && + priv->state == NM_DHCP_STATE_DEPREFER && + new_state == NM_DHCP_STATE_EXPIRE) { + event_id = g_strdup_printf ("%s","deprefer"); + } _LOGI ("state changed %s -> %s%s%s%s", state_to_string (priv->state), diff --git a/src/dhcp-manager/nm-dhcp-client.h b/src/dhcp-manager/nm-dhcp-client.h index 1c78c5b..9fbc83c 100644 --- a/src/dhcp-manager/nm-dhcp-client.h +++ b/src/dhcp-manager/nm-dhcp-client.h @@ -46,6 +46,7 @@ typedef enum { NM_DHCP_STATE_UNKNOWN = 0, NM_DHCP_STATE_BOUND, /* new lease or lease changed */ + NM_DHCP_STATE_DEPREFER, /* lease out-of-date and will be replaced */ NM_DHCP_STATE_TIMEOUT, /* timed out contacting server */ NM_DHCP_STATE_DONE, /* client quit or stopped */ NM_DHCP_STATE_EXPIRE, /* lease expired or NAKed */
Whoops! accidentally inverted the logic of the strcmp()! This is the important bit too: diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c index 1e879c5..ec85090 100644 --- a/src/devices/nm-device.c +++ b/src/devices/nm-device.c @@ -5271,7 +5271,7 @@ dhcp6_state_changed (NMDhcpClient *client, case NM_DHCP_STATE_EXPIRE: /* Ignore expiry before we even have a lease (NAK, old lease, etc) */ /* Don't fail when a DEPREF6 has caused the EXPIRE6; wait for the new lease */ - if (priv->ip6_state != IP_CONF && !strcmp (event_id, "deprefer")) + if (priv->ip6_state != IP_CONF && strcmp (event_id, "deprefer")) dhcp6_fail (self, FALSE); break; case NM_DHCP_STATE_DONE:
Last fix, honest - avoid null pointers always! diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c index ec85090..e330175 100644 --- a/src/devices/nm-device.c +++ b/src/devices/nm-device.c @@ -5271,7 +5271,7 @@ dhcp6_state_changed (NMDhcpClient *client, case NM_DHCP_STATE_EXPIRE: /* Ignore expiry before we even have a lease (NAK, old lease, etc) */ /* Don't fail when a DEPREF6 has caused the EXPIRE6; wait for the new lease */ - if (priv->ip6_state != IP_CONF && strcmp (event_id, "deprefer")) + if (priv->ip6_state != IP_CONF && event_id && strcmp (event_id, "deprefer")) dhcp6_fail (self, FALSE); break; case NM_DHCP_STATE_DONE:
> DHCPv6: don't kill dhclient when address is depreferred case NM_DHCP_STATE_EXPIRE: /* Ignore expiry before we even have a lease (NAK, old lease, etc) */ - if (priv->ip6_state != IP_CONF) + /* Don't fail when a DEPREF6 has caused the EXPIRE6; wait for the new lease */ + if (priv->ip6_state != IP_CONF && !strcmp (event_id, "deprefer")) dhcp6_fail (self, FALSE); break; dhclient sends a DEPREFER6 after preferred_lifetime and EXPIRE6 after valid_lifetime if the lease cannot be renewed. So, in case the server goes down or can't be reached NM will receive both messages in sequence (the two lifetimes can be equal and in that case still both messages will be generated). So I think that with this change we will always ignore the expire message when a lease expires and thus the connection will stay active forever in case the DHCP server can't be contacted. Instead now we fail the connection and then NM will retry it according to its policy. Probably we should try to detect the presence of expired leases at dhclient startup in a different way? BTW, I guess I can't reproduce on Fedora the behavior reported in description probably because dhclient has this patch applied: http://pkgs.fedoraproject.org/cgit/rpms/dhcp.git/tree/dhcp-honor-expired.patch?h=f24 which fixes the BOUND -> DEPREFER -> EXPIRED cycle when the lease file contains an expired lease on startup.
(In reply to Beniamino Galvani from comment #13) > Probably we should try to detect the presence of expired leases at > dhclient startup in a different way? How about capturing a timestamp when BOUND happens and when DEPREFER6 arrives do diff = now - bound_time and pass that on to EXPIRE6 handling. Then it would be possible to use a sensible configuration default 'policy' to dictate the minimum difference for BOUND > EXPIRE transitions. e.g. EXPIRE does something like dhcp6_state_changed() ... case NM_DHCP_STATE_EXPIRE: if (priv->ip6_state != IP_CONF && now - bound_timestamp > bound_timestamp_minimum) dhcp6_fail (self, FALSE);
bah, no comment editing available. Ignore my previous first three lines of discussion; the code actually does something simpler which would presumably negate having to do any special DEPREFER6 handling at all. It only needs to deal with the BOUND -> EXPIRE transition time. There probably needs to be some kind of handling for the genuine deprefer case added at some point to cover other edge cases.
(In reply to TJ from comment #15) > bah, no comment editing available. Ignore my previous first three lines of > discussion; the code actually does something simpler which would presumably > negate having to do any special DEPREFER6 handling at all. It only needs to > deal with the BOUND -> EXPIRE transition time. This sounds reasonable to me, using a timestamp it should be possible to detect when such bogus BOUND -> EXPIRE events are received and don't fail the activation (it's still a hack but I think it's acceptable considering how dhclient behavior is broken).
I'm following up your link to the Fedora patch against dhclient - trying to find out why there has been no attempt to upstream it that I can see. I'm currently using that (in Ubuntu) and it has solved the issue, so my intention right now is to try and get it upstream, and either way add it to the Ubuntu packages. I'll also put a timeout-based BOUND > EXPIRE patch together for NM and see how that goes. Can't hurt to take a belt-and-braces approach. One question on that: where, and how best, can I add a configuration option to control the timeout such that the system administrator can alter the policy from the 'sensible default' if required (I'm thinking a value of 0 will disable the timeout) ?
(In reply to TJ from comment #17) > I'm following up your link to the Fedora patch against dhclient - trying to > find out why there has been no attempt to upstream it that I can see. AFAIK, getting patches accept upstream is quite a challenge. > I'll also put a timeout-based BOUND > EXPIRE patch together for NM and see > how that goes. Can't hurt to take a belt-and-braces approach. I agree, we should fix this in NM so that we don't depend on dhclient being patched. > One question on that: where, and how best, can I add a configuration option > to control the timeout such that the system administrator can alter the > policy from the 'sensible default' if required (I'm thinking a value of 0 > will disable the timeout) ? In my opinion users shouldn't have to deal with such low-level details and therefore the timeout shouldn't be configurable. It would be better to find a solution that works well in all cases without requiring to be tweaked or deactivated :)
I confirm that this behavior still is the same - instead of renewing its DHCPv6-lease the client silently gives up. Other OS like Windows and macOS just ask for it again. Are there any news in this case?
I think this issue got fixed by https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=17009ed91da8b3e0b10ee7e94d220be9bd3fa84c Under that assumption, I am closing this bug as fixed. Please re-open if you think that is wrong. Thank you!!