After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 783391 - NM kills dhclient on DHCPNAK and prevents new offers.
NM kills dhclient on DHCPNAK and prevents new offers.
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
1.6.x
Other Linux
: Normal major
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: nm-review
 
 
Reported: 2017-06-03 21:30 UTC by Guillaume Maudoux [:layus]
Modified: 2018-03-13 14:17 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
verbose (INFO) log of the failures (35.34 KB, text/x-log)
2017-06-03 21:30 UTC, Guillaume Maudoux [:layus]
  Details
dhclient-lease-kill.diff (480 bytes, patch)
2018-02-28 01:06 UTC, Justin King-Lacroix
none Details | Review
dhclient-destroy-lease-before-script.diff (509 bytes, patch)
2018-02-28 01:09 UTC, Justin King-Lacroix
none Details | Review
[PATCH] dhcp: handle expiry by letting the client continue for some time (14.22 KB, patch)
2018-03-08 16:14 UTC, Beniamino Galvani
none Details | Review
[PATCH v2] dhcp: handle expiry by letting the client continue for some time (14.40 KB, patch)
2018-03-13 08:10 UTC, Beniamino Galvani
none Details | Review

Description Guillaume Maudoux [:layus] 2017-06-03 21:30:48 UTC
Created attachment 353126 [details]
verbose (INFO) log of the failures

In the attached log, you can see that when dhclient receives DHCPNAK from the server, it is killed (the pid changes on next log message, probably killed by NM itself) and restarted three times. Afterwards, the connection remains in a broken state for about 7 hours, or until I restart it from nm-applet.

I lack internal knowledge, but it seems that NM moves to the expire state and then cancels the DHCP transaction (why ?).

Even stranger, sometimes NM does not fail on DHCPNAK, an the log looks like this:
> May 30 22:48:10 ankh-morpork dhclient[1252]: DHCPNAK from 192.168.1.1
> May 30 22:48:10 ankh-morpork NetworkManager[31596]: <info>  [1496177290.0942] dhcp4 (wlp2s0): state changed unknown -> expire
> May 30 22:48:10 ankh-morpork NetworkManager[31596]: <info>  [1496177290.1003] dhcp4 (wlp2s0): state changed expire -> unknown
> May 30 22:48:10 ankh-morpork dhclient[1252]: DHCPDISCOVER on wlp2s0 to 255.255.255.255 port 67 interval 8
> May 30 22:48:12 ankh-morpork dhclient[1252]: DHCPREQUEST on wlp2s0 to 255.255.255.255 port 67
> May 30 22:48:12 ankh-morpork dhclient[1252]: DHCPOFFER from 192.168.1.1
> May 30 22:48:12 ankh-morpork dhclient[1252]: DHCPACK from 192.168.1.1

So, "expire -> unknown" seems to be the best case... weird :-).
Comment 1 Mikel Ward 2018-01-31 00:52:20 UTC
Looks like there's a partial workaround for this from https://bugzilla.gnome.org/show_bug.cgi?id=739482.  But that only handles the case where the client isn't configured yet.

This can also happen when a client gets an address then later tries to renew it. 
 If the DHCP server needs to give the client a different address (e.g. server config changed or client network changed), the server will send a DHCPNAK, and dhclient will send an EXPIRE event.  Then NetworkManager kills dhclient then schedules a restart.

When dhclient starts, it goes into INIT-REBOOT state, where it first tries the (now invalid) last-known address.  Which results in a DHCPNAK.  And it goes around it circles.

Maybe dhcp4_cleanup should run "dhclient -r" so dhclient doesn't keep trying to use the last-known address?  Or maybe it should remove the leases file?  Or maybe NetworkManager could let dhclient keep running, since it would sort things out itself?

(https://bugzilla.gnome.org/show_bug.cgi?id=748268 also seems related.)
Comment 2 Justin King-Lacroix 2018-02-28 01:02:35 UTC
I can see two different fixes for this issue... and honestly, both should probably be implemented.

Fix 1: change dhclient.c in isc-dhcp-client to destroy its lease *before* calling out to its script file. This ensures that by the time dhclient would be killed by NetworkManager, the lease has been invalidated.
Fix 2: change nm-dhcp-dhclient.c to delete the lease file in stop(). The core assumption NetworkManager is making seems to be that stop() destroys all DHCP client state. The fact that the leases file hangs around violates that assumption.

I have patches for both of these; I will attach both shortly.
Comment 3 Justin King-Lacroix 2018-02-28 01:06:07 UTC
Created attachment 369074 [details] [review]
dhclient-lease-kill.diff

Patch to delete the lease file when killing dhclient.
Comment 4 Justin King-Lacroix 2018-02-28 01:09:39 UTC
Created attachment 369075 [details] [review]
dhclient-destroy-lease-before-script.diff
Comment 5 Justin King-Lacroix 2018-02-28 01:10:38 UTC
(Note that this also affects NM 1.8.x, and current HEAD.)
Comment 6 Guillaume Maudoux [:layus] 2018-02-28 11:43:58 UTC
Both patches look good to me. Thanks !
Comment 7 Beniamino Galvani 2018-03-08 16:11:00 UTC
(In reply to Justin King-Lacroix from comment #3)
> Created attachment 369074 [details] [review] [review]
> dhclient-lease-kill.diff
> 
> Patch to delete the lease file when killing dhclient.

I don't think this is correct, if the lease file is deleted every time the client is stopped, the interface will get a different IP address at the next connection attempt.

In my opinion the correct way to fix this is to change dhcp4_fail() to keep dhclient running if the lease expires, instead of killing it and restarting multiple times. Patch coming...
Comment 8 Beniamino Galvani 2018-03-08 16:14:48 UTC
Created attachment 369462 [details] [review]
[PATCH] dhcp: handle expiry by letting the client continue for some time

I think this is the right approach to fix the problem. The patch is a draft, I'm still doing tests to see if it breaks anything (also, there are some FIXMEs).
Comment 9 Justin King-Lacroix 2018-03-12 23:47:26 UTC
That also makes a whole lot of sense. Letting the dhcp client continue in order to allow the lease to be purged would fix the problem.
Comment 10 Beniamino Galvani 2018-03-13 08:10:50 UTC
Created attachment 369604 [details] [review]
[PATCH v2] dhcp: handle expiry by letting the client continue for some time

I slightly modified the patch, now CI tests are all green.
Comment 11 Thomas Haller 2018-03-13 10:42:06 UTC
(In reply to Beniamino Galvani from comment #10)
> Created attachment 369604 [details] [review] [review]
> [PATCH v2] dhcp: handle expiry by letting the client continue for some time
> 
> I slightly modified the patch, now CI tests are all green.

lgtm