GNOME Bugzilla – Bug 768362
DHCP should support fallback profiles
Last modified: 2020-11-12 14:33:02 UTC
I am the maintainer of a web-based management tool for running Gentoo Linux on a headless machine. Because I can't know in what environments these machines start up, I need a fallback profile in the DHCP client. This is where NM fails.
I used dhcpcd in the previous version of the software, which ran on SysV init, and I kept this client when moving to systemd with NM. When the fallback profile was in use I noticed the IP address being dropped several times and eventually the interface being dropped altogether. I found some references to similar behaviour here, and the answer was always that dhcpcd was not fully supported by NM and might be dropped altogether. So I switched to dhclient, and the only difference is that the interface is dropped even sooner.
In other words: NM does not respond correctly when a fallback profile is defined and used. The cause appears to be that both clients report something other than reason="BOUND" in this case: dhcpcd returns reason="STATIC" and dhclient returns a more generic reason="TIMEOUT", which gives no indication that there is in fact a binding.
How do you configure the fallback profile? Note that NetworkManager is connection based, so putting a fallback profile in the global /etc means that the fallback profile will be used for every connection. And as you said, it depends on the DHCP plugin, and is not accessible via D-Bus. That doesn't fly well...
dhcpcd is supported to the extent that there are users who use it, report bugs and submit patches. I don't know who uses it, for whom it works and how well.
Why can you not just configure an additional static IP address on the connection and set ipv4.may-fail=yes?
The fallback profile is defined in the dhcp client:
/etc/dhcpcd.conf:
+-+-+-+-+-+-+
# static profile for eth1
profile static_eth1
static ip_address=192.168.10.1/24
static domain_name=localdomain
static domain_search=localdomain
static domain_name_servers=8.8.8.8
interface eth1
fallback static_eth1
noipv4ll
timeout 10
+-+-+-+-+-+-+
/etc/dhcp/dhclient.conf
+-+-+-+-+-+-+
# static profile for eth1
lease {
  interface "eth1";
  fixed-address 192.168.10.1;
  option domain-name "localdomain";
  option subnet-mask 255.255.255.0;
  option routers 192.168.10.1;
  renew 2 2026/7/1 00:00:01;
  rebind 2 2026/7/1 00:00:01;
  expire 2 2026/7/1 00:00:01;
}
timeout 10;
+-+-+-+-+-+-+
This works correctly when running either client from shell. The problem is that the fallback IP is ignored and NM restarts the client several times before giving up. ipv4.may-fail is default, i.e. "yes", but it would appear that this does not mean "one-shot" but a limited retry count. Which is why I do get intermittent access prior to NM shutting down the interface completely.
AFAIK NM does not have any IP fallback methods/profiles or conditional starts of its own. It seems impossible that I missed that option, because I researched literally all the nmlib docs for creating a tool that reconfigures NM connections based on input from the web UI.
I can't have the fallback profile to be active as an always enabled second address on that interface, because there is a DHCP server that will be serving addresses in that range and cause conflicts when inserted into an existing network with another DHCP server. As such, the suggested workaround does not appear suited for my needs.
(In reply to Gordon Bos from comment #2)
> AFAIK NM does not have any IP fallback methods/profiles or conditional
> starts of its own.
Maybe you can create two connection profiles, one with:
  connection.autoconnect=yes
  connection.autoconnect-priority=20
  ipv4.method=auto
  ipv4.may-fail=no
and another with:
  connection.autoconnect=yes
  connection.autoconnect-priority=10
  ipv4.method=manual
  ipv4.addresses=192.168.10.1/24
  ipv4.may-fail=no
When NM is started or the device is connected, the first connection will be started. If it fails, NM will activate the second one. To speed up the failover you can also decrease the DHCP timeout by changing the ipv4.dhcp-timeout property (available since NM 1.2).
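The two-profile suggestion above could be scripted with nmcli roughly as follows. This is only a sketch under stated assumptions: the profile names (eth1-dhcp, eth1-fallback), the `run` and `make_profiles` helpers, and the 10 second DHCP timeout are illustrative choices, not something given in the comment, and ipv4.dhcp-timeout needs NM 1.2 or later. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: profile names and helper functions are hypothetical.
IFACE="${IFACE:-eth1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the nmcli commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_profiles() {
    iface="$1"

    # Preferred profile: DHCP, higher autoconnect priority.
    # may-fail=no makes the activation fail when DHCP times out.
    run nmcli connection add type ethernet ifname "$iface" con-name "$iface-dhcp"
    run nmcli connection modify "$iface-dhcp" \
        connection.autoconnect yes \
        connection.autoconnect-priority 20 \
        ipv4.method auto \
        ipv4.may-fail no \
        ipv4.dhcp-timeout 10

    # Fallback profile: static address, lower autoconnect priority.
    run nmcli connection add type ethernet ifname "$iface" con-name "$iface-fallback"
    run nmcli connection modify "$iface-fallback" \
        connection.autoconnect yes \
        connection.autoconnect-priority 10 \
        ipv4.addresses 192.168.10.1/24 \
        ipv4.method manual \
        ipv4.may-fail no
}

make_profiles "$IFACE"
```

Keeping the commands behind a dry-run switch makes it possible to review what would be changed before touching a remote headless machine.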
I'm intrigued by that last comment. So essentially what you're saying here is that ipv4.may-fail refers to the device rather than the connection? I definitely missed that part.
Regrettably this workaround conflicts with the method I set in the top level manager, which deletes connections that interfere with the settings controlled by the web front end. While it is not impossible to adapt the code to create two connections rather than one (which does in fact happen when this particular interface is switched to become part of a bridge with a wifi interface), it will in fact be much simpler to replace /usr/libexec/nm-dhcp-helper with a shell script that replaces reason="TIMEOUT" (reason="STATIC" with dhcpcd) with reason="BOUND" prior to calling the original nm-dhcp-helper application.
BTW The highest available version of NM in Gentoo's stable package tree is 1.0.12 and I gather from the comment that this version has its own preset dhcp timeout.
Question: could the much smaller timeout set in the dhcp client itself (10 seconds) account for the retry attempts I'm seeing performed by NM? Does this mean that regardless of what the dhcp client is configured to do, NM will stall the boot procedure until it has reached its own set timeout? That seems like ill behaviour to me as well. Particularly when observing the behaviour with dhcpcd, which literally takes minutes before NM shuts down the interface.
Returning to that shell script to capture nm-dhcp-helper calls: as it happened, that proved to be quite a challenge. The reason is that the way to offer a fallback address in dhclient is actually a "first" lease that can be overruled by later successful leases. So the last received lease becomes the fallback address, making it unpredictable. In the end I came up with the following script:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
#!/bin/sh

# dhcpcd fallback address: report the static profile as a normal binding
if [ "$reason" = "STATIC" ]; then
    export reason=BOUND
fi

# dhclient "fallback"
if [ "$reason" = "TIMEOUT" ] && [ "$interface" = "eth1" ]; then
    # a remembered lease other than the fallback address: discard it
    # (-f so that a missing lease file is not an error)
    if [ -n "$new_ip_address" ] && [ "$new_ip_address" != "192.168.10.1" ]; then
        rm -f /var/lib/NetworkManager/dhclient-*-eth1.lease
    fi
    # the fallback address itself: report it as a normal binding
    if [ "$new_ip_address" = "192.168.10.1" ]; then
        export reason=BOUND
    fi
fi

# hand over to the original helper, which was moved to this location
exec /usr/local/usr/libexec/nm-dhcp-helper "$@"
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
But the final verdict has to be that dhclient is a really bad choice to use as a back end for DHCP. As a result, bug 768362 should be redefined as "DHCP should accept reason=STATIC similar to reason=BOUND", which will provide correct behaviour with dhcpcd as a back end. Of course the better solution would be for NM itself to provide the fallback address feature in a transparent way.
(In reply to Gordon Bos from comment #4)
> I'm intrigued by that last comment. So essentially what you're saying here
> is that ipv4.may-fail refers to the device rather than the connection? I
> definitely missed that part.
I didn't say that, at least not intentionally :) The two snippets I showed are part of two different connections, and the may-fail property is per-connection.
> Regretfully this workaround conflicts with the method I set in the top level
> manager, which deletes connections that interfere with the settings
> controlled by the web front end. While not impossible to adapt the code for
> creating two connections rather than one (which does in fact happen when
> this particular interface is switched to become part of a bridge with a wifi
> interface) it will in fact be much simpler to replace
> /usr/libexec/nm-dhcp-helper with a shell script that replaces
> reason="TIMEOUT" (reason="STATIC" with dhcpcd) with reason="BOUND" prior to
> calling the original nm-dhcp-helper application.
>
> BTW The highest available version of NM in Gentoo's stable package tree is
> 1.0.12 and I gather from the comment that this version has its own preset
> dhcp timeout.
Yes, in 1.0 the DHCP timeout was fixed to 45 seconds.
> Question: could the much smaller timeout set in the dhcp client itself (10
> seconds) account for the retry attempts I'm seeing performed by NM?
Unfortunately NM will still perform 4 tries before switching to a different connection, and use the given timeout value at each attempt. Currently there is no way to change the number of autoconnect retries.
> Does this mean that regardless of what the dhcp client is configured to do NM
> will stall boot procedure until it reached its own set timeout? That seems
> like ill behaviour to me as well. Particularly when observing the behaviour
> with dhcpcd which literally takes minutes before NM shuts down the interface.
What do you mean by "stall the boot procedure"? The lack of replies from a DHCP server only causes that device to be stuck in the connecting state until the timeout is reached. It doesn't influence other devices or the boot of the system (unless there are services that explicitly depend on network connectivity, but in that case blocking the boot is what the user told it to do).
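The retry behaviour described above implies a rough upper bound on how long a device can stay in the connecting state before NM moves on: the number of tries multiplied by the per-attempt timeout. The sketch below just does that arithmetic. The function name `worst_case_delay` is made up for illustration, and the calculation assumes the attempts run back to back with no extra delay in between, which the comment does not state.

```shell
#!/bin/sh
# worst_case_delay TRIES TIMEOUT_SECONDS
# Upper bound, in seconds, before NM gives up on a connection,
# assuming each attempt uses the full timeout and retries start immediately.
worst_case_delay() {
    echo $(( $1 * $2 ))
}

# 4 tries with the fixed 45 s timeout of NM 1.0
echo "4 tries x 45 s: $(worst_case_delay 4 45) s"

# 4 tries with a 10 s timeout (e.g. ipv4.dhcp-timeout=10 on NM 1.2 or later)
echo "4 tries x 10 s: $(worst_case_delay 4 10) s"
```

With the 1.0 defaults this comes to three minutes, which is consistent with the observation that it "literally takes minutes" before the interface is shut down.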
Ah! But that would be really bad. As stated, I'm developing against a headless machine and as such I cannot have a network failed state stopping services from starting even if the result is that the machine receives a valid IP on either interface. If the system does not start at least the web server and/or ssh, the owner cannot control or access it.
Yes, I did notice the 45 second timeout set by NM. I'm also starting to grasp the difference between dhcpcd giving me several minutes with the configured fallback address and dhclient dropping the remembered lease almost instantly. The reason is that dhclient returns a timeout state in 'reason', which is supported (so you can in fact restrict the timeout to less than 45 seconds in 1.0.x), while dhcpcd returns an unsupported state, so NM keeps waiting for it to return something it recognizes until it times out itself.
Recap for devs:
1) Please add reason="STATIC" to the list of supported dhcp return states, with the same meaning as "BOUND".
2) It is my understanding that NM is supposed to have its own internal dhcp client. Please add fallback IP support similar to dhcpcd to either this new internal client or top level NM itself. The latter would of course allow NM to coexist with dhclient again if this feature is needed.
Okay, I found a solution, which as it happens is a two-step fix because there is a bug in dhcpcd as well.
1) Don't know how this will work out in email, but the following patch allows NM to accept the fallback static address from dhcpcd:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
NetworkManager needs to accept dhcpcd static profile to allow headless
machines to have a fixed fallback address where they can be accessed.

Patch by Gordon Bos

--- src/dhcp-manager/nm-dhcp-client.c
+++ src/dhcp-manager/nm-dhcp-client.c
@@ -202,6 +202,7 @@
 {
     if (g_ascii_strcasecmp (reason, "bound") == 0 ||
         g_ascii_strcasecmp (reason, "bound6") == 0 ||
+        g_ascii_strcasecmp (reason, "static") == 0 ||
         g_ascii_strcasecmp (reason, "renew") == 0 ||
         g_ascii_strcasecmp (reason, "renew6") == 0 ||
         g_ascii_strcasecmp (reason, "reboot") == 0 ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2) When dhcpcd accesses a fallback profile it ignores the script parameter from the command line. As a result, it does not call nm-dhcp-helper but runs its regular dhcpcd-run-hooks script. Obviously this is something for the dhcpcd devs to handle, but a workaround may be added to the docs. My final entry in /etc/dhcpcd.conf is as follows:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# static profile for eth1
profile default_LAN
static ip_address=192.168.10.1/24
static domain_name=localdomain
script /usr/libexec/nm-dhcp-helper

# config for eth1
interface eth1
fallback default_LAN
noipv4ll
timeout 10
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
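For readers who prefer to see the effect of the patch outside of C, here is a small shell sketch of the same case-insensitive check. It is an illustration only: the function name `is_bound_reason` is hypothetical, and the list contains just the reasons visible in the patch context above plus the added "static"; the real condition in nm-dhcp-client.c continues past the lines shown and may accept more.

```shell
#!/bin/sh
# is_bound_reason REASON
# Returns 0 if REASON would be treated like a successful binding by the
# patched check (only the reasons visible in the patch hunk are listed).
is_bound_reason() {
    # compare case-insensitively, like g_ascii_strcasecmp in the patch
    r=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$r" in
        bound|bound6|static|renew|renew6|reboot)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Show how the reasons discussed in this report are classified.
for reason in BOUND STATIC TIMEOUT; do
    if is_bound_reason "$reason"; then
        echo "$reason: treated as bound"
    else
        echo "$reason: not treated as bound"
    fi
done
```

This makes the difference between the two clients visible: the dhcpcd fallback (STATIC) passes once the patch is applied, while the dhclient fallback (TIMEOUT) still does not.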
Status update: The issue with dhcpcd ignoring the script parameter when accessing a fallback profile has been resolved by the dhcpcd dev and he even supplied a patch for me to use with the version of dhcpcd provided by my distro. Details here: https://forums.gentoo.org/viewtopic-t-1047062.html So it's all up to you guys now to add "static" as a valid dhcp return state.
bugzilla.gnome.org is being shut down in favor of a GitLab instance. We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time. If you still use NetworkManager and if you still see this bug / want this feature in a recent and supported version of NetworkManager, then please feel free to report it at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/ Thank you for creating this report and we are sorry it could not be implemented (workforce and time is unfortunately limited).