Bug 768362 – DHCP should support fallback profiles

Summary:	DHCP should support fallback profiles


Status:	RESOLVED OBSOLETE

Product:	NetworkManager
Classification:	Platform
Component:	IP and DNS config
Version:	1.5.x
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

Reported:	2016-07-04 09:24 UTC by Gordon Bos
Modified:	2020-11-12 14:33 UTC

GNOME target:	---
GNOME version:	---

Description Gordon Bos 2016-07-04 09:24:13 UTC

I am the maintainer of a web-based management tool for running Gentoo Linux on a headless machine. Because I can't know in what environments these machines start up, I need a fallback profile in the DHCP client. This is where NM fails.

The previous version of the software used dhcpcd under SysV init, and I kept this client when moving to systemd with NM. When the fallback profile was used, I noticed the IP address being dropped several times and eventually the interface being dropped altogether. I found some references to similar behaviour here, and the answer was always that dhcpcd was not fully supported by NM and might be dropped altogether. So I switched to dhclient, and the only difference is that the interface is dropped even sooner.

In other words: NM does not respond correctly when a fallback profile is defined and used. The cause appears to be that both clients report something other than reason="BOUND" in this case: dhcpcd returns reason="STATIC", and dhclient returns a more generic reason="TIMEOUT" that gives no indication that there is in fact a binding.

Comment 1 Thomas Haller 2016-07-04 10:00:26 UTC

How do you configure the fallback profile? Note that NetworkManager is connection based, so putting a fallback profile in the global /etc means that the fallback profile will be used for every connection. And as you said, it depends on the DHCP plugin and is not accessible via D-Bus. That doesn't fly well...

dhcpcd is supported insofar as there are users who use it, report bugs and submit patches. I don't know who uses it, for whom it works, or how well.


Why can you not just configure an additional static IP address on the connection and set ipv4.may-fail=yes?

Comment 2 Gordon Bos 2016-07-04 12:14:00 UTC

The fallback profile is defined in the dhcp client:

/etc/dhcpcd.conf:
+-+-+-+-+-+-+
# static profile for eth1
profile static_eth1
static ip_address=192.168.10.1/24
static domain_name=localdomain
static domain_search=localdomain
static domain_name_servers=8.8.8.8

interface eth1
fallback static_eth1

noipv4ll
timeout 10
+-+-+-+-+-+-+


/etc/dhcp/dhclient.conf
+-+-+-+-+-+-+
# static profile for eth1
lease {
  interface "eth1";
  fixed-address 192.168.10.1;
  option domain-name "localdomain";
  option subnet-mask 255.255.255.0;
  option routers 192.168.10.1;
  renew 2 2026/7/1 00:00:01;
  rebind 2 2026/7/1 00:00:01;
  expire 2 2026/7/1 00:00:01;
}

timeout 10;
+-+-+-+-+-+-+

This works correctly when running either client from a shell. The problem is that the fallback IP is ignored and NM restarts the client several times before giving up. ipv4.may-fail is at its default, i.e. "yes", but it would appear that this does not mean "one-shot" but rather a limited retry count. That is why I get intermittent access before NM shuts down the interface completely.

AFAIK NM does not have any IP fallback methods/profiles or conditional starts of its own. It seems unlikely that I missed such an option, because I researched literally all the nmlib docs while creating a tool that reconfigures NM connections based on input from the web UI. I can't have the fallback address active as an always-enabled second address on that interface, because a DHCP server serves addresses in that range, and that would cause conflicts when the machine is inserted into an existing network with another DHCP server. As such, the suggested workaround does not appear suited to my needs.

Comment 3 Beniamino Galvani 2016-07-05 12:43:42 UTC

(In reply to Gordon Bos from comment #2)

> AFAIK NM does not have any IP fallback methods/profiles or conditional
> starts of its own. 

Maybe you can create two connection profiles, one with:

 connection.autoconnect=yes
 connection.autoconnect-priority=20
 ipv4.method=auto
 ipv4.may-fail=no

and another with:

 connection.autoconnect=yes
 connection.autoconnect-priority=10
 ipv4.method=manual
 ipv4.address=192.168.10.1/24
 ipv4.may-fail=no

When NM is started or the device is connected, the first connection will
be activated. If it fails, NM will activate the second one. To speed up
the failover you can also decrease the DHCP timeout by changing
the ipv4.dhcp-timeout property (since NM 1.2).
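
As a rough sketch, the two profiles described above could be created with nmcli along these lines. The function name, the connection names "<ifname>-dhcp" and "<ifname>-fallback", and the NMCLI override variable are hypothetical and not part of this report; the property names are the ones listed above, plus ipv4.addresses for the static address. The functions only wrap `nmcli connection add` and `nmcli connection modify`, and this has not been tested against NM 1.0.x.

```sh
#!/bin/sh
# Sketch: one DHCP profile plus a lower-priority static fallback profile.
# Assumption: NMCLI may be overridden (e.g. NMCLI=echo) to print the
# commands instead of running them.
NMCLI=${NMCLI:-nmcli}

make_fallback_profiles() {
    ifname=$1      # e.g. eth1
    address=$2     # e.g. 192.168.10.1/24
    timeout=$3     # DHCP timeout in seconds (ipv4.dhcp-timeout, NM >= 1.2)

    # Preferred profile: DHCP; the activation fails if no lease is obtained.
    $NMCLI connection add type ethernet ifname "$ifname" \
        con-name "$ifname-dhcp" || return 1
    $NMCLI connection modify "$ifname-dhcp" \
        connection.autoconnect yes \
        connection.autoconnect-priority 20 \
        ipv4.method auto \
        ipv4.may-fail no \
        ipv4.dhcp-timeout "$timeout" || return 1

    # Fallback profile: static address, lower autoconnect priority.
    $NMCLI connection add type ethernet ifname "$ifname" \
        con-name "$ifname-fallback" || return 1
    $NMCLI connection modify "$ifname-fallback" \
        connection.autoconnect yes \
        connection.autoconnect-priority 10 \
        ipv4.method manual \
        ipv4.addresses "$address" \
        ipv4.may-fail no || return 1
}
```

A call such as `make_fallback_profiles eth1 192.168.10.1/24 10` would then set up both profiles; running it with `NMCLI=echo` first shows the commands without touching the system.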

Comment 4 Gordon Bos 2016-07-05 14:21:28 UTC

I'm intrigued by that last comment. So essentially what you're saying here is that ipv4.may-fail refers to the device rather than the connection? I definitely missed that part.

Regretfully this workaround conflicts with the method I set in the top level manager, which deletes connections that interfere with the settings controlled by the web front end. While not impossible to adapt the code for creating two connections rather than one (which does in fact happen when this particular interface is switched to become part of a bridge with a wifi interface) it will in fact be much simpler to replace /usr/libexec/nm-dhcp-helper with a shell script that replaces reason="TIMEOUT" (reason="STATIC" with dhcpcd) with reason="BOUND" prior to calling the original nm-dhcp-helper application.

BTW The highest available version of NM in Gentoo's stable package tree is 1.0.12 and I gather from the comment that this version has its own preset dhcp timeout.

Question: could the much smaller timeout set in the dhcp client itself (10 seconds) account for the retry attempts I'm seeing performed by NM? Does this mean that regardless of what the dhcp client is configured to do NM will stall boot procedure until it reached its own set timeout? That seems like ill behaviour to me as well. Particularly when observing the behaviour with dhcpcd which literally takes minutes before NM shuts down the interface.

Comment 5 Gordon Bos 2016-07-05 19:30:59 UTC

Returning to that shell script to capture nm-dhcp-helper calls:

As it happened, that proved to be quite a challenge. The reason is that the way to offer a fallback address in dhclient is actually a "first" lease that can be overruled by later successful leases. So the last received lease becomes the fallback address, making it unpredictable.

In the end I came up with the following script:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
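#!/bin/sh
# Wrapper installed in place of nm-dhcp-helper. It rewrites the DHCP
# client's $reason so that NM treats the fallback address as a binding.

# dhcpcd fallback address: reported as STATIC
if [ "$reason" = "STATIC" ]; then
    export reason=BOUND
fi

# dhclient "fallback": reported as TIMEOUT together with a remembered lease
if [ "$reason" = "TIMEOUT" ] && [ "$interface" = "eth1" ]; then
    if [ -n "$new_ip_address" ] && [ "$new_ip_address" != "192.168.10.1" ]; then
        # A previously obtained lease was replayed instead of the fallback;
        # delete the stored lease files (-f: no error if none exist).
        rm -f /var/lib/NetworkManager/dhclient-*-eth1.lease
    fi
    if [ "$new_ip_address" = "192.168.10.1" ]; then
        export reason=BOUND
    fi
fi

# Hand over to the original nm-dhcp-helper, passing any arguments through.
exec /usr/local/usr/libexec/nm-dhcp-helper "$@"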
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

But the final verdict has to be that dhclient is a really bad choice as a back end for DHCP. As a result, bug 768362 should be redefined as "DHCP should accept reason=STATIC similar to reason=BOUND", which will provide correct behaviour with dhcpcd as a back end. Of course, the better solution would be for NM itself to provide the fallback address feature in a transparent way.

Comment 6 Beniamino Galvani 2016-07-06 14:30:59 UTC

(In reply to Gordon Bos from comment #4)
> I'm intrigued by that last comment. So essentially what you're saying here
> is that ipv4.may-fail refers to the device rather than the connection? I
> definitely missed that part.

I didn't say that, at least intentionally :) The two snippets I showed
are part of two different connections and the may-fail property is
per-connection.

> Regretfully this workaround conflicts with the method I set in the top level
> manager, which deletes connections that interfere with the settings
> controlled by the web front end. While not impossible to adapt the code for
> creating two connections rather than one (which does in fact happen when
> this particular interface is switched to become part of a bridge with a wifi
> interface) it will in fact be much simpler to replace
> /usr/libexec/nm-dhcp-helper with a shell script that replaces
> reason="TIMEOUT" (reason="STATIC" with dhcpcd) with reason="BOUND" prior to
> calling the original nm-dhcp-helper application.
>
> BTW The highest available version of NM in Gentoo's stable package tree is
> 1.0.12 and I gather from the comment that this version has its own preset
> dhcp timeout.

Yes, in 1.0 the DHCP timeout was fixed to 45 seconds.

> Question: could the much smaller timeout set in the dhcp client itself (10
> seconds) account for the retry attempts I'm seeing performed by NM?

Unfortunately NM will still perform 4 tries before switching to a
different connection, and use the given timeout value at each
attempt. Currently there is no way to change the number of autoconnect
retries.
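
As a back-of-the-envelope sketch, the worst-case delay before NM moves on to another connection can be estimated as tries times timeout. This assumes nothing else adds time between attempts, which the report does not confirm; the function name is made up for illustration.

```sh
#!/bin/sh
# Rough worst-case time before NM gives up on a DHCP profile.
# Assumption: total = tries * per-attempt timeout; any delay between
# attempts is ignored.
dhcp_failover_seconds() {
    tries=$1
    timeout=$2
    echo $((tries * timeout))
}

dhcp_failover_seconds 4 45   # NM 1.0 (fixed 45 s timeout): prints 180
dhcp_failover_seconds 4 10   # ipv4.dhcp-timeout=10 on NM >= 1.2: prints 40
```

Under that assumption, NM 1.0 would need about three minutes to give up, which would be consistent with the "literally takes minutes" observation quoted below.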

> Does this mean that regardless of what the dhcp client is configured to do NM
> will stall boot procedure until it reached its own set timeout? That seems
> like ill behaviour to me as well. Particularly when observing the behaviour
> with dhcpcd which literally takes minutes before NM shuts down the interface.

What do you mean by "stall the boot procedure"? The lack of replies
from a DHCP server only causes that device to be stuck in the
connecting state until the timeout is reached. It doesn't influence
other devices or the boot of the system (unless there are services
that explicitly depend on network connectivity, but in that case
blocking the boot is what the user told it to do).

Comment 7 Gordon Bos 2016-07-06 17:04:08 UTC

Ah! But that would be really bad. As stated, I'm developing against a headless machine and as such I cannot have a network failed state stopping services from starting even if the result is that the machine receives a valid IP on either interface. If the system does not start at least the web server and/or ssh the owner cannot control or access it.

Yes, I did notice the 45-second timeout set by NM. I'm also starting to grasp the difference between dhcpcd giving me several minutes with the configured fallback address and dhclient dropping the remembered lease almost instantly. The reason is that dhclient returns a timeout state in 'reason', which is supported (so you can in fact restrict the timeout to less than 45 seconds in 1.0.x), while dhcpcd returns an unsupported state, and NM keeps waiting for it to return something it recognizes until NM itself times out.

Recap for devs:

1) Please add reason="STATIC" to the list of supported dhcp return states, with the same meaning as "BOUND".

2) It is my understanding that NM is supposed to have its own internal dhcp client. Please add fallback IP support similar to dhcpcd to either this new internal client or top level NM itself. The latter would of course allow NM to coexist with dhclient again if this feature is needed.

Comment 8 Gordon Bos 2016-07-07 15:55:10 UTC

Okay, I found a solution. As it happens, it is a two-step fix, because there is a bug in dhcpcd as well.

1) Don't know how this will work out in email, but the following patch allows NM to accept the fallback static address from dhcpcd:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
NetworkManager needs to accept dhcpcd static profile to allow
headless machines to have a fixed fallback address where they
can be accessed.

Patch by Gordon Bos

--- src/dhcp-manager/nm-dhcp-client.c
+++ src/dhcp-manager/nm-dhcp-client.c
@@ -202,6 +202,7 @@
 {
 	if (g_ascii_strcasecmp (reason, "bound") == 0 ||
 	    g_ascii_strcasecmp (reason, "bound6") == 0 ||
+	    g_ascii_strcasecmp (reason, "static") == 0 ||
 	    g_ascii_strcasecmp (reason, "renew") == 0 ||
 	    g_ascii_strcasecmp (reason, "renew6") == 0 ||
 	    g_ascii_strcasecmp (reason, "reboot") == 0 ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
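
For readers who want to try the logic without rebuilding NM, the patched check can be approximated in shell as below. This is only an illustration: the function name is invented, and it lists just the reasons visible in the patch context above, while the real list in nm-dhcp-client.c may contain more entries.

```sh
#!/bin/sh
# Returns 0 if the DHCP client's reason should be treated as a binding.
# Assumption: only the reasons visible in the patch hunk are listed.
reason_means_bound() {
    # Lowercase first, mirroring the case-insensitive
    # g_ascii_strcasecmp() comparisons in the C code.
    r=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $r in
        bound|bound6|static|renew|renew6|reboot) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this sketch, `reason_means_bound STATIC` succeeds just like `reason_means_bound BOUND`, while `reason_means_bound TIMEOUT` fails.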


2) When dhcpcd accesses a fallback profile it ignores the script parameter from the command line. As a result, it does not call nm-dhcp-helper but runs its regular dhcpcd-run-hooks script. Obviously this is something for the dhcpcd devs to handle, but a workaround may be added to the docs. My final entry in /etc/dhcpcd.conf is as follows:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
# static profile for eth1
profile default_LAN
static ip_address=192.168.10.1/24
static domain_name=localdomain
script /usr/libexec/nm-dhcp-helper

# config for eth1
interface eth1
fallback default_LAN

noipv4ll
timeout 10
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Comment 9 Gordon Bos 2016-08-04 07:59:23 UTC

Status update:

The issue with dhcpcd ignoring the script parameter when accessing a fallback profile has been resolved by the dhcpcd dev and he even supplied a patch for me to use with the version of dhcpcd provided by my distro.

Details here: https://forums.gentoo.org/viewtopic-t-1047062.html

So it's all up to you guys now to add "static" as a valid dhcp return state.

Comment 10 André Klapper 2020-11-12 14:33:02 UTC

bugzilla.gnome.org is being shut down in favor of a GitLab instance. 
We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time.

If you still use NetworkManager and if you still see this bug / want this feature in a recent and supported version of NetworkManager, then please feel free to report it at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/

Thank you for creating this report and we are sorry it could not be implemented (workforce and time are unfortunately limited).