GNOME Bugzilla – Bug 763236
NetworkManager 1.1.X prevents tap devices created by SSH from getting an IPv6 link-local address
Last modified: 2016-04-04 13:38:40 UTC
Hi!

I occasionally use SSH as a simple Layer-2-VPN solution with *only* IPv6 link-local addresses on the automatically created tapX devices. NM 1.0.X simply ignored these devices, but NM 1.1.X seems to break this setup:

root@shepard:~# NetworkManager --version
1.1.91
root@shepard:~# ssh -N -o Tunnel=ethernet -o TunnelDevice=5:5 root@hammond &
[1] 3618
root@shepard:~# ip link set up dev tap5
root@shepard:~# ip address show dev tap5
7: tap5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 500
    link/ether ca:12:a8:e7:2d:45 brd ff:ff:ff:ff:ff:ff

As you can see, the tap5 device does not have an IPv6 link-local address. Stopping NetworkManager fixes this:

root@shepard:~# systemctl stop NetworkManager
root@shepard:~# ssh -N -o Tunnel=ethernet -o TunnelDevice=5:5 root@hammond &
[1] 3644
root@shepard:~# ip link set up dev tap5
root@shepard:~# ip address show dev tap5
8: tap5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 500
    link/ether 56:3d:ae:7a:d6:7f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::543d:aeff:fe7a:d67f/64 scope link
       valid_lft forever preferred_lft forever

Now an IPv6 link-local address has successfully been assigned.

I had a quick look at the debug output of NetworkManager (--debug --log-level=debug), and it seems that it decides to fiddle with the /proc/sys/net/ipv6/conf/tap5/disable_ipv6 file for some reason.

I'd be happy to provide detailed logs if requested, but I think the issue is easy enough to reproduce. All you need is a SSH client/server, root on both sides, and the "PermitTunnel" setting enabled on the server.

Best regards

Alexander Kurtz
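For anyone reproducing this, a small helper may make it easier to see what is missing. The sketch below is not part of the report: it assumes the kernel derives the link-local address from the MAC with the classic modified EUI-64 scheme (which matches the second `ip address show` output above, but does not hold for other address generation modes), and the function names `eui64_link_local` and `ipv6_disabled` are made up for illustration.

```python
import ipaddress
from pathlib import Path


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Return the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the two halves of the MAC to get the interface identifier.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    # fe80::/64 prefix (fe80 followed by 48 zero bits) plus the identifier.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


def ipv6_disabled(iface: str, conf_root="/proc/sys/net/ipv6/conf"):
    """Return True/False from disable_ipv6, or None if it cannot be read."""
    try:
        value = (Path(conf_root) / iface / "disable_ipv6").read_text()
    except OSError:
        return None
    return value.strip() == "1"


if __name__ == "__main__":
    # MAC addresses taken from the two tap5 sessions in the report.
    print(eui64_link_local("56:3d:ae:7a:d6:7f"))
    print(eui64_link_local("ca:12:a8:e7:2d:45"))
    print(ipv6_disabled("tap5"))
```

Under that assumption, the first MAC maps to the fe80::543d:aeff:fe7a:d67f address seen once NetworkManager is stopped, and `ipv6_disabled("tap5")` returning True while NetworkManager is running would be consistent with the disable_ipv6 observation above.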
Hi,

this is a consequence of commit [1], which prevents NM from seamlessly taking over a device that has only an IPv6 link-local address. The reason behind this commit is that in most cases users want to activate an existing on-disk connection on the device, and 'assuming' the current device state would prevent that. So in this case NM does not take over the configuration, puts the device in the DISCONNECTED state (removing the LL address) and then tries to activate on-disk connections for the device, which in this particular case don't exist.

The quick workaround is to set assume-ipv6ll-only=* (or =tap5) in the [main] section of /etc/NetworkManager/NetworkManager.conf to restore the old behavior.

A more general solution would be to move the IPv6 deconfiguration from the DISCONNECTED state to the moment at which we actually start configuring a device. Or maybe to skip the deconfiguration only when we know that no connection exists for the device. I'll try to investigate these two ideas...

BTW, this behavior was also reported in [2].

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3bc097b084bcabfc682b532991dd05cbe8e3161a
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1261809
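For reference, the workaround described above would presumably look like the fragment below. Only the key, its two suggested values and the section name come from the comment; that the file has no other [main] entries, and that NetworkManager has to be restarted to pick up the change, are assumptions.

```ini
# /etc/NetworkManager/NetworkManager.conf
[main]
# Restore the old behavior for every device ...
assume-ipv6ll-only=*
# ... or, alternatively, only for the SSH tunnel device:
# assume-ipv6ll-only=tap5
```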
Created attachment 323509
[PATCH] device: don't deconfigure interfaces which can't be assumed or activated

This patch seems to fix the problem; and probably it also breaks other valid use cases I can't imagine right now (but at least the beaker test suite is all green).

Any comments? Better ideas?
afais, the patch doesn't apply on master, nm-1-0, or master from 9th March.
Created attachment 324087
[PATCH v2] device: don't deconfigure interfaces which can't be assumed or activated

(In reply to Thomas Haller from comment #3)
> afais, the patch doesn't apply on master, nm-1-0, or master from 9th March.

The patch should apply fine on current master with the --3way option. Anyway, here is the updated version.
I think this is not right. The device is externally created, but the patch moves the device state to "DISCONNECTED"
- without actually being disconnected (i.e. there is local IPv6 connectivity)
- without the user wanting NM to manage it

I think such a device should stay unmanaged, and can at any time be actively managed, for example via `nmcli device set $DEVICE managed yes`. Basically, it's bug 746440.

For other software devices we already do something similar. When (for example) a bridge gets added and is !IFF_UP, we set it NM_UNMANAGED_EXTERNAL_DOWN to achieve this. However, in this case, the software device *is* up, so the analogy ends there...

What if we extend EXTERNAL_DOWN to also account for devices that are up, but have no IP configuration (except v6 link-local)?

How about branch th/unmanaged-external-down-bgo763236? (The first commit of that branch seems to be a worthy cleanup anyway.)
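To make the proposed rule concrete, here is a rough sketch of the decision as described in this comment. It is not the code from the branch: the function names and parameters are hypothetical, and it only models "externally created software device that is either down, or up with nothing but IPv6 link-local addresses".

```python
import ipaddress


def has_only_ipv6_link_local(addresses):
    """True if every address is IPv6 link-local (also true for no addresses)."""
    for text in addresses:
        ip = ipaddress.ip_interface(text).ip
        if ip.version != 6 or not ip.is_link_local:
            return False
    return True


def stays_unmanaged_external(is_external_software_device, is_up, addresses):
    """Hypothetical model of the extended EXTERNAL_DOWN idea.

    The device is left unmanaged if it was created externally and is either
    down or carries no IP configuration beyond IPv6 link-local addresses.
    """
    if not is_external_software_device:
        return False
    return (not is_up) or has_only_ipv6_link_local(addresses)


if __name__ == "__main__":
    # The tap5 case from this bug: up, with only a link-local address.
    print(stays_unmanaged_external(True, True, ["fe80::543d:aeff:fe7a:d67f/64"]))
    # An external device that already has a routable address.
    print(stays_unmanaged_external(True, True, ["fe80::1/64", "192.0.2.10/24"]))
```

With such a rule the tap5 device from the report would stay unmanaged and keep its link-local address, while a device with other IP configuration would still be a candidate for being managed.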
Your approach seems more correct since it clearly distinguishes whether we want to manage the device or not; however, I think it works only for software devices created by NM (devices for which can_unmanaged_external_down() returns TRUE).

There is a bug [1] similar to this one, but regarding ethernet devices, and the situation is more complex in that case. We always want to try to activate an existing persistent connection for non-software devices and, the way things work now, the first step towards activation is DISCONNECTED. We must reach that state before checking whether there are available connections, but if there are none, by that point we have already nuked the existing interface configuration (the link-local IPv6 address).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1261809
I tested th/unmanaged-external-down-bgo763236: It fixes the clatd issue reported in #763236, and it does not have any problems with disconnecting wired ethernet interfaces like the attached patch has (see comment #4 in bug #763236). So functionality wise it LGTM (but I haven't at all looked at the code).

Tore
Sorry: /comment #7/s/#763236/#761389/g
Ok, the approach in the attached patch is wrong since it changes the semantics of the DISCONNECTED state.

Thomas' branch looks good to me (only s/set_unmanged_external_down/set_unmanaged_external_down/ in the last patch).

We'll also have to fix [1], but that can probably be treated as a different bug.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1261809
> device: refactor setting unmanaged flag EXTERNAL_DOWN

Can interfaces without an ifindex ever get to device_link_changed()? I don't think so? Quite possibly I'm missing something, but I think we can assert/return if ifindex <= 0 there.

> device: consider external devices as unmanaged until they have an IP address

I feel like we keep adding special cases here and things are already pretty complicated. Have we thought about just having externally-created software devices left in unmanaged unless (a) we can generate a connection for them and then assume that connection OR (b) there's a defined, non-generated, autoconnect=true connection that already exists for them? The cases where the user expects NM to activate a connection and Do Stuff (TM) on an externally generated interface are probably smaller in number than the cases where that external thing is doing stuff by itself.
(In reply to Dan Williams from comment #10)
> > device: refactor setting unmanaged flag EXTERNAL_DOWN
>
> Can interfaces without an ifindex ever get to device_link_changed()? I
> don't think so? Quite possibly I'm missing something but I think we can
> assert/return if ifindex <= 0 there.

You are correct. Added fixup.

> > device: consider external devices as unmanaged until they have an IP address
>
> I feel like we keep adding special cases here and things are already pretty
> complicated. Have we thought about just having externally-created software
> devices left in unmanaged unless (a) we can generate a connection for them
> and then assume that connection OR (b) there's a defined, non-generated,
> autoconnect=true connection that already exists for them? The cases where
> the user expects NM to activate a connection and Do Stuff (TM) on an
> externally generated interface are probably smaller in number than the cases
> where that external thing is doing stuff by itself.

That is what bug 746440 is about.

IMO we shouldn't use nm_device_generate_connection() at all to create a NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED_ASSUMED connection, nor should we use it to find a best-connection via nm_utils_match_connection(). As detailed in https://mail.gnome.org/archives/networkmanager-list/2015-November/msg00031.html .

But I think that is too late now for 1.2.

Repushed (split the commits a bit)
merged early part of branch to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=05326c205147ec054549d7099d76001dc5472af3 Rebased the remainder and re-pushed.
Should be fixed on master with https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2716c7f11592e00df5a37f181966c97e1ca2ca16