Bug 763236 – NetworkManager 1.1.X prevents tap devices created by SSH from getting an IPv6 link-local address




Summary:	NetworkManager 1.1.X prevents tap devices created by SSH from getting an IPv6...


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	IP and DNS config
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:	761389

Reported:	2016-03-07 14:34 UTC by Alexander Kurtz
Modified:	2016-04-04 13:38 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] device: don't deconfigure interfaces which can't be assumed or activated (5.57 KB, patch) 2016-03-09 13:52 UTC, Beniamino Galvani
[PATCH v2] device: don't deconfigure interfaces which can't be assumed or activated (4.79 KB, patch) 2016-03-16 11:07 UTC, Beniamino Galvani

Description Alexander Kurtz 2016-03-07 14:34:03 UTC

Hi!

I occasionally use SSH as a simple Layer-2-VPN solution with *only* IPv6 link-local addresses on the automatically created tapX devices. NM 1.0.X simply ignored these devices, but NM 1.1.X seems to break this setup:

root@shepard:~# NetworkManager --version
1.1.91
root@shepard:~# ssh -N -o Tunnel=ethernet -o TunnelDevice=5:5 root@hammond &
[1] 3618
root@shepard:~# ip link set up dev tap5
root@shepard:~# ip address show dev tap5
7: tap5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 500
    link/ether ca:12:a8:e7:2d:45 brd ff:ff:ff:ff:ff:ff

As you can see, the tap5 device does not have an IPv6 link-local address. Stopping NetworkManager fixes this:

root@shepard:~# systemctl stop NetworkManager
root@shepard:~# ssh -N -o Tunnel=ethernet -o TunnelDevice=5:5 root@hammond &
[1] 3644
root@shepard:~# ip link set up dev tap5
root@shepard:~# ip address show dev tap5
8: tap5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 500
    link/ether 56:3d:ae:7a:d6:7f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::543d:aeff:fe7a:d67f/64 scope link 
       valid_lft forever preferred_lft forever

Now an IPv6 link-local address has successfully been assigned.
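
The address in the output above is what the modified EUI-64 rule yields for that MAC address. The following is a minimal sketch of the derivation, assuming the interface uses the kernel's default EUI-64 address generation; the helper name `link_local_from_mac` is illustrative only.

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to expand the 48-bit MAC to a 64-bit interface ID.
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    # fe80::/64 prefix: fe80 followed by 48 zero bits.
    prefix = bytes([0xFE, 0x80]) + bytes(6)
    return str(ipaddress.IPv6Address(prefix + interface_id))


print(link_local_from_mac("56:3d:ae:7a:d6:7f"))  # fe80::543d:aeff:fe7a:d67f
```

With NetworkManager running, the same computation for the first tap5 MAC (ca:12:a8:e7:2d:45) gives the address that would be expected but is missing from the first output.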

I had a quick look at the debug output of NetworkManager (--debug --log-level=debug), and it seems that it decides to fiddle with the /proc/sys/net/ipv6/conf/tap5/disable_ipv6 file for some reason. I'd be happy to provide detailed logs if requested, but I think the issue is easy enough to reproduce. All you need is an SSH client/server, root on both sides, and the "PermitTunnel" setting enabled on the server.
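
To check the state of that file while reproducing, something like the following could be used. It is only a sketch: the function name and the `conf_root` parameter are assumptions made for illustration (the parameter exists so the helper can be pointed at a test directory), and it assumes the usual convention that the file contains "1" when IPv6 is disabled on the interface.

```python
from pathlib import Path
from typing import Optional


def ipv6_disabled(ifname: str,
                  conf_root: str = "/proc/sys/net/ipv6/conf") -> Optional[bool]:
    """Return True/False for the interface's disable_ipv6 flag, or None if
    the interface has no such entry (for example, it does not exist)."""
    path = Path(conf_root) / ifname / "disable_ipv6"
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # tap5 is the device name used in the report above.
    print(ipv6_disabled("tap5"))
```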

Best regards

Alexander Kurtz

Comment 1 Beniamino Galvani 2016-03-07 22:08:13 UTC

Hi,

this is a consequence of commit [1], which prevents NM from seamlessly
taking over a device when it has only an IPv6 link-local address. The
reason behind this commit is that in most cases users want to activate
an existing on-disk connection on devices, and 'assuming' the current
device state would prevent that. So in this case NM does not take over
the configuration, puts the device in the DISCONNECTED state (removing
the LL address) and then tries to activate on-disk connections for the
device, which in this particular case don't exist.

The quick workaround is to set assume-ipv6ll-only=* (or =tap5)
in the [main] section of /etc/NetworkManager/NetworkManager.conf to
restore the old behavior.
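
As an illustration of the fragment this workaround describes, the sketch below renders the [main] section with Python's configparser. The helper name `render_workaround` is hypothetical; only the section name, the key and the two values (* and tap5) come from the comment above. The resulting text would still have to be merged by hand into NetworkManager.conf.

```python
import configparser
import io


def render_workaround(devices: str = "*") -> str:
    """Render the [main] fragment that sets assume-ipv6ll-only."""
    cfg = configparser.ConfigParser()
    cfg["main"] = {"assume-ipv6ll-only": devices}
    buf = io.StringIO()
    # NetworkManager.conf is usually written as key=value without spaces.
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_workaround("tap5"))
```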

A more general solution would be to move the IPv6 deconfiguration from
the DISCONNECTED state to the moment in which we actually start
configuring a device. Or maybe only avoid the deconfiguration when we
know that no connection exists for the device. I'll try to investigate
these 2 ideas...

BTW, this behavior was also reported in [2].

[1] https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3bc097b084bcabfc682b532991dd05cbe8e3161a
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1261809

Comment 2 Beniamino Galvani 2016-03-09 13:52:41 UTC

Created attachment 323509
[PATCH] device: don't deconfigure interfaces which can't be assumed or activated

This patch seems to fix the problem; and probably it also breaks other
valid use cases I can't imagine right now (but at least the beaker
test suite is all green). Any comments? Better ideas?

Comment 3 Thomas Haller 2016-03-16 10:18:57 UTC

afais, the patch doesn't apply on master, nm-1-0, or master from 9th March.

Comment 4 Beniamino Galvani 2016-03-16 11:07:15 UTC

Created attachment 324087
[PATCH v2] device: don't deconfigure interfaces which can't be assumed or activated

(In reply to Thomas Haller from comment #3)
> afais, the patch doesn't apply on master, nm-1-0, or master from 9th March.

The patch should apply fine on current master with the --3way option. Anyway, here is the updated version.

Comment 5 Thomas Haller 2016-03-21 14:34:15 UTC

I think this is not right.

The device is externally created, but it moves the device state to "DISCONNECTED"
  - without actually being disconnected (i.e. there is local IPv6 connectivity)
  - without the user wanting NM to manage it

I think such a device should stay as unmanaged, and can at any time be actively managed, for example via `nmcli device set $DEVICE managed yes`. Basically, it's bug 746440.



For other software devices we already do something similar. When a (for example) bridge gets added and is !IFF_UP, we set it NM_UNMANAGED_EXTERNAL_DOWN to achieve this. However, in this case, the software device *is* up, so the analogy ends there...

What if we extend EXTERNAL_DOWN to also account for devices that are up, but have no IP configuration (except v6 link-local)?
How about branch th/unmanaged-external-down-bgo763236 ?
(the first commit of that branch anyway seems to be a worthy cleanup).
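
The proposed rule can be sketched roughly as follows. This is not the code from the branch: the function names and parameters are hypothetical, and the sketch only models the condition described above (an externally created software device that is either down, or up with nothing but IPv6 link-local addresses).

```python
import ipaddress
from typing import Iterable


def only_ipv6_link_local(addresses: Iterable[str]) -> bool:
    """True if there are no addresses, or all of them are IPv6 link-local."""
    for addr in addresses:
        ip = ipaddress.ip_interface(addr).ip
        if not (ip.version == 6 and ip.is_link_local):
            return False
    return True


def treat_as_external_down(is_software: bool, created_externally: bool,
                           is_up: bool, addresses: Iterable[str]) -> bool:
    """Decide whether a device would stay unmanaged under the proposed rule."""
    if not (is_software and created_externally):
        return False
    if not is_up:
        # Existing behaviour for software devices that are !IFF_UP.
        return True
    # Proposed extension: up, but no IP configuration beyond v6 link-local.
    return only_ipv6_link_local(addresses)


print(treat_as_external_down(True, True, True, ["fe80::543d:aeff:fe7a:d67f/64"]))
```

Under these assumptions the tap5 device from the report stays unmanaged, while the same device with, say, an additional global or IPv4 address would not.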

Comment 6 Beniamino Galvani 2016-03-21 16:00:52 UTC

Your approach seems more correct since it clearly distinguishes if we
want to manage the device or not, however I think it works only for
software devices created by NM (devices for which
can_unmanaged_external_down() returns TRUE). There is a bug [1]
similar to this one, but regarding ethernet devices and the situation
is more complex in that case.

We always want to try to activate an existing persistent connection
for non-software devices and, for how things work now, the first step
towards activation is DISCONNECTED. We must reach that state before
checking if there are available connections, but at that point if
there are none, we have already nuked the existing interface
configuration (the link-local IPv6 address).

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1261809

Comment 7 Tore Anderson 2016-03-22 07:08:52 UTC

I tested th/unmanaged-external-down-bgo763236: It fixes the clatd issue reported in #763236, and it does not have any problems with disconnecting wired ethernet interfaces like the attached patch has (see comment #4 in bug #763236). So functionality wise it LGTM (but I haven't at all looked at the code).

Tore

Comment 8 Tore Anderson 2016-03-22 07:11:31 UTC

Sorry: /comment #7/s/#763236/#761389/g

Comment 9 Beniamino Galvani 2016-03-22 07:42:34 UTC

Ok, the approach in the attached patch is wrong since it changes the semantics of the DISCONNECTED state. Thomas' branch looks good to me (only s/set_unmanged_external_down/set_unmanaged_external_down/ in last patch).

We'll also have to fix [1], but that can probably be treated as a different bug.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1261809

Comment 10 Dan Williams 2016-03-22 20:39:02 UTC

> device: refactor setting unmanaged flag EXTERNAL_DOWN

Can interfaces without an ifindex ever get to device_link_changed()?  I don't think so?  Quite possibly I'm missing something but I think we can assert/return if ifindex <= 0 there.


> device: consider external devices as unmanaged until they have an IP address

I feel like we keep adding special cases here and things are already pretty complicated.  Have we thought about just having externally-created software devices left in unmanaged unless (a) we can generate a connection for them and then assume that connection OR (b) there's a defined, non-generated, autoconnect=true connection that already exists for them?  The cases where the user expects NM to activate a connection and Do Stuff (TM) on an externally generated interface are probably smaller in number than the cases where that external thing is doing stuff by itself.
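
A rough sketch of the rule suggested in the paragraph above, for externally created software devices only. Everything here is hypothetical and for illustration: the `Connection` class, its fields and the function name are not NetworkManager API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Connection:
    generated: bool    # True if NM generated it from the device's current state
    autoconnect: bool  # True if the profile has autoconnect=true


def should_manage_external_device(can_generate_and_assume: bool,
                                  connections: Iterable[Connection]) -> bool:
    """Leave the device unmanaged unless case (a) or case (b) applies."""
    # (a) a connection can be generated for the device and then assumed
    if can_generate_and_assume:
        return True
    # (b) a defined, non-generated, autoconnect=true connection already exists
    return any(not c.generated and c.autoconnect for c in connections)


# An SSH-created tap device with no matching profiles would stay unmanaged.
print(should_manage_external_device(False, []))  # False
```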

Comment 11 Thomas Haller 2016-03-23 09:03:37 UTC

(In reply to Dan Williams from comment #10)
> > device: refactor setting unmanaged flag EXTERNAL_DOWN
> 
> Can interfaces without an ifindex ever get to device_link_changed()?  I
> don't think so?  Quite possibly I'm missing something but I think we can
> assert/return if ifindex <= 0 there.

you are correct. Added fixup.


> > device: consider external devices as unmanaged until they have an IP address
> 
> I feel like we keep adding special cases here and things are already pretty
> complicated.  Have we thought about just having externally-created software
> devices left in unmanaged unless (a) we can generate a connection for them
> and then assume that connection OR (b) there's a defined, non-generated,
> autoconnect=true connection that already exists for them?  The cases where
> the user expects NM to activate a connection and Do Stuff (TM) on an
> externally generated interface are probably smaller in number than the cases
> where that external thing is doing stuff by itself.

That is what bug 746440 is about. IMO we shouldn't use nm_device_generate_connection() at all to create a NM_SETTINGS_CONNECTION_FLAGS_NM_GENERATED_ASSUMED connection, nor use it to find a best-connection via nm_utils_match_connection(), as detailed in https://mail.gnome.org/archives/networkmanager-list/2015-November/msg00031.html .

But I think that is too late now for 1.2. 



Repushed (split the commits a bit)

Comment 12 Thomas Haller 2016-03-31 08:52:38 UTC

merged early part of branch to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=05326c205147ec054549d7099d76001dc5472af3

Rebased the remainder and re-pushed.

Comment 13 Thomas Haller 2016-04-04 13:38:40 UTC

Should be fixed on master with

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2716c7f11592e00df5a37f181966c97e1ca2ca16