GNOME Bugzilla – Bug 746440
improve behavior for assumed and unmanaged devices, do better at seamless take over, and don't touch devices
Last modified: 2020-11-12 14:32:01 UTC
Current state:
==============

Currently a device can be unmanaged. For example, some device types are unmanaged by default, and you can configure keyfile.unmanaged-devices. An unmanaged device still has an IPConfig D-Bus object associated, but it has no connection and no active-connection (even if it is up). Also, it cannot be activated, unless it is unmanaged only due to user-unmanaged, in which case a user "up" command can activate the connection.

Otherwise, when a device is externally configured, we assume a connection, either by reusing one of the persistent connections (if applicable) or by generating a temporary assumed connection. The connection appears to be activated normally, but we sort of promise that we don't mess with it.

The promise that we don't touch devices is shallow. Especially if the assumed connection has ipv6.method=auto, we send Router Solicitations (RS) and accept Router Advertisements. We also accept callbacks from the dhclient helper (nm-dhcp-helper) and will extend DHCP leases.

How it should be
================

"unmanaged" should become more like "assumed" is now. We should always generate an assumed connection for it and pretend that the device is active (if it is configured), but with the additional promise that we really, really don't mess with the device. Not at all. Ever. The user told us not to, so we do nothing except observe.

"assumed" should become more like a seamless takeover. This is especially needed if you stop and restart NM: on stop it leaves the device up, and on restart it must seamlessly take the device over, but manage it afterwards. In particular, on such an assumed device it must handle new DHCP leases and do SLAAC. Currently, assuming a connection tells you that NM won't manage your device, yet it still does some careful managing.
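The proposed split can be summarized as a small policy check. This is a hypothetical sketch, not NetworkManager code: the `DeviceMode` and `Action` enums and `may_perform()` are invented for illustration. It encodes the "How it should be" rules: an unmanaged device is only observed, while an assumed device is taken over seamlessly and then managed normally (DHCP renewals, SLAAC).

```c
#include <stdbool.h>

/* Hypothetical device modes, following the "How it should be" proposal. */
typedef enum {
    MODE_UNMANAGED, /* observe only: never change anything on the device */
    MODE_ASSUMED,   /* seamless takeover, then manage like any other device */
    MODE_MANAGED,   /* activated by NM from the start */
} DeviceMode;

/* Hypothetical actions; all but the last one modify the device. */
typedef enum {
    ACTION_RENEW_DHCP_LEASE,
    ACTION_SEND_ROUTER_SOLICITATION, /* needed for SLAAC */
    ACTION_EXPOSE_ON_DBUS,           /* read-only: publish observed state */
} Action;

/* Returns true if NM may perform @action on a device in @mode. */
static bool
may_perform (DeviceMode mode, Action action)
{
    /* Publishing observed state never modifies the device. */
    if (action == ACTION_EXPOSE_ON_DBUS)
        return true;

    switch (mode) {
    case MODE_UNMANAGED:
        /* "Not at all. Ever." */
        return false;
    case MODE_ASSUMED:
    case MODE_MANAGED:
        /* After the takeover, an assumed device is managed normally. */
        return true;
    }
    return false;
}
```

The key difference from the current behavior is that "unmanaged" becomes a hard guarantee, while "assumed" no longer pretends to be hands-off.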
Agreed with "How it should be". For the "assumed" state, if we can't somehow tag the interface in the kernel, then I guess we could serialize some state to /run and try to match that state on startup (if it's recent enough). If the state matches, the interface is "assumed"; if it doesn't match, it's "unmanaged". The state would include most attributes of the interface: addresses, routes, L2 properties, /proc/sys/net/{ipv4,ipv6}/conf settings, etc.
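A minimal sketch of the /run snapshot idea, assuming a hypothetical `IfaceSnapshot` record that captures only a few of the listed attributes (MTU, MAC address, IPv4 addresses) plus a write timestamp; a real snapshot would also cover routes and sysctls. The names `IfaceSnapshot`, `classify_on_startup()` and the `max_age` threshold are illustrative assumptions, not NetworkManager API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MAX_ADDRS 8

/* Hypothetical subset of interface state serialized to /run. */
typedef struct {
    time_t   written_at;            /* when the snapshot was saved */
    uint32_t mtu;
    uint8_t  hwaddr[6];
    uint32_t ipv4_addrs[MAX_ADDRS]; /* unused slots are zero */
    unsigned n_ipv4_addrs;
} IfaceSnapshot;

typedef enum { IFACE_ASSUMED, IFACE_UNMANAGED } IfaceClass;

static bool
snapshot_matches (const IfaceSnapshot *saved, const IfaceSnapshot *current)
{
    if (saved->mtu != current->mtu)
        return false;
    if (memcmp (saved->hwaddr, current->hwaddr, sizeof saved->hwaddr) != 0)
        return false;
    if (saved->n_ipv4_addrs != current->n_ipv4_addrs
        || saved->n_ipv4_addrs > MAX_ADDRS)
        return false;
    /* Order-sensitive comparison keeps the sketch short; a real matcher
     * would compare address sets. */
    return memcmp (saved->ipv4_addrs, current->ipv4_addrs,
                   saved->n_ipv4_addrs * sizeof saved->ipv4_addrs[0]) == 0;
}

/* @saved is NULL if no snapshot exists in /run. */
static IfaceClass
classify_on_startup (const IfaceSnapshot *saved,
                     const IfaceSnapshot *current,
                     time_t now,
                     time_t max_age)
{
    if (saved == NULL)
        return IFACE_UNMANAGED;
    if (now - saved->written_at > max_age) /* snapshot too old to trust */
        return IFACE_UNMANAGED;
    return snapshot_matches (saved, current) ? IFACE_ASSUMED : IFACE_UNMANAGED;
}
```

Defaulting to "unmanaged" on any mismatch is the conservative choice: if the interface changed while NM was down, someone else configured it, and NM should only observe.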
This depends on bug 746566.
Proposed solution should look like: https://mail.gnome.org/archives/networkmanager-list/2015-November/msg00031.html
See also: bug 699843
Is this the bug where we figure out how to get NM to ignore Docker veth devices again? Following:

https://bugzilla.gnome.org/show_bug.cgi?id=731014

Two years later, with NetworkManager-1.2.0-1.fc24.x86_64, I still see huge amounts of log spam from NM every time Docker creates a veth device.
(In reply to Colin Walters from comment #5)
> Is this the bug where we figure out how to get NM to ignore Docker veth
> devices again?
>
> Following: https://bugzilla.gnome.org/show_bug.cgi?id=731014
>
> Two years later, with: NetworkManager-1.2.0-1.fc24.x86_64
>
> I still see huge amounts of log spam from NM every time Docker creates a
> veth device.

NM won't touch them, but NM will see the device and expose its configuration on D-Bus like any other device. That's likely the log spam you're seeing. But there should be no change to the veth configuration if it's created outside of NM.
Maybe there's not, but it's a lot of log spam. Do I really need to be told over and over that `veth` doesn't support carrier detection every time I start a Docker container?

Aug 18 09:20:04 wlan-196-48.bos.redhat.com NetworkManager[1144]: <info> [1471526404.2288] device (veth74b5778): driver 'veth' does not support carrier detection.

This includes short-lived containers, with warnings like:

Aug 18 09:20:08 wlan-196-48.bos.redhat.com NetworkManager[1144]: <warn> [1471526408.9017] device (veth9c8beab): failed to find device 24 'veth9c8beab' with udev

(since the device disappears by the time NM probes it).

NM isn't the only offender here; avahi, libvirt, and udev also try to respond to veth devices:

Aug 18 09:20:04 wlan-196-48.bos.redhat.com systemd-udevd[5807]: Could not generate persistent MAC address for veth9b0eae4: No such file or directory
Aug 18 09:20:05 wlan-196-48.bos.redhat.com avahi-daemon[1048]: Registering new address record for fe80::5845:21ff:fed5:5c9b on veth9b0eae4.*.
Aug 18 09:20:04 wlan-196-48.bos.redhat.com libvirtd[1195]: ethtool ioctl error: No such device

But I feel that NM should be leading here: if, e.g., we define a mechanism to instruct host system management daemons to ignore a device, NM would be a good place to implement it first and have others follow.
(In reply to Dan Williams from comment #6)
>
> NM won't touch them,

IMO that is not true. For NM, "assuming a connection on the device" means two contradictory things: "seamless takeover" and "externally managed". Because they contradict each other, you cannot fix one without breaking the other. For the "seamless takeover", NM starts DHCP, toggles disable_ipv6, etc. It certainly touches the devices.

This bug really causes issues all over the place, and the solution is detailed above, especially in comment 3.
First part of th/assume-vs-unmanaged-bgo746440 up to "device: set user-explicit unmanaged flag based on loaded device-state" looks good to me.
> cb5c5f2 core: minor cleanups

Why do you export nm_manager_get_connection_device() here?

> 7279f00 core/trivial: give symbols in src/nm-dispatcher.h header an "NM" prefix

Well, technically you don't touch a single symbol; just types and macros. Just nitpicking though.

> 6e6f4c8 core/dispatcher: add and use nm_dispatcher_call_hostname()

Why does it have parameters you don't use anyway?

> b89327b core/dispatcher: cleanup nm_dispatcher_call_connectivity()
> 60302fc core/dispatcher: pass act-request to device dispatcher calls
> on the settings connection. That is not optimal, because whether
> a connection is assumed/external, is a property of the

What does it mean that a connection is assumed/external?

> dd24efa device: return nm_device_master_add_slave() whether a slave was added
> f316764 core/trivial: rename nm-generated-assumed flag to garbage-collect

I don't think garbage_collect is a good name here. Would you object to naming this "volatile"?

> 5e325cc device: pass the user-explict flag to nm_device_realize_start()
> @@ -2373,6 +2375,7 @@ link_type_compatible (NMDevice *self,
>   * nm_device_realize_start():
>   * @self: the #NMDevice
>   * @plink: an existing platform link or %NULL
> + * @unmanaged_user_explicit: the user-explicit unmanaged flag
>   * @out_compatible: %TRUE on return if @self is compatible with @plink
>   * @error: location to store error, or %NULL

The docstring change is not very helpful. It is still unclear what the implications of setting different values here are.

> a95aa3c device: set user-explicit unmanaged flag based on loaded device-state
> baa9421 core: add activation-type property to active-connection

Why is NMActivationType needed? It looks like just a boolean.

> + * @activation_type: the #NMActivationType.

The docstring is insufficient. Also, there is an extra blank line at the end of the file.
> 311f7f7 active-connection: use activation-type for active connection instead of assumed flag
> d9c845d core/trivial: rename activation-type related checks for device and active-connection
> -    if (nm_device_uses_assumed_connection ((NMDevice *) self))
> +    if (nm_device_has_activation_type_assume_or_external ((NMDevice *) self))

Okay, what is "external" here? This really needs documentation.

> f237f37 core: track external activations types in the active-connection

Uh, okay, this commit probably explains my concerns about what an "external" activation is and why a boolean is not sufficient to track the activation type. I'm not removing them; perhaps you could comment on your intentions in the commit messages to make reviews (and possibly going back in git history to understand the original intentions in the future) easier.

Maybe some of the concepts could use a standalone document; perhaps start a Documentation/connection-model.txt or something. Just an idea of course. Comment #0 might be a good start too.

> fef6a10 core: only assume connections that were managed in a previous run of NetworkManager
> c37228e core: upgrade EXTERNAL activation type when user saves connection
> 6dd75f6 core: once activated an assumed connection make it NM_ACTIVATION_TYPE_FULL
> fe98047 manager: merge/inline assume_connection() in recheck_assume_connection()
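One way to see why the series replaces the assumed boolean with a three-valued activation type: "assume" (seamless takeover, then full management) and "external" (observe only) need different answers, which one flag cannot give. The sketch below is a simplified stand-in, not the code from the patches; the enum constants, the `Act` struct and the `act_*` helpers are hypothetical. It does follow two points from the commit list: the old assumed check was renamed to an "assume or external" check (d9c845d), and an assumed connection becomes FULL once activated (6dd75f6).

```c
#include <stdbool.h>

/* Hypothetical stand-in for NMActivationType. */
typedef enum {
    ACTIVATION_TYPE_FULL,     /* NM activated and manages the connection */
    ACTIVATION_TYPE_ASSUME,   /* seamless takeover of a connection NM managed before */
    ACTIVATION_TYPE_EXTERNAL, /* configured outside NM: observe only */
} ActivationType;

/* Hypothetical active-connection record. */
typedef struct {
    ActivationType activation_type;
    bool activated;
} Act;

/* What the old assumed flag expressed: NM did not start this itself. */
static bool
act_is_assume_or_external (const Act *act)
{
    return act->activation_type != ACTIVATION_TYPE_FULL;
}

/* What a single boolean cannot express: NM must never touch the device. */
static bool
act_is_external (const Act *act)
{
    return act->activation_type == ACTIVATION_TYPE_EXTERNAL;
}

/* Once an assumed connection is activated, treat it as FULL. */
static void
act_set_activated (Act *act)
{
    act->activated = true;
    if (act->activation_type == ACTIVATION_TYPE_ASSUME)
        act->activation_type = ACTIVATION_TYPE_FULL;
}
```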
> core/dispatcher: pass act-request to device dispatcher calls

+ * @act_request: the #NMActRequest for the action. If %NULL, us the
+ * current request of the device.

s/us/use/

> core/dispatcher: pass act-request to device dispatcher calls

Triggers the following assertion, not sure why:

(gdb) backtrace
$6 = (NMDevice *) 0x0

> core: add activation-type property to active-connection

+ (long long unsigned) priv->version_id,

Just a matter of preference, and pre-existing, but can we use (unsigned long long) instead?

> core/trivial: rename activation-type related checks for device and active-connection

static void
parent_hwaddr_maybe_changed (NMDevice *parent,
                             GParamSpec *pspec,
                             gpointer user_data)
{
[...]
    /* Never touch assumed devices */
-    if (nm_device_uses_assumed_connection ((NMDevice *) self))
+    if (nm_device_has_activation_type_assume_or_external ((NMDevice *) self))

We fully manage assumed devices, so perhaps we should keep adjusting their MAC address when the parent's changes (as opposed to external ones, where we never touch the device)? I.e., this could be changed to nm_device_has_activation_type_external().

The rest LGTM.
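The suggestion boils down to which activation types the MAC-address sync should skip. A hypothetical sketch contrasting the reviewed check with the suggested one; the enum and both helper functions are illustrative, not NetworkManager code.

```c
#include <stdbool.h>

/* Illustrative activation types; names are assumptions, not NM's API. */
typedef enum {
    ACTIVATION_TYPE_FULL,
    ACTIVATION_TYPE_ASSUME,
    ACTIVATION_TYPE_EXTERNAL,
} ActivationType;

/* Check as reviewed: skip both assumed and external devices. */
static bool
follow_parent_hwaddr_as_reviewed (ActivationType t)
{
    return t == ACTIVATION_TYPE_FULL;
}

/* Suggested check: assumed devices are fully managed, so keep syncing
 * the MAC address with the parent; only external devices are left alone. */
static bool
follow_parent_hwaddr_suggested (ActivationType t)
{
    return t != ACTIVATION_TYPE_EXTERNAL;
}
```

The two checks disagree only for assumed devices, which is exactly the case the review questions.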
Merged:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2d1b85f8d7f1e53b581e56f0f542b63e8a80da98

Leaving the bug open, because there is still (much) more to do. This is only one more step.
Sorry I'm late to the party. In general this approach looks OK to me. The only corner case I see is one like:

1) a device is managed and has a persistent connection
2) the user makes the device unmanaged
3) the user sets the same configuration on the device as the persistent connection
4) now NM will make a new in-memory connection for this external/unmanaged device, and we'll essentially have duplicate connections.

This could be very confusing to the user.
(In reply to Dan Williams from comment #13)
> Sorry I'm late to the party. In general this approach looks OK to me. The
> only corner case I see is one like:
>
> 1) a device is managed and has a persistent connection
> 2) user makes the device unmanaged
> 3) user sets same configuration on the device as persistent connection
> 4) now NM will make a new in-memory connection for this external/unmanaged
> device, and we'll essentially have duplicate connections. This could be
> very confusing to the user.

The unmanaged-by-user flag is also persisted to the state file. If the user makes the device unmanaged and restarts NM, the device will stay unmanaged and no connection will be generated.
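The reply's argument can be sketched as a startup decision, assuming a hypothetical `PersistedDeviceState` loaded from the per-device state file. The struct, enum and `decide_at_startup()` are illustrative names, and the actual state-file format is not shown in this thread. Because the persisted user flag is checked first, restarting NM does not produce the duplicate in-memory connection from comment #13.

```c
#include <stdbool.h>

/* Hypothetical view of the per-device state file read at startup. */
typedef struct {
    bool present;        /* a state file existed for this device */
    bool unmanaged_user; /* the user explicitly set the device unmanaged */
} PersistedDeviceState;

typedef enum {
    STARTUP_STAY_UNMANAGED,      /* observe only, no generated connection */
    STARTUP_GENERATE_CONNECTION, /* externally configured: generate one */
    STARTUP_NOTHING_TO_DO,       /* device is not configured */
} StartupDecision;

static StartupDecision
decide_at_startup (const PersistedDeviceState *st, bool device_configured)
{
    /* The persisted user flag wins, so no second connection is created
     * next to the persistent one after a restart. */
    if (st->present && st->unmanaged_user)
        return STARTUP_STAY_UNMANAGED;
    /* Simplified: a real implementation would first try to match an
     * existing persistent connection before generating a new one. */
    if (device_configured)
        return STARTUP_GENERATE_CONNECTION;
    return STARTUP_NOTHING_TO_DO;
}
```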
bugzilla.gnome.org is being shut down in favor of a GitLab instance. We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time.

If you still use NetworkManager and if you still see this bug / want this feature in a recent and supported version of NetworkManager, then please feel free to report it at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/

Thank you for creating this report and we are sorry it could not be implemented (workforce and time are unfortunately limited).