Bug 746440 – improve behavior for assumed and unmanaged devices, do better at seamless take over, and don't touch devices




Summary:	improve behavior for assumed and unmanaged devices, do better at seamless tak...


Status:	RESOLVED OBSOLETE

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Thomas Haller
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:	746566
Blocks:	699843 nm-next

Reported:	2015-03-19 10:30 UTC by Thomas Haller
Modified:	2020-11-12 14:32 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Thomas Haller 2015-03-19 10:30:47 UTC

Current state:
==============

Currently a device can be unmanaged. For example, some device types are unmanaged by default, and you can configure keyfile.unmanaged-devices.

An unmanaged device still has an IPConfig D-Bus object associated with it, but it has neither a connection nor an active-connection (even if it is up).

Also, it cannot be activated -- unless it is unmanaged only due to user-unmanaged, in which case a user up-command can activate the connection.



Otherwise, when a device is externally configured, we assume a connection, either by reusing one of the persistent connections (if applicable) or by generating a temporary assumed connection.

The connection appears to be activated normally, but we kinda promise that we don't mess with it.



The promise that we don't touch devices is shallow. In particular, if the assumed connection has ipv6.method=auto, we send Router Solicitations (RS) and accept Router Advertisements. We also accept callbacks from the dhclient helper (nm-dhcp-helper) and will extend DHCP leases.


How it should be
================


"unmanaged" should become more like assumed is now. We should always generate an assumed connection for it and pretend that the device is active (if it is configured). But with the additional promise that we really, really don't mess with the device. Not at all. Ever. The user told us not to, so we don't do anything except observe.

"assumed" should be more like a seamless take-over. This is especially needed if you stop and restart NM. On stop it leaves the device up, but on restart it must seamlessly take the device over and then manage it afterwards.
In particular, on such an assumed device it must handle new DHCP leases and do SLAAC.

Currently, assuming a connection tells you that NM won't manage your device, but NM still does some careful managing.
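
To make the proposed split concrete, here is a minimal sketch of the three behaviors as a small policy table. All names (DevMode, dev_mode_may_touch and so on) are hypothetical and chosen for illustration only; they are not NetworkManager's actual types or functions.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not NetworkManager's real API. */
typedef enum {
    DEV_MODE_FULL,     /* activated by NM itself: fully managed */
    DEV_MODE_ASSUMED,  /* seamless take-over: keep existing config, then manage */
    DEV_MODE_EXTERNAL, /* configured by someone else: observe only */
} DevMode;

/* May NM change anything on the device (addresses, routes, sysctls)? */
static bool
dev_mode_may_touch (DevMode mode)
{
    return mode != DEV_MODE_EXTERNAL;
}

/* Must NM leave the existing configuration intact when it starts
 * tracking the device? */
static bool
dev_mode_preserve_on_start (DevMode mode)
{
    return mode != DEV_MODE_FULL;
}

/* Should NM handle new DHCP leases and do SLAAC on the device? */
static bool
dev_mode_runs_dhcp_and_slaac (DevMode mode)
{
    return dev_mode_may_touch (mode);
}
```

The point of the sketch is that "assumed" and "external" both preserve what is already on the device at start, but only "external" carries the promise of never touching it afterwards.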

Comment 1 Dan Williams 2015-04-17 16:44:07 UTC

Agreed with "How it should be".  For the "assumed" state, if we can't tag the interface somehow in the kernel, then I guess we could serialize some state to /run and try to match that state on startup (if it's new enough), and then if the state matches the interface is "assumed".  If the state doesn't match it's "unmanaged".  The state would include most attributes of the interface, addresses, routes, L2 properties, /proc/sys/net/conf stuff, etc.
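
A rough sketch of the matching idea above, under stated assumptions: the snapshot fields, the 300-second freshness limit, and every name here (DevSnapshot, classify_on_startup) are invented for illustration. How the snapshot would be written to or read from /run is left out, and a real snapshot would cover many more attributes (routes, L2 properties, sysctls).

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Assumption: what "new enough" means for a saved snapshot. */
#define SNAPSHOT_MAX_AGE_SEC 300.0

/* Hypothetical, heavily reduced per-device snapshot. */
typedef struct {
    char   ifname[16];
    char   hwaddr[18];      /* "aa:bb:cc:dd:ee:ff" */
    char   ip4_addr[32];    /* e.g. "192.0.2.10/24" */
    char   ip4_gateway[16];
    int    disable_ipv6;
    time_t saved_at;        /* when the snapshot was written */
} DevSnapshot;

typedef enum { VERDICT_ASSUMED, VERDICT_UNMANAGED } Verdict;

/* Compare the configuration fields only; saved_at is not configuration. */
static bool
snapshot_same_config (const DevSnapshot *a, const DevSnapshot *b)
{
    return strcmp (a->ifname, b->ifname) == 0
        && strcmp (a->hwaddr, b->hwaddr) == 0
        && strcmp (a->ip4_addr, b->ip4_addr) == 0
        && strcmp (a->ip4_gateway, b->ip4_gateway) == 0
        && a->disable_ipv6 == b->disable_ipv6;
}

/* A fresh, matching snapshot means "assumed"; anything else "unmanaged". */
static Verdict
classify_on_startup (const DevSnapshot *saved,
                     const DevSnapshot *current,
                     time_t now)
{
    if (!saved)
        return VERDICT_UNMANAGED;
    if (difftime (now, saved->saved_at) > SNAPSHOT_MAX_AGE_SEC)
        return VERDICT_UNMANAGED;
    return snapshot_same_config (saved, current) ? VERDICT_ASSUMED
                                                 : VERDICT_UNMANAGED;
}
```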

Comment 2 Thomas Haller 2015-06-02 13:43:07 UTC

This depends on bug 746566.

Comment 3 Thomas Haller 2015-11-09 16:53:03 UTC

Proposed solution should look like: https://mail.gnome.org/archives/networkmanager-list/2015-November/msg00031.html

Comment 4 Thomas Haller 2016-02-23 09:59:23 UTC

See also: bug 699843

Comment 5 Colin Walters 2016-05-11 18:07:19 UTC

Is this the bug where we figure out how to get NM to ignore Docker veth devices again?

Following: https://bugzilla.gnome.org/show_bug.cgi?id=731014

Two years later, with: NetworkManager-1.2.0-1.fc24.x86_64

I still see huge amounts of log spam from NM every time Docker creates a veth device.

Comment 6 Dan Williams 2016-08-17 18:43:45 UTC

(In reply to Colin Walters from comment #5)
> Is this the bug where we figure out how to get NM to ignore Docker veth
> devices again?
> 
> Following: https://bugzilla.gnome.org/show_bug.cgi?id=731014
> 
> Two years later, with: NetworkManager-1.2.0-1.fc24.x86_64
> 
> I still see huge amounts of log spam from NM every time Docker creates a
> veth device.

NM won't touch them, but NM will see the device and expose its configuration on D-Bus like any other device.  That's likely the logspam you're seeing.  But there should be no change to the veth configuration if it's created outside of NM.

Comment 7 Colin Walters 2016-08-19 13:38:28 UTC

Maybe there's not, but it's a lot of logspam.  Do I really need to know over and over that `veth` doesn't have carrier every time I start a Docker container?

Aug 18 09:20:04 wlan-196-48.bos.redhat.com NetworkManager[1144]: <info>  [1471526404.2288] device (veth74b5778): driver 'veth' does not support carrier detection.

Including for short-lived containers, warnings like:

Aug 18 09:20:08 wlan-196-48.bos.redhat.com NetworkManager[1144]: <warn>  [1471526408.9017] device (veth9c8beab): failed to find device 24 'veth9c8beab' with udev

(Since the device disappears by the time NM probes it)

NM isn't the only offender here - avahi, libvirt, and udev also try to respond to veth devices:

Aug 18 09:20:04 wlan-196-48.bos.redhat.com systemd-udevd[5807]: Could not generate persistent MAC address for veth9b0eae4: No such file or directory
Aug 18 09:20:05 wlan-196-48.bos.redhat.com avahi-daemon[1048]: Registering new address record for fe80::5845:21ff:fed5:5c9b on veth9b0eae4.*.
Aug 18 09:20:04 wlan-196-48.bos.redhat.com libvirtd[1195]: ethtool ioctl error: No such device


But I feel that NM should be leading here - if e.g. we define a mechanism to instruct host system management daemons to ignore a device, NM would be a good place to implement it first and have others follow.

Comment 8 Thomas Haller 2016-08-20 08:13:59 UTC

(In reply to Dan Williams from comment #6)
>
> NM won't touch them,

IMO that is not true.

To NM, "assuming a connection on the device" means two contradictory things: "seamless take-over" and "externally managed". As they contradict each other, you cannot fix one without breaking the other.

For the "seamless take-over", NM starts DHCP, toggles disable_ipv6, etc. It certainly touches the devices.

This bug really causes issues all over the place, and the solution is detailed above, especially comment 3.

Comment 9 Beniamino Galvani 2017-03-08 08:18:40 UTC

First part of th/assume-vs-unmanaged-bgo746440 up to "device: set user-explicit unmanaged flag based on loaded device-state" looks good to me.

Comment 10 Lubomir Rintel 2017-03-10 09:48:04 UTC

> cb5c5f2 core: minor cleanups

Why do you export nm_manager_get_connection_device() here?

> 7279f00 core/trivial: give symbols in src/nm-dispatcher.h header an "NM" prefix

Well, technically you don't touch a single symbol; just types and macros.
Just nitpicking though.

> 6e6f4c8 core/dispatcher: add and use nm_dispatcher_call_hostname()

Why does it have parameters you don't use anyway?

> b89327b core/dispatcher: cleanup nm_dispatcher_call_connectivity()


> 60302fc core/dispatcher: pass act-request to device dispatcher calls
>     on the settings connection. That is not optimal, because whether
>     a connection is assumed/external, is a property of the

What does it mean that a connection is assumed/external?

> dd24efa device: return nm_device_master_add_slave() whether a slave was added


> f316764 core/trivial: rename nm-generated-assumed flag to garbage-collect

I don't think garbage_collect is a good name here.
Would you object naming this "volatile"?

> 5e325cc device: pass the user-explict flag to nm_device_realize_start()

> @@ -2373,6 +2375,7 @@ link_type_compatible (NMDevice *self,
>   * nm_device_realize_start():
>   * @self: the #NMDevice
>   * @plink: an existing platform link or %NULL
> + * @unmanaged_user_explicit: the user-explicit unmanaged flag
>   * @out_compatible: %TRUE on return if @self is compatible with @plink
>   * @error: location to store error, or %NULL

The docstring change is not very helpful. It is still unclear what the
implications of setting different values here are.

> a95aa3c device: set user-explicit unmanaged flag based on loaded device-state

> baa9421 core: add activation-type property to active-connection

Why is NMActivationType needed? It looks like just a boolean.

> + * @activation_type: the #NMActivationType.

The docstring is insufficient.

Also, an extra blank line at the end of the file.

> 311f7f7 active-connection: use activation-type for active connection instead of assumed flag

> d9c845d core/trivial: rename activation-type related checks for device and active-connection

> -       if (nm_device_uses_assumed_connection ((NMDevice *) self))
> +       if (nm_device_has_activation_type_assume_or_external ((NMDevice *) self))

Okay, what is "external" here?

This really needs documentation.

> f237f37 core: track external activations types in the active-connection

Uh, okay -- this commit probably explains my concerns about what is an
"external" activation or why is boolean not sufficient to track activation type.

I'm not removing them -- perhaps you could comment on your intentions in the
commit messages to make reviews (and possibly going back in git history to understand
original intentions in future) easier.

Maybe some of the concepts could use a standalone documentation document,
perhaps start a Documentation/connection-model.txt or something. Just an idea
of course. comment #0 might be a good start too.

> fef6a10 core: only assume connections that were managed in a previous run of NetworkManager
> c37228e core: upgrade EXTERNAL activation type when user saves connection
> 6dd75f6 core: once activated an assumed connection make it NM_ACTIVATION_TYPE_FULL
> fe98047 manager: merge/inline assume_connection() in recheck_assume_connection()

Comment 11 Beniamino Galvani 2017-03-13 19:59:52 UTC

> core/dispatcher: pass act-request to device dispatcher calls

+ * @act_request: the #NMActRequest for the action. If %NULL, us the
+ *   current request of the device.

s/us/use/


> core/dispatcher: pass act-request to device dispatcher calls

Triggers the following assertion, not sure why:

(gdb) backtrace

#0  __GI_raise at ../sysdeps/unix/sysv/linux/raise.c line 58
#1  __GI_abort at abort.c line 89
#2  g_assertion_message
#3  g_assertion_message_expr
#4  nm_dispatcher_call_device at src/nm-dispatcher.c line 751
#5  _set_state_full at src/devices/nm-device.c line 12298
#6  nm_device_state_changed at src/devices/nm-device.c line 12326
#7  queued_state_set at src/devices/nm-device.c line 12352
#8  g_idle_dispatch at gmain.c line 5545
#9  g_main_dispatch at gmain.c line 3203
#10 g_main_context_dispatch at gmain.c line 3856
#11 g_main_context_iterate at gmain.c line 3929
#12 g_main_loop_run at gmain.c line 4125
#13 main at src/main.c line 423

$6 = (NMDevice *) 0x0


> core: add activation-type property to active-connection

+              (long long unsigned) priv->version_id,

Just a matter of preference, and pre-existing, but can we use
(unsigned long long) instead?


> core/trivial: rename activation-type related checks for device and active-connection

 static void
 parent_hwaddr_maybe_changed (NMDevice *parent,
                              GParamSpec *pspec,
                              gpointer user_data)
 {
[...]
        /* Never touch assumed devices */
-       if (nm_device_uses_assumed_connection ((NMDevice *) self))
+       if (nm_device_has_activation_type_assume_or_external ((NMDevice *) self))

We fully manage assumed devices, and thus perhaps we should keep
adjusting the MAC address if the parent changes for them (as opposed
to external ones, where we never touch the device)? i.e. this could be
changed to nm_device_has_activation_type_external()

The rest LGTM.
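
The suggestion about the parent MAC address can be sketched as follows. This is only an illustration with hypothetical names (ActType, ChildDev, child_follow_parent_hwaddr), not the code under review: the check skips external devices only, so an assumed device still follows its parent.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real NM types. */
typedef enum { ACT_FULL, ACT_ASSUME, ACT_EXTERNAL } ActType;

typedef struct {
    ActType act_type;
    char    hwaddr[18]; /* "aa:bb:cc:dd:ee:ff" */
} ChildDev;

/* Returns true if the child's MAC address was updated to the parent's. */
static bool
child_follow_parent_hwaddr (ChildDev *child, const char *parent_hwaddr)
{
    /* Never touch external devices; assumed ones are fully managed. */
    if (child->act_type == ACT_EXTERNAL)
        return false;
    if (strcmp (child->hwaddr, parent_hwaddr) == 0)
        return false;
    snprintf (child->hwaddr, sizeof (child->hwaddr), "%s", parent_hwaddr);
    return true;
}
```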

Comment 12 Thomas Haller 2017-03-16 17:37:29 UTC

Merged: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2d1b85f8d7f1e53b581e56f0f542b63e8a80da98


Leaving the bug open, because there is still (much) more to do. This is only another step.

Comment 13 Dan Williams 2017-03-18 04:17:43 UTC

Sorry I'm late to the party.  In general this approach looks OK to me.  The only corner case I see is one like:

1) a device is managed and has a persistent connection
2) user makes the device unmanaged
3) user sets same configuration on the device as persistent connection
4) now NM will make a new in-memory connection for this external/unmanaged device, and we'll essentially have duplicate connections.  This could be very confusing to the user.

Comment 14 Thomas Haller 2017-03-21 14:03:31 UTC

(In reply to Dan Williams from comment #13)
> Sorry I'm late to the party.  In general this approach looks OK to me.  The
> only corner case I see is one like:
> 
> 1) a device is managed and has a persistent connection
> 2) user makes the device unmanaged
> 3) user sets same configuration on the device as persistent connection
> 4) now NM will make a new in-memory connection for this external/unmanaged
> device, and we'll essentially have duplicate connections.  This could be
> very confusing to the user.

The unmanaged-by-user flag is also persisted to the state-file.
If the user makes the device unmanaged and restarts NM, the device will stay unmanaged and no connection will be generated.
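
As a sketch of that startup decision, assuming a simplified model: the enum values and decide_start_action() are hypothetical names, the actual state-file format is not shown, and the "assume only if previously managed" branch is inferred from the commit titles in comment 10 rather than from the code itself.

```c
#include <stdbool.h>

/* What the (hypothetical) loaded device-state says about the last run. */
typedef enum {
    SAVED_NONE,              /* no state saved for this device */
    SAVED_MANAGED,           /* NM managed the device in the previous run */
    SAVED_UNMANAGED_BY_USER, /* user explicitly made the device unmanaged */
} SavedState;

typedef enum {
    START_LEAVE_UNMANAGED,     /* generate no connection at all */
    START_ASSUME,              /* seamless take-over, then manage */
    START_TRACK_EXTERNAL,      /* in-memory connection, observe only */
    START_WAIT_FOR_ACTIVATION, /* nothing configured on the device yet */
} StartAction;

static StartAction
decide_start_action (SavedState saved, bool has_external_config)
{
    /* The persisted user flag wins: the device stays unmanaged. */
    if (saved == SAVED_UNMANAGED_BY_USER)
        return START_LEAVE_UNMANAGED;
    if (!has_external_config)
        return START_WAIT_FOR_ACTIVATION;
    /* Only assume what a previous run of NM had managed. */
    return saved == SAVED_MANAGED ? START_ASSUME : START_TRACK_EXTERNAL;
}
```

In this model the corner case from comment 13 ends in START_LEAVE_UNMANAGED after a restart, so no duplicate connection appears.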

Comment 15 André Klapper 2020-11-12 14:32:01 UTC

bugzilla.gnome.org is being shut down in favor of a GitLab instance. 
We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time.

If you still use NetworkManager and if you still see this bug / want this feature in a recent and supported version of NetworkManager, then please feel free to report it at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/

Thank you for creating this report and we are sorry it could not be implemented (workforce and time is unfortunately limited).