Bug 737458 – [review] dcbw/devices-for-all: create unrealized devices for all connections

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 737458 - [review] dcbw/devices-for-all: create unrealized devices for all connections


Summary:	[review] dcbw/devices-for-all: create unrealized devices for all connections


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:	nm-1-2 752234

Reported:	2014-09-26 18:56 UTC by Dan Williams
Modified:	2015-12-04 11:29 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Dan Williams 2014-09-26 18:56:07 UTC

The goal of this branch is to create devices for all connections known to NetworkManager. These devices are first created as "unrealized", eg they have no resources backing them. When resources become available the device is realized.

Unrealized devices are always in the UNMANAGED state and when realized, potentially become managed based on the same criteria as before.

For example:

wlan0 wifi connected my home wifi
enp0s25 ethernet disconnected --
virbr0 bridge unavailable --
bond0 bond unmanaged --
br33 bridge unmanaged --
lo loopback unmanaged --
virbr0-nic tap unmanaged --
vlan0.3 vlan unmanaged --

The devices bond0, br33, and vlan0.3 have no kernel resources backing them and do not actually exist on the system. However, they have available connections and are set "realized=FALSE" in the D-Bus interface.

This behavior has a couple benefits:

1) it represents all things that *could* be devices as NMDevice objects, simplifying clients. Instead of a client having to build a list of "things I can activate" from both existing devices *and* connections, the client can just use the device list

2) it will make the "delete on deactivate" logic simpler because the device will exist over the netdev delete operation

------

The one outstanding issue is that this is all based on interface name, because interface name must be unique when realized.

But if we have a bond and a bridge connection with the interface name "external", the first connection wins and we'll end up with an NMDeviceBond named 'external'. Then if you try to "nmcli con up <bridge external>" activation will fail, because the Bridge connection isn't compatible with the NMDeviceBond.

I guess this somewhat complicates #1 above though, since a client depending on the NMDevice list wouldn't show the Bridge connection at all. Thoughts?

------

A further branch will create an NMDeviceVpn for each VPN connection.

Comment 1 Dan Winship 2014-09-29 18:14:28 UTC

(In reply to comment #0)
> But if we have a bond and a bridge connection with the interface name
> "external", the first connection wins and we'll end up with an NMDeviceBond
> named 'external'.  Then if you try to "nmcli con up <bridge external>"
> activation will fail, because the Bridge connection isn't compatible with the
> NMDeviceBond.
> 
> I guess this somewhat complicates #1 above though, since a client depending on
> the NMDevice list wouldn't show the Bridge connection at all.  Thoughts?

We should probably just forbid this case; NM can accept the definition of whichever connection it sees first, and reject the second one with a warning message.

> core: let device plugins advertise supported link and setting types

>+DEFINE_DEVICE_FACTORY_LINK_TYPES    (NM_LINK_TYPE_UNKNOWN)

NM_LINK_TYPE_NONE is the "error"/"unset" value for NMLinkType, and would be clearer just in terms of English too, I think. You could also reorganize the macros so that it worked like:

  DEFINE_DEVICE_FACTORY_SUPPORTED_TYPES (
      DEVICE_FACTORY_LINK_TYPES (NM_LINK_TYPE_WIMAX)
      DEVICE_FACTORY_SETTING_TYPES (NM_SETTING_WIMAX_SETTING_NAME)
  )

  DEFINE_DEVICE_FACTORY_SUPPORTED_TYPES (
      DEVICE_FACTORY_SETTING_TYPES (NM_SETTING_ADSL_SETTING_NAME)
  )

(the get_supported_types() code would fill in default return values before expanding the other macros, meaning you could just leave LINK_TYPES out if none were supported.)

>+	/* Ignore Bluetooth PAN interfaces; they are handled by their NMDeviceBt
>+	 * parent and don't get a separate interface.
>+	 */
>+	if (!strncmp (plink->name, "bnep", STRLEN ("bnep")))

need to get that out of nm-manager.c at some point...



> core: split device creation and device setup

>+	 * realize_existing():
>+...
>+	 * Returns: %TRUE on success or failure to find backing resources, %FALSE
>+	 * if some error ocurred when realizing the device

The "or failure to find backing resources" part is pretty weird.

It's also weird that NMDeviceVlan is the only class that implements this method. (It looks like other methods end up logging an error and then continuing if they fail in setup(), rather than bailing out beforehand like NMDeviceVlan.)

>+ * nm_device_realize_existing():
>+...
>+ * If backing resources (kernel devices, etc) exist, this function sets the
>+ * device up for use.

And what if they don't exist? But then, these docs also sound wrong; It looks to me like nm_device_realize_existing() is only called when we know that the device already exists.

Actually... it's not clear whether "realized" is supposed to mean "the device has backing resources", or "the device has backing resources *and* the NMDevice has been initialized from those resources". In particular, "nm_device_realize_existing()" only makes sense as a name if you assume the second definition (such that an NMDevice can have backing resources, but not be considered "realized" yet), but nm_device_unrealize() seems to assumes the first definition since it actually *removes* the backing resources in the software case.

Maybe nm_device_realize_existing() should just be nm_device_setup(), and nm_device_realize_new() should be nm_device_realize()?

>+device_get_driver_info (NMDevice *self,

It seems like NMDevice ought to be getting this info from an NMPlatform method. Actually, NMPlatformLink already has the driver, and if the ethtool ioctl would succeed, then it seems like we must also be guaranteed to have an NMPlatformLink.

>+nm_device_factory_create_device (NMDeviceFactory *factory,
>+                                 const char *iface,
>+                                 NMPlatformLink *plink,
>+                                 NMConnection *connection)

plink and connection seem to be unused. (Well, nm-wifi-factory and nm-wimax-factory use plink->name, but they should just use iface instead.)

(OK, they get used later...)

>+	if (   !nm_platform_bond_add (iface, out_plink)
>+		&& nm_platform_get_error () != NM_PLATFORM_ERROR_EXISTS) {

indentation

>+		/* If udev signaled the VLAN interface before it signaled
>+		 * the VLAN's parent at startup we may not know about the
>+		 * parent device yet.  But we'll find it on the second pass
>+		 * from nm_manager_start().
>+		 */

This is old code that's just being moved around within nm-device-vlan.c, but it's wrong/unnecessary; there is no second pass any more, since we guarantee that the first pass happens in a usable order.



> core: add "realized" NMDevice property

The nm-settings.c parts of "core: add "realized" NMDevice property" should probably be part of this commit.


> core: add class function for device unrealization

>+	g_clear_pointer (&priv->initial_hw_addr, g_free);

NMDeviceVlan initializes this field in update_initial_hw_address(), but clears it in unrealize(). Which is unbalanced...

Probably we should just drop the update_permanent_hw_address() and update_initial_hw_address() methods and have subclasses just do that as part of their setup(). (Though then we still have unrealize() being the opposite of both realize() and setup().)

>+		if (nm_device_is_software (device)) {
>+			if (!nm_device_unrealize (device, TRUE, &error)) {

Both nm-manager and nm-device are checking nm_device_is_software()...

Also, NMManager isn't calling remove_device() if it's a software device. Oh, which is eventually correct I guess, but I think it's not correct at this point in the branch.



> core: earlier software capability detection

> We know whether we can create interfaces of any given NMDevice
> subclass or not.

I think that's supposed to say "We need to know ..." ?

> and unfortunately again, libnl3 doesn't yet support InfiniBand
> attributes so we can't use netlink either.

We can. nm_rtnl_link_parse_info_data() exists specifically to get netlink attributes that libnl3 doesn't yet support.

>+#define NM_DEVICE_INFINIBAND_IS_VLAN "is-vlan"

I've tried to avoid calling them VLANs, since everywhere else in NM (and in 99% of the kernel), "VLAN" means specifically 802.1q. The specs refer to them as "partitions" (which is what the "P" in "P_Key" stands for).



> core: create devices first and realize them later

> 3) they must be looser in checking compatible connections until
> they are realized

maybe nm_device_realize_new() should finish up with:

  g_warn_if_fail (nm_device_check_connection_compatible (device, connection));

(that would be part of the original "split device creation and device setup" patch, not here)

>+device_link_master_changed (NMDevice *self, NMPlatformLink *plink)

That name suggests that it's called specifically when the device's master changes, which it's not. Maybe "device_recheck_slave_status()".

>@@ -1097,6 +1126,12 @@ device_link_changed (NMDevice *self, NMPlatformLink *info)
>... 
>+	if (info->driver && g_strcmp0 (priv->driver, info->driver) != 0) {

under what circumstances would that change?

>-	NM_DEVICE_GET_CLASS (self)->setup (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
>+	NM_DEVICE_GET_CLASS (self)->setup_start (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
>+	NM_DEVICE_GET_CLASS (self)->setup_finish (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);

Why is this split needed? I see that NMManager calls add_device() in between nm_device_realize_existing() and nm_device_setup_finish() in two places, but it doesn't seem like that ordering actually matters...

>+	/* Otherwise create backing resources if the device has any autoconnect connections */

"Otherwise" doesn't seem to refer to anything.

Can the code here use nm_device_get_best_auto_connection()?

> connection_changed (NMSettings *settings,
>-                    NMSettingsConnection *connection,
>+                    NMConnection *connection,
>                     NMManager *manager)
> {
>-	/* FIXME: Some virtual devices may need to be updated in the future. */

the FIXME still seems relevant: if I change a connection from bond0 to bond1, and there are no other bond0 connections, then we should destroy the bond0 NMDevice.

Likewise in connection_removed()... I'm not sure what the existing comment there is supposed to mean. Unless we emit "removed" for all connections on shutdown, which would be silly...


Do we need to worry about exporting unrealized NMDevices as effectively being a D-Bus ABI break? (If so we could have a separate all-devices property.)

Comment 2 Dan Williams 2014-10-06 17:19:10 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > But if we have a bond and a bridge connection with the interface name
> > "external", the first connection wins and we'll end up with an NMDeviceBond
> > named 'external'.  Then if you try to "nmcli con up <bridge external>"
> > activation will fail, because the Bridge connection isn't compatible with the
> > NMDeviceBond.
> > 
> > I guess this somewhat complicates #1 above though, since a client depending on
> > the NMDevice list wouldn't show the Bridge connection at all.  Thoughts?
> 
> We should probably just forbid this case; NM can accept the definition of
> whichever connection it sees first, and reject the second one with a warning
> message.

Ok, that could be done.  It's a bit unfortunate that we virtual interface names rely on the parent interface in both the VLAN and Infiniband cases though, which makes it pretty hard to reject connections early.  But at least in most cases we can do this.

(I regret allowing UUID to be used for 'parent' and 'master'...)

> > core: let device plugins advertise supported link and setting types
> 
> >+DEFINE_DEVICE_FACTORY_LINK_TYPES    (NM_LINK_TYPE_UNKNOWN)
> 
> NM_LINK_TYPE_NONE is the "error"/"unset" value for NMLinkType, and would be
> clearer just in terms of English too, I think. You could also reorganize the
> macros so that it worked like:
> 
>   DEFINE_DEVICE_FACTORY_SUPPORTED_TYPES (
>       DEVICE_FACTORY_LINK_TYPES (NM_LINK_TYPE_WIMAX)
>       DEVICE_FACTORY_SETTING_TYPES (NM_SETTING_WIMAX_SETTING_NAME)
>   )
> 
>   DEFINE_DEVICE_FACTORY_SUPPORTED_TYPES (
>       DEVICE_FACTORY_SETTING_TYPES (NM_SETTING_ADSL_SETTING_NAME)
>   )
> 
> (the get_supported_types() code would fill in default return values before
> expanding the other macros, meaning you could just leave LINK_TYPES out if none
> were supported.)

Done.

> >+	/* Ignore Bluetooth PAN interfaces; they are handled by their NMDeviceBt
> >+	 * parent and don't get a separate interface.
> >+	 */
> >+	if (!strncmp (plink->name, "bnep", STRLEN ("bnep")))
> 
> need to get that out of nm-manager.c at some point...

Yeah, I'm kinda waiting until we get all the parent/child relationship stuff sorted out eventually.  At that point, we'd actually create an NMDevice for the bnep interface and then just hand it over to the parent NMDeviceBt which would handle whatever needed to be done.

> > core: split device creation and device setup
> 
> >+	 * realize_existing():
> >+...
> >+	 * Returns: %TRUE on success or failure to find backing resources, %FALSE
> >+	 * if some error ocurred when realizing the device
> 
> The "or failure to find backing resources" part is pretty weird.

I think that's a left-over.

> It's also weird that NMDeviceVlan is the only class that implements this
> method. (It looks like other methods end up logging an error and then
> continuing if they fail in setup(), rather than bailing out beforehand like
> NMDeviceVlan.)

realize_existing() should ensure the backing resources are compatible and bail if they are not compatible or cannot be read.  Which is the reason VLAN does the checking in realize_existing(); the VLAN ID is critical to the VLAN device and if we cannot read the VLAN ID we cannot use the device.

The reason this isn't done in NMDeviceInfiniband is that we apparently don't care about the P_Key in check_connection_compatible() or anywhere else except when creating the device.  I suppose that should get fixed.

setup() is more like GObject::constructed() in that it just sets stuff up but shouldn't fail, because we've already verified that the backing resources are compatible and usable.  If there are properties here that are critical to device operation then yeah, they should be read in realize_existing() which should bail if reading them fails.

I've updated the code docs attempting to make this clearer.  How did I do?

> >+ * nm_device_realize_existing():
> >+...
> >+ * If backing resources (kernel devices, etc) exist, this function sets the
> >+ * device up for use.
> 
> And what if they don't exist? But then, these docs also sound wrong; It looks
> to me like nm_device_realize_existing() is only called when we know that the
> device already exists.

Code docs updated.  At this point we know some exist.

> Actually... it's not clear whether "realized" is supposed to mean "the device
> has backing resources", or "the device has backing resources *and* the NMDevice
> has been initialized from those resources". In particular,

The second, "has backing resources and has been initialized from them".

> "nm_device_realize_existing()" only makes sense as a name if you assume the
> second definition (such that an NMDevice can have backing resources, but not be
> considered "realized" yet), but nm_device_unrealize() seems to assumes the
> first definition since it actually *removes* the backing resources in the
> software case.

Right, and when the resources are removed, the device becomes unrealized, and has no backing resources anymore, and should be uninitialized.  I'm probably too familiar with it, but I don't see where unrealize() assumes the first?

A possible clearer name would be realize_from_existing()?

> Maybe nm_device_realize_existing() should just be nm_device_setup(), and
> nm_device_realize_new() should be nm_device_realize()?

I was trying to keep 'realize' in the name for both, since they both do 'realize' the device at the end.  A newly-created NMDevice is unrealized even if there are backing resources, because it hasn't been realized using them yet.

> >+device_get_driver_info (NMDevice *self,
> 
> It seems like NMDevice ought to be getting this info from an NMPlatform method.
> Actually, NMPlatformLink already has the driver, and if the ethtool ioctl would
> succeed, then it seems like we must also be guaranteed to have an
> NMPlatformLink.

Added a new commit before the "split device creation" commit that moves this code into the platform.

> >+nm_device_factory_create_device (NMDeviceFactory *factory,
> >+                                 const char *iface,
> >+                                 NMPlatformLink *plink,
> >+                                 NMConnection *connection)
> 
> plink and connection seem to be unused. (Well, nm-wifi-factory and
> nm-wimax-factory use plink->name, but they should just use iface instead.)
> 
> (OK, they get used later...)

Yeah :)  I started with just 'iface' too and had a later commit that added plink, but figured that churn wasn't worth it so I merged that later commit back to this one.

> >+	if (   !nm_platform_bond_add (iface, out_plink)
> >+		&& nm_platform_get_error () != NM_PLATFORM_ERROR_EXISTS) {
> 
> indentation

Fixed.

> >+		/* If udev signaled the VLAN interface before it signaled
> >+		 * the VLAN's parent at startup we may not know about the
> >+		 * parent device yet.  But we'll find it on the second pass
> >+		 * from nm_manager_start().
> >+		 */
> 
> This is old code that's just being moved around within nm-device-vlan.c, but
> it's wrong/unnecessary; there is no second pass any more, since we guarantee
> that the first pass happens in a usable order.

I think the the comment is wrong, but I think the code still applies...

If the VLAN is initialized by the platform before the parent, then the platform will announce the VLAN (or NMManager will read it via nm_platform_query_devices() -> platform_link_cb()).  The manager will attempt to realize the VLAN because we have a kernel link, but that will fail because the parent isn't announceable yet.  But then when the parent does get announced

> > core: add "realized" NMDevice property
> 
> The nm-settings.c parts of "core: add "realized" NMDevice property" should
> probably be part of this commit.

I don't follow you here...  They are already part of this commit?

> > core: add class function for device unrealization
> 
> >+	g_clear_pointer (&priv->initial_hw_addr, g_free);
> 
> NMDeviceVlan initializes this field in update_initial_hw_address(), but clears
> it in unrealize(). Which is unbalanced...
> 
> Probably we should just drop the update_permanent_hw_address() and
> update_initial_hw_address() methods and have subclasses just do that as part of

Moved reading the perm/initial addresses to NMDevice and NMPlatform (yay, kill another use of ethtool in generic code) and simplified this somewhat.

> their setup(). (Though then we still have unrealize() being the opposite of
> both realize() and setup().)

Yeah, unrealize() is a composite of those, because there's no reason to split it up that I can think of.  If your device is being unrealized because it was deleted externally, then you have to clean this stuff up.  If it's being unrealized but is a software device, then you have to clean this stuff up too...

> >+		if (nm_device_is_software (device)) {
> >+			if (!nm_device_unrealize (device, TRUE, &error)) {
> 
> Both nm-manager and nm-device are checking nm_device_is_software()...

Right, the manager never wants to unrealize a non-software device (because that would return/print an error), but nm_device_unrealize() wants to ensure it is never called for a non-software device unintentionally.  I think the current double check is valid...

> Also, NMManager isn't calling remove_device() if it's a software device. Oh,
> which is eventually correct I guess, but I think it's not correct at this point
> in the branch.

Fixed.

> > core: earlier software capability detection
> 
> > We know whether we can create interfaces of any given NMDevice
> > subclass or not.
> 
> I think that's supposed to say "We need to know ..." ?

Fixed.

> > and unfortunately again, libnl3 doesn't yet support InfiniBand
> > attributes so we can't use netlink either.
> 
> We can. nm_rtnl_link_parse_info_data() exists specifically to get netlink
> attributes that libnl3 doesn't yet support.

Turns out we only need to know if the device has a parent to know if it's software or not, since (as we discussed on IRC) the parent may have a non-zero PKey too.

But I'd already done this work so that's now in "platform: move InfiniBand property reading into the platform".

> >+#define NM_DEVICE_INFINIBAND_IS_VLAN "is-vlan"
> 
> I've tried to avoid calling them VLANs, since everywhere else in NM (and in 99%
> of the kernel), "VLAN" means specifically 802.1q. The specs refer to them as
> "partitions" (which is what the "P" in "P_Key" stands for).

Changed to "is_partition".

> > core: create devices first and realize them later
> 
> > 3) they must be looser in checking compatible connections until
> > they are realized
> 
> maybe nm_device_realize_new() should finish up with:
> 
>   g_warn_if_fail (nm_device_check_connection_compatible (device, connection));
> 
> (that would be part of the original "split device creation and device setup"
> patch, not here)
> 
> >+device_link_master_changed (NMDevice *self, NMPlatformLink *plink)
> 
> That name suggests that it's called specifically when the device's master
> changes, which it's not. Maybe "device_recheck_slave_status()".

Sounds fine to me, done.

> >@@ -1097,6 +1126,12 @@ device_link_changed (NMDevice *self, NMPlatformLink *info)
> >... 
> >+	if (info->driver && g_strcmp0 (priv->driver, info->driver) != 0) {
> 
> under what circumstances would that change?

When the device is first created we don't have any details about it, we only have the interface name and type.  So only when the device is realized do we get the plink or udev info that tells us what the driver is.

> >-	NM_DEVICE_GET_CLASS (self)->setup (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
> >+	NM_DEVICE_GET_CLASS (self)->setup_start (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
> >+	NM_DEVICE_GET_CLASS (self)->setup_finish (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
> 
> Why is this split needed? I see that NMManager calls add_device() in between
> nm_device_realize_existing() and nm_device_setup_finish() in two places, but it
> doesn't seem like that ordering actually matters...

Turns out it does matter for master/slaves, hence the split and why add_device() must be called in between to first export the slave to D-Bus.

If they weren't split, the slave would be added to the master before it got exported, which makes the master's 'slaves' property somewhat complicated, because now it contains devices that have no dbus path.  Which means we'd have to notify PROP_SLAVES when any of the slaves became realized, which meant watching slave device's 'realized' property, plus a new signal in NMDevice (which manages the slave list) solely to make subclasses emit a notify.  Seemed like more code than it was worth versus just splitting setup() and calling add_device() in between.  Does that tradeoff make sense?  I can do the signal/notify approach too...

> >+	/* Otherwise create backing resources if the device has any autoconnect connections */
> 
> "Otherwise" doesn't seem to refer to anything.

Fixed.

> Can the code here use nm_device_get_best_auto_connection()?

I don't think so, because get_best_auto() means "what is the best I can connect to right *now*" which takes stuff like AP list and connection availability into account, while here we just want to know if it would ever be possible to use this connection.

Seems like a 'compatible-connections' property would be useful at some point, both internally and for clients though.

> > connection_changed (NMSettings *settings,
> >-                    NMSettingsConnection *connection,
> >+                    NMConnection *connection,
> >                     NMManager *manager)
> > {
> >-	/* FIXME: Some virtual devices may need to be updated in the future. */
> 
> the FIXME still seems relevant: if I change a connection from bond0 to bond1,
> and there are no other bond0 connections, then we should destroy the bond0
> NMDevice.
> 
> Likewise in connection_removed()... I'm not sure what the existing comment
> there is supposed to mean. Unless we emit "removed" for all connections on
> shutdown, which would be silly...

Ok, I'll see what I can do here in another commit.  So still outstanding.

> 
> Do we need to worry about exporting unrealized NMDevices as effectively being a
> D-Bus ABI break? (If so we could have a separate all-devices property.)

Yeah, probably best.  Done in follow-up commits:

api/manager: add GetAllDevices() method and AllDevices property
libnm-glib/libnm: add support for Realized device property
api/core: add GetAllDevices() method and AllDevices property

Comment 3 Thomas Haller 2014-10-07 11:47:34 UTC

(In reply to comment #0)
> The one outstanding issue is that this is all based on interface name, because
> interface name must be unique when realized.

(Without looking at the code).

When having such an unrealized device and a kernel device of another type appears, NM must be prepared to throw away the unrealized device and create the properly typed one.

The same could happen, if the user tries to activate a bridge-typed connection on an unrealized bond-type device. Throw away the instance and create a new bridge-typed one.

Only downside of that, is that the bridge-type connection for a unrealized bond-device looks not activiable in the UI.



Why does the ifname of these devices be unique anyway? What's the problem with creating both an unrealized bond and bridge device (of same name)? Then you wouldn't have to throw away anything.


Also note that it is possible to rename interfaces in the kernel. If you rename a device to an already existing (unrealized) device, you have to merge the two. And if there are still connections for the old name, you have to create a new unrealized device.

Comment 4 Dan Winship 2014-10-07 19:16:22 UTC

(In reply to comment #2)
> > It's also weird that NMDeviceVlan is the only class that implements this
> > method. (It looks like other methods end up logging an error and then
> > continuing if they fail in setup(), rather than bailing out beforehand like
> > NMDeviceVlan.)
> 
> realize_existing() should ensure the backing resources are compatible and bail
> if they are not compatible or cannot be read.  Which is the reason VLAN does
> the checking in realize_existing(); the VLAN ID is critical to the VLAN device
> and if we cannot read the VLAN ID we cannot use the device.
> 
> The reason this isn't done in NMDeviceInfiniband is that we apparently don't
> care about the P_Key in check_connection_compatible()

So realize_existing() only fails if it can't read properties whose values are used by check_connection_compatible()?

Eg, if we have an NMDeviceVxlan, and nm_platform_vxlan_get_info() fails, should we still be claiming to have that device, even though all of its vxlan-specific properties will be wrong?

> I'm probably
> too familiar with it, but I don't see where unrealize() assumes the first?

Yes, I seem to have confused myself before.

> A possible clearer name would be realize_from_existing()?

Yes. Or just nm_device_realize(), and make nm_device_realize_new() be nm_device_create_and_realize().

> > plink and connection seem to be unused. (Well, nm-wifi-factory and
> > nm-wimax-factory use plink->name, but they should just use iface instead.)
> > 
> > (OK, they get used later...)
> 
> Yeah :)  I started with just 'iface' too and had a later commit that added
> plink, but figured that churn wasn't worth it so I merged that later commit
> back to this one.

It's a little messy that 'iface' is redundant, and as mentioned above, that it's not even used consistently.

> > This is old code that's just being moved around within nm-device-vlan.c, but
> > it's wrong/unnecessary; there is no second pass any more, since we guarantee
> > that the first pass happens in a usable order.
> 
> I think the the comment is wrong, but I think the code still applies...

Ah, yes, we do need to return failure rather than dereferencing a NULL parent...

> If the VLAN is initialized by the platform before the parent, then the platform
> will announce the VLAN (or NMManager will read it via
> nm_platform_query_devices() -> platform_link_cb()).

nm_platform_query_devices() always sorts the devices in such a way that that doesn't happen (thanks to the sorting code at the end of nm_platform_link_get_all()), and thus the "second pass" that the comment talks about no longer happens.

After startup it could still happen I guess, although it's a pretty unlikely race condition (that a hardware device would be added and then something creates a vlan on it before udev can process it). But we don't handle that case anyway (either in master or in dcbw/devices-for-all); the VLAN device only gets announced once, and if its parent hasn't already been announced at that point, then the attempt to create/realize it will fail and we'll end up creating it as an NMDeviceGeneric.

(This is related to a discussion a long time ago about NMDeviceOlpcMesh:companion, NMDeviceVeth:peer, etc. See also danw/wip/devicemanager.)

> The manager will attempt
> to realize the VLAN because we have a kernel link, but that will fail because
> the parent isn't announceable yet.  But then when the parent does get announced

comment ends mid-sentence...

> > > core: add "realized" NMDevice property
> > 
> > The nm-settings.c parts of "core: add "realized" NMDevice property" should
> > probably be part of this commit.
> 
> I don't follow you here...  They are already part of this commit?

Haha. Cut and paste fail. I meant the nm-settings.c parts of "core: create devices first and realize them later".

> But I'd already done this work so that's now in "platform: move InfiniBand
> property reading into the platform".

Mumble mumble people compiling against old kernel headers?

> > >-	NM_DEVICE_GET_CLASS (self)->setup (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
> > >+	NM_DEVICE_GET_CLASS (self)->setup_start (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
> > >+	NM_DEVICE_GET_CLASS (self)->setup_finish (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
...
> Does that tradeoff make sense?

Yes, sounds like it would be a pain otherwise.

> Ok, I'll see what I can do here in another commit.  So still outstanding.

OK

> api/manager: add GetAllDevices() method and AllDevices property

>+          Realized devices are those which have backing resources (eg from the
>+          kernel or a management daemon like ModemManager, teamd, etc).

Hm... but there's already a word that means that: "device". It's fine to have unrealized *NM*Devices, but it's weird to try to make the English/UNIX word "device" mean "either a device or not actually a device"...

How about making the D-Bus/libnm property be called "real" rather than "realized", and then talk about "real devices" and "device placeholders"? (Or make the property be "placeholder"... not sure which sense makes more sense.)

(And would it make more sense to have an UnrealizedDevices/DevicePlaceholders property rather than AllDevices?)

>+          Unrealized devices are software devices which
>+          do not yet have backing resources, but for which backing resources
>+          can be created if the device is activated. 

"Unrealized devices / Device placeholders refer to software devices which do not currently exist, but which NetworkManager would automatically create if one of their AvailableConnections was activated."

(And then all the same issues with the library level naming/docs.)

Comment 5 Thomas Haller 2014-10-08 14:35:19 UTC

> core: let device plugins advertise supported link and setting types

Could we name DEFINE_DEVICE_FACTORY_INTERNAL_WITH_DEVTYPE, DEFINE_FACTORY_LINK_TYPES et al. to something NM_DEVICE_FACTORY_*? They are declared in nm-device-factory.h after all.




also, nm-manager.c is getting more logic that I feel belongs to nm-device-factory.c. I noticed that before, but now it's getting more.
Could we move all functions that touch priv->factories to nm-device-factory.h.

For now, I would just add a
 static GSList *factories;
to nm-device-factory.h. No need for a NMDeviceFactoryManager singleton at the moment.






Just a thought:

find_device_factory_for_types() returns the first factory that satisfies ANY of the provided needle arguments. I think it should match *all*. There are only 3 callers of the function. 2 only pass one argument. Then there is the duplicate-plugin-check. I think a plugin is only a duplicate if it provides all the same types.

But anyway, since there is no stable ABI for the device plugins and no plugins currently define overlapping types, no need to change that now. If we ever need it, we can change it and all plugins must be recompiled anyway.


>> platform: add nm_platform_link_get_by_address()

Just wondering why: the new function passes the binary address as byte-array -- contrary to that we now use strings mostly. Also, it does not use nm_utils_hwaddr_matches().
Ah... or did we agree, that platform API is the border where we treat hwaddr as binary vs. string?

Also, the new function is never used (AFAIS)...


>> platform: move driver & firmware version reading into the platform

I think fake-platform should set the out-argument to NULL (if pointers are provided).


>> core: move permanent and initial MAC address reading to NMDevice and NMPlatform

The final block in check_connection_compatible() (/* Check for MAC address blacklist */) should be enclosed by a "if (perm_hw_addr) { }"

Maybe it also needs a:

   if (!perm_hw_addr && mac)
       return FALSE;



>> core: split device creation and device setup

-    if (plink)
-         g_return_val_if_fail (strcmp (iface, plink->name) == 0, NULL);
+    g_return_val_if_fail (!plink || strcmp (iface, plink->name) == 0, NULL);



system_create_virtual_device():

+    device = nm_device_factory_create_device (factory, iface, NULL, connection);

if we get a NULL device, maybe log a warning?


Also platform_link_added() should do something like:

          device = nm_device_factory_create_device (factory, plink->name, plink, NULL);
          if (device) {
               if (!nm_device_realize_existing (device, plink, &error)) {
                    nm_log_warn (LOGD_HW, "%s: factory failed to create device: (%d) %s",
                                 plink->udi,
                                 error ? error->code : -1,
                                 error ? error->message : "(unknown)");
                    g_clear_error (&error);
                    return;
               }
-         }
+         } else {
+              nm_log_warn...
+              return;
+         }

otherwise, you end up in the switch() below.



>> core: check duplicate devices by interface name not UDI

As stated above, I think that it makes sense to have several *unrealized devices* of *different type* with the same ifname. I question that devices are unique by ifname.

I would guess: at any time:
(1) only one device with a certain ifname can be realized.
(2) devices of equal ifname, must have a different link-type
(2b) maybe there are link-types where (2) does not hold?

also note, that we might have devices (in the future) without ifname at all. E.g. representing a NM-virtual Vpn tunnel device.

Interesting situation: suppose you have br0 up and realized. If you now have another bridge connection with the same ifname 'br0' but different NM_SETTING_BRIDGE_MAC_ADDRESS, the connection is not applicable on br0. Should the UI show the connection activatable on device br0, or should we create yet another generic, unrealized br0 device?



>> core: add "realized" NMDevice property

+ * Indicates if the device is a software-based virtual device without
+ * backing hardware, which can be added and removed programmatically.

hm, I guess, later we would no longer really remove the NMDevice instance. We would un-realize it... hence, you can only remove the ~backing~ kernel interface, not the NMDevice instance.

The change in src/settings/nm-settings.c:nm_settings_device_added() does not seem to belong to this commit. I don't understand why this "if" is there in first place.


>> core: add class function for device unrealization

nm_device_unrealize() some freeze/thaw_notification() ? Also, maybe check, if the property actually changed on only emit the notification in that case?


I find the @link_deleted name of the parameter confusing. I would prefer "delete_link" or "do_delete_link" to tell the function what to do. As the name is now, I don't think that the function should care whether the kernel link exists or not. The argument should tell the function what to do: "touch or not touch system".


src/devices/nm-device-vlan.c:unrealize() does not chain to the parent function.
unrealize (NMDevice *self, gboolean link_deleted, GError **error)



core: earlier software capability detection

    
    We need to know whether we can create interfaces of any given
    NMDevice subclass or not.  So don't rely on just the NMPlatformLink
    for that information, because it prevents knowing whether a device
    can be realized before it is actually realized if we don't have
    the NMPlatformLink, like when creating pure software devices.

"can be realized before it is actually realized"?


-    return !!(NM_DEVICE_GET_PRIVATE (self)->capabilities & NM_DEVICE_CAP_IS_SOFTWARE);
+    return NM_FLAGS_HAS (NM_DEVICE_GET_PRIVATE (self)->capabilities, NM_DEVICE_CAP_IS_SOFTWARE);




     if (NM_DEVICE_GET_CLASS (self)->get_generic_capabilities)
          priv->capabilities |= NM_DEVICE_GET_CLASS (self)->get_generic_capabilities (self);
+    g_object_notify (G_OBJECT (self), NM_DEVICE_CAPABILITIES);
...
-    /* Indicate software device in capabilities. */
-    if (priv->is_software)
-         priv->capabilities |= NM_DEVICE_CAP_IS_SOFTWARE;
     g_object_notify (G_OBJECT (self), NM_DEVICE_CAPABILITIES);


setup() now emits two notifications for CAPABILITIES.
Btw. freeze&thaw?



infiniband_info_data_parser():

are we sure, that the tb[IFLA_IPOIB_*] are not NULL? Does infiniband_info_policy guarantee that? I would add some "if (tb[IFLA_IPOIB_MODE])" ...



>> api/manager: add GetAllDevices() method and AllDevices property

+          can be created if the device is activated. 
trailing space.......................................^





Pushed fixup for typo.

Comment 6 Dan Williams 2014-10-09 19:48:17 UTC

(In reply to comment #3)
> (In reply to comment #0)
> > The one outstanding issue is that this is all based on interface name, because
> > interface name must be unique when realized.
> 
> (Without looking at the code).
> 
> When having such an unrealized device and a kernel device of another type
> appears, NM must be prepared to throw away the unrealized device and create the
> properly typed one.

Fixed in "core: remove and recreate unrealized devices when an incompatible link is found"

> The same could happen, if the user tries to activate a bridge-typed connection
> on an unrealized bond-type device. Throw away the instance and create a new
> bridge-typed one.

Fixed in "core: remove and recreate unrealized devices when an incompatible connection is activated"

> Only downside of that, is that the bridge-type connection for a unrealized
> bond-device looks not activiable in the UI.

Yeah.

> Why does the ifname of these devices be unique anyway? What's the problem with
> creating both an unrealized bond and bridge device (of same name)? Then you
> wouldn't have to throw away anything.

True, I'll look into this.

Comment 7 Dan Williams 2014-10-09 23:20:40 UTC

(In reply to comment #4)
> (In reply to comment #2)
> > > It's also weird that NMDeviceVlan is the only class that implements this
> > > method. (It looks like other methods end up logging an error and then
> > > continuing if they fail in setup(), rather than bailing out beforehand like
> > > NMDeviceVlan.)
> > 
> > realize_existing() should ensure the backing resources are compatible and bail
> > if they are not compatible or cannot be read.  Which is the reason VLAN does
> > the checking in realize_existing(); the VLAN ID is critical to the VLAN device
> > and if we cannot read the VLAN ID we cannot use the device.
> > 
> > The reason this isn't done in NMDeviceInfiniband is that we apparently don't
> > care about the P_Key in check_connection_compatible()
> 
> So realize_existing() only fails if it can't read properties whose values are
> used by check_connection_compatible()?

Yeah, but moreso that realize_existing() should fail if properties which are
intrinsic and /cannot be changed programmatically/ fail to be read.  e.g. you
can change the MAC address of an interface but you can't ever change the VLAN
ID.

> Eg, if we have an NMDeviceVxlan, and nm_platform_vxlan_get_info() fails, should
> we still be claiming to have that device, even though all of its vxlan-specific
> properties will be wrong?

It was previous the same though, since VXLAN, tun/tap, VLAN all read this stuff
in constructed() which cannot fail.  Though in the case of VLAN it did some
wierd dance to detect failure, but now that's done during realization which
can fail.

But in the end, if we can't read this kind of stuff, I think we should
still create the device and "realize" it (since it really does exist in
the kernel) but keep it in UNMANAGED or UNAVAILABLE since it cannot be used
without reading these properties.

> > I'm probably
> > too familiar with it, but I don't see where unrealize() assumes the first?
> 
> Yes, I seem to have confused myself before.
> 
> > A possible clearer name would be realize_from_existing()?
> 
> Yes. Or just nm_device_realize(), and make nm_device_realize_new() be
> nm_device_create_and_realize().

Your suggestion has merit :)  Done in a fixup and the rest rebased on top of that.

> > > plink and connection seem to be unused. (Well, nm-wifi-factory and
> > > nm-wimax-factory use plink->name, but they should just use iface instead.)
> > > 
> > > (OK, they get used later...)
> > 
> > Yeah :)  I started with just 'iface' too and had a later commit that added
> > plink, but figured that churn wasn't worth it so I merged that later commit
> > back to this one.
> 
> It's a little messy that 'iface' is redundant, and as mentioned above, that
> it's not even used consistently.

Yeah, I'd like to just pass the ifname and possibly the link type, but
WiMAX uses the driver name too, and instead of passing 3 things I just opted
to pass the plink too...  I'm not sure there's a better/clearer way.

Fixed up wimax and wifi to use 'iface' instead of plink->name.

> > > This is old code that's just being moved around within nm-device-vlan.c, but
> > > it's wrong/unnecessary; there is no second pass any more, since we guarantee
> > > that the first pass happens in a usable order.
> > 
> > I think the the comment is wrong, but I think the code still applies...
> 
> Ah, yes, we do need to return failure rather than dereferencing a NULL
> parent...
> 
> > If the VLAN is initialized by the platform before the parent, then the platform
> > will announce the VLAN (or NMManager will read it via
> > nm_platform_query_devices() -> platform_link_cb()).
> 
> nm_platform_query_devices() always sorts the devices in such a way that that
> doesn't happen (thanks to the sorting code at the end of
> nm_platform_link_get_all()), and thus the "second pass" that the comment talks
> about no longer happens.

Yeah, you're right, link_get_all() will ensure that the list it returns is
consistent.  So we don't have this problem at startup.

Removed the comment in a fixup.

> After startup it could still happen I guess, although it's a pretty unlikely
> race condition (that a hardware device would be added and then something
> creates a vlan on it before udev can process it). But we don't handle that case
> anyway (either in master or in dcbw/devices-for-all); the VLAN device only gets
> announced once, and if its parent hasn't already been announced at that point,
> then the attempt to create/realize it will fail and we'll end up creating it as
> an NMDeviceGeneric.

I guess we should fix that at some point.  THe concern I have here is that udev
can take any amount of time (0.1 to 120 seconds) to do its thing and if the
VLAN takes less time than the master we'll hit this issue.  A further fixup
though I suppose.

> > The manager will attempt
> > to realize the VLAN because we have a kernel link, but that will fail because
> > the parent isn't announceable yet.  But then when the parent does get announced
> 
> comment ends mid-sentence...

Sorry, I was researching the issue and I think I just took a break and forgot
about it when I got back.  But it's irrelevent give above.

> > > > core: add "realized" NMDevice property
> > > 
> > > The nm-settings.c parts of "core: add "realized" NMDevice property" should
> > > probably be part of this commit.
> > 
> > I don't follow you here...  They are already part of this commit?
> 
> Haha. Cut and paste fail. I meant the nm-settings.c parts of "core: create
> devices first and realize them later".

Though at this point, add_device() is always called *after* the device has
been realized, so nm-settings.c::device_realized() will never be called.
But it is a logical commit to put it so I've fixed that part up.

> > But I'd already done this work so that's now in "platform: move InfiniBand
> > property reading into the platform".
> 
> Mumble mumble people compiling against old kernel headers?

Looks like the IFLA_IPOIB stuff showed up in 3.7.  I've added defines for
the IFLA_IPOIB stuff and added fallback code in case we're running on an
earlier kernel.

The rest to come...

Comment 8 Dan Williams 2014-10-10 20:53:24 UTC

(In reply to comment #4)
> > api/manager: add GetAllDevices() method and AllDevices property
> 
> >+          Realized devices are those which have backing resources (eg from the
> >+          kernel or a management daemon like ModemManager, teamd, etc).
> 
> Hm... but there's already a word that means that: "device". It's fine to have
> unrealized *NM*Devices, but it's weird to try to make the English/UNIX word
> "device" mean "either a device or not actually a device"...

Yeah, true.

> How about making the D-Bus/libnm property be called "real" rather than
> "realized", and then talk about "real devices" and "device placeholders"? (Or
> make the property be "placeholder"... not sure which sense makes more sense.)

Done, I renamed it "real" in "fixup! core: add "real" NMDevice property" + "fixup! libnm-glib/libnm: add support for Realized device property" and rebased everything else on top of that.

I went with "device placeholders" or "placeholder devices", I think that's a good term for them.

> (And would it make more sense to have an UnrealizedDevices/DevicePlaceholders
> property rather than AllDevices?)

I think it's easier for clients to get one list of NMDevices and then call nm_device_is_real() for the ones they care about than to have the devices divided between two properties, necessitating two loops.

> >+          Unrealized devices are software devices which
> >+          do not yet have backing resources, but for which backing resources
> >+          can be created if the device is activated. 
> 
> "Unrealized devices / Device placeholders refer to software devices which do
> not currently exist, but which NetworkManager would automatically create if one
> of their AvailableConnections was activated."
> 
> (And then all the same issues with the library level naming/docs.)

I fixed up the docs in all the places I could find and used this wording.

Comment 9 Dan Williams 2014-10-14 22:02:59 UTC

(In reply to comment #5)
> > core: let device plugins advertise supported link and setting types
> 
> Could we name DEFINE_DEVICE_FACTORY_INTERNAL_WITH_DEVTYPE,
> DEFINE_FACTORY_LINK_TYPES et al. to something NM_DEVICE_FACTORY_*? They are
> declared in nm-device-factory.h after all.

Changed to prefix with NM_DEVICE_FACTORY:

NM_DEVICE_FACTORY_DEFINE_INTERNAL
NM_DEVICE_FACTORY_DECLARE_TYPES
NM_DEVICE_FACTORY_DECLARE_SETTING_TYPES
NM_DEVICE_FACTORY_DECLARE_LINK_TYPES

> also, nm-manager.c is getting more logic that I feel belongs to
> nm-device-factory.c. I noticed that before, but now it's getting more.
> Could we move all functions that touch priv->factories to nm-device-factory.h.

Sure, done in a fixup.

> Just a thought:
> 
> find_device_factory_for_types() returns the first factory that satisfies ANY of
> the provided needle arguments. I think it should match *all*. There are only 3
> callers of the function. 2 only pass one argument. Then there is the
> duplicate-plugin-check. I think a plugin is only a duplicate if it provides all
> the same types.

I would disagree; we just don't support two plugins that provide the same type
right now.  I think this prevents ambiguities later if we change the supported
types around or change something internal and old files are left on-disk.

> But anyway, since there is no stable ABI for the device plugins and no plugins
> currently define overlapping types, no need to change that now. If we ever need
> it, we can change it and all plugins must be recompiled anyway.

Yeah, if we do support overlapping plugins later then we can add this.

> >> platform: add nm_platform_link_get_by_address()
> 
> Just wondering why: the new function passes the binary address as byte-array --
> contrary to that we now use strings mostly. Also, it does not use
> nm_utils_hwaddr_matches().
> Ah... or did we agree, that platform API is the border where we treat hwaddr as
> binary vs. string?

I think we kept the platform using uint8/size_t instead of GBytes or a string,
not really sure why.  But I followed that pattern with this new function.

> Also, the new function is never used (AFAIS)...

Yeah, I just kept it around since I thought I might need it later, but since
it's unused I'll just drop it.

> >> platform: move driver & firmware version reading into the platform
> 
> I think fake-platform should set the out-argument to NULL (if pointers are
> provided).

Done.

> >> core: move permanent and initial MAC address reading to NMDevice and NMPlatform
> 
> The final block in check_connection_compatible() (/* Check for MAC address
> blacklist */) should be enclosed by a "if (perm_hw_addr) { }"

Done for both cases (wifi & ethernet)

> Maybe it also needs a:
> 
>    if (!perm_hw_addr && mac)
>        return FALSE;

Done for both cases.

> >> core: split device creation and device setup
> 
> -    if (plink)
> -         g_return_val_if_fail (strcmp (iface, plink->name) == 0, NULL);
> +    g_return_val_if_fail (!plink || strcmp (iface, plink->name) == 0, NULL);

Done.

> system_create_virtual_device():
> 
> +    device = nm_device_factory_create_device (factory, iface, NULL,
> connection);
> 
> if we get a NULL device, maybe log a warning?

Done.

> Also platform_link_added() should do something like:
> 
>           device = nm_device_factory_create_device (factory, plink->name,
> plink, NULL);
>           if (device) {
>                if (!nm_device_realize_existing (device, plink, &error)) {
>                     nm_log_warn (LOGD_HW, "%s: factory failed to create device:
> (%d) %s",
>                                  plink->udi,
>                                  error ? error->code : -1,
>                                  error ? error->message : "(unknown)");
>                     g_clear_error (&error);
>                     return;
>                }
> -         }
> +         } else {
> +              nm_log_warn...
> +              return;
> +         }
> 
> otherwise, you end up in the switch() below.

That's somewhat expected though, if the plugin doesn't support something
specific about the device.  The WiMAX plugin only supports Intel WiMAX devices
which it checks by the driver name, and if it's not an Intel device we should
fall back to a Generic device.  I think WiMAX is the only plugin that does that
now, but maybe others will in the future?  If we removed this bit though,
we'd still have to find a way to let the WiMAX plugin claim some devices but
not others.  So I'd rather leave it as-is and just create the generic device
like we do now.

> >> core: check duplicate devices by interface name not UDI
> 
> As stated above, I think that it makes sense to have several *unrealized
> devices* of *different type* with the same ifname. I question that devices are
> unique by ifname.

I'll investigate this possibility and I agree the branch should not be merged
until we've decided on this.  I'll keep reworking for other comments in
parallel though.

> I would guess: at any time:
> (1) only one device with a certain ifname can be realized.

Yes.

> (2) devices of equal ifname, must have a different link-type

Not necessarily, you can name your VLAN device "foobar" but have two VLAN IDs
for them.  Obviously you cannot create the kernel interfaces for both.

> (2b) maybe there are link-types where (2) does not hold?

I think any software device that has create-time properties would meet this
definition since the interface name is arbitrary.

> also note, that we might have devices (in the future) without ifname at all.
> E.g. representing a NM-virtual Vpn tunnel device.

I think all devices should have an interface name, and I was going to just
assign the VPNs an interface name based on a hash of the connection name.
Otherwise you cannot do stuff like "nmcli dev connect", though I guess that's
a minor consideration.  I'm open to the possibility of this not being required.

> Interesting situation: suppose you have br0 up and realized. If you now have
> another bridge connection with the same ifname 'br0' but different
> NM_SETTING_BRIDGE_MAC_ADDRESS, the connection is not applicable on br0. Should
> the UI show the connection activatable on device br0, or should we create yet
> another generic, unrealized br0 device?

I think in this case, since the MAC address can be changed on the bridge,
it should be a 'compatible' connection with the existing device.  I think that
only connections which contain attributes that cannot be changed after a device
is realized (eg, VLAN ID or Infiniband P_Key) should trigger creation of
separate devices.

> >> core: add "realized" NMDevice property
> 
> + * Indicates if the device is a software-based virtual device without
> + * backing hardware, which can be added and removed programmatically.
> 
> hm, I guess, later we would no longer really remove the NMDevice instance. We
> would un-realize it... hence, you can only remove the ~backing~ kernel
> interface, not the NMDevice instance.

Fixed that up in a fixup to "core: create devices first and realize them later".

> The change in src/settings/nm-settings.c:nm_settings_device_added() does not
> seem to belong to this commit. I don't understand why this "if" is there in
> first place.

I think belongs in this commit because we only want to add a default connection
for a device once all its properties are set up, and this patch re-defines that
point to when the realized changes to TRUE.  So I added the code here since it
logically makes sense, even though it's not strictly necessary here.

> >> core: add class function for device unrealization
> 
> nm_device_unrealize() some freeze/thaw_notification() ? Also, maybe check, if
> the property actually changed on only emit the notification in that case?

Done in a fixup.

> I find the @link_deleted name of the parameter confusing. I would prefer
> "delete_link" or "do_delete_link" to tell the function what to do. As the name
> is now, I don't think that the function should care whether the kernel link
> exists or not. The argument should tell the function what to do: "touch or not
> touch system".

Done, renamed to "remove_resources" since it's not really about platform
links, but all backing resources.

> src/devices/nm-device-vlan.c:unrealize() does not chain to the parent function.
> unrealize (NMDevice *self, gboolean link_deleted, GError **error)

Good catch, fixed.

> core: earlier software capability detection
> 
> 
>     We need to know whether we can create interfaces of any given
>     NMDevice subclass or not.  So don't rely on just the NMPlatformLink
>     for that information, because it prevents knowing whether a device
>     can be realized before it is actually realized if we don't have
>     the NMPlatformLink, like when creating pure software devices.
> 
> "can be realized before it is actually realized"?

Reworded the commit message.

> -    return !!(NM_DEVICE_GET_PRIVATE (self)->capabilities &
> NM_DEVICE_CAP_IS_SOFTWARE);
> +    return NM_FLAGS_HAS (NM_DEVICE_GET_PRIVATE (self)->capabilities,
> NM_DEVICE_CAP_IS_SOFTWARE);

Fixed.

>      if (NM_DEVICE_GET_CLASS (self)->get_generic_capabilities)
>           priv->capabilities |= NM_DEVICE_GET_CLASS
> (self)->get_generic_capabilities (self);
> +    g_object_notify (G_OBJECT (self), NM_DEVICE_CAPABILITIES);
> ...
> -    /* Indicate software device in capabilities. */
> -    if (priv->is_software)
> -         priv->capabilities |= NM_DEVICE_CAP_IS_SOFTWARE;
>      g_object_notify (G_OBJECT (self), NM_DEVICE_CAPABILITIES);
> 
> 
> setup() now emits two notifications for CAPABILITIES.
> Btw. freeze&thaw?

Fixed both issues (two notifications and freeze/thaw).

> infiniband_info_data_parser():
> 
> are we sure, that the tb[IFLA_IPOIB_*] are not NULL? Does
> infiniband_info_policy guarantee that? I would add some "if
> (tb[IFLA_IPOIB_MODE])" ...

Done.

> >> api/manager: add GetAllDevices() method and AllDevices property
> 
> +          can be created if the device is activated. 
> trailing space.......................................^

Fixed.

Repushed.

Comment 10 Dan Winship 2014-10-16 18:20:03 UTC

(In reply to comment #9)
> > >> platform: add nm_platform_link_get_by_address()
> > 
> > Just wondering why: the new function passes the binary address as byte-array --
> > contrary to that we now use strings mostly. Also, it does not use
> > nm_utils_hwaddr_matches().
> > Ah... or did we agree, that platform API is the border where we treat hwaddr as
> > binary vs. string?
> 
> I think we kept the platform using uint8/size_t instead of GBytes or a string,
> not really sure why.

NMPlatform seemed to be the point where hwaddrs-as-strings became less convenient rather than more. Eg, if nm_platform_link_get_address() returned a string, it would have to be allocated, not const, since the NMPlatformLinux version has nowhere obvious to store the hwaddr between calls.

Comment 11 Dan Williams 2014-10-17 00:03:07 UTC

I've pushed a new branch, dcbw/devices-for-all-1 that implements what Thomas suggested for multiple devices with the same interface name.  It makes some things simpler, and some things  more complicated.  There are also some new patches in this series, which I'll hi-light here:

core: replace open-coded find_device_by_ip_iface() call in IPv4LL code
core: don't activate failed queued activation requests
fixup! platform: return link objects from add functions
platform: ensure created interface matches requested type
fixup! core: create devices first and realize them later (Don't leak unrealized devices in existing APIs.)
fixup! core: create devices first and realize them later (All virtual connections are now backed by NMDevices.)
core: ensure platform links are compatible with the NMDevice
core: allow multiple devices with the same interface name

But most of the fixup! commits could also be looked at one more time.

In general, I think this approach will work pretty well.  I've tried activating child interfaces that have a different master of the same name, eg (1) start a bond child which realizes a bond named blahblah0, (2) start a bridge child that tries to realize a bridge named blahblah0 when the bond with that name still exists, and a couple other tests, but putting this through some tests of your own would be great.

Comment 12 Dan Winship 2014-10-20 14:01:23 UTC

In the last "fixup! core: create devices first and realize them later" commit, the if() continuation line is indented too far now:

> 	if (master_state == NM_ACTIVE_CONNECTION_STATE_DEACTIVATING &&
>-	    nm_active_connection_get_device (master) == NULL) {
>-		/* Master failed without ever creating its device */
>+		(!master_device || !nm_device_is_real (master_device))) {
>+		/* Master failed without ever creating or realizing its device */


> core: ensure platform links are compatible with the NMDevice
> 
> Ensure the a platform link with the same interface name as the

"the a"

>+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_INFINIBAND)

That was missing a connection_type declaration before, but I assume that was a bug and the NULL should be NM_SETTING_INFINIBAND_SETTING_NAME here?

>+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_WIFI, NM_LINK_TYPE_OLPC_MESH)

Likewise here, and also, is it correct for NMDeviceWifi to claim both WIFI and OLPC_MESH?

>+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_WIMAX)

and again...

Comment 13 Dan Williams 2014-10-21 21:18:17 UTC

(In reply to comment #12)
> In the last "fixup! core: create devices first and realize them later" commit,
> the if() continuation line is indented too far now:
> 
> > 	if (master_state == NM_ACTIVE_CONNECTION_STATE_DEACTIVATING &&
> >-	    nm_active_connection_get_device (master) == NULL) {
> >-		/* Master failed without ever creating its device */
> >+		(!master_device || !nm_device_is_real (master_device))) {
> >+		/* Master failed without ever creating or realizing its device */

Fixed.

> > core: ensure platform links are compatible with the NMDevice
> > 
> > Ensure the a platform link with the same interface name as the
> 
> "the a"

Fixed.

> >+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_INFINIBAND)
> 
> That was missing a connection_type declaration before, but I assume that was a
> bug and the NULL should be NM_SETTING_INFINIBAND_SETTING_NAME here?

Yeah, I hadn't changed it originally because it wasn't set before, but you are correct.  Fixed.

> >+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_WIFI, NM_LINK_TYPE_OLPC_MESH)
> 
> Likewise here, and also, is it correct for NMDeviceWifi to claim both WIFI and
> OLPC_MESH?

Both points are correct.  Fixed, letting OLPC Mesh define it's own types.

> >+	NM_DEVICE_CLASS_DECLARE_TYPES(klass, NULL, NM_LINK_TYPE_WIMAX)
> 
> and again...

Fixed.

Comment 14 Thomas Haller 2014-10-22 14:38:28 UTC

>> core: let device plugins advertise supported link and setting types

pushed some fixups.


>> core: move virtual interface name handling into device plugins
    
commit message:
 "Instead of having a bunch of logic in the Manager for determiningin the"
                                                        ^^^^^^^^^^^^^


>> core: move virtual interface name handling into device plugins

nm_device_factory_get_virtual_iface_name() and likewise nm_device_factory_get_connection_parent() should only and directly delegate to NM_DEVICE_FACTORY_GET_INTERFACE (factory)->get_virtual_iface_name(). And NMDeviceFactor should implement the default logic.

That way, the derived classes can override the function (if they want to do something special), but they still can invoke the default implementation in the parent class.



>> core: check duplicate devices by interface name not UDI

wouldn't it be best to check "real" devices based on the ifindex, which does not change -- contrary to the ifname.

Also in face of realized devices, while unrealized devices can have a ifname clash, only real devices should have an ifindex (and that one should be unique).



>> core: add class function for device unrealization

nm_device_unrealize()

emit PROP_IFINDEX and PROP_IP_IFINDEX notifications?



>> core: remove and recreate unrealized devices when an incompatible link is found

platform_link_added() has 
     device = find_device_by_iface (self, plink->name);
     if (device) {
-         if (!nm_device_is_real (device)) {
-              if (nm_device_realize (device, plink, &error))
+         gboolean incompatible;
+
+         if (nm_device_is_real (device))
+              return;
+
+         if (nm_device_realize (device, plink, &error)) {
+              /* Success */
               nm_device_setup_finish (device, plink);
-              else {
+              return;
+         }

hm... is the iface really unique in face of non-realized devices? Seems strange to me.



>> core: remove and recreate unrealized devices when an incompatible connection is activated

+    /* If the device is unrealized and the requested connection is not
+     * comaptible with the unrealized device (eg bond device but bridge
+     * connection) then we must destroy the unrealized device and create
+     * a new device compatible with the connection.
+     */

hm, why do we have to remove it?



also, make check fails for the branch.

Comment 15 Dan Williams 2014-10-28 15:20:49 UTC

(In reply to comment #14)
> >> core: let device plugins advertise supported link and setting types
> 
> pushed some fixups.

Ah, into dcbw/devices-for-all.  I should have clarified, when I implemented your suggestion to allow multiple devices with the same interface name, I did that in dcbw/devices-for-all-1 and left devices-for-all alone.

So I've changed this review bug title to dcbw/devices-for-all-1 to reflect the actual branch for review.

And I cherry-picked your fixes and rebased them into the appropriate place.  Thanks!

> >> core: move virtual interface name handling into device plugins
> 
> commit message:
>  "Instead of having a bunch of logic in the Manager for determiningin the"

Fixed.

> >> core: move virtual interface name handling into device plugins
> 
> nm_device_factory_get_virtual_iface_name() and likewise
> nm_device_factory_get_connection_parent() should only and directly delegate to
> NM_DEVICE_FACTORY_GET_INTERFACE (factory)->get_virtual_iface_name(). And
> NMDeviceFactor should implement the default logic.
> 
> That way, the derived classes can override the function (if they want to do
> something special), but they still can invoke the default implementation in the
> parent class.

Done in "fixup! core: move virtual interface name handling into device plugins".

> >> core: check duplicate devices by interface name not UDI
> 
> wouldn't it be best to check "real" devices based on the ifindex, which does
> not change -- contrary to the ifname.

If we have one, yes, since not all "real" devices are netdevs, some are ATM devices or modems or bluetooth, and those only have interface names.  So we can't rely only on the ifindex.

> Also in face of realized devices, while unrealized devices can have a ifname
> clash, only real devices should have an ifindex (and that one should be
> unique).

I've added ifindex duplicate checking in "fixup! core: check duplicate devices by interface name not UDI".  This code also just gets removed later though, but is included for commit-by-commit buildability.

> >> core: add class function for device unrealization
> 
> nm_device_unrealize()
> 
> emit PROP_IFINDEX and PROP_IP_IFINDEX notifications?

Fixed for PROP_IFINDEX, but IP_IFINDEX actually isn't a property.  I guess since it changes whenever IP_IFACE does.

> >> core: remove and recreate unrealized devices when an incompatible link is found
> 
> platform_link_added() has 
>      device = find_device_by_iface (self, plink->name);
>      if (device) {
> -         if (!nm_device_is_real (device)) {
> -              if (nm_device_realize (device, plink, &error))
> +         gboolean incompatible;
> +
> +         if (nm_device_is_real (device))
> +              return;
> +
> +         if (nm_device_realize (device, plink, &error)) {
> +              /* Success */
>                nm_device_setup_finish (device, plink);
> -              else {
> +              return;
> +         }
> 
> hm... is the iface really unique in face of non-realized devices? Seems strange
> to me.

That bit (and it's commit) got reworked and is no longer present in dcbw/devices-for-all-1.

> >> core: remove and recreate unrealized devices when an incompatible connection is activated
> 
> +    /* If the device is unrealized and the requested connection is not
> +     * comaptible with the unrealized device (eg bond device but bridge
> +     * connection) then we must destroy the unrealized device and create
> +     * a new device compatible with the connection.
> +     */
> 
> hm, why do we have to remove it?

That commit got reworked and is no longer present in dcbw/devices-for-all-1.

> also, make check fails for the branch.

Yeah, because of libnm bugs.  That's fixed in dcbw/libnm-fixes.

Comment 16 Dan Williams 2014-12-01 15:52:04 UTC

I also pushed dcbw/devices-for-all-squashed if that's easier to review.

Comment 17 Dan Winship 2015-03-19 17:33:42 UTC

This probably needs a rebase...

Comment 18 Dan Williams 2015-06-09 16:42:11 UTC

And rebased, and large portions already merged to master.  Back for review.

Comment 19 Thomas Haller 2015-06-16 11:09:31 UTC

>> core: split device creation and device setup
    

nm_device_factory_create_device()

g_return_val_if_fail (iface && *iface, NULL);


+    } else if (connection) {

should not be "else". If both plink && connection are given, let's check whether the factory supports ~both~.

Or alternatively:

  g_return_val_if_fail (plink || connection, NULL);
  g_return_val_if_fail (!plink || !connection, NULL);



is nm_device_realize() guaranteed to be do nothing if the device is already realized? Otherwise I'd add a boolean NMDevicePrivate:is_realized, and g_return_val_if_fail() where appropriate.
I guess later we also add an unrealize()?




     g_object_class_install_property
          (object_class, PROP_DRIVER,
           g_param_spec_string (NM_DEVICE_DRIVER, "", "",
                                NULL,
-                               G_PARAM_READWRITE | G_PARAM_CONSTRUCT_ONLY |
+                               G_PARAM_READWRITE |
                                G_PARAM_STATIC_STRINGS));

only READABLE?




In NMDevice, could we restore the log line:
   _LOGD (LOGD_DEVICE, "constructor(): %s, kernel ifindex %d", 
which should be the very first message this instance logs. (to say "hi").




update_device_from_platform_link() lacks 
  g_object_notify (G_OBJECT (self), NM_DEVICE_UDI);
and others (NM_DEVICE_IFACE).


update_device_from_platform_link()
+    g_free (priv->iface);
+    priv->iface = g_strdup (plink->name);

instead: 
 
     if (!priv->iface)
         priv->iface = g_strdup (plink->name);




also:

     g_return_if_fail (priv->ifindex <= 0 || priv->ifindex == plink->ifindex);




nm-wwan-factory.c:

create_device()
{
    *out_ignore = TRUE;
    g_return_val_if_fail (plink, NULL);
    g_return_val_if_fail (plink->type == NM_LINK_TYPE_WWAN_ETHERNET, NULL);
    return NULL;
}


nm-wifi-factory.c:


     g_return_val_if_fail (plink != NULL, NULL);
 
     if (plink->type == NM_LINK_TYPE_WIFI)
          return nm_device_wifi_new (iface);
     else if (plink->type == NM_LINK_TYPE_OLPC_MESH)
          return nm_device_olpc_mesh_new (iface);
     g_return_val_if_reached (NULL);


 nm_device_olpc_mesh_new (const char *iface)
 nm_device_wifi_new (const char *iface)
 {
-    g_return_val_if_fail (platform_device != NULL, NULL);
+    g_return_val_if_fail (iface && *iface, NULL);




Overall, this commit looks good. Lets fix it up and merge it early on?
(it also conflicts with platform-refactoring branch, if I happen to merge that one earlier, I will take the task to resolve the conflicts).

Comment 20 Thomas Haller 2015-06-19 13:10:05 UTC

(In reply to Thomas Haller from comment #19)
> Overall, this commit looks good. Lets fix it up and merge it early on?
> (it also conflicts with platform-refactoring branch, if I happen to merge
> that one earlier, I will take the task to resolve the conflicts).

I rebased the branch on master and repushed it.
No further changes, beside fixing conflicts during rebase.

The previous version is still there (dcbw/devices-for-all-2)

Comment 21 Dan Williams 2015-07-14 19:42:56 UTC

Rebased on git master.

Comment 22 Thomas Haller 2015-07-15 15:40:51 UTC

comment 19 still applies

Comment 23 Dan Williams 2015-07-16 23:20:37 UTC

(In reply to Thomas Haller from comment #19)
> >> core: split device creation and device setup
>     
> 
> nm_device_factory_create_device()
> 
> Or alternatively:
> 
>   g_return_val_if_fail (plink || connection, NULL);
>   g_return_val_if_fail (!plink || !connection, NULL);

Fixed; it's the later.  Only one should be passed at a given time.  We should never be handed a plink *and* a connection, because there's nothing we can do if the connection doesn't match the plink.  In any case, there are only a few cases where a device is created through this link: (a) we get a kernel link, or (b) we need to create a virtual device from a connection.  We'll never get both at the same time.


> is nm_device_realize() guaranteed to be do nothing if the device is already
> realized? Otherwise I'd add a boolean NMDevicePrivate:is_realized, and

In this commit, yes.  This is only called from two places:

1) factory_device_added_cb() - factories should never, ever be emitting 'created' for the same device twice

2) platform_link_added() - we call it right after we create the NMDevice

These two calls will never overlap.  Later in the series we do add a 'real' property that gets updated and checked before allowing re-realization.

>      g_object_class_install_property
>           (object_class, PROP_DRIVER,
>            g_param_spec_string (NM_DEVICE_DRIVER, "", "",
>                                 NULL,
> -                               G_PARAM_READWRITE | G_PARAM_CONSTRUCT_ONLY |
> +                               G_PARAM_READWRITE |
>                                 G_PARAM_STATIC_STRINGS));
> 
> only READABLE?

This was because NMDeviceGeneric sets the driver name during setup from the plink, but that turns out to be bogus (because NMDevice does that in update_device_from_platform_link() during NMDevice::setup() already).  Re-added CONSTRUCT_ONLY.

> In NMDevice, could we restore the log line:
>    _LOGD (LOGD_DEVICE, "constructor(): %s, kernel ifindex %d", 
> which should be the very first message this instance logs. (to say "hi").

There's no constructor() anymore... that line has been replaced with this one in NMDevice::setup(), which gets called at about the same time, after priv->ifindex has been set:

_LOGD (LOGD_DEVICE, "setup(): %s, kernel ifindex %d", G_OBJECT_TYPE_NAME (self), priv->ifindex);

> update_device_from_platform_link() lacks 
>   g_object_notify (G_OBJECT (self), NM_DEVICE_UDI);
> and others (NM_DEVICE_IFACE).

You're right; though at this point the device hasn't been exported yet so the signals don't mean much.  But they will later I think; so I've fixed this.

> update_device_from_platform_link()
> +    g_free (priv->iface);
> +    priv->iface = g_strdup (plink->name);
> 
> instead: 
>  
>      if (!priv->iface)
>          priv->iface = g_strdup (plink->name);

Fixed.


> also:
> 
>      g_return_if_fail (priv->ifindex <= 0 || priv->ifindex ==
> plink->ifindex);

We should never ever get a NMPlatformLink with an ifindex <= 0 though...  and this function is what sets priv->ifindex, so that check is unnecessary.

> nm-wwan-factory.c:
> 
> create_device()
> {
>     *out_ignore = TRUE;
>     g_return_val_if_fail (plink, NULL);
>     g_return_val_if_fail (plink->type == NM_LINK_TYPE_WWAN_ETHERNET, NULL);
>     return NULL;
> }

I don't think the plugin should be telling NM core to ignore a device that it doesn't actually handle...  I'm happy to add the other two, but I think the *out_ignore should be at the bottom.

> nm-wifi-factory.c:
> 
> 
>      g_return_val_if_fail (plink != NULL, NULL);
>  
>      if (plink->type == NM_LINK_TYPE_WIFI)
>           return nm_device_wifi_new (iface);
>      else if (plink->type == NM_LINK_TYPE_OLPC_MESH)
>           return nm_device_olpc_mesh_new (iface);
>      g_return_val_if_reached (NULL);

Done.

>  nm_device_olpc_mesh_new (const char *iface)
>  nm_device_wifi_new (const char *iface)
>  {
> -    g_return_val_if_fail (platform_device != NULL, NULL);
> +    g_return_val_if_fail (iface && *iface, NULL);

This is ensured for every factory by nm_device_factory_create_device().  I added the "&& *iface" to that function.

> Overall, this commit looks good. Lets fix it up and merge it early on?
> (it also conflicts with platform-refactoring branch, if I happen to merge
> that one earlier, I will take the task to resolve the conflicts).

Repushed with new "fixup! core: split device creation and device setup".

Comment 24 Thomas Haller 2015-07-17 16:48:32 UTC

>> core: split device creation and device setup

factory_device_added_cb() passes a NULL plink to nm_device_realize(). I think that leads to a crash (e.g. realize() in nm-device-vlan.c)


diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c
index f587132..68cd4b7 100644
--- a/src/devices/nm-device.c
+++ b/src/devices/nm-device.c
@@ -1629,7 +1629,7 @@ nm_device_create_and_realize (NMDevice *self,
 
     NM_DEVICE_GET_CLASS (self)->setup (self, (plink.type != NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
 
-    g_warn_if_fail (nm_device_check_connection_compatible (self, connection));
+    g_return_val_if_fail (nm_device_check_connection_compatible (self, connection), TRUE);
     return TRUE;
 }
 
diff --git a/src/nm-manager.c b/src/nm-manager.c
index 16b3694..c880ff1 100644
--- a/src/nm-manager.c
+++ b/src/nm-manager.c
@@ -1043,8 +1043,9 @@ system_create_virtual_device (NMManager *self, NMConnection *connection, GError
                * previously existed there might be one.
                */
               add_device (self, device, !nm_owned);
-         }
-         g_clear_object (&device);
+              g_object_unref (device);
+         } else
+              g_clear_object (&device);
     }
 
 out:
@@ -1838,7 +1839,7 @@ factory_device_added_cb (NMDeviceFactory *factory,
     if (nm_device_realize (device, NULL, &error))
          add_device (NM_MANAGER (user_data), device, TRUE);
     else {
-         nm_log_warn (LOGD_DEVICE, "(%s): %s", nm_device_get_iface (device), error->message);
+         nm_log_warn (LOGD_DEVICE, "(%s): failed to realize device: %s", nm_device_get_iface (device), error->message);
          g_error_free (error);
     }
 }

Comment 25 Dan Williams 2015-07-17 22:33:10 UTC

(In reply to Thomas Haller from comment #24)
> >> core: split device creation and device setup
> 
> factory_device_added_cb() passes a NULL plink to nm_device_realize(). I
> think that leads to a crash (e.g. realize() in nm-device-vlan.c)

It won't currently because the VLAN factory never emits that signal.  Only external plugins (WWAN, ATM, Bluetooth, etc) do, and only those where the device is not represented by a NMPlatformLink.

But I agree it's something that we should warn/assert about, since there's no way to programmatically enforce that factory_device_added_cb() is only called under these conditions, so I've added a g_return_val_if_fail() there.

> diff --git a/src/devices/nm-device.c b/src/devices/nm-device.c
> index f587132..68cd4b7 100644
> --- a/src/devices/nm-device.c
> +++ b/src/devices/nm-device.c
> @@ -1629,7 +1629,7 @@ nm_device_create_and_realize (NMDevice *self,
>  
>      NM_DEVICE_GET_CLASS (self)->setup (self, (plink.type !=
> NM_LINK_TYPE_UNKNOWN) ? &plink : NULL);
>  
> -    g_warn_if_fail (nm_device_check_connection_compatible (self,
> connection));
> +    g_return_val_if_fail (nm_device_check_connection_compatible (self,
> connection), TRUE);
>      return TRUE;
>  }

Done.

> diff --git a/src/nm-manager.c b/src/nm-manager.c
> index 16b3694..c880ff1 100644
> --- a/src/nm-manager.c
> +++ b/src/nm-manager.c
> @@ -1043,8 +1043,9 @@ system_create_virtual_device (NMManager *self,
> NMConnection *connection, GError
>                 * previously existed there might be one.
>                 */
>                add_device (self, device, !nm_owned);
> -         }
> -         g_clear_object (&device);
> +              g_object_unref (device);
> +         } else
> +              g_clear_object (&device);
>      }

Good catch; fixed that up slightly differently but same result.

>  out:
> @@ -1838,7 +1839,7 @@ factory_device_added_cb (NMDeviceFactory *factory,
>      if (nm_device_realize (device, NULL, &error))
>           add_device (NM_MANAGER (user_data), device, TRUE);
>      else {
> -         nm_log_warn (LOGD_DEVICE, "(%s): %s", nm_device_get_iface
> (device), error->message);
> +         nm_log_warn (LOGD_DEVICE, "(%s): failed to realize device: %s",
> nm_device_get_iface (device), error->message);
>           g_error_free (error);
>      }
>  }

Fixed.

Added all these to the "fixup!" commit for the first patch.

Comment 26 Thomas Haller 2015-07-19 09:58:00 UTC

If I get to it, I will review again. Otherwise, I have no concrete suggestions at the moment. Looks good so far.

Comment 27 Beniamino Galvani 2015-07-27 08:23:50 UTC

I think we need also to add the "Since: 1.2" tag and NM_AVAILABLE_IN_1_2 attribute on new libnm public functions.

Comment 28 Dan Williams 2015-07-31 19:05:39 UTC

I fixed up the Since/NM_AVAILABLE_IN stuff and repushed.

Comment 29 Dan Williams 2015-07-31 19:09:10 UTC

And I merged the first patch "core: split device creation and device setup" per our agreement earlier in this bug.

e8139f56c26ae3bcc5e14abdb29970ae07e93299

We can continue reviewing and reworking the rest of the patches in dcbw/devices-for-all (or Thomas can take them over, or whatever :).

Comment 30 Beniamino Galvani 2015-08-03 12:04:05 UTC

On master branch I see the following error at NM start when there is a bond
connection configured, is it related to the device creation changes?

NetworkManager:ERROR:devices/nm-device-bond.c:478:create_and_realize: assertion failed: (nm_device_get_ifindex (device) <= 0)

+ Trace 235319

#0 raise
#1 abort
#2 g_assertion_message
#3 g_assertion_message_expr
#4 create_and_realize
at devices/nm-device-bond.c line 478
#5 nm_device_create_and_realize
at devices/nm-device.c line 1599
#6 system_create_virtual_device
at nm-manager.c line 1039
#7 system_create_virtual_devices
at nm-manager.c line 1078
#8 add_device
at nm-manager.c line 1837
#9 platform_link_added
at nm-manager.c line 1939
#10 platform_query_devices
at nm-manager.c line 2011
#11 nm_manager_start
at nm-manager.c line 4174
#12 main
at main.c line 469

Comment 31 Dan Williams 2015-08-06 12:20:11 UTC

Yeah, it's related.  I'll look into it.

Comment 32 Thomas Haller 2015-08-06 14:40:40 UTC

There were compile errors, after the last rebase.

I took the current state, and pushed it to "dcbw/devices-for-all-3" for reference.

Then I rebased the branch on master.

I fixed the compile errors and thereby added three new commits:
  712d42b utils: add @filter_func argument to nm_utils_g_value_set_object_
  34d28e9 core: add nm_exported_object_list_to_object_path_array() helper
  08ec3e9 device: let NM_DEVICE_MASTER be equal to nm_device_get_master()

(where the first is just split out from another patch from dcbw).



Then I added a few fixup! commits. Please review them whether the are right.



Finally, I added 3 commits:
  bf56149 device/trivial: rename fields to have unique names
  0c1a3ef device/trivial: rename master related function of nm-device
  68f31d5 WIP: device: modify handling master device
Not sure, whether they are right. But the master-slave handling was very hard to grasp for me. These patches aim to make it more understandable (?).

Comment 33 Dan Williams 2015-08-12 21:27:51 UTC

> fixup! core: create devices first and realize them later

I think this will break master changes where priv->master != plink->master.  The first if() there will actually un-enslave @self, so after running that priv->enslaved will be FALSE and the second if() statement will run.  This commit breaks that becuase the second part isn't run.  Yeah, I suppose it's icky that the first if depends on side-effects of the second if, and that deserves a comment if nothing else.

The rest of the commits up to the WIP one look OK.

Comment 34 Thomas Haller 2015-08-19 11:06:24 UTC

Rebased on master, and squashed older fixup commits.

On review again.

Comment 35 Dan Williams 2015-09-25 22:08:15 UTC

So all the code changes look fine to me, though there are a few issues I noticed which were probably errors in my original patches or snuck in after reabases:

1) I pushed a fixup that fixes the python example "get-devices.py" which is what I was using to test stuff

2) for some reason, after activating a bond, get-devices.py still lists 'bond0' as unrealized.  I created a bond0 master connection and a single "bond0 slave 1" for an ethernet device on my system and connected it to a switch.  I then ran "nmcli con up 'bond0 slave 1'" and then checked get-devices.py and bond0 was still listed as unrealized.  Maybe we're forgetting to update the internal variable?

Comment 36 Thomas Haller 2015-09-26 12:08:05 UTC

(rebased on master again)

Comment 37 Dan Williams 2015-10-13 14:44:50 UTC

Any chance you could debug the #2 a bit more and make sure that when a virtual/software device is activated that it becomes realized?

Comment 38 Thomas Haller 2015-11-24 14:15:17 UTC

Rebased to master and fixup commits squashed. Previous version at dcbw/devices-for-all-7

Comment 39 Dan Williams 2015-11-24 20:11:58 UTC

The additional patches and everything LGTM, but on my machine there are still issues with activation.  For my bond0 test case, I see:

1) nmcli con up 'bond0' never returns
2) NM activates bond0 and the device gets to 'activated' state, but 'nmcli dev' doesn't show bond0 at all

Debugging a bit, this is because for software devices we don't ever call nm_device_setup_finish(), because virtual devices are created with nM_device_create_and_realize(), which doesn't call nm_device_setup_finish() but instead directly calls device_class->setup_finish().

Pushed a fixup for that.  Needs more testing though.

Comment 40 Beniamino Galvani 2015-12-02 18:04:51 UTC

Pushed some fixups, otherwise LGTM.

Comment 41 Thomas Haller 2015-12-04 11:29:08 UTC

I merged the branch to master as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=153aba0662460c3b03eb4051ac82f99468362183


It might not yet 100% be correct, but this is such a fundamental change that I keep constantly rebasing it (destabilizing it again), making the whole branch a moving target hard to test.

I hope that it works well enough and now on master we get more test-coverage and it should be easy to fix potential issues later.