After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 747628 - [review] dcbw/bgo747628-platform-udev-excise: don't rely on udev
[review] dcbw/bgo747628-platform-udev-excise: don't rely on udev
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: nm-review
 
 
Reported: 2015-04-10 15:08 UTC by Lubomir Rintel
Modified: 2015-05-04 16:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
linux-platform: drop link_is_announceable() check from link_get() (1.28 KB, patch)
2015-04-10 15:08 UTC, Lubomir Rintel
none Details | Review

Description Lubomir Rintel 2015-04-10 15:08:09 UTC
Created attachment 301295 [details] [review]
linux-platform: drop link_is_announceable() check from  link_get()

We announce and activate the devices we create ourselves even before
noticed by udev and link_get() is called from various link_get_*()
functions during the activation path.

link_get_master() not working causes an assertion failre to fire:

  # nmcli c add type bond

  file platform/nm-linux-platform.c: line 2917 (master_category): should not be reached

_nm_platform_link_get() is guarded by the same check already, so this is
redundant anyway.
Comment 1 Dan Williams 2015-04-13 20:20:46 UTC
I think this got triggered by the udev unmanaged rules thing, where now every device has to wait for udev.  Previously only hardware devices did, so this assertion never hit.  But now that everything waits for udev, this issue showed up.  I'm somewhat hesitant to start creating two "views" of devices, one for link_get() that bypasses udev, and one for _nm_platform_link_get() (which NMManager sees) that does wait for udev.  I think that either NM should wait for udev before exposing a link anywhere, or it shouldn't and udev properties should come later?

What if we made NM just expose all rtnl link objects and create NMDevice objects for them, but kept them unmanaged until we get the udev notification for the device?  Then at least we'd always have an NMDevice for every rtnllink object (or at least, nothing blocking us from creating the NMDevice for it).

The platform currently uses udev for:

1) reading the driver name (which is mostly informational, but does get used to identify Intel WiMAX devices at creation time)
2) Identifying OLPC Mesh interfaces - we should just hardcode this since it will never ever change since this hardware stopped being made long ago, but instead of looking at udev attributes we should just look for a /sys/class/net/<ifname>/mesh_ie/ directory
3) reading DEVTYPE from /sys/class/net/<ifname>/uevent, this is easy to do without udev
4) reading the UDI; this isn't really used internally, and we should actually remove the restriction that some code has (ADSL, BT) that wants it for nm_device_*_new() functions, and just update it later when we have it
Comment 2 Dan Williams 2015-04-13 22:24:19 UTC
So I decided to prototype this, and ended up with dcbw/platform-udev-excise.  Do you think going farther with that branch would work?  I haven't run-time tested it, but it should ensure that any rtnllink-backed NMDevice will stay in UNAVAILABLE until udev has initialized the link.

I just realized (now that I've done the work) that we need the link to stay in UNMANAGED to wait for udevs managed/unmanaged rules.  I don't think it would too hard to fix that up in the branch though.

Thoughts?  I think we should merge "platform: don't use udev for link type determination" anyway, since it's a nice cleanup and removed udev waiting from some critical paths.
Comment 3 Lubomir Rintel 2015-04-14 12:49:51 UTC
(In reply to Dan Williams from comment #2)
> So I decided to prototype this, and ended up with dcbw/platform-udev-excise.
> Do you think going farther with that branch would work?  I haven't run-time
> tested it, but it should ensure that any rtnllink-backed NMDevice will stay
> in UNAVAILABLE until udev has initialized the link.
> 
> I just realized (now that I've done the work) that we need the link to stay
> in UNMANAGED to wait for udevs managed/unmanaged rules.  I don't think it
> would too hard to fix that up in the branch though.

Yes. Replacing NM_PLATFORM_SIGNAL_CHANGED with NM_PLATFORM_SIGNAL_ADDED udev_device_added() seems to do the trick as announce_object() called from the rtnetlink callback will not fire the signal since device.driver is not set yet.

Not sure if that's the correct thing to do; just seems to work well enough for testing.

> Thoughts?  I think we should merge "platform: don't use udev for link type
> determination" anyway, since it's a nice cleanup and removed udev waiting
> from some critical paths.

I like the idea.

Pushed one cleanup too.
Comment 4 Dan Williams 2015-04-14 14:30:23 UTC
I rebased on master and fixed up the conflicts with Thomas' platform cleanups, and fixed the unavailable/unmanaged thing I talked about earlier.  I think the branch is feature-complete now, just needs testing.

Repushed now as dcbw/bgo747628-platform-udev-excise.

Also, I didn't really mean to take over the bug entirely with a branch, but I've just wanted to kill the whole "wait-for-udev" thing for a long long time so I decided to prototype it yesterday...
Comment 5 Thomas Haller 2015-04-15 11:03:58 UTC
>> platform: rework link type detection for better fallback (bgo #743209)

+    for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
+         if (type == linktypes[i].nm_type)
+              return linktypes[i].rtnl_type;
     }
+    g_warning ("Wrong/unhandled type: %d", type);
+    return NULL;

g_warning() is wrong. Either this condition must not ever happen (then g_return_val_if_reached()), or it can actually happen (then just silently return NULL). Seems to me it's the former.
Comment 6 Dan Williams 2015-04-15 14:20:01 UTC
Fixed that up to use g_return_val_if_reached(NULL);, repushed.
Comment 7 Lubomir Rintel 2015-04-16 20:32:47 UTC
Generally seems fine to me; addition of veths doesn't work:

announce_object() doesn't emit the link changed signal, as device.driver is always NULL. It should probably be either initialized properly. To be on the safe side, device.initialized should probably be used in place of device.driver.

Also, there's no need to filter out NM_PLATFORM_SIGNAL_ADDED even though the device is not initialized yet, since it stays unmanaged:

-                       switch (change_type) {
-                       case NM_PLATFORM_SIGNAL_ADDED:
-                       case NM_PLATFORM_SIGNAL_CHANGED:
-                               if (!device.driver)
-                                       return;
-                               break;
-                       default:
-                               break;
-                       }
+                       if (NM_PLATFORM_SIGNAL_CHANGED && !device.initialized)
+                               return;

Another issue is that nm_platform_link_get_unmanaged() will not return meaningful results at add_device() time when the device is known to rtnetlink, but not to udev yet. Maybe something like this in link_changed_cb() in response to device being announced by udev might may sense:

        if (nm_platform_link_get_unmanaged (nm_device_get_ifindex (self), &platform_unmanaged))
                nm_device_set_unmanaged (self, NM_UNMANAGED_DEFAULT, platform_unmanaged, NM_DEVICE_STATE_REASON_USER_REQUESTED);
Comment 8 Dan Williams 2015-04-16 21:19:02 UTC
Ok, fixed those issues up too.(In reply to Lubomir Rintel from comment #7)
> Generally seems fine to me; addition of veths doesn't work:
> 
> announce_object() doesn't emit the link changed signal, as device.driver is
> always NULL. It should probably be either initialized properly. To be on the
> safe side, device.initialized should probably be used in place of
> device.driver.

I think we can just remove the whole block there, since we want to announce devices whenever we know about them from netlink, with or without the driver.  device.initialized will be FALSE before udev finds the device, so if we gate on that, we won't announce the device until udev knows about it, which contradicts the purpose of the patch.  Thus I think we can kill the block.

> Another issue is that nm_platform_link_get_unmanaged() will not return
> meaningful results at add_device() time when the device is known to
> rtnetlink, but not to udev yet. Maybe something like this in
> link_changed_cb() in response to device being announced by udev might may
> sense:
> 
>         if (nm_platform_link_get_unmanaged (nm_device_get_ifindex (self),
> &platform_unmanaged))
>                 nm_device_set_unmanaged (self, NM_UNMANAGED_DEFAULT,
> platform_unmanaged, NM_DEVICE_STATE_REASON_USER_REQUESTED);

I cleaned that up a bit too.  Basically, if the device has an ifindex (which all devices that can be platform_unmanaged do) then the device must stay in the UNMANAGED state until udev has found it, due to NM_UNMANAGED_PLATFORM_INIT.  So, we can consolidate the platform_unmanaged handling into nm_device_finish_init() to capture the already-known-to-udev-at-construct-time case, and into device_link_changed() for the udev-happens-later case.

The nm-manager.c/NM_UNMANAGED_DEFAULT and the set_property/NM_UNMANAGED_PLATFORM_INIT bits can actually both be done in nm_device_finish_init().

Then to cover the udev-later case that you mention above, I took your suggestion and added those bits to device_link_changed().

Does it look/work better now?
Comment 9 Thomas Haller 2015-04-17 08:27:00 UTC
>> platform: rework link type detection for better fallback (bgo #743209)
    
this removes the previously added read_devtype() and reading of sysfs_path/uevent. That is probably correct, but could you explain in the commit message why this is no longer needed and how these cases are handled now differently?


Also, while at it, could you rename type_to_string() and type_to_rtnl_link_type() to something like "link_type_...".
My preference would be actually be "nm_link_type_to...".
Similarly, LinkDesc, to LinkTypeDesc.


+         for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
+              if (g_strcmp0 (type, linktypes[i].rtnl_type) == 0)
+                   return_type (linktypes[i].nm_type, type);
+         }

should return "linktypes[i].rtnl_type", not "type".





build_rtnl_link() called link_to_string() (now link_to_rtnl_type_string()).
Before it would return
-    case NM_LINK_TYPE_TAP:
-         return "tap";
-    case NM_LINK_TYPE_TUN:
-         return "tun";
now it would be NULL.
Also, differences for example for "openvswitch" type.

I guess, that is correct?




but maybe, above should be fixed and you should instead move the "tun" check:

     if (type) {
          for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
               if (g_strcmp0 (type, linktypes[i].rtnl_type) == 0) {
                    if (linktypes[i].nm_type == NM_LINK_TYPE_TUN) {
                        NMPlatformTunProperties props;
                        guint flags;
 
                        if (nm_platform_tun_get_properties ...
                             if (!g_strcmp0 (props.mode, "tap"))
                                 return_type (NM_LINK_TYPE_TAP, "tap");
                             if (!g_strcmp0 (props.mode, "tun"))
                                 return_type (NM_LINK_TYPE_TUN, "tun");
                        }
                        ...
                    }
                    return_type (linktypes[i].nm_type, type);
               }
          }


and maybe move the entire tun-detection in a separate function -- in a previous commit.



I split the last commit in two, I feel it's simpler to understand separately. Pushed on top of your branch -- needs adjusting of commit message.
Comment 10 Lubomir Rintel 2015-04-17 16:25:49 UTC
Looks better to me now.

Pushed two fixups.
Comment 11 Dan Williams 2015-04-17 16:56:21 UTC
Rebased on top of git master after the wimax removal merge.
Comment 12 Thomas Haller 2015-04-20 11:58:15 UTC
(In reply to Dan Williams from comment #11)
> Rebased on top of git master after the wimax removal merge.

needs squashing, it doesn't `git rebase --autosquash` without errors.
Comment 13 Thomas Haller 2015-04-23 22:32:08 UTC
>> core: add generic NMDevice function to recheck availability

strange whitespace in 
          guint               call_id;
+         NMDeviceStateReason available_reason;
+         NMDeviceStateReason unavailable_reason;
+    }                   recheck_available;
                ^^^^^^^^^



+nm_device_queue_recheck_available (NMDevice *self,
+                                   NMDeviceStateReason available_reason,
+                                   NMDeviceStateReason unavailable_reason)
+{

shouldn't a new queue request always reset the state reasons?

recheck_available() should combine the two debug loglines and possibly print state&now_available?



>> platform: don't wait for udev before announcing links

init_link() now does the 
     if (!info->driver)
          info->driver = g_intern_string (rtnl_link_get_type (rtnllink));
guessing first. The downside is additional work, and possibly interning strings that are not needed. Just move it after
    udev_device = g_hash_table_lookup (priv->udev_devices,
-- also, now info->driver might end up NULL, before it would never be NULL.



+    if (info->driver && g_strcmp0 (info->driver, priv->driver)) {
+         /* Update driver to what udev gives us */
+         g_free (priv->driver);
+         priv->driver = g_strdup (info->driver);
+         g_object_notify (G_OBJECT (self), NM_DEVICE_DRIVER);
+    }

why the check for info->driver!=NULL first? You don't want to clear the driver field? As said above, I think info->driver should used not to be NULL. But if you allow that, we can also clear priv->driver.



+         if (nm_platform_link_get_unmanaged (NM_PLATFORM_GET, priv->ifindex, &platform_unmanaged)) {
+              nm_device_set_unmanaged (self,
+                                       NM_UNMANAGED_DEFAULT,
+                                       platform_unmanaged,
+                                       NM_DEVICE_STATE_REASON_USER_REQUESTED);
+         }

if nm_platform_link_get_unmanaged() returns false, it should also clear the NM_UNMANAGED_DEFAULT default.





+         nm_device_set_initial_unmanaged_flag (self, NM_UNMANAGED_PLATFORM_INIT, FALSE);
+
+         if (nm_platform_link_get_unmanaged (NM_PLATFORM_GET, priv->ifindex, &platform_unmanaged)) {
+              nm_device_set_unmanaged (self,
+                                       NM_UNMANAGED_DEFAULT,
+                                       platform_unmanaged,
+                                       NM_DEVICE_STATE_REASON_USER_REQUESTED);
+         }

better to first set NM_UNMANAGED_DEFAULT before clearing NM_UNMANAGED_PLATFORM_INIT. Otherwise, the device becomes managed for a short moment.





+    nm_device_set_initial_unmanaged_flag (device, NM_UNMANAGED_PLATFORM_INIT, TRUE);
+
     nm_device_dbus_export (device);
     nm_device_finish_init (device);
 
     if (try_assume) {
          connection_assumed = recheck_assume_connection (device, self);
          g_signal_connect (device, NM_DEVICE_RECHECK_ASSUME,
                            G_CALLBACK (recheck_assume_connection), self);
     }

can we even assume a connection if the device is unmanaged?

recheck_assumed() has 
»···if (nm_device_get_unmanaged_flag (device, NM_UNMANAGED_USER) ||
»···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_INTERNAL) ||
»···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_EXTERNAL_DOWN) ||
»···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_PARENT))
»···»···return FALSE;
does it miss NM_UM_PLATFROM_INIT?




> olpc: remove now-unused udev rules for OLPC Mesh devices

could you explain the the commit message why this is now unused? Not clear to me.



> platform: refactor link-type to string conversion

+nm_type_to_string (NMLinkType type)

could we call it nm_link_type_*()? Because there is already ObjectType, which gets a more prominent roles in platform-refactoring-branch.




+    /* IFLA_INFO_KIND / rtnl_link_get_type() where applicable; the rtnl type
+     * should only be specificed if it is a direct mapping.  eg, tun/tap
+     * should not be specified since both tun and tap devices use "tun".
+     * Drivers set this value from their 'struct rtnl_link_ops' structure.
+     */
+    const char *rtnl_type;

+    { NM_LINK_TYPE_TAP,           "tap",         NULL,          NULL },
+    { NM_LINK_TYPE_TUN,           "tun",         NULL,          NULL },

should't they both map to rtnl_type="tun" instead? You might do this because you make use of it later in link_extract_type, but you could avoid that by swaping

+         for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
+              if (g_strcmp0 (rtnl_type, linktypes[i].rtnl_type) == 0)
+                   return_type (linktypes[i].nm_type, rtnl_type);
+         }
 
+         if (!strcmp (rtnl_type, "tun")) {
+              NMPlatformTunProperties props;
+              guint flags;




pushed 3 commits on top of your branch
Comment 14 Dan Williams 2015-04-24 16:46:05 UTC
(In reply to Thomas Haller from comment #13)
> >> core: add generic NMDevice function to recheck availability
> 
> strange whitespace in 
>           guint               call_id;
> +         NMDeviceStateReason available_reason;
> +         NMDeviceStateReason unavailable_reason;
> +    }                   recheck_available;
>                 ^^^^^^^^^

I matched what was done for the dispatcher state stuff just below.  I assume the dispatcher was done like this to align the struct variable name to the rest of the NMDevicePrivate arguments.

> +nm_device_queue_recheck_available (NMDevice *self,
> +                                   NMDeviceStateReason available_reason,
> +                                   NMDeviceStateReason unavailable_reason)
> +{
> 
> shouldn't a new queue request always reset the state reasons?

Yeah, fixed.

> recheck_available() should combine the two debug loglines and possibly print
> state&now_available?

Ok, fixed that up and put a more descriptive log message in.

> >> platform: don't wait for udev before announcing links
> 
> init_link() now does the 
>      if (!info->driver)
>           info->driver = g_intern_string (rtnl_link_get_type (rtnllink));
> guessing first. The downside is additional work, and possibly interning
> strings that are not needed. Just move it after
>     udev_device = g_hash_table_lookup (priv->udev_devices,
> -- also, now info->driver might end up NULL, before it would never be NULL.

True; driver should only be NULL at first, but after it's set it should never be cleared.  The driver of a device will never change over its lifetime.  Fixed as you suggest, and now info->driver will never be NULL.

> +    if (info->driver && g_strcmp0 (info->driver, priv->driver)) {
> +         /* Update driver to what udev gives us */
> +         g_free (priv->driver);
> +         priv->driver = g_strdup (info->driver);
> +         g_object_notify (G_OBJECT (self), NM_DEVICE_DRIVER);
> +    }
> 
> why the check for info->driver!=NULL first? You don't want to clear the
> driver field? As said above, I think info->driver should used not to be
> NULL. But if you allow that, we can also clear priv->driver.

Right, it shouldn't ever be NULL once it's been set in the platform.  Fixed now.

> +         if (nm_platform_link_get_unmanaged (NM_PLATFORM_GET,
> priv->ifindex, &platform_unmanaged)) {
> +              nm_device_set_unmanaged (self,
> +                                       NM_UNMANAGED_DEFAULT,
> +                                       platform_unmanaged,
> +                                      
> NM_DEVICE_STATE_REASON_USER_REQUESTED);
> +         }
> 
> if nm_platform_link_get_unmanaged() returns false, it should also clear the
> NM_UNMANAGED_DEFAULT default.

Fixed; this is only correct because we've made sure that either the plink is initialized, or that we don't care about initialization at this stage.  

> +         nm_device_set_initial_unmanaged_flag (self,
> NM_UNMANAGED_PLATFORM_INIT, FALSE);
> +
> +         if (nm_platform_link_get_unmanaged (NM_PLATFORM_GET,
> priv->ifindex, &platform_unmanaged)) {
> +              nm_device_set_unmanaged (self,
> +                                       NM_UNMANAGED_DEFAULT,
> +                                       platform_unmanaged,
> +                                      
> NM_DEVICE_STATE_REASON_USER_REQUESTED);
> +         }
> 
> better to first set NM_UNMANAGED_DEFAULT before clearing
> NM_UNMANAGED_PLATFORM_INIT. Otherwise, the device becomes managed for a
> short moment.

Actually it won't, because we're calling nm_device_set_initial_unmanaged_flag (NM_UNMANAGED_PLATFORM_INIT) which doesn't change any state.  But that shows  me that we also want to use nm_device_set_initial_unmanaged_flag(NM_UNMANAGED_DEFAULT) since the decision of whether not to change state shouldn't be done until after nm_device_finish_init().

> +    nm_device_set_initial_unmanaged_flag (device,
> NM_UNMANAGED_PLATFORM_INIT, TRUE);
> +
>      nm_device_dbus_export (device);
>      nm_device_finish_init (device);
>  
>      if (try_assume) {
>           connection_assumed = recheck_assume_connection (device, self);
>           g_signal_connect (device, NM_DEVICE_RECHECK_ASSUME,
>                             G_CALLBACK (recheck_assume_connection), self);
>      }
> 
> can we even assume a connection if the device is unmanaged?

Yes, not all unmanaged flags prevent assumption, as you see in recheck_assume_connection().  But with my fixups, connection assumption still works for both software devices (virbr0) and hardware ones (eth0).

> recheck_assumed() has 
> »···if (nm_device_get_unmanaged_flag (device, NM_UNMANAGED_USER) ||
> »···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_INTERNAL) ||
> »···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_EXTERNAL_DOWN) ||
> »···    nm_device_get_unmanaged_flag (device, NM_UNMANAGED_PARENT))
> »···»···return FALSE;
> does it miss NM_UM_PLATFROM_INIT?

We want to ignore PLATFORM_INIT there, because the connection assumption has to happen in the UNMANAGED state, otherwise we might touch the device when moving to UNAVAILABLE (setting up userspace IPv6LL, bringing the device up, etc).

> > olpc: remove now-unused udev rules for OLPC Mesh devices
> 
> could you explain the the commit message why this is now unused? Not clear
> to me.

I've decided to squash this into "platform: don't use udev for link type determination" instead, since that commit is what makes this OLPC Mesh stuff obsolete.

> > platform: refactor link-type to string conversion
> 
> +nm_type_to_string (NMLinkType type)
> 
> could we call it nm_link_type_*()? Because there is already ObjectType,
> which gets a more prominent roles in platform-refactoring-branch.

Sure, done.

> +    /* IFLA_INFO_KIND / rtnl_link_get_type() where applicable; the rtnl type
> +     * should only be specificed if it is a direct mapping.  eg, tun/tap
> +     * should not be specified since both tun and tap devices use "tun".
> +     * Drivers set this value from their 'struct rtnl_link_ops' structure.
> +     */
> +    const char *rtnl_type;
> 
> +    { NM_LINK_TYPE_TAP,           "tap",         NULL,          NULL },
> +    { NM_LINK_TYPE_TUN,           "tun",         NULL,          NULL },
> 
> should't they both map to rtnl_type="tun" instead? You might do this because
> you make use of it later in link_extract_type, but you could avoid that by
> swaping

I don't do this right now, because it would screw up tun/tap interface creation later when we add it, unless we do some additional magic.  I want to make sure that whoever adds tun/tap creation has to do the whole thing correctly, instead of leaving little land-mines for them :)  Our platform creation code calls rtnl_link_set_type (nm_type_to_rtnl_type()) and that would then create tap devices as 'tun'.

> +         for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
> +              if (g_strcmp0 (rtnl_type, linktypes[i].rtnl_type) == 0)
> +                   return_type (linktypes[i].nm_type, rtnl_type);
> +         }
>  
> +         if (!strcmp (rtnl_type, "tun")) {
> +              NMPlatformTunProperties props;
> +              guint flags;

Actually later I'd like to have an additional function pointer in link_types that would handle differentiation, so that we don't need to call the tun stuff directly here.  Instead it would go through a vptr.  But that's a later commit.

> pushed 3 commits on top of your branch

They look good, thanks!

Repushed, please review.
Comment 15 Thomas Haller 2015-04-25 16:20:16 UTC
(In reply to Dan Williams from comment #14)
> (In reply to Thomas Haller from comment #13)
> > >> core: add generic NMDevice function to recheck availability
> > 
> > strange whitespace in 
> >           guint               call_id;
> > +         NMDeviceStateReason available_reason;
> > +         NMDeviceStateReason unavailable_reason;
> > +    }                   recheck_available;
> >                 ^^^^^^^^^
> 
> I matched what was done for the dispatcher state stuff just below.  I assume
> the dispatcher was done like this to align the struct variable name to the
> rest of the NMDevicePrivate arguments.

recheck_available is not aligned with recheck_assume_id/dispatcher or anything else, AFAIS.

 

> > > platform: refactor link-type to string conversion
> > 
> > +nm_type_to_string (NMLinkType type)
> > 
> > could we call it nm_link_type_*()? Because there is already ObjectType,
> > which gets a more prominent roles in platform-refactoring-branch.
> 
> Sure, done.
> 
> > +    /* IFLA_INFO_KIND / rtnl_link_get_type() where applicable; the rtnl type
> > +     * should only be specificed if it is a direct mapping.  eg, tun/tap
> > +     * should not be specified since both tun and tap devices use "tun".
> > +     * Drivers set this value from their 'struct rtnl_link_ops' structure.
> > +     */
> > +    const char *rtnl_type;
> > 
> > +    { NM_LINK_TYPE_TAP,           "tap",         NULL,          NULL },
> > +    { NM_LINK_TYPE_TUN,           "tun",         NULL,          NULL },
> > 
> > should't they both map to rtnl_type="tun" instead? You might do this because
> > you make use of it later in link_extract_type, but you could avoid that by
> > swaping
> 
> I don't do this right now, because it would screw up tun/tap interface
> creation later when we add it, unless we do some additional magic.  I want
> to make sure that whoever adds tun/tap creation has to do the whole thing
> correctly, instead of leaving little land-mines for them :)  Our platform
> creation code calls rtnl_link_set_type (nm_type_to_rtnl_type()) and that
> would then create tap devices as 'tun'.

A function to add a tap device could not use the generic nm_platform_link_add(NM_LINK_TYPE_TAP) anyway, but would need a special nm_platform_link_add_tun(tap=TRUE). Additional (simple) logic will be needed anyway.

Better handle special cases where needed then making wrong promises: nm_type_to_rtnl_type_string() for TAP would expect "tun".



> > +         for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
> > +              if (g_strcmp0 (rtnl_type, linktypes[i].rtnl_type) == 0)
> > +                   return_type (linktypes[i].nm_type, rtnl_type);
> > +         }
> >  
> > +         if (!strcmp (rtnl_type, "tun")) {
> > +              NMPlatformTunProperties props;
> > +              guint flags;
> 
> Actually later I'd like to have an additional function pointer in link_types
> that would handle differentiation, so that we don't need to call the tun
> stuff directly here.  Instead it would go through a vptr.  But that's a
> later commit.

Seems over-engineered (me saying!! :) ). Are there any other cases then tun/tap you have in mind?





nm_type_to_rtnl_type_string (NMLinkType type)
nm_type_to_string (NMLinkType type)

did you forget the renaming to nm_link_type*()?
Comment 16 Dan Williams 2015-04-27 18:59:07 UTC
(In reply to Thomas Haller from comment #15)
> (In reply to Dan Williams from comment #14)
> > (In reply to Thomas Haller from comment #13)
> > > >> core: add generic NMDevice function to recheck availability
> > > 
> > > strange whitespace in 
> > >           guint               call_id;
> > > +         NMDeviceStateReason available_reason;
> > > +         NMDeviceStateReason unavailable_reason;
> > > +    }                   recheck_available;
> > >                 ^^^^^^^^^
> > 
> > I matched what was done for the dispatcher state stuff just below.  I assume
> > the dispatcher was done like this to align the struct variable name to the
> > rest of the NMDevicePrivate arguments.
> 
> recheck_available is not aligned with recheck_assume_id/dispatcher or
> anything else, AFAIS.

You're right, it's aligned with tabs not spaces, which is wrong.  Fixed.

> > > > platform: refactor link-type to string conversion
> > > 
> > > +nm_type_to_string (NMLinkType type)
> > > 
> > > could we call it nm_link_type_*()? Because there is already ObjectType,
> > > which gets a more prominent roles in platform-refactoring-branch.
> > 
> > Sure, done.
> > 
> > > +    /* IFLA_INFO_KIND / rtnl_link_get_type() where applicable; the rtnl type
> > > +     * should only be specificed if it is a direct mapping.  eg, tun/tap
> > > +     * should not be specified since both tun and tap devices use "tun".
> > > +     * Drivers set this value from their 'struct rtnl_link_ops' structure.
> > > +     */
> > > +    const char *rtnl_type;
> > > 
> > > +    { NM_LINK_TYPE_TAP,           "tap",         NULL,          NULL },
> > > +    { NM_LINK_TYPE_TUN,           "tun",         NULL,          NULL },
> > > 
> > > should't they both map to rtnl_type="tun" instead? You might do this because
> > > you make use of it later in link_extract_type, but you could avoid that by
> > > swaping
> > 
> > I don't do this right now, because it would screw up tun/tap interface
> > creation later when we add it, unless we do some additional magic.  I want
> > to make sure that whoever adds tun/tap creation has to do the whole thing
> > correctly, instead of leaving little land-mines for them :)  Our platform
> > creation code calls rtnl_link_set_type (nm_type_to_rtnl_type()) and that
> > would then create tap devices as 'tun'.
> 
> A function to add a tap device could not use the generic
> nm_platform_link_add(NM_LINK_TYPE_TAP) anyway, but would need a special
> nm_platform_link_add_tun(tap=TRUE). Additional (simple) logic will be needed
> anyway.
>
> Better handle special cases where needed then making wrong promises:
> nm_type_to_rtnl_type_string() for TAP would expect "tun".

If we map TYPE_TAP -> rtnl_type="tun" then anything could call nm_platform_link_add(NM_LINK_TYPE_TAP) and get a 'tun' link back.  That's what I'm preventing by keeping the rtnl_type NULL until the point when somebody adds proper tun support.  I just don't think setting them to "tun" is worth it just for completeness because nothing in the code uses them right now, and anything that would use them right now would be broken.  Best to do prevent that now, and let it be done correctly later.

> > > +         for (i = 0; i < G_N_ELEMENTS (linktypes); i++) {
> > > +              if (g_strcmp0 (rtnl_type, linktypes[i].rtnl_type) == 0)
> > > +                   return_type (linktypes[i].nm_type, rtnl_type);
> > > +         }
> > >  
> > > +         if (!strcmp (rtnl_type, "tun")) {
> > > +              NMPlatformTunProperties props;
> > > +              guint flags;
> > 
> > Actually later I'd like to have an additional function pointer in link_types
> > that would handle differentiation, so that we don't need to call the tun
> > stuff directly here.  Instead it would go through a vptr.  But that's a
> > later commit.
> 
> Seems over-engineered (me saying!! :) ). Are there any other cases then
> tun/tap you have in mind?

True, I think at this point tun/tap is the only one that made this mistake in the kernel.

> nm_type_to_rtnl_type_string (NMLinkType type)
> nm_type_to_string (NMLinkType type)
> 
> did you forget the renaming to nm_link_type*()?

I did, pushed a fixup for that.

Repushed!
Comment 17 Thomas Haller 2015-04-27 20:38:27 UTC
>> platform: rework link type detection for better fallback (bgo #743209)

already before, link_extract_type() can return the string from 
  type = rtnl_link_get_type (rtnllink);
as
  return_type (NM_LINK_TYPE_UNKNOWN, type);


this string is owned by the rtnllink instance.


Can we just drop that and return "unknown"? I don't see why we want here anything else. 

Actually, I would drop the @out_name argument entirely as we now have nm_link_type_to_rtnl_type_string().




Otherwise LGTM (this refactoring is an improvement. I like it)
Comment 18 Thomas Haller 2015-04-28 07:11:07 UTC
(In reply to Thomas Haller from comment #17)
> >> platform: rework link type detection for better fallback (bgo #743209)
> 
> already before, link_extract_type() can return the string from 
>   type = rtnl_link_get_type (rtnllink);
> as
>   return_type (NM_LINK_TYPE_UNKNOWN, type);
> 
> 
> this string is owned by the rtnllink instance.
> 
> 
> Can we just drop that and return "unknown"? I don't see why we want here
> anything else. 
> 
> Actually, I would drop the @out_name argument entirely as we now have
> nm_link_type_to_rtnl_type_string().
> 

I did that now, pushed patches on your branch.

How do you like that?
Comment 19 Thomas Haller 2015-04-28 08:30:57 UTC
(In reply to Thomas Haller from comment #18)
> (In reply to Thomas Haller from comment #17)
> > >> platform: rework link type detection for better fallback (bgo #743209)
> > 
> > already before, link_extract_type() can return the string from 
> >   type = rtnl_link_get_type (rtnllink);
> > as
> >   return_type (NM_LINK_TYPE_UNKNOWN, type);
> > 
> > 
> > this string is owned by the rtnllink instance.
> > 
> > 
> > Can we just drop that and return "unknown"? I don't see why we want here
> > anything else. 
> > 
> > Actually, I would drop the @out_name argument entirely as we now have
> > nm_link_type_to_rtnl_type_string().
> > 
> 
> I did that now, pushed patches on your branch.
> 
> How do you like that?

I reworked it again. The previous version droped @type_name entirely (see dcbw/bgo747628-platform-udev-excise-drop-type-name).
I think it's better to keep it, see dcbw/bgo747628-platform-udev-excise.
Comment 20 Thomas Haller 2015-04-28 09:52:03 UTC
(In reply to Dan Williams from comment #16)
> (In reply to Thomas Haller from comment #15)
> > (In reply to Dan Williams from comment #14)
> > > (In reply to Thomas Haller from comment #13)
> > > > >> core: add generic NMDevice function to recheck availability
> > > I don't do this right now, because it would screw up tun/tap interface
> > > creation later when we add it, unless we do some additional magic.  I want
> > > to make sure that whoever adds tun/tap creation has to do the whole thing
> > > correctly, instead of leaving little land-mines for them :)  Our platform
> > > creation code calls rtnl_link_set_type (nm_type_to_rtnl_type()) and that
> > > would then create tap devices as 'tun'.
> > 
> > A function to add a tap device could not use the generic
> > nm_platform_link_add(NM_LINK_TYPE_TAP) anyway, but would need a special
> > nm_platform_link_add_tun(tap=TRUE). Additional (simple) logic will be needed
> > anyway.
> >
> > Better handle special cases where needed then making wrong promises:
> > nm_type_to_rtnl_type_string() for TAP would expect "tun".
> 
> If we map TYPE_TAP -> rtnl_type="tun" then anything could call
> nm_platform_link_add(NM_LINK_TYPE_TAP) and get a 'tun' link back.  That's
> what I'm preventing by keeping the rtnl_type NULL until the point when
> somebody adds proper tun support.  I just don't think setting them to "tun"
> is worth it just for completeness because nothing in the code uses them
> right now, and anything that would use them right now would be broken.  Best
> to do prevent that now, and let it be done correctly later.

You cannot blindly call nm_platform_link_add(NM_LINK_TYPE_SOME_TYPE) and expect it to work. It will also not work if we implement proper tun support -- at least, it will not work generically, without treating certain type special.

With this reasoning you could also remove
+    { NM_LINK_TYPE_INFINIBAND,    "infiniband",  "ipoib",       NULL },
because nm_platform_link_add(NM_LINK_TYPE_INFINIBAND) doesn't work either.


Note that all calls to nm_platform_link_add() pass a compile time constant: 
  $ git grep '\<nm_platform_link_add\|build_rtnl_link\>'





Seems you want later:

nm_platform_link_add(NMLinkType type)
{
   rtnl_link_type = nm_link_type_to_string(type);
   if (rtnl_link_type) {
       /* add generically. */
   } else if (type == NM_LINK_TYPE_TAP || type == NM_LINK_TYPE_TUN)
       /* add tun/tap. */
   } else
       g_return_if_reached();
}


You could just do:

nm_platform_link_add(NMLinkType type)
{
   if (type == NM_LINK_TYPE_TAP || type == NM_LINK_TYPE_TUN) {
       /* add tun/tap. */
       return;
   }
   rtnl_link_type = nm_link_type_to_string(type);
   if (rtnl_link_type) {
       /* add generically. */
   } else
       g_return_if_reached();
}
Comment 21 Dan Williams 2015-04-30 15:37:19 UTC
(In reply to Thomas Haller from comment #20)
> (In reply to Dan Williams from comment #16)
> > (In reply to Thomas Haller from comment #15)
> > > (In reply to Dan Williams from comment #14)
> > > > (In reply to Thomas Haller from comment #13)
> > > > > >> core: add generic NMDevice function to recheck availability
> > > > I don't do this right now, because it would screw up tun/tap interface
> > > > creation later when we add it, unless we do some additional magic.  I want
> > > > to make sure that whoever adds tun/tap creation has to do the whole thing
> > > > correctly, instead of leaving little land-mines for them :)  Our platform
> > > > creation code calls rtnl_link_set_type (nm_type_to_rtnl_type()) and that
> > > > would then create tap devices as 'tun'.
> > > 
> > > A function to add a tap device could not use the generic
> > > nm_platform_link_add(NM_LINK_TYPE_TAP) anyway, but would need a special
> > > nm_platform_link_add_tun(tap=TRUE). Additional (simple) logic will be needed
> > > anyway.
> > >
> > > Better handle special cases where needed then making wrong promises:
> > > nm_type_to_rtnl_type_string() for TAP would expect "tun".
> > 
> > If we map TYPE_TAP -> rtnl_type="tun" then anything could call
> > nm_platform_link_add(NM_LINK_TYPE_TAP) and get a 'tun' link back.  That's
> > what I'm preventing by keeping the rtnl_type NULL until the point when
> > somebody adds proper tun support.  I just don't think setting them to "tun"
> > is worth it just for completeness because nothing in the code uses them
> > right now, and anything that would use them right now would be broken.  Best
> > to do prevent that now, and let it be done correctly later.
> 
> You cannot blindly call nm_platform_link_add(NM_LINK_TYPE_SOME_TYPE) and
> expect it to work. It will also not work if we implement proper tun support
> -- at least, it will not work generically, without treating certain type
> special.
> 
> With this reasoning you could also remove
> +    { NM_LINK_TYPE_INFINIBAND,    "infiniband",  "ipoib",       NULL },
> because nm_platform_link_add(NM_LINK_TYPE_INFINIBAND) doesn't work either.

Ok, I removed that and changed the comment in a fixup, then rebased/reordered.  All your changes look awesome.  Any better now?
Comment 22 Thomas Haller 2015-05-01 14:06:13 UTC
Seems you want 9e2c991559b0fec952f170b60a9bd74ec5688c18 to be
  "fixup! platform: refactor link-type to string conversion"
instead
  "fixup! platform: rework link type detection for better fallback (bgo #743209)"




>>  platform: refactor extraction of type-name for link
    
spell error in commit message:
    
    Split up this behavior and treat those value independently.
                                           ^^^^^



I pushed one squash! commit. After that, it all LGTM.
I nice cleanup!!
Comment 23 Thomas Haller 2015-05-01 14:07:49 UTC
(In reply to Thomas Haller from comment #22)

> spell error in commit message:

spelling error in "spell error" :)
Comment 24 Dan Williams 2015-05-01 20:31:28 UTC
Merged to master:

2599dadc2859262de784567384ba72ab92204d55
388b7830f322b60960884328ff51f7b4df0ef3d3
b484b03acf66f99ef59cc688de291f07f1cc5603
(and a few more minor patches)
Comment 25 Thomas Haller 2015-05-04 13:58:23 UTC
hm, this broke the platform route tests...


  sudo make check -C src/platform


  NMTST_DEBUG="no-debug,sudo-cmd=$PWD/tools/test-sudo-wrapper.sh" \
    make -C src/platform/tests/ check
Comment 26 Thomas Haller 2015-05-04 16:42:34 UTC
(In reply to Thomas Haller from comment #25)
> hm, this broke the platform route tests...
> 
> 
>   sudo make check -C src/platform
> 
> 
>   NMTST_DEBUG="no-debug,sudo-cmd=$PWD/tools/test-sudo-wrapper.sh" \
>     make -C src/platform/tests/ check

tests fixed now:

0731da1 valgrind: add libnl suppression
f614ebe platform: re-enable the platform link test
b22bf15 platform: fix root-tests after adding link detection without udev
7572837 platform: keep udev-device in udev_device_added() even if there is...