After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 710216 - [review] dcbw/ignore-carrier-fix: fix handling of ignore-carrier for non-static connections
[review] dcbw/ignore-carrier-fix: fix handling of ignore-carrier for non-stat...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-10-15 21:19 UTC by Dan Williams
Modified: 2013-11-07 02:15 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Williams 2013-10-15 21:19:08 UTC
I observed a bug where, with NetworkManager-config-server installed, when I had both a non-autoconnect bond slave connection and a DHCP connection configured for my eth0, and eth0 had no cable plugged in, NetworkManager would activate the DHCP connection on startup.

The problem is two-fold.

First, since the device was set to ignore-carrier by NetworkManager-config-server, nm-device.c::is_available() would return TRUE because it would match the non-autoconnect bond slave connection as "static" and allow the device to move from UNAVAILABLE -> DISCONNECTED on startup.

Second, since the device was in DISCONNECTED state now, nm_device_can_activate() would return TRUE for the DHCP connection, because it assumes that if a device is in DISCONNECTED state, that it can activate any connection.  Only when the device is in UNAVAILABLE state does it do special logic for handling ignore-carrier.

To fix this, make is_available() stop caring about ignore-carrier, and return it to "can this device activate stuff at a layer-2 level".  For the generic implementation, that means whether the carrier is on/off.  The only place that actually cared about the ignore-carrier logic here was the "default unmanaged" sections of nm_device_can_activate(), so it there.

Next, since we ever want to activate an ignore-carrier device when either (a) the user manually requests it, or (b) when there's an autoconnect static connection we can use.  For case (a), nm_device_can_activate() will be called with a connection, and thus the "if (connection && connection_is_static(connection))" will return TRUE which is what we expect.  For case (b), change has_static_connection() to only return TRUE for autoconnect connections.

Finally, when an ignore-carrier device is deactivated, it should return to the UNAVAILABLE state if it has no carrier, like it was in before it was activated without carrier.  This fixes a problem where the device would return to DISCONNECTED state and allow the user to activate a DHCP/automatic connection on the interface when that shouldn't be  possible due to lack of the carrier.
Comment 1 Dan Winship 2013-10-21 20:13:06 UTC
from email while bugzilla was down...

--- danw ---

> core: fix handling of ignore-carrier for slaves and !autoconnect static connections

Most of this looks good, but:

>     Instead,
>     deal with no-carrier in the only place that actually cares, which is
>     nm_device_can_activate() for the default_unmanaged case.

That's wrong; default_unmanaged is orthogonal to ignore-carrier. It's
the state used by NMDeviceGeneric, to say that the device should remain
unmanaged unless the user explicitly activates a connection that's
explicitly tied to that device.


The other patch looks right.



--- dcbw ---

Hmm.  Ok, so what *used* to happen is this:

if (priv->default_unmanaged) {
    if (!nm_device_is_available (self))
        return FALSE;
}

except that NMDeviceGeneric implements is_available() as always
returning TRUE.  So maybe we can rework and simplify this.  So, in your
opinion, what *should* the semantics be here, and then I'll go implement
that instead of trying to duplicate out the old (possibly broken)
behavior.



--- danw ---

> except that NMDeviceGeneric implements is_available() as always
> returning TRUE.

That seems wrong... it should probably be removed.

> So maybe we can rework and simplify this.  So, in your
> opinion, what *should* the semantics be here, and then I'll go implement
> that instead of trying to duplicate out the old (possibly broken)
> behavior.

I started trying to figure that out before, and it got complicated, so I
decided to see if I could trick you into figuring it out instead. :)

The first two checks in nm_device_can_activate() (manager_managed and
connection_compatible) are correct as-is.

At that point, if the device is DISCONNECTED or above, then we know it
can activate.

Otherwise, there are three remaining cases where we might be able to
activate:

  a) the device is UNMANAGED, is default_unmanaged, and is_available()
  b) the device is UNAVAILABLE, is ignore_carrier, and would be
     is_available() if it had carrier
  c) the combination of the above (ie, default_unmanaged +
     ignore_carrier)

The catch here is "would be is_available() if it had carrier". I think
we need to change is_available() to take a "gboolean except_for_carrier"
arg, so you can ask "would the device be available if it had carrier",
and then if that, and ignore_carrier, then you're good to go.
Comment 2 Thomas Haller 2013-10-22 13:08:29 UTC
I don't have an opinion on comment #1, so, for me the patches look good.

Only question, why does nm_device_can_activate return FALSE, if priv->carrier? -- at least in the "default_unmanaged" or "state < NM_DEVICE_STATE_DISCONNECTED" case.
Comment 3 Dan Winship 2013-10-22 13:24:42 UTC
In general, a device can only be connected if it's managed and available. If a device is managed, then its state should be >= UNAVAILABLE, and if it's managed and available, its state should be >= DISCONNECTED.

So in general, if state < DISCONNECTED, then it's not activatable.

But there are two special cases: if the device is default_unmanaged (ie, NMDeviceGeneric), then it's activatable directly from UNMANAGED, as long as it's available.

Or, if the device is managed but UNAVAILABLE, but the only reason it's UNAVAILABLE is because it doesn't have carrier, and we're ignoring carrier on this device, and there's a connection that doesn't need carrier, then it's activatable.
Comment 4 Thomas Haller 2013-10-22 14:36:31 UTC
(In reply to comment #3)

Shouldn't then the expression be:

         /* If the device is not available, perhaps it can become available
          * and activate a connection if its carrier should be ignored.
          */
-        return (   !nm_device_is_available (self)
+        return (   nm_device_is_available (self)
                 && !priv->carrier
                 && priv->ignore_carrier
                 && has_static_autoconnect_connection (self));




And the check about the carrier I still don't understand. Why not:


          * and activate a connection if its carrier should be ignored.
          */
         return (   !nm_device_is_available (self)
-                && !priv->carrier
-                && priv->ignore_carrier
+                && (priv->carrier || priv->ignore_carrier)
                 && has_static_autoconnect_connection (self));
     } else if (priv->state < NM_DEVICE_STATE_DISCONNECTED) {
-        if (priv->state != NM_DEVICE_STATE_UNAVAILABLE || priv->carrier || !priv->ignore_carrier)
+        if (priv->state != NM_DEVICE_STATE_UNAVAILABLE || !(priv->carrier || priv->ignore_carrier))
             return FALSE;
 
         /* @self is UNAVAILABLE because it doesn't have carrier, but
Comment 5 Dan Winship 2013-10-22 14:56:26 UTC
(In reply to comment #4)
> -        return (   !nm_device_is_available (self)
> +        return (   nm_device_is_available (self)
>                  && !priv->carrier

With dcbw's patch, if the device doesn't have carrier, then nm_device_is_available() will always return FALSE.

>          return (   !nm_device_is_available (self)
> -                && !priv->carrier
> -                && priv->ignore_carrier
> +                && (priv->carrier || priv->ignore_carrier)

Likewise, if is_available() returned FALSE, and we do have carrier, then that means there's some other reason it's unavailable (eg, no firmware), so it can't be activated.


This basically gets back to what I said at the end of comment 1; we need to be able to ask "is the device available, ignoring carrier", so that we can make these checks simpler / more obvious / actually fully-correct.
Comment 6 Thomas Haller 2013-10-23 09:27:30 UTC
(In reply to comment #5)

> Likewise, if is_available() returned FALSE, and we do have carrier, then that
> means there's some other reason it's unavailable (eg, no firmware), so it can't
> be activated.

I understand now. As you point out, from (!is_available() && !priv->carrier) it does not logically follow that is_available() is FALSE because it has no carrier (and it does not follow, that it would be is_available() if it only had the carrier).

I strongly agree with your suggested 'except_for_carrier' parameter. Or call the parameter "ignore_carrier", to use the same name here.
Comment 7 Jiri Klimes 2013-10-25 10:17:32 UTC
> core: remove NMDevice's is_up class function
It should not include libndp update.
Comment 8 Dan Williams 2013-11-05 16:46:45 UTC
Back up for review.
Comment 9 Dan Winship 2013-11-05 17:11:05 UTC
wow... vastly simpler, and seems correct

> 	if (nm_device_check_connection_compatible (self, connection, NULL)) {
>-		/* Let subclasses implement additional checks on the connection */
>-		if (   NM_DEVICE_GET_CLASS (self)->check_connection_available
>-		    && NM_DEVICE_GET_CLASS (self)->check_connection_available (self, connection, NULL)) {
>-
>+		if (NM_DEVICE_GET_CLASS (self)->check_connection_available (self, connection, NULL)) {
>+			/* Let subclasses implement additional checks on the connection */
> 			g_hash_table_insert (NM_DEVICE_GET_PRIVATE (self)->available_connections,

Was the moving of the comment accidental? It makes more sense to put it before the code it's describing, not after.

Or just get rid of it.

Other than that, the commit looks fine.


> core: add device-generated connection to settings

was this commit supposed to be here? Why? What does it do wrt ignore-carrier?
Comment 10 Jiri Klimes 2013-11-06 11:12:04 UTC
> core: rework ignore-carrier device behavior
+	/* Final connection must be available on device */
+	if (!nm_device_connection_is_available (device, connection)) {
+		g_set_error_literal (error, NM_MANAGER_ERROR, NM_MANAGER_ERROR_UNKNOWN_CONNECTION,
+		                     "Connection is not available on the device at this time.");
We could possibly include device and connection names in the message.

src/device/nm-device.h:
gboolean        nm_device_has_carrier    (NMDevice *dev);
gboolean		nm_device_ignore_carrier (NMDevice *dev);

Bad indentation, you may fix that when you are there.

Other than that, the branch looks good.
Comment 11 Dan Williams 2013-11-06 23:54:03 UTC
(In reply to comment #9)
> > core: add device-generated connection to settings
> 
> was this commit supposed to be here? Why? What does it do wrt ignore-carrier?

Updated the commit message; was cherry-picked from pavlix/runtime.  Basically, *any* connection that gets activated must be an instance of NMSettingsConnection.  But generate_connection() wasn't doing that, it was just creating an NMConnection and returning it.  But if we expose a connection externally (which activation does) then that connection needs to be added to NMSettings so clients can see it.
Comment 12 Dan Williams 2013-11-07 02:15:57 UTC
Fixed up for review comments and merged.

Added one fix for master interfaces to ignore carrier when determining connection availability to match pre-ignore-carrier-fix behavior.  The master devices (bridge, bond, team, etc) shouldn't care about carrier when filtering which connections they can activate, since the carrier state is dependent on the slaves which are added, which is only known during activation.