After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 725261 - [review] dcbw/carrier-fixes: fix carrier handling for master devices and IPv4 method matching
[review] dcbw/carrier-fixes: fix carrier handling for master devices and IPv4...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
unspecified
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2014-02-26 19:07 UTC by Dan Williams
Modified: 2014-03-04 22:02 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Williams 2014-02-26 19:07:56 UTC
Recently we broke carrier handling during IP setup for master devices when ignore-carrier is set, such that DHCP would be attempted even if the device had no carrier, which is guaranteed to fail.  Only static configuration should be allowed to proceed when the master has no carrier.

Second, if DHCP or IPv4 addressing fails (due to the above problem) and thus the interface has no IPv4 address, and NetworkManager is restarted, the generated connection would read the IPv4 method as "disabled" because the interface had no IPv4 addresses.  This caused match failures to existing connections where the IPv4 method was 'auto', which was in fact the pre-restart method.  So if the interface has no carrier, allow matching between IPv4 disabled and auto methods.
Comment 1 Dan Winship 2014-02-27 16:20:21 UTC
> core: match IPv4 'disabled' method to 'auto' when device has no link

>    If IPv4 configuration did not succeed or the device has no IPv4 addresses
>    when NM restarts, it will detect the existing device configuration as
>    'disabled'.

Hm... should we worry about the "did not succeed" case too?

Also, if ignore-carrier is false then we should allow matching a static connection against disabled as well, right?

> nm_utils_match_connection (GSList *connections,
>                            NMConnection *original,
>+                           gboolean device_has_carrier,

seems like it would be better to just pass the NMDevice instead. (Eg, then we could check nm_device_ignore_carrier() too.)
Comment 2 Dan Williams 2014-02-27 19:58:57 UTC
(In reply to comment #1)
> > core: match IPv4 'disabled' method to 'auto' when device has no link
> 
> >    If IPv4 configuration did not succeed or the device has no IPv4 addresses
> >    when NM restarts, it will detect the existing device configuration as
> >    'disabled'.
> 
> Hm... should we worry about the "did not succeed" case too?

Not quite sure what you mean here...

> Also, if ignore-carrier is false then we should allow matching a static
> connection against disabled as well, right?

I'm not sure we should, actually; if it's a static configuration then the link/carrier doesn't matter, because the device will be configured with the IP.  The on restart we'll see it's a static connection and match it just fine.

> > nm_utils_match_connection (GSList *connections,
> >                            NMConnection *original,
> >+                           gboolean device_has_carrier,
> 
> seems like it would be better to just pass the NMDevice instead. (Eg, then we
> could check nm_device_ignore_carrier() too.)

Yeah, but that's a lot harder to unit test since we'd have to make a fake NMDevice object and pass that in.
Comment 3 Dan Winship 2014-02-27 20:26:13 UTC
(In reply to comment #2)
> > Hm... should we worry about the "did not succeed" case too?
> 
> Not quite sure what you mean here...

Like, if you have a static IPv4 config with can-fail=TRUE, and there's an error applying it, and then you restart NM, it will capture the device state as being METHOD_DISABLED, so it won't match the METHOD_MANUAL connection. Should we deal with that?

> > Also, if ignore-carrier is false then we should allow matching a static
> > connection against disabled as well, right?
> 
> I'm not sure we should, actually; if it's a static configuration then the
> link/carrier doesn't matter

Not if ignore-carrier is FALSE.

Er, wait, your last patch on this branch breaks that and makes ignore-carrier always be in effect for activating static connections...

If ignore-carrier is FALSE, then you are not supposed to be able to activate any connection on a device that doesn't have carrier.
Comment 4 Dan Williams 2014-02-28 18:06:00 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > > Hm... should we worry about the "did not succeed" case too?
> > 
> > Not quite sure what you mean here...
> 
> Like, if you have a static IPv4 config with can-fail=TRUE, and there's an error
> applying it, and then you restart NM, it will capture the device state as being
> METHOD_DISABLED, so it won't match the METHOD_MANUAL connection. Should we deal
> with that?

I'm inclined to say no, because if some static IPv4 configuration fails then we either have a programming bug (EINVAL when adding routes/addresses) or NM got killed during setup, right?  One we can/should fix, the other I'm inclined to ignore...

What would be really nice is to have some interface attribute into which we could dump the connection UUID that would persist, and then read it back when starting NM and if anything is materially different, create a new connection from the interface.  Essentially, invert our current handling and assume the given connection is correct unless it's not.  Maybe I'm dreaming.

> > > Also, if ignore-carrier is false then we should allow matching a static
> > > connection against disabled as well, right?
> > 
> > I'm not sure we should, actually; if it's a static configuration then the
> > link/carrier doesn't matter
> 
> Not if ignore-carrier is FALSE.

So to be sure I understand; you are advocating that if a device has no carrier, and that ignore-carrier for that device is false, that NM should match the generated 'disabled' connection to an existing 'manual' connection?

If ignore-carrier=FALSE for a hardware device, then if it has no IPv4 address on NM restart, it will never have gotten to IP configuration state and thus could not have previously been activated, and we don't want to match disabled=manual.

If ignore-carrier=FALSE for a software device, and it has no IPv4 address on NM restart, then it also would not have gotten to IP configuration state.  In this case however, it might have been waiting for slaves before applying the static IP configuration, so yes, in this case we do wish to match disabled=manual.

So here you are 100% correct for software devices.  I guess the question is whether we want to treat software/hardware devices the same for consistency?

> Er, wait, your last patch on this branch breaks that and makes ignore-carrier
> always be in effect for activating static connections...

I'll assume we're talking about "core: postpone non-static master IP configuration until carrier".

That patch only changes behavior for DHCP connections, and the only thing it does is remove ignore_carrier from the equation (thus loosening the if() condition), because for DHCP on master interfaces we always want to respect carrier.  I don't see how the last patch changes static config at all?

> If ignore-carrier is FALSE, then you are not supposed to be able to activate
> any connection on a device that doesn't have carrier.

That's true for hardware devices (ethernet) but not true for software devices, because bridge/bond/etc don't have a carrier until they have slaves, and we start activating the master before any slaves show up.  Master devices can begin activation, but they wait in IP config until they have a carrier (which might involve waiting for slaves).  Ethernet obviously respects ignore_carrier completely.

So to summarize, when ignore-carrier=FALSE, for hardware devices we block any activation without carrier.  But for software devices we allow activation but block IP configuration until there is a carrier.
Comment 5 Dan Winship 2014-02-28 21:01:01 UTC
(In reply to comment #4)
> > Er, wait, your last patch on this branch breaks that and makes ignore-carrier
> > always be in effect for activating static connections...
> 
> I'll assume we're talking about "core: postpone non-static master IP
> configuration until carrier".
> 
> That patch only changes behavior for DHCP connections

Ah, true.

I was confused because we're barely checking ignore_carrier anywhere any more...

In fact, it looks like at activation time, the sole effect of ignore_carrier is that it allows a device to transition from UNAVAILABLE to DISCONNECTED regardless of whether it has carrier.

However, check_connection_available() will then consider any non-static connections to be unavailable if there is no carrier. Except that bond/bridge/team override it to claim they're available regardless, and so they need to check carrier again during the activation process.

OK, so this seems correct I guess, though I wonder if we could clean it up somehow. And maybe we should just get rid of the UNAVAILABLE state entirely, and move all of its logic into check_connection_available()...

Also, while investigating, I noticed that nm_device_can_interrupt_activation() is now unused, and can just go away.
Comment 6 Dan Winship 2014-02-28 21:04:42 UTC
(In reply to comment #4)
> If ignore-carrier=FALSE for a hardware device, then if it has no IPv4 address
> on NM restart, it will never have gotten to IP configuration state and thus
> could not have previously been activated

Right. I'd confused myself.
Comment 7 Thomas Haller 2014-03-03 13:17:02 UTC
This branch looks good to me (but I didn't test it)
Comment 8 Dan Williams 2014-03-03 22:19:32 UTC
(In reply to comment #5)
> OK, so this seems correct I guess, though I wonder if we could clean it up
> somehow. And maybe we should just get rid of the UNAVAILABLE state entirely,
> and move all of its logic into check_connection_available()...

If we got rid of UNAVAILABLE, then what used to be UNAVAILABLE state would be (DISCONNECTED && zero available connections).  But that doesn't let a client differentiate between (a) could activate with a new connection, eg hidden WiFi network or unconfigured WWAN connection, or (b) actually unusable and nothing can be done until some physical thing is fixed (carrier, missing SIM, rfkill).

We could make 'unavailable' a device property instead of a state, but that seems like trading one thing for another without much benefit.  But perhaps that would clean up internal code, which would be a benefit?  Worth investigating.
Comment 9 Dan Winship 2014-03-04 15:29:17 UTC
(In reply to comment #8)
> We could make 'unavailable' a device property instead of a state, but that
> seems like trading one thing for another without much benefit.  But perhaps
> that would clean up internal code, which would be a benefit?  Worth
> investigating.

It might be nice to have an "unavailable" bitfield, which can independently indicate NO_FIRMWARE, NO_CARRIER, NO_SIM, HARDWARE_DISABLED, SOFTWARE_DISABLED, PLUGIN_NOT_INSTALLED, SUPPORT_SOFTWARE_NOT_INSTALLED, UNSUPPORTED_INTERFACE_TYPE, ...

We could also mix "managed" into that: "unmanaged" would just be "unavailable" due to ADMIN_DISABLED (and then we could distinguish a device that was unmanaged-and-disconnected from a device that was unmanaged-and-connected).
Comment 10 Dan Williams 2014-03-04 22:02:28 UTC
Merged the branch.

We could have the unavailable bitfield even without getting rid of the UNAVAILABLE state, seems like a win.  Mind filing a bug for that?