GNOME Bugzilla – Bug 725261
[review] dcbw/carrier-fixes: fix carrier handling for master devices and IPv4 method matching
Last modified: 2014-03-04 22:02:28 UTC
Recently we broke carrier handling during IP setup for master devices when ignore-carrier is set, such that DHCP would be attempted even if the device had no carrier, which is guaranteed to fail. Only static configuration should be allowed to proceed when the master has no carrier. Second, if DHCP or IPv4 addressing fails (due to the above problem) and thus the interface has no IPv4 address, and NetworkManager is restarted, the generated connection would read the IPv4 method as "disabled" because the interface had no IPv4 addresses. This caused match failures to existing connections where the IPv4 method was 'auto', which was in fact the pre-restart method. So if the interface has no carrier, allow matching between IPv4 disabled and auto methods.
> core: match IPv4 'disabled' method to 'auto' when device has no link > If IPv4 configuration did not succeed or the device has no IPv4 addresses > when NM restarts, it will detect the existing device configuration as > 'disabled'. Hm... should we worry about the "did not succeed" case too? Also, if ignore-carrier is false then we should allow matching a static connection against disabled as well, right? > nm_utils_match_connection (GSList *connections, > NMConnection *original, >+ gboolean device_has_carrier, seems like it would be better to just pass the NMDevice instead. (Eg, then we could check nm_device_ignore_carrier() too.)
(In reply to comment #1) > > core: match IPv4 'disabled' method to 'auto' when device has no link > > > If IPv4 configuration did not succeed or the device has no IPv4 addresses > > when NM restarts, it will detect the existing device configuration as > > 'disabled'. > > Hm... should we worry about the "did not succeed" case too? Not quite sure what you mean here... > Also, if ignore-carrier is false then we should allow matching a static > connection against disabled as well, right? I'm not sure we should, actually; if it's a static configuration then the link/carrier doesn't matter, because the device will be configured with the IP. The on restart we'll see it's a static connection and match it just fine. > > nm_utils_match_connection (GSList *connections, > > NMConnection *original, > >+ gboolean device_has_carrier, > > seems like it would be better to just pass the NMDevice instead. (Eg, then we > could check nm_device_ignore_carrier() too.) Yeah, but that's a lot harder to unit test since we'd have to make a fake NMDevice object and pass that in.
(In reply to comment #2) > > Hm... should we worry about the "did not succeed" case too? > > Not quite sure what you mean here... Like, if you have a static IPv4 config with can-fail=TRUE, and there's an error applying it, and then you restart NM, it will capture the device state as being METHOD_DISABLED, so it won't match the METHOD_MANUAL connection. Should we deal with that? > > Also, if ignore-carrier is false then we should allow matching a static > > connection against disabled as well, right? > > I'm not sure we should, actually; if it's a static configuration then the > link/carrier doesn't matter Not if ignore-carrier is FALSE. Er, wait, your last patch on this branch breaks that and makes ignore-carrier always be in effect for activating static connections... If ignore-carrier is FALSE, then you are not supposed to be able to activate any connection on a device that doesn't have carrier.
(In reply to comment #3) > (In reply to comment #2) > > > Hm... should we worry about the "did not succeed" case too? > > > > Not quite sure what you mean here... > > Like, if you have a static IPv4 config with can-fail=TRUE, and there's an error > applying it, and then you restart NM, it will capture the device state as being > METHOD_DISABLED, so it won't match the METHOD_MANUAL connection. Should we deal > with that? I'm inclined to say no, because if some static IPv4 configuration fails then we either have a programming bug (EINVAL when adding routes/addresses) or NM got killed during setup, right? One we can/should fix, the other I'm inclined to ignore... What would be really nice is to have some interface attribute into which we could dump the connection UUID that would persist, and then read it back when starting NM and if anything is materially different, create a new connection from the interface. Essentially, invert our current handling and assume the given connection is correct unless it's not. Maybe I'm dreaming. > > > Also, if ignore-carrier is false then we should allow matching a static > > > connection against disabled as well, right? > > > > I'm not sure we should, actually; if it's a static configuration then the > > link/carrier doesn't matter > > Not if ignore-carrier is FALSE. So to be sure I understand; you are advocating that if a device has no carrier, and that ignore-carrier for that device is false, that NM should match the generated 'disabled' connection to an existing 'manual' connection? If ignore-carrier=FALSE for a hardware device, then if it has no IPv4 address on NM restart, it will never have gotten to IP configuration state and thus could not have previously been activated, and we don't want to match disabled=manual. If ignore-carrier=FALSE for a software device, and it has no IPv4 address on NM restart, then it also would not have gotten to IP configuration state. In this case however, it might have been waiting for slaves before applying the static IP configuration, so yes, in this case we do wish to match disabled=manual. So here you are 100% correct for software devices. I guess the question is whether we want to treat software/hardware devices the same for consistency? > Er, wait, your last patch on this branch breaks that and makes ignore-carrier > always be in effect for activating static connections... I'll assume we're talking about "core: postpone non-static master IP configuration until carrier". That patch only changes behavior for DHCP connections, and the only thing it does is remove ignore_carrier from the equation (thus loosening the if() condition), because for DHCP on master interfaces we always want to respect carrier. I don't see how the last patch changes static config at all? > If ignore-carrier is FALSE, then you are not supposed to be able to activate > any connection on a device that doesn't have carrier. That's true for hardware devices (ethernet) but not true for software devices, because bridge/bond/etc don't have a carrier until they have slaves, and we start activating the master before any slaves show up. Master devices can begin activation, but they wait in IP config until they have a carrier (which might involve waiting for slaves). Ethernet obviously respects ignore_carrier completely. So to summarize, when ignore-carrier=FALSE, for hardware devices we block any activation without carrier. But for software devices we allow activation but block IP configuration until there is a carrier.
(In reply to comment #4) > > Er, wait, your last patch on this branch breaks that and makes ignore-carrier > > always be in effect for activating static connections... > > I'll assume we're talking about "core: postpone non-static master IP > configuration until carrier". > > That patch only changes behavior for DHCP connections Ah, true. I was confused because we're barely checking ignore_carrier anywhere any more... In fact, it looks like at activation time, the sole effect of ignore_carrier is that it allows a device to transition from UNAVAILABLE to DISCONNECTED regardless of whether it has carrier. However, check_connection_available() will then consider any non-static connections to be unavailable if there is no carrier. Except that bond/bridge/team override it to claim they're available regardless, and so they need to check carrier again during the activation process. OK, so this seems correct I guess, though I wonder if we could clean it up somehow. And maybe we should just get rid of the UNAVAILABLE state entirely, and move all of its logic into check_connection_available()... Also, while investigating, I noticed that nm_device_can_interrupt_activation() is now unused, and can just go away.
(In reply to comment #4) > If ignore-carrier=FALSE for a hardware device, then if it has no IPv4 address > on NM restart, it will never have gotten to IP configuration state and thus > could not have previously been activated Right. I'd confused myself.
This branch looks good to me (but I didn't test it)
(In reply to comment #5) > OK, so this seems correct I guess, though I wonder if we could clean it up > somehow. And maybe we should just get rid of the UNAVAILABLE state entirely, > and move all of its logic into check_connection_available()... If we got rid of UNAVAILABLE, then what used to be UNAVAILABLE state would be (DISCONNECTED && zero available connections). But that doesn't let a client differentiate between (a) could activate with a new connection, eg hidden WiFi network or unconfigured WWAN connection, or (b) actually unusable and nothing can be done until some physical thing is fixed (carrier, missing SIM, rfkill). We could make 'unavailable' a device property instead of a state, but that seems like trading one thing for another without much benefit. But perhaps that would clean up internal code, which would be a benefit? Worth investigating.
(In reply to comment #8) > We could make 'unavailable' a device property instead of a state, but that > seems like trading one thing for another without much benefit. But perhaps > that would clean up internal code, which would be a benefit? Worth > investigating. It might be nice to have an "unavailable" bitfield, which can independently indicate NO_FIRMWARE, NO_CARRIER, NO_SIM, HARDWARE_DISABLED, SOFTWARE_DISABLED, PLUGIN_NOT_INSTALLED, SUPPORT_SOFTWARE_NOT_INSTALLED, UNSUPPORTED_INTERFACE_TYPE, ... We could also mix "managed" into that: "unmanaged" would just be "unavailable" due to ADMIN_DISABLED (and then we could distinguish a device that was unmanaged-and-disconnected from a device that was unmanaged-and-connected).
Merged the branch. We could have the unavailable bitfield even without getting rid of the UNAVAILABLE state, seems like a win. Mind filing a bug for that?