GNOME Bugzilla – Bug 733641
Cannot bring up a bridge via ifup without causing an error ('waiting for slaves before proceeding')
Last modified: 2014-08-19 13:21:37 UTC
Since this commit:
it has not been possible to bring up a bridged connection using ifup without causing an error.
Say you have a bridge device 'br0' and a slaved interface 'eth0'. If you try and bring up the slaved interface first:
# ifup eth0
Error: Connection activation failed: Master connection not found or invalid
so obviously you have to bring up the bridge device first. But if you do that:
# ifup br0
You get this error:
Error: Device 'br0' is waiting for slaves before proceeding with activation.
But that's not really an error, is it? It's more just a status note. Really, ifup/NM *did* bring up the bridge - you can see it there in 'brctl show' and 'nmcli con show', just waiting for slaves to be brought up. And indeed, if you *now* run:
# ifup eth0
then the slave device comes up (no "Master connection not found" error) and all is working. But you cannot avoid the error.
This is just an annoyance for interactive work, but it's critical for non-interactive use. I found this while debugging creation of bridges in virt-manager, see https://bugzilla.redhat.com/show_bug.cgi?id=1122729 . virt-manager uses ifup to initialize connections, and it bails out when ifup returns an error, so you actually cannot successfully bring up a bridge from virt-manager.
It's fine for NM to provide this information, but it's not really an error, and shouldn't be treated as one.
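Until that changes, a non-interactive caller can work around it by treating that specific message as informational. A minimal sketch (the `bring_up` wrapper is hypothetical, not part of ifup or virt-manager; the matched text is the message quoted above):

```shell
#!/bin/sh
# Hypothetical wrapper: run an "up" command (e.g. `ifup br0`) and downgrade
# the "waiting for slaves" status message from a fatal error to a note.
bring_up() {
    out=$("$@" 2>&1)
    rc=$?
    case "$out" in
        *"waiting for slaves before proceeding"*) rc=0 ;;  # status, not failure
    esac
    printf '%s\n' "$out"
    return $rc
}
```

Invoked as `bring_up ifup br0`, this succeeds when the only complaint is the waiting-for-slaves note, while still failing on genuine errors such as "Master connection not found or invalid".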
I tested with nmcli directly as well. If you bring up the bridge connection before bringing up the slave connection, you see the same error.
Unlike the ifup case, nmcli will allow you to bring up the slave connection before the bridge - it does not throw an equivalent of the "Master connection not found or invalid" error if you do so. But I found that bringing up the slave device and then the bridge in quick succession still showed the error message; it seems to take a while for everything to come up properly (around 20-25 seconds before br0 actually gets an IP address), and nmcli times out during that period.
Actually, if you bring the slave profile up first with 'nmcli' and then the bridge, something rather odd happens. Filed that as https://bugzilla.gnome.org/show_bug.cgi?id=733644 .
When you have autoconnect bridge-slaves, they might autoconnect at certain points. For example, when another connection gets deactivated, NM searches for activatable connections and might decide to bring up the bridge-slave.
Bringing up a bridge slave also brings up the master (if it is not yet active).
On the other hand, when you bring up the master, no slaves are activated (even if they are set to autoconnect). This is on purpose.
So, upping a master does not actually fully activate the bridge, hence we print a warning (and even fail nmcli).
Given the above, I think that upping a master *cannot* ever succeed, and nmcli will always fail after timeout. If that is correct, nmcli should behave differently for master devices and not wait until they are fully activated, but return success once they reach state "connecting (getting IP configuration)".
Given the above, upping a master never leads to a fully activated device, hence the behavior should be different.
I guess that is also relevant to other UI clients (nm-applet). Probably they should show a special icon for such slave-less masters -- not the "connecting" icon.
yeah, that's more or less what I figured. I think it'd be fine (even good) to still print the *message* "is waiting for slaves before proceeding with activation", it's just the fact that it's treated as an error condition that's the problem. As you say it should be treated as success with an informational message.
Yeah, nmcli should probably just exit once a master is set up instead of waiting a long time.
I'll note that the Fedora initscripts *also* don't automatically bring up slaves when you start a bridge, at least if I'm reading ifup-eth correctly. Which is one reason NM never did that either. For bonding, though, initscripts *do* bring up slaves; back then I/we assumed that initscripts did the same thing for both bridge and bond and didn't bother to confirm, so NM doesn't bring up slaves for either. We've also got a task on the agenda to add an option to bring up slaves when the master is started, since this seems to trip people up.
one case that might be interesting there is the case that's actually fairly typical if you just deploy a system then set up a bridge. Let's say you've just set up the bridge. You now have:
1. the bridge connection
2. the original, 'normal', non-bridge slave connection for the ethernet adapter
3. the new bridge-slave connection for the ethernet adapter
now let's say profile 2 is active on the ethernet adapter, and you now bring up the bridge connection (with nmcli or whatever) - "hey, I just created my bridge, let's bring it up". what should NM do? I guess my instinctive answer is 'as well as bringing up the bridge connection, also take connection 2 down and bring connection 3 up' (i.e. switch the ethernet adapter from being directly connected to being the bridge slave), but I guess there may be reasons you might not want to do that? just trying to consider possibilities...
oh, one more somewhat-related note - another thing I saw when poking into this is that the GNOME Network control panel does not display inactive bridge slave adapter connections. That's https://bugzilla.gnome.org/show_bug.cgi?id=733634 . It may be that GNOME expects NM to bring up the slave connection when you bring up the bridge. As things stand, you can't actually bring up a bridge fully using the GUI.
Created attachment 281865 [details] [review]
cli: let activation of master connection succeed when device state reaches IP_CONFIG
When connecting a master connection, no slave devices will be activated
automatically. The user is supposed to activate them individually.
Hence nmcli should not wait for the connection to be fully activated
because that is not going to happen (unless the user connects a slave
connection from another terminal).
Instead, behave differently for master connections and signal success
once the master device reaches IP_CONFIG state.
This revises behavior introduced by commit
Signed-off-by: Thomas Haller <email@example.com>
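The logic of the patch can be sketched roughly as follows (the state names and the is_master flag are illustrative stand-ins for NM's real device states, not actual nmcli code):

```shell
# Sketch of the revised nmcli wait logic described in the commit message.
# Returns 0 (success) once nmcli should stop waiting for this connection.
activation_complete() {
    state=$1
    is_master=$2   # "yes" for bridge/bond master connections
    # Ordinary connections must reach the fully activated state.
    [ "$state" = "activated" ] && return 0
    # Masters are considered done at ip-config: with no slaves attached
    # they never progress further, so waiting longer cannot succeed.
    [ "$is_master" = "yes" ] && [ "$state" = "ip-config" ] && return 0
    return 1
}
```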
(In reply to comment #6)
> one case that might be interesting there is the case that's actually fairly
> typical if you just deploy a system then set up a bridge. Let's say you've just
> set up the bridge. You now have:
> 1. the bridge connection
> 2. the original, 'normal', non-bridge slave connection for the ethernet adapter
> 3. the new bridge-slave connection for the ethernet adapter
> now let's say profile 2 is active on the ethernet adapter, and you now bring up
> the bridge connection (with nmcli or whatever) - "hey, I just created my
> bridge, let's bring it up". what should NM do? I guess my instinctive answer is
> 'as well as bringing up the bridge connection, also take connection 2 down and
> bring connection 3 up' (i.e. switch the ethernet adapter from being directly
> connected to being the bridge slave), but I guess there may be reasons you
> might not want to do that? just trying to consider possibilities...
I think it is important that the behavior of initscripts stays identical regardless of whether the device is controlled by NM.
Regardless of that, I think it is reasonable not to activate any slaves by default. Usually you have several interfaces that you want to bridge/bond, so NM should not guess which of them to activate. This is especially true when another connection is already active on a device: activating a master could then break your connectivity by attaching an unwanted device.
Still, the behavior does seem useful. We should add it as an option (at least client-side in nmcli), so that we can also behave correctly for initscripts backward compatibility.
So I just ran some tests comparing NM+nmcli, NM+ifup, and network+ifup in a very basic config (ifcfg-bridge, ifcfg-slave, no other connections). This bug is confirmed. Interestingly, in this case, I don't hit the "Master connection not found or invalid" error when trying to bring up the slave first in the NM case. Both 'ifup slave' and 'nmcli con up slave' with NetworkManager in charge work, and bring up the entire bridge (both slave and bridge profiles). So I'll have to try and recreate exactly how I hit that error in the initial testing, as I guess it's a separate bug with a particular config.
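For reference, the minimal configuration for a test like this would look roughly as follows (the values are illustrative; the key names are standard ifcfg syntax):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bridge (illustrative sketch)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=dhcp

# /etc/sysconfig/network-scripts/ifcfg-slave (illustrative sketch)
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BRIDGE=br0
```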
FWIW the network.service behaviour is actually, well, rather worse than NM's. So if you want to be compatible, you're going to have to break NM a bit. :P But that's outside the scope of this bug.
aha. I figured out the slightly subtle corner case that triggers the "Master connection not found" error and filed it as https://bugzilla.gnome.org/show_bug.cgi?id=733890 .
Fix looks good in a quick test; I did a scratch build with the patch (http://koji.fedoraproject.org/koji/taskinfo?taskID=7205403 ) and did a test as I've been testing so far, and now 'ifup bridge' works as the patch intends (returns successfully very fast, with an informational message that it's waiting for slaves). 'ifup slave' then brings up the slave correctly and the bridge functions.
Patch looks good to me.
And doesn't this patch actually provide the same behavior that the network scripts used to have anyway? ISTR they just started the bridge and exited without waiting for anything else to happen. Then the ports would get brought up in a second round in network.service.
For bonds though, the initscripts do bring up the slaves when bringing up the master, so NM still isn't the same as the initscripts here.
Created attachment 283862 [details] [review]
[PATCH] the same patch as in comment #8, with a change to apply to current master
The patch in comment #8 looks good to me. I have just changed it to apply for current master, and tested.
(In reply to comment #14)
> Created an attachment (id=283862) [details] [review]
> [PATCH] the same patch as in comment #8; except a change to apply for current
> The patch in comment #8 looks good to me. I have just changed it to apply for
> current master, and tested.
I think this bug is fixed, for the remaining question about differences in initscripts with bonds, I opened bug 735052.