GNOME Bugzilla – Bug 671752
NetworkManager should not apply configuration upon config file modification
Last modified: 2016-02-23 15:37:23 UTC
There needs to be a safe way to switch from NetworkManager to using the old "network" service. Currently, as soon as you set "NM_CONTROLLED=no" in your ifcfg-eth? file and save it, NetworkManager shuts the NIC down IMMEDIATELY. If you happen to be doing this remotely, it means you are officially borked: you cannot regain access to that system until someone physically gets on the console, stops NetworkManager and starts the network service. The only way around this is to write a script that changes your ifcfg file, shuts down NetworkManager and starts network all in one go.

Logically, if I remove control from NetworkManager, it shouldn't be able to shut the NIC down, because it shouldn't have control to do that anymore. Starting the network service before making the change won't do any good, because at that point the directive still states that the NIC is controlled by NetworkManager, so network won't touch it. You can't start network after making the change because the NIC drops AS SOON AS YOU SAVE THE FILE.

Yeah yeah, it's all great and wonderful for laptops - woot. It's also great for rewriting configurations that are entered manually, something it shouldn't be doing. Manual overrides should not be overridden by automated processes, but that's an entirely different conversation - as is the future of what's going to be included in distros.
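For illustration, the workaround script described above could look roughly like the following sketch. It is not taken from the report: the function names are hypothetical, and the `service` commands are an assumption about a SysV-style Fedora/RHEL system. The sketch only rewrites the ifcfg text and lists the commands; actually running them (detached from the SSH session, so that they survive a dropped connection) is left to the caller.

```python
import re

# Matches an existing NM_CONTROLLED line; [ \t] is used instead of \s so the
# pattern never swallows a preceding blank line.
_NM_CONTROLLED = re.compile(r"^[ \t]*NM_CONTROLLED[ \t]*=.*$", re.MULTILINE)


def set_nm_controlled(ifcfg_text, value):
    """Return ifcfg_text with NM_CONTROLLED set to value ("yes" or "no")."""
    line = "NM_CONTROLLED=" + value
    if _NM_CONTROLLED.search(ifcfg_text):
        return _NM_CONTROLLED.sub(line, ifcfg_text)
    # Key not present: append it, making sure it starts on its own line.
    sep = "" if (ifcfg_text.endswith("\n") or not ifcfg_text) else "\n"
    return ifcfg_text + sep + line + "\n"


def handover_commands():
    """Commands to run right after the file is saved (assumed service names)."""
    return [
        ["service", "NetworkManager", "stop"],
        ["service", "network", "start"],
    ]
```

The point of doing everything "in one go" is that no step depends on the remote session still being alive once the file has been written.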
NM shouldn't be regenerating the configuration on a whim; only if it's edited through nm-connection-editor, the KDE editor, or whatever. Basically, if you change the network config file and NM immediately rewrites bits of it, that's a bug and we'll fix that. It should only be rewritten if you edit it through the GUI tools or the NM D-Bus interface.

I think it's also probably reasonable not to take an interface down when it becomes unmanaged; NM already leaves ethernet device configuration alone when it quits, and I think it's reasonable to do that when the interface is unmanaged too.

However, even if the interface is unmanaged, NM will *still* switch the routing table and DNS entries over to the default device that it does manage, removing the default route for the now unmanaged device (if it had the default route), for a few reasons. First, the unmanaged device is no longer managed by NM, and therefore any further updates to devices managed by NM will change the routing table. Second, NM doesn't know anything about the unmanaged device anymore and therefore has no idea when that device is up, down, has an IP address, needs routing table updates, etc. /etc/resolv.conf and the default route are shared resources, so if an interface is removed from NM's control, there's no way to arbitrate those shared resources with an interface NM is told to completely ignore.

So within those constraints, I think it's fine to leave an interface's IP configuration and non-default routes alone when unmanaging the interface.
I can live with, and would expect, NM to update routes, DNS info and so forth so long as it's still running. But, if it's going to take immediate action on configuration files, the most important thing it should be looking at is whether or not it should be allowed to modify a device any further - and if it's not, it shouldn't do anything that has anything to do with that device, certainly not shut it down.

I'd take this a step further too though: If there's only 1 device on the system and all of a sudden NM isn't in control of it any more, it shouldn't go nuke all of your network configs because it doesn't have information about routing, dns and so forth any more. So, before it goes into its routines to rewrite system configs, it should be counting the number of devices under its control and there should be a big "if( device_count > 0 ){" wrapped around that whole routine. Presently, if NM is still running and you suddenly remove control of your devices from it, you get a nice note about how NM doesn't know what to write here in your /etc/resolv.conf. I'd say the routine that writes that should be rewritten to just leave /etc/resolv.conf alone. It's far more likely to keep working with its previous directives than with a note that says "I don't know what to do". I mean, if you're sane, you don't go into any of your system configs and erase everything in there and replace it with a note that says "I didn't know how to configure Sendmail, so I'm just going to put this here", right? You'd either look for more information or you'd leave it alone because you were scared you might break something.

On another note, why doesn't NM know about unmanaged devices? IMHO just because NM isn't in control of a device, it should still be able to look at what's going on and operate accordingly with regard to DNS settings, routing and so forth. If this is to be the end-all be-all of network management, it's going to have to do this at some point, isn't it? I mean, we're not writing net configs into different files at different locations - it's still the same ifcfg-* files it's been as long as I can remember. NM should still be able to read these very same files whether it controls the devices or not and plan its actions accordingly in my view. These files still contain DNS info, routing info and so on and if NM is going to take control of things like /etc/resolv.conf, *especially* when it's not in complete control of all devices on the system, it should be scanning all of these files for things like DNS directives and routes - especially when it doesn't have this info coming from anywhere else.
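The "if( device_count > 0 )" guard proposed in this comment can be sketched as below. This is only an illustration of the suggestion, not NetworkManager's code: `next_resolv_conf` and the `render` callback are hypothetical names, and returning `None` stands for "leave /etc/resolv.conf exactly as it is".

```python
from typing import Callable, List, Optional


def next_resolv_conf(
    managed_devices: List[str],
    render: Callable[[List[str]], str],
) -> Optional[str]:
    """Return new resolv.conf contents, or None to leave the file untouched."""
    if len(managed_devices) > 0:
        # At least one device is still managed, so DNS info is available.
        return render(managed_devices)
    # No managed devices left: nothing is known about DNS, so keep the
    # previous contents instead of overwriting them with a placeholder note.
    return None
```

A caller would write the file only when the result is not `None`, which is the "leave it alone" behaviour asked for above.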
> But, if it's going to take immediate action on
> configuration files,

If it's touching the config files, it's a bug (as Dan stated earlier). Is it still the case?

> the most important thing it should be looking at is
> whether or not it should be allowed to modify a device any further - and if
> it's not, it shouldn't do anything that has anything to do with that device,
> certainly not shut it down.

There are various opinions on this. Feel free to file a new bug report specifically on this with a decent rationale.

> I'd take this a step further too though: If there's only 1 device on the system
> and all of a sudden NM isn't in control of it any more, it shouldn't go nuke
> all of your network configs because it doesn't have information about routing,
> dns and so forth any more. So, before it goes into its routines to rewrite
> system configs, it should be counting the number of devices under its control
> and there should be a big "if( device_count > 0 ){" wrapped around that whole
> routine. Presently, if NM is still running and you suddenly remove control of
> your devices from it, you get a nice note about how NM doesn't know what to
> write here in your /etc/resolv.conf. I'd say the routine that writes that
> should be rewritten to just leave /etc/resolv.conf alone. It's far more likely
> to keep working with its previous directives than with a note
> that says "I don't know what to do". I mean, if you're sane, you don't go into
> any of your system configs and erase everything in there and replace it with a
> note that says "I didn't know how to configure Sendmail, so I'm just going to
> put this here", right? You'd either look for more information or you'd leave it
> alone because you were scared you might break something.

I'd suggest splitting this request too, as what NM should do when it's still running but loses control over all interfaces is another topic. And please don't compare runtime configuration to on-disk configuration. These don't work the same.

> On another note, why doesn't NM know about unmanaged devices? IMHO just because
> NM isn't in control of a device, it should still be able to look at what's
> going on and operate accordingly with regard to DNS settings, routing and so
> forth. If this is to be the end-all be-all of network management, it's going to
> have to do this at some point, isn't it? I mean, we're not writing net configs
> into different files at different locations - it's still the same ifcfg-* files
> it's been as long as I can remember. NM should still be able to read these very
> same files whether it controls the devices or not and plan its actions
> accordingly in my view. These files still contain DNS info, routing info and so
> on and if NM is going to take control of things like /etc/resolv.conf,
> *especially* when it's not in complete control of all devices on the system, it
> should be scanning all of these files for things like DNS directives and routes
> - especially when it doesn't have this info coming from anywhere else.

This is just another topic. File it separately if you're still interested. You can link the bug reports from here.

Further discussion should be mostly about NM rewriting configs, which would be a major problem (keeping importance at that level).
--8<--
> the most important thing it should be looking at is
> whether or not it should be allowed to modify a device any further - and if
> it's not, it shouldn't do anything that has anything to do with that device,
> certainly not shut it down.

There are various opinions on this. Feel free to file a new bug report specifically on this with a decent rationale.
--8<--

The WHOLE reason I filed THIS bug report was this issue. Set up a remote machine and edit /etc/sysconfig/network-scripts/ifcfg-eth0 to set NM_CONTROLLED=no: as soon as you save the file, you lose access to that machine until you physically go to it and get on the terminal. My "opinion" on this is that you should still be able to access your system. Everything I posted deals with this issue and nothing else. Your attempt to separate the issue ignores the overriding problem of lost connectivity, caused not by restarting or shutting anything down, but by changing a line in a configuration file.

--8<--
I'd suggest splitting this request too, as what NM should do when it's still running but loses control over all interfaces is another topic. And please don't compare runtime configuration to on-disk configuration. These don't work the same.
--8<--

Isn't this exactly what I'm talking about? I don't want on-disk configuration to instantly have effect on runtime configuration. It should wait until I restart something before nuking my network configurations and not act immediately when I save the file. Saving the file makes NM lose control of the device.

Please do tell me what you thought this bug report was about. Did you even read my OP?
> Isn't this exactly what I'm talking about? I don't want on-disk configuration
> to instantly have effect on runtime configuration.

It was not clear at the beginning what your report was about. This is a feature of NetworkManager. I don't know how it got there or what the rationale was, but it's part of how NetworkManager currently works.

> Please do tell me what you thought this bug report was about. Did you
> even read my OP?

Assume good will and avoid personal attacks. I'm once again changing the summary to what I believe your bug report is about. Please confirm. Feel free to file separate bug reports about any other issues you have.

@dcbw: What is the rationale for configuration file re-reading magic?
NM actively rewriting its configuration is covered by bug 667874.
I'm more and more convinced that NetworkManager should *not* use inotify to follow network configuration changes for keyfile configuration. The preferred way to set up a network connection at runtime will be 'nmcli', and anyone who wants to apply a hand-written configuration file can just ask NetworkManager to 'reload' (re-read all configuration).

Note: When you create a file in /etc/NetworkManager/system-connections and only *then* chmod 600 it, it's *not* loaded.
(In reply to comment #7)
> I'm more and more convinced that NetworkManager should *not* use inotify to
> follow network configuration changes for keyfile configuration.

See also https://bugzilla.redhat.com/show_bug.cgi?id=754677
(In reply to comment #0)
> There needs to be a safe way to switch from NetworkManager to using the old
> "network" system. Currently, as soon as you say "NM_CONTROLLED=no" in your
> ifcfg-eth? file and save it, NetworkManager shuts the NIC down IMMEDIATELY. If
> you happen to be doing this remotely, it means you are officially borked and
> you cannot regain access to that system until someone physically gets on the
> console and stops NetworkManager and starts the network service. The only way
> around this is to write a script that changes your ifcfg file, shuts down
> NetworkManager and starts network all in one go.

(In reply to comment #1)
> I think it's also probably reasonable not to take an interface down when it
> becomes unmanaged; NM already leaves ethernet device configuration alone when
> it quits, and I think it's reasonable to do that when the interface is
> unmanaged too.
>

The following patches try to solve this, i.e. a device shouldn't be taken down when it becomes NM-unmanaged. Also, when a device transitions back from unmanaged to managed, NM tries to assume the previous connection. Any thoughts, suggestions, comments are welcome.

(In reply to comment #8)
> (In reply to comment #7)
> > I'm more and more convinced that NetworkManager should *not* use inotify to
> > follow network configuration changes for keyfile configuration.
>
> See also https://bugzilla.redhat.com/show_bug.cgi?id=754677

The problem in that bug is related but not exactly the same. The issue there is that some editors (in certain configurations) delete the file on changes and then create a new one instead of modifying the current file. That leads to the connection being removed and thus the device being deactivated.
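To make the editor problem in the linked bug concrete: a file watcher that reacts to each event separately sees "deleted" (and removes the connection) before it sees "created". One possible mitigation, sketched here purely as an illustration and not as what NetworkManager does, is to treat a delete followed shortly by a create of the same path as a single modification. The event tuples, the `coalesce` name and the one-second window are all assumptions.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str, str]  # (timestamp in seconds, kind, path)


def coalesce(events: List[Event], window: float = 1.0) -> List[Tuple[str, str]]:
    """Collapse delete-then-create of one path within `window` into 'modified'."""
    result: List[Tuple[str, str]] = []
    pending: Dict[str, Tuple[float, int]] = {}  # path -> (delete time, index in result)
    # Sort by timestamp only; the stable sort keeps same-time events in input order.
    for ts, kind, path in sorted(events, key=lambda e: e[0]):
        if kind == "deleted":
            pending[path] = (ts, len(result))
            result.append(("deleted", path))
        elif kind == "created" and path in pending:
            deleted_at, index = pending.pop(path)
            if ts - deleted_at <= window:
                # The editor replaced the file: report one modification.
                result[index] = ("modified", path)
            else:
                result.append(("created", path))
        else:
            result.append((kind, path))
    return result
```

This version works on a recorded list of events; a live watcher would instead have to delay acting on a delete until the window has passed.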
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > I'm more and more convinced that NetworkManager should *not* use inotify to
> > > follow network configuration changes for keyfile configuration.
> >
> > See also https://bugzilla.redhat.com/show_bug.cgi?id=754677
>
> The problem in that bug is related but not exactly the same.

Sure, my point was just that it's another bug that would be fixed if we stopped using inotify on ifcfg files.
Created attachment 224621: Don't disconnect device when going to unmanaged
Created attachment 224622: Don't remove connections when NM_CONTROLLED=no
Created attachment 224623: Assume connection on unmanaged->managed transition
I'll most probably be removing the 'assume' feature altogether as it only breaks things and we can continue connections without 'assume' magic. Sure it's possible to 'not to perform full disconnection' on device unmanage events. But I still believe it would be much better to remove the autoreload feature altogether. In the common use cases, D-Bus will be used to dynamically change stuff and NM will manage this itself. If anyone really wants to edit configuration files at runtime, he should be ready to ask NM manually for the reload. I'm ready to hear counterarguments, though.
Created attachment 231719: [PATCH] Assume connection on unmanaged->managed transition

Rebased the third patch to apply to the current master branch.
(In reply to comment #15)
> Created an attachment (id=231719)
> [PATCH] Assume connection on unmanaged->managed transition
>
> Rebase the third patch to apply for current master branch.

Looks good. Great work!
Which NetworkManager version is the change targeted at? I'm asking because disabling automatic re-read and application of configuration files is a potentially surprising change in user experience.
Plus I'm curious about further actions that may be needed:

* Transition to the new configuration: Do we need a special configuration option to explicitly disable the old behavior or can we just switch?

Explicit reload support:

* We will need to compare all connections and somehow react to the changes
* CLI/API for individual connections
* CLI/API for all connections
* CLI/API for more than just connections
* Reactions to OS signals
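The "compare all connections and somehow react to the changes" item could be sketched as a plain set difference over the connections known before and after a reload. This is an illustration only; keying connections by an identifier string and representing their settings as dictionaries are assumptions, and `diff_connections` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def diff_connections(
    old: Dict[str, dict], new: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) connection ids between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A connection counts as changed if it exists in both but its settings differ.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed
```

How to react to each of the three lists (for example, whether a changed connection should be reapplied to an active device) is exactly the open question raised in the list above.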
I guess we probably want an NM.conf option to enable/disable this behavior...
(In reply to comment #19)
> I guess we probably want an NM.conf option to enable/disable this behavior...

Sure. Then it won't constitute a backwards incompatible change.
Posted a related bug 699843.
danw has a branch for this, danw/reload, bug 701311.
Solved in master. For backwards compatibility reasons the default behavior stays as is: danw/reload adds the "monitor-connection-files" setting to NetworkManager.conf, which lets you disable automatic reloads of connection files on modification, and adds ReloadConnections to the Settings interface so that you can reload them manually in that case.

*** This bug has been marked as a duplicate of bug 701311 ***
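As a rough sketch of how the two pieces fit together: the helper below reads a NetworkManager.conf-style text and reports whether automatic monitoring is on, and `RELOAD_ARGV` is a `dbus-send` invocation of ReloadConnections for the case where it is off. Assumptions not stated in the comment above: that the setting lives in the `[main]` section, and the usual NetworkManager bus name and Settings object path. The fallback of `True` mirrors the statement that the default behavior stays as is.

```python
import configparser

# Manual reload over D-Bus (bus name and object path are assumptions here).
RELOAD_ARGV = [
    "dbus-send", "--system", "--print-reply",
    "--dest=org.freedesktop.NetworkManager",
    "/org/freedesktop/NetworkManager/Settings",
    "org.freedesktop.NetworkManager.Settings.ReloadConnections",
]


def monitoring_enabled(conf_text):
    """Return True if connection files are reloaded automatically."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Missing section or key falls back to the default (monitoring on).
    return parser.getboolean("main", "monitor-connection-files", fallback=True)
```

With `monitor-connection-files=no` in place, saving an ifcfg or keyfile no longer has any immediate effect; the changes are picked up only when a reload is requested explicitly.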
Created attachment 321991: [patch] unmanage-devices branch

Jirka had an upstream branch "unmanage-devices" related to this bug. The branch no longer applies and the issues seem to have been resolved differently. I attach the patch here for historical reasons. It applies on commit 1b840127620e27b78311032499746efe60d4fdcd