GNOME Bugzilla – Bug 556134
VPN disconnects on openvpn soft-restarts
Last modified: 2014-04-23 12:24:34 UTC
OpenVPN soft-restarts itself by sending itself a SIGUSR1 in several cases. The most common case is over UDP, where 120 seconds of inactivity will by default trigger a soft-restart. Since n-m-openvpn calls OpenVPN with the --up-restart parameter, OpenVPN calls back the nm-openvpn-service-openvpn-helper when it is restarted. In that case, nm-openvpn-service-openvpn-helper fails with exit code 1, which brings down the VPN connection:

nm-openvpn[5682]: [server] Inactivity timeout (--ping-restart), restarting
nm-openvpn[5682]: SIGUSR1[soft,ping-restart] received, process restarting
[...]
nm-openvpn[5682]: UDPv4 link remote: 192.168.122.1:1194
nm-openvpn[5682]: [server] Peer Connection Initiated with 192.168.122.1:1194
nm-openvpn[5682]: Preserving previous TUN/TAP instance: tun0
nm-openvpn[5682]: /usr/lib/network-manager-openvpn/nm-openvpn-service-openvpn-helper tun0 1500 1541 10.8.0.6 10.8.0.5 restart
NetworkManager: <info> VPN plugin failed: 2
nm-openvpn[5682]: script failed: external program exited with error status: 1
nm-openvpn[5682]: Exiting
NetworkManager: <info> VPN plugin failed: 1
NetworkManager: <info> VPN plugin state changed: 6
NetworkManager: <info> VPN plugin state change reason: 0
NetworkManager: <WARN> connection_state_changed(): Could not process the request because no VPN connection was active.
NetworkManager: nm_system_device_flush_ip4_routes_with_iface: assertion `iface_idx >= 0' failed
NetworkManager: nm_system_device_flush_ip4_addresses_with_iface: assertion `iface_idx >= 0' failed
NetworkManager: <info> Policy set (eth0) as default device for routing and DNS.
nm-dispatcher.action: Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
NetworkManager: <debug> [1223468236.827482] ensure_killed(): waiting for vpn service pid 5674 to exit
NetworkManager: <debug> [1223468236.828433] ensure_killed(): vpn service pid 5674 cleaned up

The reason for this is that OpenVPN doesn't set the same environment variables when calling the --up script at init and at restart. In particular, at restart it doesn't set ifconfig_remote, route_gateway_1, ifconfig_local, route_network_1, route_net_gateway, route_vpn_gateway and route_netmask_1. The net result is that nm-openvpn-service-openvpn-helper fails to get ifconfig_local and calls helper_failed, which breaks the connection.
OpenVPN calls the --up script with a few parameters, so it is possible to get ifconfig_local and ifconfig_remote from argv[]; however, we'd still miss some information. A workaround is to disable --up-restart, though this parameter was presumably there to pick up changes in the connection info. For users, a workaround is to use TCP rather than UDP to avoid the inactivity soft-restart. That will not save you from other soft-restarts (decryption errors, network trouble, etc.) but it should cover the most common case.
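To illustrate the argv[] idea from the comment above, here is a minimal sketch of reading a value from the environment and falling back to the positional arguments when the variable is missing. The function name lookup_with_fallback and the enum are hypothetical and not taken from the helper's source. The argument positions follow the helper invocation shown in the log (device, two MTU values, local IP, remote IP, then "restart") and apply to a tun device. As noted, the route_* values are not on the command line, so this recovers only part of the configuration.

```c
#include <stdlib.h>

/* Positions of the addresses in the --up script arguments for a tun device,
 * matching the invocation in the log:
 *   helper tun0 1500 1541 10.8.0.6 10.8.0.5 restart
 */
enum { ARG_LOCAL_IP = 4, ARG_REMOTE_IP = 5 };

/* Hypothetical helper: prefer the environment variable, and fall back to the
 * positional argument when the variable is unset or empty.
 * Returns NULL if neither source provides a value.
 */
const char *
lookup_with_fallback (const char *env_name, int argc, char **argv, int arg_idx)
{
	const char *val = getenv (env_name);

	if (val && *val)
		return val;
	if (arg_idx > 0 && arg_idx < argc && argv[arg_idx] && *argv[arg_idx])
		return argv[arg_idx];
	return NULL;
}
```

With the restart invocation from the log and ifconfig_local unset, lookup_with_fallback ("ifconfig_local", argc, argv, ARG_LOCAL_IP) would yield "10.8.0.6".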
We should probably just handle up-restart by caching the IP4 config in the service itself and then updating the cached IP4 config through a new D-Bus method call from the helper, and then in NM itself do the equivalent of a DHCP renew sort of thing to re-set the IP address and routes.
Ping... Any progress on this bug? Two years on, and I still have this problem. My network connection can stall momentarily every few hours, so the standard openvpn config of 'keepalive 10 120' results in many soft restarts -- and therefore makes network manager drop my connection frequently and makes network manager unusable for me. This bug seems rather urgent to me: it makes 'keepalive' do the opposite!
3 more years on... This seems like a fairly critical bug. Does no one use openvpn with NetworkManager? In the past I've been using openvpn standalone, but with the great improvements to NetworkManager I decided to try it now. Unfortunately, I ran into this. There's a workaround at:
https://bugs.launchpad.net/ubuntu/+source/network-manager-openvpn/+bug/280160/comments/19
I tried it out and it seems to work. I'll just have to watch out for updates that would wipe that file out.
(In reply to comment #2)
> We should probably just handle up-restart by caching the IP4 config in the
> service itself and then updating the cached IP4 config through a new D-Bus
> method call from the helper, and then in NM itself do the equivalent of a DHCP
> renew sort of thing to re-set the IP address and routes.

Does an up-restart indicate any change that we actually care about? It seems to me like we could just do:

    if (!strcmp (argv[argc - 1], "restart"))
        exit (0);

(In reply to comment #4)
> 3 more years on... This seems like a fairly critical bug. Does no one use
> openvpn with NetworkManager?

OpenVPN servers can be configured in billions of different ways, so any bug that only affects certain configuration modes will only affect a tiny percentage of users...
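The early-exit idea from the comment above could be sketched as a small predicate. This is only an illustration, not the code that was eventually merged. The name is_restart_invocation is hypothetical, and the sketch assumes, as the log in the description shows, that OpenVPN passes "restart" as the last argument on a soft-restart.

```c
#include <string.h>

/* Hypothetical predicate: returns 1 when the last argument is "restart",
 * 0 otherwise. Requiring argc > 1 avoids matching on the program name.
 */
int
is_restart_invocation (int argc, char **argv)
{
	return argc > 1
	    && argv[argc - 1] != NULL
	    && strcmp (argv[argc - 1], "restart") == 0;
}
```

The helper's main() would then call exit (0) early when the predicate is true. The cost is that any addressing change made by the server across the restart would go unnoticed by NetworkManager.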
Yes, I'm sure as well this only affects a tiny percentage of users because, after having had this happen to me twice in two days after only two days of even using the NetworkManager openvpn plugin, I'm about to stop being a user. So whoever continues being a user is probably not being affected. It doesn't mean that the bug didn't affect a large group of potential users, however. Dropping the connection because it invoked a helper with an argument it doesn't support is the kind of bug that is easily reproducible, and your suggestion to fix it sounds reasonable, no?
I think we should just implement the workaround from https://bugs.launchpad.net/ubuntu/+source/network-manager-openvpn/+bug/280160/comments/19 in the helper executable itself. Please review the branch th/bgo556134_up_restart
> helper: improve parsing of command line arguments

You could use GOption for the plugin-specific arguments; it automatically stops parsing at "--".

>+ /* shift the arguments to the right leaving only those provided by openvpn
>+  * (including the terminating NULL at argv[argc]).
>+  */

Could also just do:

    argv[shift] = argv[0];
    argv += shift;
    argc -= shift;

> helper: log the command line argument for "--helper-debug"

argumentS

> helper: pass device type to helper script as command line argument

Seems like it should be --tun / --tap rather than "tun" / "tap"

> helper: gracefully handle missing environment variables in --up-restart invocations

Is there any reason to not just get those values from the command line arguments in all cases? It would make things simpler.
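The pointer-shifting suggestion in the review above can be tried out in isolation. The sketch below is an assumption-laden illustration, not the branch's code: it assumes the plugin-specific options (such as --helper-debug and --tun) come first and are terminated by "--", and the function names count_plugin_args and shift_args are hypothetical.

```c
#include <string.h>

/* Hypothetical: return the index of the "--" separator, which is also the
 * number of entries to drop after argv[0]. Returns 0 if there is no "--".
 */
int
count_plugin_args (int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++) {
		if (strcmp (argv[i], "--") == 0)
			return i;
	}
	return 0;
}

/* Drop `shift` entries after argv[0] without copying the array: the program
 * name is moved into the last dropped slot and the pointer is advanced, so
 * the terminating NULL at argv[argc] is preserved.
 */
void
shift_args (int *argc, char ***argv, int shift)
{
	(*argv)[shift] = (*argv)[0];
	*argv += shift;
	*argc -= shift;
}
```

After the shift, argv[1] onward holds only the arguments supplied by OpenVPN, so the rest of the helper can index them the same way whether or not plugin options were present.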
(In reply to comment #8)
> > helper: improve parsing of command line arguments
>
> You could use GOption for the plugin-specific arguments; it automatically stops
> parsing at "--".

Hm, I couldn't come up with something that is actually simpler. Also, --tap/--tun toggle each other, and I don't know how to do that with GOption.

> >+ /* shift the arguments to the right leaving only those provided by openvpn
> >+  * (including the terminating NULL at argv[argc]).
> >+  */
>
> Could also just do:
>
> argv[shift] = argv[0];
> argv += shift;
> argc -= shift;

Neat.

> > helper: log the command line argument for "--helper-debug"
>
> argumentS

Done

> > helper: pass device type to helper script as command line argument
>
> Seems like it should be --tun / --tap rather than "tun" / "tap"

Done

> > helper: gracefully handle missing environment variables in --up-restart invocations
>
> Is there any reason to not just get those values from the command line
> arguments in all cases? It would make things simpler.

I did not want to change the behaviour at all (except for the case that is already broken). I think that, as it is now, it is reasonably simple.

Added a commit "set G_LOG_DOMAIN to nm-openvpn". Now it looks like:

> nm-openvpn-Message: command line: "./src/nm-openvpn-service-openvpn-helper" "--helper-debug"
this all looks good now
The openvpn side stuff looks fine to me. What NM-side improvements do we need here to detect restart and handle it? We'd at least need to potentially update IP addressing/routing if anything changed, right?
Merged to master:
https://git.gnome.org/browse/network-manager-openvpn/commit/?id=cefb89ccfb1732391e30512ed524a398320d3d56

I tried to test it, and it seems to work correctly. However, I was not able to reproduce the original error, so I am not sure that it really got fixed. Closing this bug for now; if the problem persists, please reopen.