After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 792252 - NetworkManager only trying the first gateway in redundant remote Gateways list
NetworkManager only trying the first gateway in redundant remote Gateways list
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: VPN: openvpn
1.10.x
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2018-01-05 18:41 UTC by Britt Yazel
Modified: 2018-01-19 16:24 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] utils: fix parsing of empty remote (1.65 KB, patch)
2018-01-08 09:32 UTC, Beniamino Galvani
none Details | Review
utils: fix failure return value for nmovpn_remote_parse() (1.86 KB, patch)
2018-01-08 19:33 UTC, Thomas Haller
none Details | Review
properties: fix parsing remptes when checking gateway entry (1.29 KB, patch)
2018-01-08 20:45 UTC, Thomas Haller
none Details | Review
[PATCH] service: keep reference to plugin while main loop is running (3.14 KB, patch)
2018-01-15 11:01 UTC, Beniamino Galvani
none Details | Review
[PATCH] service,properties: add support for connect-timeout (12.33 KB, patch)
2018-01-16 09:44 UTC, Beniamino Galvani
none Details | Review
[PATCH v2] service,properties: add support for connect-timeout (13.63 KB, patch)
2018-01-16 14:16 UTC, Beniamino Galvani
none Details | Review
[PATCH v3 1/2] service,properties: add support for connect-timeout (13.65 KB, patch)
2018-01-19 10:01 UTC, Beniamino Galvani
none Details | Review
[PATCH v3 2/2] service: lower the connect timeout if there are multiple remotes (1.69 KB, patch)
2018-01-19 10:02 UTC, Beniamino Galvani
none Details | Review

Description Britt Yazel 2018-01-05 18:41:38 UTC
My university provides us with a client.ovpn profile that we use to connect to the VPN. In the file it has a configuration section that looks like this:

setenv FORWARD_COMPATIBLE 1
client
server-poll-timeout 4
nobind
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 443 tcp
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 1194 udp
remote vpn.dss.ucdavis.edu 1194 udp
dev tun
dev-type tun
ns-cert-type server
setenv opt tls-version-min 1.0 or-highest
reneg-sec 604800
sndbuf 100000
rcvbuf 100000
auth-user-pass
# NOTE: LZO commands are pushed by the Access Server at connect time.
# NOTE: The below line doesn't disable LZO.
comp-lzo no
verb 3
setenv PUSH_PEER_INFO


As you can see, there are 8 redundant remote Gateways (I don't know why they are all the same, but I have zero control over it). Now, if I evoke the client.ovpn file from the openvpn command line, I can get a connection ONLY through the port 443 tcp gateway. This usually wouldn't bother me, however, if I try to connect through NetworkManager after importing the file it consistently fails.

The Gateway box is populated as follows:
vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:443:tcp, vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:1194:udp, vpn.dss.ucdavis.edu:1194:udp

Which looks fine, meaning that it's not parsing weird.

On trying to connect and failing, journalctl reports:

>> <warn>  [1515177413.9186] vpn-connection[0x555fd1c24550,c29b6aa3-27f2-41d2-9389-b0450f623fea,"DSS",0]: VPN plugin: failed: connect-failed (1)
>> SIGTERM[hard,] received, process exiting
event_wait : Interrupted system call (code=4)
>> <warn>  [1515177413.9179] vpn-connection[0x555fd1c24550,c29b6aa3-27f2-41d2-9389-b0450f623fea,"DSS",0]: VPN connection: connect timeout exceeded.
>> Connect timer expired, disconnecting.
>> NOTE: UID/GID downgrade will be delayed because of --client, --pull, or --up-delay
>> UDP link remote: [AF_INET]169.237.160.75:1194
>> UDP link local: (not bound)
>> TCP/UDP: Preserving recently used remote address: [AF_INET]169.237.160.75:1194
>> NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
>> WARNING: --ns-cert-type is DEPRECATED.  Use --remote-cert-tls instead.
>> library versions: OpenSSL 1.1.0g  2 Nov 2017, LZO 2.10
>> OpenVPN 2.4.4 x86_64-unknown-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Sep 26 2017


Which seems to show that it is only trying a single of the multiple remote gateways. Further, if I switch the 443 tcp gateway to be the first of the list of 8, I can get a successful connection.

Thus, I think there's an issue with it trying all the connections in series.

I'm running this on ArchLinux btw.
Comment 1 Beniamino Galvani 2018-01-08 09:32:41 UTC
Created attachment 366480 [details] [review]
[PATCH] utils: fix parsing of empty remote
Comment 2 Thomas Haller 2018-01-08 19:33:36 UTC
Created attachment 366509 [details] [review]
utils: fix failure return value for nmovpn_remote_parse()

Fixes: 3c5c7efba75ffd121be3b0ac179c36ca9aa772b0
Comment 3 Thomas Haller 2018-01-08 19:37:51 UTC
(In reply to Beniamino Galvani from comment #1)
> Created attachment 366480 [details] [review] [review]
> [PATCH] utils: fix parsing of empty remote

I think this is not the right fix. If you follow the code, then an empty @str should be handled just right.

How about comment 2?


Btw, I don't understand how this fixes the original report (comment 0).
Comment 4 Thomas Haller 2018-01-08 20:45:29 UTC
Created attachment 366516 [details] [review]
properties: fix parsing remptes when checking gateway entry

Fixes: 3c5c7efba75ffd121be3b0ac179c36ca9aa772b0
Comment 5 Beniamino Galvani 2018-01-09 09:54:08 UTC
(In reply to Thomas Haller from comment #3)
> (In reply to Beniamino Galvani from comment #1)
> > Created attachment 366480 [details] [review] [review] [review]
> > [PATCH] utils: fix parsing of empty remote
> 
> I think this is not the right fix. If you follow the code, then an empty
> @str should be handled just right.
> 
> How about comment 2?

Yes, looks good.

> Btw, I don't understand how this fixes the original report (comment 0).

As explained in the commit message of comment 1, empty strings (resulting from splitting the gateway string at spaces and commas) are not ignored as they should, and thus a NULL remote is passed as second argument to the openvpn binary:

 /usr/sbin/openvpn --remote valid.remote.n1 1194 udp --remote 1194 udp ...


Britt, can you please run this command:

 nmcli general logging level KEEP domains VPN_PLUGIN:DEBUG

then try to connect and paste here the openvpn command line from the system journal found, for example, in this way:

 journalctl -u NetworkManager -b | grep nm-openvpn | grep EXEC
Comment 7 Britt Yazel 2018-01-10 17:14:14 UTC
Here is the output from the command you asked for:

Jan 10 09:05:45 Michelangelo NetworkManager[380]: nm-openvpn[2211] <debug> EXEC: '/usr/sbin/openvpn --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 443 tcp-client --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 1194 udp --remote vpn.dss.ucdavis.edu 1194 udp --comp-lzo no --nobind --dev tun --dev-type tun --auth-nocache --tls-auth /home/britt/.cert/nm-openvpn/client-tls-auth.pem 1 --ns-cert-type server --reneg-sec 604800 --verb 5 --syslog nm-openvpn --script-security 2 --up /usr/lib/nm-openvpn-service-openvpn-helper --debug 6 2211 --bus-name org.freedesktop.NetworkManager.openvpn.Connection_2 --tun -- --up-restart --persist-key --persist-tun --management /var/run/NetworkManager/nm-openvpn-c29b6aa3-27f2-41d2-9389-b0450f623fea unix --management-client-user root --management-client-group root --management-query-passwords --auth-retry interact --route-noexec --ifconfig-noexec --client --ca /home/britt/.cert/nm-openvpn/client-ca.pem --cert /home/britt/.cert/nm-openvpn/client-cert.pem --key /home/britt/.cert/nm-openvpn/client-key.pem --auth-user-pass --user nm-openvpn --group nm-openvpn'


The output for just nm-openvpn (not EXEC) is:
https://pastebin.com/ZjKWC9SR
Comment 8 Beniamino Galvani 2018-01-15 11:00:47 UTC
So, it seems the patch above is not related to your problem. This needs more investigation... in the meantime, the other following patch should fix the failed assertion seen in your log.
Comment 9 Beniamino Galvani 2018-01-15 11:01:16 UTC
Created attachment 366832 [details] [review]
[PATCH] service: keep reference to plugin while main loop is running
Comment 10 Thomas Haller 2018-01-15 11:43:05 UTC
(In reply to Beniamino Galvani from comment #9)
> Created attachment 366832 [details] [review] [review]
> [PATCH] service: keep reference to plugin while main loop is running

lgtm. Merged: https://git.gnome.org/browse/network-manager-openvpn/commit/?id=9f45977058f7c8cf1fb0c67636cab43e2c85f228
Comment 11 Beniamino Galvani 2018-01-15 13:57:44 UTC
VPN plugins use a timer to ensure that the connection is established within 60 seconds and so there is not enough time to try other servers in the list. I'm not sure how to fix this, though...
Comment 12 Britt Yazel 2018-01-15 18:29:57 UTC
Weird, when I connect directly in the command line with openvpn it doesn't take 60 seconds to connect, it tries the servers in order within just a couple seconds before finding the one that works
Comment 13 Beniamino Galvani 2018-01-16 08:37:35 UTC
(In reply to Britt Yazel from comment #12)
> Weird, when I connect directly in the command line with openvpn it doesn't
> take 60 seconds to connect, it tries the servers in order within just a
> couple seconds before finding the one that works

Right, it's because of the:

 server-poll-timeout 4

directive, which is not supported by NM at the moment. I think we should implement it.
Comment 14 Beniamino Galvani 2018-01-16 09:44:14 UTC
Created attachment 366877 [details] [review]
[PATCH] service,properties: add support for connect-timeout
Comment 15 Thomas Haller 2018-01-16 13:14:46 UTC
(In reply to Beniamino Galvani from comment #14)
> Created attachment 366877 [details] [review] [review]
> [PATCH] service,properties: add support for connect-timeout

The files 
  properties/tests/conf/server-poll-timeout.ovpn
  properties/tests/conf/connect-timeout.ovpn
are missing


can you add tooltips to the GTK GUI? Especially, like we commonly do mention how this option is called in openvpn manual.


+    <child>
+      <placeholder/>
+    </child>
   </object>

drop this?


rest lgtm
Comment 16 Beniamino Galvani 2018-01-16 14:16:28 UTC
Created attachment 366887 [details] [review]
[PATCH v2] service,properties: add support for connect-timeout

(In reply to Thomas Haller from comment #15)
> The files 
>   properties/tests/conf/server-poll-timeout.ovpn
>   properties/tests/conf/connect-timeout.ovpn
> are missing

> can you add tooltips to the GTK GUI? Especially, like we commonly do mention
> how this option is called in openvpn manual.

> drop this?

All fixed, thanks.
Comment 17 Thomas Haller 2018-01-16 14:34:47 UTC
(In reply to Beniamino Galvani from comment #16)
> Created attachment 366887 [details] [review] [review]
> [PATCH v2] service,properties: add support for connect-timeout

lgtm.



This patch allows the user to get the configuration accordingly. But it doesn't make it work by default, with the 60 seconds timeout. Do you think we should do something about that? Maybe if
 - user didn't configure --connect-timeout
 - and there are multiple remotes,
automatically specify a connect-timeout of ... e.g. 30 seconds?
Comment 18 Beniamino Galvani 2018-01-19 10:01:45 UTC
Created attachment 367066 [details] [review]
[PATCH v3 1/2] service,properties: add support for connect-timeout
Comment 19 Beniamino Galvani 2018-01-19 10:02:18 UTC
Created attachment 367067 [details] [review]
[PATCH v3 2/2] service: lower the connect timeout if there are multiple remotes
Comment 20 Beniamino Galvani 2018-01-19 10:03:09 UTC
(In reply to Thomas Haller from comment #17)
> This patch allows the user to get the configuration accordingly. But it
> doesn't make it work by default, with the 60 seconds timeout. Do you think
> we should do something about that? Maybe if
>  - user didn't configure --connect-timeout
>  - and there are multiple remotes,
> automatically specify a connect-timeout of ... e.g. 30 seconds?

Sounds like a good idea to me, added in patch 2.
Comment 21 Thomas Haller 2018-01-19 10:08:38 UTC
+         g_ptr_array_add (args,
+                         (gpointer) g_strdup_printf ("%u", NM_MAX (60 / num_remotes, 20U)));

indentation? Also, maybe in this case you don't need the (gpointer) cast?


Otherwise, v3 lgtm
Comment 22 Beniamino Galvani 2018-01-19 16:24:33 UTC
(In reply to Thomas Haller from comment #21)
> +         g_ptr_array_add (args,
> +                         (gpointer) g_strdup_printf ("%u", NM_MAX (60 /
> num_remotes, 20U)));
> 
> indentation? Also, maybe in this case you don't need the (gpointer) cast?

Right. Applied to master:

https://git.gnome.org/browse/network-manager-openvpn/commit/?id=665ba502e27c962867b338a255957b6a9b9003ec

https://git.gnome.org/browse/network-manager-openvpn/commit/?id=b7195649ffdb66c8b9f4dbae6b868d797dd29406