GNOME Bugzilla – Bug 765153
dns=dnsmasq doesn't play well with multiple connections
Last modified: 2016-05-18 09:22:18 UTC
NM will configure dnsmasq with all available upstream servers from all active connections, however it will do nothing to ensure that connection X is being used to transport the DNS requests sent to DNS server X. That is, if connection Y is more preferred, DNS server X will be contacted through connection Y. In that case, DNS server X might refuse the request, because it only allows requests arriving from a certain whitelisted networks. Dnsmasq forwards an incoming request to all configured DNS servers simultaneously, and will respond back with the first answer that comes back. Thus, if upstream server's refusal is the first response, the stub resolver will also receive a refusal, and the user will see a "hostname not found" error or something similar. In the event of cold caches, it is very likely that the refusal is the first answer that comes back, because it does not require any further processing. A server that's willing to actually process the request will by comparsion need to contact the authoritative servers for the domain in order to do so, which takes time. Below you'll find a concrete example of this happening. I have two active connections, wifi and wwan. /var/run/NetworkManager/dnsmasq.conf contains the following: server=10.0.79.14 server=fe80::120d:7fff:fe6d:a593%wlo1 server=2001:4600:4:fff::52 The first two servers are learned from the wifi, while the third is learned from the wwan. The connection metrics are at their defaults, so the wifi is preferred for outgoing packets. The two connections are served by different ISPs, and crucially, 2001:4600:4:fff::52 is not usable for customers of the wwan ISP. Reproducing the problem: $ host -t CNAME foobar.github.io. Host foobar.github.io not found: 5(REFUSED) If you look at a packet capture, this is what happens: 111 49.897601 127.0.0.1 -> 127.0.0.1 DNS 78 Standard query 0xc81a CNAME foobar.github.io The above packet is /usr/bin/host's request to the locally running dnsmasq process. 112 49.897745 2a02:fe0:c420:1c97:95e4:8bd5:c947:9f40 -> 2001:4600:4:fff::52 DNS 98 Standard query 0x340b CNAME foobar.github.io 113 49.897794 fe80::eed8:dfa6:862e:3e32 -> fe80::120d:7fff:fe6d:a593 DNS 98 Standard query 0x340b CNAME foobar.github.io 114 49.897841 10.0.79.155 -> 10.0.79.14 DNS 78 Standard query 0x340b CNAME foobar.github.io The dnsmasq process forwards the request to all its configured upstream servers more-or-less simultaneously. It is important to note that all of these packets went out the wifi interface; the wwan one was not used at all. 115 49.906245 2001:4600:4:fff::52 -> 2a02:fe0:c420:1c97:95e4:8bd5:c947:9f40 DNS 98 Standard query response 0x340b Refused The response from the wwan ISP is the first one, and it's a refusal - because it originated from a source IP address belonging to the wifi ISP. 116 49.906421 127.0.0.1 -> 127.0.0.1 DNS 78 Standard query response 0xc81a Refused Since dnsmsaq now has an answer, it returns that answer to the "host" command, and the failure was returned back to the user. 117 49.931072 10.0.79.14 -> 10.0.79.155 DNS 113 Standard query response 0x340b CNAME github.map.fastly.net 118 49.934423 fe80::120d:7fff:fe6d:a593 -> fe80::eed8:dfa6:862e:3e32 DNS 133 Standard query response 0x340b CNAME github.map.fastly.net Not long after, the wifi ISP's DNS servers finished resolving the request (actually it was handled by forwarders in the wireless home gateway, but that's not really relevant here). However it is too late to "fix" the failure that was already returned to the caller. If however I retry the request a few seconds later, it succeeds: $ host -t CNAME foobar.github.io. foobar.github.io is an alias for github.map.fastly.net. 119 52.911891 127.0.0.1 -> 127.0.0.1 DNS 78 Standard query 0x7c68 CNAME foobar.github.io Request from /usr/bin/host to the local dnsmasq above. 120 52.912013 2a02:fe0:c420:1c97:95e4:8bd5:c947:9f40 -> 2001:4600:4:fff::52 DNS 98 Standard query 0x6e72 CNAME foobar.github.io 121 52.912063 fe80::eed8:dfa6:862e:3e32 -> fe80::120d:7fff:fe6d:a593 DNS 98 Standard query 0x6e72 CNAME foobar.github.io 122 52.912105 10.0.79.155 -> 10.0.79.14 DNS 78 Standard query 0x6e72 CNAME foobar.github.io dnsmasq again forwards the request to all three upstream servers simultaneously: 123 52.921235 10.0.79.14 -> 10.0.79.155 DNS 113 Standard query response 0x6e72 CNAME github.map.fastly.net This time, the first reponse to come back is from the wifi ISP. It's marginally faster than the wwan ISP, presumably because the previous request seeded its cache. 124 52.921532 2001:4600:4:fff::52 -> 2a02:fe0:c420:1c97:95e4:8bd5:c947:9f40 DNS 98 Standard query response 0x6e72 Refused Here's the refusal from the wwan ISP, just 297 µsec too late to cause any problems. 125 52.921532 127.0.0.1 -> 127.0.0.1 DNS 113 Standard query response 0x7c68 CNAME github.map.fastly.net And here we can see dnsmasq returning the expected response to /usr/bin/host. 126 52.923477 fe80::120d:7fff:fe6d:a593 -> fe80::eed8:dfa6:862e:3e32 DNS 133 Standard query response 0x6e72 CNAME github.map.fastly.net And this is the last response we're expecting (not that it matters at this point). I've done some testing, and one way to fix the problem is to suffix "@<interface>" to the dnsmasq.conf server=foo statements. In my case, this means changing /var/run/NetworkManager/dnsmasq.conf to contain the following: server=10.0.79.14@wlo1 # not really necessary since wlo1 is already preferred, but doesn't hurt and seems more consistent server=fe80::120d:7fff:fe6d:a593%wlo1 # ipv6 link-locals already provide interface so these need no change server=2001:4600:4:fff::52@wwp0s26u1u5i6 # ensures wwan interface, and crucially wwan source ip address, is used Other approaches include setting up explicit host routes to the DNS servers out their appropriate interface, or to only configure dnsmasq with the DNS servers belonging to the primary selected connection (for each address family).
(In reply to Tore Anderson from comment #0) > server=10.0.79.14@wlo1 # not really necessary since wlo1 is already > preferred, but doesn't hurt and seems more consistent > server=fe80::120d:7fff:fe6d:a593%wlo1 # ipv6 link-locals already provide > interface so these need no change > server=2001:4600:4:fff::52@wwp0s26u1u5i6 # ensures wwan interface, and > crucially wwan source ip address, is used How does this work in the face of https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=c328cf52f2b219feb5ae03fd40175f6b6ca73975 ? We should do this, but it only fixes the problem for dnsmasq. > Other approaches include setting up explicit host routes to the DNS servers > out their appropriate interface, or to only configure dnsmasq with the DNS > servers belonging to the primary selected connection (for each address > family). We probably should do this too.
IPv6 link-locals need to be specified with the '%' delimiter to work with current versions of dnsmasq (as explained in c328cf5). This issue impacts non-link-locals only. So you'll want something that works like this: dnsmasq_conf = sprintf("server=%s%s%s", addr, IN6_IS_ADDR_LINKLOCAL(addr) ? "%" : "@", iface) The issue doesn't really affect the glibc stub resolver because it checks with each server in resolv.conf in the order given. NM sorts that list according to the associated connection preference, so most likely the server tried first is a usable server.
(In reply to Tore Anderson from comment #0) > I've done some testing, and one way to fix the problem is to suffix > "@<interface>" to the dnsmasq.conf server=foo statements. The main problem I see with this approach is that multiple connections can have the same DNS server, but we can only put a single "server=IP@iface" entry in dnsmasq configuration for that address. In case of duplicate entries probably we should look at routes to decide which interface is the best for a given name server (however this is not trivial to do right), or simply omit the "@interface" suffix (but then the original issue would be still present).
(In reply to Beniamino Galvani from comment #3) > (In reply to Tore Anderson from comment #0) > > I've done some testing, and one way to fix the problem is to suffix > > "@<interface>" to the dnsmasq.conf server=foo statements. > > The main problem I see with this approach is that multiple connections > can have the same DNS server, but we can only put a single > "server=IP@iface" entry in dnsmasq configuration for that address. > > In case of duplicate entries probably we should look at routes to > decide which interface is the best for a given name server (however > this is not trivial to do right), or simply omit the "@interface" > suffix (but then the original issue would be still present). Reenacting what kernel routing does would be much too complicated. There is RTM_GETROUTE netlink request for that though... But that still seems complicated to do, because routing can change, and then it's unclear what do to about that (and how to detect that). Adding a direct route to the DNS server might be a good idea. But not trival either: what would be the gateway for that route? Guess for that we could use RTM_GETROUTE...
(In reply to Beniamino Galvani from comment #3) > The main problem I see with this approach is that multiple connections > can have the same DNS server, but we can only put a single > "server=IP@iface" entry in dnsmasq configuration for that address. Actually it seems to work. Well, sort of. For example: $ dnsmasq -d -R -S 84.208.20.110@wlo1 -S 84.208.20.110@enp0s25 ...will send out forwarded requests to the server via both interfaces simultaneously. It only seems to process responses coming in via the interface belonging to the primary connection for some reason (maybe rp_filter?), but at least that's no worse than what we have today.
Created attachment 326327 [details] [review] dns: specify egress interface for each dnsmasq upstream server (In reply to Tore Anderson from comment #5) > ...will send out forwarded requests to the server via both interfaces > simultaneously. It only seems to process responses coming in via the > interface belonging to the primary connection for some reason (maybe > rp_filter?), but at least that's no worse than what we have today. This sounds fine. I think we should always specify an interface to avoid the issues you reported. How about this patch? I did some basic tests, but probably more are needed.
I've tested that patch on top of git master and functionality-wise it looks good to me, the REFUSED lookups are completely gone. (I have no opinion of the code itself though.)
Looks good. Just some minor nits: iface = g_object_get_data (G_OBJECT (ip4), IP_CONFIG_IFACE_TAG); g_assert (iface); The assert is quite harsh. I'd drop the assert (also from the v6 case). Or alternatively: keep the assert but add a g_return_if_fail(iface) to nm_dns_manager_add_ip6_config(), to catch the missing interface name when it is set, not later when it is used. There should be the assert early when the wrong thing happens (not when we later want to use it). Btw, there is also nm_sprintf_buf(), but g_snprintf is fine too. buf = nm_utils_inet4_ntop (addr, NULL); /*use buf later*/ IMO using the static-buffer is error prone when the result is not used right away. preferable would be char buf[NM_UTILS_INET_ADDRSTRLEN]; nm_utils_inet4_ntop (addr, buf); /*use buf later*/ or + snprintf (buf, sizeof (buf), "%s%s%s", + nm_utils_inet4_ntop (addr, NULL); + iface[0] ? "@" : "", + iface[0] ? iface : ""); (as it applies). overall, lgtm
(In reply to Thomas Haller from comment #8) > The assert is quite harsh. I'd drop the assert (also from the v6 case). > Or alternatively: keep the assert but add a g_return_if_fail(iface) to > nm_dns_manager_add_ip6_config(), to catch the missing interface name when it > is set, not later when it is used. There should be the assert early when the > wrong thing happens (not when we later want to use it). > Btw, there is also nm_sprintf_buf(), but g_snprintf is fine too. > IMO using the static-buffer is error prone when the result is not used right > away. Fixed and re-pushed (with additional commit) to bg/dnsmasq-interfaces-bgo765153.
lgtm
+1
Merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=6280839c16022c2cbd8358cbdea670f3fb638f7a