GNOME Bugzilla – Bug 671767
IPv6 - lots of host routes added
Last modified: 2012-08-31 13:10:33 UTC
The latest NetworkManager adds a separate host route for every single IPv6 host I access. Even `ping`ing a host causes a route to appear, marked as "proto static". For example, here is the output of `ip -6 route` with NM 0.9.3.995:

<<EOF
2001:470:1f0b:614::/64 dev wlan0 proto kernel metric 256
2001:41d0:2:a128:dead:beef:84f3:3a85 via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
2600:3c03::f03c:91ff:fe96:23d via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
2a00:1450:400c:c01::69 via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 1024 rtt 81ms rttvar 80ms cwnd 3
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
2a01:7e00::f03c:91ff:fe96:53e1 via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
fe80::/64 dev wlan0 proto kernel metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto kernel metric 1024 expires 35935sec
EOF

Compare this to the output of the same command with NM 0.9.2.0 (or without NM at all):

<<EOF
2001:470:1f0b:614::/64 dev wlan0 proto kernel metric 256
fe80::/64 dev wlan0 proto kernel metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto kernel metric 1024 expires 35818sec
EOF

According to `git bisect`, the following commit introduced this:

<<EOF
ae34fa900b0a8410002f6f96b0bd27d807489dfa is the first bad commit
commit ae34fa900b0a8410002f6f96b0bd27d807489dfa
Author: Dan Williams <dcbw@redhat.com>
Date:   Mon Feb 13 13:06:52 2012 -0600

    core: fix erroneous IPv6 routes by making route addition typesafe

    Add two helper functions, one for IPv4 and one for IPv6, to ensure that the core code benefits from compiler type checking when adding routes. Previously nm_netlink_route_add() took a void* which meant we messed up adding IPv6 routes sometimes due to confusion over what was supposed to be passed to it.

    Also fixes what appears to be a C&P error with add_ip6_route_to_gateway().

    Reported by Tomáš Trnka <tomastrnka@gmx.com>
EOF

I'm curious whether this is a bug or a "feature".
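The following Python sketch makes the difference between the two tables concrete. It is not part of NetworkManager or iproute2, and the names `Route6`, `entries`, `parse_route` and `static_host_routes` are hypothetical. It parses `ip -6 route` text and lists the host routes that are marked "proto static" but are not kernel cache entries. It assumes the iproute2 text format shown in this thread: a /128 destination is printed without a prefix length, and a cache entry carries a "cache" tag, sometimes on an indented continuation line.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical helper names; this is an illustration, not NetworkManager code.


@dataclass
class Route6:
    dest: str             # prefix, "default", or bare address (implicit /128)
    via: Optional[str]    # next-hop gateway, if printed
    dev: Optional[str]    # outgoing interface, if printed
    proto: Optional[str]  # "kernel", "static", ...; None when not printed
    metric: Optional[int]
    is_cache: bool        # True when the entry carries the "cache" tag


ROUTE_TYPES = {"unreachable", "prohibit", "blackhole"}
KEYS = {"via", "dev", "proto", "metric"}


def entries(text: str) -> List[str]:
    """Split `ip -6 route` output into entries, joining indented
    continuation lines (such as a lone "cache") onto their route."""
    out: List[str] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace() and out:
            out[-1] += " " + raw.strip()
        else:
            out.append(raw.strip())
    return out


def parse_route(entry: str) -> Route6:
    tokens = entry.split()
    if tokens[0] in ROUTE_TYPES:
        tokens = tokens[1:]  # drop the route-type keyword
    values = {}
    # Collect "keyword value" pairs; the last token cannot start a pair.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in KEYS and tokens[i] not in values:
            values[tokens[i]] = tokens[i + 1]
    return Route6(
        dest=tokens[0],
        via=values.get("via"),
        dev=values.get("dev"),
        proto=values.get("proto"),
        metric=int(values["metric"]) if "metric" in values else None,
        is_cache="cache" in tokens,
    )


def static_host_routes(text: str) -> List[Route6]:
    """Host routes marked "proto static" that are not kernel cache entries."""
    routes = [parse_route(e) for e in entries(text)]
    return [r for r in routes
            if r.dest != "default" and "/" not in r.dest
            and r.proto == "static" and not r.is_cache]


# Excerpt of the NM 0.9.3.995 routing table from the report.
SAMPLE = """\
2001:470:1f0b:614::/64 dev wlan0 proto kernel metric 256
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 1024 rtt 81ms rttvar 80ms cwnd 3
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto static metric 32661
fe80::/64 dev wlan0 proto kernel metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0 proto kernel metric 1024 expires 35935sec
"""

if __name__ == "__main__":
    for r in static_host_routes(SAMPLE):
        print(r.dest, r.metric)
```

Run against the NM 0.9.2.0 table, the same function returns an empty list. The kernel's own cached entry (metric 0, tagged "cache") is excluded on purpose, because only the "proto static" copies are unexpected.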
I don't believe this is caused by NM specifically. To check, I stopped NetworkManager entirely and ran the following test. The machine is on a subnet with an IPv6 router sending Router Advertisements.

$ ip -f inet6 route list
3ffe:b80:17e2::/64 dev eth0 proto kernel metric 256 expires 86366sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::214:22ff:fefd:6e7 dev eth0 proto kernel metric 1024 expires 53sec
$ ping6 3ffe:b80:17e2::1
PING 3ffe:b80:17e2::1(3ffe:b80:17e2::1) 56 data bytes
64 bytes from 3ffe:b80:17e2::1: icmp_seq=1 ttl=64 time=0.407 ms
64 bytes from 3ffe:b80:17e2::1: icmp_seq=2 ttl=64 time=0.230 ms
^C
--- 3ffe:b80:17e2::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.230/0.318/0.407/0.090 ms
$ ip -f inet6 route list
3ffe:b80:17e2::1 via 3ffe:b80:17e2::1 dev eth0 metric 0
    cache
3ffe:b80:17e2::/64 dev eth0 proto kernel metric 256 expires 86371sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::214:22ff:fefd:6e7 dev eth0 proto kernel metric 1024 expires 57sec

The router is, of course, 3ffe:b80:17e2::1. NM was *not* running during this test, so I'd say this is expected (if odd) behavior of the kernel networking stack. Some IPv6 sysctl option that NM sets could in principle trigger it. I find that hard to believe, though, because the only IPv6-related sysctls NetworkManager touches are "accept_ra" and "ip6_privacy_tempaddr".
By "ip6_privacy_tempaddr" I actually mean "use_tempaddr". Setting it to 0 (disabled) has no effect, so sysctls are not the culprit.
Dan: It's possible, but I actually tested switching between stable and beta versions of NM, and this just doesn't happen with anything other than NM 0.9.3.x... Also, your example shows the host route labelled "cache" (which is normal), while the same `ip -f inet6 route list` on my system shows all routes as "proto static" (statically added).
Same problem here, running NM 0.9.4.0-3 on Debian amd64. Every time a route is added, the following is logged to /var/log/syslog:

May 16 14:44:42 schleppi NetworkManager[1454]: <info> Policy set 'eduroam' (wlan0) as default for IPv4 routing and DNS.
May 16 14:44:42 schleppi NetworkManager[1454]: <info> Policy set 'eduroam' (wlan0) as default for IPv6 routing and DNS.

The 'cache' entries are okay and can be safely flushed with 'ip -6 route flush cache'. However, I also see real static /128 routes in the routing table that don't go away and prevent more-specific VPN routes from working.
This is what the debug log shows (log-domains=IP6,DHCP6):

NetworkManager[10079]: <debug> [1337173238.390024] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.390126] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173238.390150] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173238.390175] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.390195] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.390213] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173238.390230] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6
NetworkManager[10079]: <debug> [1337173238.391897] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.391945] [nm-ip6-manager.c:619] process_route(): (wlan0): route cache unchanged, ignoring message
NetworkManager[10079]: <debug> [1337173238.391991] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.392025] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173238.392045] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173238.392067] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.392183] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.392201] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173238.392217] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6
NetworkManager[10079]: <debug> [1337173239.525443] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173239.525608] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173239.525652] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173239.525699] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173239.525736] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173239.525769] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173239.525801] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6

It only happens with mode=auto, not with the very same SSID/network with mode=ignore.
Created attachment 214479
Debugging output from NM

As Dan pointed out, the cached host routes are added automatically by the kernel (it does this for IPv4 as well, but puts them in a separate routing table). However, NM appears to delete the kernel-added host route and replace it with a copy of its own. The only difference is that the copy has "proto static" and metric 1024.

Here's the output from "ip monitor" when pinging labs.ripe.net (2001:67c:2e8:22::c100:699):

> $ ip monitor
> [...]
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 metric 0
>     cache
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1024
> Deleted 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 metric 0
>     cache
> Deleted default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1
> default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1

The default route undergoes the same kind of replacement. My routing table currently looks like this (other cache entries removed):

> $ ip -6 r
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1024
> unreachable 2a02:fe0:cf16:10:: dev lo proto kernel metric 256 error -101
> 2a02:fe0:cf16:10::/64 dev wlan0 proto kernel metric 256 expires 2147157sec
> unreachable fe80::/64 dev lo proto kernel metric 256 error -101
> fe80::/64 dev wlan0 proto kernel metric 256
> default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1
> default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto kernel metric 1024 expires 1703sec

Note also that I have two default routes with different protocols and metrics.
If I disable NM's IPv6 support (setting it to mode=ignore), reconnect and ping labs.ripe.net again, my routing table looks like this:

> $ ip -6 r
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 metric 0
>     cache
> unreachable 2a02:fe0:cf16:10:: dev lo proto kernel metric 256 error -101
> 2a02:fe0:cf16:10::/64 dev wlan0 proto kernel metric 256 expires 2147157sec
> unreachable fe80::/64 dev lo proto kernel metric 256 error -101
> fe80::/64 dev wlan0 proto kernel metric 256
> default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto kernel metric 1024 expires 1770sec

Note that the cached host route to 2001:67c:2e8:22::c100:699 now has metric 0, no protocol and a "cache" tag. This is the "pristine" cached host route that NM would otherwise remove and replace with one of its own. Note also that there is now only one default route, because the one with "proto static" is gone. So we can conclude that NM added that route in the first output. With NM in ignore mode, "ip monitor" shows only the initial addition of the cached route entry (the same as the first line of the output above).

This bug causes problems for the Chromium/Chrome web browser, which cancels all in-progress HTTP requests whenever it detects a "network change". As it turns out, the browser treats the route additions and deletions performed by NM as such a change. Take visiting labs.ripe.net as an example:

1. The kernel adds the cached route entry.
2. This immediately triggers NM's route removal and re-addition, at the very moment the browser is retrieving the page.
3. The browser sees the removal and re-addition as a network change and cancels the request, so the page fails to load.

For more information on this bug, see: http://code.google.com/p/chromium/issues/detail?id=128509

The problem was reproduced with NetworkManager-0.9.4-4.git20120403.fc16.x86_64 on Fedora 16.
I'm attaching a debug log of what was output from NetworkManager at the exact same time as the "ip monitor" output above (when I did "ping6 -c 1 labs.ripe.net"). Tore
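The following Python sketch mechanically checks the delete/re-add sequence in the "ip monitor" output above. It is an illustration only, and `Event`, `parse_monitor` and `replaced_cache_entries` are hypothetical names. It flags every destination whose kernel cache entry was first copied as a "proto static" route and then deleted. It assumes the iproute2 text format quoted in this comment: deletions are prefixed with "Deleted", and "cache" may appear on its own indented line.

```python
from typing import List, NamedTuple, Set

# Hypothetical names; a sketch for reading `ip monitor` text, not NM code.


class Event(NamedTuple):
    deleted: bool  # the line started with "Deleted"
    dest: str      # destination as printed ("default", address, or prefix)
    cache: bool    # the entry carried the "cache" tag
    static: bool   # the entry carried "proto static"


def parse_monitor(text: str) -> List[Event]:
    """Turn `ip monitor` route lines into events. A lone "cache" line is
    attached to the event printed just before it."""
    events: List[Event] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "cache" and events:
            events[-1] = events[-1]._replace(cache=True)
            continue
        deleted = line.startswith("Deleted ")
        tokens = line.split()[1:] if deleted else line.split()
        events.append(Event(deleted, tokens[0], "cache" in tokens,
                            "proto static" in line))
    return events


def replaced_cache_entries(text: str) -> Set[str]:
    """Destinations whose kernel cache entry was copied as a "proto static"
    route and then deleted."""
    events = parse_monitor(text)
    cached = {e.dest for e in events if e.cache and not e.deleted}
    copied = {e.dest for e in events if e.static and not e.deleted}
    dropped = {e.dest for e in events if e.cache and e.deleted}
    return cached & copied & dropped


# The "ip monitor" lines quoted in this comment (without the "[...]" marker).
MONITOR = """\
2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 metric 0
    cache
2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1024
Deleted 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0 metric 0
    cache
Deleted default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1
default via fe80::ca6c:87ff:feab:d027 dev wlan0 proto static metric 1
"""

if __name__ == "__main__":
    print(replaced_cache_entries(MONITOR))
```

The default route is not reported here. It was also deleted and re-added, but it was never a cache entry. With mode=ignore, only the first event (the kernel's cache entry) appears, and the result is an empty set.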
I forgot to mention, except for the initial addition of static routes (if any), is there any reason at all why NM should monitor and modify the IPv6 routing table? As far as I can tell, the default route and any routes to on-link prefixes are maintained by the kernel's Router Advertisement processing, and the cached host routes are also handled internally by the kernel's IPv6 stack. I don't think there's anything left that NM needs to deal with? Tore
Actually it seems it's not the routes being removed and re-added that causes problems for Chrome, it is the fact that /etc/resolv.conf gets re-written - this causes it to cancel all outstanding DNS requests. I guess NM re-activates the entire IPv6 configuration whenever the kernel adds a routing cache entry, which cannot be right. Tore
(In reply to comment #7)
> I forgot to mention, except for the initial addition of static routes (if any),
> is there any reason at all why NM should monitor and modify the IPv6 routing
> table?
>
> As far as I can tell, the default route and any routes to on-link prefixes are
> maintained by the kernel's Router Advertisement processing, and the cached host
> routes are also handled internally by the kernel's IPv6 stack. I don't think
> there's anything left that NM needs to deal with?
>
> Tore

What if you have IPv6 auto-configured on more than one interface? In that case, NetworkManager manages the default route choice.
(In reply to comment #9)
> What if you have IPv6 auto-configured on more than one interface? In that case,
> NetworkManager manages the default route choice.

Fair point, but in this case the only sane way for NM to do it is to disable the kernel's RA processing w.r.t. managing the default router (the accept_ra_defrtr sysctl) and handle installing the default route all on its own. Otherwise NM would have to play catch-up and modify whatever the kernel is doing. The kernel will add back the default route on the not-chosen interface every time an unsolicited RA arrives there, and NM will have to remove it again. Not a good approach...

Tore
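The following Python sketch shows the approach Tore proposes, under stated assumptions. It is not what NetworkManager did at the time. The function names are hypothetical, and the `root` parameter exists only so the code can be exercised against an ordinary directory, since writing under /proc/sys requires root. The real pieces are the per-interface sysctl `net.ipv6.conf.<iface>.accept_ra_defrtr` and the standard `ip -6 route replace default via ... dev ... metric ...` command line.

```python
import os
from typing import Iterable, List

# Hypothetical helper names. The sysctl path layout and the `ip` argv are real;
# the policy (turn off kernel default-router learning, install one default
# route ourselves) is the proposal from this comment, not NM behaviour.


def defrtr_path(iface: str, root: str = "/proc/sys") -> str:
    """File behind the sysctl net.ipv6.conf.<iface>.accept_ra_defrtr."""
    return os.path.join(root, "net", "ipv6", "conf", iface, "accept_ra_defrtr")


def read_defrtr(iface: str, root: str = "/proc/sys") -> int:
    with open(defrtr_path(iface, root)) as f:
        return int(f.read().strip())


def disable_kernel_defrtr(ifaces: Iterable[str], root: str = "/proc/sys") -> None:
    """Stop the kernel from installing RA-learned default routes on ifaces.
    accept_ra itself is left alone, so addresses and on-link prefixes are
    still configured from Router Advertisements."""
    for iface in ifaces:
        with open(defrtr_path(iface, root), "w") as f:
            f.write("0\n")


def default_route_argv(gateway: str, iface: str, metric: int) -> List[str]:
    """argv for installing the one default route the manager has chosen.
    A link-local gateway is only meaningful together with "dev"."""
    return ["ip", "-6", "route", "replace", "default",
            "via", gateway, "dev", iface, "metric", str(metric)]
```

With default-router learning turned off, the manager has to learn each router's address some other way. That is why the reply below mentions NM processing RAs itself, or the kernel exposing RA information. The argv would typically be passed to `subprocess.run` with root privileges.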
(In reply to comment #8)
> Actually it seems it's not the routes being removed and re-added that causes
> problems for Chrome, it is the fact that /etc/resolv.conf gets re-written -
> this causes it to cancel all outstanding DNS requests. I guess NM re-activates
> the entire IPv6 configuration whenever the kernel adds a routing cache entry,
> which cannot be right.

New bug filed for this: https://bugzilla.gnome.org/show_bug.cgi?id=676778
The unnecessary routing table activity is still happening in 0.9.6-rc1. However, #676778 seems fixed, so Chrome is usable again.
> Fair point, but in this case the only sane way for NM to do it is to disable
> the kernel's RA processing w.r.t. managing the default router (the
> accept_ra_defrtr sysctl) and handle installing the default route all on its
> own.

This could be done if NetworkManager processed the router advertisements itself, or if the kernel exposed information about router advertisements.

> Otherwise NM would have to play catch-up and modify whatever the kernel is
> doing.

Not really. It should not do anything with any routes except the default gateway, and it should never remove kernel routes, as that's pretty ugly. Currently, the only correct IPv6 route writing in NM is installing or replacing a single prioritized default route. Any other direct route handling is a bug (unless I've forgotten something).

I'm maintaining a list of kernel-related problems: https://fedoraproject.org/wiki/Tools/NetworkManager/Integration#Kernel
I should note that this is NOT merely a cosmetic bug. It can break networking in confusing ways whenever the default route changes or is removed while NM's static routes keep pointing to the old gateway. (My laptop's IPv6 connectivity is not permanent. A generic desktop PC acts as a gateway to Tunnelbroker and advertises short-lived routes [a few minutes]. When I turn it off, the advertised routes expire and new connections should fall back to IPv4. Unfortunately, NM's static routes never expire, so this doesn't work the way it's supposed to.)

I'm using 0.9.6.0, which seems to be the latest, and I still found almost 4k host routes after a BitTorrent download.
I'll keep this on my checklist.
*** Bug 682616 has been marked as a duplicate of this bug. ***
Cached/cloned routes are now ignored in git master via these two commits: 46e0af2942e23fb3cf1c313e58e4081877d4f289 3ca3120e4a01ea4a86fd052311c977e7ec136365 which should fix this problem. These routes are temporary ones added by the kernel and we shouldn't really do anything with them, since they aren't part of the interface's permanent routing configuration (as delivered by DHCP or set by the user).
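The idea behind the fix can be illustrated with a short Python sketch. This is not the code from the two commits above, which is C inside NetworkManager, and the `NetlinkRoute` type and function names are hypothetical. The one real detail it relies on is the `RTM_F_CLONED` flag (0x200 in `<linux/rtnetlink.h>`). In kernels of this era, the kernel sets this flag in `rtm_flags` for cached/cloned IPv6 routes, and iproute2 displays it as the "cache" tag.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Value from <linux/rtnetlink.h>: the route is a cloned (cached) entry.
RTM_F_CLONED = 0x200


@dataclass
class NetlinkRoute:
    """Hypothetical, simplified view of an RTM_NEWROUTE/RTM_DELROUTE message."""
    dest: str
    rtm_flags: int  # rtmsg.rtm_flags as received from the kernel


def is_cloned(route: NetlinkRoute) -> bool:
    return bool(route.rtm_flags & RTM_F_CLONED)


def configuration_routes(routes: Iterable[NetlinkRoute]) -> List[NetlinkRoute]:
    """Drop kernel-cloned entries and keep only routes that can belong to the
    interface's configuration (from RA/DHCP or set by the user)."""
    return [r for r in routes if not is_cloned(r)]
```

Filtering at this point means a cache entry created by a `ping6` never reaches the code that copies routes. So no "proto static" duplicate is created and no reconfiguration is triggered.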
Thanks.
I have applied those two patches to Debian 0.9.4.0, and they solve the problem for me: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686328

Thanks a lot