Bug 671767 - IPv6 - lots of host routes added
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
Version: 0.9.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Pavel Simerda
Dan Williams
Duplicates: 682616
Depends on:
Blocks: nm-0.9.8
 
 
Reported: 2012-03-10 12:06 UTC by Mantas Mikulėnas (grawity)
Modified: 2012-08-31 13:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Debugging output from NM (10.02 KB, text/plain)
2012-05-20 08:17 UTC, Tore Anderson

Description Mantas Mikulėnas (grawity) 2012-03-10 12:06:27 UTC
The latest NetworkManager adds a separate host route for every single IPv6 host I access. Even `ping`ing the host causes a route to appear, marked as "proto static". For example, the following output from `ip -6 route`, when using NM 0.9.3.995:

<<EOF
2001:470:1f0b:614::/64 dev wlan0
  proto kernel  metric 256
2001:41d0:2:a128:dead:beef:84f3:3a85 via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
2600:3c03::f03c:91ff:fe96:23d via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
2a00:1450:400c:c01::69 via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 1024  rtt 81ms rttvar 80ms cwnd 3
2a00:1450:400c:c01::7d via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
2a01:7e00::f03c:91ff:fe96:53e1 via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
fe80::/64 dev wlan0
  proto kernel  metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto kernel  metric 1024  expires 35935sec
EOF

Compare this to the same command's output when using NM 0.9.2.0 (or when not using NM at all):

<<EOF
2001:470:1f0b:614::/64 dev wlan0
  proto kernel  metric 256
fe80::/64 dev wlan0
  proto kernel  metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto kernel  metric 1024  expires 35818sec
EOF

According to `git bisect`, the following commit introduced this:

<<EOF
ae34fa900b0a8410002f6f96b0bd27d807489dfa is the first bad commit
commit ae34fa900b0a8410002f6f96b0bd27d807489dfa
Author: Dan Williams <dcbw@redhat.com>
Date:   Mon Feb 13 13:06:52 2012 -0600

    core: fix erroneous IPv6 routes by making route addition typesafe
    
    Add two helper functions, one for IPv4 and one for IPv6, to ensure
    that the core code benefits from compiler type checking when adding
    routes.  Previously nm_netlink_route_add() took a void* which meant
    we messed up adding IPv6 routes sometimes due to confusion over
    what was supposed to be passed to it.  Also fixes what appears to
    be a C&P error with add_ip6_route_to_gateway().
    
    Reported by Tomáš Trnka <tomastrnka@gmx.com>
EOF

I'm curious if this is a bug or a "feature".
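
For anyone who wants to check whether their own table is affected, here is a rough Python sketch that scans `ip -6 route` text for single-host (/128) destinations tagged "proto static". The function names are made up for illustration, the sample is a shortened copy of the output above, and the parser assumes indented lines are continuations of the route line before them.

```python
import ipaddress

# Shortened copy of the `ip -6 route` output from this report.
SAMPLE = """\
2001:470:1f0b:614::/64 dev wlan0
  proto kernel  metric 256
2001:41d0:2:a128:dead:beef:84f3:3a85 via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
2a00:1450:400c:c01::69 via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto static  metric 32661
fe80::/64 dev wlan0
  proto kernel  metric 256
default via fe80::5ed9:98ff:fef7:ad6d dev wlan0
  proto kernel  metric 1024  expires 35935sec
"""


def join_wrapped(text):
    """Merge indented continuation lines into the route line above them."""
    routes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and routes:
            routes[-1] += " " + line.strip()
        else:
            routes.append(line.strip())
    return routes


def proto_of(fields):
    """Return the word following 'proto', or None if there is none."""
    if "proto" in fields[:-1]:
        return fields[fields.index("proto") + 1]
    return None


def static_host_routes(text):
    """Return /128 destinations whose route is tagged 'proto static'."""
    hits = []
    for route in join_wrapped(text):
        fields = route.split()
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            # "default", "unreachable" and similar keywords are skipped.
            continue
        if net.version == 6 and net.prefixlen == 128 and proto_of(fields) == "static":
            hits.append(str(net.network_address))
    return hits


if __name__ == "__main__":
    for host in static_host_routes(SAMPLE):
        print(host)
```

A table without the problem (such as the NM 0.9.2.0 output above) should produce an empty list.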
Comment 1 Dan Williams 2012-03-22 03:49:30 UTC
I don't believe this is caused by NM specifically.  To test I stopped NetworkManager entirely and performed the following test.  The machine is on a subnet with an IPv6 router sending Router Advertisements.

$ ip -f inet6 route list
3ffe:b80:17e2::/64 dev eth0  proto kernel  metric 256  expires 86366sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth0  proto kernel  metric 256 
default via fe80::214:22ff:fefd:6e7 dev eth0  proto kernel  metric 1024  expires 53sec

$ ping6 3ffe:b80:17e2::1
PING 3ffe:b80:17e2::1(3ffe:b80:17e2::1) 56 data bytes
64 bytes from 3ffe:b80:17e2::1: icmp_seq=1 ttl=64 time=0.407 ms
64 bytes from 3ffe:b80:17e2::1: icmp_seq=2 ttl=64 time=0.230 ms
^C
--- 3ffe:b80:17e2::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.230/0.318/0.407/0.090 ms

$ ip -f inet6 route list
3ffe:b80:17e2::1 via 3ffe:b80:17e2::1 dev eth0  metric 0 
    cache 
3ffe:b80:17e2::/64 dev eth0  proto kernel  metric 256  expires 86371sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth0  proto kernel  metric 256 
default via fe80::214:22ff:fefd:6e7 dev eth0  proto kernel  metric 1024  expires 57sec

The router is, of course, 3ffe:b80:17e2::1.  Given that during this NM was *not* running I'd say this is expected (if odd) behavior by the kernel networking stack.  It may be that some IPv6 sysctl option that NM sets triggers it, but I find that hard to believe given that the only IPv6 related sysctls NetworkManager touches are "accept_ra" and "ip6_privacy_tempaddr".
Comment 2 Dan Williams 2012-03-22 03:53:03 UTC
By "ip6_privacy_tempaddr" I actually mean "use_tempaddr".  Setting that to 0 (disabled) has no effect so sysctls are not the culprit.
Comment 3 Mantas Mikulėnas (grawity) 2012-03-22 15:49:02 UTC
Dan: It's possible, but I actually tested switching between stable and beta versions of NM, and this just doesn't happen with anything other than NM 0.9.3.x...

Also, your example shows the host route labelled "cache" (which is normal), while the same `ip -f inet6 route list` on my system shows all routes as "proto static" (statically added).
Comment 4 Bernhard Schmidt 2012-05-16 12:47:37 UTC
Same problem here, running NM 0.9.4.0-3 on Debian amd64.

Every time a route is added the following is logged into /var/log/syslog:
May 16 14:44:42 schleppi NetworkManager[1454]: <info> Policy set 'eduroam' (wlan0) as default for IPv4 routing and DNS.
May 16 14:44:42 schleppi NetworkManager[1454]: <info> Policy set 'eduroam' (wlan0) as default for IPv6 routing and DNS.

The 'cache' entries are okay and can be safely flushed with 'ip -6 route flush cache', but I also see real static /128 routes in the routing table that don't go away and that prevent more-specific VPN routes from working.
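
To illustrate why a leftover /128 matters: route selection prefers the longest matching prefix, so a host route wins over any shorter VPN prefix that covers the same address. Below is a minimal Python sketch of that rule. The addresses come from the 2001:db8::/32 documentation range and the next-hop labels are hypothetical; the sketch ignores metrics and is not how the kernel implements its lookup.

```python
import ipaddress


def build(entries):
    """Turn (prefix, next_hop) string pairs into (network, next_hop) pairs."""
    return [(ipaddress.ip_network(prefix), via) for prefix, via in entries]


def lookup(table, destination):
    """Return the matching (network, next_hop) with the longest prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, via) for net, via in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical table: a default route, a VPN prefix, and one stale host route.
TABLE = build([
    ("::/0", "wlan0 router"),
    ("2001:db8:100::/48", "vpn tunnel"),
    ("2001:db8:100::25/128", "wlan0 router (stale static host route)"),
])

if __name__ == "__main__":
    for dest in ("2001:db8:100::25", "2001:db8:100::26", "2001:db8:200::1"):
        net, via = lookup(TABLE, dest)
        print(f"{dest} -> {via} ({net})")
```

In this toy table, only the host with the stale /128 bypasses the VPN; its neighbour in the same /48 still goes through the tunnel.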
Comment 5 Bernhard Schmidt 2012-05-16 13:03:30 UTC
This is what one sees in debugging (log-domains=IP6,DHCP6)

NetworkManager[10079]: <debug> [1337173238.390024] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.390126] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173238.390150] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173238.390175] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.390195] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.390213] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173238.390230] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6
NetworkManager[10079]: <debug> [1337173238.391897] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.391945] [nm-ip6-manager.c:619] process_route(): (wlan0): route cache unchanged, ignoring message
NetworkManager[10079]: <debug> [1337173238.391991] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173238.392025] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173238.392045] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173238.392067] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.392183] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173238.392201] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173238.392217] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6
NetworkManager[10079]: <debug> [1337173239.525443] [nm-ip6-manager.c:596] process_route(): processing netlink new/del route message
NetworkManager[10079]: <debug> [1337173239.525608] [nm-ip6-manager.c:1110] netlink_notification(): (wlan0): syncing device with netlink changes
NetworkManager[10079]: <debug> [1337173239.525652] [nm-ip6-manager.c:431] nm_ip6_device_sync_from_netlink(): (wlan0): syncing with netlink (ra_flags 0x800000B0) (state/target 'got-address'/'got-address')
NetworkManager[10079]: <debug> [1337173239.525699] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: 2001:4ca0:0:f200:21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173239.525736] [nm-ip6-manager.c:451] nm_ip6_device_sync_from_netlink(): (wlan0): netlink address: fe80::21d:e0ff:fe29:5b69/64
NetworkManager[10079]: <debug> [1337173239.525769] [nm-ip6-manager.c:473] nm_ip6_device_sync_from_netlink(): (wlan0): addresses synced (state got-address)
NetworkManager[10079]: <debug> [1337173239.525801] [nm-ip6-manager.c:487] nm_ip6_device_sync_from_netlink(): router advertisement requests parallel DHCPv6

It only happens with mode=auto, not with the very same SSID/network with mode=ignore.
Comment 6 Tore Anderson 2012-05-20 08:17:32 UTC
Created attachment 214479
Debugging output from NM

The cached host routes are, as Dan pointed out, automatically added by the kernel (it does so for IPv4 as well, but it puts them in a separate routing table).

However, it seems NM deletes the kernel-added host route and replaces it with a copy of its own, except that the copy has proto static and metric 1024. Here's the output from "ip monitor" when pinging labs.ripe.net (2001:67c:2e8:22::c100:699):

> $ ip monitor
> [...]
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  metric 0
>     cache
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1024
> Deleted 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  metric 0
>     cache
> Deleted default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1
> default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1

Also, the default route undergoes such replacement. My routing table currently looks like this (other cache entries removed):

> $ ip -6 r
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1024 
> unreachable 2a02:fe0:cf16:10:: dev lo  proto kernel  metric 256  error -101
> 2a02:fe0:cf16:10::/64 dev wlan0  proto kernel  metric 256  expires 2147157sec
> unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
> fe80::/64 dev wlan0  proto kernel  metric 256 
> default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1 
> default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto kernel  metric 1024  expires 1703sec

Also note how I have two default routes with different protocols and metrics.

If I disable NM's IPv6 support (setting it to mode=ignore), reconnect and again ping labs.ripe.net, my routing table looks like this:

> $ ip -6 r
> 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  metric 0 
>     cache 
> unreachable 2a02:fe0:cf16:10:: dev lo  proto kernel  metric 256  error -101
> 2a02:fe0:cf16:10::/64 dev wlan0  proto kernel  metric 256  expires 2147157sec
> unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
> fe80::/64 dev wlan0  proto kernel  metric 256 
> default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto kernel  metric 1024  expires 1770sec

Note how the cached host route to 2001:67c:2e8:22::c100:699 now has metric 0, no protocol specified, and a cache tag. So this is the "pristine" cached host route that NM would otherwise remove and replace with one of its own.

Also note how there's only one default route now, the one with "proto static" is gone. So we can conclude that one was added by NM in the first output.

With NM in ignore mode, "ip monitor" only shows the initial addition of the cached route entry (same as the first line from the output above).
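
The replacement pattern in the "ip monitor" output can also be spotted mechanically. Here is a rough Python sketch that parses monitor text and reports destinations where a cache entry was deleted while a "proto static" route for the same destination was added. The function names are invented for this example, and the parser assumes the text layout shown in this comment (a "Deleted" prefix for removals, an indented "cache" continuation line).

```python
# "ip monitor" output from this comment, without the "[...]" line.
MONITOR = """\
2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  metric 0
    cache
2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1024
Deleted 2001:67c:2e8:22::c100:699 via fe80::ca6c:87ff:feab:d027 dev wlan0  metric 0
    cache
Deleted default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1
default via fe80::ca6c:87ff:feab:d027 dev wlan0  proto static  metric 1
"""


def parse_events(text):
    """Turn monitor text into a list of route events (dicts)."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and events:
            # Continuation line such as "    cache": attach it to the last event.
            events[-1]["flags"].update(line.split())
            continue
        fields = line.split()
        deleted = fields[0] == "Deleted"
        if deleted:
            fields = fields[1:]
        if not fields:
            continue
        proto = None
        if "proto" in fields[:-1]:
            proto = fields[fields.index("proto") + 1]
        events.append({"deleted": deleted, "dest": fields[0],
                       "proto": proto, "flags": set()})
    return events


def replaced_cache_routes(events):
    """Destinations whose cache entry was deleted and a static copy was added."""
    added_static = {e["dest"] for e in events
                    if not e["deleted"] and e["proto"] == "static"}
    deleted_cache = {e["dest"] for e in events
                     if e["deleted"] and "cache" in e["flags"]}
    return sorted(added_static & deleted_cache)


if __name__ == "__main__":
    print(replaced_cache_routes(parse_events(MONITOR)))
```

The default route is deliberately not reported by this check, since its deleted entry is not a cache entry.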

This bug causes problems for the Chromium/Chrome web browser, which cancels all in-progress HTTP requests whenever it detects a "network change". As it turns out, the route additions/deletions performed by NM are considered such a "network change". This means that if you instruct it to visit labs.ripe.net, for example, the kernel adds the cached route entry, which immediately kicks off the NM route removal/re-addition bug at the exact moment the browser is retrieving the page. The browser notices the removal/re-addition as a network change and cancels the request, causing the page to fail to load. For more information on this bug, see: http://code.google.com/p/chromium/issues/detail?id=128509

The problem was reproduced with NetworkManager-0.9.4-4.git20120403.fc16.x86_64 on Fedora 16. I'm attaching a debug log of what was output from NetworkManager at the exact same time as the "ip monitor" output above (when I did "ping6 -c 1 labs.ripe.net").

Tore
Comment 7 Tore Anderson 2012-05-20 08:23:24 UTC
I forgot to mention, except for the initial addition of static routes (if any), is there any reason at all why NM should monitor and modify the IPv6 routing table?

As far as I can tell, the default route and any routes to on-link prefixes are maintained by the kernel's Router Advertisement processing, and the cached host routes are also handled internally by the kernel's IPv6 stack. I don't think there's anything left that NM needs to deal with?

Tore
Comment 8 Tore Anderson 2012-05-20 21:00:51 UTC
Actually it seems it's not the routes being removed and re-added that causes problems for Chrome, it is the fact that /etc/resolv.conf gets re-written - this causes it to cancel all outstanding DNS requests. I guess NM re-activates the entire IPv6 configuration whenever the kernel adds a routing cache entry, which cannot be right.

Tore
Comment 9 Pavel Simerda 2012-05-22 07:56:43 UTC
(In reply to comment #7)
> I forgot to mention, except for the initial addition of static routes (if any),
> is there any reason at all why NM should monitor and modify the IPv6 routing
> table?
> 
> As far as I can tell, the default route and any routes to on-link prefixes are
> maintained by the kernel's Router Advertisement processing, and the cached host
> routes are also handled internally by the kernel's IPv6 stack. I don't think
> there's anything left that NM needs to deal with?
> 
> Tore

What if you have IPv6 auto-configured on more than one interface? In that case, NetworkManager manages the default route choice.
Comment 10 Tore Anderson 2012-05-22 09:09:17 UTC
(In reply to comment #9)

> What if you have IPv6 auto-configured on more than one interface? In that case,
> NetworkManager manages the default route choice.

Fair point, but in this case the only sane way for NM to do it is to disable the kernel's RA processing w.r.t. managing the default router (the accept_ra_defrtr sysctl) and handle installing the default route all on its own.

Otherwise NM would have to play catch-up and modify whatever the kernel is doing. The kernel will add back the default route on the not-chosen interface every time an unsolicited RA arrives there, and NM will have to remove it again. Not a good approach...

Tore
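
For reference, the accept_ra_defrtr sysctl mentioned above is a per-interface file under /proc/sys/net/ipv6/conf/. Below is a minimal Python sketch of reading and setting it. The helper names and the `root` parameter (only there so the code can be tried against a scratch directory) are assumptions for illustration; this is not NetworkManager code, and writing to the real file requires root.

```python
from pathlib import Path


def defrtr_path(iface, root="/proc/sys"):
    """Path of the per-interface accept_ra_defrtr sysctl file."""
    return Path(root) / "net" / "ipv6" / "conf" / iface / "accept_ra_defrtr"


def read_defrtr(iface, root="/proc/sys"):
    """True if the kernel installs default routes learned from RAs on iface."""
    return defrtr_path(iface, root).read_text().strip() == "1"


def set_defrtr(iface, enabled, root="/proc/sys"):
    """Enable or disable default-router learning from RAs on iface."""
    defrtr_path(iface, root).write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    # Only prints the path; nothing on the system is changed here.
    print(defrtr_path("wlan0"))
```

With the sysctl set to 0 on an interface, the kernel still processes other parts of the RA but leaves default route installation to whoever manages the interface.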
Comment 11 Dan Williams 2012-05-24 21:08:00 UTC
(In reply to comment #8)
> Actually it seems it's not the routes being removed and re-added that causes
> problems for Chrome, it is the fact that /etc/resolv.conf gets re-written -
> this causes it to cancel all outstanding DNS requests. I guess NM re-activates
> the entire IPv6 configuration whenever the kernel adds a routing cache entry,
> which cannot be right.

New bug filed for this:  https://bugzilla.gnome.org/show_bug.cgi?id=676778
Comment 12 Tore Anderson 2012-07-15 07:13:03 UTC
The unnecessary routing table activity is still happening in 0.9.6-rc1. However, #676778 seems fixed, so Chrome is usable again.
Comment 13 Pavel Simerda 2012-07-26 14:47:59 UTC
> Fair point, but in this case the only sane way to do for NM it is to disable
> the kernel's RA processing w.r.t. managing the default router (the
> accept_ra_defrtr sysctl) and handle installing the default route all on its
> own.

This can be done if NetworkManager processes the router advertisements itself or if the kernel gives out information about router advertisements.

> Otherwise NM would have to play catch-up and modify whatever the kernel is
> doing.

Not really. It should not do anything with any routes except the default gateway.
And it should never ever remove kernel routes as that's pretty ugly.

Currently the only correct IPv6 route writing in NM is installing/replacing just one prioritized default route.

Any other direct route handling is a bug (if I haven't forgotten something).

I'm maintaining a list of problems related to kernel:

https://fedoraproject.org/wiki/Tools/NetworkManager/Integration#Kernel
Comment 14 Mantas Mikulėnas (grawity) 2012-08-25 20:29:54 UTC
I should note that this is NOT merely a cosmetic bug. It can actually break networking in confusing ways, every time the default route is changed or removed but NM's static routes keep pointing to the old gateway.

(My laptop's IPv6 connectivity is not permanent – a generic desktop PC is used as a gateway to Tunnelbroker, and advertises short-lived routes [a few minutes]. When I turn it off, the advertised routes expire and new connections should fall back to IPv4. Unfortunately, NM's static routes never expire, so this doesn't work the way it's supposed to...)

I'm using 0.9.6.0, which seems to be the latest release, but I still found almost 4k host routes after a BitTorrent download...
Comment 15 Pavel Simerda 2012-08-28 10:17:16 UTC
I'll keep this on my checklist.
Comment 16 Pavel Simerda 2012-08-29 09:46:34 UTC
*** Bug 682616 has been marked as a duplicate of this bug. ***
Comment 17 Dan Williams 2012-08-30 21:03:07 UTC
Cached/cloned routes are now ignored in git master via these two commits:

46e0af2942e23fb3cf1c313e58e4081877d4f289
3ca3120e4a01ea4a86fd052311c977e7ec136365

which should fix this problem.  These routes are temporary ones added by the kernel and we shouldn't really do anything with them, since they aren't part of the interface's permanent routing configuration (as delivered by DHCP or set by the user).
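
The commits themselves are not reproduced in this thread, so the following is only a sketch of the general idea, not the actual patch. It assumes cached/cloned routes can be recognised by the RTM_F_CLONED bit (0x200 in <linux/rtnetlink.h>) in a route's flags. The route records and their flag values are hypothetical stand-ins for what a netlink library would deliver.

```python
RTM_F_CLONED = 0x200  # <linux/rtnetlink.h>: route was cloned from another route


def is_cached(flags):
    """True if the route flags mark a kernel-created cached/cloned route."""
    return bool(flags & RTM_F_CLONED)


def permanent_routes(routes):
    """Drop cached/cloned entries, keeping only configured routes."""
    return [route for route in routes if not is_cached(route["flags"])]


# Hypothetical records; the destinations are taken from comment 6 and the
# flag values are assigned by hand for illustration.
ROUTES = [
    {"dest": "2001:67c:2e8:22::c100:699/128", "flags": RTM_F_CLONED},
    {"dest": "2a02:fe0:cf16:10::/64", "flags": 0},
    {"dest": "default", "flags": 0},
]

if __name__ == "__main__":
    for route in permanent_routes(ROUTES):
        print(route["dest"])
```

Filtering at this point means a cache entry is never copied into the set of routes that gets written back as "proto static".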
Comment 18 Pavel Simerda 2012-08-31 13:08:37 UTC
Thanks.
Comment 19 Bernhard Schmidt 2012-08-31 13:10:33 UTC
I have applied those two patches to Debian 0.9.4.0 and it solves the problem for me.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686328

Thanks a lot