GNOME Bugzilla – Bug 692279
only one of multiple default gateways from IPv6 router advertisements is duplicated
Last modified: 2015-03-10 10:55:35 UTC
From Martin Jackson through NetworkManager mailing list:

Hello,

First, let me thank you for all of your hard work in developing and maintaining network-manager. It makes life with a Linux laptop much simpler.

I have observed a behavior in network-manager in both Ubuntu (12.10) and Fedora (18). In the presence of multiple routers advertising defaults, n-m seems to insert a static default route for one of them, with metric 1. Here is example output from my Fedora system:

default via fe80::5054:ff:fe01:b6dc dev eth0 proto static metric 1
default via fe80::5054:ff:fe01:b6dc dev eth0 proto ra metric 1024 expires 2sec
default via fe80::5054:ff:fe2f:62c2 dev eth0 proto ra metric 1024 expires 9sec

The Ubuntu system's output looks similar.

I believe that the addition of the "static metric 1" route is incorrect, as static routes of this type persist even when the router in question is no longer doing router advertisements. In that case, the system will use that route in preference to the other router, which may still be doing router advertisements. This creates issues when multiple routers are correctly advertising on the same link, and the intention is to provide a failover capability, without the complications of a first hop redundancy protocol.

I am in control of both of the advertising routers and can enable debug if needed on either the client systems or the advertising routers (which are virtual Ubuntu systems running Quagga).

I have filed a bug with Ubuntu (https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1101825), but this seemed like a better place to address this - I see that recently there has been some discussion about IPv6 semantics on this list and some recent patches for IPv6, so this discussion might be timely.

Thanks,
Marty
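For anyone trying to reproduce this, the pattern in the route listing above can be checked mechanically. The following is only a rough sketch in Python, assuming the output has the same shape as the three lines quoted in the report; the names `SAMPLE`, `parse_default_routes` and `static_duplicates` are made up for illustration and are not part of NetworkManager or iproute2.

```python
import re

# The three default routes quoted in the report, one per line.
SAMPLE = """\
default via fe80::5054:ff:fe01:b6dc dev eth0 proto static metric 1
default via fe80::5054:ff:fe01:b6dc dev eth0 proto ra metric 1024 expires 2sec
default via fe80::5054:ff:fe2f:62c2 dev eth0 proto ra metric 1024 expires 9sec
"""

# Assumption: every default route line carries via/dev/proto/metric in this order.
ROUTE_RE = re.compile(
    r"^default via (?P<gw>\S+) dev (?P<dev>\S+) "
    r"proto (?P<proto>\S+) metric (?P<metric>\d+)"
)


def parse_default_routes(text):
    """Return one dict per default route line that matches ROUTE_RE."""
    routes = []
    for line in text.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match:
            route = match.groupdict()
            route["metric"] = int(route["metric"])
            routes.append(route)
    return routes


def static_duplicates(routes):
    """Return static default routes whose gateway is also learned via RA."""
    learned = {(r["gw"], r["dev"]) for r in routes if r["proto"] == "ra"}
    return [
        r for r in routes
        if r["proto"] == "static" and (r["gw"], r["dev"]) in learned
    ]


routes = parse_default_routes(SAMPLE)
duplicates = static_duplicates(routes)
```

On the sample, `duplicates` holds the single metric 1 route via fe80::5054:ff:fe01:b6dc, which is the entry the report is about.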
(In reply to comment #0)
> First, let me thank you for all of your hard work in developing and maintaining network-manager. It makes life with a Linux laptop much simpler.

Thanks!

> I have observed a behavior in network-manager in both Ubuntu (12.10) and Fedora (18). In the presence of multiple routers advertising defaults, n-m seems to insert a static default route for one of them, with metric 1. Here is example output from my Fedora system:
>
> default via fe80::5054:ff:fe01:b6dc dev eth0 proto static metric 1
> default via fe80::5054:ff:fe01:b6dc dev eth0 proto ra metric 1024 expires 2sec
> default via fe80::5054:ff:fe2f:62c2 dev eth0 proto ra metric 1024 expires 9sec

Yes. This is the current behavior, and it (or something similar) is necessary to be able to choose the default routing interface. It might be better to duplicate all local default routes from the default route interface. Duplicate routes are the way NetworkManager marks the default routing interface in current kernels.

> The Ubuntu system's output looks similar.
>
> I believe that the addition of the "static metric 1" route is incorrect, as static routes of this type persist even when the router in question is no longer doing router advertisements.

How did you test that?

> In that case, the system will use that route in preference to the other router, which may still be doing router advertisements. This creates issues when multiple routers are correctly advertising on the same link, and the intention is to provide a failover capability, without the complications of a first hop redundancy protocol.

Agreed.

> I am in control of both of the advertising routers and can enable debug if needed on either the client systems or the advertising routers (which are virtual Ubuntu systems running Quagga).
>
> I have filed a bug with Ubuntu (https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1101825), but this seemed like a better place to address this - I see that recently there has been some discussion about IPv6 semantics on this list and some recent patches for IPv6, so this discussion might be timely.
>
> Thanks,
> Marty

The current IPv6 processing is based on Linux kernel shortcomings, and the same applies to IPv6 standards. There is some information about the kernel part in Red Hat Bugzilla: bugzilla.redhat.com/show_bug.cgi?id=891245
Created attachment 234272 [details] Syslog with NetworkManager debug turned on
To test it, I stopped Quagga on one of the hosts (I have seen the static metric 1 route appear to both of the hosts advertising). I was curious to see what would happen, since I figured the static metric 1 route would persist, and it does. I will attach the relevant zebra.conf pieces from my two router VMs, if it will help you. I recently went through some IPv6 training, and was motivated to see how a Linux client would react. I have tried a similar experiment without network-manager; in that scenario I only see the ra metric 1024 routes, not a static metric 1 route. As a sidebar comment (this probably is how the kernel manages it), but I noticed that on Windows 7, Windows applies a better metric to the route with higher preference, which I think might be useful in Linux. Apparently the bug linked in RH bugzi is not public as it won't let me view it. (Though I don't have a Red Hat bugzi account.)
(In reply to comment #3)
> To test it, I stopped Quagga on one of the hosts (I have seen the static metric 1 route appear to both of the hosts advertising). I was curious to see what would happen, since I figured the static metric 1 route would persist, and it does.

Quagga tests are irrelevant.

> I will attach the relevant zebra.conf pieces from my two router VMs, if it will help you. I recently went through some IPv6 training, and was motivated to see how a Linux client would react.

I can very well imagine that :).

> I have tried a similar experiment without network-manager; in that scenario I only see the ra metric 1024 routes, not a static metric 1 route.

Experiments without NetworkManager are irrelevant. NetworkManager does that as a workaround for the missing kernel API for choosing the default route interface.

> As a sidebar comment (this probably is how the kernel manages it), but I noticed that on Windows 7, Windows applies a better metric to the route with higher preference, which I think might be useful in Linux.

This is not possible with Linux until the kernel IPv6 API is useful enough.

> Apparently the bug linked in RH bugzi is not public as it won't let me view it. (Though I don't have a Red Hat bugzi account.)

Sorry for that. It should be public now.
I get the irrelevance comments, at least as far as quagga goes. Sometimes with things as *practically* new as IPv6, reproduction can be a problem. I can see the RH bug now, thank you. Do I understand you correctly that the static route with metric 1 is currently necessary for IPv6 routing to work at all? (At least in the presence of multiple interfaces - but that seems to be the norm these days.)
(In reply to comment #5)
> Do I understand you correctly that the static route with metric 1 is currently necessary for IPv6 routing to work at all? (At least in the presence of multiple interfaces - but that seems to be the norm these days.)

Correct. It's the way NetworkManager chooses the outgoing interface. We are talking about nicer solutions, though.
OK. Well, I hope this report helps motivate nicer solutions. Thanks for the time and explanations!
(In reply to comment #7)
> OK. Well, I hope this report helps motivate nicer solutions. Thanks for the time and explanations!

It certainly will. First, it won't be easily forgotten. Second, it means some people do care about this use case.
NM bugzilla reorganization. Sorry for the bug spam.
(In reply to comment #0)
> From Martin Jackson through NetworkManager mailing list:
> default via fe80::5054:ff:fe01:b6dc dev eth0 proto ra metric 1024 expires 2sec
> default via fe80::5054:ff:fe2f:62c2 dev eth0 proto ra metric 1024 expires 9sec

From libnl source code (lib/route/route_obj.c):

    .oo_id_attrs = (ROUTE_ATTR_FAMILY | ROUTE_ATTR_TOS |
                    ROUTE_ATTR_TABLE | ROUTE_ATTR_DST |
                    ROUTE_ATTR_PRIO),

I always understood libnl's oo_id_attrs as a unique key. Therefore I thought that two routes with the same values of id attributes should never exist. In both cases, address family is IPv6, table is main, destination is default and priority is 1024. The only remaining id argument is TOS which is not shown. I don't see any reason for that one to be different, so I can assume that there are two routes with duplicate netlink oo_id_attrs attributes.

Fortunately, with libndp we will avoid the whole kernel autoconf mess.
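The key collision described in the comment above can be illustrated with a small model. This is a sketch in Python, not libnl code: `Route`, `ID_ATTRS` and `identity` are hypothetical names, the attribute list simply mirrors the five flags quoted from route_obj.c, and TOS is assumed to be 0 for both routes, as the comment itself assumes.

```python
from collections import namedtuple

# Hypothetical model of a route; only the fields discussed in the comment.
Route = namedtuple("Route", "family tos table dst prio gateway")

# Mirrors the quoted oo_id_attrs flags: FAMILY, TOS, TABLE, DST, PRIO.
# Note that the gateway is not among them.
ID_ATTRS = ("family", "tos", "table", "dst", "prio")


def identity(route):
    """Build the tuple of id attributes that would act as the unique key."""
    return tuple(getattr(route, attr) for attr in ID_ATTRS)


# The two RA-learned default routes from the report (TOS assumed to be 0).
route_a = Route("inet6", 0, "main", "default", 1024, "fe80::5054:ff:fe01:b6dc")
route_b = Route("inet6", 0, "main", "default", 1024, "fe80::5054:ff:fe2f:62c2")

# Any store keyed on the id attributes can hold only one of the two routes.
store = {}
for route in (route_a, route_b):
    store[identity(route)] = route
```

Because the gateway is not part of the key, both routes map to the same identity, and a dictionary keyed this way ends up holding only one of them. Whether libnl's cache behaves the same way in this situation is not something this model shows.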
This bug report should be solved by development done for bug #699772.
In current git master, userspace router discovery (src/rdisc) is being used, nm-device takes the list of gateways from it and only picks the first one (sorted by src/rdisc). Therefore there's no longer a need to duplicate kernel-autoconfigured addresses.
Pavel, https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1101825 links to this issue. However, that bug is about the fact that when NM receives an RA, it creates a static route that never expires, and when the RA expires, the route is not deleted. That's incorrect, and it will break connectivity if the network default gateway changes. In current git master, does receiving an RA also cause NM to create a static route that never expires? (I see this on 0.9.8.8, but don't have an easy way to run master.) Should I file another bug about it here?
(In reply to Lorenzo Colitti from comment #13)
> Pavel, https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1101825 links to this issue.
>
> However, that bug is about the fact that when NM receives an RA, it creates a static route that never expires, and when the RA expires, the route is not deleted. That's incorrect, and it will break connectivity if the network default gateway changes.
>
> In current git master, does receiving an RA also cause NM to create a static route that never expires? (I see this on 0.9.8.8, but don't have an easy way to run master.) Should I file another bug about it here?

Since 0.9.10, NM does autoconf in userspace (using libndp), so it works quite differently there. NM 0.9.8 let the kernel handle it.

Routes by themselves don't expire in Linux. But since 0.9.10, NetworkManager will keep track of them and remove them.

On NM master, routes should expire. If they don't, it's a separate bug.

0.9.8 is quite old... in this case a downstream bug seems more appropriate. I don't think that somebody upstream will fix such an old version -- unless it's trivial or somebody wants to work on it.
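To make the "keep track of them and remove them" idea concrete, here is a minimal sketch in Python of userspace lifetime tracking. It is not NetworkManager's implementation: the `GatewayTable` class and its method names are hypothetical, and the lifetimes of 2 and 9 seconds are borrowed from the "expires" values in the original report purely for illustration.

```python
class GatewayTable:
    """Track default gateways learned from RAs, with absolute expiry times."""

    def __init__(self):
        # gateway address -> absolute expiry time in seconds
        self._expiry = {}

    def on_router_advertisement(self, gateway, lifetime, now):
        """Record or refresh a gateway; a router lifetime of 0 withdraws it."""
        if lifetime == 0:
            self._expiry.pop(gateway, None)
        else:
            self._expiry[gateway] = now + lifetime

    def expire(self, now):
        """Drop gateways whose lifetime has run out and return them."""
        stale = [gw for gw, when in self._expiry.items() if when <= now]
        for gw in stale:
            del self._expiry[gw]
        return stale

    def gateways(self):
        """Return the gateways that are still valid, in learned order."""
        return list(self._expiry)


table = GatewayTable()
table.on_router_advertisement("fe80::5054:ff:fe01:b6dc", lifetime=2, now=0)
table.on_router_advertisement("fe80::5054:ff:fe2f:62c2", lifetime=9, now=0)

# The first router stops advertising; five seconds later its entry is stale.
removed = table.expire(now=5)
```

With this kind of bookkeeping, the entry for the silent router is removed and only the router that is still advertising remains, which is the failover behavior the original report asked for.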