GNOME Bugzilla – Bug 729203
fix connection matching with dynamic IPv6 routes
Last modified: 2014-06-06 14:28:06 UTC
If an IPv6 router sends us any non-link-local routes, then currently we fail to match the connection on restart, because the generated connection will include those routes as static routes. (https://bugzilla.redhat.com/show_bug.cgi?id=1086237) Attached patch fixes this by just ignoring all IPv6 routes when generating a connection if ipv6.method is 'auto'. It would be better if we could ignore only non-permanent routes, like we do with addresses, but libnl does not currently expose the lifetime information for routes.
Created attachment 275433 [details] [review] core: fix connection matching with IPv6 routes (rh #1086237) An IPv6 router may have sent us arbitrary routes, so don't consider them when comparing two method=auto connections. (A better fix would be to look at the route lifetime, to see if it's dynamic or static, but we don't currently track that.)
dcbw also suggests: > Long term fix: have NM set rtnl_route_set_proto(RTPROT_RA) on any routes > that are added from RA. Set RTPROT_DHCP for anything from DHCP. Then on > connecting matching, we'd discard these routes from the matching set so that > we could still capture static routes; or we could simply prefer a connection > whos routes matched more completely, but fall back to one where the > addressing matched but additional routes were present on the device.
(In reply to comment #1) > > (A better fix would be to look at the route lifetime, to see if it's > dynamic or static, but we don't currently track that.) I think, that alone does not suffice, because the routes/addresses announced via RA can have infinite lifetime. (and if address comparison currently relies on that, it is shaky there too)
+ if (!strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO)) I would use g_strcmp0 just to be sure. The is no problem in the current code, but if later someone set a "method" property beforehand then "if (!method && !nm_setting_ip6_config_get_method (setting))" can be FALSE and thus method can be NULL. For distinguishing static vs. dynamic routes we should use the rtm_protocol as suggested in comment #2. Lifetime is not determining, because it can be infinite both for static and dynamic routes. Also for addresses, I think, we should use IFA_F_PERMANENT flag instead of the lifetime.
(In reply to comment #4) > Also for addresses, I think, we should use IFA_F_PERMANENT flag instead of the > lifetime. I don't think, that helps, because when NM does SLAAC, it will configure the dynamic addresses in the same way as static addresses (and they are IFA_F_PERMANENT iff they have unlimited lifetime). We probably want to ignore IFA_F_TEMPORARY addresses for comparison though. And we might want to ignore those with IFA_F_NOPREFIXROUTE, because NM will set this flags for addresses from SLAAC. But this is a bit shaky again.
(In reply to comment #4) > + if (!strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO)) > I would use g_strcmp0 just to be sure. Fixed locally. Any other comments form anyone on the initial quick fix? I realized yesterday while talking with dcbw that this patch will break assumption of connections where you have method=auto, but also have explicit IPv6 routes. But that seems like a weird case. (However, if we wanted to, we could deal with it by adjusting nm_utils_match_connection() rather than adjusting nm_ip6_config_update_setting().) > For distinguishing static vs. dynamic routes we should use the rtm_protocol as > suggested in comment #2. So, we already have an NMPlatformSource in NMPlatformIP[46]Route. Can we translate that to/from rtm_protocol? NM_PLATFORM_SOURCE_UNKNOWN <-> RTPROC_UNSPEC NM_PLATFORM_SOURCE_KERNEL <-> RTPROC_KERNEL NM_PLATFORM_SOURCE_DHCP <-> RTPROC_DHCP NM_PLATFORM_SOURCE_RDISC <-> RTPROC_RA NM_PLATFORM_SOURCE_USER <-> RTPROC_STATIC NM_PLATFORM_SOURCE_KERNEL <- RTPROC_REDIRECT NM_PLATFORM_SOURCE_KERNEL <- RTPROC_BOOT NM_PLATFORM_SOURCE_UNKNOWN <- RTPROC_* (all other values) NM_PLATFORM_SOURCE_SHARED -> RTPROC_STATIC NM_PLATFORM_SOURCE_IP4LL -> RTPROC_STATIC NM_PLATFORM_SOURCE_PPP -> RTPROC_STATIC NM_PLATFORM_SOURCE_WWAN -> RTPROC_STATIC NM_PLATFORM_SOURCE_VPN -> RTPROC_STATIC I think this should work, as long as we make sure we're setting the source correctly on all routes we create, and we make sure to keep track of the NM-specific values (SHARED, IP4LL, etc) when merging in data from the kernel.
RTPROC -> RTPROT ? :) I didn't see this comment and created bug 729354 for the protocol thing and the RTM_F_CLONED. Maybe transfer the longer-term stuff there or combine the two bugs?
Sure, we can make this bug be just for the quick fix
*** Bug 729354 has been marked as a duplicate of this bug. ***
actually, since there's no reason to commit the quick fix to master if we know what we're doing for the slower fix, let's make this the real bug. from 729354: (In reply to comment #0) > NM needs to do a better job of excluding automatically-generated (RA, DHCP, > kernel) routes when matching connections at startup. To do that, we should do > a couple things: > > 1) track the route's "protocol" (rtnl_route_get_protocol() which returns > RTPROT_ defines) independently of the route's source (kernel/ra/etc) > > 2) set the right route protocol based on the internal route source (eg, > NM_PLATFORM_SOURCE_DHCP -> RTPROT_DHCP, NM_PLATFORM_SOURCE_RDISC -> RTPROT_RA) > where we have a mapping > > 3) maybe hijack an RTPROT define for stuff that doesn't have a mapping, so we > know it's been added dynamically by NetworkManager? > > 4) Then, when generating the connection in nm_device_generate_connection(), > when adding IP configuration, we somehow tell nm_ip4_config_update_setting() > not to add any routes that came from automatic sources like RA/DHCP/etc. > > We could refactor the 'source' flag to mean 'protocol' instead, and then simply > have a gboolean 'external' flag indicating that the route came from outside > NetworkManager. Thus we could differentiate between an externally-added > RA-protocol route (protocol=RA, source=external) and an NM-added RA-protocol > route (protocol=RA, source=internal). > > --- > > A second fix is to ignore IPv6 cache routes (flags & RTM_F_CLONED) at the same > place, when generating the connection. These routes are temporarily created by > the kernel when attempting to contact another host (even with ping6 when the > host doesn't exist) and are destroyed by the kernel shortly after. They > shouldn't be part of the initial connection generation. > > --- > > All this should happen in a generic manner to ensure that the "fake" platform > doesn't require rtnetlink #defines or dependencies.
see danw/bgo729203-ipv6-route-matching
pushed an updated branch with two additional patches: a1761f0 libnm-utils: remove INFERRABLE flag from properties we don't actually infer a8cd1a6 core: add debug logging to nm_utils_match_connection() (The last because AFAICT there is no reason why the downstream bug should still be failing.)
> >> platform: improve tracking of route sources > > in src/platform/tests/test-route.c, there are several no_error() calls at the > end of the line. This is not introduced by your patch, but could you move them > to a separate line? > > > > > Looking at source_to_rtprot()/source_to_rtprot(), I think we could encode all > or NMPlatformSource types to internal numbers. E.g. also > NM_PLATFORM_SOURCE_VPN. > > include/uapi/linux/rtnetlink.h says: > /* Values of protocol >= RTPROT_STATIC are not interpreted by kernel; > they are just passed from user and back as is. > It will be used by hypothetical multiple routing daemons. > Note that protocol values should be standardized in order to > avoid conflicts. > */ > > > > + switch (rtprot) { > + case RTPROT_UNSPEC: > + return NM_PLATFORM_SOURCE_UNKNOWN; > + case RTPROT_REDIRECT: > + case RTPROT_KERNEL: > + case RTPROT_BOOT: > + return NM_PLATFORM_SOURCE_KERNEL; > + case RTPROT_RA: > + return NM_PLATFORM_SOURCE_RDISC; > + case RTPROT_DHCP: > + return NM_PLATFORM_SOURCE_DHCP; > + > + default: > + return NM_PLATFORM_SOURCE_USER; > + } > > when you manually add a route with iproute2, it will set the protocol to > RTPRO_BOOT (unless you specify something else). So, this function maps > RTPROT_BOOT to NM_PLATFORM_SOURCE_KERNEL and "default" to > NM_PLATFORM_SOURCE_USER. I am not saying that this is wrong, just pointing out. > > Or how about: > > switch (rtprot) { > case RTPROT_REDIRECT: > case RTPROT_KERNEL: > case RTPROT_BOOT: > return NM_PLATFORM_SOURCE_KERNEL; > case RTPROT_RA: > return NM_PLATFORM_SOURCE_RDISC; > case RTPROT_DHCP: > return NM_PLATFORM_SOURCE_DHCP; > case RTPROT_STATIC: > return NM_PLATFORM_SOURCE_USER; > case RTPROT_UNSPEC: > default: > return NM_PLATFORM_SOURCE_UNKNOWN; > } > > > + if (source != NM_PLATFORM_SOURCE_UNKNOWN) > + rtnl_route_set_protocol (rtnlroute, source_to_rtprot (source)); > > if we don't set it, it will default to RTPROT_STATIC anyway... is that really > what we want? How about RTPROT_UNSPEC? > > Also, in the kernel (fib_table_delete()) it seem to me then when deleting a > IPv4 route, the protocol must correspond: > (!cfg->fc_protocol || > fi->fib_protocol == cfg->fc_protocol) && > maybe, in case of delete, we must set the protocol to zero? > If that is the case, I think > http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=019bf7512d0c8ece3e9051f609f10ecc1df7223a > could have added a regression -- by not correctly setting the protocol to zero.
(In reply to comment #13) Sorry, please ignore the leading "> ".
>> libnm-utils: remove INFERRABLE flag from properties we don't actually infer commit message: (Found while debugging https://bugzilla.redhat.com/show_bug.cgi?id=1086237, though this is not actually the problem there.) could you instead: Related: https://bugzilla.redhat.com/show_bug.cgi?id=1086237 because the script treats a "Related:" reference special. Rest looks good to me.
> in src/platform/tests/test-route.c, there are several no_error() calls at the > end of the line. This is not introduced by your patch, but could you move them > to a separate line? squashed into the whitespace-fixes patch > Looking at source_to_rtprot()/source_to_rtprot(), I think we could encode all > or NMPlatformSource types to internal numbers. E.g. also > NM_PLATFORM_SOURCE_VPN. > > include/uapi/linux/rtnetlink.h says: > /* Values of protocol >= RTPROT_STATIC are not interpreted by kernel; > they are just passed from user and back as is. > It will be used by hypothetical multiple routing daemons. > Note that protocol values should be standardized in order to > avoid conflicts. The last sentence suggests to me that we shouldn't just assign our own rtprot values. > when you manually add a route with iproute2, it will set the protocol to > RTPRO_BOOT (unless you specify something else). So, this function maps > RTPROT_BOOT to NM_PLATFORM_SOURCE_KERNEL and "default" to > NM_PLATFORM_SOURCE_USER. I am not saying that this is wrong, just pointing out. Hm... yes, the comments in rtnetlink.h don't really correspond to reality that well, do they. So RTPROT_BOOT should map to NM_PLATFORM_SOURCE_USER. > + if (source != NM_PLATFORM_SOURCE_UNKNOWN) > + rtnl_route_set_protocol (rtnlroute, source_to_rtprot (source)); > > if we don't set it, it will default to RTPROT_STATIC anyway... is that really > what we want? How about RTPROT_UNSPEC? I was worrying about the libnl layer; if we call rtnl_route_set_protocol(), then ce_mask gets ROUTE_ATTR_PROTOCOL, and so then it can only match against another rtnl_route that also has the correct protocol set, right? > Also, in the kernel (fib_table_delete()) it seem to me then when deleting a > IPv4 route, the protocol must correspond: Yes, possibly... NM never used to call rtnl_route_set_protocol() itself, and it looks like it defaults to RTPROT_STATIC, and when sending a delete message, the protocol gets included in the rtmsg regardless of whether ce_mask includes ROUTE_ATTR_PROTOCOL...
_route_match has: rtnl_route_get_protocol (rtnlroute) == RTPROT_KERNEL || I guess, we have to remove this restriction?
> platform: improve tracking of route sources update_ip6_routing() uses SOURCE_VPN for both the VPN and non-VPN case, I assume that's not what you wanted? update_ip4_routing() uses SOURCE_VPN/SOURCE_USER.
(In reply to comment #17) > _route_match has: > rtnl_route_get_protocol (rtnlroute) == RTPROT_KERNEL || > > I guess, we have to remove this restriction? I believe that was done so that we didn't blow away kernel routes, like the IPv6 cache routes that the kernel adds. Basically, anything the kernel added automatically was left alone, and NM wouldn't touch it.
(In reply to comment #19) > (In reply to comment #17) > > _route_match has: > > rtnl_route_get_protocol (rtnlroute) == RTPROT_KERNEL || > > > > I guess, we have to remove this restriction? > > I believe that was done so that we didn't blow away kernel routes, like the > IPv6 cache routes that the kernel adds. Basically, anything the kernel added > automatically was left alone, and NM wouldn't touch it. To expand on this point, NM will cache the "external" IP configuration (everything that NM didn't configure itself) for an interface and merge that with the other IP configurations (RA, DHCP, WWAN, VPN) before committing the configuration. The merged configuration is synced with the kernel, during which the kernel's routing is replaced by NM's view. Obviously that means that if something isn't in NM's view, it'll get blown away. NM at least used to (and I think still does?) ignore the kernel cache routes, because they are created every time an IPv6 host is contacted, even for a ping. Which means you might have 10 or 20 or 50 of them at a time, until they expire. That makes the 'nmcli' and UI client routing views pretty useless since they have a bunch of routes that are mostly noise. But if we removed that 'kernel' check, then I think NM would blow those routes away when it synced the merged config to the kernel, which we don't want. It's true that cached routes have a CACHE attribute, and so we could detect them differently than we do now, but I'm a bit worried about other kernel-created routes getting deleted unexpectedly.
We should probably get dhclient-script updated to set DHCP protocol on the routes too. Which means we should also make sure systemd-networkd gets updated, since they seem to use RTPROT_BOOT everywhere, even for DHCP and IPv4LL, which doesn't make a lot of sense.
Created attachment 277347 [details] [review] dhclient-script-route-protocol.patch Fedora dhclient candidate patch which sets RTPROT_DHCP on IPv4 routes. dhclient-script does not yet handle IPv6 routes, because those are usually delivered via RA instead.
Created attachment 277349 [details] [review] 0001-networkd-set-route-protocol-correctly.patch systemd-networkd apparently uses RTPROT_BOOT for everything, which seems pretty wrong. This patch attempts to fix that and use the right protocol for user-defined routes from config files, and for DHCP.
(In reply to comment #22) > Created an attachment (id=277347) [details] [review] > dhclient-script-route-protocol.patch > > Fedora dhclient candidate patch which sets RTPROT_DHCP on IPv4 routes. > dhclient-script does not yet handle IPv6 routes, because those are usually > delivered via RA instead. You did choose not to set the protocol for the default route? Ok, sounds fine. Looks good to me.
(In reply to comment #23) > Created an attachment (id=277349) [details] [review] > 0001-networkd-set-route-protocol-correctly.patch > > systemd-networkd apparently uses RTPROT_BOOT for everything, which seems pretty > wrong. This patch attempts to fix that and use the right protocol for > user-defined routes from config files, and for DHCP. LGTM too
(In reply to comment #16) > So RTPROT_BOOT should map to NM_PLATFORM_SOURCE_USER. fixed > > Also, in the kernel (fib_table_delete()) it seem to me then when deleting a > > IPv4 route, the protocol must correspond: rebased around your fix (In reply to comment #18) > update_ip6_routing() uses SOURCE_VPN for both the VPN and non-VPN case, I > assume that's not what you wanted? update_ip4_routing() uses > SOURCE_VPN/SOURCE_USER. fixed re-pushed
Comment on attachment 277349 [details] [review] 0001-networkd-set-route-protocol-correctly.patch >Subject: [PATCH] networkd: set route protocol correctly > >All routes added by networkd are currently set RTPROT_BOOT, which according >to the kernel means "Route installed during boot" (rtnetlink.h). But this >is not always the case, and the kernel provides more detailed protocols >that should be used instead. One could also argue that RTPROT_BOOT is somewhat >under-specified, since that the property of where the route came from (eg, >RA, DHCP, user config) is more relevant than when the route was installed. The name and the comment in rtnetlink.h are basically just wrong. Currently RTPROT_BOOT is the default value for unicast routes created from userland, and RTPROT_STATIC is the default value for multicast routes created from userland. This is probably something we should point out to the kernel network guys...
(In reply to comment #26) Looks good to me
So in testing today, I found one issue with IPv6 routing setup. At some point when the first RA comes in, something adds a host route to the DNS/DHCP server outside NetworkManager: NetworkManager[15324]: <debug> [1401298205.973219] [platform/nm-linux-platform.c:1631] event_notification(): netlink event (type 24) NetworkManager[15324]: <debug> [1401298205.973692] [platform/nm-linux-platform.c:403] get_kernel_object(): get_kernel_object for type 5 returned 0x2068c00 inet6 cache 3ffe:b80:17e2::1 type unicast scope global priority 0 protocol unspec nexthop dev 3 <> NetworkManager[15324]: <debug> [1401298205.973912] [platform/nm-platform.c:2478] log_ip6_route(): signal: route 6 added: 3ffe:b80:17e2::1/128 via :: dev enp0s25 metric 0 mss 0 src unknown note the 'metric 0' and 'protocol unspec' (which this branch translates to UNKNOWN, which is fine). That gets added to the "EXT" IP6 configuration: NetworkManager[15324]: --------- NMIP6Config 0x2070460 (EXT) NetworkManager[15324]: a: fe80::223:5aff:fe47:1f71/64 lft 4294967295 pref 4294967295 time 46 dev enp0s25 flags permanent src kernel NetworkManager[15324]: rt: 3ffe:b80:17e2::1/128 via :: dev enp0s25 metric 0 mss 0 src unknown But then later when NM goes to sync the IP configuration to the interface, nm_ip6_config_commit() does some dances with the metric to preserve routing priority if two interfaces are on the same subnet: /* Use the default metric only if the route was created by NM and * didn't already specify a metric. */ if (route.source != NM_PLATFORM_SOURCE_KERNEL && !route.metric) route.metric = priority ? priority : NM_PLATFORM_ROUTE_METRIC_DEFAULT; This causes the metric 0 route to get overwritten with a metric 1 route, and NM substitutes (for some reason) SOURCE_USER for the source/protocol. So there are two issues: 1) we need to expand that SOURCE_KERNEL check, to ensure that only routes created internally by NM get their metric changed. 2) because the route is getting re-added (which should be fixed by #1) with RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: if (cfg->fc_protocol == RTPROT_UNSPEC) cfg->fc_protocol = RTPROT_BOOT; which translates to static maybe? Not sure of the full path there. SO I think we need to make sure we compensate for the kernel conversion when needed?
pushed two fixups. try it out? (In reply to comment #29) > 2) because the route is getting re-added (which should be fixed by #1) with > RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the > UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: > > if (cfg->fc_protocol == RTPROT_UNSPEC) > cfg->fc_protocol = RTPROT_BOOT; > > which translates to static maybe? Not sure of the full path there. SO I think > we need to make sure we compensate for the kernel conversion when needed? No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is the right thing for matching purposes [ie, when deleting a route], because that means the protocol won't be compared when matching). But in the case of adding a route, it means the rtnl_route keeps its default protocol value of RTPROT_STATIC, so that's what the route would be created with. The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that.
(In reply to comment #30) > pushed two fixups. try it out? > > (In reply to comment #29) > > 2) because the route is getting re-added (which should be fixed by #1) with > > RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the > > UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: > > > > if (cfg->fc_protocol == RTPROT_UNSPEC) > > cfg->fc_protocol = RTPROT_BOOT; > > > > which translates to static maybe? Not sure of the full path there. SO I think > > we need to make sure we compensate for the kernel conversion when needed? > > No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then > build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is > the right thing for matching purposes [ie, when deleting a route], because that > means the protocol won't be compared when matching). But in the case of adding > a route, it means the rtnl_route keeps its default protocol value of > RTPROT_STATIC, so that's what the route would be created with. > > The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to > nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that. Well, unless the actual route that comes in from the kernel is UNSPEC? What do we do then? The route I was seeing: inet6 cache 3ffe:b80:17e2::1 type unicast scope global priority 0 protocol unspec actually is unspec somehow, and I feel like NM should preserve that if it can. Which means calling rtnl_route_set_protocol(UNSPEC) when adding routes, but obviously not when deleting?
So RE the last fix in the branch: -if (route.source != NM_PLATFORM_SOURCE_KERNEL && !route.metric) +if (route.source == NM_PLATFORM_SOURCE_USER && !route.metric) one issue I see with that is that I think we *do* want routes that NM gets from RA/DHCP to be added with the right metric, to ensure that we don't have the routes for two different interfaces connected to the same subnet (eg wired & wifi) with the same metric. But I don't think this change will preserve that, so we'd end up with both wired & wifi A quick fix would be to change this to !(KERNEL || UNSPEC), to ensure we set the device metric on any automatically created routes. Unfortunately, because we're not caching "NM created" anywhere, this would delete/re-add external routes with RTPROT_* values that NM doesn't recognize though (GATED, ZEBRA, etc) right?
(In reply to comment #31) > (In reply to comment #30) > > No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then > > build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is > > the right thing for matching purposes [ie, when deleting a route], because that > > means the protocol won't be compared when matching). But in the case of adding > > a route, it means the rtnl_route keeps its default protocol value of > > RTPROT_STATIC, so that's what the route would be created with. > > > > The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to > > nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that. > > Well, unless the actual route that comes in from the kernel is UNSPEC? What do > we do then? The route I was seeing: > > inet6 cache 3ffe:b80:17e2::1 type unicast > scope global priority 0 protocol unspec > > actually is unspec somehow, and I feel like NM should preserve that if it can. > Which means calling rtnl_route_set_protocol(UNSPEC) when adding routes, but > obviously not when deleting? regarding rtnl_route_set_protocol() in build_rtnl_route(): For deleting ipv4 routes, this is already correct because ip4_route_delete() sets the protocol to 0 (meaning: ignore for matching). For deleting ipv6 routes, I *think* that it is already correct, because kernel ignores the protocol in that case (is this really correct?) For lookup, in ip_route_exists() this is irrelevant, because libnl-cache does not consider the protocol for cache lookup. And for adding IPv4/IPv6 routes, this effectively means the same as: if (source != NM_PLATFORM_SOURCE_UNKNOWN) rtnl_route_set_protocol (rtnlroute, source_to_rtprot (source)); else rtnl_route_set_protocol (rtnlroute, RTPROT_STATIC); which I think is wrong.
(In reply to comment #30) > pushed two fixups. try it out? > > (In reply to comment #29) > > 2) because the route is getting re-added (which should be fixed by #1) with > > RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the > > UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: > > > > if (cfg->fc_protocol == RTPROT_UNSPEC) > > cfg->fc_protocol = RTPROT_BOOT; > > > > which translates to static maybe? Not sure of the full path there. SO I think > > we need to make sure we compensate for the kernel conversion when needed? > > No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then > build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is > the right thing for matching purposes [ie, when deleting a route], because that > means the protocol won't be compared when matching). But in the case of adding > a route, it means the rtnl_route keeps its default protocol value of > RTPROT_STATIC, so that's what the route would be created with. > > The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to > nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that. Maybe I was unclear in the previous comment 33. The default value of rtnl_route_get_protocol is RTPROT_STATIC. Not setting it on NM_PLATFORM_SOURCE_UNKNOWN means the same as setting it to RTPROT_STATIC -- this is a libnl thing.
(In reply to comment #34) > (In reply to comment #30) > > pushed two fixups. try it out? > > > > (In reply to comment #29) > > > 2) because the route is getting re-added (which should be fixed by #1) with > > > RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the > > > UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: > > > > > > if (cfg->fc_protocol == RTPROT_UNSPEC) > > > cfg->fc_protocol = RTPROT_BOOT; > > > > > > which translates to static maybe? Not sure of the full path there. SO I think > > > we need to make sure we compensate for the kernel conversion when needed? > > > > No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then > > build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is > > the right thing for matching purposes [ie, when deleting a route], because that > > means the protocol won't be compared when matching). But in the case of adding > > a route, it means the rtnl_route keeps its default protocol value of > > RTPROT_STATIC, so that's what the route would be created with. > > > > The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to > > nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that. > > Maybe I was unclear in the previous comment 33. The default value of > rtnl_route_get_protocol is RTPROT_STATIC. Not setting it on > NM_PLATFORM_SOURCE_UNKNOWN means the same as setting it to RTPROT_STATIC -- > this is a libnl thing. Ah ok, so if we do get a legitimate external route with RTPROT_UNPSEC, NetworkManager needs to explicitly set RTPROT_UNSPEC when adding the route with libnl?
(In reply to comment #35) > (In reply to comment #34) > > (In reply to comment #30) > > > pushed two fixups. try it out? > > > > > > (In reply to comment #29) > > > > 2) because the route is getting re-added (which should be fixed by #1) with > > > > RTPROT_UNSPEC, it gets through to build_rtnl_route(), which preserves the > > > > UNSPEC. But then the kenrel sees the UNSPEC in ip6_route_add() and: > > > > > > > > if (cfg->fc_protocol == RTPROT_UNSPEC) > > > > cfg->fc_protocol = RTPROT_BOOT; > > > > > > > > which translates to static maybe? Not sure of the full path there. SO I think > > > > we need to make sure we compensate for the kernel conversion when needed? > > > > > > No... if NM tried to add a route with NM_PLATFORM_SOURCE_UNKNOWN, then > > > build_rtnl_route() would not call rtnl_route_set_protocol() at all, (which is > > > the right thing for matching purposes [ie, when deleting a route], because that > > > means the protocol won't be compared when matching). But in the case of adding > > > a route, it means the rtnl_route keeps its default protocol value of > > > RTPROT_STATIC, so that's what the route would be created with. > > > > > > The right thing may just be to make it illegal to pass SOURCE_UNKNOWN to > > > nm_platform_ip[46]_route_add(), since NM shouldn't ever actually be doing that. > > > > Maybe I was unclear in the previous comment 33. The default value of > > rtnl_route_get_protocol is RTPROT_STATIC. Not setting it on > > NM_PLATFORM_SOURCE_UNKNOWN means the same as setting it to RTPROT_STATIC -- > > this is a libnl thing. > > Ah ok, so if we do get a legitimate external route with RTPROT_UNPSEC, > NetworkManager needs to explicitly set RTPROT_UNSPEC when adding the route with > libnl? If we allocate a fresh rtnl_route, the protocol will initialized to RTPROT_STATIC. Although the "ce_mask" indicates that the protocol is unset, when creating a netlink message RTM_NEWROUTE/RTM_DELROUTE for the kernel, the field "rtm_protocol" of "struct rtmsg" will always be set. That is because rtm_protocol is not an optional attribute in the netlink message. I think, we should set it to RTPROT_UNSPEC in case of NM_PLATFORM_SOURCE_UNKNOWN. Just remove the "if()".
ok, retesting with the if() removed. (and pushed a fixup for that too). assuming testing goes well, is everyone happy with the current state?
(In reply to comment #37) > ok, retesting with the if() removed. (and pushed a fixup for that too). > assuming testing goes well, is everyone happy with the current state? ACK from my side
(In reply to comment #37) > ok, retesting with the if() removed. (and pushed a fixup for that too). > assuming testing goes well, is everyone happy with the current state? So that part looks good, but there's still the issue of metric not getting set for NM-created RA and DHCP routes in nm_ip[4|6]_config_commit(): -if (route.source != NM_PLATFORM_SOURCE_KERNEL && !route.metric) +if (route.source == NM_PLATFORM_SOURCE_USER && !route.metric) route.metric = priority ? priority : NM_PLATFORM_ROUTE_METRIC_DEFAULT; g_array_append_val (routes, route); since routes NM gets from RA/DHCP won't be SOURCE_USER, they won't get the right metric based on the NMDevice priority. A quick solution for that could be a gboolean 'nm_created' field in __NMPlatformIPRoute_COMMON which gets set to TRUE when the route comes from NM internally or from the NMSetting. Then we could just do: if (route.nm_created && !route.metric) route.metric = priority ? priority : NM_PLATFORM_ROUTE_METRIC_DEFAULT; (if we had accessors for NMPlatformIPRoute, I would have used bitfields and set the high bit on 'source' to indicate NM-created, but we don't have accessors, so we have to make NMPlatformIPRoute bigger...)
pushed these two to master: libnm-utils: remove INFERRABLE flag from properties we don't actually infer core: add debug logging to nm_utils_match_connection() since they're not really related to IPv6 routes anyway (In reply to comment #39) > So that part looks good, but there's still the issue of metric not getting set > for NM-created RA and DHCP routes in nm_ip[4|6]_config_commit(): It seems to me that the problem here is really that we're using metric=0 to mean both "unset" and "actually 0", depending on context. I've pushed danw/ipv6-route-match-alt which has a new commit "core: set route metrics earlier" where we actually set all the route metrics correctly from the start, rather than trying to figure out which ones we need to fix up later. What do you think of that? This also points out that VPN route metrics are currently a bit odd, so we might want to adjust them a bit...
I like danw/ipv6-route-match-alt. For VPN I'm not sure if we care about the parent device metric actually, since the VPN routing doesn't have much to do with the parent device routing. But I guess the VPN routes should be a higher priority than the device routes, because otherwise you might have a VPN route 192.168/16 that was lower priority than a device route, and that means you wouldn't be using the secure route that the administrator specified. Not sure how to do that easily though, since we start Ethernet at priority 1... Right now, VPN routes get the default metric (1024) so we might as well start there, and then later clean up the VPN priorities?
(In reply to comment #41) > Not sure how to do that easily though, since we start Ethernet at priority 1... There's no reason we can't just increment each value in nm_device_get_priority(), and then reserve 1 for VPNs, is there? > Right now, VPN routes get the default metric (1024) so we might as well start > there, and then later clean up the VPN priorities? Routes added to the VPN device itself get committed by the config_commit() calls in nm_vpn_connection_apply_config() with priority 0 (aka default), but routes added to the parent device config get committed by the parent device in nm_device_set_ip[46]_config() with the device priority: 10.0.0.0/8 dev tun0 proto static scope link metric 1024 66.187.233.55 via 192.168.1.1 dev wlp3s0 proto static metric 10 The patch attempts to preserve the current behavior exactly, even though it means that routing-only and tunnel-based VPNs behave differently. As you say, we can fix this up later.
(In reply to comment #42) > The patch attempts to preserve the current behavior exactly Except that vpn_routing_metric() had its logic reversed. Fixed now.
looking at: danw/ipv6-route-match-alt >> core: set route metrics earlier vpn_routing_metric() looks wrong to me, but your previous comments indicate that it's as desired. >> platform: improve tracking of route sources You could now remove the explict rtnl_route_set_protocol() call in ip4_route_delete(), since now build_rtnl_route() already sets it protocol 0. But it could stay for clarity. Rest LGTM
(In reply to comment #42) > (In reply to comment #41) > > Not sure how to do that easily though, since we start Ethernet at priority 1... > > There's no reason we can't just increment each value in > nm_device_get_priority(), and then reserve 1 for VPNs, is there? Totally right, I'm dumb. Though do we consider that some kind of notable behavior change now that all ethernet routes would be 2 instead of 1? In any case, we're punting this for later, but that would certainly be a simple course of action since we'd be changing priorities somehow anyway. > > Right now, VPN routes get the default metric (1024) so we might as well start > > there, and then later clean up the VPN priorities? > > Routes added to the VPN device itself get committed by the config_commit() > calls in nm_vpn_connection_apply_config() with priority 0 (aka default), but > routes added to the parent device config get committed by the parent device in > nm_device_set_ip[46]_config() with the device priority: > > 10.0.0.0/8 dev tun0 proto static scope link metric 1024 > 66.187.233.55 via 192.168.1.1 dev wlp3s0 proto static metric 10 > > The patch attempts to preserve the current behavior exactly, even though it > means that routing-only and tunnel-based VPNs behave differently. As you say, > we can fix this up later. Yeah, lets stick with current behavior and fix up later since it's not really "broken" just sub-optimal.
The current branch looks good to me.