GNOME Bugzilla – Bug 740064
[review] lr/route-manager-rh740064: better manage clashing routes in NM by adding NMRouteManager
Last modified: 2015-04-08 12:53:38 UTC
The kernel does not allow adding the same route twice, whereas a duplicate route means "same destination/prefix, metric". Currently each device itself adds and removes routes at its own discretion. This can easily lead to clashes. For example: have connect "em1" and "em2", both have a manually configured route "192.168.1.0/24, metric 25". Whenever NM syncs the routes for these connections to platform, they will overwrite the respectively other route. If you activate first "em1", then "em2", the route will be assigned to "em2". That is not immediately wrong (there is a clash after all), but the following examples show that we need to be more careful: (a) at first the route was assigned to "em1". Activating "em2" should not reassign the route to another interface. Instead, in face of identical metrics, the route should stick to "em1" until it gets removed from there (or "em1" deactivates). (b) when receiving a DHCP update on "em1", NM will resync the routes for "em1", and the route switches again (at random times). (c) when deactivating the interface that accidentally had the route assigned, we will not re-add it to the respectively other interface -- not until some random event causes the routes to resync on the other interface again. Note that for default routes this is already properly handled by the NMDefaultRouteManager. But there is a related issue: see https://bugzilla.gnome.org/show_bug.cgi?id=735512#c41 Another issue is together with bug 723178. At this moment we let kernel add the subnet routes for the addresses. But the proposed patch for bug 723178 will manually re-add these routes. The same clash can happen there. In this case it's especially bad. Imagine two DHCP ethernets connected to the same LAN. Both connections will get addresses from the same subnet and their subnet routes will constantly toggle between these devices. The work-around for all of the above is of course to manually adjust the metric of one of the connections to avoid the clash!! And don't configure identical metrics. I think the proper solution is not to call nm_platform_ipx_route_add()/delete() directly. Instead have a global route manager instance, and devices just call nm_route_manage_ip4_route_add()/delete(). This route manager remembers all routes that NM currently wants to have configured but keeping track of having when having the same route assigned to different ifindexes. In case of such clashes, it chooses one (the oldest entry) to be synced.
I think it's ready for a review now: http://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=lr/route-manager-rh740064
basically everything looks good > core: split route management code out from platform >+gboolean nm_route_manager_ip4_route_sync (NMRouteManager *, int, const GArray *); >+gboolean nm_route_manager_ip6_route_sync (NMRouteManager *, int, const GArray *); >+gboolean nm_route_manager_route_flush (NMRouteManager *, int); Please use argument names in the prototypes. Makes everything more self-documenting. > core: pass routes with valid ipindex to the route-manager > g_array_append_vals (routes, route, 1); >+ g_array_index (routes, NMPlatformIP4Route, routes->len - 1).ifindex = ifindex; That's kind of ugly, and makes me wonder why we don't just set ifindex in NMIP4Config's route array too. And the answer that is that NMIP4Config doesn't know the ifindex of the device it's being used with, but there's no reason it couldn't; we never create an NMIP4Config other than for a specific device, so we could just pass the ifindex as an argument to nm_ip4_config_new()... > route-manager: remember routes that should be active > Kernel likes to remove a route in case an equivalend route is added to another typo "equivalend" >+ memcpy (&route, &g_array_index (routes, NMPlatformIP4Route, i), >+ sizeof (NMPlatformIP4Route)); This should work: route = g_array_index (routes, NMPlatformIP4Route, i); but you could also avoid the copy by just making route be a pointer. > route-manager: add test >+ g_assert_cmpint (routes->len, ==, sizeof(state1)/sizeof(state1[0])); g_assert_cmpint (routes->len, ==, G_N_ELEMENTS (state1));
>> route-manager: remember routes that should be active +dispose (GObject *object) +{ + NMRouteManagerPrivate *priv = NM_ROUTE_MANAGER_GET_PRIVATE (object); + + g_array_free (priv->ip4_routes, TRUE); + g_array_free (priv->ip6_routes, TRUE); must clear the fields, otherwise dispose() is not reentrant. Or clear those fields only later in finalize(). Pushed one fixup! commit
(In reply to comment #2) > basically everything looks good > > > > core: split route management code out from platform > > >+gboolean nm_route_manager_ip4_route_sync (NMRouteManager *, int, const GArray *); > >+gboolean nm_route_manager_ip6_route_sync (NMRouteManager *, int, const GArray *); > >+gboolean nm_route_manager_route_flush (NMRouteManager *, int); > > Please use argument names in the prototypes. Makes everything more > self-documenting. Fixed. > > core: pass routes with valid ipindex to the route-manager > > > g_array_append_vals (routes, route, 1); > >+ g_array_index (routes, NMPlatformIP4Route, routes->len - 1).ifindex = ifindex; > > That's kind of ugly, and makes me wonder why we don't just set ifindex in > NMIP4Config's route array too. And the answer that is that NMIP4Config doesn't > know the ifindex of the device it's being used with, but there's no reason it > couldn't; we never create an NMIP4Config other than for a specific device, so > we could just pass the ifindex as an argument to nm_ip4_config_new()... Hm, at least NMModemBroadband's static_stage3_ip4_done() doesn't seem to be aware of an interface number and it's rather unclear to me how to make it aware. The rest seems doable, but I'm not convinced if it's a good idea -- it would involve getting ifindex from an interface name in a handful of times (e.g. in the dhcp client code) and that seems somewhat ugly to me too. I've left it as it was now but I'll be thankful for an opinion on the above. > > route-manager: remember routes that should be active > > > Kernel likes to remove a route in case an equivalend route is added to another > > typo "equivalend" Fixed. > >+ memcpy (&route, &g_array_index (routes, NMPlatformIP4Route, i), > >+ sizeof (NMPlatformIP4Route)); > > This should work: > > route = g_array_index (routes, NMPlatformIP4Route, i); > > but you could also avoid the copy by just making route be a pointer. Thanks; I've used the assignment, I don't believe the memory management associated with the pointers is justified here. > > route-manager: add test > > >+ g_assert_cmpint (routes->len, ==, sizeof(state1)/sizeof(state1[0])); > > g_assert_cmpint (routes->len, ==, G_N_ELEMENTS (state1)); Neat. Applied. (In reply to comment #3) > >> route-manager: remember routes that should be active > > +dispose (GObject *object) > +{ > + NMRouteManagerPrivate *priv = NM_ROUTE_MANAGER_GET_PRIVATE (object); > + > + g_array_free (priv->ip4_routes, TRUE); > + g_array_free (priv->ip6_routes, TRUE); > > must clear the fields, otherwise dispose() is not reentrant. Or clear those > fields only later in finalize(). This indeed seems like a job for finalize. Fixed. > Pushed one fixup! commit Thank you; applied it. Pushed an updated lr/route-manager-rh740064
>> route-manager: remember routes that should be active - g_array_remove_index (routes, i); + g_array_remove_index_fast (routes, i); Now "/* Delete unknown routes */" doesn't look at the routes that are actually configured in system. I think that is wrong. The code still needs to to fetch the routes that are actually present (nm_platform_ip4_route_get_all), and remove the ones that shouldn't be there.
- if (!array_contains_ip4_route (routes, known_route)) { + if (!array_get_ip4_route (routes, 0, known_route)) { s/0/known_route->ifindex/ Then also "g_array_append_val (routes, *known_route);" moves inside the if. I think, you might have *many* @routes, but only few @known_routes that need to be added. How about constructing a list of @routes_to_add in the first loop "while (i < routes->len)".
(In reply to comment #5) > >> route-manager: remember routes that should be active > > > - g_array_remove_index (routes, i); > + g_array_remove_index_fast (routes, i); We need the list to be ordered chronologically, so that the route that was configured first always "wins". > Now "/* Delete unknown routes */" doesn't look at the routes that are actually > configured in system. I think that is wrong. > > The code still needs to to fetch the routes that are actually present > (nm_platform_ip4_route_get_all), and remove the ones that shouldn't be there. Fixed. (In reply to comment #6) > - if (!array_contains_ip4_route (routes, known_route)) { > + if (!array_get_ip4_route (routes, 0, known_route)) { > > s/0/known_route->ifindex/ > Then also "g_array_append_val (routes, *known_route);" moves inside the if. I believe it's correct as it is -- we don't want to add a route if a similar one is present on *any* interface as it would cause the older one to be removed. Pushed an updated lr/route-manager-rh740064
(In reply to comment #4) > (In reply to comment #2) > > basically everything looks good > > > > > > > core: split route management code out from platform > > > > >+gboolean nm_route_manager_ip4_route_sync (NMRouteManager *, int, const GArray *); > > >+gboolean nm_route_manager_ip6_route_sync (NMRouteManager *, int, const GArray *); > > >+gboolean nm_route_manager_route_flush (NMRouteManager *, int); > > > > Please use argument names in the prototypes. Makes everything more > > self-documenting. > > Fixed. > > > > core: pass routes with valid ipindex to the route-manager > > > > > g_array_append_vals (routes, route, 1); > > >+ g_array_index (routes, NMPlatformIP4Route, routes->len - 1).ifindex = ifindex; > > > > That's kind of ugly, and makes me wonder why we don't just set ifindex in > > NMIP4Config's route array too. And the answer that is that NMIP4Config doesn't > > know the ifindex of the device it's being used with, but there's no reason it > > couldn't; we never create an NMIP4Config other than for a specific device, so > > we could just pass the ifindex as an argument to nm_ip4_config_new()... > > Hm, at least NMModemBroadband's static_stage3_ip4_done() doesn't seem to be > aware of an interface number and it's rather unclear to me how to make it > aware. The rest seems doable, but I'm not convinced if it's a good idea -- it > would involve getting ifindex from an interface name in a handful of times > (e.g. in the dhcp client code) and that seems somewhat ugly to me too. I'd suggested adding an ifindex to NMIPConfig when Thomas was doing the NMDefaultRouteManager patches because I wasn't too happy with the DRM having to know about NMDevice or NMVPNConnection objects (still not too happy). My thought was that since the NMIPConfig is only ever applied to a single interface, the ifindex is clearly applicable to the IPConfig object too, and should really be set-once property of it. So I'm cool with this and it might make a lot more code NMDevice/NMVPNConnection agnostic. For the NMModemBroadband code, connect_ready() creates the IPConfig objects, and it knows the interface name already, so we could clearly set the NMIPConfig ifindex from there with a simple lookup of interface name -> ifindex. Also for the DHCP code, every NMDHCPClient gets an ifindex at construct time, so we should be good there, right? For the PPP manager we would have to do a ifindex lookup, but we also have the interface name there since it's delivered to NetworkManager at the same time as the IP configuration. So +1 on carrying the ifindex around with the IP4Config. Other things we could add to make NMDefaultRouteManager simpler at the same time (or later): * a 'const char *desc' parameter that would carry either the interface name (if created by NMDevice, WWAN, PPP, or DHCP) or the VPN connection name (if created by a VPN). * a default route metric (set from nm_device_get_ip4_route_metric() or nm_vpn_connection_get_ipX_route_metric())
(In reply to comment #4) > > we could just pass the ifindex as an argument to nm_ip4_config_new()... > > Hm, at least NMModemBroadband's static_stage3_ip4_done() doesn't seem to be > aware of an interface number ok, scratch that idea for now then > > >+ memcpy (&route, &g_array_index (routes, NMPlatformIP4Route, i), > > >+ sizeof (NMPlatformIP4Route)); > > > > This should work: > > > > route = g_array_index (routes, NMPlatformIP4Route, i); > > > > but you could also avoid the copy by just making route be a pointer. > > Thanks; I've used the assignment, I don't believe the memory management > associated with the pointers is justified here. No, I just meant you could make route a pointer, and add a "&" to the assignment: route = &g_array_index (routes, NMPlatformIP4Route, i); and then just use "route->foo" rather than "route.foo" after that. No memory management required.
>> route-manager: remember routes that should be active 1.) In the second loop ("/* Add missing routes */"), route-manager will now also add routes to other interfaces than ifindex. In the previous loop, that is probably correct, because we want to move the route on another interface. In the second loop, I don't see why we want to touch the other interface. 2.) Also, after "/* Ignore routes that already exist */" the route manager only adds @route, if it is not yet known (inside @routes). I could imagine, that if you remove the route externally, route-manager would not readd it, because it thinks it's still there. I think, the loop should check @plat_routes, and add it if it is not there. 3.) Also, if only e.g. the gateway changes. In this case, you would not update the route at all, neither in @routes, nor in platform. 4.) You could optimize the following part a bit: best_route = array_get_ip4_route (routes, 0, known_route); if (!best_route) { success = nm_platform_ip4_route_add (known_route->ifindex, known_route->source, known_route->network, known_route->plen, known_route->gateway, 0, known_route->metric, known_route->mss); /*snip*/ } if ( !(best_route && best_route->ifindex == ifindex) && !array_get_ip4_route (routes, known_route->ifindex, known_route)) g_array_append_val (routes, *known_route); 5.) The first loop moves a route to another interface. Just note that in general you cannot add a non-direct route, if there is no route to the gateway. What if there are two conflicting routes: 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 1 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 1 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 2 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 2 If the first loop first processes 1.2.3.4/32, it would move the route to ifindex 2, while the direct route to the gateway still goes via ifindex 1. Does this even succeed? The direct route is only changed later. Is that a problem then?
I am thinking, maybe it would be worth to implement the route-manager tests also based on fake AND linux platform? There is a lot of complexity in NMLinuxPlatform, and this might be a good way to reuse other test to test linux platform too. Also, this would force us to make fake platform more similar to actual platform (which would improve our fake platform to make it more usable as a real stub). For example, in fake platform you can add routes without having a link or address. Also, the tests should explicitly test the metrics 0 (both for IPv4 and IPv6). These are special corner cases.
(In reply to comment #8) > (In reply to comment #4) > > (In reply to comment #2) > > > basically everything looks good > > > > > > > > > > core: split route management code out from platform > > > > > > >+gboolean nm_route_manager_ip4_route_sync (NMRouteManager *, int, const GArray *); > > > >+gboolean nm_route_manager_ip6_route_sync (NMRouteManager *, int, const GArray *); > > > >+gboolean nm_route_manager_route_flush (NMRouteManager *, int); > > > > > > Please use argument names in the prototypes. Makes everything more > > > self-documenting. > > > > Fixed. > > > > > > core: pass routes with valid ipindex to the route-manager > > > > > > > g_array_append_vals (routes, route, 1); > > > >+ g_array_index (routes, NMPlatformIP4Route, routes->len - 1).ifindex = ifindex; > > > > > > That's kind of ugly, and makes me wonder why we don't just set ifindex in > > > NMIP4Config's route array too. And the answer that is that NMIP4Config doesn't > > > know the ifindex of the device it's being used with, but there's no reason it > > > couldn't; we never create an NMIP4Config other than for a specific device, so > > > we could just pass the ifindex as an argument to nm_ip4_config_new()... > > > > Hm, at least NMModemBroadband's static_stage3_ip4_done() doesn't seem to be > > aware of an interface number and it's rather unclear to me how to make it > > aware. The rest seems doable, but I'm not convinced if it's a good idea -- it > > would involve getting ifindex from an interface name in a handful of times > > (e.g. in the dhcp client code) and that seems somewhat ugly to me too. > > I'd suggested adding an ifindex to NMIPConfig when Thomas was doing the > NMDefaultRouteManager patches because I wasn't too happy with the DRM having to > know about NMDevice or NMVPNConnection objects (still not too happy). My > thought was that since the NMIPConfig is only ever applied to a single > interface, the ifindex is clearly applicable to the IPConfig object too, and > should really be set-once property of it. So I'm cool with this and it might > make a lot more code NMDevice/NMVPNConnection agnostic. > > For the NMModemBroadband code, connect_ready() creates the IPConfig objects, > and it knows the interface name already, so we could clearly set the NMIPConfig > ifindex from there with a simple lookup of interface name -> ifindex. > > Also for the DHCP code, every NMDHCPClient gets an ifindex at construct time, > so we should be good there, right? > > For the PPP manager we would have to do a ifindex lookup, but we also have the > interface name there since it's delivered to NetworkManager at the same time as > the IP configuration. > > So +1 on carrying the ifindex around with the IP4Config. Tried to do that: a18ccb3 ip6-config: keep track of ifindex a31bffd ip4-config: keep track of ifindex Downside is that the modification after addition to config->routes (since the parameter is const) which we intended to avoid is still there, now in nm_ip4_config_add_route(). Any ideas how to make it look less offensive? + g_array_index (priv->routes, NMPlatformIP4Route, priv->routes->len - 1).ifindex = priv->ifindex; (In reply to comment #9) > > > >+ memcpy (&route, &g_array_index (routes, NMPlatformIP4Route, i), > > > >+ sizeof (NMPlatformIP4Route)); > > > > > > This should work: > > > > > > route = g_array_index (routes, NMPlatformIP4Route, i); > > > > > > but you could also avoid the copy by just making route be a pointer. > > > > Thanks; I've used the assignment, I don't believe the memory management > > associated with the pointers is justified here. > > No, I just meant you could make route a pointer, and add a "&" to the > assignment: > > route = &g_array_index (routes, NMPlatformIP4Route, i); > > and then just use "route->foo" rather than "route.foo" after that. No memory > management required. I can't. We use the route after the removal of the route from an array, which would invalidate the pointer. (In reply to comment #10) > >> route-manager: remember routes that should be active > > 1.) In the second loop ("/* Add missing routes */"), route-manager will now > also add routes to other interfaces than ifindex. > In the previous loop, that is probably correct, because we want to move the > route on another interface. > In the second loop, I don't see why we want to touch the other interface. > > 2.) Also, after "/* Ignore routes that already exist */" the route manager only > adds @route, if it is not yet known (inside @routes). I could imagine, that if > you remove the route externally, route-manager would not readd it, because it > thinks it's still there. > I think, the loop should check @plat_routes, and add it if it is not there. I don't think it would. It iterates known_routes, and they should all belong to a single interface. > 3.) Also, if only e.g. the gateway changes. In this case, you would not update > the route at all, neither in @routes, nor in platform. Why? array_get_ip?_route() consider routes with a different gateway to be diferent routes. > 4.) You could optimize the following part a bit: > > best_route = array_get_ip4_route (routes, 0, known_route); > if (!best_route) { > success = nm_platform_ip4_route_add (known_route->ifindex, > known_route->source, > known_route->network, > known_route->plen, > known_route->gateway, > 0, > known_route->metric, > known_route->mss); > /*snip*/ > } > > if ( !(best_route && best_route->ifindex == ifindex) > && !array_get_ip4_route (routes, known_route->ifindex, > known_route)) > g_array_append_val (routes, *known_route); I've removed that route presence check, see below for reasoning. > 5.) The first loop moves a route to another interface. Just note that in > general you cannot add a non-direct route, if there is no route to the gateway. > What if there are two conflicting routes: > > 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 1 > 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 1 > > 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 2 > 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 2 > > If the first loop first processes 1.2.3.4/32, it would move the route to > ifindex 2, while the direct route to the gateway still goes via ifindex 1. Does > this even succeed? The direct route is only changed later. Is that a problem > then? Uh, this is nasty. Linux doesn't let you add a route via gateway that's not reachable via given gateway (it, however, retains such routes if the route to the gateway is removed). It seems to me that it's easier to change the policy from "oldest connection owns routes" to "newest connection owns routes." That leaves the old dependent routes in place when their gateway route is overridden (we can not create them manually when the gateway route is not there) and the old gateway route will be restored when the one that overrode it is removed. It wouldn't work the other way around -- if we wanted to add the new gateway route upon removal of the old one, we'd have to determine what other dependent routes were held back previously, which would be costly. (In reply to comment #11) > I am thinking, maybe it would be worth to implement the route-manager tests > also based on fake AND linux platform? > > There is a lot of complexity in NMLinuxPlatform, and this might be a good way > to reuse other test to test linux platform too. I've done this. One thing I'm not sure about is reaching away from src/tests/Makefile.am to ../platform/tests/. It is somehow ugly, I can't think of a way to rearrange it in a better way though. > Also, this would force us to make fake platform more similar to actual platform > (which would improve our fake platform to make it more usable as a real stub). > For example, in fake platform you can add routes without having a link or > address. Well, I've not added this particular case, but had to modify it a bit to resemble real Linux anyway. > Also, the tests should explicitly test the metrics 0 (both for IPv4 and IPv6). > These are special corner cases. Done. New attempt pushed to lr/route-manager-rh740064
whitespace: + success = nm_platform_ip4_route_add (known_route->ifindex, + known_route->source, + known_route->network, + known_route->plen, (In reply to comment #12) > (In reply to comment #8) > > >> route-manager: remember routes that should be active > > > > 1.) In the second loop ("/* Add missing routes */"), route-manager will now > > also add routes to other interfaces than ifindex. > > In the previous loop, that is probably correct, because we want to move the > > route on another interface. > > In the second loop, I don't see why we want to touch the other interface. > > > > 2.) Also, after "/* Ignore routes that already exist */" the route manager only > > adds @route, if it is not yet known (inside @routes). I could imagine, that if > > you remove the route externally, route-manager would not readd it, because it > > thinks it's still there. > > I think, the loop should check @plat_routes, and add it if it is not there. > > I don't think it would. It iterates known_routes, and they should all belong to > a single interface. > > > 3.) Also, if only e.g. the gateway changes. In this case, you would not update > > the route at all, neither in @routes, nor in platform. > > Why? array_get_ip?_route() consider routes with a different gateway to be > diferent routes. The gateway was an example. Say "mss" then. I didn't see that array_get_ip?_route() considers gateway. I think that is wrong. "conflicting route" means "same network/plen,metric", it does not consider the gateway. Otherwise, array_get_ip?_route() would not find a conflicting route (conflicting network/plen,metric) due to different gateway. (can you add a test case for this too?) > > 4.) You could optimize the following part a bit: > > > > best_route = array_get_ip4_route (routes, 0, known_route); > > if (!best_route) { > > success = nm_platform_ip4_route_add (known_route->ifindex, > > known_route->source, > > known_route->network, > > known_route->plen, > > known_route->gateway, > > 0, > > known_route->metric, > > known_route->mss); > > /*snip*/ > > } > > > > if ( !(best_route && best_route->ifindex == ifindex) > > && !array_get_ip4_route (routes, known_route->ifindex, > > known_route)) > > g_array_append_val (routes, *known_route); > > I've removed that route presence check, see below for reasoning. > > > 5.) The first loop moves a route to another interface. Just note that in > > general you cannot add a non-direct route, if there is no route to the gateway. > > What if there are two conflicting routes: > > > > 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 1 > > 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 1 > > > > 1.2.3.4/32 via 10.0.0.1 metric 0 ifindex 2 > > 10.0.0.1/32 via 0.0.0.0 metric 0 ifindex 2 > > > > If the first loop first processes 1.2.3.4/32, it would move the route to > > ifindex 2, while the direct route to the gateway still goes via ifindex 1. Does > > this even succeed? The direct route is only changed later. Is that a problem > > then? > > Uh, this is nasty. Linux doesn't let you add a route via gateway that's not > reachable via given gateway (it, however, retains such routes if the route to > the gateway is removed). > > It seems to me that it's easier to change the policy from "oldest connection > owns routes" to "newest connection owns routes." That means that whenever the routes get synced of an interface, the conflicting route will move over to the now-synced device. This happens at random times, e.g. on new DHCP lease. The "now-synced" device is not necessarily "newer", also a long connected device will get DHCP leases from time to time. This seems wrong behavior to me.
(In reply to comment #13) > whitespace: > > + success = nm_platform_ip4_route_add (known_route->ifindex, > + known_route->source, > + known_route->network, > + known_route->plen, Fixed. > > > 3.) Also, if only e.g. the gateway changes. In this case, you would not update > > > the route at all, neither in @routes, nor in platform. > > > > Why? array_get_ip?_route() consider routes with a different gateway to be > > diferent routes. > > The gateway was an example. Say "mss" then. > > I didn't see that array_get_ip?_route() considers gateway. I think that is > wrong. "conflicting route" means "same network/plen,metric", it does not > consider the gateway. Ah, I did not know this. I've changed array_get_ip?_route() to look for a specific route in case ifindex is specified, otherwise return any matching route. This consider matching routes for clash resolution while still removing non-equal routes for the interface we're syncing. > Otherwise, array_get_ip?_route() would not find a conflicting route > (conflicting network/plen,metric) due to different gateway. > > (can you add a test case for this too?) Done. > > It seems to me that it's easier to change the policy from "oldest connection > > owns routes" to "newest connection owns routes." > > That means that whenever the routes get synced of an interface, the conflicting > route will move over to the now-synced device. This happens at random times, > e.g. on new DHCP lease. The "now-synced" device is not necessarily "newer", > also a long connected device will get DHCP leases from time to time. > > This seems wrong behavior to me. Okay. Scratched the idea. Instead, I'm now trying to re-add each route that seems like it should be in platform, but is not. Extended a test to account for a situation where a route could not be previously added due to missing route to a gateway but not a route to the gateway returned. Pushed new lr/route-manager-rh740064
How about the two patches I pushed on top, squashed into the "ip4-config: keep track of ifindex" and "ip6-config: keep track of ifindex" patches? One thing it does do is use nm_device_get_ip_ifindex() in most places, since the "ip_ifindex" is what the IPConfig should actually get applied to. For most normal devices ifindex == ip_ifindex, but for stuff like WWAN (where we have ppp0) and ADSL (where we also have ppp0 or nas0) there won't be any ifindex because priv->iface isn't a kernel netdevice at all. The rest looks OK to me, though I haven't runtime tested it extensively.
Fixed in master: 47167ca platform: fix route addition ordering 874e4a7 core: split route management code out from platform 747292a ip4-config: keep track of ifindex 52711b5 ip6-config: keep track of ifindex 72e8c53 fake-platform: don't return null routes in place of deleted ones 1ee03ee fake-platform: move route deletion above addition f6c9b4f fake-platform: override routes that clash 4d09782 fake-platform: reject adding routes without the gateway on the same interface 72cefd5 fake-platform: normalize ipv6 route metric before deletion 7c52d09 route-manager: normalize ipv6 route metrics during comparison 4c3ba29 route-manager: remember routes that should be active 0659a67 route-manager: add test 84f54f0 core: pass ifindex as parameter to nm_ip4_config_new() f981407 core: pass ifindex as parameter to nm_ip6_config_new() 8d29b3a route-manager: merge branch 'lr/route-manager-rh740064'
some late comments: >> platform: fix route addition ordering I think this is wrong and inverts the previously correct behavior. Why is it correct? there are several assertions against ifindex in the form: g_assert (priv->ifindex); but other code likes to set invalid ifindexes to -1 (thus wrongly not failing the assertion). I think NMIP4Config should not allow negative indexes and avoid the ambiguity of multiple invalid ifindexes. Or does have 0 a different meaning then -1? It is not immediately obvious to me why some of these assertions hold. g_return_*() would be more forgiving. Also, ifindex is an immutable property for almost all parts. Except nm_ip4_config_replace(). I would - /* ifindex */ - if (src_priv->ifindex != dst_priv->ifindex) { - dst_priv->ifindex = src_priv->ifindex; - has_minor_changes = TRUE; - } + g_return_val_if_fail (src_priv->ifindex == dst_priv->ifindex, FALSE); >> route-manager: remember routes that should be active + if (ifindex) { + /* Looking for a specific route. */ + if ( c->ifindex != ifindex + || route->mss != c->mss + || route->gateway != c->gateway) + continue; + } To this you said: >> I've changed array_get_ip?_route() to look for a specific route in case >> ifindex is specified, otherwise return any matching route. This consider >> matching routes for clash resolution while still removing non-equal routes >> for the interface we're syncing. why does the mss or gateway matter at this point? I think that is wrong. If it's correct, why doesn't it consider the source field? + /* Learn about routes that platform knows but we don't. */ this first loop, effectively adds all the routes that are in platform to @routes. @routes should be the list of what NM wants to configure. By merging these two lists, nm-route-manager does no longer know which routes NM wants to configure and which came from platform below. I don't see how this can work... Also, (together with the previous point) if @routes contains a route "1.2.3.4/24 via 2.2.2.2", this loop would add another entry "1.2.3.4/24 via 3.3.3.3". Is that intended? IMO this commit needs more work, or at least code comments explaining how it works.
btw. some WIP ... th/route-manager-bgo740064
(In reply to Thomas Haller from comment #18) > btw. some WIP ... th/route-manager-bgo740064 now ready: th/route-manager-bgo740064
(In reply to Thomas Haller from comment #19) > (In reply to Thomas Haller from comment #18) > > btw. some WIP ... th/route-manager-bgo740064 > > now ready: th/route-manager-bgo740064 FTR: lrintel said LGTM on IRC
(In reply to Thomas Haller from comment #20) > (In reply to Thomas Haller from comment #19) > > (In reply to Thomas Haller from comment #18) > > > btw. some WIP ... th/route-manager-bgo740064 > > > > now ready: th/route-manager-bgo740064 > > FTR: lrintel said LGTM on IRC merged: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3c240a6d0b0b1ab6576ded7b3bb8c261b32d3d19