After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 735512 - [RFE] support priorities/metric for default routes and non-default routes.
[RFE] support priorities/metric for default routes and non-default routes.
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: High critical
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: 723178 738268 740064
 
 
Reported: 2014-08-27 12:40 UTC by Thomas Haller
Modified: 2014-11-20 16:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Thomas Haller 2014-08-27 12:40:09 UTC
Allow more control over route metrics (and default routes).

(this applies equally for IPv4 and IPv6)



=== Current Status ===

Regarding default route:

Currently, only one device can get the default route. When NM decides to switch the preferred default route from one device to another, it removes the less preferred route and adds a new one.
Give that, the route metric is not that important, and NM always sets the metric to 1024.

The setting ipv4.never-default allows the user to configure that a device does not get the default route.

NM sets the default route based on a preference by device type (nm_device_get_priority()). E.g. if you have both a WiFi and an ethernet connection, ethernet has always the higher priority. If we have two devices with the same priority, the default route is "sticky". That means, the device first to activate gets the route, and it stays there until the device deactivates (or until a device with a higher priority connects).



Regarding non-default routes:

for manually configured routes (ipv4.routes), the user can choose to set a metric. If he omits the metric (which means: sets it to 0), NM chooses a metric based on nm_device_get_priority(). E.g. ethernet gets a metric of "1", WiFi 10.

For dynamic methods (SLAAC, dhcp), NM sets the route metric to nm_device_get_priority(). If there are places where we fail to set the metric, we should fix that (but such places are not known to me ATM).

Currently there is bug 723178 regarding prefix routes (e.g. when configuring configuring address 192.168.0.5/24, kernel will automatically add a device route "192.168.0.0/24 via 0.0.0.0"). As these routes are created by the kernel, they get metric 0. This should be fixed too, so that they a metric based on nm_device_get_priority().


=======================

The user should have more flexibility.


I was talking with dcbw and danw, and the envisioned solution is to add properties ipv4.route-priority (ipv6, respectively).

The priority would be a non-negative integer that relates ~somehow~ to the route-metric. Lower priority means, more preferred (similar to route metrics). Leaving ipv4.route-priority=0 is the default and it means: detect the priority based on the device type (as we do now with nm_device_get_priority()).

Probably, ipv4.route-priority=1 should equal a metric==0 (so that you can still set the highest possible metric via configuration).

Probably it makes sense to calculate:
    route-metric = ipv4.route-priority==0
        ? nm_device_get_priority()
        : ipv4.route-priority - 1;

Probably we would spread out nm_device_get_priority() which currently returns the numbers 1, 2, 3, for the metric, so that there are larger gaps between the device-types ...


ipv4.never-default would still have the same meaning, but if we activate a connection that is never-default=no, we *always* add a default route. But instead of always setting a metric of 1024, we determine the metric based on ipv4.route-priority/nm_device_get_priority().
With this change, we would actually add more then one default route and the metric would decide on the preference.

For non-default routes ipv4.route-priority is straight forward: it allows to overwrite what nm_device_get_priority() currently returns.


Regarding manually-configured routes, as "omitting the metric" means "setting it to 0", the user currently cannot add a route with metric 0 (because in that case we fallback to nm_device_get_priority()). It might be worth a thought to treat the value as "metric-1". So that:
  0 == use nm_device_get_priority()
  1 == set metric to 0
  2 == set metric to 1
  ...
Of course, that is a pretty bad behavioural change (also with regard to initscripts). But how else can a user configure a metric of 0 to it's route?
Reserve a special value? -1? Or G_MAXINT?



See also:
https://bugzilla.redhat.com/show_bug.cgi?id=663730
Comment 1 Thomas Haller 2014-08-27 13:13:11 UTC
(In reply to comment #0)
>
> Regarding manually-configured routes, as "omitting the metric" means "setting
> it to 0", the user currently cannot add a route with metric 0 (because in that
> case we fallback to nm_device_get_priority()). It might be worth a thought to
> treat the value as "metric-1". So that:
>   0 == use nm_device_get_priority()
>   1 == set metric to 0
>   2 == set metric to 1
>   ...
> Of course, that is a pretty bad behavioural change (also with regard to
> initscripts). But how else can a user configure a metric of 0 to it's route?
> Reserve a special value? -1? Or G_MAXINT?

related: bug 731402, commit http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=cdd09286d73919d64f0f9481821a6f5185f7876b

I think we should improve on user being unable to set metric "0". But how...?
Comment 2 Thomas Haller 2014-08-27 16:21:16 UTC
After discussion, I agree with pavlix that the route-priority should *equal* the route-metric. That is more transparent and clear if the priority maps to the metric.

So, we would have default value for route-priorty=-1, which means: use nm_device_get_priority() -- to have the value 0 usable for setting the metric to zero.


also, note that another branch is about to add autoconnect-priority, where higher number mean preferred. So, let's rename "route-priority" to "ipv4.route-metric", so that the distintion is clearer that for route-metric lower numbers mean preferred.
Comment 3 Thomas Haller 2014-08-28 21:02:28 UTC
Turns out, kernel supports the whole uint32 range for valid metrics. Hence, I am making ipv4.route-metric property of type int64, with default value -1, and a valid range [-1,G_MAXUINT32].
Comment 4 Thomas Haller 2014-08-29 11:37:26 UTC
Pushed branch th/bgo735512_route_metric.

This only contains a the first part.

ignore the last commit (WIP: core: always add default routes...").



still missing:
- fix bug 723178
- add *all* default routes
- use new metric also for default routes
Comment 5 Dan Winship 2014-10-01 21:05:06 UTC
> core: fix leak of lookup_addr in NMPolicy

> Also, as we now evaluate the arguments of logging statements,
> refactor a logging statement.

Missing the word "lazily"?
(And given the need to free the string afterward, this is the sort of place I'd use nm_logging_enabled())


> libnm: do not assert secret flag in get_secret_flags()

The commit message / comment make it sound like it's a mystery or something, but we know exactly what's happening: a client is buggy and is returning non-secrets from GetSecrets, and we aren't validating the data before passing it to get_secret_flags().

The nm_setting_get_secret_flags() docs already say that it returns FALSE if the property is not a secret (without saying anything about "you're not supposed to do that"), so let's just do that. No need for a FIXME comment.

(We could simplify a bit more and get rid of the verify_secret parameter by just checking whether flags_props is a valid property name rather than checking secret_name...)


> libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting

I don't really like making this an int64, but that does seem to be the simplest solution...

Maybe should be called gateway-metric, to clarify that it goes with the gateway, not the "routes" property? (That applies to several later commits as well. Eg, maybe "nm_device_get_ip6_gateway_priority()"

Also, we'll need to expose this in NMIP4Config / NMIP6Config too. (Or maybe we should just start including the default route in the routes array? Or maybe we already do that?)


> core: add nm_ipX_config_get_direct_route_for_host() functions

This adds a return value to nm_utils_ip6_address_clear_host_address() but then doesn't actually use it.
Comment 6 Dan Williams 2014-10-02 19:58:29 UTC
I don't have much beyond what danw said...

> libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting

Should we add this to libnm-util too?  If we don't, client apps that use it won't know anything about this property.  I think we should at least keep parity up to the 1.0 release and give stuff a bit of time to port to libnm before we start ignoring libnm-util (if at all).
Comment 7 Dan Williams 2014-10-02 20:02:15 UTC
(In reply to comment #5)
> > core: fix leak of lookup_addr in NMPolicy
> 
> > Also, as we now evaluate the arguments of logging statements,
> > refactor a logging statement.
> 
> Missing the word "lazily"?
> (And given the need to free the string afterward, this is the sort of place I'd
> use nm_logging_enabled())
> 
> 
> > libnm: do not assert secret flag in get_secret_flags()
> 
> The commit message / comment make it sound like it's a mystery or something,
> but we know exactly what's happening: a client is buggy and is returning
> non-secrets from GetSecrets, and we aren't validating the data before passing
> it to get_secret_flags().
> 
> The nm_setting_get_secret_flags() docs already say that it returns FALSE if the
> property is not a secret (without saying anything about "you're not supposed to
> do that"), so let's just do that. No need for a FIXME comment.
> 
> (We could simplify a bit more and get rid of the verify_secret parameter by
> just checking whether flags_props is a valid property name rather than checking
> secret_name...)
> 
> 
> > libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting
> 
> I don't really like making this an int64, but that does seem to be the simplest
> solution...
> 
> Maybe should be called gateway-metric, to clarify that it goes with the
> gateway, not the "routes" property? (That applies to several later commits as
> well. Eg, maybe "nm_device_get_ip6_gateway_priority()"

The individual route metrics default to 0 which used to be the device priority but is now either the device priority or the 'route-metric' value.  So while it gets used for gateway/subnet routes, it'll also get used VPN gateway host route and individual routes too, not just the gateway.

I can't think of a better name than route-priority or route-metric though.  However, the kernel calls it PRIORITY internally so I wonder why Thomas though "metric" was better (perhaps just because that's what /sbin/ip and ifconfig use for historical reasons?)

Maybe default-route-metric?

> Also, we'll need to expose this in NMIP4Config / NMIP6Config too. (Or maybe we
> should just start including the default route in the routes array? Or maybe we
> already do that?)

We don't, because that's controlled by the Policy and not part of the IPConfig objects which don't contain any global config.  That'll change with Thomas' changes near the top of the branch though.  Also, the default interface was traditionally specified by the Default/Default6 properties of the ActiveConnection so having it in the IPConfig was redundant since most users cared about whether the *interface* was default instead of routing details, like for displaying icons in a UI.
Comment 8 Thomas Haller 2014-10-06 12:37:00 UTC
(In reply to comment #5)
> > core: fix leak of lookup_addr in NMPolicy
> 
> > Also, as we now evaluate the arguments of logging statements,
> > refactor a logging statement.
> 
> Missing the word "lazily"?

right. Important word in this sentence.

> (And given the need to free the string afterward, this is the sort of place I'd
> use nm_logging_enabled())

I disagree. As our logging macro evaluates now only when needed, this is already equal to

if (nm_logging_enabled())
  nm_log_debug(...)
g_free (str);


so wrapping it in another nm_logging_enabled() will expand to:
if (nm_logging_enabled()) {
   if (nm_logging_enabled())
     nm_log_debug(...)
   g_free (str);
}

I think this is less preferable.


> > libnm: do not assert secret flag in get_secret_flags()
> 
> The commit message / comment make it sound like it's a mystery or something,
> but we know exactly what's happening: a client is buggy and is returning
> non-secrets from GetSecrets, and we aren't validating the data before passing
> it to get_secret_flags().
> 
> The nm_setting_get_secret_flags() docs already say that it returns FALSE if the
> property is not a secret (without saying anything about "you're not supposed to
> do that"), so let's just do that. No need for a FIXME comment.
> 
> (We could simplify a bit more and get rid of the verify_secret parameter by
> just checking whether flags_props is a valid property name rather than checking
> secret_name...)

I an expanded version of this patch is on th/bgo737380_log_connection_diff


> > libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting
> 
> I don't really like making this an int64, but that does seem to be the simplest
> solution...

I think so too. But I don't dislike it.


> Maybe should be called gateway-metric, to clarify that it goes with the
> gateway, not the "routes" property? (That applies to several later commits as
> well. Eg, maybe "nm_device_get_ip6_gateway_priority()"

It does not only apply to routes via a gateway (but also). It applies to any route that doesn't have an explicit metric set. Especially, it does *not* only apply to the default-route (but also).



> Also, we'll need to expose this in NMIP4Config / NMIP6Config too. (Or maybe we
> should just start including the default route in the routes array? Or maybe we
> already do that?)

In the WIP commit I noticed that we cannot expose the default route in NMIP4Config:route, because that breaks old libnm-clients (because they don't expect a 0.0.0.0 gateway).

But yeah, if it is useful we probably should expose it via DBUS (maybe later, on demand).


> > core: add nm_ipX_config_get_direct_route_for_host() functions
> 
> This adds a return value to nm_utils_ip6_address_clear_host_address() but then
> doesn't actually use it.

yes, I used it in a previous version. I kept it, because I think it is a more useful interface because it allows chaining. Let's keep it, even if it is an internal-only function?


(In reply to comment #7)
> (In reply to comment #5)
> > > libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting
> > 
> > I don't really like making this an int64, but that does seem to be the simplest
> > solution...
> > 
> > Maybe should be called gateway-metric, to clarify that it goes with the
> > gateway, not the "routes" property? (That applies to several later commits as
> > well. Eg, maybe "nm_device_get_ip6_gateway_priority()"
> 
> The individual route metrics default to 0 which used to be the device priority
> but is now either the device priority or the 'route-metric' value.  So while it
> gets used for gateway/subnet routes, it'll also get used VPN gateway host route
> and individual routes too, not just the gateway.
> 
> I can't think of a better name than route-priority or route-metric though. 
> However, the kernel calls it PRIORITY internally so I wonder why Thomas though
> "metric" was better (perhaps just because that's what /sbin/ip and ifconfig use
> for historical reasons?)

No strong reason. Before we said, it could be just a numeric value, that somehow translates to the route-metric known by the kernel. In that case, I would have called it -priority. But since it is directly the kernel-value, I prefer -metric because it makes it clearer to me.


> Maybe default-route-metric?

it's not the "metric of the default-route". Hence, I find this confusing. Better (but still awkward): route-metric-default


> > Also, we'll need to expose this in NMIP4Config / NMIP6Config too. (Or maybe we
> > should just start including the default route in the routes array? Or maybe we
> > already do that?)
> 
> We don't, because that's controlled by the Policy and not part of the IPConfig
> objects which don't contain any global config.  That'll change with Thomas'
> changes near the top of the branch though.  Also, the default interface was
> traditionally specified by the Default/Default6 properties of the
> ActiveConnection so having it in the IPConfig was redundant since most users
> cared about whether the *interface* was default instead of routing details,
> like for displaying icons in a UI.

Soon, we will add default routes for every device, hence it will be useful to expose them via the IPXConfig object. ... but let's do that on demand.



Rebased branch on master and some minor bugs.
Also, backported
  "libnm: add NMSettingIP4Config:route-metric and..."
to libnm-util.
Comment 9 Dan Williams 2014-10-12 17:40:47 UTC
I'm OK with 'route-metric-default'.

> libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting

I wouldn't say anything about the kernel parsing here as that doesn't add anything useful for users.  Also we don't need to justify using an int64 either, we just pick the right property types and don't need to apologize for it :)  For example we don't say that we need to use guint64 for NMSettingConnection::timestamp.

I would simply state what NetworkManager expects from the numbers, eg that 0 and G_MAXUINT32 are valid, and that -1 means default.  We might note that this is slightly different from the NMSettingIPxConfig::routes 'metric' because we made a mistake and 0 is valid, but we can't change it now.

Also, the descriptions for IPv4 and IPv6 aren't the same here, but I think they should be?

For the IPv6 comments about 0 vs. 1024, it's unfortuante that IPv4 doesn't work like IPv6 does, but I guess we can't help that since it's a kernel limitation, and thus we *should* mention '0' as a special value here that maps to 1024, and also note that this is different behavior than IPv4's '0' value.
Comment 10 Thomas Haller 2014-10-15 21:33:49 UTC
(In reply to comment #9)
> I'm OK with 'route-metric-default'.
> 
> I would simply state what NetworkManager expects from the numbers, eg that 0
> and G_MAXUINT32 are valid, and that -1 means default.  We might note that this
> is slightly different from the NMSettingIPxConfig::routes 'metric' because we
> made a mistake and 0 is valid, but we can't change it now.

we could obsolete our current NMSettingIPxConfig:route and add a route-v2 property (on DBUS API). The latter could treat -1 as default... we already dislike that we declared the route type as "aau".
Comment 11 Thomas Haller 2014-10-22 19:25:16 UTC
Repushed th/bgo735512_route_metric.

The last commit is a major change. I did not carefully test it yet, but in that's it in principle.
Comment 12 Dan Winship 2014-10-23 15:02:52 UTC
> libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting

>+ * Returns: the route metric that is used for IPv4 routes. The special

"that is used for routes that don't explicitly specify a metric; see #NMSettingIP4Config:route-metric for more details"

>+ * device type. As the linux kernel, setting this value to 0 means setting
>+ * it to 1024.

"As WITH the linux kernel" (in the function doc and property doc)


any libnm-util additions need "Since: 1.0" / NM_AVAILABLE_IN_1_0, but are you really adding the property there because you want it in libnm-util, or just for the side effect of getting it into the docs?



> cli: support new properties NM_SETTING_IP4_CONFIG_ROUTE_METRIC and NM_SETTING_IP6_CONFIG_ROUTE_METRIC

though annoying because of the need for renumbering, I would put ROUTE_METRIC immediately after ROUTES in the output


> core: cleanup type of route metric to ensure guint32

You didn't change nm_device_get_priority(), though I'm not sure if that's a mistake or not.

>+G_STATIC_ASSERT (sizeof (guint) >= sizeof (guint32));

You don't need to static assert that; if it wasn't true then the compiler would warn about passing G_MAXUINT32 to g_param_spec_uint() anyway.

(And in fact, GLib won't even build if sizeof(guint) < 4 anyway.)


> core: support overwriting the default route priority per device via connection setting

"overriding" (and "override" later in the commit message)


> platform: extend nm_platform_ipX_route_get_all() to return default-routes only

>+	NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT      = (0x01 << 2),

FLAG_ONLY_DEFAULT would be clearer


> libnm: allow zero length prefix in NMIP4Route and NMIP6Route

You don't need a warning comment in libnm-core, and you shouldn't change libnm-util, since it would be non-backward-compatible, and if people are going to depend on 1.0 anyway, they might as well just use libnm.


> core: add manager for default route and support multiple default routes

>    can get the default route (never-default=no). The default route is
>    choosen by assigning different metrics to the devices.

"chosen"

In both the commit message and in comments in the code, it seems in many places like "the default route" should be "a default route", since there is no longer a single "THE" default route.

>-	if (   connection
>-	    && !nm_settings_connection_get_nm_generated_assumed (NM_SETTINGS_CONNECTION (connection))) {

It is weird to explicitly add a pretend default route for generated-assumed connections which isn't ever actually going to be used.

>+	nm_ip4_config_merge_setting (composite,
>+	                             nm_connection_get_setting_ip4_config (connection),
...
>+		if (   (s_ip4 = nm_connection_get_setting_ip4_config (connection))

should just set s_ip4 earlier

Also, you can't add a default route if the config's gateway is unset. (NMPolicy checks:

	/* Make sure the device has a gateway */
	if (!nm_ip4_config_get_gateway (ip4_config) && (devtype != NM_DEVICE_TYPE_MODEM))
		continue;

although I think what it really means is not 'doesn't have a gateway and isn't a modem", but "doesn't have a gateway and doesn't have a peer address. I think?)

>+	/* To configure a route via gateway, the gateway must be onlink.
>+	 * Add this route too. */

if it's on link then we already have a route to it...

>+	/* prune all other default routes from this device. */
>+	if (VTABLE_IS_IP4) {
>+		routes = nm_platform_ip4_route_get_all (entry->route.ifindex, NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT);

The fact that you need separate loops for ip4 and ip6 makes the whole vtable thing sort of pointless; you can just call nm_platform_ip4_route_delete() in the ip4 branch and nm_platform_ip6_route_delete() in the ip6 branch...

Actually, there is virtually no shared code between the ip4 and ip6 cases here, and you should just have two separate sync methods.
Comment 13 Thomas Haller 2014-10-23 18:48:55 UTC
(In reply to comment #12)
> > libnm: add NMSettingIP4Config:route-metric and NMSettingIP6Config:route-metric setting
> 
> >+ * Returns: the route metric that is used for IPv4 routes. The special
> 
> "that is used for routes that don't explicitly specify a metric; see
> #NMSettingIP4Config:route-metric for more details"


done


> >+ * device type. As the linux kernel, setting this value to 0 means setting
> >+ * it to 1024.
> 
> "As WITH the linux kernel" (in the function doc and property doc)

done

> any libnm-util additions need "Since: 1.0" / NM_AVAILABLE_IN_1_0, but are you
> really adding the property there because you want it in libnm-util, or just for
> the side effect of getting it into the docs?

I think, if it is easy enough to support it in libnm-util, we can add it there.



> > cli: support new properties NM_SETTING_IP4_CONFIG_ROUTE_METRIC and NM_SETTING_IP6_CONFIG_ROUTE_METRIC
> 
> though annoying because of the need for renumbering, I would put ROUTE_METRIC
> immediately after ROUTES in the output

done


> > core: cleanup type of route metric to ensure guint32
> 
> You didn't change nm_device_get_priority(), though I'm not sure if that's a
> mistake or not.

That is intentional. Later I add nm_device_get_ip4_route_metric() that wraps nm_device_get_priority() -- which still has another use too. 


> >+G_STATIC_ASSERT (sizeof (guint) >= sizeof (guint32));
> 
> You don't need to static assert that; if it wasn't true then the compiler would
> warn about passing G_MAXUINT32 to g_param_spec_uint() anyway.
> 
> (And in fact, GLib won't even build if sizeof(guint) < 4 anyway.)

done.


> > core: support overwriting the default route priority per device via connection setting
> 
> "overriding" (and "override" later in the commit message)

done


> > platform: extend nm_platform_ipX_route_get_all() to return default-routes only
> 
> >+	NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT      = (0x01 << 2),
> 
> FLAG_ONLY_DEFAULT would be clearer

then it wouldn't be really a "flag". Indeed instead there could be enum values:

   NM_PLATFORM_GET_ROUTE_SELECT_ALL
   NM_PLATFORM_GET_ROUTE_SELECT_ONLY_DEFAULT
   NM_PLATFORM_GET_ROUTE_SELECT_ONLY_NON_DEFAULT

but, the flag approach seemed more extensible. (although not used currently).

Also, the flags could be instead white-list:

   NM_PLATFORM_GET_ROUTE_FLAG_INCLUDE_ALL
   NM_PLATFORM_GET_ROUTE_FLAG_INCLUDE_DEFAULT
   NM_PLATFORM_GET_ROUTE_FLAG_INCLUDE_NON_DEFAULT

but, didn't seem better either...

Ok how it is?


> > libnm: allow zero length prefix in NMIP4Route and NMIP6Route
> 
> You don't need a warning comment in libnm-core, and you shouldn't change
> libnm-util, since it would be non-backward-compatible, and if people are going
> to depend on 1.0 anyway, they might as well just use libnm.

the change to libnm-util itself is backward compatible. It's only not backward compatible if the server spits out default routes.
Although, by fixing libnm-util now, we might dare to change the behaviour of the server in the far future when nobody is using an old libnm-util version.

The warning in libnm-core is there so that we don't forget about this. Because effectively, the server must never change this.


> > core: add manager for default route and support multiple default routes
> 
> >    can get the default route (never-default=no). The default route is
> >    choosen by assigning different metrics to the devices.
> 
> "chosen"
> 
> In both the commit message and in comments in the code, it seems in many places
> like "the default route" should be "a default route", since there is no longer
> a single "THE" default route.

reworded commit message.


> >-	if (   connection
> >-	    && !nm_settings_connection_get_nm_generated_assumed (NM_SETTINGS_CONNECTION (connection))) {
> 
> It is weird to explicitly add a pretend default route for generated-assumed
> connections which isn't ever actually going to be used.

I guess, there are different solutions.

But my intent was that the NMIPxConfig objects of the NMDevice should contain the default route (if it has one). And then NMDefaultRouteManager should look at the device ip config to get the default route.

For assumed connections this would be equally true. Note that priv->ext_ip4_config doesn't contain the default route.

"which isn't ever actually going to be used" ... it is used by nm-default-route-manger, it is just not reapplied to the device via platform_route_add.


> >+	nm_ip4_config_merge_setting (composite,
> >+	                             nm_connection_get_setting_ip4_config (connection),
> ...
> >+		if (   (s_ip4 = nm_connection_get_setting_ip4_config (connection))
> 
> should just set s_ip4 earlier

done.


> Also, you can't add a default route if the config's gateway is unset. (NMPolicy
> checks:
> 
>     /* Make sure the device has a gateway */
>     if (!nm_ip4_config_get_gateway (ip4_config) && (devtype !=
> NM_DEVICE_TYPE_MODEM))
>         continue;
> 
> although I think what it really means is not 'doesn't have a gateway and isn't
> a modem", but "doesn't have a gateway and doesn't have a peer address. I
> think?)

I added checks for this. For now, I just compare for devtype!=MODEM... @dcbw, check peer-address instead?


> >+	/* To configure a route via gateway, the gateway must be onlink.
> >+	 * Add this route too. */
> 
> if it's on link then we already have a route to it...

fixed


 
> >+	/* prune all other default routes from this device. */
> >+	if (VTABLE_IS_IP4) {
> >+		routes = nm_platform_ip4_route_get_all (entry->route.ifindex, NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT);
> 
> The fact that you need separate loops for ip4 and ip6 makes the whole vtable
> thing sort of pointless; you can just call nm_platform_ip4_route_delete() in
> the ip4 branch and nm_platform_ip6_route_delete() in the ip6 branch...
> 
> Actually, there is virtually no shared code between the ip4 and ip6 cases here,
> and you should just have two separate sync methods.

I don't really agree, but I added a fixup! commit that does that. Do you feel this is better?



Repushed
Comment 14 Dan Winship 2014-10-31 14:26:43 UTC
(In reply to comment #13)
> > >+	NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT      = (0x01 << 2),
> > 
> > FLAG_ONLY_DEFAULT would be clearer
> 
> then it wouldn't be really a "flag".

Sure it would. It would have exactly the same meaning as EXCLUDE_NON_DEFAULT, just a different name.

But after thinking about it, it's not the "ONLY" vs "EXCLUDE" that annoys me, it's "EXCLUDE" + "NON" creating a confusing double negative. How about "ONLY_DEFAULT" and "EXCLUDE_DEFAULT" instead?

> > > libnm: allow zero length prefix in NMIP4Route and NMIP6Route

> The warning in libnm-core is there so that we don't forget about this. Because
> effectively, the server must never change this.

Right, so I think the fix here is going to be that with the addressdata branch, NMIPConfig:routes will still never contain default routes, but NMIPConfig:route-data will.

> > It is weird to explicitly add a pretend default route for generated-assumed
> > connections which isn't ever actually going to be used.

> But my intent was that the NMIPxConfig objects of the NMDevice should contain
> the default route (if it has one). And then NMDefaultRouteManager should look
> at the device ip config to get the default route.
> 
> For assumed connections this would be equally true. Note that
> priv->ext_ip4_config doesn't contain the default route.

If NM does not support having a generated-assumed connection have the default route at all, then there's no reason to add a fake one to its config. And if we *do* support having a generated-assumed connection have the default route, then we should capture the device's actual default route, with the metric that it was externally configured to use, rather than making up our own.

> I added checks for this. For now, I just compare for devtype!=MODEM... @dcbw,
> check peer-address instead?

This doesn't need to get addressed in this bug, but it would be good to file a new bug for it if not, or otherwise we're just going to forget about it...

> > Actually, there is virtually no shared code between the ip4 and ip6 cases here,
> > and you should just have two separate sync methods.
> 
> I don't really agree, but I added a fixup! commit that does that. Do you feel
> this is better?

no... both ways are equally ugly :-}. I guess we can't make this simpler as long as NMPlatformIP4Route and NMPlatformIP6Route have incompatible representations.
Comment 15 Dan Williams 2014-10-31 15:24:46 UTC
> platform: extend nm_platform_ipX_route_get_all() to return default-routes only

NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT is basically a double-negative and harder to parse, I'd suggest NM_PLATFORM_GET_ROUTE_FLAG_ONLY_DEFAULT instead.

> libnm: allow zero length prefix in NMIP4Route and NMIP6Route

THis now allows users to specify static default routes in the configuration (if the UI allowed it).  Is that intentional?  NM hasn't historically allowed this since it controls the default route.  But now with the route priority/metric stuff and  multiple default routes, perhaps we can allow this now without the conflicts it would have previously created.


> 

+		if (   (s_ip4 = nm_connection_get_setting_ip4_config (connection))
+		    && !nm_setting_ip4_config_get_never_default (s_ip4)) {

(or later)

+		s_ip4 = nm_connection_get_setting_ip4_config (connection);
+		if (   s_ip4
+		    && !nm_setting_ip4_config_get_never_default (s_ip4)
+		    && (gw_addr = nm_ip4_config_get_gateway (composite))

Doesn't connection normalization now ensure there is always an NMSettingIP4Config and NMSettingIP6Config in the connection?


For the nm-policy.c changes, the default route manager effectively is the thing that determines the "best device", so couldn't we remove most of the NMPolicy code for get_best_ipX_device() somehow?

Also, it doesn't look like the default route manager handles VPN connections at all right now.  Does the DRM really need to track devices, or could the devices and the VPN connections add/remove their IPxConfig to the DRM along with some other metadata (default priority, assumed-or-not, etc)?  I think it would be great if we could put both the VPN and device default route handling into the DRM.
Comment 16 Thomas Haller 2014-10-31 21:01:23 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > > >+	NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT      = (0x01 << 2),
> > > 
> > > FLAG_ONLY_DEFAULT would be clearer
> > 
> > then it wouldn't be really a "flag".
> 
> Sure it would. It would have exactly the same meaning as EXCLUDE_NON_DEFAULT,
> just a different name.
> 
> But after thinking about it, it's not the "ONLY" vs "EXCLUDE" that annoys me,
> it's "EXCLUDE" + "NON" creating a confusing double negative. How about
> "ONLY_DEFAULT" and "EXCLUDE_DEFAULT" instead?


I meant "flag" in the sense of:

If you don't pass any (blacklist) flags, you get *all* routes. Then specifying additional flags, you can filter/exclude parts. The flags can be combined with "|".

Alternatively, I could imagine whitelist flags. If you don't specify any, you get no result. By specifying flags, you add routes to the result.

This would make sense when having also flags like _EXCLUDE_IPv4 and _EXCLUDE_IPv6.

Having a "ONLY_DEFAULT" is not a flag in the above sense. But arguably, we will never have more then 3 modes: all, only_default, and non_default. I 

TL;DR: I rename the flags to "mode", there are now three of them.
Comment 17 Thomas Haller 2014-10-31 21:17:32 UTC
(In reply to comment #14)
> (In reply to comment #13)

> > > > libnm: allow zero length prefix in NMIP4Route and NMIP6Route
> 
> > The warning in libnm-core is there so that we don't forget about this. Because
> > effectively, the server must never change this.
> 
> Right, so I think the fix here is going to be that with the addressdata branch,
> NMIPConfig:routes will still never contain default routes, but
> NMIPConfig:route-data will.

agree.



> > > It is weird to explicitly add a pretend default route for generated-assumed
> > > connections which isn't ever actually going to be used.
> 
> > But my intent was that the NMIPxConfig objects of the NMDevice should contain
> > the default route (if it has one). And then NMDefaultRouteManager should look
> > at the device ip config to get the default route.
> > 
> > For assumed connections this would be equally true. Note that
> > priv->ext_ip4_config doesn't contain the default route.
> 
> If NM does not support having a generated-assumed connection have the default
> route at all, then there's no reason to add a fake one to its config. And if we
> *do* support having a generated-assumed connection have the default route, then
> we should capture the device's actual default route, with the metric that it
> was externally configured to use, rather than making up our own.

NMDefaultRouteManager must consider generated-assumed devices as well. Consider when NM starts and has several devices assumed and activated. DRM must still have a notion of who has the default route.

Well, the connection of the generated-assumed device is generated based on the device's actual default route, right? So that should be no difference (if it would be, it is a bug somewhere else).



> > I added checks for this. For now, I just compare for devtype!=MODEM... @dcbw,
> > check peer-address instead?
> 
> This doesn't need to get addressed in this bug, but it would be good to file a
> new bug for it if not, or otherwise we're just going to forget about it...

The actual fix for this seems few lines of code. I would rather do it right away, if I only knew what we actually want...(??)


> > > Actually, there is virtually no shared code between the ip4 and ip6 cases here,
> > > and you should just have two separate sync methods.
> > 
> > I don't really agree, but I added a fixup! commit that does that. Do you feel
> > this is better?
> 
> no... both ways are equally ugly :-}. I guess we can't make this simpler as
> long as NMPlatformIP4Route and NMPlatformIP6Route have incompatible
> representations.

dropped commit 16f0b777d5e5b28d414ca548d4a97efca5f9fa7b
Comment 18 Thomas Haller 2014-10-31 21:35:04 UTC
(In reply to comment #15)
> > platform: extend nm_platform_ipX_route_get_all() to return default-routes only
> 
> NM_PLATFORM_GET_ROUTE_FLAG_EXCLUDE_NON_DEFAULT is basically a double-negative
> and harder to parse, I'd suggest NM_PLATFORM_GET_ROUTE_FLAG_ONLY_DEFAULT
> instead.

fixed


> > libnm: allow zero length prefix in NMIP4Route and NMIP6Route
> 
> THis now allows users to specify static default routes in the configuration (if
> the UI allowed it).  Is that intentional?  NM hasn't historically allowed this
> since it controls the default route.  But now with the route priority/metric
> stuff and  multiple default routes, perhaps we can allow this now without the
> conflicts it would have previously created.

That was not intentional, but maybe we want to allow it. But effectively we cannot, because it would break older clients (again).

I think we should extend nm_setting_ip_config:verfiy() to check that the setting contains no default route. I would do it in this branch, but that only conflicts later with address-route changes (bug 682946).

Note that all users of libnm already ensure that they don't add a default route to the setting. Otherwise they had hit the assertion I just removed.



> +        if (   (s_ip4 = nm_connection_get_setting_ip4_config (connection))
> +            && !nm_setting_ip4_config_get_never_default (s_ip4)) {
> 
> (or later)
> 
> +        s_ip4 = nm_connection_get_setting_ip4_config (connection);
> +        if (   s_ip4
> +            && !nm_setting_ip4_config_get_never_default (s_ip4)
> +            && (gw_addr = nm_ip4_config_get_gateway (composite))
> 
> Doesn't connection normalization now ensure there is always an
> NMSettingIP4Config and NMSettingIP6Config in the connection?

Probably. I'd like to be a bit defensive here.


> For the nm-policy.c changes, the default route manager effectively is the thing
> that determines the "best device", so couldn't we remove most of the NMPolicy
> code for get_best_ipX_device() somehow?
> 
> Also, it doesn't look like the default route manager handles VPN connections at
> all right now.  Does the DRM really need to track devices, or could the devices
> and the VPN connections add/remove their IPxConfig to the DRM along with some
> other metadata (default priority, assumed-or-not, etc)?  I think it would be
> great if we could put both the VPN and device default route handling into the
> DRM.

Yes, I strongly agree with that. I just didn't do it yet, because I wanted to see what you say and also how this relates to your VPN changes.
Comment 19 Dan Williams 2014-11-04 14:45:01 UTC
The VPN changes won't land for bit because they'll take a bit more work.  Creating all devices (even unrealized ones) was more work than I thought and I'd like to land the existing branch before we start destabilizing NMActiveConnection and NMDevice to move VPNs to it...  I think the route metric stuff is more important though, so we should work to get that in sooner.
Comment 20 Thomas Haller 2014-11-05 10:24:31 UTC
branch rebased and repushed.

Now VPN also managed by NMDefaultRouteManager.


There are several new commits:

3afafd0 policy: return best config based on the internal sorting of NMDefaultRouteManager
2e9aa37 policy: improve get_best_device() to strictly adhering the sort order of the entries
9f11d53 policy: set default routes for VPN via NMDefaultRouteManager
3bc6ef9 policy: move get_best_config() function to nm-default-route-manager
c63b1c6 policy: track default route for VPN in NMDefaultRouteManager
f49acbe vpn: add nm_vpn_connection_get_connection_id() function
8ba6ba3 policy: better sync get_best_device() with NMDefaultRouteManager
54979b1 policy: move get_best_device() function to nm-default-route-manager
cf802f8 fixup! core: add manager for default route and support multiple default routes
cc7a4b7 policy: minor refactoring in get_best_ipx_device()
ddf49b7 policy: fix updating the default route for VPN
eefcaf9 fixup! core: commiting ip config should not skip over default-route
a12bf04 device: add function nm_device_uses_assumed_connection()
40499e7 fixup! core: overwrite the default route priority via connection setting
804c238 fixup! core: add explicit functions for the route priority/metric
735bb35 fixup! core: modify the values/route metric returned by nm_device_get_priority()
f2018fc core: forward declare NMVpnConnection in nm-types.h
564d8de platform: don't include gsystem-local-alloc.h in nm-platform.h
1791b06 all: add macro NM_IN_SET

(I did not modify existing commits, except rewording commit messages).
Comment 21 Thomas Haller 2014-11-05 14:35:55 UTC
did some testing, fixed several issues. Now it works well AFAIS.

Repushed.
Comment 22 Dan Williams 2014-11-05 22:36:33 UTC
> fixup! core: modify the values/route metric returned by nm_device_get_priority()

I think NM_VPN_ROUTE_METRIC_DEFAULT should go into nm-vpn-connection.c since it's only used there and is VPN specific.

> core: commiting ip config should not skip over default-route

I think the commit summary should be something like "core: remove dead code for skipping IP default route", because later we just add back some of that functionality elsewhere, and because this code really was dead code.

> policy: minor refactoring in get_best_ipx_device()

Might as well consolidate the get-act-request/get-connection into a single nm_device_get_connection() ?

> core: add manager for default route and support multiple default routes

The get_effective_metric() functions aren't used at all.

The device_get_route_metric VTable function isn't used at all.

I'd feel a lot better about the DRM if it didn't have to care about NMDevice or NMVpnConnection at all, but if it could just deal with NMIPxConfig objects instead?  I just think making the DRM aware of NMDevice/NMVpnConnection makes the DRM have a more complicated API and object model and is less testable.  If we can make it as isolated as possible, I think that would be better.

It looks like the things that NMDevice is used for is:

1) _entry_at_idx_update() checking for assumed connections; this could be a tag/property on the NMIPxConfig object instead

2) _ipX_device_removed_cb() signal handlers; like the VPNs, shouldn't the NMDevice remove the IPXConfig when it cleans itself up?  Same for _ipX_vpn_removed_cb().

3) _ipx_update_default_route() - getting the ifindex; the ifindex could be attached to the IPXConfig

4) _ipx_update_default_route() - getting the default route from the device's IPXConfig - would be unecessary if the DRM tracked IPXConfigs instead of NMDevice/NMVpnConnection; same for the VPN thing just below

5)  _ipx_update_default_route() - connecting signal handlers for 'removed' signals - should just depend on the NMDevice or NMVpnConnection removing the config?  Alternatively if we really need to know about removal, the Manager should tell the DRM directly.

6) all the best device/config stuff moved over from NMPolicy ; this one would take more work to fix up to be device/vpn agnostic

Doing this would allow us to more easily add testcases for the DRM too (to ensure route ordering and behavior) since we don't have to create "fake" NMDevice and NMVpnConnection classes.

For the 'never-default' stuff, shouldn't that always come from the NMIPxConfig objects?  Do we really need the setting at all?

For the route priority stuff, the VPN or the NMDevice could set the fallback route priority on the NMIPxCOnfig object before handing it to the DRM.
Comment 23 Thomas Haller 2014-11-05 23:10:45 UTC
(In reply to comment #22)
> > fixup! core: modify the values/route metric returned by nm_device_get_priority()
> 
> I think NM_VPN_ROUTE_METRIC_DEFAULT should go into nm-vpn-connection.c since
> it's only used there and is VPN specific.

done

> > core: commiting ip config should not skip over default-route
> 
> I think the commit summary should be something like "core: remove dead code for
> skipping IP default route", because later we just add back some of that
> functionality elsewhere, and because this code really was dead code.

reworded. Ok?

> > policy: minor refactoring in get_best_ipx_device()
> 
> Might as well consolidate the get-act-request/get-connection into a single
> nm_device_get_connection() ?

It gets later removed anyway. This change is only here, so that the diff of a later change is smaller and better understood. Also, I wanted this patch almost trivial.


> > core: add manager for default route and support multiple default routes
> 
> The get_effective_metric() functions aren't used at all.
> 
> The device_get_route_metric VTable function isn't used at all.

They were used at earlier commits. I removed them at the points when they are no longer needed.




> I'd feel a lot better about the DRM if it didn't have to care about NMDevice > or NMVpnConnection at all, [...]

I'll address that in next comment. Re-pushed with the changes above
Comment 24 Thomas Haller 2014-11-05 23:34:12 UTC
(In reply to comment #22)
> > fixup! core: modify the values/route metric returned by 
> I'd feel a lot better about the DRM if it didn't have to care about NMDevice > or NMVpnConnection at all, but if it could just deal with NMIPxConfig objects
> instead?  I just think making the DRM aware of NMDevice/NMVpnConnection makes
> the DRM have a more complicated API and object model and is less testable.  If
> we can make it as isolated as possible, I think that would be better.
> 
> It looks like the things that NMDevice is used for is:

the most important thing that it is used is that the device/vpn-connection is the ID for an entry. When calling update(), we search whether we already have a source.


> 1) _entry_at_idx_update() checking for assumed connections; this could be a
> tag/property on the NMIPxConfig object instead

that would mean to add unrelated properties to NMIPxConfig. And tags are really ugly hacks.


> 2) _ipX_device_removed_cb() signal handlers; like the VPNs, shouldn't the
> NMDevice remove the IPXConfig when it cleans itself up?  Same for
> _ipX_vpn_removed_cb().

They could. But this is only a last failsafe not to keep the device hanging. It's indeed not necessary (but not wrong either).

Indeed registering to NM_VPN_CONNECTION_INTERNAL_STATE_CHANGED was completely wrong. I removed it.


> 3) _ipx_update_default_route() - getting the ifindex; the ifindex could be
> attached to the IPXConfig

see 1)

> 4) _ipx_update_default_route() - getting the default route from the device's
> IPXConfig - would be unecessary if the DRM tracked IPXConfigs instead of
> NMDevice/NMVpnConnection; same for the VPN thing just below

NMDevice indeed tracks now the default route in IPXConfig. NMVpnConnection does not. That is a major difference between the both. I didn't do that, because it seemed more complicated to me.


> 5)  _ipx_update_default_route() - connecting signal handlers for 'removed'
> signals - should just depend on the NMDevice or NMVpnConnection removing the
> config?  Alternatively if we really need to know about removal, the Manager
> should tell the DRM directly.

same as 2)


> 6) all the best device/config stuff moved over from NMPolicy ; this one would
> take more work to fix up to be device/vpn agnostic

what do you mean with this?


> Doing this would allow us to more easily add testcases for the DRM too (to
> ensure route ordering and behavior) since we don't have to create "fake"
> NMDevice and NMVpnConnection classes.

If you really want to go that route, I would rather have NMDevice and NMVpnConnection implement a slim interface. Only downside is, that this would be so cumbersome in glib. 


> For the 'never-default' stuff, shouldn't that always come from the NMIPxConfig
> objects?  Do we really need the setting at all?

you mean:
  never_default =    nm_ip4_config_get_never_default (vpn_config)
      || nm_setting_ip4_config_get_never_default 
              (nm_connection_get_setting_ip4_config (connection));
? That comes nm-policy.c:get_best_ip4_config()


> For the route priority stuff, the VPN or the NMDevice could set the fallback
> route priority on the NMIPxCOnfig object before handing it to the DRM.

you mean the effective_metric? That cannot be set by NMIPxConfig, because only DRM knows all the routes and can assign the effective metric to avoid clashes.
Comment 25 Thomas Haller 2014-11-05 23:56:45 UTC
Btw. one interesting error scenario I found is the following:


have 3 connections em1, em2, and openvpn1. All have the same route-priority 100 (and can get the default route).


(USER) activate em1
  - default route em1, metric 100
(USER) activate em2
  - default route em1, metric 100
  - default route em2, metric 101
(USER) activate openvpn1
  - default route em1, metric 100
  - default route em2, metric 101
  - default route openvpn1, metric 103

the openvpn connection uses a tun0 device which now has IP config. At this point NM creates a generated-assumed connection tun0.

(NM) generated-assume tun0
  - default route em1, metric 100
  - default route em2, metric 101
  - default route openvpn1, metric 103
  - default route tun0, metric 103 (!synced)

(USER) deactivate em2
  - default route em1, metric 100
  - default route openvpn1, metric 102
  - default route tun0, metric 103 (!synced)

(NM) now, generated-assumed tun0 gets updated with the new metric:
  - default route em1, metric 100
  - default route openvpn1, metric 102
  - default route tun0, metric 102 (!synced)
  - default route tun0, metric 103 (<<<<<)

We leaked the 103 route because in the last step NM did not remove it from tun0 (because it is an assumed connection, !synced).



I think there is no easy solution and the issue is not severe. We should fix NM not to create tun0.
Comment 26 Dan Williams 2014-11-06 15:38:25 UTC
(In reply to comment #25)
> Btw. one interesting error scenario I found is the following:
> 
> 
> have 3 connections em1, em2, and openvpn1. All have the same route-priority 100
> (and can get the default route).
> 
> 
> (USER) activate em1
>   - default route em1, metric 100
> (USER) activate em2
>   - default route em1, metric 100
>   - default route em2, metric 101
> (USER) activate openvpn1
>   - default route em1, metric 100
>   - default route em2, metric 101
>   - default route openvpn1, metric 103
> 
> the openvpn connection uses a tun0 device which now has IP config. At this
> point NM creates a generated-assumed connection tun0.
> 
> (NM) generated-assume tun0
>   - default route em1, metric 100
>   - default route em2, metric 101
>   - default route openvpn1, metric 103
>   - default route tun0, metric 103 (!synced)
> 
> (USER) deactivate em2
>   - default route em1, metric 100
>   - default route openvpn1, metric 102
>   - default route tun0, metric 103 (!synced)
> 
> (NM) now, generated-assumed tun0 gets updated with the new metric:
>   - default route em1, metric 100
>   - default route openvpn1, metric 102
>   - default route tun0, metric 102 (!synced)
>   - default route tun0, metric 103 (<<<<<)
> 
> We leaked the 103 route because in the last step NM did not remove it from tun0
> (because it is an assumed connection, !synced).
> 
> 
> 
> I think there is no easy solution and the issue is not severe. We should fix NM
> not to create tun0.

NM should know about tun0 and should certainly track the IP config, but since it's an assumed connection, NM should not be touching the route metric at all.
Comment 27 Dan Williams 2014-11-06 15:57:08 UTC
(In reply to comment #24)
> (In reply to comment #22)
> > > fixup! core: modify the values/route metric returned by 
> > I'd feel a lot better about the DRM if it didn't have to care about NMDevice > or NMVpnConnection at all, but if it could just deal with NMIPxConfig objects
> > instead?  I just think making the DRM aware of NMDevice/NMVpnConnection makes
> > the DRM have a more complicated API and object model and is less testable.  If
> > we can make it as isolated as possible, I think that would be better.
> > 
> > It looks like the things that NMDevice is used for is:
> 
> the most important thing that it is used is that the device/vpn-connection is
> the ID for an entry. When calling update(), we search whether we already have a
> source.

Yeah, though I think that could be done in a different way, or the ID could just be the pointer of the NMIP4Config/NMIP6Config.  After all, if the NMIP4/IP6Config change, then the information they represent should change too.

> > 1) _entry_at_idx_update() checking for assumed connections; this could be a
> > tag/property on the NMIPxConfig object instead
> 
> that would mean to add unrelated properties to NMIPxConfig. And tags are really
> ugly hacks.

Yeah, I meant non-exported properties.  But the way I look at it, when a Device applies an IPConfig object, the information in that object is specifically applied to an ifindex and will never be applied to any other ifindex after that.  The NMDevice's IPConfigs aren't generic and cannot be passed around to another device, they are always tied to the ip_ifindex/ip_iface of that specific device.  So I think properties like 'ifindex' and 'default route metric' are related to the IPConfig.

> > 2) _ipX_device_removed_cb() signal handlers; like the VPNs, shouldn't the
> > NMDevice remove the IPXConfig when it cleans itself up?  Same for
> > _ipX_vpn_removed_cb().
> 
> They could. But this is only a last failsafe not to keep the device hanging.
> It's indeed not necessary (but not wrong either).
> 
> Indeed registering to NM_VPN_CONNECTION_INTERNAL_STATE_CHANGED was completely
> wrong. I removed it.

Yeah, I was going to point that out but I forgot.

> > 3) _ipx_update_default_route() - getting the ifindex; the ifindex could be
> > attached to the IPXConfig
> 
> see 1)
> 
> > 4) _ipx_update_default_route() - getting the default route from the device's
> > IPXConfig - would be unecessary if the DRM tracked IPXConfigs instead of
> > NMDevice/NMVpnConnection; same for the VPN thing just below
> 
> NMDevice indeed tracks now the default route in IPXConfig. NMVpnConnection does
> not. That is a major difference between the both. I didn't do that, because it
> seemed more complicated to me.

Yeah, this is an interesting integration point.  The default route for a device really does belong in that device's IPConfig, so that's fine.  But I'm not sure the NMDevice should do anything with the default route at all, since the device itself has no knowledge of the overall situation, certainly not enough to say what the priorities should be.  So I think maybe they should just ingore the default route internally and let the DRM manage it.

> > 5)  _ipx_update_default_route() - connecting signal handlers for 'removed'
> > signals - should just depend on the NMDevice or NMVpnConnection removing the
> > config?  Alternatively if we really need to know about removal, the Manager
> > should tell the DRM directly.
> 
> same as 2)
> 
> 
> > 6) all the best device/config stuff moved over from NMPolicy ; this one would
> > take more work to fix up to be device/vpn agnostic
> 
> what do you mean with this?

I just meant that I thought there were solutions to remove all the NMDevice/NMVpnConneciton knowledge in the DRM, *until* I got to this code :)  I think we can make most of this agnostic too, with the exception of the get_best_device() stuff.  Not sure how that should work yet.

> > Doing this would allow us to more easily add testcases for the DRM too (to
> > ensure route ordering and behavior) since we don't have to create "fake"
> > NMDevice and NMVpnConnection classes.
> 
> If you really want to go that route, I would rather have NMDevice and
> NMVpnConnection implement a slim interface. Only downside is, that this would
> be so cumbersome in glib. 

I'm not convinced that's really better...  I still think we can make the DRM somewhat simpler by only tracking the IPConfig objects, but I will try to put code before my words to see if this can actually be done.  Again, I *think* it can be done, but I'm not sure yet.

> > For the 'never-default' stuff, shouldn't that always come from the NMIPxConfig
> > objects?  Do we really need the setting at all?
> 
> you mean:
>   never_default =    nm_ip4_config_get_never_default (vpn_config)
>       || nm_setting_ip4_config_get_never_default 
>               (nm_connection_get_setting_ip4_config (connection));
> ? That comes nm-policy.c:get_best_ip4_config()

I think that's redundant now because nm-vpn-connection.c::nm_vpn_connection_ip4_config_get() calls nm_ip4_config_merge_setting() which copies the setting's never-default into the VPN's IPConfig.  So if we have the IPConfig we should have the right never-default including the user's preferences.  Same for IPv6.

> > For the route priority stuff, the VPN or the NMDevice could set the fallback
> > route priority on the NMIPxCOnfig object before handing it to the DRM.
> 
> you mean the effective_metric? That cannot be set by NMIPxConfig, because only
> DRM knows all the routes and can assign the effective metric to avoid clashes.

I meant the calls to nm_device_get_ip4_route_metric() in _ipx_get_best_activating_device() and nm_vpn_connection_get_ip4_route_metric() in _ipx_update_default_route().

I don't think the "default metric" stuff is really independent of the IPConfig at all (like ifindex) so I think the NMDevice and NMVpnConnection could set that on their IPConfig before handing it to the DRM.

---------------

That all said, it's just implementation details.  I'm OK staying with the current approach for 1.0, and then potentially reworking it later.  Since I'm advocating for the changes above, I'm happy to prototype them and see if they are simpler/work better and then we can discuss.  Does that sound OK?
Comment 28 Dan Williams 2014-11-07 03:01:46 UTC
I get this assertion when Ctl+C-ing NM.  I had both p4p1 and wlp12s0 (wifi) active and both had default routes.

^C
NetworkManager[26183]: <info>  caught signal 2, shutting down normally.
NetworkManager[26183]: <info>  (wlp12s0): device state change: activated -> deactivating (reason 'removed') [100 110 36]
NetworkManager[26183]: _ipx_get_best_activating_device: assertion '!best_activated_device || g_slist_find ((GSList *) devices, best_activated_device)' failed
NetworkManager[26183]: <info>  (wlp12s0): device state change: deactivating -> unmanaged (reason 'removed') [110 10 36]
NetworkManager[26183]: <info>  (wlp12s0): deactivating device (reason 'removed') [36]
NetworkManager[26183]: <info>  (wlp12s0): canceled DHCP transaction, DHCP client pid 26696
NetworkManager[26183]: <info>  (wlp12s0): DHCPv4 state changed bound -> done
NetworkManager[26183]: _ipx_get_best_activating_device: assertion '!best_activated_device || g_slist_find ((GSList *) devices, best_activated_device)' failed
NetworkManager[26183]: <info>  exiting (success)
Comment 29 Dan Williams 2014-11-07 04:01:00 UTC
In commit 62a44e0c ("fixup! core: add manager for default route and support multiple default routes") there is this line:

	memset (route, 0, sizeof (*route));
	route->source = NM_IP_CONFIG_SOURCE_USER;
	route->gateway = *gateway;
	route->metric = nm_device_get_ip6_route_metric (self);

This crashes for PPP-based WWAN devices because 'gateway' is NULL.  I think this should be:

   route->gateway = gateway ? *gateway : in6addr_any;

But actually, I don't think we'll ever have WWAN devices with a NULL gateway because the IPv6 over WWAN still uses router advertisements to get a prefix, thus you'll always have a gateway if the device has real IPv6 connectivity.

SO I think for IPv6 you can remove the NM_DEVICE_IS_MODEM() bit there and only check for 'gateway != NULL'.
Comment 30 Thomas Haller 2014-11-07 14:45:21 UTC
(In reply to comment #28)
> I get this assertion when Ctl+C-ing NM.  I had both p4p1 and wlp12s0 (wifi)
> active and both had default routes.
> 
> ^C
> NetworkManager[26183]: <info>  caught signal 2, shutting down normally.
> NetworkManager[26183]: <info>  (wlp12s0): device state change: activated ->
> deactivating (reason 'removed') [100 110 36]
> NetworkManager[26183]: _ipx_get_best_activating_device: assertion
> '!best_activated_device || g_slist_find ((GSList *) devices,
> best_activated_device)' failed
> NetworkManager[26183]: <info>  (wlp12s0): device state change: deactivating ->
> unmanaged (reason 'removed') [110 10 36]
> NetworkManager[26183]: <info>  (wlp12s0): deactivating device (reason
> 'removed') [36]
> NetworkManager[26183]: <info>  (wlp12s0): canceled DHCP transaction, DHCP
> client pid 26696
> NetworkManager[26183]: <info>  (wlp12s0): DHCPv4 state changed bound -> done
> NetworkManager[26183]: _ipx_get_best_activating_device: assertion
> '!best_activated_device || g_slist_find ((GSList *) devices,
> best_activated_device)' failed
> NetworkManager[26183]: <info>  exiting (success)


This should be fixed by:

 static void
 _cleanup_generic_post (NMDevice *self, gboolean deconfigure)
 {
     NMDevicePrivate *priv = NM_DEVICE_GET_PRIVATE (self);
     NMDeviceStateReason ignored = NM_DEVICE_STATE_REASON_NONE;
 
+    priv->default_route.v4_has = FALSE;
+    priv->default_route.v6_has = FALSE;
+
+    nm_default_route_manager_ip4_remove_default_route (nm_default_route_manager_get (), self);
+    nm_default_route_manager_ip6_remove_default_route (nm_default_route_manager_get (), self);



(for the moment, I would leave the assertion, because it might be correct and point out to a real problem).


(In reply to comment #29)
> In commit 62a44e0c ("fixup! core: add manager for default route and support
> multiple default routes") there is this line:
> 
>     memset (route, 0, sizeof (*route));
>     route->source = NM_IP_CONFIG_SOURCE_USER;
>     route->gateway = *gateway;
>     route->metric = nm_device_get_ip6_route_metric (self);
> 
> This crashes for PPP-based WWAN devices because 'gateway' is NULL.  I think
> this should be:
> 
>    route->gateway = gateway ? *gateway : in6addr_any;
> 
> But actually, I don't think we'll ever have WWAN devices with a NULL gateway
> because the IPv6 over WWAN still uses router advertisements to get a prefix,
> thus you'll always have a gateway if the device has real IPv6 connectivity.
> 
> SO I think for IPv6 you can remove the NM_DEVICE_IS_MODEM() bit there and only
> check for 'gateway != NULL'.

done both
Comment 32 Ferry Huberts 2014-11-07 15:16:42 UTC
It appears you picked an int64 for the route metric.
That is awesome because in olsrd we do the same, or is it because of that? :-)

When can I expect this in Fedora?
Comment 33 Dan Winship 2014-11-07 15:24:46 UTC
(In reply to comment #32)
> It appears you picked an int64 for the route metric.
> That is awesome because in olsrd we do the same, or is it because of that? :-)

we just needed to be able to express both UINT32_MAX and -1, so int64 was really the only choice

> When can I expect this in Fedora?

F22 probably
Comment 34 Ferry Huberts 2014-11-07 15:27:54 UTC
arghhh why so late?
at least make it an update for F21... please...
Comment 35 Dan Winship 2014-11-07 15:30:20 UTC
This is one of the last features going into what will be NetworkManager 1.0. Way too much code has changed for us to be able to put 1.0 out as an update for F21, and this patch depends on way too many changes elsewhere in the code for us to be able to backport it to 0.9.10.
Comment 36 Ferry Huberts 2014-11-07 15:32:45 UTC
In that case I truely hope that F22 will be a normal 1/2 year release instead of waiting another year for this (critical) feature.
Can I at least try it out on rawhide then?
Comment 37 Thomas Haller 2014-11-10 15:07:34 UTC
Reopening this BZ, because I thought about a few special cases that can be improved.

Please review: th/bgo735512_route_metric_v2
Comment 38 Thomas Haller 2014-11-11 11:42:35 UTC
(In reply to comment #37)
> Reopening this BZ, because I thought about a few special cases that can be
> improved.
> 
> Please review: th/bgo735512_route_metric_v2

(Repushed again. Bugfixes and two major enhancements. See commit messages)

it tests great for me
Comment 39 Dan Winship 2014-11-12 18:54:37 UTC
> device: only add default route when having any addresses

>+	priv->ext_ip4_config_had_any_addresses =    priv->ext_ip4_config
>+	                                         && nm_ip4_config_get_num_addresses (priv->ext_ip4_config) > 0;

I don't know about other editors, but emacs won't respect that indentation (eg, if you have to reindent the line after making a change) unless you parenthesize the right side:

>	priv->ext_ip4_config_had_any_addresses = (   priv->ext_ip4_config
>	                                          && nm_ip4_config_get_num_addresses (priv->ext_ip4_config) > 0);



> policy: consider additional assumed routes when synchronizing the default route
>
> Not only consider the best route of assumed devices when synching the route
> metrics. This fixes the following scenario:

s/Not/Don't/

> When activating em2, NMDefaultRouteManager would  have determined
> 21 as the effective metric, thus overwriting the assumed route of em1.

I think you mean "override" (here and again later on). (ie, the new route has higher priority than the old route; it does not erase and replace the old route).

> 	/* For synced entries, we expect that the metric is chosen uniquely. */
>-	g_assert (!entry || !has_unsynced_entry || metric == G_MAXUINT32);
>+	g_assert (!entry || !entry_unsynced || (entry->route.rx.ifindex == entry_unsynced->route.rx.ifindex) || metric == G_MAXUINT32);

The comment needs to be expanded. I don't see why the new clause should necessarily be true.

>+_resync_all_construct_used_metric_index (const VTableIP *vtable, NMDefaultRouteManager *self, GArray *routes)

I would call this something like "get_assumed_interface_metrics", and then you don't even need the comment explaining what it does, since it's obvious from the name...

>+		/* A non synced entry is completely ignored, if we have
>+		 * a synced entry for the same if index.

how would that happen?

>+		       && g_hash_table_lookup_extended (assumed_metrics, GUINT_TO_POINTER (expected_metric), NULL, NULL)) {

g_hash_table_contains (assumed_metric, GUINT_TO_POINTER (expected_metric))

>+			if (   r->metric == expected_metric
>+				&& r->ifindex == entry->route.rx.ifindex) {

indentation


> policy: pick up externally configured default routes for managed interfaces

This seems wrong. If the device is managed, then it's managed, and NM should control its default routes, not let the user do it.
Comment 40 Thomas Haller 2014-11-12 21:23:50 UTC
(In reply to comment #39)


addresses your points, and rebased+repushed

> >+		/* A non synced entry is completely ignored, if we have
> >+		 * a synced entry for the same if index.
> 
> how would that happen?

When activating a openvpn connection you'll have one VPN entry, and NM will assume a connection on tun0. Certainly that is already a bug in VPN code, but it shows a case where we can have more then one entries for the same ifindex.

> > policy: pick up externally configured default routes for managed interfaces
> 
> This seems wrong. If the device is managed, then it's managed, and NM should
> control its default routes, not let the user do it.

I disagree.

This is only relevant for nm-managed, non-assumed, never-default=yes connections.

When the user adds a non-default-route to a managed device, NM will pick them up (and not mess with it).

Now, when the user adds a default-route, the same happens. NM will will happily allow that and accept the user-added default route. (ie. it will consider it for "best-device").

I am actually more concerned about the previous commit, that constantly undoes user actions on nm-managed, non-assumed, never-default=no devices.

But overall, I think both changes are great, and make NM much better to pickup externally configured stuff.
Comment 41 Thomas Haller 2014-11-13 09:52:36 UTC
Todo: I think the following scenario shoes a bug.


Have two connections active, with exactly the same default-route (same metric, same gateway).

default via 192.168.4.1 dev eth1  proto static  metric 20
default via 192.168.4.1 dev eth2  proto static  metric 21 


This is correct. If the gateway is not reachable directly, we also must add a direct route:

192.168.4.1 dev eth2  proto static  scope link  metric 20


(note that we don't do this metric-game for the direct route as we do it for the default route. Hence we can only have one route. In this case, the lastly activated connection wins).


I think this alone is already wrong, because we have a default route via 192.168.4.1 on eth1, but 192.168.4.1 is supposedly reachable via eth2.

Another bug is, if you now down eth2, the direct route to 192.168.4.1 via eth1 will not be re-added.



This is not easily solvable, because the direct-route is not managed by NMDefaultRouteManager... thinking
Comment 42 Ferry Huberts 2014-11-13 09:57:20 UTC
(In reply to comment #41)
> Todo: I think the following scenario shoes a bug.
> 
> 
> Have two connections active, with exactly the same default-route (same metric,
> same gateway).
> 
> default via 192.168.4.1 dev eth1  proto static  metric 20
> default via 192.168.4.1 dev eth2  proto static  metric 21 
> 
> 
> This is correct. If the gateway is not reachable directly, we also must add a
> direct route:
> 
> 192.168.4.1 dev eth2  proto static  scope link  metric 20
> 
> 
> (note that we don't do this metric-game for the direct route as we do it for
> the default route. Hence we can only have one route. In this case, the lastly
> activated connection wins).
> 
> 
> I think this alone is already wrong, because we have a default route via
> 192.168.4.1 on eth1, but 192.168.4.1 is supposedly reachable via eth2.
> 
> Another bug is, if you now down eth2, the direct route to 192.168.4.1 via eth1
> will not be re-added.
> 
> 
> 
> This is not easily solvable, because the direct-route is not managed by
> NMDefaultRouteManager... thinking

I see no problem with always adding the direct route, even though it's already reachable. That also simplifies the interface setup: always add/remove all routes with the interface metric.
The kernel will pick the right route because of the metric.

This also translates to default gateway routes.
Comment 43 Thomas Haller 2014-11-13 13:33:57 UTC
(In reply to comment #42)
> (In reply to comment #41)

> I see no problem with always adding the direct route, even though it's already
> reachable. That also simplifies the interface setup: always add/remove all
> routes with the interface metric.
> The kernel will pick the right route because of the metric.
> 
> This also translates to default gateway routes.

The issue is if you have the *same* metric. Kernel does not allow adding the same destination/prefix,metric tuple to different interfaces. Quite annoying, IMO.

The problem from comment 41 is not only limited to direct routes to the gateway. I opened bug 740064 to handle this issue in general.
Comment 44 Ferry Huberts 2014-11-13 13:50:15 UTC
(In reply to comment #43)
> (In reply to comment #42)
> > (In reply to comment #41)
> 
> > I see no problem with always adding the direct route, even though it's already
> > reachable. That also simplifies the interface setup: always add/remove all
> > routes with the interface metric.
> > The kernel will pick the right route because of the metric.
> > 
> > This also translates to default gateway routes.
> 
> The issue is if you have the *same* metric. Kernel does not allow adding the
> same destination/prefix,metric tuple to different interfaces. Quite annoying,
> IMO.
> 
> The problem from comment 41 is not only limited to direct routes to the
> gateway. I opened bug 740064 to handle this issue in general.

No you do not.
You made a typo: eth2 has metric 21, not 20.
Comment 45 Dan Williams 2014-11-13 22:27:32 UTC
th/bgo735512_route_metric_v2

> policy: refactor NMDefaultRouteManager to use union type NMPlatforIPXRoute

When memsetting the route in _ipx_update_default_route() shouldn't the memset call be:

memset (&rt, 0, sizeof (rt));

just to be safe, otherwise the end of the struct will have random data in it because the IPv6 part is larger than the IPv4 part.


> device: only add default route when having any address

I don't like the new priv-> variable, couldn't ip4_config_merge_and_apply() simply check:

 else if ((priv->ext_ip4_config && nm_ip4_config_get_num_addresses (priv->ext_ip4_config)) || (commit && nm_ip4_config_get_num_adderesses (composite))

why does the value of whether priv->ext_ip4_config had any addresses need to stick around?  Yes, it gets re-generated when changes happen, but shouldn't the decision be made with the information about what data priv->ext_ip4_config has *now*, rather than at some point in the past?

> policy: consider additional assumed routes...

In the commit message, "syncing" instead of "synching"

> policy: resnyc routes on platform change events

"is_resynching" -> "is_resyncing"

But even so, the whole structure is called 'resync', so I think "resync.is_resyncing" is redundant.  I'd just call this "resync.count".

General comment, some of the LOG lines are really long, would be good to keep them to max 120 or so chars wid.  For example, in _resync_idle_reschedule().
Comment 46 Thomas Haller 2014-11-14 12:31:24 UTC
(In reply to comment #45)
> th/bgo735512_route_metric_v2
> 
> > policy: refactor NMDefaultRouteManager to use union type NMPlatforIPXRoute
> 
> When memsetting the route in _ipx_update_default_route() shouldn't the memset
> call be:
> 
> memset (&rt, 0, sizeof (rt));
> 
> just to be safe, otherwise the end of the struct will have random data in it
> because the IPv6 part is larger than the IPv4 part.

Added a fixup for it. Though I don't think this is better, because if we initialize the value rt.r4, then whenever you try to interpret it as rt.r6 you get garbage. Doesn't matter whether the trailing bytes are initialized to zero.

Indeed if they are not initialized, valgrind could warn about such use... (I guess).



> > device: only add default route when having any address
> 
> I don't like the new priv-> variable, couldn't ip4_config_merge_and_apply()
> simply check:
> 
>  else if ((priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> (priv->ext_ip4_config)) || (commit && nm_ip4_config_get_num_adderesses
> (composite))
> 
> why does the value of whether priv->ext_ip4_config had any addresses need to
> stick around?  Yes, it gets re-generated when changes happen, but shouldn't the
> decision be made with the information about what data priv->ext_ip4_config has
> *now*, rather than at some point in the past?

I don't think it's the same.

In the "commit && nm_ip4_config_get_num_adderesses(composite)" case we are anyway good, because we are going to configure some addresses.

The commit==FALSE case is called from update_ip_config().
Imagine ext_ip4_config has address 192.168.1.2/24 already configured. One of the nm_ip4_config_subtract() calls could remove the address.
With your code, we would *not* add the default route in this case, while with the current version we would.


I added one fixup, which should be no functional change, but make the condition a clearer.


> > policy: consider additional assumed routes...
> 
> In the commit message, "syncing" instead of "synching"

fixed


> > policy: resync routes on platform change events
> 
> "is_resynching" -> "is_resyncing"
> 
> But even so, the whole structure is called 'resync', so I think
> "resync.is_resyncing" is redundant.  I'd just call this "resync.count".

how about "reentrancy_guard"?


> General comment, some of the LOG lines are really long, would be good to keep
> them to max 120 or so chars wid.  For example, in _resync_idle_reschedule().

done.



Added another commit:
>> policy: sort default routes by metrics before adding them
   


Rebased on master and repushed.
Comment 47 Dan Williams 2014-11-14 18:54:12 UTC
(In reply to comment #46)
> > I don't like the new priv-> variable, couldn't ip4_config_merge_and_apply()
> > simply check:
> > 
> >  else if ((priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> > (priv->ext_ip4_config)) || (commit && nm_ip4_config_get_num_adderesses
> > (composite))
> > 
> > why does the value of whether priv->ext_ip4_config had any addresses need to
> > stick around?  Yes, it gets re-generated when changes happen, but shouldn't the
> > decision be made with the information about what data priv->ext_ip4_config has
> > *now*, rather than at some point in the past?
> 
> I don't think it's the same.
> 
> In the "commit && nm_ip4_config_get_num_adderesses(composite)" case we are
> anyway good, because we are going to configure some addresses.
> 
> The commit==FALSE case is called from update_ip_config().
> Imagine ext_ip4_config has address 192.168.1.2/24 already configured.

Is that address added externally from NetworkManager?  Or by some IP config process in NetworkManager?

> One of the nm_ip4_config_subtract() calls could remove the address.

Only if the address was configured by NetworkManager internally via DHCP, IPv4LL, WWAN, or VPN.  If the address was configured externally, it should always be in ext_ip4_config.

ext_ip4_config initially includes *all* addresses, even those that NM itself adds as a result of DHCP or RA or static config.  So the subtraction stuff is required to ensure that ext_ip4_config represents only that configuration added externally to NM.

So I'm not sure what caching the value is actually supposed to achieve.  As it currently is, it will always be TRUE if the interface ever has any IP addresses, either added externally or by NM itself.

> With your code, we would *not* add the default route in this case, while with
> the current version we would.

If the address was added externally to NM, then it should still be in ext_ip4_config and we can check that with nm_ip4_config_get_num_addresses().  But if the address was added externally, should we *ever* create a default route based on that if we don't already have one?

ip4_config_merge_and_apply() gets called any time the device has IP configuration (either assumed, generated + assumed, or managed).  The cases are:

1) unmanaged but user adds external address
- here, the device has no NMConnection yet (because it was unconfigured).  update_ip_config() is run but only priv->ext_ip4_config should be !NULL and only priv->ext_ip4_config will have any addresses.  Then ip4_config_merge_and_apply() gets called to update the 'composite' config that D-Bus clients will see.  Since the device has no connection yet, none of the default route stuff will run here.  But then nm_device_set_ip4_config() is called, and has_changes will be TRUE, and a RECHECK_ASSUME will be emitted.

That will trigger connection generation, and connection assumption, and we'll eventually end up in ip4_config_merge_and_apply() again.  This time the device *does* have a connection (generated + assumed).  So it should not run the default route code in ip4_config_merge_and_apply() this time either, but it will pick up any externally configured default route.

2) assumed from existing config but user adds external address
- the device has a (generated + assumed) connection, so the second part of #1 above will run, and an externally configured default route will be picked up.  The external address addition should have no effect on the default route, only the user adding an external default route should have an effect.

3) managed and user adds external address
- the device has existing NM-created configuration.  The device does not use an assumed connection, and the connection allows the device to have a default route.  Thus we finally get to the code in question:

if (   !assumed
    && nm_default_route_manager_ip4_connection_has_default_route (nm_default_route_manager_get (), connection)) {
	guint32 gateway = 0;

we'll get here for this case (#3).

	if (   (!commit && priv->ext_ip4_config_had_any_addresses)

!commit will only be true when merge_ip4_config() is called from update_ip_confgi() in response to the user adding the external address.  ext_ip4_config will contain that external IP address already, so we can just check whether ext_ip4_config has any addresses, right?

	    || ( commit && nm_ip4_config_get_num_addresses (composite))) {

This will be true from a lot of places, and it will capture *both* an NM-added address and externally added addresses.


I guess I'm still not quite clear on what ext_ip4_config_had_any_addresses is supposed to do?

> > But even so, the whole structure is called 'resync', so I think
> > "resync.is_resyncing" is redundant.  I'd just call this "resync.count".
> 
> how about "reentrancy_guard"?

That sounds pretty verbose too...  I'd just go with "count" or "guard", it'll be apparent what's going on I think.
Comment 48 Thomas Haller 2014-11-18 19:57:42 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > > I don't like the new priv-> variable, couldn't ip4_config_merge_and_apply()
> > > simply check:
> > > 
> > >  else if ((priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> > > (priv->ext_ip4_config)) || (commit && nm_ip4_config_get_num_adderesses
> > > (composite))
> > > 
> > > why does the value of whether priv->ext_ip4_config had any addresses need to
> > > stick around?  Yes, it gets re-generated when changes happen, but shouldn't the
> > > decision be made with the information about what data priv->ext_ip4_config has
> > > *now*, rather than at some point in the past?
> > 
> > I don't think it's the same.
> > 
> > In the "commit && nm_ip4_config_get_num_adderesses(composite)" case we are
> > anyway good, because we are going to configure some addresses.
> > 
> > The commit==FALSE case is called from update_ip_config().
> > Imagine ext_ip4_config has address 192.168.1.2/24 already configured.
> 
> Is that address added externally from NetworkManager?  Or by some IP config
> process in NetworkManager?
> 
> > One of the nm_ip4_config_subtract() calls could remove the address.
> 
> Only if the address was configured by NetworkManager internally via DHCP,
> IPv4LL, WWAN, or VPN.  If the address was configured externally, it should
> always be in ext_ip4_config.
> 
> ext_ip4_config initially includes *all* addresses, even those that NM itself
> adds as a result of DHCP or RA or static config.  So the subtraction stuff is
> required to ensure that ext_ip4_config represents only that configuration added
> externally to NM.
> 
> So I'm not sure what caching the value is actually supposed to achieve.  As it
> currently is, it will always be TRUE if the interface ever has any IP
> addresses, either added externally or by NM itself.
> 
> > With your code, we would *not* add the default route in this case, while with
> > the current version we would.
> 
> If the address was added externally to NM, then it should still be in
> ext_ip4_config and we can check that with nm_ip4_config_get_num_addresses(). 
> But if the address was added externally, should we *ever* create a default
> route based on that if we don't already have one?
> 
> ip4_config_merge_and_apply() gets called any time the device has IP
> configuration (either assumed, generated + assumed, or managed).  The cases
> are:
> 
> 1) unmanaged but user adds external address
> - here, the device has no NMConnection yet (because it was unconfigured). 
> update_ip_config() is run but only priv->ext_ip4_config should be !NULL and
> only priv->ext_ip4_config will have any addresses.  Then
> ip4_config_merge_and_apply() gets called to update the 'composite' config that
> D-Bus clients will see.  Since the device has no connection yet, none of the
> default route stuff will run here.  But then nm_device_set_ip4_config() is
> called, and has_changes will be TRUE, and a RECHECK_ASSUME will be emitted.
> 
> That will trigger connection generation, and connection assumption, and we'll
> eventually end up in ip4_config_merge_and_apply() again.  This time the device
> *does* have a connection (generated + assumed).  So it should not run the
> default route code in ip4_config_merge_and_apply() this time either, but it
> will pick up any externally configured default route.
> 
> 2) assumed from existing config but user adds external address
> - the device has a (generated + assumed) connection, so the second part of #1
> above will run, and an externally configured default route will be picked up. 
> The external address addition should have no effect on the default route, only
> the user adding an external default route should have an effect.
> 
> 3) managed and user adds external address
> - the device has existing NM-created configuration.  The device does not use an
> assumed connection, and the connection allows the device to have a default
> route.  Thus we finally get to the code in question:
> 
> if (   !assumed
>     && nm_default_route_manager_ip4_connection_has_default_route
> (nm_default_route_manager_get (), connection)) {
>     guint32 gateway = 0;
> 
> we'll get here for this case (#3).
> 
>     if (   (!commit && priv->ext_ip4_config_had_any_addresses)
> 
> !commit will only be true when merge_ip4_config() is called from
> update_ip_confgi() in response to the user adding the external address. 
> ext_ip4_config will contain that external IP address already, so we can just
> check whether ext_ip4_config has any addresses, right?
> 
>         || ( commit && nm_ip4_config_get_num_addresses (composite))) {
> 
> This will be true from a lot of places, and it will capture *both* an NM-added
> address and externally added addresses.
> 
> 
> I guess I'm still not quite clear on what ext_ip4_config_had_any_addresses is
> supposed to do?


What are you suggesting instead?

(A):

    if (   (priv->ext_ip4_config && nm_ip4_config_get_num_addresses
(priv->ext_ip4_config))
        || (commit && nm_ip4_config_get_num_adderesses (composite)) {

(B):

    if (   (!commit && priv->ext_ip4_config && nm_ip4_config_get_num_addresses (priv->ext_ip4_config))
        || ( commit && nm_ip4_config_get_num_addresses (composite))) {

(C):

    if (   (priv->ext_ip4_config && nm_ip4_config_get_num_addresses (priv->ext_ip4_config))
        || (nm_ip4_config_get_num_addresses (composite))) {


All the above is wrong.




Scenario: have an managed DHCP interface with two addresses assigned via DHCP. Manually remove one of the two IP addresses.

This will trigger update_ip_config(), where
    nm_ip4_config_subtract (priv->ext_ip4_config, priv->dev_ip4_config);
removes the one address from ext_ip4_config.

In both cases (A) and (B), ext_ip4_config contains no addresses. Current code will remove the default route from the interface, which is wrong. 

(C) is correct. But now remove the second IP address manually.

Now, the condition with (C) evaluates that we can have a default address. But that is wrong. We cannot, there is no more address configured.



Actually, the condition should be:

   if (   (!commit && nm_platform_ip4_address_has_any (ifindex)
       || ( commit && nm_ip4_config_get_num_adderesses (composite)) {

The chosen solution with ext_ip4_config_had_any_addresses is effectively the same, but saves iterating again over the platform cache.



> > > But even so, the whole structure is called 'resync', so I think
> > > "resync.is_resyncing" is redundant.  I'd just call this "resync.count".
> > 
> > how about "reentrancy_guard"?
> 
> That sounds pretty verbose too...  I'd just go with "count" or "guard", it'll
> be apparent what's going on I think.

renamed.



Added another commit 
>> policy: fix handling managed devices without default route
Comment 49 Dan Williams 2014-11-19 03:19:49 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > > I don't like the new priv-> variable, couldn't ip4_config_merge_and_apply()
> > > > simply check:
> > > > 
> > > >  else if ((priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> > > > (priv->ext_ip4_config)) || (commit && nm_ip4_config_get_num_adderesses
> > > > (composite))
> > > > 
> > > > why does the value of whether priv->ext_ip4_config had any addresses need to
> > > > stick around?  Yes, it gets re-generated when changes happen, but shouldn't the
> > > > decision be made with the information about what data priv->ext_ip4_config has
> > > > *now*, rather than at some point in the past?
> > > 
> > > I don't think it's the same.
> > > 
> > > In the "commit && nm_ip4_config_get_num_adderesses(composite)" case we are
> > > anyway good, because we are going to configure some addresses.
> > > 
> > > The commit==FALSE case is called from update_ip_config().
> > > Imagine ext_ip4_config has address 192.168.1.2/24 already configured.
> > 
> > Is that address added externally from NetworkManager?  Or by some IP config
> > process in NetworkManager?
> > 
> > > One of the nm_ip4_config_subtract() calls could remove the address.
> > 
> > Only if the address was configured by NetworkManager internally via DHCP,
> > IPv4LL, WWAN, or VPN.  If the address was configured externally, it should
> > always be in ext_ip4_config.
> > 
> > ext_ip4_config initially includes *all* addresses, even those that NM itself
> > adds as a result of DHCP or RA or static config.  So the subtraction stuff is
> > required to ensure that ext_ip4_config represents only that configuration added
> > externally to NM.
> > 
> > So I'm not sure what caching the value is actually supposed to achieve.  As it
> > currently is, it will always be TRUE if the interface ever has any IP
> > addresses, either added externally or by NM itself.
> > 
> > > With your code, we would *not* add the default route in this case, while with
> > > the current version we would.
> > 
> > If the address was added externally to NM, then it should still be in
> > ext_ip4_config and we can check that with nm_ip4_config_get_num_addresses(). 
> > But if the address was added externally, should we *ever* create a default
> > route based on that if we don't already have one?
> > 
> > ip4_config_merge_and_apply() gets called any time the device has IP
> > configuration (either assumed, generated + assumed, or managed).  The cases
> > are:
> > 
> > 1) unmanaged but user adds external address
> > - here, the device has no NMConnection yet (because it was unconfigured). 
> > update_ip_config() is run but only priv->ext_ip4_config should be !NULL and
> > only priv->ext_ip4_config will have any addresses.  Then
> > ip4_config_merge_and_apply() gets called to update the 'composite' config that
> > D-Bus clients will see.  Since the device has no connection yet, none of the
> > default route stuff will run here.  But then nm_device_set_ip4_config() is
> > called, and has_changes will be TRUE, and a RECHECK_ASSUME will be emitted.
> > 
> > That will trigger connection generation, and connection assumption, and we'll
> > eventually end up in ip4_config_merge_and_apply() again.  This time the device
> > *does* have a connection (generated + assumed).  So it should not run the
> > default route code in ip4_config_merge_and_apply() this time either, but it
> > will pick up any externally configured default route.
> > 
> > 2) assumed from existing config but user adds external address
> > - the device has a (generated + assumed) connection, so the second part of #1
> > above will run, and an externally configured default route will be picked up. 
> > The external address addition should have no effect on the default route, only
> > the user adding an external default route should have an effect.
> > 
> > 3) managed and user adds external address
> > - the device has existing NM-created configuration.  The device does not use an
> > assumed connection, and the connection allows the device to have a default
> > route.  Thus we finally get to the code in question:
> > 
> > if (   !assumed
> >     && nm_default_route_manager_ip4_connection_has_default_route
> > (nm_default_route_manager_get (), connection)) {
> >     guint32 gateway = 0;
> > 
> > we'll get here for this case (#3).
> > 
> >     if (   (!commit && priv->ext_ip4_config_had_any_addresses)
> > 
> > !commit will only be true when merge_ip4_config() is called from
> > update_ip_confgi() in response to the user adding the external address. 
> > ext_ip4_config will contain that external IP address already, so we can just
> > check whether ext_ip4_config has any addresses, right?
> > 
> >         || ( commit && nm_ip4_config_get_num_addresses (composite))) {
> > 
> > This will be true from a lot of places, and it will capture *both* an NM-added
> > address and externally added addresses.
> > 
> > 
> > I guess I'm still not quite clear on what ext_ip4_config_had_any_addresses is
> > supposed to do?
> 
> 
> What are you suggesting instead?
> 
> (A):
> 
>     if (   (priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> (priv->ext_ip4_config))
>         || (commit && nm_ip4_config_get_num_adderesses (composite)) {
> 
> (B):
> 
>     if (   (!commit && priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> (priv->ext_ip4_config))
>         || ( commit && nm_ip4_config_get_num_addresses (composite))) {
> 
> (C):
> 
>     if (   (priv->ext_ip4_config && nm_ip4_config_get_num_addresses
> (priv->ext_ip4_config))
>         || (nm_ip4_config_get_num_addresses (composite))) {
> 
> 
> All the above is wrong.
> 
> 
> 
> 
> Scenario: have an managed DHCP interface with two addresses assigned via DHCP.
> Manually remove one of the two IP addresses.

Ok, I think that explains a bit more what had_any_addresses is trying to do.  But I think there are some issues with that...

> This will trigger update_ip_config(), where
>     nm_ip4_config_subtract (priv->ext_ip4_config, priv->dev_ip4_config);
> removes the one address from ext_ip4_config.

If you remove an address (or route) that comes from an automatic source, NetworkManager will simply re-add that address back later.  There isn't any kind of negative tracking of addresses/routes from sources that NM manages (DHCP, RA, IPv4LL), there is only tracking of external addresses/routes that are not those from managed sources.

So in your scenario, if the user removes an IP address (or routes) that was assigned from DHCP, that address (or routes) will no longer be present in ext_ip4_config.  But it's still going to be in dev_ip4_config.  So the result of ext_ip4_config - dev_ip4_config won't do anything interesting.

And the address (or route) will just get re-applied eventually because dev_ip4_config gets merged into the composite config that gets applied to the device.  Yeah, it won't get applied immediately because update_ip_config() doesn't set commit=TRUE, but the next lease or RA or whatever will do it.  Or if it doesn't, it should.

> In both cases (A) and (B), ext_ip4_config contains no addresses. Current code
> will remove the default route from the interface, which is wrong. 

Yes, that's wrong.  But I can't see where it would remove the default route in the code?  And I don't see why ext_ip4_config should affect that decision, because our default route decisions should be based on 'composite'.  In this case, dev_ip4_config is what has the addresses because they are configured from DHCP, and that means the composite has the addresses now too.  And the default route should be determined from 'composite', which takes everything into account.

> (C) is correct. But now remove the second IP address manually.
> 
> Now, the condition with (C) evaluates that we can have a default address. But
> that is wrong. We cannot, there is no more address configured.

But there will still be addresses in dev_ip4_config because we don't try to remove them from dev/vpn/wwan/etc configs when they are removed manually.  So the addresses that the user just removed will still exist in dev_ip4_config and will still be in 'composite', will still be exported over D-Bus, and will still get applied to the interface any time the config gets updated.

> Actually, the condition should be:
> 
>    if (   (!commit && nm_platform_ip4_address_has_any (ifindex)
>        || ( commit && nm_ip4_config_get_num_adderesses (composite)) {
> 
> The chosen solution with ext_ip4_config_had_any_addresses is effectively the
> same, but saves iterating again over the platform cache.

Yes, that would be more "correct" but it's trying to handle a case that's not really supported.  There isn't any concept of "I removed this address manually so blacklist it forever"...  so when the lease gets renewed, the addresses will just get added right back to the interface.  If a new RA comes in, we'll just add that address right back to the interface with an updated lifetime.

Unless we fix that (and I'm not really sure it's useful to do so) then it's kinda pointless to accommodate that, when the thing that was removed will just come right back in the near future.  This is the same behavior you'd get running dhclient manually too, it'll just re-add the address/routes on renewal.  

So if the default route gets removed because the user removed a DHCP IP address, then I think we should certainly make sure the default route gets added back, but I don't think it should have anything do with ext_ip4_config.  I think that decision has to be made with 'composite', because that has the full picture.

Wouldn't the correct thing to do be just:

if (nm_ip4_config_get_num_addresses (composite))

because if composite has any addresses, then that means we expect the interface to have some (even if the user removed one), and we should add a default route based on that. 

That may be a NOP though, because the decision of whether or not to actually commit to the kernel (and therefore add the address back) is based on a diff of the old + new composite configs, and in this case they haven't changed because dev_ip4_config (which still has the removed address) was merged back into both the old and new composites.  So we should at least figure out how to fix that...
Comment 50 Thomas Haller 2014-11-19 11:57:24 UTC
- NMDefaultRouteManager does not add addresses or direct routes to the default gateway. That is done by NMDevice. NMDefaultRouteManager only adds the default route.

- If a device has no addresses (physically) configured, NMDefaultRouteManager cannot add a default route and it will fail with an <error> line. Also, it will keep pretending that the device has a default route, while it actually hasn't. This is wrong behavior [1].

- ip4_config_merge_and_apply() gets called with either commit, or not commit. In the commit case, we care about 'composite', because first we will add all the addresses and routes we need, we will platform_sync them, and then add the default route. That works fine.
In the non-commit case, 'composite' is just a wish of what should be configured.


If you remove all addresses, it will trigger update_ip4_config() and eventually ip4_config_merge_and_apply() with commit==FALSE. It does not matter that there are addresses in composite/dev_ip4_config/etc., because we are not going to commit them now.


> If you remove an address (or route) that comes from an automatic source,
> NetworkManager will simply re-add that address back later. 

"later". Yes, but not now (in the !commit case).

> And the address (or route) will just get re-applied eventually because
> dev_ip4_config gets merged into the composite config that gets applied to the
> device.  Yeah, it won't get applied immediately because update_ip_config()
> doesn't set commit=TRUE, but the next lease or RA or whatever will do it.  Or
> if it doesn't, it should.

Indeed. It will be re-added later at random times. And we will later also re-add the default route. But for now, it is gone.


For now, NMDefaultRouteManager must forget that the device has a default route. Indeed, the default route is already gone from the interface.


> Unless we fix that (and I'm not really sure it's useful to do so) then it's
> kinda pointless to accommodate that, when the thing that was removed will just
> come right back in the near future. 

Important: all this is not about supporting the use-case of users manually blacklisting IP addresses. It is about not behaving wrongly if the user removes an IP address [1].




Again the example:

A managed DHCP interface, two addresses.

Remove the first address. This leads to ip4_config_merge_and_apply(commit=FALSE).
  - default route must stay on the interface
  - ext_ip4_had_ip_addresses=TRUE
  - nm_ip4_config_get_num_addresses(ext_ip4_config)==0
  - nm_ip4_config_get_num_addresses(composite)==2

Remove the second, last address.
  - NMDRM must forget the default route for the device
  - ext_ip4_had_ip_addresses=FALSE
  - nm_ip4_config_get_num_addresses(ext_ip4_config)==0
  - nm_ip4_config_get_num_addresses(composite)==2


You say:

> So the result of ext_ip4_config - dev_ip4_config won't do anything 
> interesting.

I disagree. Note that neither nm_ip4_config_get_num_addresses(ext_ip4_config) nor nm_ip4_config_get_num_addresses(composite) can express what we actually ask for. Only ext_ip4_had_ip_addresses can.




Indeed a similar problem is if you remove the direct route to the gateway. But that is a less common issue. Maybe for that, we should even add that route in the !commit case, but anyway, that would be a separate issue.
Comment 51 Dan Williams 2014-11-19 18:02:51 UTC
(In reply to comment #50)
> - NMDefaultRouteManager does not add addresses or direct routes to the default
> gateway. That is done by NMDevice. NMDefaultRouteManager only adds the default
> route.

Yes.

> - If a device has no addresses (physically) configured, NMDefaultRouteManager
> cannot add a default route and it will fail with an <error> line. Also, it will
> keep pretending that the device has a default route, while it actually hasn't.
> This is wrong behavior [1].

Right.  But I think NM's existing behavior of allowing the address to be removed but adding it back later is OK, but unintentional.  I think we should  make it intentional, but not quite in this way with ext_ip4_config_had_any_addresses.

> - ip4_config_merge_and_apply() gets called with either commit, or not commit.
> In the commit case, we care about 'composite', because first we will add all
> the addresses and routes we need, we will platform_sync them, and then add the
> default route. That works fine.
> In the non-commit case, 'composite' is just a wish of what should be
> configured.

update_ip_config() only exists to create ext_ip4_config to preserve external IP changes.  Which is why it doesn't commit, because the interface already has the changes that ext_ip4_config describes, so it's pointless to recommit them in what would be a NOP.

What update_ip_config() does *not* handle (which I think was an oversight) is external changes that remove IP details from automatic configuration.  It's true that when this happens, the interface already has these changes, so "commit" should still be FALSE.

> If you remove all addresses, it will trigger update_ip4_config() and eventually
> ip4_config_merge_and_apply() with commit==FALSE. It does not matter that there
> are addresses in composite/dev_ip4_config/etc., because we are not going to
> commit them now.

Right.  But NM should really have a consistent view of the world here.  Two things:

1) the fact that we ignore composite/dev/vpn/wwan/etc is wrong.  But:

2) the fact that composite/dev/wwan/etc do not represent the current state of the interface is wrong too

> > If you remove an address (or route) that comes from an automatic source,
> > NetworkManager will simply re-add that address back later. 
> 
> "later". Yes, but not now (in the !commit case).

Yes.  And I think we should make NM have a consistent view of the world here.  We already have the old ext_ip4_config, so if we diff new -> old we'll see what changes got made.  If something got removed, then we should probably remove that from dev/wwan/ra configs.  Yes, the route/IP will just come back on the next lease renewal or RA receipt, but NM should really have a consistent view of the world until then.

> > And the address (or route) will just get re-applied eventually because
> > dev_ip4_config gets merged into the composite config that gets applied to the
> > device.  Yeah, it won't get applied immediately because update_ip_config()
> > doesn't set commit=TRUE, but the next lease or RA or whatever will do it.  Or
> > if it doesn't, it should.
> 
> Indeed. It will be re-added later at random times. And we will later also
> re-add the default route. But for now, it is gone.
> 
> 
> For now, NMDefaultRouteManager must forget that the device has a default route.
> Indeed, the default route is already gone from the interface.

Right.  But I don't think that decision should happen from ext_ip4_config, I think it should happen with 'composite' because that should reflect exactly the configuration the interface should have.  Thus I think we should somehow fix up the other configs so that when merged, 'composite' reflects the new configuration instead of being completely wrong after the removal.

> Again the example:
> 
> A managed DHCP interface, two addresses.
> 
> Remove the first address. This leads to
> ip4_config_merge_and_apply(commit=FALSE).
>   - default route must stay on the interface
>   - ext_ip4_had_ip_addresses=TRUE
>   - nm_ip4_config_get_num_addresses(ext_ip4_config)==0
>   - nm_ip4_config_get_num_addresses(composite)==2

Assuming people agree with me about making sure that 'composite' reflects what is actually configured on the itnerface, this process would isntead be:

Remove the first address:
- diff old + new ext_ip4_config and find an IP address was removed.  Remove that address from dev/wwan/vpn/etc
- This leads to ip4_config_merge_and_apply(commit=FALSE)
- 'composite' no longer has the user-removed IP address but still has another one
- no changes required unless the removed IP address was the route to the current default gateway

> Remove the second, last address.
>   - NMDRM must forget the default route for the device
>   - ext_ip4_had_ip_addresses=FALSE
>   - nm_ip4_config_get_num_addresses(ext_ip4_config)==0
>   - nm_ip4_config_get_num_addresses(composite)==2

Remove the second (last) address:
- diff old + new ext_ip4_config and find an IP address was removed.  Remove that address from dev/wwan/vpn/etc
- This leads to ip4_config_merge_and_apply(commit=FALSE)
- 'composite' no longer has any addresses, thus the default route must be removed

If we do not make 'composite' consistent with the interface's configuration, then our D-Bus interface is lying, and I think that's wrong...
Comment 52 Dan Williams 2014-11-19 21:55:47 UTC
As discussed on IRC, OK to merge route_metric_v2.
Comment 53 Thomas Haller 2014-11-19 22:05:19 UTC
Merged as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=5338178d8161a2f37d756896be43859b3b86ff3e

Leave BZ open, for remaining issues (see comment 51)
Comment 54 Thomas Haller 2014-11-20 16:40:19 UTC
Opened new bug 740443 for the remaining issue.

Closing this bug as solved.