After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 702488 - [review] pavlix/runtime: support foreign and configured runtime connection take over
[review] pavlix/runtime: support foreign and configured runtime connection ta...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
git master
Other Linux
: Normal normal
: ---
Assigned To: Pavel Simerda
NetworkManager maintainer(s)
Depends on: 699842 703311
Blocks: 682872 683206
 
 
Reported: 2013-06-17 16:38 UTC by Pavel Simerda
Modified: 2013-11-08 22:55 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
core: match connections to connections instead of devices (10.68 KB, patch)
2013-07-25 23:13 UTC, Pavel Simerda
none Details | Review
patch for "platform: fix and simplify address lifetime compensation" (1.46 KB, patch)
2013-08-30 16:40 UTC, Thomas Haller
none Details | Review

Description Pavel Simerda 2013-06-17 16:38:28 UTC
I'm starting my work on pushing initial IP configuration to NMConnection objects. AFAIK this means I have to create NMConnection objects *if* the respective device is detected to be already configured and *and* no assumable connection was found. Could anyone please confirm this, and also confirm that nm-manager's add_device is the right starting point?

Also, I would like to know whether there are any dependencies we're missing and whether taking over unknown connections (i.e. creating NMConnection for existing configuration) is always the right way to go. In my opinion, this would be a behavioral change if it blocks an auto=yes connection from being applied to an already configured device. Also, we should agree on what constitutes an already configured device. If it is IFF_UP, or if it's a device with non-linklocal addresses assigned, or what.
Comment 1 Dan Williams 2013-06-18 22:49:29 UTC
(In reply to comment #0)
> I'm starting my work on pushing initial IP configuration to NMConnection
> objects. AFAIK this means I have to create NMConnection objects *if* the
> respective device is detected to be already configured and *and* no assumable
> connection was found. Could anyone please confirm this, and also confirm that
> nm-manager's add_device is the right starting point?

Pretty much.  You're piggy-backing on the existing connection assume stuff, so:

1) scrape the device for addresses, routes, dhcp, etc and build up an NMConnection
2) compare to existing connections and find one to assume
3) if something to assume, done
4) if nothing to assume, ask the NMSettings to add your new connection as 'unsaved'
5) proceed to activate that using the same paths as the existing 'assume' code does

You'll need to expose nm-settings.c's add_new_conenction() in a more formal manner, maybe we should rename nm_settings_add_connection() => nm_settings_add_connection_with_auth() and rename add_new_connection() => nm_settings_add_connection_internal().

> Also, I would like to know whether there are any dependencies we're missing and
> whether taking over unknown connections (i.e. creating NMConnection for
> existing configuration) is always the right way to go. In my opinion, this
> would be a behavioral change if it blocks an auto=yes connection from being
> applied to an already configured device. Also, we should agree on what
> constitutes an already configured device. If it is IFF_UP, or if it's a device
> with non-linklocal addresses assigned, or what.

Yep, it's a behavior change, but one that we accept for a few reasons:

1) users are not expected to restart NetworkManager; but if they do, NM should recover those interfaces seamlessly and non-destructively through the work you're doing here

2) NM starts very early at system boot time, and will need to assume all these connections because the rootfs might be using one of them

3) interfaces added underneath NM are known to NM immediately and thus won't have any configuration yet, so there's nothing to pick up really.  Thus auto=yes connections would still apply here.

IMO, rarely do users start NetworkManager *wanting* it to blow everything away.  Yes, this is currently expected, but I don't believe it's extremely desirable.  We'd rather have a cooperative and unsurprising NM than one that just blows everything away.  Given that this happens at system bootup anyway, if at this point you don't have the connection you expected, then your initrd configuration is wrong anyway.

Would you agree that the auto=yes question is not relevant if NM is never restarted?  Then, if we agree there, would you agree that *if* we make this takeover/assume work correct, and *if* we make NM seamlessly handle changes made underneath it with /sbin/ip, that the reasons people stop and restart NM right now are mostly gone?

My point is, people stop NM now because they don't want NM messing with stuff they do manually.  But if we make NM cooperate, then there's much less reason to ever stop it, and thus the question is mostly avoided.
Comment 2 Pavel Simerda 2013-06-19 10:42:03 UTC
Thanks for your comments for now. Will get something working and we'll see.
Comment 3 Pavel Simerda 2013-06-21 20:06:26 UTC
Initial code is there, feel free to check whether it's getting in the right direction.

(In reply to comment #1)
> 1) scrape the device for addresses, routes, dhcp, etc and build up an
> NMConnection

Addresses done, routes will need a bit more care because of default routing and maybe address routes.

> 2) compare to existing connections and find one to assume

For simplicity I'm creating the foreign connection *after* connection selection.

> 3) if something to assume, done
> 4) if nothing to assume, ask the NMSettings to add your new connection as
> 'unsaved'

Done.

> 5) proceed to activate that using the same paths as the existing 'assume' code
> does
Comment 4 Pavel Simerda 2013-06-25 14:57:26 UTC
Apart from improving the existing code, the next step will be to build the foreign connection *before* we try to take over existing connections and instead of comparing the potentially assumable connections to the current status of the device, we will compare to the build up foreign connection [as dcbw already asked for]. Apart from that, we will want to be able to detect DHCP addresses and exclude them from the configured address list and, on the other hand, include the information that DHCP will be used for this connection.

This might be actually a bit complicated for various reasons, as it needs changes from the DHCP code and nm-platform, through the NMDevice to NMConnection. I think it will finally lead to modifications to the public API (hopefully just extensions) and it will still be constrained by current NetworkManager configuration limits.
Comment 5 Pavel Simerda 2013-07-04 23:08:51 UTC
Hi,

I ended up with a rather big patchset now (many trivial or separately reviewed patches already pushed to master). While most of the code is ready to use, please generally expect a proof of concept implementation. I'm going to revisit the code myself but I would already welcome any help and review.

The patches, apart from many cleanups, implement missing peaces of the framework to seamlessly take over connections whether or not the actual network configuration matches a known configure connection. IPv4 and IPv6 configuration is supported and proof of concept code for dynamic address support is there. The pavlix/rdisc code (bug 699772) will cooperate well with only minor changes.

What's currently missing is the implementation of individual nm-device subclasses' fill_connection() functions that would replace the current match_l2_config().

Also I expect many minor glitches which will in turn render the whole system not yet working or not yet usable. My intention is to keep the branch as long as major device types are affected.

Please look at 'pavlix/runtime' and start commenting/reviewing.
Comment 6 Dan Winship 2013-07-05 16:22:39 UTC
>    core: cleanup nm-ip[46]-config

Missed cleanups: the return type should be on its own line in nm_ip4_config_reset_routes(), nm_ip4_config_*_ptp_address(), nm_ip4_config_*_wins(), nm_ip4_config_*_mtu(), nm_ip4_config_*_mss(), and nm_ip4_config_*_nis_server*(). And the "{" should be on its own line in routes_are_duplicate().

And pretty much identically in nm-ip6-config.c



>    core: add nm_device_generate_connection()

Indentation is wrong on the g_object_set() in nm_device_generate_connection().

nm_ipX_config_to_setting() could get rid of all the "naddresses", etc, variables, and just do "if (!config) return NULL;" at the beginning, and then just do

	for (i = 0; i < nm_ip4_config_get_num_addresses(); i++) {

etc after that, so you don't have to keep doing "config ? whatever : NULL" at the beginning.


You're not setting :method on the setting, which means it won't pass nm_setting_verify().

You're also not setting NMSettingIP6Config:ip6-privacy.



>    core: match connections to connections

one of those "connections" should be "devices"

>+	nm_log_dbg (LOGD_DEVICE, "(%s): found matching connection '%s'",
>+				nm_device_get_iface (device),
>+				nm_connection_get_id (connection));

indentation

>-	nm_log_dbg (LOGD_DEVICE, "(%s): will attempt to assume existing connection",
>+	nm_log_dbg (LOGD_DEVICE, "(%s): will attempt to assume connection connection",

search-and-replace-o



>    TMP: core: avoid nm_ip[46]_config_diff

as discussed on IRC, we don't want to emit :properties-changed if no dbus-visible properties have changed.



>    core: switch nm-ip4-config's NMIP[46]Address to NMPlatformIP[46]Address

in nm-dhcp-client.c:ip4_options_to_config(), now that the gateway is not treated as part of the address, it would make more sense if the gateway handling was moved after the nm_ip4_config_add_address() call. (As it is now, it does most of the address handling, then stops to set the gateway, then finishes up the address handling.)

Similarly in nm_dhcp_dhclient_get_lease_config(), and in reverse in nm_vpn_connection_ip[46]_config_get() (move the gateway stuff to before the memset()).



>    WIP: platform: add support for address lifetimes

lots of incorrect indentation.

You should document the difference between "lifetime" and "preferred_lft". And note that they're time_t values. (Actually, you should just make them time_t rather than guint32.) Also, it's weird to abbreviate "lifetime" in one name but not the other. "pref_lifetime" would be better if you must abbreviate.

You need to update nm-linux-platform.c:init_ip[46]_address() to set the NMPlatformIP[46]Address lifetimes from the rtnl lifetimes.



>    core: detect automatic addresses when assuming connections
>    TMP: dhcp: remove *_get_lease_config() functions

Is the rule here (address has lifetime -> method = auto) always true?

Also, does the lease config file contain the client-id? If so, we could use that to initialize the corresonding NMSettingIP[46]Config property when capturing a config.



>    WIP: dhcp: convey DHCP lifetime to nm-ip[46]-config

the lease is in UTC, but mktime() will treat it as local time. You could use g_date_time_new_utc() followed by g_date_time_to_unix().
Comment 7 Pavel Simerda 2013-07-08 17:40:10 UTC
(In reply to comment #6)
> >    core: cleanup nm-ip[46]-config
> 
> Missed cleanups: the return type should be on its own line in
> nm_ip4_config_reset_routes(), nm_ip4_config_*_ptp_address(),
> nm_ip4_config_*_wins(), nm_ip4_config_*_mtu(), nm_ip4_config_*_mss(), and
> nm_ip4_config_*_nis_server*(). And the "{" should be on its own line in
> routes_are_duplicate().
> 
> And pretty much identically in nm-ip6-config.c

Actually, this code was already there. But you're right it is a good idea to extend the cleanups to the rest of the code when I already touched the majority of it (TODO).

> nm_ipX_config_to_setting() could get rid of all the "naddresses", etc,
> variables,

Those variables are there to avoid calling the *_num_* function repeatedly, which is generally a good thing, whether GArray or GSList or whatever else is behind.

> and just do "if (!config) return NULL;" at the beginning, and then
> just do

Could be also done by either decoupling initialization from declaration or moving it down in c99 style which dcbw doesn't like. Doing the former for now.

> You're not setting :method on the setting, which means it won't pass
> nm_setting_verify().

Good point.

> You're also not setting NMSettingIP6Config:ip6-privacy.

Sounds like we'll have to check the relation between IPv6 address and the MAC address. Or maybe keep something similar to lease files at least for NetworkManager. Or just treat ip6-privacy as fuzzy. TODO.

> >    core: match connections to connections
> 
> one of those "connections" should be "devices"

The whole point of the change is to match connections to connections instead of devices. Changing the message to emphasize that.

> >+	nm_log_dbg (LOGD_DEVICE, "(%s): found matching connection '%s'",
> >+				nm_device_get_iface (device),
> >+				nm_connection_get_id (connection));
> 
> indentation
> 
> >-	nm_log_dbg (LOGD_DEVICE, "(%s): will attempt to assume existing connection",
> >+	nm_log_dbg (LOGD_DEVICE, "(%s): will attempt to assume connection connection",
> 
> search-and-replace-o

Fixed.

> >    TMP: core: avoid nm_ip[46]_config_diff
> 
> as discussed on IRC, we don't want to emit :properties-changed if no
> dbus-visible properties have changed.

Yes. The TMP: prefix is there for a reason. And there's also the rest of the message.

> >    core: switch nm-ip4-config's NMIP[46]Address to NMPlatformIP[46]Address
> 
> in nm-dhcp-client.c:ip4_options_to_config(), now that the gateway is not
> treated as part of the address, it would make more sense if the gateway
> handling was moved after the nm_ip4_config_add_address() call. (As it is now,
> it does most of the address handling, then stops to set the gateway, then
> finishes up the address handling.)

It definitely makes sense. But it still might be nicer to do it in a separate patch. TODO.

> Similarly in nm_dhcp_dhclient_get_lease_config(), and in reverse in
> nm_vpn_connection_ip[46]_config_get() (move the gateway stuff to before the
> memset()).
> 
> 
> 
> >    WIP: platform: add support for address lifetimes
> 
> lots of incorrect indentation.

I feel like I'll just join the lines and let the editors cope with it. This „keep everything aligned instead of just indented“ policy seems to just distract the focus from the real work.

> You should document the difference between "lifetime" and "preferred_lft". And
> note that they're time_t values. (Actually, you should just make them time_t
> rather than guint32.)

Where did you get that information?

/usr/include/libnl3/netlink/route/addr.h:extern uint32_t rtnl_addr_get_valid_lifetime(struct rtnl_addr *);
/usr/include/libnl3/netlink/route/addr.h:extern void	rtnl_addr_set_valid_lifetime(struct rtnl_addr *, uint32_t);

But you reminded me that DHCP doesn't set the value properly. Fixed that, added a timestamp attribute and added lifetime correction to nm-platform.

> Also, it's weird to abbreviate "lifetime" in one name but
> not the other. "pref_lifetime" would be better if you must abbreviate.

Sure.
 
> You need to update nm-linux-platform.c:init_ip[46]_address() to set the
> NMPlatformIP[46]Address lifetimes from the rtnl lifetimes.

Fixed.

> >    core: detect automatic addresses when assuming connections
> >    TMP: dhcp: remove *_get_lease_config() functions
> 
> Is the rule here (address has lifetime -> method = auto) always true?

This idea comes from our meeting at Westford. Note that you can't do a strict match, as NetworkManager's configuration is rather limited and the kernel doesn't have all the information either. The existence of a lifetimed address tells you that at least one of the addresses is not static.

When assuming a configured connection, you can tweak it by the strictness of FUZZY matching (but marking attributes fuzzy is not enough!). When assuming an unconfigured connection, you are always in the best effort mode, as you are matching information from two distinct models.

So the answer is yes, for our purposes it is good enough to assume method=auto when we find a lifetimed address.

> Also, does the lease config file contain the client-id? If so, we could use
> that to initialize the corresonding NMSettingIP[46]Config property when
> capturing a config.

Not sure now. We can get into more details when we have most of the patches in master. I assume reading the lease config is needed for some things and I'm surprised that it doesn't seem to be used.

> >    WIP: dhcp: convey DHCP lifetime to nm-ip[46]-config
> 
> the lease is in UTC, but mktime() will treat it as local time. You could use
> g_date_time_new_utc() followed by g_date_time_to_unix().

Fixed.
Comment 8 Dan Winship 2013-07-08 18:34:15 UTC
> > You should document the difference between "lifetime" and "preferred_lft". And
> > note that they're time_t values. (Actually, you should just make them time_t
> > rather than guint32.)
> 
> Where did you get that information?

I'm not sure if I actually looked at the code, or if I just assumed they must be time_t's since they're too small to be monotonic time, and there's nothing else plausible that they might be. At any rate, if they represent seconds-since-00:00:00T01-01-1970Z, then *we* should store them in a time_t even if libnl/kernel doesn't. And if they aren't that kind of timestamp, then you should document what kind of timestamp they are.
Comment 9 Pavel Simerda 2013-07-09 07:55:39 UTC
(In reply to comment #8)
> I'm not sure if I actually looked at the code, or if I just assumed they must
> be time_t's since they're too small to be monotonic time, and there's nothing
> else plausible that they might be. At any rate, if they represent
> seconds-since-00:00:00T01-01-1970Z,then *we* should store them in a time_t
> even if libnl/kernel doesn't. And if they aren't that kind of timestamp, then
> you should document what kind of timestamp they are.

They are no timestamps at all, they are lifetimes which means they are allowed to exist for some period of time (and be preferred for a shorter period of time) since a reference timestamp that I added later. The lifetimes are guint32 as they are in the IETF standards with 0xffffffff meaning infinity so that we don't need to translate them back and forth.

I could possibly document them with a reference to the respective RFC.
Comment 10 Pavel Simerda 2013-07-10 20:10:41 UTC
Ready for next round of review:

(In reply to comment #7)
> Actually, this code was already there. But you're right it is a good idea to
> extend the cleanups to the rest of the code when I already touched the majority of it (TODO).

The whole nm-ip4-config and nm-ip6-config are fully cleaned up and added to 'pavlix/runtime' branch and 'pavlix/rdisc' branch.

> > You're also not setting NMSettingIP6Config:ip6-privacy.
> 
> Sounds like we'll have to check the relation between IPv6 address and the MAC
> address. Or maybe keep something similar to lease files at least for
> NetworkManager. Or just treat ip6-privacy as fuzzy. TODO.

My opinion is that the current patchset should keep ip6-privacy as default and that the default should be either UNKNOWN or DISABLED. That doesn't mean it can't be improved later.

Also, I think that with FUZZY matching UNKNOWN should match to all ip6-privacy values so that a foreign connection matches a configured one if the IPv6 address type is not recognised. There may be other ways, of course.

> > >    core: switch nm-ip4-config's NMIP[46]Address to NMPlatformIP[46]Address
> > 
> > in nm-dhcp-client.c:ip4_options_to_config(), now that the gateway is not
> > treated as part of the address, it would make more sense if the gateway
> > handling was moved after the nm_ip4_config_add_address() call. (As it is now,
> > it does most of the address handling, then stops to set the gateway, then
> > finishes up the address handling.)
> 
> It definitely makes sense. But it still might be nicer to do it in a separate
> patch. TODO.

I think that it would be best to do this together with other DHCP cleanups after this feature is pushed to master.

For the purpose of the review, you can safely ignore the TMP: patches.

There's one item that still needs more design information. The NMIP?Config objects implement a _diff() function by full comparison and a _hash() function that creates hash values that can be compared. I tend to prefer using a single implementation instead, whether it's full comparison or hashing, as the usage of those is not different in nature.

Any of those implementations can be used to select a subset of attributes to be taken into account. The full diff implementation can return diff map as it does now or it can use attributes to narrow the diff operation. The hash implementation can only use the latter, which is not an important limitation, IMO.

So in my opinion the decision depends on whether we are afraid of duplicates or not, as well as how hashing copes with lists whose ordering is not significant and whether any of those two things lead to real issues. I don't currently favor any of those options but would like to avoid having both of them unless there's a good reason.
Comment 11 Pavel Simerda 2013-07-12 17:42:28 UTC
Added first bits of reacting to IPv4/IPv6 nm-platform change events but I currently see circular signalling, so I'll need to rework this.
Comment 12 Dan Williams 2013-07-18 15:29:01 UTC
> core: add nm_device_generate_connection()

nm_device_generate_connection() - it's cleaner to do

if (!nm_device_can_assume_connection (device))
    return NULL;

and put all the other function variables at the top; you also save an indent level.
Comment 13 Dan Williams 2013-07-18 16:05:22 UTC
> TMP: core: avoid nm_ip[46]_config_diff

Actually, 'found' is reset to FALSE.  It's done from the inner for() loop initialization.

> core: switch nm-ip4-config's NMIP[46]Address to NMPlatformIP[46]Address

+addresses_are_duplicate (const NMPlatformIP4Address *a, const NMPlatformIP4Address *b)
 {
-	if (nm_ip4_address_get_address (a) != nm_ip4_address_get_address (b))
-		return FALSE;
-
-	return TRUE;
+	return a->address == b->address;
 }

Should we warn or something when you try to add the same address with a different prefix?

Also:

if (item->plen <= plen && same_prefix (item->address, network, item->plen)); <===

be careful when rebasing onto git master so you don't revert http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=7c2abb2c9f97ce74bd71f00cc3a65ad877794fee
Comment 14 Dan Williams 2013-07-18 16:47:19 UTC
> platform: add support for address lifetimes

src/platform/nm-fake-platform.c's ip6_address_add() doesn't set 'tp' to anything.

nm_platform_ip4_address_sync() - shouldn't we do some sanity checking on shift before we use it?

nm_platform_ip6_address_sync() - indentation

src/platform/nm-platform.h:

- gint64 owner;
- gint64 group;
+ time_t owner;
+ time_t group;

Huh?

src/platform/nm-platform.c:

+static gboolean
+do_ip4_address_add (char **argv)
+{
+	int ifindex = parse_ifindex (*argv++);
+	ip4_t address;
+	int plen;

newline here please, same for do_ip6_address_add().

+	if (ifindex && parse_ip4_address (*argv++, &address, &plen)) {
+		guint32 lifetime = strtol (*argv++, NULL, 10);
Comment 15 Dan Williams 2013-07-18 17:01:02 UTC
> TMP: dhcp: remove *_get_lease_config() functions

This code exists for one reason: to distinguish between multiple DHCP-enabled connections that might apply to a single device, when NM is restarted.

If you have "DHCP#1" and "DHCP#2" and both happen to apply to eth0, then when NM starts and the interface has an address 192.168.0.10, there's no way to know which connection you need to use other than (a) asking dhclient what the current values are, which we can't do because nobody builds it with OMAPI, or (b) looking in the leasefiles to find out what leases are current and what addresses they got.

Whether or not we want to retain this is another question, but we need to be aware that removing this code will cause the matching code to be incorrect in this case.

> dhcp: convey DHCP lifetime to nm-ip[46]-config

We should sanity check the time returned from DHCP.

Plus, dhclient doesn't send "expiry".  It appears to send "expire" and "dhcp_lease_time":

               dict entry(
                  string "expiry"
                  variant                      string "1374167251"
               )
               dict entry(
                  string "dhcp_lease_time"
                  variant                      string "600"
               )

it might be easier to just use dhcp_lease_time and add that to the current time than trying to parse random values that come back from the DHCP client.  We may need to try a couple values depending on how the dhcp client returns them (eg, I don't think "expiry" is a standard thing, but "dhcp_lease_time" should be a standard option).
Comment 16 Dan Williams 2013-07-19 14:51:09 UTC
Plus, dhclient doesn't send "expiry".  It appears to send "expire" and
"dhcp_lease_time":

---> by which I meant dhclient sends *expiry* but does not appear to send 'expire'.
Comment 17 Dan Williams 2013-07-19 17:47:41 UTC
With the new push, this comment still stands:

"Actually, 'found' is reset to FALSE.  It's done from the inner for() loop
initialization."

except now it's for "core: replace nm_ip[46]_config_diff()", the commit message for which states:

    As a sidenote, the original implementation was buggy. Some of the helper
    functions used the 'found' variable improperly by not resetting it in
    each step of the outer loop.

which AFAICT is actually false due to the initialization of "found" in the inner for() statement initializer.

Instead of that commit message, I would just say something like:

"Callers of these functions now only care whether two IP configs are different and not what specific property changed, so we can simplify this code down to a simple comparison for equality, based on the hashing that's already done for the DNS manager."
Comment 18 Pavel Simerda 2013-07-19 21:30:16 UTC
(In reply to comment #12)
> it might be easier to just use dhcp_lease_time and add that to the current time
> than trying to parse random values that come back from the DHCP client.

This one is also still valid. TODO.
Comment 19 Pavel Simerda 2013-07-22 11:10:22 UTC
Over the weekend, I merged some of the patches to master together with pavlix/rdisc to avoid future conflicts. Still a larger part of pavlix/runtime patches are there to polish and merge.
Comment 20 Pavel Simerda 2013-07-25 23:13:38 UTC
Created attachment 250151 [details] [review]
core: match connections to connections instead of devices

I squashed two related patches together to form the following patch. I'd be glad if you could review it, as it's the basis for the rest of the work being done in this branch.

My intention was to create a patch that would bridge the current connection assumption method with the new one. If the connection generation and comparison doesn't work 100% (which it doesn't), it's OK for now. But please look for anything that could break the current match_l2_config() method.

commit 62a34d1317fb3c908845e971f25f61c0e54fcc61
Author: Pavel Šimerda <psimerda@redhat.com>
Date:   Thu Jun 27 14:39:13 2013 +0200

    core: match connections to connections instead of devices
    
    This backwards compatible patch adds the possibility to use new
    nm_device_generate_connection() API via generate_connection() virtual
    method implementations in NMDevice subclasses.
    
    Compatibility is achieved by first trying to use the former API and
    match_l2_config() virtual method and only then moving on to
    generate_connection().
    
    The nm_device_generate_connection() wrapper calls nm_connection_verify()
    to avoid using the result of NMDevice's generate_connection() directly
    with a pleasant side effects of enforcing validity for generated
    connections.
    
    For more details see the source code comments.
    
    Please note that the generate_connection() method doesn't implement DHCP
    lease configuration matching. We mostly agreed on IRC that lease
    matching is not necessary. Even if there is a use case for that, we're
    going to fix it later.
Comment 21 Dan Williams 2013-07-28 13:49:05 UTC
- what's the reasoning for keeping the old connection matching API, just to keep things working for now until the new stuff lands?

- nm_device_generate_connection() creates a "Generic" connection, but that doesn't get overridden by the device subclasses in update_connection.  So most connections won't match.  The subclasses should remove the Generic setting and replace that with the appropriate type-specific setting, and change NM_SETTING_CONNECTION_CONNECTION_TYPE.  I know generate_connection() currently only works for software links, but we do have settings for most of those.

- I don't think we should name generated connections "eth0-foreign" for a few reasons; first, we want these connections to be first-class NM citizens, so they really aren't foreign, they're just the connection the device happens to be currently using.  Second, it's not a very human-readable name, we should be using something like "eth0 Connection" or if we ever match WiFi, just the SSID or something like that.
Comment 22 Pavel Simerda 2013-07-29 11:10:58 UTC
Just please note that I recently changed my mind about the architecture *because* I was working on live reconfiguration of NMConnection. I think you will like the new architecture which uses the update_connection() virtual function.

(In reply to comment #21)
> - what's the reasoning for keeping the old connection matching API, just to
> keep things working for now until the new stuff lands?

Exactly.

> - nm_device_generate_connection() creates a "Generic" connection, but that
> doesn't get overridden by the device subclasses in update_connection.

Ah, good. You're looking at the new update_connection() version from Saturday, which is a bit different from what we are talking about on IRC. Finally it proved more practical to step back to the original version as update_connection() can be used for both generation and live update.

> So most connections won't match.

Exactly. The patch is a step from *old API* to *dual API*, where the next step would be to switch all subclasses one by one to the new API (which consists of a class attribute connection_type and virtual function update_connection()). When all subclasses are converted, the old API can be abandoned.

> The subclasses should remove the Generic setting and
> replace that with the appropriate type-specific setting, and change
> NM_SETTING_CONNECTION_CONNECTION_TYPE.

Each subclass should implement update_connection() which you should see for ethernet, vlan, bridge and bond and each of them should also fill in klass->connection_type, which I'll fill in.

> I know generate_connection() currently
> only works for software links, but we do have settings for most of those.

Nope. It works for *software links* or *links with at least one IP address*. This is just a tool to detect unconfigured interfaces and allow their automatic configuration.

> - I don't think we should name generated connections "eth0-foreign" for a few
> reasons; first, we want these connections to be first-class NM citizens, so
> they really aren't foreign, they're just the connection the device happens to
> be currently using.

Note that this name only applies to connections that failed to match any existing connection and therefore come from a different tool than NetworkManager (unless the matching is broken, of course).

But I have chosen the name arbitrarily and feel free to choose a different one. 

> Second, it's not a very human-readable name, we should be
> using something like "eth0 Connection"

Also please see bug 693201 which I was trying to adhere to.

> or if we ever match WiFi, just the SSID or something like that.

We currently don't support connection matching for WiFi so in my code I don't support connection generation for WiFi. Nor do I have a precise idea about how that should work.

I see two possibilities.

1) A true foreign connection. The NMActiveConnection would have an attribute that specifies the connection is foreign and we just want to track it and avoid touching it. This would be good to detect non-NetworkManager routing information including the default route.

2) A connection taken over by NetworkManager. We would try to destroy the connection and kill the tools running it, then reconstruct it with NetworkManager. This would bring us eternal damnation from users of plain wpa_supplicant and other tools. Or at least they would mark the particular device *unmanaged*.

But either of them is out of scope of the work I currently want to finish.
Comment 23 Dan Winship 2013-07-30 13:46:27 UTC
assuming Pavel didn't actually mean to change this to NEEDINFO/High/critical
Comment 24 Pavel Simerda 2013-07-30 15:56:06 UTC
(In reply to comment #23)
> assuming Pavel didn't actually mean to change this to NEEDINFO/High/critical

Thanks.
Comment 25 Pavel Simerda 2013-07-30 17:27:07 UTC
Updated the branch, mostly just comment cleanups and rebasing. The current version of the branch contains just two patches. Please only review the first one as the second one is still work in progress.
Comment 26 Pavel Simerda 2013-08-13 19:54:04 UTC
Updated with fixes suggested by Thomas Haller.
Comment 27 Dan Williams 2013-08-13 21:36:21 UTC
> libnm-util: add NM_SETTING_COMPARE_FLAG_CANDIDATE flag

If it's not actually meant for public consumption, we should leave the flag out of the enum and put it as a #define into libnm-util/nm-setting-private.h which is available at build time, but doesn't get distributed as public API.  We should also bump the value to 0x80000000 too.

Maybe add a placeholder comment at the end of the enum to make sure we never make a public value for it.

> core: match connections to connections instead of devices

- in nm_device_generate_connection() lets just move the setting stuff to the top and set it after the (!generate), even though yes, that means a few extra LoC.  That way we skip an ugly indent block.

- the g_object_set() has a stray newline at the start of the argument block

- should we really print an error if we can't generate the connection?

- The connection name created in generate_connection() should be something more descriptive, like "Wired connection 1" or " something like that.  "eth0-foreign" doesn't look good when shown in a UI.  The name should be something generic and friendly that gets shown to users, and "eth0-foreign" isn't that.  I would actually make the generic code here name it "eth0 connection" and then have all the subclasses that implement update_connection() change it to something else more descriptive if they want to.

- generate_connection() also shouldn't set the type to GENERIC since it's not even adding a NMSettingGeneric.  It should just require the subclasses to always update the connection type; that way we can avoid the klass->connection_type stuff completely too.

- I think update_ip_config() should be using priv->ip4_config and priv->ip6_config (ie, the merged configs) instead of just the captured configs, just for consistency.  Also, it's not going to work like you want if the user removes addresses/routes underneath NM, since they would have been added to the Connection when the user added them, and now that the user removed them from the kernel, how do they get removed from the Connection?  One way to do this would be to diff the old priv->ext_ipX_config to the new one, and remove from the Connection what got removed from the NMIPxConfig, then add back everything from the captured config.  Or something like that.

- Also, update_ip_config() will now add static DNS servers to the connection that actually are dynamic.  If we get a DNS server from RA or DHCP, ti'll get added to the connection as a static one.  Before we start updating the live device config from runtime info, we need to add tags for DNS information too (eg auto, static, etc).  Same for search domains and just about everything else in the IP configs :(.

I think the changes to update_ip_config() should probably be a separate patch so we can work on getting all the edge-cases like this iron out; I think it's separate from the Assume stuff that nm_device_generate_connection() does and one can merged without the other.

- Can we rename get_connection() to get_existing_connection() or get_runtime_connection() or something like that to better reflect what it does?

- Some whitespace issues in get_connection() in the debug log message arguments

> core: implement update_connection() for ethernet/vlan/bond/bridge

- if it's a master interface, update_connection() needs to set NM_SETTING_BRIDGE_INTERFACE_NAME or NM_SETTING_BOND_INTERFACE_NAME or whatever to the interface name

- Need to fix the VLAN update_connection() stuff to do the right thing

- Stray newline on g_object_set() in VLAN update_connection()

- "NM_SETTING_VLAN_INTERFACE_NAME, nm_platform_link_get_mtu (ifindex)," ???

- Pull the VLAN parent interface name from priv->parent
Comment 28 Dan Williams 2013-08-13 21:45:03 UTC
Also, there should be code to ensure there aren't duplicate connection names when generating these connections.  The complete_connection() code in nm-device.c already does that.
Comment 29 Pavel Simerda 2013-08-14 18:07:46 UTC
(In reply to comment #27)
> > libnm-util: add NM_SETTING_COMPARE_FLAG_CANDIDATE flag
> 
> If it's not actually meant for public consumption, we should leave the flag out
> of the enum and put it as a #define into libnm-util/nm-setting-private.h which
> is available at build time, but doesn't get distributed as public API.  We
> should also bump the value to 0x80000000 too.
> 
> Maybe add a placeholder comment at the end of the enum to make sure we never
> make a public value for it.

Done. Though I'm not sure whether most other flags were meant as public API either.

> > core: match connections to connections instead of devices
> 
> - in nm_device_generate_connection() lets just move the setting stuff to the
> top and set it after the (!generate), even though yes, that means a few extra
> LoC.  That way we skip an ugly indent block.

Done. Not sure though whether it's less ugly, though ;).

> - the g_object_set() has a stray newline at the start of the argument block

Done.

> - should we really print an error if we can't generate the connection?

OK. I changed it to just g_return_if_fail() assertion as it is a programming error not to generate an invalid connection when update_connection() was already used.

> - The connection name created in generate_connection() should be something more
> descriptive, like "Wired connection 1" or " something like that. 
> "eth0-foreign" doesn't look good when shown in a UI.  The name should be
> something generic and friendly that gets shown to users, and "eth0-foreign"
> isn't that.  I would actually make the generic code here name it "eth0
> connection"

Done.

> and then have all the subclasses that implement update_connection()
> change it to something else more descriptive if they want to.

Sure.

> - generate_connection() also shouldn't set the type to GENERIC since it's not
> even adding a NMSettingGeneric.

Done.

> It should just require the subclasses to
> always update the connection type; that way we can avoid the
> klass->connection_type stuff completely too.

My objective was to keep subclasses as easy as possible. The connection_type class attribute helps to avoid having to modify s_con in subclasses and might be handy whenever you want to know the name of connection type belonging to NMDevice.

Also, connection type should be set once in generate_connection(), not on each update in update_connection() but generate_connection() is not a virtual function in the current version as connection_type was the only thing that needs to be different before you reminded me of bridge/bond's interface-name (but that's a redundant option anyway so we could just allow it to be NULL).

It might be also helpful to add link_type to each device and then you could simplify platform_link_added_cb() in nm-manager to just go through the list of available device classes (preferably registered with nm-manager) and find a suitable device class (use NMDeviceGeneric as the last resort).

> - I think update_ip_config() should be using priv->ip4_config and
> priv->ip6_config (ie, the merged configs) instead of just the captured configs,
> just for consistency.  Also, it's not going to work like you want if the user
> removes addresses/routes underneath NM, since they would have been added to the
> Connection when the user added them, and now that the user removed them from
> the kernel, how do they get removed from the Connection?  One way to do this
> would be to diff the old priv->ext_ipX_config to the new one, and remove from
> the Connection what got removed from the NMIPxConfig, then add back everything
> from the captured config.  Or something like that.
> 
> - Also, update_ip_config() will now add static DNS servers to the connection
> that actually are dynamic.  If we get a DNS server from RA or DHCP, ti'll get
> added to the connection as a static one.  Before we start updating the live
> device config from runtime info, we need to add tags for DNS information too
> (eg auto, static, etc).  Same for search domains and just about everything else
> in the IP configs :(.
> 
> I think the changes to update_ip_config() should probably be a separate patch
> so we can work on getting all the edge-cases like this iron out; I think it's
> separate from the Assume stuff that nm_device_generate_connection() does and
> one can merged without the other.

Split the live connection update part into a separate patch.

> - Can we rename get_connection() to get_existing_connection() or
> get_runtime_connection() or something like that to better reflect what it does?

Out of the three names above, get_connection() is the only one that fits. You would have to use get_generated_or_existing_connection() or something like that, but even that doesn't convey the real purpose.

I added some inline documentation for now.

> - Some whitespace issues in get_connection() in the debug log message arguments

Yep, the alignment breaks as usual. One of them was brought in from Thomas's fixup, the other was created by vim which re-tabs the spaces on each indentation change (haven't yet found any plugin that would support mixed tab indentation and space alignment, all other projects seem to just resign on alignment consistency).

> > core: implement update_connection() for ethernet/vlan/bond/bridge
> 
> - if it's a master interface, update_connection() needs to set
> NM_SETTING_BRIDGE_INTERFACE_NAME or NM_SETTING_BOND_INTERFACE_NAME or whatever
> to the interface name

As written above, I'm curious whether we can't just get rid of those. Even nmcli does its best to avoid those dupes. I'll fix it otherwise.

> - Need to fix the VLAN update_connection() stuff to do the right thing
> 
> - Stray newline on g_object_set() in VLAN update_connection()

Was trying to save all the work with spaced alignment.

> - "NM_SETTING_VLAN_INTERFACE_NAME, nm_platform_link_get_mtu (ifindex)," ???

Done.

> - Pull the VLAN parent interface name from priv->parent

Done.

(In reply to comment #28)
> Also, there should be code to ensure there aren't duplicate connection names
> when generating these connections.  The complete_connection() code in
> nm-device.c already does that.

Could we document the exact purpose of complete_connection()? I thought we could use it from nm_device_generate_connection() but then realized it's an abstract method and therefore cannot actually do what you say.
Comment 30 Dan Winship 2013-08-19 21:24:54 UTC
NM_SETTING_COMPARE_FLAG_CANDIDATE is not a very descriptive name. I don't have a better suggestion but figured I'd mention this in case anyone else does...

> core: match connections to connections instead of devices

>  * NMSetting's compare_property() implementations in combination with
>  NM_SETTING_COMPARE_FLAG_CANDIDATE are not yet fully ready thus rendering
>  false negatives in some cases. This can be fixed in a separate patch.

It would be better to have that patch come first, so we don't have non-working commits in the tree that might complicate future git-bisects.

Though it seems that that patch doesn't even exist yet...

>+	/* Check the connection in case of update_connection() bug. */
>+	g_return_val_if_fail (nm_connection_verify (connection, NULL), NULL);

I think it would be better to call nm_connection_verify() with a GError, and nm_log_warn() with the error message if it fails.

>+	 * The update_connection() virtual function is also used for life
>+	 * reconfiguration of the connection according to link level changes.

"live" reconfiguration


> WIP: connection live reconfig

is this patch supposed to be there?


> core: implement update_connection() for ethernet/vlan/bond/bridge

In nm-device-ethernet, you need to do something to replace the check for 802.1x / PPPoE.

Although I'm not sure if there's any way to find out if 802.1x was used... does the supplicant stay running?
Comment 31 Dan Winship 2013-08-20 14:04:36 UTC
another thing: we need to create NMSettingBridgePort settings for bridge ports
Comment 32 Pavel Simerda 2013-08-20 15:02:23 UTC
(In reply to comment #30)
> NM_SETTING_COMPARE_FLAG_CANDIDATE is not a very descriptive name. I don't have
> a better suggestion but figured I'd mention this in case anyone else does...

Sure. I spent quite some time and this one was the most descriptive one as it's for checking whether a connection is a candidate for a generated one. By the way, NM_SETTING_COMPARE_FLAG_FUZZY is not even as descriptive as this one, and also seems to be only used by NM code, so it might also benefit from a rename.

> > core: match connections to connections instead of devices
> >  * NMSetting's compare_property() implementations in combination with
> >  NM_SETTING_COMPARE_FLAG_CANDIDATE are not yet fully ready thus rendering
> >  false negatives in some cases. This can be fixed in a separate patch.
> 
> It would be better to have that patch come first, so we don't have non-working
> commits in the tree that might complicate future git-bisects.

Where first actually means *before* or *when* the device-specific patches are merged, not necessarily before this particular patch is merged. Not that it would make much sense to do git-bisect accross patches before and after a change like this one.

> Though it seems that that patch doesn't even exist yet...

Currently it looks like ethernet will need some care because of 'mtu', others are unknown. I'm still trying to set up a proper test environment for connection-connection matching.

Many false negatives will not be known until testing in real-world environments but that's about new use cases that didn't work before. Note that the worst case scenario is that a candidate connection wouldn't be recognized.

> >+	/* Check the connection in case of update_connection() bug. */
> >+	g_return_val_if_fail (nm_connection_verify (connection, NULL), NULL);
> 
> I think it would be better to call nm_connection_verify() with a GError, and
> nm_log_warn() with the error message if it fails.

I changed after dcbw's comments which were not very specific. I don't really care about that one or you can even change it afterwards. It just catches a programmer's error of constructing an invalid connection from a device subclass but that's written in the comment anyway.

> >+	 * The update_connection() virtual function is also used for life
> >+	 * reconfiguration of the connection according to link level changes.
> 
> "live" reconfiguration

Fixed.

> > WIP: connection live reconfig
> 
> is this patch supposed to be there?

Sure but it doesn't need to be reviewed, *yet*. It's a work in progress patch created according to Dan Williams' request to split out this stuff. I reworked it thoroughly but is still not complete and I'm not 100% sure about dcbw's plans regarding compositing the ip configs in nm-device.

Please fetch the current version of the branch and look at the new patch and specifically in the comments, whenever you have time for that.

> > core: implement update_connection() for ethernet/vlan/bond/bridge
> 
> In nm-device-ethernet, you need to do something to replace the check for 802.1x
> / PPPoE.

I *think* this is already solved because those connections will be considered different by nm_connection_compare().

> Although I'm not sure if there's any way to find out if 802.1x was used... does
> the supplicant stay running?

My interpretation has always been that NetworkManager MUST kill all daemons it started upon exit to allow for security updates if nothing else. AFAIK wpa_supplicant is handled correctly according to this policy.

Also, wpa_supplicant connections are currently not expected to be assumed. Therefore I consider the patchset correct in this regard.

(In reply to comment #31)
> another thing: we need to create NMSettingBridgePort settings for bridge ports

That is a new feature, right? That means we don't risc any regression pushing the patches without this feature. TODO.

******

I updated the patchset with lots of small improvements. Apart from those mentioned in the reply above, there's the interface-name fix for both bridges and bonds, lots of inline comments, and a rather standalone patch at the end to be considered.
Comment 33 Dan Winship 2013-08-20 17:11:47 UTC
(In reply to comment #32)
> Not that it
> would make much sense to do git-bisect accross patches before and after a
> change like this one.

Sure it would. There are lots of things that could potentially regress from 0.9.8 to 0.9.10, that you wouldn't know if it happened before or after this change.

At any rate, it sounds like what you're saying is that we know we'll need this flag at some point, but we don't know *exactly* what we'll need it for yet, in which case I guess it's fine to commit this way.

> > > WIP: connection live reconfig
> > 
> > is this patch supposed to be there?

> Please fetch the current version of the branch and look at the new patch and
> specifically in the comments, whenever you have time for that.

The particular thing that I'd noticed before is that in update_ip_config(), you added an NMIP4Config variable that doesn't get used, and made the NMIP6Config be auto-freed, but also left the code to manually free it. So, it seemed pretty half-baked.

> > In nm-device-ethernet, you need to do something to replace the check for 802.1x
> > / PPPoE.
> 
> I *think* this is already solved because those connections will be considered
> different by nm_connection_compare().

Ah, yes, it works for the case of connection assumption. But it's wrong for the more general automatically-generate-connections-for-devices case.

> > Although I'm not sure if there's any way to find out if 802.1x was used... does
> > the supplicant stay running?
> 
> My interpretation has always been that NetworkManager MUST kill all daemons it
> started upon exit to allow for security updates if nothing else.

Actually, I was wondering if the supplicant stays running at all (even without NM exiting), since as I understand it, in the 802.1x case, the supplicant is only needed when bringing up the connection, and not afterward, so it could just exit as soon as it does the initial auth. I may be mistaken about that.

In some cases, killing a daemon associated with a connection is guaranteed to take down that connection. (Eg, any VPN connection, or anything that uses ppp. Not sure how necessary wpa_supplicant is to a wifi connection.) So killing daemons on restart conflicts with the goal of keeping connections up.

> > another thing: we need to create NMSettingBridgePort settings for bridge ports
> 
> That is a new feature, right? That means we don't risc any regression pushing
> the patches without this feature. TODO.

With the old code, it wouldn't be able to assume the bridge connection. With the new code, it might assume it incorrectly. But this can wait I guess.

Actually, nm_device_generate_connection() needs to set NMSettingConnection:master when appropriate too, or it won't be able to assume slaves at all.
Comment 34 Pavel Simerda 2013-08-22 13:07:36 UTC
My yesterday's status didn't get here (bugzilla threw an error but I had to leave the building already), so I'll comment today.

I updated 'pavlix/runtime' with a version where nm-device-ethernet connection generation and assumption works correctly for me, which IMO makes the first three patches ready to be pushed. Please review and comment.

I'll be available later during the day.
Comment 35 Pavel Simerda 2013-08-22 16:52:56 UTC
(In reply to comment #33)
> At any rate, it sounds like what you're saying is that we know we'll need this
> flag at some point, but we don't know *exactly* what we'll need it for yet, in
> which case I guess it's fine to commit this way.

Correct. Marked as so in the commit message.

> The particular thing that I'd noticed before is that in update_ip_config(), you
> added an NMIP4Config variable that doesn't get used, and made the NMIP6Config
> be auto-freed, but also left the code to manually free it. So, it seemed pretty
> half-baked.

Sounds like a git rebase artefact then. Please ignore it for now then.

> > I *think* this is already solved because those connections will be considered
> > different by nm_connection_compare().
> 
> Ah, yes, it works for the case of connection assumption.

Good then.

> But it's wrong for the
> more general automatically-generate-connections-for-devices case.

Good to know. My position on that is that connection generation is a best effort operation anyway. Therefore the only thing we have to do is to ensure that NMDeviceEthernet knows such a connection is not assumable and therefore it must clean it up on exit. But in that case it's trivial to check.

> Actually, I was wondering if the supplicant stays running at all (even without
> NM exiting), since as I understand it, in the 802.1x case, the supplicant is
> only needed when bringing up the connection, and not afterward, so it could
> just exit as soon as it does the initial auth. I may be mistaken about that.

Good. Anyway if we just record that the device's current connection is not assumable and clean it up on exit, we're done. If you ever want to make even those connections assumable, you'd need a bit more persistency but that's another task.

> In some cases, killing a daemon associated with a connection is guaranteed to
> take down that connection.

We discussed this a lot with Jiří Pírko because of teamd. I think the best way is to kill those daemons anyway. If you really want to have connection assumption support, it should be supported by those daemons as well.

For daemons like teamd and wpa_supplicant it would mean they could leave the connection running accross a restart (or boottime take over) but the features they provide (checking the master, switching to backup, rekeying, etc.) would be temporarily suspended. For the details you have to look at the individual technologies.

> (Eg, any VPN connection,

Some of the VPNs use kernel IPsec support and could at least theoretically work accross restarts.

> or anything that uses ppp.

Other VPNs and anything on PPP are IMO the only ones whose connectivity actually depends on a userspace daemon. There are various solutions to that. We can discuss them when this issue gets more urgent.

> Not sure how necessary wpa_supplicant is to a wifi connection.) So killing
> daemons on restart conflicts with the goal of keeping connections up.

I'd like to remind you that it is *not* a goal to keep connections up when NetworkManager is stopped. Rather it is a *way* to achieve the following goals:

1) Take over connections from a NetworkManager instance initramfs. Possibly also from a simpler tool.

2) Allow NetworkManager and all other software to be restarted in order to apply security updates at runtime.

By leaving unmanaged daemons running, we explicitly throw goal #2 away (or severely limit it, which is the same in context of security). Therefore it's only useful for various use cases of #1.

At least this is how I understand it from our looong discussions.

> > > another thing: we need to create NMSettingBridgePort settings for bridge ports
> > 
> > That is a new feature, right? That means we don't risc any regression pushing
> > the patches without this feature. TODO.
> 
> With the old code, it wouldn't be able to assume the bridge connection. With
> the new code, it might assume it incorrectly.

Actually, when one of the connections has more specific information, they don't match unless those information are explicitly ignored by FUZZY/CANDIDATE. Therefore unless there's a real bug, you should only get false positives, not negatives.

That leads me to a particularly interesting idea that we could use the same methods to see whether the connection assumption *would* work. Therefore nm_device_generate_connection() together with nm_connection_compare() and the respective flags could serve another purpose.

> Actually, nm_device_generate_connection() needs to set
> NMSettingConnection:master when appropriate too, or it won't be able to assume
> slaves at all.

Yes. Like it is done for VLAN parent except that one is also wrong. The bad thing is that (1) we support choosing both connections and devices for 'master' and (2) that either way we have interconnection dependencies here which basically require checking the devices in the right order. Looks like this one will be too tricky to properly solve as part of this feature.
Comment 36 Pavel Simerda 2013-08-22 18:38:45 UTC
(In reply to comment #35)
> > The particular thing that I'd noticed before is that in update_ip_config(), you
> > added an NMIP4Config variable that doesn't get used, and made the NMIP6Config
> > be auto-freed, but also left the code to manually free it. So, it seemed pretty
> > half-baked.
> 
> Sounds like a git rebase artefact then. Please ignore it for now then.

Actually the current version is worth looking at, especially the FIXME comments. That's about why we're waiting for Dan Williams.
Comment 37 Pavel Simerda 2013-08-23 13:14:24 UTC
Status update:

* nm_device_generate_connection() is now implemented in git master
* nm-manager uses it for connection-to-connection matching for all NMDevice subclasses that implement update_connection()
* NMDeviceEtheret implements update_connection()

Therefore new-style connection assumption is now testable for Ethernet connections. I've found a couple of related and unrelated bugs that I will file separately.
Comment 38 Pavel Simerda 2013-08-23 22:00:17 UTC
Updated branch 'pavlix/runtime':

* Bridges now properly generate connections

Please review and test. Connection matching will have to be improved using the CANDIDATE flag to ignore unset properties in candidate connections.
Comment 39 Pavel Simerda 2013-08-27 21:13:56 UTC
Updated branch 'pavlix/runtime'. Please review all non-WIP non-TMP patches.
Comment 40 Pavel Simerda 2013-08-28 11:57:28 UTC
Updated branch 'pavlix/runtime':

* Bonds (as well as bridges from previous patches) now assume correctly for my tests.
* VLANS are untested but should work.
Comment 41 Dan Winship 2013-08-28 12:27:01 UTC
comments on some but not all of the patches:


> trivial: add some comments to nm-device's link-changed handlers

sure


> settings: enforce s_ip4 and s_ip6 for eligible connections

Definitely definitely agree with the idea. But I think we should do this in NMConnection itself. Then client-side code would always be assured of the presence of both s_ip4 and s_ip6 as well.

Also, we should now get rid of all of the places that check "if (!s_ip4)", etc, too. (At least file a bug for that.)

>+	/* I'm using IGNORE here on dcbw's request temporarily as we don't have

I don't think we need the comment.


> core: assert non-NULL reason for act_stage1_prepare() in NMDevice

This only works if the subclass calls the parent implementation (which some of them don't) and if they call it before looking at "reason" themselves. The standard way to do this is put the sanity checks in the wrapper method. But in this case, nm_device_activate_stage1_device_prepare() is providing the reason arg itself, so it's always guaranteed to be non-NULL. So I don't think this patch is useful, and we should just remove all "reason != NULL" checks from all the act_stage1_prepare implementations instead.


The bridge and bond patches look good. The vlan patch looks pretty incomplete.
Comment 42 Pavel Simerda 2013-08-28 14:23:30 UTC
(In reply to comment #41)
> > settings: enforce s_ip4 and s_ip6 for eligible connections
> 
> Definitely definitely agree with the idea. But I think we should do this in
> NMConnection itself. Then client-side code would always be assured of the
> presence of both s_ip4 and s_ip6 as well.

I suggest using my patch until you decide what's the best way to go.

> Also, we should now get rid of all of the places that check "if (!s_ip4)", etc,
> too. (At least file a bug for that.)

One of the WIP/TMP patches is an example of that (covering nm-device.c).

> >+	/* I'm using IGNORE here on dcbw's request temporarily as we don't have
> 
> I don't think we need the comment.

I turned it into a FIXME command. It's important because using IGNORE there is just an ugly hack until the noted bug is resolved. Let me know if you prefer different wording, though.

> > core: assert non-NULL reason for act_stage1_prepare() in NMDevice
> 
> This only works if the subclass calls the parent implementation (which some of
> them don't) and if they call it before looking at "reason" themselves. The
> standard way to do this is put the sanity checks in the wrapper method. But in
> this case, nm_device_activate_stage1_device_prepare() is providing the reason
> arg itself, so it's always guaranteed to be non-NULL. So I don't think this
> patch is useful, and we should just remove all "reason != NULL" checks from all
> the act_stage1_prepare implementations instead.

Thanks for pointing it out. Removed the patch.

> The bridge and bond patches look good.

Thanks.

> The vlan patch looks pretty incomplete.

Indeed. I forgot about the parent searching problem.
Comment 43 Pavel Simerda 2013-08-28 15:23:13 UTC
Updated VLAN patch in 'pavlix/runtime'. Ready for review.
Comment 44 Dan Williams 2013-08-29 22:35:03 UTC
Ok, tested with a basic static IP ethernet connection...

1) The generated NMIP4Config doesn't read the gateway, thus won't match the existing connection, but also won't be correct or get marked 'default'.  I guess we should update nm_ip[4|6]_config_capture() to read the default route and if the default route is for this device, set the gateway in the NMIP4Config.  See dcbw/matching-fixes; it's an RFC showing my questions and not for merging.

2) The generated IP4Config address has a lifetime of less than infinity, and thus nm_ip4_config_update_setting() thinks the IPv4 method is 'auto' not static.  This seems to be due to nm_platform_ip[4|6]_address_sync() attempting to shift 'infinity' address lifetimes, when I don't think infinity address lifetimes should be touched at all?  Can you re-verify that the shifting does what we want it to?  Otherwise I suppose we could modify the nm_ip[4|6]_config_update_setting() to assume method = static if the lifetime is within some range of FFFFFFFF, but that would mis-handle RA-provided addresses/routes where teh lifetime was infinity too.

This is a general problem in the code, not specifically related to pavlix/runtime, but using pavlix/runtime causes the problem to show up.

3) And the DNS problem...  Obviously  the DNS info will only be in /etc/resolv.conf and of course, is combined, and thus we have no idea what devices it applies to.

What should probably happen is to read the DNS info out of, add it to the generated connection, then attempt to match saved connections.  If that fails, strip out the DNS info then re-match.  This would at least ensure that we found the right connection between two connections with different DNS information.

4) can we use IGNORE for the IPv6 method for now instead of LINK_LOCAL?  When we do combine the two then we can just use the result, but for now most IPv6 connections saved on disk will be using IGNORE, not LINK_LOCAL.

That's all for now... THanks!
Comment 45 Pavel Simerda 2013-08-30 08:03:45 UTC
(In reply to comment #44)
> Ok, tested with a basic static IP ethernet connection...
> 
> 1) The generated NMIP4Config doesn't read the gateway, thus won't match the
> existing connection, but also won't be correct or get marked 'default'.  I
> guess we should update nm_ip[4|6]_config_capture() to read the default route
> and if the default route is for this device, set the gateway in the
> NMIP4Config.  See dcbw/matching-fixes; it's an RFC showing my questions and not
> for merging.

As the gateway is a result of nm-policy decision, not only application of the NMConnection, there are two possible approaches:

a) As you suggest, read the default gateway from the device and use it for matching. It can only help matching of a connection that actually has a default route.

b) Ignore the gateway when matching. Treat other options as sufficient for matching. When multiple connections are matched, use the most recently used one.

> 2) The generated IP4Config address has a lifetime of less than infinity, and
> thus nm_ip4_config_update_setting() thinks the IPv4 method is 'auto' not
> static.  This seems to be due to nm_platform_ip[4|6]_address_sync() attempting
> to shift 'infinity' address lifetimes, when I don't think infinity address
> lifetimes should be touched at all?

There's a bug. So far I've only seen it with IPv6 and I have it in my notes. For IPv6, it's reproducable immediately upon connection assumption during NetworkManager start.

> Can you re-verify that the shifting does
> what we want it to?  Otherwise I suppose we could modify the
> nm_ip[4|6]_config_update_setting() to assume method = static if the lifetime is
> within some range of FFFFFFFF,

I'd prefer to fix the bug above.

> but that would mis-handle RA-provided
> addresses/routes where teh lifetime was infinity too.

That is mis-handled already. Infinity value for autoconf will be treated as static anyway. There's currently nothing we can do about it except an ugly hack of using a high finite value as a flag meaning 'auto'.

> This is a general problem in the code, not specifically related to
> pavlix/runtime, but using pavlix/runtime causes the problem to show up.
> 
> 3) And the DNS problem...  Obviously  the DNS info will only be in
> /etc/resolv.conf and of course, is combined, and thus we have no idea what
> devices it applies to.

I'd prefer not trying to magically reinterpret /etc/resolv.conf. Until we have proper DNS read/write support (whether with dnsmasq, unbound or whatever else), I would prefer to ignore DNS information in the matching.

> What should probably happen is to read the DNS info out of, add it to the
> generated connection, then attempt to match saved connections.  If that fails,
> strip out the DNS info then re-match.

Sounds like added complexity without solving the problem.

> This would at least ensure that we found
> the right connection between two connections with different DNS information.

Until DNS is read properly from some split DNS capable tool, I suggest to ignore DNS entirely and pick the most recently used connection when multiple connections match.

> 4) can we use IGNORE for the IPv6 method for now instead of LINK_LOCAL?

Actually I still prefer LINK_LOCAL as you already said that it's just another name for DISABLED or DISABLED itself. IGNORE is a hack and using it for anything else than its original purpose will only confuse everyone even more.

> When
> we do combine the two then we can just use the result, but for now most IPv6
> connections saved on disk will be using IGNORE,

I guess most of the connections now use AUTO unless the user/administrator specified otherwise. If he wrongly chose IGNORE because he thought it means DISABLE, that's a problem but using IGNORE in the wrong sense in part of the code and in the orignal sense in the rest of the code will only bring more problems and more confusion.

If you do insist on treating IGNORE as DISABLED, I'm *not* against. But I would like to start using DISABLED at that time and I would like to use DISABLED for *all* connections without global IPv6 addresses.

> not LINK_LOCAL.
> 
> That's all for now... THanks!

Thanks, please comment.
Comment 46 Pavel Simerda 2013-08-30 09:08:15 UTC
(In reply to comment #45)
> (In reply to comment #44)
> > 2) The generated IP4Config address has a lifetime of less than infinity, and
> > thus nm_ip4_config_update_setting() thinks the IPv4 method is 'auto' not
> > static.  This seems to be due to nm_platform_ip[4|6]_address_sync() attempting
> > to shift 'infinity' address lifetimes, when I don't think infinity address
> > lifetimes should be touched at all?
> 
> There's a bug. So far I've only seen it with IPv6 and I have it in my notes.
> For IPv6, it's reproducable immediately upon connection assumption during
> NetworkManager start.

The bug is fairly trivial. See the fix provided in pavlix/runtime:

c922bee platform: fix and simplify address lifetime compensation

Also updated 'pavlix/runtime' according to:

commit ff89e98fac6b7ffdcc20798aaef710e340ab0ef0
Author: Colin Walters <walters@verbum.org>
Date:   Thu Aug 29 15:37:57 2013 -0400

    trivial: consistently #include "libgsystem.h" rather than just gsystem-local-alloc.h
    
    libgsystem contains more than just the local allocation macros; in the
    future we will likely want to make use of some of this such as the
    structured logging support.

Merged the two trivial patches to git master.
Comment 47 Pavel Simerda 2013-08-30 11:20:29 UTC
Updated with Thomas Haller's feedback:

diff --git a/src/devices/nm-device-bond.c b/src/devices/nm-device-bond.c
index 396fc2f..b42c48c 100644
--- a/src/devices/nm-device-bond.c
+++ b/src/devices/nm-device-bond.c
@@ -228,8 +228,8 @@ update_connection (NMDevice *device, NMConnection *connection)
 	const Option *option;
 
 	if (!s_bond) {
-		s_bond = (NMSettingBond *) nm_setting_bond_new ();
-		nm_connection_add_setting (connection, (NMSetting *) s_bond);
+		s_bond = NM_SETTING_BOND (nm_setting_bond_new ());
+		nm_connection_add_setting (connection, NM_SETTING (s_bond));
 	}
 
 	/* FIXME: This is a one-time setting and doesn't need to be updated. */
@@ -240,16 +240,18 @@ update_connection (NMDevice *device, NMConnection *connection)
 			gs_free char *value = nm_platform_master_get_option (ifindex, option->name);
 			char *space;
 
-			/* FIXME: This could be handled in nm-platform. */
-			space = strchr (value, ' ');
-			if (space)
-				*space = '\0';
-
-			if (option->check && !option->check (value))
-				continue;
+			if (value) {
+				/* FIXME: This could be handled in nm-platform. */
+				space = strchr (value, ' ');
+				if (space)
+					*space = '\0';
 
-			if (value)
-				nm_setting_bond_add_option (s_bond, option->name, value);
+				if (!option->check || option->check (value)) {
+					nm_setting_bond_add_option (s_bond, option->name, value);
+					continue;
+				}
+			}
+			nm_setting_bond_remove_option (s_bond, option->name);
 		}
 	}
 }
diff --git a/src/devices/nm-device-bridge.c b/src/devices/nm-device-bridge.c
index 2961373..3dc2367 100644
--- a/src/devices/nm-device-bridge.c
+++ b/src/devices/nm-device-bridge.c
@@ -275,8 +275,8 @@ update_connection (NMDevice *device, NMConnection *connection)
 	const Option *option;
 
 	if (!s_bridge) {
-		s_bridge = (NMSettingBridge *) nm_setting_bridge_new ();
-		nm_connection_add_setting (connection, (NMSetting *) s_bridge);
+		s_bridge = NM_SETTING_BRIDGE (nm_setting_bridge_new ());
+		nm_connection_add_setting (connection, NM_SETTING (s_bridge));
 	}
 
 	/* FIXME: This is a one-time setting and doesn't need to be updated. */
@@ -284,11 +284,15 @@ update_connection (NMDevice *device, NMConnection *connection)
 
 	for (option = master_options; option->name; option++) {
 		gs_free char *str = nm_platform_master_get_option (ifindex, option->sysname);
-		int value = strtol (str, NULL, 10);
+		int value = 0;
 
-		/* See comments in set_sysfs_uint() about centiseconds. */
-		if (option->user_hz_compensate)
-			value /= 100;
+		if (str) {
+			value = strtol (str, NULL, 10);
+
+			/* See comments in set_sysfs_uint() about centiseconds. */
+			if (option->user_hz_compensate)
+				value /= 100;
+		}
 
 		g_object_set (s_bridge, option->name, value, NULL);
 	}
diff --git a/src/devices/nm-device-vlan.c b/src/devices/nm-device-vlan.c
index a183181..aa195a5 100644
--- a/src/devices/nm-device-vlan.c
+++ b/src/devices/nm-device-vlan.c
@@ -272,8 +272,8 @@ update_connection (NMDevice *device, NMConnection *connection)
 	int parent_ifindex;
 
 	if (!s_vlan) {
-		s_vlan = (NMSettingVlan *) nm_setting_vlan_new ();
-		nm_connection_add_setting (connection, (NMSetting *) s_vlan);
+		s_vlan = NM_SETTING_VLAN (nm_setting_vlan_new ());
+		nm_connection_add_setting (connection, NM_SETTING (s_vlan));
 	}
 
 	nm_platform_vlan_get_info (ifindex, &parent_ifindex, &priv->vlan_id);
@@ -283,8 +283,10 @@ update_connection (NMDevice *device, NMConnection *connection)
 	for (list = nm_manager_get_devices (nm_manager_get ()); list ; list = g_slist_next (list)) {
 		NMDevice *item = NM_DEVICE (list->data);
 
-		if (nm_device_get_ifindex (device) == parent_ifindex)
+		if (nm_device_get_ifindex (device) == parent_ifindex) {
 			nm_device_vlan_set_parent (NM_DEVICE_VLAN (device), item);
+			break;
+		}
 	}
 
 	/* FIXME: VLAN parents can be specified by connection uuids as well as well

The bonding part should be tested with external changes, though, as the bonding setting interface is a bit tricky.
Comment 48 Dan Williams 2013-08-30 15:38:48 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > (In reply to comment #44)
> > > 2) The generated IP4Config address has a lifetime of less than infinity, and
> > > thus nm_ip4_config_update_setting() thinks the IPv4 method is 'auto' not
> > > static.  This seems to be due to nm_platform_ip[4|6]_address_sync() attempting
> > > to shift 'infinity' address lifetimes, when I don't think infinity address
> > > lifetimes should be touched at all?
> > 
> > There's a bug. So far I've only seen it with IPv6 and I have it in my notes.
> > For IPv6, it's reproducable immediately upon connection assumption during
> > NetworkManager start.
> 
> The bug is fairly trivial. See the fix provided in pavlix/runtime:
> 
> c922bee platform: fix and simplify address lifetime compensation

Thanks, fix looks right and works for IPv4 too.  Except I think you want to change substract -> subtract.  With that change, please merge that patch to master.
Comment 49 Dan Williams 2013-08-30 15:50:19 UTC
(In reply to comment #45)
> (In reply to comment #44)
> > Ok, tested with a basic static IP ethernet connection...
> > 
> > 1) The generated NMIP4Config doesn't read the gateway, thus won't match the
> > existing connection, but also won't be correct or get marked 'default'.  I
> > guess we should update nm_ip[4|6]_config_capture() to read the default route
> > and if the default route is for this device, set the gateway in the
> > NMIP4Config.  See dcbw/matching-fixes; it's an RFC showing my questions and not
> > for merging.
> 
> As the gateway is a result of nm-policy decision, not only application of the
> NMConnection, there are two possible approaches:
> 
> a) As you suggest, read the default gateway from the device and use it for
> matching. It can only help matching of a connection that actually has a default
> route.
> 
> b) Ignore the gateway when matching. Treat other options as sufficient for
> matching. When multiple connections are matched, use the most recently used
> one.

I would go for (a); the default route tells us what the user/administrator/etc intent was, and I think NM should honor that in the connection details.  Since it's fairly easy to detect, we can use it to determine what the "default" interface is through the D_Bus interface too, to ensure that nmcli, GUI tools, and remote management/status interfaces show the right status.  I pushed a commit to dcbw/matching-fixes for IPv4, but the corresponding IPv6 code depends on me cleaning up and merging dcbw/ip6-change.

> > 2) The generated IP4Config address has a lifetime of less than infinity, and
> > thus nm_ip4_config_update_setting() thinks the IPv4 method is 'auto' not
> > static.  This seems to be due to nm_platform_ip[4|6]_address_sync() attempting
> > to shift 'infinity' address lifetimes, when I don't think infinity address
> > lifetimes should be touched at all?
> 
> There's a bug. So far I've only seen it with IPv6 and I have it in my notes.
> For IPv6, it's reproducable immediately upon connection assumption during
> NetworkManager start.

You've fixed this one as noted already, thanks!  Only change it needs is the substract -> subtract and then it's good to merge to master.

> > Can you re-verify that the shifting does
> > what we want it to?  Otherwise I suppose we could modify the
> > nm_ip[4|6]_config_update_setting() to assume method = static if the lifetime is
> > within some range of FFFFFFFF,
> 
> I'd prefer to fix the bug above.

Yeah, agreed.  I simply didn't know what your recommended course of action was here.

> > but that would mis-handle RA-provided
> > addresses/routes where teh lifetime was infinity too.
> 
> That is mis-handled already. Infinity value for autoconf will be treated as
> static anyway. There's currently nothing we can do about it except an ugly hack
> of using a high finite value as a flag meaning 'auto'.
> 
> > This is a general problem in the code, not specifically related to
> > pavlix/runtime, but using pavlix/runtime causes the problem to show up.
> > 
> > 3) And the DNS problem...  Obviously  the DNS info will only be in
> > /etc/resolv.conf and of course, is combined, and thus we have no idea what
> > devices it applies to.
> 
> I'd prefer not trying to magically reinterpret /etc/resolv.conf. Until we have
> proper DNS read/write support (whether with dnsmasq, unbound or whatever else),
> I would prefer to ignore DNS information in the matching.

Ok, this does mean that if some other connection is started, either automatically, or by the administrator, that it will overwrite whatever information is in /etc/resolv.conf because NM has no idea where the previous DNS information came from.  I'm not sure we can rely on caching nameservers like dnsmasq or unbound here, because we cannot force their use on people, and things would be pretty broken without one.  I still think we need a solution here, at least one which has acceptable fallback.

One other thought: as we were talking about before, some kind of "external DNS config" that would read the existing values from resolv.conf and make the higher priority?  Then add any additional interface DNS information below these.  If the something external modified /etc/resolv.conf then NM would update the external config.

> > What should probably happen is to read the DNS info out of, add it to the
> > generated connection, then attempt to match saved connections.  If that fails,
> > strip out the DNS info then re-match.
> 
> Sounds like added complexity without solving the problem.
> 
> > This would at least ensure that we found
> > the right connection between two connections with different DNS information.
> 
> Until DNS is read properly from some split DNS capable tool, I suggest to
> ignore DNS entirely and pick the most recently used connection when multiple
> connections match.

Yes, the problem here is that the matching code doesn't account for missing DNS information.  So given even a saved static IP connection with some DNS infromation (which would be common) then matching will fail.  One approach is to make the CANDIDATE flag ignore DNS info.

> > 4) can we use IGNORE for the IPv6 method for now instead of LINK_LOCAL?
> 
> Actually I still prefer LINK_LOCAL as you already said that it's just another
> name for DISABLED or DISABLED itself. IGNORE is a hack and using it for
> anything else than its original purpose will only confuse everyone even more.
> 
> > When
> > we do combine the two then we can just use the result, but for now most IPv6
> > connections saved on disk will be using IGNORE,
> 
> I guess most of the connections now use AUTO unless the user/administrator
> specified otherwise. If he wrongly chose IGNORE because he thought it means
> DISABLE, that's a problem but using IGNORE in the wrong sense in part of the
> code and in the orignal sense in the rest of the code will only bring more
> problems and more confusion.
> 
> If you do insist on treating IGNORE as DISABLED, I'm *not* against. But I would
> like to start using DISABLED at that time and I would like to use DISABLED for
> *all* connections without global IPv6 addresses.

I'm not against cleaning up the methods as we've discussed before.  But I"d like to merge the runtime stuff *before* we do that reworking, and thus I don't want the runtime code to depend on whatever future work is there.  We have two choices here:

1) switch the update_setting code over to IGNORE like I originally suggested

2) change the matching code to treat IGNORE and LINK_LOCAL the same when the CANDIDATE flag is passed

I don't really care what we do as long as the matching code works.
Comment 50 Thomas Haller 2013-08-30 16:40:14 UTC
Created attachment 253634 [details] [review]
patch for "platform: fix and simplify address lifetime compensation"
Comment 51 Thomas Haller 2013-08-30 16:41:31 UTC
(In reply to comment #50)

I suggests the attached patch for commit "platform: fix and simplify address lifetime compensation"

Reason: substract_guint32 both receives "now" and "known_address->lifetime" as argument "a". The special treatment for a==MAX_UINT32 makes only sense in the second case.
Comment 52 Dan Williams 2013-08-30 17:18:40 UTC
Also, it appears that for my testing assumption doesn't work, because while the connection is available, in nm-manager.c::add_device() checks:

if (connection && nm_device_can_assume (device, connection))

but nm_device_can_assume() only returns TRUE for devices in the UNAVAILABLE state if they are not set to ignore_carrier:

	} else if (priv->state < NM_DEVICE_STATE_DISCONNECTED) {
		if (priv->state != NM_DEVICE_STATE_UNAVAILABLE || priv->carrier || !priv->ignore_carrier)
			return FALSE;

in my case, priv->state == NM_DEVICE_STATE_UNAVAILABLE, priv->carrier = TRUE, and priv->ignore_carrier = FALSE, so this function returns FALSE and the connection is not assumed.

We could change the add_device() check to simply:

if (connection && nm_device_get_state (device) == NM_DEVICE_STATE_UNAVAILABLE)

and then assumption works.  However, should this assume connections for devices that are default_unmanaged?  Pavel says "no" on IRC, I think NM should assume them since assumption should be a NOP.
Comment 53 Dan Williams 2013-08-30 17:24:15 UTC
Does nm-manager.c::get_connection() need the 'existing' argument?  Doesn't seem to be used anywhere.
Comment 54 Dan Williams 2013-08-30 17:30:08 UTC
Also, if the assumed connection is generated, then add_device() needs to use the cloned NMSettingsConnection object returned from nm_settings_add_connection() instead of the generated connection, otherwise lots of scary warnings + abort because every activated connection must be an NMSettingsConnection.  We hashed this out on IRC already, just adding here for the record.
Comment 55 Dan Williams 2013-08-30 17:40:12 UTC
Next up, for an IPv4-only assumed connection, I get:

NetworkManager[23529]: <info> (p4p1): device state change: ip-config -> ip-check (reason 'none') [70 80 0]
NetworkManager[23529]: <info> Activation (p4p1) Stage 5 of 5 (IPv4 Commit) complete.
NetworkManager[23529]: <error> [1377883611.834177] [rdisc/nm-lndp-rdisc.c:184] send_rs(): (p4p1): cannot send router solicitation: -101.
NetworkManager[23529]: <info> (p4p1): device state change: ip-check -> secondaries (reason 'none') [80 90 0]
NetworkManager[23529]: <info> (p4p1): device state change: secondaries -> activated (reason 'none') [90 100 0]
NetworkManager[23529]: <info> NetworkManager state is now CONNECTED_GLOBAL
NetworkManager[23529]: <info> Activation (p4p1) successful, device activated.
NetworkManager[23529]: <info> startup complete
NetworkManager[23529]: <error> [1377883622.331002] [rdisc/nm-lndp-rdisc.c:184] send_rs(): (p4p1): cannot send router solicitation: -101.
NetworkManager[23529]: <error> [1377883632.335168] [rdisc/nm-lndp-rdisc.c:184] send_rs(): (p4p1): cannot send router solicitation: -101.
<repeats forever>


$ nmcli con show active
NAME  UUID                                  DEVICES  DEFAULT  VPN  MASTER-PATH 
p4p1  48a3c968-68b0-47ad-9e4d-1a954389f488  p4p1     no       no   --          

$ ip route
default via 192.168.1.1 dev p4p1  proto static  metric 1024 
192.168.1.0/24 dev p4p1  proto kernel  scope link  src 192.168.1.20 
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 

$ ip -6 route
fe80::/64 dev br0  proto kernel  metric 256 

$ ip addr
2: p4p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:21:70:be:10:23 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.20/24 scope global p4p1
       valid_lft forever preferred_lft forever

$ nmcli con show config 48a3cxxx
ipv4.method:                            manual
ipv4.dns:                               
ipv4.dns-search:                        
ipv4.addresses:                         { ip = 192.168.1.20/24, gw = 0.0.0.0 }
ipv4.routes:                            
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     --
ipv4.never-default:                     no
ipv4.may-fail:                          yes
ipv6.method:                            link-local
ipv6.dns:                               
ipv6.dns-search:                        
ipv6.addresses:                         
ipv6.routes:                            
ipv6.ignore-auto-routes:                no
ipv6.ignore-auto-dns:                   no
ipv6.never-default:                     no
ipv6.may-fail:                          yes
ipv6.ip6-privacy:                       -1 (unknown)
ipv6.dhcp-hostname:                     --

Note that no /proc/sys/net/ipv6/conf/p4p1 exists, due to a kernel bug I think.

Presumably this is because the interface has no link-local address, because NM assumed the connection and thus does not change IP address details at all, and the interface has no IPv6 address...  Are we mishandling the link-local method somewhere in the nm-device.c?  When LINK_LOCAL is used, NM shouldn't be running and of the libndp stuff right?
Comment 56 Pavel Simerda 2013-08-30 20:05:17 UTC
(In reply to comment #49)
> (In reply to comment #45)
> > (In reply to comment #44)
> > As the gateway is a result of nm-policy decision, not only application of the
> > NMConnection, there are two possible approaches:
> > 
> > a) As you suggest, read the default gateway from the device and use it for
> > matching. It can only help matching of a connection that actually has a default
> > route.
> > 
> > b) Ignore the gateway when matching. Treat other options as sufficient for
> > matching. When multiple connections are matched, use the most recently used
> > one.
> 
> I would go for (a); the default route tells us what the user/administrator/etc
> intent was, and I think NM should honor that in the connection details.  Since
> it's fairly easy to detect, we can use it to determine what the "default"
> interface is through the D_Bus interface too, to ensure that nmcli, GUI tools,
> and remote management/status interfaces show the right status.  I pushed a
> commit to dcbw/matching-fixes for IPv4, but the corresponding IPv6 code depends
> on me cleaning up and merging dcbw/ip6-change.

Sounds reasonable but...

While the detection of default gateway on an interface is fairly easy, we'll have to make sure the matching code, notably nm_connection_compare() with the CANDIDATE flag only matches the default gateway under specific circumstances. My opinion is that it should only be checked when both candidate and generated connections have gateway set.

While it's possible to solve this, I don't see how it can be useful in the context of connection matching. I don't think there's a real use case with the following characteristics:

1) Two configured static connections A and B identical except the gateway.

2) A manually set when NetworkManager is not running.

3) B has been used more recently than A.

4) Administrator requires A to be assumed (ruling out B based on the gateway address).

And I don't see any other use case where the gateway would matter.

> > > 3) And the DNS problem...
> I'm not sure we can rely on caching nameservers
> like dnsmasq or unbound here, because we cannot force their use on people, and
> things would be pretty broken without one.  I still think we need a solution
> here, at least one which has acceptable fallback.

The only acceptable fallback possible is to cache the settings outside NetworkManager (on harddisk or ramdisk) but that will only work for take over between NetworkManager instances. We agreed on IRC to just ignore DNS for now, though.

> > Until DNS is read properly from some split DNS capable tool, I suggest to
> > ignore DNS entirely and pick the most recently used connection when multiple
> > connections match.
> 
> Yes, the problem here is that the matching code doesn't account for missing DNS
> information.  So given even a saved static IP connection with some DNS
> infromation (which would be common) then matching will fail.  One approach is
> to make the CANDIDATE flag ignore DNS info.

+1

That's where CANDIDATE flag is useful.

> I'm not against cleaning up the methods as we've discussed before.  But I"d
> like to merge the runtime stuff *before* we do that reworking, and thus I don't
> want the runtime code to depend on whatever future work is there.  We have two
> choices here:
> 
> 1) switch the update_setting code over to IGNORE like I originally suggested

Would switching to LINK_LOCAL instead be a problem? In fact what you suggest would make a LINK_LOCAL connection detected as IGNORE which is not very good.

> 2) change the matching code to treat IGNORE and LINK_LOCAL the same when the
> CANDIDATE flag is passed

This one sounds better to me because...

While I think it won't work well as you suggest, as IGNORE connections will be most often detected as AUTO connections at the next start. In my thought model, the following connections when left over and assumed would be...

[configured -> kernel -> generated]
IGNORE -> dynamic address(es) -> AUTO
AUTO -> dynamic address(es) -> AUTO
DHCP -> dynamic address(es) -> AUTO
LINK_LOCAL -> no address -> LINK_LOCAL (IGNORE according to your #1)
MANUAL -> static address(es) -> MANUAL
SHARED -> static address(es) -> MANUAL

Therefore we need nm_connection_compare(flags=CANDIDATE) to match:

[generated -> configured]
LINK_LOCAL -> LINK_LOCAL (not sure why you also want IGNORE)
MANUAL -> MANUAL, SHARED
AUTO -> IGNORE, AUTO, DHCP

Note: I'm only taking fully activated connections into account.

(In reply to comment #52)
> Also, it appears that for my testing assumption doesn't work, because while the
> connection is available, in nm-manager.c::add_device() checks:

Isolated a fix from your larger patch, it's in pavlix/runtime now.

(In reply to comment #53)
> Does nm-manager.c::get_connection() need the 'existing' argument?  Doesn't seem
> to be used anywhere.

Fixed, inline documentation updated.

(In reply to comment #54)
> Also, if the assumed connection is generated, then add_device() needs to use
> the cloned NMSettingsConnection object returned from
> nm_settings_add_connection() instead of the generated connection, otherwise
> lots of scary warnings + abort because every activated connection must be an
> NMSettingsConnection.  We hashed this out on IRC already, just adding here for
> the record.

Fixed.

(In reply to comment #55)
> Next up, for an IPv4-only assumed connection, I get:
> 
> NetworkManager[23529]: <info> (p4p1): device state change: ip-config ->
> ip-check (reason 'none') [70 80 0]
> NetworkManager[23529]: <info> Activation (p4p1) Stage 5 of 5 (IPv4 Commit)
> complete.
> NetworkManager[23529]: <error> [1377883611.834177] [rdisc/nm-lndp-rdisc.c:184]
> send_rs(): (p4p1): cannot send router solicitation: -101.
> NetworkManager[23529]: <info> (p4p1): device state change: ip-check ->
> secondaries (reason 'none') [80 90 0]
> NetworkManager[23529]: <info> (p4p1): device state change: secondaries ->
> activated (reason 'none') [90 100 0]
> NetworkManager[23529]: <info> NetworkManager state is now CONNECTED_GLOBAL
> NetworkManager[23529]: <info> Activation (p4p1) successful, device activated.
> NetworkManager[23529]: <info> startup complete
> NetworkManager[23529]: <error> [1377883622.331002] [rdisc/nm-lndp-rdisc.c:184]
> send_rs(): (p4p1): cannot send router solicitation: -101.
> NetworkManager[23529]: <error> [1377883632.335168] [rdisc/nm-lndp-rdisc.c:184]
> send_rs(): (p4p1): cannot send router solicitation: -101.
> <repeats forever>

That's a bug I already had in my notes from rewriting the nm-device to use the guaranteed s_ip4 and s_ip6. See:

    if (   strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO) == 0
        || strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_LINK_LOCAL) == 0) {
        if (!addrconf6_start (self)) {

We're running addrconf6_start() when method is LINK_LOCAL for some reason.

> When LINK_LOCAL is used, NM shouldn't be running
> and of the libndp stuff right?

That's correct. When LINK_LOCAL is used, accept_ra should be set to 0 just as with AUTO, DHCP, MANUAL and SHARED and as opposed to IGNORE method and UNMANAGED/foreign state. In the case of LINK_LOCAL, no other action should be ever performed (except optionally we could wait for the link-local address to become ready).
Comment 57 Pavel Simerda 2013-08-30 20:21:57 UTC
(In reply to comment #50)
> Created an attachment (id=253634) [details] [review]
> patch for "platform: fix and simplify address lifetime compensation"

(In reply to comment #51)
> (In reply to comment #50)
> 
> I suggests the attached patch for commit "platform: fix and simplify address
> lifetime compensation"

I don't see why should we complicate the code just to avoid a guarantee we don't need.

> Reason: substract_guint32 both receives "now" and "known_address->lifetime" as
> argument "a". The special treatment for a==MAX_UINT32 makes only sense in the
> second case.

Although you can see that you broke the second case as it must be checked separately for 'lifetime' and 'preferred'.

Example:

timestamp = 1000
now = 2000
lifetime = infinity
preferred = 10000

The initial condition evaluates FALSE because of the infinite lifetime and in the else clause produces the following output:

lifetime = infinity
preferred = infinity

The correct output is:

lifetime = infinity
preferred = 9000

Where preferred is smaller by 1000 which is a time difference from where the original value came from. 

My suggestion is to keep the current code (as in pavlix/runtime branch).

P.S.: I personally don't see it as critical because (1) the standards are not very useful here and (2) the linux kernel doesn't use the lifetimes properly either unless something changed recently. But I'm still trying to keep the implementation as standards-compliant as possible.
Comment 58 Pavel Simerda 2013-08-30 21:26:42 UTC
(In reply to comment #56)
> > 1) switch the update_setting code over to IGNORE like I originally suggested
> 
> Would switching to LINK_LOCAL instead be a problem? In fact what you suggest
> would make a LINK_LOCAL connection detected as IGNORE which is not very good.
> 
> > 2) change the matching code to treat IGNORE and LINK_LOCAL the same when the
> > CANDIDATE flag is passed
> 
> This one sounds better to me because...
> 
> While I think it won't work well as you suggest, as IGNORE connections will be
> most often detected as AUTO connections at the next start. In my thought model,
> the following connections when left over and assumed would be...
> 
> [configured -> kernel -> generated]
> IGNORE -> dynamic address(es) -> AUTO
> AUTO -> dynamic address(es) -> AUTO
> DHCP -> dynamic address(es) -> AUTO
> LINK_LOCAL -> no address -> LINK_LOCAL (IGNORE according to your #1)
> MANUAL -> static address(es) -> MANUAL
> SHARED -> static address(es) -> MANUAL
> 
> Therefore we need nm_connection_compare(flags=CANDIDATE) to match:
> 
> [generated -> configured]
> LINK_LOCAL -> LINK_LOCAL (not sure why you also want IGNORE)
> MANUAL -> MANUAL, SHARED
> AUTO -> IGNORE, AUTO, DHCP
> 
> Note: I'm only taking fully activated connections into account.

I implemented this as in pavlix/runtime (see settings: compare IPv6 methods carefully).

> That's a bug I already had in my notes from rewriting the nm-device to use the
> guaranteed s_ip4 and s_ip6. See:
> 
>     if (   strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO) == 0
>         || strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_LINK_LOCAL) == 0) {
>         if (!addrconf6_start (self)) {
> 
> We're running addrconf6_start() when method is LINK_LOCAL for some reason.

I added a proof of concept patch (core: treat NM_SETTING_IP6_CONFIG_METHOD_LINK_LOCAL as disabled) for this to pavlix/runtime.

Hope it helps.
Comment 59 Dan Williams 2013-08-30 22:29:22 UTC
> (In reply to comment #55)
> > Next up, for an IPv4-only assumed connection, I get:
> > 
> > NetworkManager[23529]: <info> (p4p1): device state change: ip-config ->
> > ip-check (reason 'none') [70 80 0]
> > NetworkManager[23529]: <info> Activation (p4p1) Stage 5 of 5 (IPv4 Commit)
> > complete.
> > NetworkManager[23529]: <error> [1377883611.834177] [rdisc/nm-lndp-rdisc.c:184]
> > send_rs(): (p4p1): cannot send router solicitation: -101.
> > NetworkManager[23529]: <info> (p4p1): device state change: ip-check ->
> > secondaries (reason 'none') [80 90 0]
> > NetworkManager[23529]: <info> (p4p1): device state change: secondaries ->
> > activated (reason 'none') [90 100 0]
> > NetworkManager[23529]: <info> NetworkManager state is now CONNECTED_GLOBAL
> > NetworkManager[23529]: <info> Activation (p4p1) successful, device activated.
> > NetworkManager[23529]: <info> startup complete
> > NetworkManager[23529]: <error> [1377883622.331002] [rdisc/nm-lndp-rdisc.c:184]
> > send_rs(): (p4p1): cannot send router solicitation: -101.
> > NetworkManager[23529]: <error> [1377883632.335168] [rdisc/nm-lndp-rdisc.c:184]
> > send_rs(): (p4p1): cannot send router solicitation: -101.
> > <repeats forever>
> 
> That's a bug I already had in my notes from rewriting the nm-device to use the
> guaranteed s_ip4 and s_ip6. See:
> 
>     if (   strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO) == 0
>         || strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_LINK_LOCAL) == 0) {
>         if (!addrconf6_start (self)) {
> 
> We're running addrconf6_start() when method is LINK_LOCAL for some reason.
> 
> > When LINK_LOCAL is used, NM shouldn't be running
> > and of the libndp stuff right?
> 
> That's correct. When LINK_LOCAL is used, accept_ra should be set to 0 just as
> with AUTO, DHCP, MANUAL and SHARED and as opposed to IGNORE method and
> UNMANAGED/foreign state. In the case of LINK_LOCAL, no other action should be
> ever performed (except optionally we could wait for the link-local address to
> become ready).

I agree it's pavlix/runtime's fault and we hsould file a bug for this and move on so we don't forget it.
Comment 60 Dan Williams 2013-08-31 00:02:02 UTC
ok, as of head d44bda5561 I have no further issues with the branch.  I'd like to do a bit more testing next week on bonding and re-check the IP method stuff, but the branch is in very good shape now.
Comment 61 Pavel Simerda 2013-09-25 11:27:55 UTC
What's the current status? Is there anything needed from me?
Comment 62 Dan Williams 2013-11-07 06:26:48 UTC
Reviews and testing on dcbw/runtime would be appreciated.
Comment 63 Pavel Simerda 2013-11-07 07:31:56 UTC
Is there any summary of changes from the previous version?
Comment 64 Pavel Simerda 2013-11-07 08:18:21 UTC
> TMP: avoid one bug warning

Should be evaluated and either renamed (or squashed if applicable) or removed.

> core: capture initial device DHCP IP configuration

What do we use it for?

> ethernet: handle cloned/permanent MAC when updating connection

Includes libgsystem update.

> core: add nm_active_connection_[get|set]_assumed()
>    
> Various code during the activation paths will want to know whether
> the connection is assumed or not, so that it doesn't do stuff that
> touches the device.

I was actually tempted to do that but then abandoned the idea as it seemed to make more sense to keep this information in the state reason as long as the actions are done in response to the fact that the connection was assumed.

The commit message seems to be wrong. Unless something changed, assumed connections can be just ordinary connections taken over from the previous run and activate almost the same way as a non-assumed connection. Only when the runtime status doesn't match any configuration, the device should stay unmanaged (whatever that means and however it is internally represented) and untouched.

Otherwise, looks like a good deal of work done!
Comment 65 Thomas Haller 2013-11-07 14:58:09 UTC
(In reply to comment #64)
> > core: capture initial device DHCP IP configuration
> 
> What do we use it for?
> 

This commit seems really strang to me. Why would we iterate over *all* compatible connections, searching for DHCP leases for them and use it to initialize the device? Lets say, we have two similar devices and the other one is active with dhcp and connection#1. Wouldn't we then find connection#1 and take the config from there? (although its an unrelated device?)

And is it really correct to remove "update_ip_config" from constructor()?
Comment 66 Dan Williams 2013-11-07 16:06:55 UTC
(In reply to comment #64)
> > TMP: avoid one bug warning
> 
> Should be evaluated and either renamed (or squashed if applicable) or removed.

You're probably the expert in why that patch was there though :)  Do you recall what problem it solved?

> > core: capture initial device DHCP IP configuration
> 
> What do we use it for?

It's not necessary now, but it will be necessary for better connection matching in the future.  Having the DHCP configuration information allows NM to populate the DNS, gateway, searches, etc fields of the IPxConfig objects which will then push through to the generated connection, and then be used to match against existing connections with better resolution than simply matching on IP address alone.

> > ethernet: handle cloned/permanent MAC when updating connection
> 
> Includes libgsystem update.

Grrr.

> > core: add nm_active_connection_[get|set]_assumed()
> >    
> > Various code during the activation paths will want to know whether
> > the connection is assumed or not, so that it doesn't do stuff that
> > touches the device.
> 
> I was actually tempted to do that but then abandoned the idea as it seemed to
> make more sense to keep this information in the state reason as long as the
> actions are done in response to the fact that the connection was assumed.
> 
> The commit message seems to be wrong. Unless something changed, assumed
> connections can be just ordinary connections taken over from the previous run
> and activate almost the same way as a non-assumed connection. Only when the
> runtime status doesn't match any configuration, the device should stay
> unmanaged (whatever that means and however it is internally represented) and
> untouched.

The reason for this patch, and I don't like it that much either, is for "core: make assumed activations go through all stages".  Here's the dilemma:

1) assumed slaves need to be "added" to their masters.  By this I mean only informational adding, not actually setting slave properties since they are already set.  That happens during stage1.  But previously, assumption skipped stage1 completely.  So to keep things simpler, we must make assumed connections go through stage1 and stage2.

2) But we don't want state1 and stage2 to make *any* functional changes to the interface, since these changes would be NOPs and thus completely pointless, and at worst screw things up.  So we use nm_active_connection_get_assumed() in stage1 and stage2 to avoid doing anything to the device except master/slave handling.  We can't use priv->state_reason here, because that will change when the device moves from stage1 -> stage2 -> stage3, losing the information we need.

There's probably a better way to do the master/slave handling, but that's going to take a lot more thought.
Comment 67 Dan Williams 2013-11-07 16:23:00 UTC
(In reply to comment #65)
> (In reply to comment #64)
> > > core: capture initial device DHCP IP configuration
> > 
> > What do we use it for?
> > 
> 
> This commit seems really strang to me. Why would we iterate over *all*
> compatible connections, searching for DHCP leases for them and use it to
> initialize the device? Lets say, we have two similar devices and the other one
> is active with dhcp and connection#1. Wouldn't we then find connection#1 and
> take the config from there? (although its an unrelated device?)

This could happen if (a) the connection is not locked to a specific device, (b) the connection was activated on the device recently but is not currently active on the device, and (c) the lease is still valid.

I see I can improve that by ensuring that at least one address on the interface has a non-infinite lifetime, which I will do.  Otherwise the code will run for interfaces with static-only addresses, which would be pointless.

> And is it really correct to remove "update_ip_config" from constructor()?

It's moved from the constructor to a separate function (nm_device_capture_initial_configuration() or something like that) which the manager calls very, very soon after the device is created.  The problem with calling update_ip_config() from the constructor is that the device's ConnectionProvider isn't set yet, because the manager sets that after construct-time as well.  So the first update_ip_config() needs to be called *after* the ConnectionProvider is set, since the DHCP lease capture functionality requires the ConnectionProvider to get access to the connections.

I would have rather had the connection provider passed in through the device constructor, but unfortunately we're not using g_object_new() to create devices, and I didn't really want to add a 'connection_provider" argument to every device's _new() function yet.  Maybe that's the right way to go in the future?
Comment 68 Thomas Haller 2013-11-07 16:24:44 UTC
>> ifcfg-rh: don't crash when in-memory-only connections don't have paths

Could you rebase/move much commit earlier before generating connections? So, that the commits between are not (obviously) buggy.
Comment 69 Dan Winship 2013-11-07 17:06:50 UTC
> core: connection live reconfiguration upon nm-platform signals

seems out of order, commit-wise, since it's mostly a no-op at this point isn't it?



> core: fix 'hairpin_mode' after 9e19c3db (core: use nm_platform_*_*_option() for bridges)
> core: fix bridge port sysfs directory determination after f5507633 (platform: bridging and bonding options)

feel free to land those immediately



> core: implement update_connection() for bridging
> 
> Acked-by: Dan Winship <danw@gnome.org>

Not sure why it says that...

>+	/* FIXME: This is a one-time setting and doesn't need to be updated. */
>+	g_object_set (s_bridge, NM_SETTING_BRIDGE_INTERFACE_NAME, ifname, NULL);

So move it into the "if (!s_bridge)" part above.



> core: implement update_connection() for bonds

exactly the same comments as for bridging



> core: implement update_connection() for VLANs

The commit message here is quite confusing.

>+	/* VLAN interface name; cannot rely on automatic name generation since the
>+	 * VLAN name doesn't follow any specific format.
>+	 * FIXME: detect when it's in the standard format and don't set it here?
> 	 */
>+	g_object_set (s_vlan, NM_SETTING_VLAN_INTERFACE_NAME, nm_device_get_iface (device), NULL);

nm-connection-editor also always sets :interface-name anyway. I don't think the comment is needed.

(Hm... side note: there is code elsewhere that assumes that nm_connection_get_virtual_iface_name() will always return non-NULL for software devices. Which is apparently wrong.)

>+	/* Try to find the parent if it's already known to the NMManager */
>+	parent = nm_manager_get_device_by_ifindex (nm_manager_get (), parent_ifindex);
>+	g_assert (parent);

Kill the comment, since if it was true, the g_assert() would sometimes fail.



> tun/tap: implement update_connection() for TUN/TAP

(The commit summary is inconsistent with the previous three, which all use "core:")



> core: support slave devices in nm_platform_generate_connection()

It can wait until after the initial commit, but this ought to be using a virtual method. "nm_device_update_slave_connection (master, slave, connection)"

>+	/* No bond slave options yet... */

Probably safe to lose the "yet..." since presumably the bond driver isn't going to get much more development. Or actually, just kill this function entirely, and when we add the vmethod later, just don't implement it for bond.



> dhclient: simplify lease expiration checking

>+lease_valid (const char *str_expire)

I'd call this "lease_expired()" and flip the return value, since there are other things that could be invalid about the lease, and we check those elsewhere. (Oh, and it gets changed in the next commit... OK, whatever.)

>+	/* Skip initial number (day of week?) */

according to the dhclient source code, yes. Any reason to not just include that in the strptime() format?



> dhcp: make dhclient lease parsing code testable

It's too bad that this involved losing all the warning messages...

>+	/* Treat 'interface' specially */
>+	if (g_str_has_prefix (line, "interface")) {
>+		if (*(spc) == '"')

The previous patch treats 'domain-name' the same way; maybe add_lease_option() should just always strip quotes.

>+ * @now: the current UTC date/time; usually a new g_date_time_new_now_utc() is
>+ *  used except for testcases which may need a different value for 'now'

should allow passing %NULL for the normal case

>+++ b/src/dhcp-manager/tests/leases/basic.leases
>+++ b/src/dhcp-manager/tests/leases/expired.leases

You don't need both of these. Just try to parse basic.leases with two different dates.

>+	/* Date from before the least expiration */

s/least/earliest/ so it doesn't look like a typo


It's not immediately obvious what's wrong with the malformed leases, so explaining that somewhere would be good.



> core: capture initial device DHCP IP configuration

>+		if (out_ip6_config && strcmp (method, NM_SETTING_IP6_CONFIG_METHOD_AUTO) == 0) {
>+		}

add a FIXME comment



> core: clean up legacy connection matching; remove match_l2_config
    
> match_l2_config was only used by Infinband and Team, neither of which
> actually implemented the required logic.  Remove it completely.

typo "Infinband". The comment is not quite right either; they both returned TRUE for match_l2_config(), meaning they'd err on the side of false positives, and so InfiniBand and team connections were assumable. After this patch they are not. We need to fix that eventually.



> platform: sort slaves after their master devices

You could just collapse the two cases together now:

  if (link_a->master || link_a->parent)

also, the *3 can be *2 now, since we're only using +1, not -1



> core: allow devices to activate their generated connections

Doesn't this remove the ability to activate connections on default-unmanaged devices?



> core: find assumed connection masters

>+nm_manager_get_device_by_iface (NMManager *manager, const char *iface)

we already have "find_device_by_ip_iface()". One of them should be renamed to match the other.

>+find_assumed_connection_master (NMManager *self,

why can't we just use find_master() here?

>+	/* We might get a NULL master device or master ActiveConnection if
>+	 * the master device cannot generate connections, or the master device
>+	 * wasn't recognized yet (a bug in NM), or the master device is not
>+	 * managed by NM.
>+	 */
>+	master_device = nm_manager_get_device_by_iface (self, master);
>+	g_return_val_if_fail (master_device != NULL, NULL);

The comment makes it sound like that shouldn't be a return-if-fail (and likewise with master_ac below).

>+			nm_active_connection_set_assumed (active, TRUE);

belongs in the previous commit really



> core: make assumed activations go through all the stages

>-	nm_device_master_check_slave_physical_port (device, slave, LOGD_BOND);
>+	if (configure) {
>+		nm_device_master_check_slave_physical_port (device, slave, LOGD_BOND);

That's a non-modifying operation, and I think the warning is as relevant when assuming an existing bond connection as it is when creating our own.
Comment 70 Pavel Simerda 2013-11-07 17:23:02 UTC
(In reply to comment #66)
> (In reply to comment #64)
> > > core: capture initial device DHCP IP configuration
> > 
> > What do we use it for?
> 
> It's not necessary now, but it will be necessary for better connection matching
> in the future.  Having the DHCP configuration information allows NM to populate
> the DNS, gateway, searches, etc fields of the IPxConfig objects which will then
> push through to the generated connection, and then be used to match against
> existing connections with better resolution than simply matching on IP address
> alone.

I'd move it to a separate bug report to get the design and use case right separately, if there's no relation to the current development.

> > > core: add nm_active_connection_[get|set]_assumed()
> > >    
> > > Various code during the activation paths will want to know whether
> > > the connection is assumed or not, so that it doesn't do stuff that
> > > touches the device.
> > 
> > I was actually tempted to do that but then abandoned the idea as it seemed to
> > make more sense to keep this information in the state reason as long as the
> > actions are done in response to the fact that the connection was assumed.
> > 
> > The commit message seems to be wrong. Unless something changed, assumed
> > connections can be just ordinary connections taken over from the previous run
> > and activate almost the same way as a non-assumed connection. Only when the
> > runtime status doesn't match any configuration, the device should stay
> > unmanaged (whatever that means and however it is internally represented) and
> > untouched.
> 
> The reason for this patch, and I don't like it that much either, is for "core:
> make assumed activations go through all stages".  Here's the dilemma:
> 
> 1) assumed slaves need to be "added" to their masters.  By this I mean only
> informational adding, not actually setting slave properties since they are
> already set.

No problem.

> That happens during stage1.  But previously, assumption skipped
> stage1 completely.  So to keep things simpler, we must make assumed connections go through stage1 and stage2.

I always wanted to treat assumed connections the same way as non-assumed connections as closely as possible. Things are much easier then.

> 2) But we don't want state1 and stage2 to make *any* functional changes to the
> interface, since these changes would be NOPs and thus completely pointless,

You can look at the address synchronization code. It avoids unnecessary syscalls as much as possible, eliminating the hassle caused by NOOP configuration changes.

But I understand the same ways may not be possible universally.

> and
> at worst screw things up.  So we use nm_active_connection_get_assumed() in
> stage1 and stage2 to avoid doing anything to the device except master/slave
> handling.  We can't use priv->state_reason here, because that will change when
> the device moves from stage1 -> stage2 -> stage3, losing the information we
> need.

Taken.

> There's probably a better way to do the master/slave handling, but that's going
> to take a lot more thought.

I was maybe more concerned about the wording, it's clear we're on the same line here.

(In reply to comment #69)
> > core: implement update_connection() for bridging
> > 
> > Acked-by: Dan Winship <danw@gnome.org>
> 
> Not sure why it says that...

Unless it has been modified since then, it was marked so after your positive review most probably on IRC #nm.
Comment 71 Thomas Haller 2013-11-07 17:39:38 UTC
>> core: find assumed connection masters

find_assumed_connection_master() should maybe not use g_return_val_if_fail, if it can happen under ~normal~ conditions.
Comment 72 Thomas Haller 2013-11-07 18:20:44 UTC
>> core: make assumed activations go through all the stages

Would you mind moving act_stage1_prepare() one function down (where it was before). That makes a smaller diff, and it's clearer what changed.
Comment 73 Dan Williams 2013-11-07 22:49:59 UTC
repushed, addressing everyone's comments.  The only major changes are the addition of update_connection() support for Infiniband and Team which are in two separate patches with the other update_connection() patches.
Comment 74 Dan Winship 2013-11-08 13:15:28 UTC
> core: implement update_connection() for Infiniband
> core: implement update_connection() for Team

Should remove their match_l2_config()s here too. (And then move the "remove match_l2_config" patch to right after this, and remove the mentions of InfiniBand and Team in its commit message.)

Also, InfiniBand message should have a FIXME for partition support.



> TMP: avoid one bug warning

Still need to figure out what's up with that. (Just changing "TMP" to "platform" if nothing else.)



> core: don't generate connections for some devices

>+           && !(nm_platform_link_get_type (priv->ifindex) & 0x20000)) {

Ahem. From a few lines above where you stole that magic number: "Please don't interpret type numbers outside nm-platform ... use functions like ... nm_platform_supports_slaves()."
Comment 75 Dan Williams 2013-11-08 22:47:17 UTC
(In reply to comment #74)
> > core: implement update_connection() for Infiniband
> > core: implement update_connection() for Team
> 
> Should remove their match_l2_config()s here too. (And then move the "remove
> match_l2_config" patch to right after this, and remove the mentions of
> InfiniBand and Team in its commit message.)
> 
> Also, InfiniBand message should have a FIXME for partition support.
> 
> 
> 
> > TMP: avoid one bug warning
> 
> Still need to figure out what's up with that. (Just changing "TMP" to
> "platform" if nothing else.)
> 
> 
> 
> > core: don't generate connections for some devices
> 
> >+           && !(nm_platform_link_get_type (priv->ifindex) & 0x20000)) {
> 
> Ahem. From a few lines above where you stole that magic number: "Please don't
> interpret type numbers outside nm-platform ... use functions like ...
> nm_platform_supports_slaves()."

Fixed all these up.
Comment 76 Dan Williams 2013-11-08 22:55:43 UTC
Merged to master.  All further improvements (like better connection matching and better behavior of devices !IFF_UP when NM starts) should get new bugs.