After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 705102 - NMPlatform address lifetimes break DHCP renewal because stuff doesn't match
NMPlatform address lifetimes break DHCP renewal because stuff doesn't match
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-07-29 19:48 UTC by Dan Williams
Modified: 2013-07-30 17:01 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Williams 2013-07-29 19:48:38 UTC
We need to deal with lifetimes in a less naive way, apparently.

1) DHCP acquires a lease, sets up various NMPlatformIP4Address objects in the NMIP4Config

2) nm_platform_ip4_address_sync() sets the DHCP address on the interface; ip4_address_add() -> add_object() -> refresh_object() -> announce_object() -> init_ip4_address() which sets ->timestamp = rtnl_addr_get_create_time (rtnladdr)

3) time passes...

4) DHCP renews, creates a new NMIP4Config with the the same NMPlatformIP4Address, except these have a longer lifetime because the lease has been renewed

5) nm_platform_ip4_address_sync() compares the existing interface addresses, (which have a timestamp from step #2) to the new DHCP addresses, which have no timestamp/lifetime/preferred, and the comparison fails.  See array_contains_ip4_address():

	if (!memcmp (&g_array_index (addresses, NMPlatformIP4Address, i), address, sizeof (*address)))

oops, the address doesn't match, and nm_platform_ip4_address_sync() removes it, only to re-add it in the second part of the function.  But since the address got removed, the kernel removes the default and subnet routes since there's no longer any interface which can talk to the gateway.

nm_platform_ip4_address_sync() also removes the subnet route (192.168.1.0/24) because it doesn't exist in the IP4Config routes, because it's really a kernel-managed route.  So you're left with no routes through the interface after a DHCP renew.  I'll bet the same thing happens when a new router advertisement comes in too since the same procedure goes on there.

NetworkManager[22981]: <info> (wlan0): DHCPv4 state changed bound -> renew
bound to 192.168.1.175 -- renewal in 233 seconds.
NetworkManager[22981]: <debug> [1375126921.281395] [devices/nm-device.c:2334] dhcp4_state_changed(): (wlan0): new DHCPv4 client state 6
NetworkManager[22981]: <info>   address 192.168.1.175
NetworkManager[22981]: <info>   plen 24 (255.255.255.0)
NetworkManager[22981]: <info>   gateway 192.168.1.1
NetworkManager[22981]: <info>   nameserver '192.168.1.1'
NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: deleting addresses
NetworkManager[22981]:                needle: (0) 192.168.1.175/24  t:2110713 l:4294967295 p:4294967295
NetworkManager[22981]:                   candidate: (0) 192.168.1.175/24  t:0 l:0 p:0
NetworkManager[22981]: ******************      deleing address 192.168.1.175/24
NetworkManager[22981]: <debug> [1375126921.281809] [platform/nm-platform.c:1123] nm_platform_ip4_address_delete(): address: deleting IPv4 address
NetworkManager[22981]: <debug> [1375126921.282384] [platform/nm-platform.c:1713] log_ip4_address(): signal: address removed: 192.168.1.175/24 dev wlan0
NetworkManager[22981]: <debug> [1375126921.283093] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change
NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: DONE deleting addresses
NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: adding addresses
NetworkManager[22981]: <debug> [1375126921.285833] [platform/nm-platform.c:1083] nm_platform_ip4_address_add(): address: adding IPv4 address
NetworkManager[22981]: <debug> [1375126921.286530] [platform/nm-platform.c:1713] log_ip4_address(): signal: address added: 192.168.1.175/24 dev wlan0
NetworkManager[22981]: <debug> [1375126921.287772] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change
NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: DONE adding addresses
NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: deleting routes
NetworkManager[22981]:                needle: (0) 192.168.1.0/24 via 0.0.0.0 metric:0 mss:0
NetworkManager[22981]: ######## ip4_route_delete: will delete route 192.168.1.0/24 0
NetworkManager[22981]: <debug> [1375126921.291377] [platform/nm-platform.c:1775] log_ip4_route(): signal: route removed: 192.168.1.0/24 via 0.0.0.0 dev wlan0 metric 0
NetworkManager[22981]: <debug> [1375126921.291879] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change
NetworkManager[22981]: ######## ip4_route_delete: deleted route
NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: DONE deleting routes
NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: adding routes
NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: DONE adding routes

So what do we do here?  Obviously (to me anyway) array_contains_ip[4|6]_address() need to stop using memcmp() and start just comparing the members directly.  I've pushed that to git master to unbreak addressing at least.

But there's a larger question of what happens when we get a renew.  We somehow need to push the extended lifetimes down to the kernel.

But does the kernel even allow changing the address in-place?  If we remove an address so we can re-add it with the updated lifetime, the kernel will remove routes, and that means that our cache will be out-of-date (it'll think the default route and subnet route are still assigned, but the kernel has just removed them due to the address removal) and you'll still end up without any routes.  Do we need updated kernel API here to modify an address's creation time or something?

Plus, the DHCP server or the IPv6 router could change lease times on us, which means the new addresses will have a shorter lifetime/preferred value than the existing address.
Comment 1 Dan Williams 2013-07-29 19:54:26 UTC
So there's really two issues here:

1) address timestamp/lifetime/preferred not matching

2) platform erroneously removes subnet route (eg 192.168.1.0/24)
Comment 2 Pavel Simerda 2013-07-30 00:52:16 UTC
(In reply to comment #0)
> oops, the address doesn't match, and nm_platform_ip4_address_sync() removes it,
> only to re-add it in the second part of the function.  But since the address
> got removed, the kernel removes the default and subnet routes since there's no
> longer any interface which can talk to the gateway.

Yep, we need to update the lifetimes instead of removing and re-adding the address.

> I'll bet the same thing happens when a new router
> advertisement comes in too since the same procedure goes on there.

Yes but still only IPv4 is affected unless you have IPv6 static routes as only those have a prefix length.

> So what do we do here?  Obviously (to me anyway)
> array_contains_ip[4|6]_address() need to stop using memcmp() and start just
> comparing the members directly.  I've pushed that to git master to unbreak
> addressing at least.

Your patch was the best first step for the complete solution.

> But there's a larger question of what happens when we get a renew.  We somehow
> need to push the extended lifetimes down to the kernel.

Created 'pavlix/platform' branch for that:

commit 836f9f726cb0176bb5239fc443f77382d5236ca3
Author: Pavel Šimerda <psimerda@redhat.com>
Date:   Tue Jul 30 00:03:48 2013 +0200

    platform: update all address lifetimes
    
    The nm_platform_ip[46]_address_sync() functions no longer use
    nm_platform_ip[46]_address_exists() to avoid adding already
    existing addresses. That means nm_platform_ip[46]_address_add()
    is now called for *all* commited addresses and the lifetimes
    are thus always updated.
    
    Because of that, nm_platform_ip[46]_address_add() had to be modified to
    accept existing addresses and update their lifetimes when appropriate.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=705102

> But does the kernel even allow changing the address in-place?

As we already talked on IRC, yes, it's perfectly possible.

> Plus, the DHCP server or the IPv6 router could change lease times on us, which
> means the new addresses will have a shorter lifetime/preferred value than the
> existing address.

That sounds ok to me. We need to update the lifetimes in both cases.

(In reply to comment #1)
> So there's really two issues here:
> 
> 1) address timestamp/lifetime/preferred not matching

As above.

> 2) platform erroneously removes subnet route (eg 192.168.1.0/24)

commit 15338078c556f8e788ba5a44ce53442e9b4a2d02
Author: Pavel Šimerda <psimerda@redhat.com>
Date:   Tue Jul 30 00:24:58 2013 +0200

    platform: ignore kernel-generated routes
    
    This is necessary to avoid tinkering with IPv4 prefix routes
    automatically inserted by the kernel for each IPv4 address.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=705102

Note that the 'pavlix/platform' branch still doesn't pass tests because of an inconsistency in 'ip[46]-address-added' signal handling.
Comment 3 Pavel Simerda 2013-07-30 17:01:04 UTC
(In reply to comment #2)
> Note that the 'pavlix/platform' branch still doesn't pass tests because of an
> inconsistency in 'ip[46]-address-added' signal handling.

Issue was fixed, patches pushed to master.