GNOME Bugzilla – Bug 705102
NMPlatform address lifetimes break DHCP renewal because stuff doesn't match
Last modified: 2013-07-30 17:01:04 UTC
We need to deal with lifetimes in a less naive way, apparently. 1) DHCP acquires a lease, sets up various NMPlatformIP4Address objects in the NMIP4Config 2) nm_platform_ip4_address_sync() sets the DHCP address on the interface; ip4_address_add() -> add_object() -> refresh_object() -> announce_object() -> init_ip4_address() which sets ->timestamp = rtnl_addr_get_create_time (rtnladdr) 3) time passes... 4) DHCP renews, creates a new NMIP4Config with the the same NMPlatformIP4Address, except these have a longer lifetime because the lease has been renewed 5) nm_platform_ip4_address_sync() compares the existing interface addresses, (which have a timestamp from step #2) to the new DHCP addresses, which have no timestamp/lifetime/preferred, and the comparison fails. See array_contains_ip4_address(): if (!memcmp (&g_array_index (addresses, NMPlatformIP4Address, i), address, sizeof (*address))) oops, the address doesn't match, and nm_platform_ip4_address_sync() removes it, only to re-add it in the second part of the function. But since the address got removed, the kernel removes the default and subnet routes since there's no longer any interface which can talk to the gateway. nm_platform_ip4_address_sync() also removes the subnet route (192.168.1.0/24) because it doesn't exist in the IP4Config routes, because it's really a kernel-managed route. So you're left with no routes through the interface after a DHCP renew. I'll bet the same thing happens when a new router advertisement comes in too since the same procedure goes on there. NetworkManager[22981]: <info> (wlan0): DHCPv4 state changed bound -> renew bound to 192.168.1.175 -- renewal in 233 seconds. NetworkManager[22981]: <debug> [1375126921.281395] [devices/nm-device.c:2334] dhcp4_state_changed(): (wlan0): new DHCPv4 client state 6 NetworkManager[22981]: <info> address 192.168.1.175 NetworkManager[22981]: <info> plen 24 (255.255.255.0) NetworkManager[22981]: <info> gateway 192.168.1.1 NetworkManager[22981]: <info> nameserver '192.168.1.1' NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: deleting addresses NetworkManager[22981]: needle: (0) 192.168.1.175/24 t:2110713 l:4294967295 p:4294967295 NetworkManager[22981]: candidate: (0) 192.168.1.175/24 t:0 l:0 p:0 NetworkManager[22981]: ****************** deleing address 192.168.1.175/24 NetworkManager[22981]: <debug> [1375126921.281809] [platform/nm-platform.c:1123] nm_platform_ip4_address_delete(): address: deleting IPv4 address NetworkManager[22981]: <debug> [1375126921.282384] [platform/nm-platform.c:1713] log_ip4_address(): signal: address removed: 192.168.1.175/24 dev wlan0 NetworkManager[22981]: <debug> [1375126921.283093] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: DONE deleting addresses NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: adding addresses NetworkManager[22981]: <debug> [1375126921.285833] [platform/nm-platform.c:1083] nm_platform_ip4_address_add(): address: adding IPv4 address NetworkManager[22981]: <debug> [1375126921.286530] [platform/nm-platform.c:1713] log_ip4_address(): signal: address added: 192.168.1.175/24 dev wlan0 NetworkManager[22981]: <debug> [1375126921.287772] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change NetworkManager[22981]: ****************** nm_platform_ip4_address_sync: DONE adding addresses NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: deleting routes NetworkManager[22981]: needle: (0) 192.168.1.0/24 via 0.0.0.0 metric:0 mss:0 NetworkManager[22981]: ######## ip4_route_delete: will delete route 192.168.1.0/24 0 NetworkManager[22981]: <debug> [1375126921.291377] [platform/nm-platform.c:1775] log_ip4_route(): signal: route removed: 192.168.1.0/24 via 0.0.0.0 dev wlan0 metric 0 NetworkManager[22981]: <debug> [1375126921.291879] [devices/nm-device.c:5922] device_ip_changed(): (wlan0): queued IP config change NetworkManager[22981]: ######## ip4_route_delete: deleted route NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: DONE deleting routes NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: adding routes NetworkManager[22981]: ****************** nm_platform_ip4_route_sync: DONE adding routes So what do we do here? Obviously (to me anyway) array_contains_ip[4|6]_address() need to stop using memcmp() and start just comparing the members directly. I've pushed that to git master to unbreak addressing at least. But there's a larger question of what happens when we get a renew. We somehow need to push the extended lifetimes down to the kernel. But does the kernel even allow changing the address in-place? If we remove an address so we can re-add it with the updated lifetime, the kernel will remove routes, and that means that our cache will be out-of-date (it'll think the default route and subnet route are still assigned, but the kernel has just removed them due to the address removal) and you'll still end up without any routes. Do we need updated kernel API here to modify an address's creation time or something? Plus, the DHCP server or the IPv6 router could change lease times on us, which means the new addresses will have a shorter lifetime/preferred value than the existing address.
So there's really two issues here: 1) address timestamp/lifetime/preferred not matching 2) platform erroneously removes subnet route (eg 192.168.1.0/24)
(In reply to comment #0) > oops, the address doesn't match, and nm_platform_ip4_address_sync() removes it, > only to re-add it in the second part of the function. But since the address > got removed, the kernel removes the default and subnet routes since there's no > longer any interface which can talk to the gateway. Yep, we need to update the lifetimes instead of removing and re-adding the address. > I'll bet the same thing happens when a new router > advertisement comes in too since the same procedure goes on there. Yes but still only IPv4 is affected unless you have IPv6 static routes as only those have a prefix length. > So what do we do here? Obviously (to me anyway) > array_contains_ip[4|6]_address() need to stop using memcmp() and start just > comparing the members directly. I've pushed that to git master to unbreak > addressing at least. Your patch was the best first step for the complete solution. > But there's a larger question of what happens when we get a renew. We somehow > need to push the extended lifetimes down to the kernel. Created 'pavlix/platform' branch for that: commit 836f9f726cb0176bb5239fc443f77382d5236ca3 Author: Pavel Šimerda <psimerda@redhat.com> Date: Tue Jul 30 00:03:48 2013 +0200 platform: update all address lifetimes The nm_platform_ip[46]_address_sync() functions no longer use nm_platform_ip[46]_address_exists() to avoid adding already existing addresses. That means nm_platform_ip[46]_address_add() is now called for *all* commited addresses and the lifetimes are thus always updated. Because of that, nm_platform_ip[46]_address_add() had to be modified to accept existing addresses and update their lifetimes when appropriate. https://bugzilla.gnome.org/show_bug.cgi?id=705102 > But does the kernel even allow changing the address in-place? As we already talked on IRC, yes, it's perfectly possible. > Plus, the DHCP server or the IPv6 router could change lease times on us, which > means the new addresses will have a shorter lifetime/preferred value than the > existing address. That sounds ok to me. We need to update the lifetimes in both cases. (In reply to comment #1) > So there's really two issues here: > > 1) address timestamp/lifetime/preferred not matching As above. > 2) platform erroneously removes subnet route (eg 192.168.1.0/24) commit 15338078c556f8e788ba5a44ce53442e9b4a2d02 Author: Pavel Šimerda <psimerda@redhat.com> Date: Tue Jul 30 00:24:58 2013 +0200 platform: ignore kernel-generated routes This is necessary to avoid tinkering with IPv4 prefix routes automatically inserted by the kernel for each IPv4 address. https://bugzilla.gnome.org/show_bug.cgi?id=705102 Note that the 'pavlix/platform' branch still doesn't pass tests because of an inconsistency in 'ip[46]-address-added' signal handling.
(In reply to comment #2) > Note that the 'pavlix/platform' branch still doesn't pass tests because of an > inconsistency in 'ip[46]-address-added' signal handling. Issue was fixed, patches pushed to master.