GNOME Bugzilla – Bug 775613
WiFi being detected as ethernet when race condition on renaming for persistent name
Last modified: 2016-12-15 18:40:14 UTC
Created attachment 341378 [details] [review] patch for more detailed log Dear NetworkManager maintainers. Version: NetworkManager 1.4.2 This bug happens after power-on with probability about 1/50. That means we need to reboot about 50 times to get into the buggy situation. "nmcli d" shows the device type is ethernet: DEVICE TYPE STATE CONNECTION wlp1s0 ethernet unavailable -- lo loopback unmanaged -- The bug starts from a race condition. But it is not the root cause. I've also attach 2 logs. One is in good situation. Another is in bad situation. This log is generated by applying a "log patch" to network-manager 1.4.2 so we can see more stuff. In the bad situation. The bug starts with race condition. But the race condition is not the root cause. The race condition is: * During the renaming from "wlan0" to "wlp1s0". "wlan0" disappeared. * Inside the NM, it is still using "wlan0" in "_linktype_get_type()". * Since /sys/class/net/wlan0/uevent is disappeared. so the type matching failed in _linktype_get_type(). * Also wifi_utils_is_wifi() failed to because /sys/class/net/wlan0 disappeared. * And finally, devtype and kind are both NULL, so it returns NM_LINK_TYPE_ETHERNET for wlan0. Later, wlan0 is renamed to wlp1s0, and it seems to me that the Object inherit the type so it is still type ethernet. But from the log, I saw _linktype_get_type() is called several times later and return the correct type (wifi). But just, "nmcli d" still shows type ethernet. I'm wondering if we are missing to update the type in the Object created after renaming and re-detecting the type. Yours, Paul
Created attachment 341379 [details] 1.4.2 failed log.
Created attachment 341380 [details] 1.4.2 success log
Additional information: In bug 767317 https://bugzilla.gnome.org/show_bug.cgi?id=767317 There's a patch created by "Shih-Yuan Lee (FourDollars)" which forces returns TYPE_WIFI when the name starts with wlan even if that disappeared. That fixes this problem too.
Created attachment 341388 [details] [review] Proposed patch to fix the issue. We can avoid the situation by checking if the device completes its renaming by g_udev_device_get_is_initialized(), but I am not sure if this patch has any side effect. At least this patch fixes the issue on my testing machine.
(In reply to Kai-Heng Feng from comment #4) > Created attachment 341388 [details] [review] [review] > Proposed patch to fix the issue. this doesn't matter -- at least not for the error scenario in comment 1. Note that usually NM first receives a netlink notification about the new link before the UDEV event. At that point, due to a race it may determine that the device is of type ethernet, which triggers the creation of an NMDeviceEthernet instance -- instead of a NMDeviceWifi. All this happens before handle_udev_event(). Regarding comment 3, see also: https://mail.gnome.org/archives/networkmanager-list/2016-December/msg00004.html
I think all sysctl file lookups inside "/sys/class/net/%s" should be changed to use nmp_utils_open_sysctl() from https://mail.gnome.org/archives/networkmanager-list/2016-December/msg00004.html
Created attachment 341419 [details] log to show that comment #4 doesn't work. (In reply to Kai-Heng Feng from comment #4) > Created attachment 341388 [details] [review] [review] > Proposed patch to fix the issue. I've tried your patch. However, the bug is still reproducible. I agree with Thomas said, it is after the bug happened. I've changed your patch a little bit to get the log. + if (!g_udev_device_get_is_initialized (udev_device)) { + _LOGT ("UDEV event: khfeng: udev_device is not initialized. Break %s()", + __func__); + return; + } else { + _LOGT ("UDEV event: khfeng: udev_device is initialized. Don't break %s()", + __func__); + } + Attachment is the logs (7ziped) for success and fail status. When success, we can see a log: Dec 05 13:02:15 u-Precision-5520 NetworkManager[748]: <trace> [1480960935.6779] platform-linux: UDEV event: khfeng: udev_device is initialized. Don't break hand le_udev_event() When failed, we see nothing. I think we worth to try Thomas's suggestion on comment #6.
Created attachment 341532 [details] [review] [PATCH 1/2] platform: add a new function nmp_utils_open_sysctl()
Created attachment 341533 [details] [review] [PATCH 2/2] platform: wifi: use nmp_utils_open_sysctl() to check if device is wifi
nmp_utils_open_sysctl() works great, the issue is solved. The issue no longer happens when wifi_utils_is_wifi() uses it to check the wifi type.
lgtm
Thanks Kai-Heng Feng But sorry. I cannot verify the patch recently due to I'm going to MiniDebConfJP2016 tomorrow. Will be back at next week.
I took the two patches, and expanded on them... Please review https://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=th/sysctl-ifname-race-bgo775613
Your patches should fix all race conditions in NM, but I can only test the WiFi case on my machine. It does fix the issue. Thanks for these patches!
(In reply to Thomas Haller from comment #13) > I took the two patches, and expanded on them... > > Please review > https://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=th/sysctl- > ifname-race-bgo775613 Pushed a fixup. LGTM.
branch merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=0d7bf7dee31175d3eeab2bd5a9eef8e7b6618016
partially backported to nm-1-4: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=ea944d5b4c5eac192a95cf0397459df24fee39cf