After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 775613 - WiFi being detected as ethernet when race condition on renaming for persistent name
WiFi being detected as ethernet when race condition on renaming for persisten...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: Wi-Fi
1.4.x
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2016-12-05 04:02 UTC by Ying-Chun Liu
Modified: 2016-12-15 18:40 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch for more detailed log (2.34 KB, patch)
2016-12-05 04:02 UTC, Ying-Chun Liu
none Details | Review
1.4.2 failed log. (504.38 KB, text/plain)
2016-12-05 04:02 UTC, Ying-Chun Liu
  Details
1.4.2 success log (309.89 KB, text/plain)
2016-12-05 04:03 UTC, Ying-Chun Liu
  Details
Proposed patch to fix the issue. (1.20 KB, patch)
2016-12-05 09:56 UTC, Kai-Heng Feng
none Details | Review
log to show that comment #4 doesn't work. (45.04 KB, application/x-7z-compressed)
2016-12-05 18:20 UTC, Ying-Chun Liu
  Details
[PATCH 1/2] platform: add a new function nmp_utils_open_sysctl() (3.18 KB, patch)
2016-12-07 10:50 UTC, Kai-Heng Feng
none Details | Review
[PATCH 2/2] platform: wifi: use nmp_utils_open_sysctl() to check if device is wifi (3.29 KB, patch)
2016-12-07 10:50 UTC, Kai-Heng Feng
none Details | Review

Description Ying-Chun Liu 2016-12-05 04:02:06 UTC
Created attachment 341378 [details] [review]
patch for more detailed log

Dear NetworkManager maintainers.

Version: NetworkManager 1.4.2

This bug happens after power-on with probability about 1/50.
That means we need to reboot about 50 times to get into the buggy situation.
"nmcli d" shows the device type is ethernet:

DEVICE   TYPE      STATE        CONNECTION         
wlp1s0   ethernet  unavailable  --
lo       loopback  unmanaged    --                 

The bug starts from a race condition. But it is not the root cause.
I've also attach 2 logs. One is in good situation. Another is in bad situation.
This log is generated by applying a "log patch" to network-manager 1.4.2 so we can see more stuff.

In the bad situation. The bug starts with race condition. But the race condition is not the root cause. The race condition is:
 * During the renaming from "wlan0" to "wlp1s0". "wlan0" disappeared.
 * Inside the NM, it is still using "wlan0" in "_linktype_get_type()".
 * Since /sys/class/net/wlan0/uevent is disappeared. so the type matching failed in _linktype_get_type().
 * Also wifi_utils_is_wifi() failed to because /sys/class/net/wlan0 disappeared.
 * And finally, devtype and kind are both NULL, so it returns NM_LINK_TYPE_ETHERNET for wlan0.

Later, wlan0 is renamed to wlp1s0, and it seems to me that the Object inherit the type so it is still type ethernet.
But from the log, I saw _linktype_get_type() is called several times later and return the correct type (wifi). But just, "nmcli d" still shows type ethernet.

I'm wondering if we are missing to update the type in the Object created after renaming and re-detecting the type.

Yours,
Paul
Comment 1 Ying-Chun Liu 2016-12-05 04:02:51 UTC
Created attachment 341379 [details]
1.4.2 failed log.
Comment 2 Ying-Chun Liu 2016-12-05 04:03:16 UTC
Created attachment 341380 [details]
1.4.2 success log
Comment 3 Ying-Chun Liu 2016-12-05 04:06:24 UTC
Additional information:

In bug 767317
https://bugzilla.gnome.org/show_bug.cgi?id=767317

There's a patch created by "Shih-Yuan Lee (FourDollars)" which forces returns TYPE_WIFI when the name starts with wlan even if that disappeared. That fixes this problem too.
Comment 4 Kai-Heng Feng 2016-12-05 09:56:54 UTC
Created attachment 341388 [details] [review]
Proposed patch to fix the issue.

We can avoid the situation by checking if the device completes its renaming by g_udev_device_get_is_initialized(), but I am not sure if this patch has any side effect.

At least this patch fixes the issue on my testing machine.
Comment 5 Thomas Haller 2016-12-05 11:37:53 UTC
(In reply to Kai-Heng Feng from comment #4)
> Created attachment 341388 [details] [review] [review]
> Proposed patch to fix the issue.

this doesn't matter -- at least not for the error scenario in comment 1. Note that usually NM first receives a netlink notification about the new link before the UDEV event. At that point, due to a race it may determine that the device is of type ethernet, which triggers the creation of an NMDeviceEthernet instance -- instead of a NMDeviceWifi. All this happens before handle_udev_event().


Regarding comment 3, see also: https://mail.gnome.org/archives/networkmanager-list/2016-December/msg00004.html
Comment 6 Thomas Haller 2016-12-05 11:39:28 UTC
I think all sysctl file lookups inside "/sys/class/net/%s" should be changed to use nmp_utils_open_sysctl() from https://mail.gnome.org/archives/networkmanager-list/2016-December/msg00004.html
Comment 7 Ying-Chun Liu 2016-12-05 18:20:25 UTC
Created attachment 341419 [details]
log to show that comment #4 doesn't work.

(In reply to Kai-Heng Feng from comment #4)
> Created attachment 341388 [details] [review] [review]
> Proposed patch to fix the issue.

I've tried your patch. However, the bug is still reproducible. I agree with Thomas said, it is after the bug happened.

I've changed your patch a little bit to get the log.

+       if (!g_udev_device_get_is_initialized (udev_device)) {
+               _LOGT ("UDEV event: khfeng: udev_device is not initialized. Break %s()",
+                          __func__);
+               return;
+       } else {
+               _LOGT ("UDEV event: khfeng: udev_device is initialized. Don't break %s()",
+                          __func__);
+       }
+

Attachment is the logs (7ziped) for success and fail status.
When success, we can see a log:
Dec 05 13:02:15 u-Precision-5520 NetworkManager[748]: <trace> [1480960935.6779] 
platform-linux: UDEV event: khfeng: udev_device is initialized. Don't break hand
le_udev_event()

When failed, we see nothing.

I think we worth to try Thomas's suggestion on comment #6.
Comment 8 Kai-Heng Feng 2016-12-07 10:50:05 UTC
Created attachment 341532 [details] [review]
[PATCH 1/2] platform: add a new function nmp_utils_open_sysctl()
Comment 9 Kai-Heng Feng 2016-12-07 10:50:45 UTC
Created attachment 341533 [details] [review]
[PATCH 2/2] platform: wifi: use nmp_utils_open_sysctl() to check if device is wifi
Comment 10 Kai-Heng Feng 2016-12-07 10:53:14 UTC
nmp_utils_open_sysctl() works great, the issue is solved.

The issue no longer happens when wifi_utils_is_wifi() uses it to check the wifi type.
Comment 11 Thomas Haller 2016-12-07 10:57:53 UTC
lgtm
Comment 12 Ying-Chun Liu 2016-12-07 18:07:14 UTC
Thanks Kai-Heng Feng
But sorry. I cannot verify the patch recently due to I'm going to MiniDebConfJP2016 tomorrow. Will be back at next week.
Comment 13 Thomas Haller 2016-12-09 12:45:15 UTC
I took the two patches, and expanded on them...

Please review https://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=th/sysctl-ifname-race-bgo775613
Comment 14 Kai-Heng Feng 2016-12-12 05:38:53 UTC
Your patches should fix all race conditions in NM, but I can only test the WiFi case on my machine. It does fix the issue.

Thanks for these patches!
Comment 15 Beniamino Galvani 2016-12-12 15:15:46 UTC
(In reply to Thomas Haller from comment #13)
> I took the two patches, and expanded on them...
> 
> Please review
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=th/sysctl-
> ifname-race-bgo775613

Pushed a fixup. LGTM.