GNOME Bugzilla – Bug 770504
WiFi authentication fails after upgrade to 1.4.0
Last modified: 2016-09-26 01:35:43 UTC
Verbose debugging in wpa_supplicant reveals the presence of the scanning MAC address 7E:19:8C:39:8B:F0 during authentication, despite a call before authentication to set-hw-addr for the original MAC address F4:F2:6D:10:C5:09. The MLME frame event is ignored as a foreign address and authentication fails. Aug 28 15:01:04 ripley NetworkManager[9018]: <info> [1472353264.6902] device (wlxf4f26d10c509): set-hw-addr: set MAC address to 7E:19:8C:39:8B:F0 (scanning) [...] Aug 28 15:01:06 ripley NetworkManager[9018]: <info> [1472353266.3000] device (wlxf4f26d10c509): set-hw-addr: set-cloned MAC address to F4:F2:6D:10:C5:09 (permanent) [...] Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: New station ec:cb:30:60:58:32 Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: Event message available Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlxf4f26d10c509 Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlxf4f26d10c509(7e:19:8c:39:8b:f0) A1=f4:f2:6d:10:c5:09 A2=ec:cb:30:60:58:32 Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: wlxf4f26d10c509: Ignore MLME frame event for foreign address Workaround is to downgrade to 1.2.4. See also the Debian bug report: network-manager: WiFi authentication fails after upgrade to 1.4.0 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=835553 Kind regards, Ben.
For comparison, a successful authentication with 1.2.4 looks like this (and there are no set-hw-addr entries): Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Authenticate (ifindex=3) Aug 28 15:00:03 ripley wpa_supplicant[8462]: * bssid=ec:cb:30:60:58:32 Aug 28 15:00:03 ripley wpa_supplicant[8462]: * freq=2447 Aug 28 15:00:03 ripley wpa_supplicant[8462]: * IEs - hexdump(len=0): [NULL] Aug 28 15:00:03 ripley wpa_supplicant[8462]: * Auth Type 0 Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Authentication request send successfully Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1 Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Event message available Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Drv Event 19 (NL80211_CMD_NEW_STATION) received for wlxf4f26d10c509 Aug 28 15:00:03 ripley NetworkManager[7451]: <info> [1472353203.2475] device (wlxf4f26d10c509): supplicant interface state: inactive -> authenticating Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: New station ec:cb:30:60:58:32 Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/0 Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/1 Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Event message available Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlxf4f26d10c509 Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlxf4f26d10c509(f4:f2:6d:10:c5:09) A1=f4:f2:6d:10:c5:09 A2=ec:cb:30:60:58:32
In the successful authentication, the MLME event 37 information includes the original (A1) MAC address in the brackets. In the failing authentication, this is the scanning address.
during scanning, NetworkManager 1.4.0 now configures a random MAC address. This can be disabled by configuring NetworkManager.conf with [device] wifi.scan-rand-mac-address=no thus, this may be a better workaround then just uninstalling 1.4.0. See also: https://bugzilla.gnome.org/show_bug.cgi?id=770456#c14 Anyway. During activation, NM will reset the MAC address again. But it does so *before* telling supplicant to connect. At the point where supplicant starts connecting, the new MAC address is already set. It is expected that after resetting the MAC address, we still receive some Wi-Fi frames for the previous MAC address. But it is not cler, why that would be a problem. Can you please attach the full logfile of NetworkManager, with level=TRACE. See https://cgit.freedesktop.org/NetworkManager/NetworkManager/plain/contrib/fedora/rpm/NetworkManager.conf?id=c90ec2d8c8a12b44c908bf7f80b23059c29f68fa Thank you.
Created attachment 334407 [details] NetworkManager TRACE log file Attached NetworkManagerTRACE log file as requested. The log has been slightly sanitised to remove third-party AP information.
I can confirm that the workaround suggested for for me: [device] wifi.scan-rand-mac-address=no This bug looks like a duplicate of #770456. However, I am using the ath9k driver, not broadcom as reported for #770456.
This is a different issue from bug 770456. There is a 25 seconds timeout to associate with the access point. Still unclear why.
Hello, I am experiencing the same issue as Ben with the same daemon.log output. The suggested workaround of disabling random MAC address restores functionality for me, thanks for that! I believe I am using the rt2800usb driver; if TRACE level logging would be helpful from me as well, let me know and I will collect and attach it.
What kernel versions is everyone with this problem running?
I'm running Debian unstable; `uname -r` gives 4.6.0-1-amd64.
(In reply to Trevor Bekolay from comment #7) > Hello, I am experiencing the same issue as Ben with the same daemon.log > output. The suggested workaround of disabling random MAC address restores > functionality for me, thanks for that! > > I believe I am using the rt2800usb driver; if TRACE level logging would be > helpful from me as well, let me know and I will collect and attach it. Trevor, can you provide a TRACE level logfile of NM to see whether you have the same issue? Thanks
I am using kernel 4.6.4 on Debian unstable. The Debian package is linux-image-4.6.0-1-amd64.
Thanks. Thomas, I looked into the wpa_supplicant code (from git master, but I doubt it's changed a ton in the MAC address handling for a while). Here's what I found: The supplicant will only update its internal idea of the MAC address if: 1) it handles MAC address randomization itself 2) the interface moves from DISABLED -> ENABLED state; it only enters DISABLED state if the interface is !IFF_UP and was not disabled, and only enters enabled state if the interface is IFF_UP and was disabled. Unfortunately, there is a race (see src/drivers/driver_nl80211.c::wpa_driver_nl80211_event_rtm_newlink()) because the supplicant code in driver_nl80211.c listens for RTM_NEWLINK messages for IFF_UP, but when it gets the change notification it does a synchronous call to get the interface flags. If the supplicant gets an RTM_NEWLINK for !IFF_UP but the synchronous call reports the device is IFF_UP then the supplicant does nothing and the MAC address is not read. So this might be what's happening: 1) NM sets random MAC for scan 2) scan happens 3) supplicant wins the race and reads random MAC address 3) NM sets interface !IFF_UP 4) kernel broadcasts RTM_NEWLINK #1 with !IFF_UP 5) NM re-sets permanent MAC address 6) NM sets interface IFF_UP 7) kernel broadcasts RTM_NEWLINK #2 with IFF_UP 8) supplicant processes RTM_NEWLINK #1 and sees !IFF_UP 9) supplicant does sync call to get flags, sees IFF_UP 10) supplicant does not read MAC address because interface already IFF_UP We can test this theory by having a reporter enable debug logging in the supplicant (by adding -dddt" to the systemd supplicant service arguments or otherwise doing the magic "kill/move/start supplicant manually" dance) and looking for: "nl80211: Ignore interface down event since interface wlan0 is up" in the logs, which likely indicates the race happened. I'd argue this is a supplicant race condition, but a regression we created by moving the randomization back into NetworkManager. Not sure how easy this would be to work around, since there aren't a lot of ways to get the supplicant to re-read the MAC except sleeping after !IFF_UP or rfkill. Or move randomization back to the supplicant until we can get the race fixed.
Created attachment 334485 [details] Log with NetworkManager TRACE and supplicant debug Attached slightly sanitised logs with NetworkManager TRACE logging and supplicant debug ("-dddt") as requested.
I did not see this in the supplicant debug logs attached : "nl80211: Ignore interface down event since interface wlan0 is up" I do recall one single time while testing in which authentication succeeded; a race condition seems plausible. Kind regards, Ben.
(In reply to Ben Caradoc-Davies from comment #13) > Created attachment 334485 [details] > Log with NetworkManager TRACE and supplicant debug > > Attached slightly sanitised logs with NetworkManager TRACE logging and > supplicant debug ("-dddt") as requested. Hi Ben, strange, in this log there are also no supplicant messages like "Ignore MLME frame event for foreign address" as in your comment 1. Is it possible, that you didn't properly enable debug logging for the supplicant? Or how did you gather the logfile from comment 1?
Created attachment 334494 [details] NetworkManager TRACE and supplicant verbose log Attached (slightly sanitised) NetworkManager TRACE and supplicant verbose (-dddt) log.
Sorry, Thomas, I was carefully adding verbose logging (-dddt) to the wrong service file (dbus not the systemd as I was manually restarting it). Please find attached the new log file with verbose supplicant logging correctly enabled. This contains contains plenty of these: nl80211: wlxf4f26d10c509: Ignore MLME frame event for foreign address It still does not contain "nl80211: Ignore interface down event since interface wlan0 is up" or any "ignore interface" (case insensitive) strings at all. Sanitisation consists only of removing third-party AP information and all lines with "Key Data" or "hexdump".
(In reply to Ben Caradoc-Davies from comment #17) > Sanitisation consists only of removing third-party AP information and all > lines with "Key Data" or "hexdump". btw. you can exclude dumps of Wi-Fi scan by configuring [logging] level=TRACE domains=ALL,WIFI_SCAN:INFO
same issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1374023#c24
I think this is now fixed upstream: master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=32f7c1d4b9aba597a99128631f07c2985149f303 nm-1-4: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=cd8f2ecc617a896d8007e6fe825c676a626a3b8d If the issue still persists with these fixes, please open a new BZ. See also: https://bugzilla.redhat.com/show_bug.cgi?id=1374023#c30
You may also need fixes for wpa_supplicant < 2.3. See https://bugzilla.gnome.org/show_bug.cgi?id=771005#c37
I am using wpasupplicant 2.5-2+v2.4-3 amd64 on debian unstable so should not need a new wpa_supplicant. I will report back when I get a new nm 1.4.
I have the same problem after this patch is applied (Debian network-manager 1.4.0-4). I have opened a new Bugzilla issue as requested: https://bugzilla.gnome.org/show_bug.cgi?id=771966