After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 770504 - WiFi authentication fails after upgrade to 1.4.0
WiFi authentication fails after upgrade to 1.4.0
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: Wi-Fi
1.4.x
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2016-08-28 04:25 UTC by Ben Caradoc-Davies
Modified: 2016-09-26 01:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
NetworkManager TRACE log file (266.26 KB, text/plain)
2016-08-29 23:11 UTC, Ben Caradoc-Davies
Details
Log with NetworkManager TRACE and supplicant debug (209.01 KB, text/plain)
2016-08-30 20:39 UTC, Ben Caradoc-Davies
Details
NetworkManager TRACE and supplicant verbose log (201.96 KB, text/plain)
2016-08-31 03:57 UTC, Ben Caradoc-Davies
Details

Description Ben Caradoc-Davies 2016-08-28 04:25:15 UTC
Verbose debugging in wpa_supplicant reveals the presence of the scanning MAC address 7E:19:8C:39:8B:F0 during authentication, despite a call before authentication to set-hw-addr for the original MAC address F4:F2:6D:10:C5:09. The MLME frame event is ignored as a foreign address and authentication fails.

Aug 28 15:01:04 ripley NetworkManager[9018]: <info>  [1472353264.6902] device (wlxf4f26d10c509): set-hw-addr: set MAC address to 7E:19:8C:39:8B:F0 (scanning)
[...]
Aug 28 15:01:06 ripley NetworkManager[9018]: <info>  [1472353266.3000] device (wlxf4f26d10c509): set-hw-addr: set-cloned MAC address to F4:F2:6D:10:C5:09 (permanent)
[...]
Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: New station ec:cb:30:60:58:32
Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: Event message available
Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlxf4f26d10c509
Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlxf4f26d10c509(7e:19:8c:39:8b:f0) A1=f4:f2:6d:10:c5:09 A2=ec:cb:30:60:58:32
Aug 28 15:01:06 ripley wpa_supplicant[8462]: nl80211: wlxf4f26d10c509: Ignore MLME frame event for foreign address

Workaround is to downgrade to 1.2.4.
 
See also the Debian bug report:
network-manager: WiFi authentication fails after upgrade to 1.4.0
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=835553

Kind regards,
Ben.
Comment 1 Ben Caradoc-Davies 2016-08-28 06:35:09 UTC
For comparison, a successful authentication with 1.2.4 looks like this (and there are no set-hw-addr entries):

Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Authenticate (ifindex=3)
Aug 28 15:00:03 ripley wpa_supplicant[8462]:   * bssid=ec:cb:30:60:58:32
Aug 28 15:00:03 ripley wpa_supplicant[8462]:   * freq=2447
Aug 28 15:00:03 ripley wpa_supplicant[8462]:   * IEs - hexdump(len=0): [NULL]
Aug 28 15:00:03 ripley wpa_supplicant[8462]:   * Auth Type 0
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Authentication request send successfully
Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Event message available
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Drv Event 19 (NL80211_CMD_NEW_STATION) received for wlxf4f26d10c509
Aug 28 15:00:03 ripley NetworkManager[7451]: <info>  [1472353203.2475] device (wlxf4f26d10c509): supplicant interface state: inactive -> authenticating
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: New station ec:cb:30:60:58:32
Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/0
Aug 28 15:00:03 ripley wpa_supplicant[8462]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/1
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Event message available
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: Drv Event 37 (NL80211_CMD_AUTHENTICATE) received for wlxf4f26d10c509
Aug 28 15:00:03 ripley wpa_supplicant[8462]: nl80211: MLME event 37 (NL80211_CMD_AUTHENTICATE) on wlxf4f26d10c509(f4:f2:6d:10:c5:09) A1=f4:f2:6d:10:c5:09 A2=ec:cb:30:60:58:32
Comment 2 Ben Caradoc-Davies 2016-08-28 06:37:14 UTC
In the successful authentication, the MLME event 37 information includes the original (A1) MAC address in the brackets. In the failing authentication, this is the scanning address.
Comment 3 Thomas Haller 2016-08-29 10:28:23 UTC
during scanning, NetworkManager 1.4.0 now configures a random MAC address.


This can be disabled by configuring NetworkManager.conf with

  [device]
  wifi.scan-rand-mac-address=no

thus, this may be a better workaround then just uninstalling 1.4.0. See also: https://bugzilla.gnome.org/show_bug.cgi?id=770456#c14




Anyway. During activation, NM will reset the MAC address again. But it does so *before* telling supplicant to connect. At the point where supplicant starts connecting, the new MAC address is already set.

It is expected that after resetting the MAC address, we still receive some Wi-Fi frames for the previous MAC address. But it is not cler, why that would be a problem.




Can you please attach the full logfile of NetworkManager, with level=TRACE. See https://cgit.freedesktop.org/NetworkManager/NetworkManager/plain/contrib/fedora/rpm/NetworkManager.conf?id=c90ec2d8c8a12b44c908bf7f80b23059c29f68fa

Thank you.
Comment 4 Ben Caradoc-Davies 2016-08-29 23:11:00 UTC
Created attachment 334407 [details]
NetworkManager TRACE log file

Attached NetworkManagerTRACE log file as requested. The log has been slightly sanitised to remove third-party AP information.
Comment 5 Ben Caradoc-Davies 2016-08-29 23:17:51 UTC
I can confirm that the workaround suggested for for me:

[device]
wifi.scan-rand-mac-address=no

This bug looks like a duplicate of #770456. However, I am using the ath9k driver, not broadcom as reported for #770456.
Comment 6 Thomas Haller 2016-08-30 09:18:09 UTC
This is a different issue from bug 770456.

There is a 25 seconds timeout to associate with the access point. Still unclear why.
Comment 7 Trevor Bekolay 2016-08-30 12:08:20 UTC
Hello, I am experiencing the same issue as Ben with the same daemon.log output. The suggested workaround of disabling random MAC address restores functionality for me, thanks for that!

I believe I am using the rt2800usb driver; if TRACE level logging would be helpful from me as well, let me know and I will collect and attach it.
Comment 8 Dan Williams 2016-08-30 16:16:58 UTC
What kernel versions is everyone with this problem running?
Comment 9 Trevor Bekolay 2016-08-30 16:21:44 UTC
I'm running Debian unstable; `uname -r` gives 4.6.0-1-amd64.
Comment 10 Thomas Haller 2016-08-30 16:22:37 UTC
(In reply to Trevor Bekolay from comment #7)
> Hello, I am experiencing the same issue as Ben with the same daemon.log
> output. The suggested workaround of disabling random MAC address restores
> functionality for me, thanks for that!
> 
> I believe I am using the rt2800usb driver; if TRACE level logging would be
> helpful from me as well, let me know and I will collect and attach it.

Trevor, can you provide a TRACE level logfile of NM to see whether you have the same issue? Thanks
Comment 11 Ben Caradoc-Davies 2016-08-30 17:23:22 UTC
I am using kernel 4.6.4 on Debian unstable. The Debian package is linux-image-4.6.0-1-amd64.
Comment 12 Dan Williams 2016-08-30 19:20:37 UTC
Thanks.

Thomas, I looked into the wpa_supplicant code (from git master, but I doubt it's changed a ton in the MAC address handling for a while).  Here's what I found:

The supplicant will only update its internal idea of the MAC address if:

1) it handles MAC address randomization itself

2) the interface moves from DISABLED -> ENABLED state; it only enters DISABLED state if the interface is !IFF_UP and was not disabled, and only enters enabled state if the interface is IFF_UP and was disabled.  Unfortunately, there is a race (see src/drivers/driver_nl80211.c::wpa_driver_nl80211_event_rtm_newlink()) because the supplicant code in driver_nl80211.c listens for RTM_NEWLINK messages for IFF_UP, but when it gets the change notification it does a synchronous call to get the interface flags.  If the supplicant gets an RTM_NEWLINK for !IFF_UP but the synchronous call reports the device is IFF_UP then the supplicant does nothing and the MAC address is not read.  So this might be what's happening:

1) NM sets random MAC for scan
2) scan happens
3) supplicant wins the race and reads random MAC address
3) NM sets interface !IFF_UP
4) kernel broadcasts RTM_NEWLINK #1 with !IFF_UP
5) NM re-sets permanent MAC address
6) NM sets interface IFF_UP
7) kernel broadcasts RTM_NEWLINK #2 with IFF_UP
8) supplicant processes RTM_NEWLINK #1 and sees !IFF_UP
9) supplicant does sync call to get flags, sees IFF_UP
10) supplicant does not read MAC address because interface already IFF_UP

We can test this theory by having a reporter enable debug logging in the supplicant (by adding -dddt" to the systemd supplicant service arguments or otherwise doing the magic "kill/move/start supplicant manually" dance) and looking for:

"nl80211: Ignore interface down event since interface wlan0 is up"

in the logs, which likely indicates the race happened.

I'd argue this is a supplicant race condition, but a regression we created by moving the randomization back into NetworkManager.  Not sure how easy this would be to work around, since there aren't a lot of ways to get the supplicant to re-read the MAC except sleeping after !IFF_UP or rfkill.  Or move randomization back to the supplicant until we can get the race fixed.
Comment 13 Ben Caradoc-Davies 2016-08-30 20:39:45 UTC
Created attachment 334485 [details]
Log with NetworkManager TRACE and supplicant debug

Attached slightly sanitised logs with NetworkManager TRACE logging and supplicant debug ("-dddt") as requested.
Comment 14 Ben Caradoc-Davies 2016-08-30 20:41:50 UTC
I did not see this in the supplicant debug logs attached : "nl80211: Ignore interface down event since interface wlan0 is up"

I do recall one single time while testing in which authentication succeeded; a race condition seems plausible.

Kind regards,
Ben.
Comment 15 Thomas Haller 2016-08-30 21:38:43 UTC
(In reply to Ben Caradoc-Davies from comment #13)
> Created attachment 334485 [details]
> Log with NetworkManager TRACE and supplicant debug
> 
> Attached slightly sanitised logs with NetworkManager TRACE logging and
> supplicant debug ("-dddt") as requested.

Hi Ben,

strange, in this log there are also no supplicant messages like
  "Ignore MLME frame event for foreign address"
as in your comment 1.

Is it possible, that you didn't properly enable debug logging for the supplicant? Or how did you gather the logfile from comment 1?
Comment 16 Ben Caradoc-Davies 2016-08-31 03:57:25 UTC
Created attachment 334494 [details]
NetworkManager TRACE and supplicant verbose log

Attached (slightly sanitised) NetworkManager TRACE and supplicant verbose (-dddt) log.
Comment 17 Ben Caradoc-Davies 2016-08-31 04:03:14 UTC
Sorry, Thomas, I was carefully adding verbose logging (-dddt) to the wrong service file (dbus not the systemd as I was manually restarting it). Please find attached the new log file with verbose supplicant logging correctly enabled.

This contains contains plenty of these:

nl80211: wlxf4f26d10c509: Ignore MLME frame event for foreign address

It still does not contain "nl80211: Ignore interface down event since interface wlan0 is up" or any "ignore interface" (case insensitive) strings at all.

Sanitisation consists only of removing third-party AP information and all lines with "Key Data" or "hexdump".
Comment 18 Thomas Haller 2016-08-31 11:38:59 UTC
(In reply to Ben Caradoc-Davies from comment #17)

> Sanitisation consists only of removing third-party AP information and all
> lines with "Key Data" or "hexdump".

btw. you can exclude dumps of Wi-Fi scan by configuring

  [logging]
  level=TRACE
  domains=ALL,WIFI_SCAN:INFO
Comment 19 Thomas Haller 2016-09-09 13:26:35 UTC
same issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1374023#c24
Comment 21 Thomas Haller 2016-09-16 08:48:43 UTC
You may also need fixes for wpa_supplicant < 2.3.
See https://bugzilla.gnome.org/show_bug.cgi?id=771005#c37
Comment 22 Ben Caradoc-Davies 2016-09-16 20:26:58 UTC
I am using wpasupplicant 2.5-2+v2.4-3 amd64 on debian unstable so should not need a new wpa_supplicant. I will report back when I get a new nm 1.4.
Comment 23 Ben Caradoc-Davies 2016-09-26 01:35:43 UTC
I have the same problem after this patch is applied (Debian network-manager 1.4.0-4). I have opened a new Bugzilla issue as requested:
https://bugzilla.gnome.org/show_bug.cgi?id=771966