GNOME Bugzilla – Bug 697460
Frequent disconnects from MIT wireless network
Last modified: 2013-04-10 19:57:46 UTC
Created attachment 240864
Interval from syslog (disconnect at 15:59:48)

My Intel 6250 wireless card works fine on every wireless network I use except the MIT campus network (MIT SECURE), where it usually disconnects a few minutes after connecting, and fails to reassociate. MIT SECURE has a large number of Cisco access points (I see 13 from where I’m sitting), and uses PEAP+MSCHAPv2 authentication.

I’m using NetworkManager 0.9.8.0 with wpa_supplicant 2.0 on Ubuntu raring amd64.

I’ve attached a relevant interval from syslog, with NetworkManager at debug level and wpa_supplicant at msgdump level. The disconnect happens at 15:59:48, a few seconds after a background scan was triggered.
The supplicant finds a stronger access point to roam to:

Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: Considering within-ESS reassociation
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: Current BSS: 00:21:d8:49:d1:9c level=-58
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: Selected BSS: 00:21:d8:49:d1:93 level=-45

then the kernel driver attempts to probe the AP and it fails to receive a response:

Apr 6 15:59:48 fixed-disk kernel: [ 2144.807693] wlan0: direct probe to 00:21:d8:49:d1:93 (try 2/3)
Apr 6 15:59:48 fixed-disk kernel: [ 2145.011202] wlan0: direct probe to 00:21:d8:49:d1:93 (try 3/3)
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: Event AUTH_TIMED_OUT (14) received
Apr 6 15:59:48 fixed-disk kernel: [ 2145.214699] wlan0: authentication with 00:21:d8:49:d1:93 timed out
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: SME: Authentication timed out
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: Added BSSID 00:21:d8:49:d1:93 into blacklist
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: wlan0: Another BSS in this ESS has been seen; try it next
Apr 6 15:59:48 fixed-disk wpa_supplicant[6507]: BSSID 00:21:d8:49:d1:93 blacklist count incremented to 2

then the supplicant tries to reassociate to the previous AP, but fails due to a driver bug:

Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: nl80211: Authenticate (ifindex=7)
Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: * bssid=00:21:d8:49:d1:9c
Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: * freq=5805
Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: * IEs - hexdump(len=0): [NULL]
Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: * Auth Type 0
Apr 6 15:59:49 fixed-disk wpa_supplicant[6507]: wlan0: nl80211: MLME command failed (auth): ret=-114 (Operation already in progress)
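For anyone who wants to pull the same events out of their own syslog, here is a minimal Python sketch. It assumes the line layout seen in the excerpts above (month, day, time, host, process with optional pid, message); the function names `parse_line` and `roam_candidates` are made up for this illustration and are not part of any tool.

```python
import re

# Layout assumed from the excerpts in this report:
#   "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <message>"
# Kernel lines carry no "[pid]", so that group is optional.
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[\w.-]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)

# Matches the "Current BSS" / "Selected BSS" messages printed by the supplicant.
BSS_RE = re.compile(
    r"(?P<kind>Current|Selected) BSS: (?P<bssid>[0-9a-f:]{17}) level=(?P<level>-?\d+)"
)


def parse_line(line):
    """Split one syslog line into its fields, or return None if it does not match."""
    m = SYSLOG_RE.match(line)
    return m.groupdict() if m else None


def roam_candidates(lines):
    """Return {'current': (bssid, level), 'selected': (bssid, level)} when present."""
    found = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = BSS_RE.search(rec["msg"])
        if m:
            found[m.group("kind").lower()] = (m.group("bssid"), int(m.group("level")))
    return found
```

Feeding it the two "Current BSS" / "Selected BSS" lines above gives the BSSID and signal level of the AP the supplicant was on and the one it tried to roam to.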
What wpa_supplicant version do you have?

So the analysis here indicates that:

1) The supplicant's roaming thresholds are very small, and that means the supplicant is trying to roam too often; I've patched wpa_supplicant in Fedora to compensate for this, but it's really a problem in wpa_supplicant upstream.

2) The Cisco wifi infrastructure may be using "client steering" to load balance access points and force clients to associate with less-busy access points; this could be the cause of the failure to probe-request 00:21:d8:49:d1:93. That's not very nice behavior of the network, but it's a fact of life and the supplicant and kernel drivers need to handle this.

3) The kernel driver and mac80211 stack have some bugs in your kernel version, which is the cause of the -114 (operation already in progress) error.

Basically, all these problems are kernel or supplicant problems; NetworkManager isn't really involved...
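To illustrate point 1, here is a deliberately simplified Python sketch of a roaming threshold. It is not wpa_supplicant's actual algorithm; `should_roam` and the threshold values 5 and 15 dB are hypothetical and chosen only for illustration. The signal levels are the ones from the log in the description.

```python
def should_roam(current_dbm, candidate_dbm, min_gain_db):
    """Roam only if the candidate beats the current AP by at least min_gain_db.

    Simplified illustration; a real supplicant weighs more than signal level.
    """
    return candidate_dbm - current_dbm >= min_gain_db


# Levels from the log: current AP at -58, selected AP at -45 (13 dB stronger).
current, candidate = -58, -45

# Hypothetical small threshold: the client roams.
roams_with_small_threshold = should_roam(current, candidate, min_gain_db=5)

# Hypothetical larger threshold: the client stays where it is.
roams_with_large_threshold = should_roam(current, candidate, min_gain_db=15)
```

The smaller the required gain, the more often a background scan ends in a roam attempt, and each attempt is another chance to hit the steering and driver problems in points 2 and 3.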
Discussing this with kernel wifi developers, it appears the mac80211 bits aren't communicating to the cfg80211 layer about the forced disconnection. A potential fix for that issue is:

http://p.sipsolutions.net/84770fab4e3fc6ba.txt

So my suggestion is to refile this in Launchpad under the "kernel" component for now, and see if the Ubuntu developers can build you a test kernel with the linked patch, and see if that helps the issue. If you can leave a link to the Launchpad bug in this bug report so we can track the issue too, that would be great. Thanks!
For the record... (johill is a kernel mac80211 developer)

(04:52:36 PM) johill: hmm
(04:52:40 PM) johill: this looks weird/buggy
(04:53:05 PM) johill: it kinda looks like we don't tell cfg80211 we're now disconnected?
(04:53:36 PM) dcbw: and so the cfg8021 layer rejects with EINPROGRESS?
(04:53:42 PM) johill: -ealready
(04:53:44 PM) johill: I suspect wdev->current_bss never gets cleared
(04:53:44 PM) dcbw: ah
(04:54:15 PM) johill: and because auth fails it never gets set again either
(04:54:28 PM) johill: so then it's stuck and that's why it's also reporting the wrong thing in the scan results
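The suspicion in the chat can be sketched as a toy state model in Python. This is not kernel code: `ToyCfg80211`, `simulate` and the `notify_on_drop` flag are hypothetical names for this illustration, and the check in `authenticate` is only a stand-in for whatever cfg80211 really does. The one thing taken from the log is the error itself, -114, which is -EALREADY on the reporter's system.

```python
import errno


class ToyCfg80211:
    """Toy stand-in for the upper layer that remembers the current BSS."""

    def __init__(self):
        self.current_bss = None

    def connected(self, bssid):
        self.current_bss = bssid

    def disconnected(self):
        self.current_bss = None

    def authenticate(self, bssid):
        # Refuse to authenticate with the AP we believe we are still connected to.
        if self.current_bss == bssid:
            return -errno.EALREADY
        return 0


def simulate(notify_on_drop):
    """Replay the sequence from the log; return the result of the final auth."""
    cfg = ToyCfg80211()
    old_ap = "00:21:d8:49:d1:9c"
    cfg.connected(old_ap)

    # Roam attempt: the lower layer drops the link to old_ap. In the buggy
    # case it never tells the upper layer, so current_bss stays set.
    if notify_on_drop:
        cfg.disconnected()

    # Authentication with the new AP times out (not modeled here), so the
    # supplicant falls back to the previous AP.
    return cfg.authenticate(old_ap)
```

In this model `simulate(False)` returns -EALREADY and keeps doing so on every retry, which matches the "stuck" state described in the chat, while `simulate(True)` lets the fallback authentication through.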
(In reply to comment #2)
> What wpa_supplicant version do you have?

2.0 (which I packaged myself at https://bugs.launchpad.net/bugs/1099755, to see if it would work better than raring’s 1.0).

(In reply to comment #3)
> A potential fix for that issue is:
> http://p.sipsolutions.net/84770fab4e3fc6ba.txt

Thanks. I compiled a patched mac80211.ko. I’ll test it on campus tomorrow, and I’ll file a Launchpad bug with the results.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1166648
Comments from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1166648:

me: I just reproduced the failure twice without the patch, and I seem to have no problems with the patch. It’s a definite improvement.

Joseph Salisbury (jsalisbury): Thanks for the pointer to the patch. Do you know if this patch will be sent upstream for inclusion in the mainline kernel?
Johannes was able to reproduce the issue in a controlled environment today, and verified that the patch does fix the issue. Patch submitted to linux-wireless today:

[PATCH v3.9] mac80211: fix cfg80211 interaction on auth/assoc request