GNOME Bugzilla – Bug 580185
NetworkManager drops the network every 120 seconds
Last modified: 2010-05-22 16:29:11 UTC
Please describe the problem:
I'm running Ubuntu 9.04. Every 120 seconds, like clockwork, this was happening:

Apr 24 15:11:10 hackworth NetworkManager: <debug> [1240600271.001053] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none))
Apr 24 15:11:23 hackworth NetworkManager: <debug> [1240600283.004541] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet)
Apr 24 15:13:11 hackworth NetworkManager: <debug> [1240600391.003921] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none))
Apr 24 15:13:23 hackworth NetworkManager: <debug> [1240600403.003504] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet)

Each time this happens, the network becomes completely unusable for about 10 seconds. This happens to other people, too; see: https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/291760

As described on that Launchpad bug page, applying the following patch, which restores the old "hack" to NetworkManager, fixes the problem for me completely: http://launchpadlibrarian.net/25829754/dif.txt

Note that without the patch, my system is effectively unusable -- on my Atheros a/b/g card, the network scan takes so many seconds that it is pretty much impossible to do interactive work without the patch applied. I understand that the "hack" was perhaps wrong, but in my case without it my network is dead for ten or more seconds every two minutes.

Steps to reproduce:
1. Own a laptop with an older Atheros chip driven by ath5k, such as the AR5212
2. Run NetworkManager
3. Observe problem
4. Apply the http://launchpadlibrarian.net/25829754/dif.txt patch, observe that the problem vanishes.

Actual results:
Expected results:
Does this happen every time?
Other information:
This has apparently been hitting a lot of people for quite a while.
It was apparently discussed on the NetworkManager mailing list (see the launchpad bug report above) and was thought of as minor on the basis that it would only hit a small number of people for a fraction of a second every two minutes. That perception is incorrect -- this is crippling if you have an 802.11a card and you use 802.11a.
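The 120-second period and the roughly 10-second outages can be checked directly from the debug log. The following is a minimal sketch that parses `periodic_update()` lines like the ones quoted above; the regular expression and the helper names `parse_roams` and `summarize` are illustrative assumptions, not part of NetworkManager.

```python
import re

# Matches the "Roamed from BSSID X (ssid) to Y (ssid)" debug lines quoted in this report.
ROAM_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] periodic_update\(\): Roamed from BSSID "
    r"(?P<src>\S+) \(.*?\) to (?P<dst>\S+) \("
)


def parse_roams(lines):
    """Return (timestamp, from_bssid, to_bssid) for every roam line found."""
    events = []
    for line in lines:
        m = ROAM_RE.search(line)
        if m:
            events.append((float(m["ts"]), m["src"], m["dst"]))
    return events


def summarize(events):
    """Return (seconds between successive drops, seconds each outage lasted)."""
    drop_times, outages, pending_drop = [], [], None
    for ts, src, dst in events:
        if dst == "(none)":                      # lost the AP
            drop_times.append(ts)
            pending_drop = ts
        elif src == "(none)" and pending_drop is not None:  # got it back
            outages.append(round(ts - pending_drop, 1))
            pending_drop = None
    intervals = [round(b - a, 1) for a, b in zip(drop_times, drop_times[1:])]
    return intervals, outages


SAMPLE = """\
Apr 24 15:11:10 hackworth NetworkManager: <debug> [1240600271.001053] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none))
Apr 24 15:11:23 hackworth NetworkManager: <debug> [1240600283.004541] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet)
Apr 24 15:13:11 hackworth NetworkManager: <debug> [1240600391.003921] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none))
Apr 24 15:13:23 hackworth NetworkManager: <debug> [1240600403.003504] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet)
""".splitlines()

if __name__ == "__main__":
    intervals, outages = summarize(parse_roams(SAMPLE))
    print("seconds between drops:", intervals)
    print("seconds per outage:", outages)
```

Run against the four lines from this report, the drops are 120 seconds apart and each outage lasts about 12 seconds, which matches the description above.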
Most A-capable cards (ath5k, iwlagn, ipw2915) don't have a problem with scanning and disconnection. What driver is driving your card? What *exact* kernel version do you have? Both 2.6.27 and 2.6.28 have potential issues with hidden SSIDs too; is your access point broadcasting its SSID?

Furthermore, can you do the following and attach the log output from NetworkManager?
1) stop NetworkManager
2) from a root terminal, run: NM_ACTIVE_AP_DEBUG=1 /usr/sbin/NetworkManager --no-daemon
3) connect to the AP, let the problem happen, then grab the logs from NM and attach.

Thanks!
Some of the information you are looking for is in the Ubuntu bug repository on Launchpad. However...

0) Not exactly sure how to identify the precise driver running my card. dmesg says:
ath5k phy0: Atheros AR5212 chip found (MAC: 0x56, PHY: 0x41)
and lsmod | fgrep ath5k gives me a hit for ath5k

1) $ uname -a
Linux hackworth 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
i.e. standard Ubuntu Jaunty

2) My base station is not hidden; it broadcasts its SSID.

BTW, from the looks of it on the Ubuntu lists and bug tracker, this is a common problem. I'll run NetworkManager in debugging mode shortly and get you the log output.
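On the question of identifying the driver: on Linux, sysfs exposes the driver bound to a network interface as the `device/driver` symlink under /sys/class/net/<interface>. A minimal sketch that reads it follows; the interface name `wlan0` is an assumption, and the `sysfs_root` parameter and function name are hypothetical, added only so the function can be pointed at a different directory.

```python
import os


def interface_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if it cannot be found."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        return None  # no such interface, or a virtual device with no backing driver
    # The symlink target ends in the driver directory, e.g. .../drivers/ath5k
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # "wlan0" is an assumed interface name; substitute the real one.
    print(interface_driver("wlan0"))
```

This should agree with the module name that lsmod shows, as in the ath5k hit reported above.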
Created attachment 133685: Requested log.
Okay, that log should do it. At this point you have what you requested.
I can confirm this bug. If I use another network management tool (like wicd), this error doesn't occur any more (including the disconnects/timeouts...).
Same here, using ath5k as well. Anything we can do to help get this resolved?
Note that 2.6.32 will probably make this problem a *lot* better, because while connected, the mac80211-based drivers (which drive most of the modern hardware out there) do staggered scanning and return to the operating channel between segments of the scan, meaning you won't be gone long enough to lose sync with the AP. Also note that there are kernel bugs, including one with nullfunc PS frames during scan, that make scanning while connected somewhat unreliable up until the 2.6.30 kernel.

commit a9a6ffffd05f97e6acbdeafc595e269855829751
Author: Kalle Valo <kalle.valo@nokia.com>
Date: Wed Mar 18 14:06:44 2009 +0200

    mac80211: don't drop nullfunc frames during software scan

What kernels are everyone running? If you're not running 2.6.30, you'll want to get it, or have your distro backport that patch to their current kernel. It's quite a small, low-risk patch, so if they don't accept the backport, they need some help in the risk-analysis department...
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for. Thanks!
Gah, I've been commenting on the Ubuntu report and forgot this bug existed. https://bugs.launchpad.net/bugs/291760 As I mentioned there https://bugs.launchpad.net/ubuntu/+source/linux/+bug/291760/comments/96 the behavior is really terrible on my new Dell Precision M4400 laptop with Broadcom BCM4322 on a current Ubuntu, with 2.6.33 kernel. "Much worse" means that it loses the connection every *20* seconds, not just every 120 seconds. Also, even on my other machines that don't lose the connection so frequently, the scans still mean that packets are lost. I have a lot of status update daemons on my network that send notifications using UDP broadcasts (UPS monitor, Caller ID for incoming phone calls, a few other things). The machines doing this background scanning lose those updates if they occur while a scan is happening. Simply put, if you have any realtime activity on your network, background scanning totally breaks it. If you insist on keeping this background scanning feature, please at least provide a config switch to turn it off. It does far more harm than good.
I'm running Ubuntu 10.04 with the stock 2.6.32-21-generic kernel and this is still happening on my macbook 5.1. I'm still getting these messages:

May 4 16:52:18 macpro-lnx NetworkManager: <debug> [1273006338.003157] periodic_update(): Roamed from BSSID (none) ((none)) to 00:18:74:4A:22:B0 (blizzard)
May 4 16:53:48 macpro-lnx NetworkManager: <debug> [1273006428.003213] periodic_update(): Roamed from BSSID 00:18:74:4A:22:B0 (blizzard) to (none) ((none))

The SSID I'm on is not public, but other than this it works. Is there any other information I can provide?
If the card is actually *dropping* the connection during a scan, then the card/driver is clearly broken and needs to be fixed in the upstream kernel. Out-of-kernel drivers will need to be fixed by the vendor. I expect that in some cases periodic scanning will slightly increase latency during the 2 - 5 seconds the scan could be in progress, but it certainly should not be disassociating you from the AP. Again, if you get disassociated, there are driver bugs and those bugs need to be fixed in the drivers.
Those 2-5 second scan intervals can also make the difference between smooth realtime streaming of video and broken video streams, and the problem gets much worse for higher quality/higher bitrate media. Whether the drivers have bugs or not is kind of beside the point. The point is, regardless of the device or driver involved, the current behavior is unsuitable for a number of scenarios and there are a large number of users hitting those scenarios. You're just sitting here going "la-la-la-la-I-can't-hear-you" and ignoring those users. Why are you so opposed to making this behavior configurable?
This is an inaccurate representation of the facts. I'm on the record on the mailing list or here in bugzilla acknowledging the fact that periodic scanning does cause problems in cases where low latency is required. And I've proposed various solutions for that issue that do not need configuration by end-users.

Again; if it *doesn't need to be configured* (and most things really don't) then there simply should not be a checkbox or knob for it.

And there are two reasons why this doesn't need to be configured at all:

1) Applications can tell the kernel about their latency requirements through socket options and various other mechanisms that are currently working their way through the kernel upstream. These will also be used for aggressive power management of wifi cards so that the cards enter low-power mode (which will have nothing to do with NM) when they aren't used and when latency is not a problem. This same mechanism can be used to alter the periodic scanning behavior in NetworkManager.

2) In the absence of the above, traffic patterns can be used to suppress periodic scanning for a time until the device is not being used. This functionality depends on the generic internal framework for connection time & usage tracking which is being handled in a separate bug.

The combination of these two things should fix issues with periodic scanning. Periodic scanning is still required (and indeed may well be done by the driver and/or supplicant even without involvement from NetworkManager) to support any type of roaming between access points, which is why with #2 above we'd still do a scan when the device was idle.

But this bug, titled "NetworkManager drops the network every 120 seconds", is clearly invalid, because if the driver actually drops the network connection during a scan, the driver absolutely needs fixing. I'm not going to work around broken, crappy drivers.
So if you've got a problem with latency during the scan because you're using VOIP or playing a network game, that's great and we can and will fix that, and we have a path (described above) to do so. But if you're getting dropped from the AP every time a scan happens, that's a driver bug, not a bug in NetworkManager. And it should be fixed in the driver. Not worked around.
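To make the traffic-pattern idea (#2 above) concrete, here is a minimal sketch of one way such a policy could look. It is not NetworkManager code: the class name, the byte-rate threshold, and the sampling interface are all hypothetical assumptions chosen for illustration. Only the 120-second period comes from this report.

```python
from dataclasses import dataclass


@dataclass
class ScanPolicy:
    """Hypothetical policy: scan periodically, but only while the link is idle."""
    scan_interval_s: float = 120.0     # period observed in this bug report
    idle_bytes_per_s: float = 2048.0   # assumed threshold below which the link counts as idle
    last_scan: float = 0.0             # time of the last scan that was allowed

    def should_scan(self, now, bytes_in_window, window_s):
        """Decide whether a background scan may start at time `now` (seconds)."""
        if now - self.last_scan < self.scan_interval_s:
            return False               # not due yet
        if bytes_in_window / window_s > self.idle_bytes_per_s:
            return False               # link is busy: defer the scan, retry at next check
        self.last_scan = now
        return True


if __name__ == "__main__":
    policy = ScanPolicy()
    print(policy.should_scan(now=120.0, bytes_in_window=1_000_000, window_s=1.0))  # busy
    print(policy.should_scan(now=121.0, bytes_in_window=0, window_s=1.0))          # idle
```

With this shape, a scan that falls due during a video stream or a VOIP call is postponed until traffic drops, while an idle device still scans every interval so roaming keeps working.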
(In reply to comment #13)
> This is an inaccurate representation of the facts. I'm on the record on the
> mailing list or here in bugzilla acknowledging the fact that periodic scanning
> does cause problems in cases where low latency is required. And I've proposed
> various solutions for that issue that do not need configuration by end-users.
>
> Again; if it *doesn't need to be configured* (and most things really don't)
> then there simply should not be a checkbox or knob for it.
>
> And there are two reasons why this doesn't need to be configured at all:

That's a wonderful vision of the future you've painted, but it's only that - a vision. This bug report was opened over a year ago, and the problem has persisted continually since then.

Applying automation to solve this is certainly a good idea. But that automation doesn't work yet, and in the absence of it, a manual control is needed. You could have added a manual switch a year ago, these bug reports would have closed, and then you could quietly work on the automated solution at your own pace without anyone having anything to complain about. When it was ready you could smoothly phase out the config switch.

Solutions "working their way through the kernel upstream" aren't going to be widely deployed for probably years - after the kernel features stabilize it will still be months to years before the sensitive applications are modified to take advantage of the feature.

"Walk first, then run" - instead of letting people walk, you're saying "wait here, eventually you'll be able to run."

Wifi roaming is probably irrelevant to the majority of users. People mainly use wifi in stationary locations - cafes, homes, and offices. You really need to re-examine these priorities.
(In reply to comment #14)
> (In reply to comment #13)
> > This is an inaccurate representation of the facts. I'm on the record on the
> > mailing list or here in bugzilla acknowledging the fact that periodic scanning
> > does cause problems in cases where low latency is required. And I've proposed
> > various solutions for that issue that do not need configuration by end-users.
> >
> > Again; if it *doesn't need to be configured* (and most things really don't)
> > then there simply should not be a checkbox or knob for it.
> >
> > And there are two reasons why this doesn't need to be configured at all:
>
> That's a wonderful vision of the future you've painted, but it's only that - a
> vision. This bug report was opened over a year ago, and the problem has
> persisted continually since then.
>
> Applying automation to solve this is certainly a good idea. But that automation
> doesn't work yet, and in the absence of it, a manual control is needed. You
> could have added a manual switch a year ago, these bug reports would have
> closed, and then you could quietly work on the automated solution at your own
> pace without anyone having anything to complain about. When it was ready you
> could smoothly phase out the config switch.

Wrong. You can never "smoothly" phase stuff out. The best approach is not to add it at all, but handle the situation more gracefully as I've suggested.

> Solutions "working their way through the kernel upstream" aren't going to be
> widely deployed for probably years - after the kernel features stabilize it
> will still be months to years before the sensitive applications are modified to
> take advantage of the feature.

Certainly not years away. Which is why I posted #2, which can be done immediately and is not dependent on #1.

> "Walk first, then run" - instead of letting people walk, you're saying "wait
> here, eventually you'll be able to run."
>
> Wifi roaming is probably irrelevant to the majority of users. People mainly use
> wifi in stationary locations - cafes, homes, and offices. You really need to
> re-examine these priorities.

This is completely incorrect. Any business or educational institution almost invariably has multiple access points in the same SSID, and thus seamless roaming is absolutely essential. It's not just users in homes.
(In reply to comment #15)
> (In reply to comment #14)
> > That's a wonderful vision of the future you've painted, but it's only that - a
> > vision. This bug report was opened over a year ago, and the problem has
> > persisted continually since then.
> >
> > Applying automation to solve this is certainly a good idea. But that automation
> > doesn't work yet, and in the absence of it, a manual control is needed. You
> > could have added a manual switch a year ago, these bug reports would have
> > closed, and then you could quietly work on the automated solution at your own
> > pace without anyone having anything to complain about. When it was ready you
> > could smoothly phase out the config switch.
>
> Wrong. You can never "smoothly" phase stuff out. The best approach is not to
> add it at all, but handle the situation more gracefully as I've suggested.

The best approach is to solve users' problems today (or yesterday, in this case) and not tomorrow.

> > Wifi roaming is probably irrelevant to the majority of users. People mainly use
> > wifi in stationary locations - cafes, homes, and offices. You really need to
> > re-examine these priorities.
>
> This is completely incorrect. Any business or educational institution almost
> invariably has multiple access points in the same SSID, and thus seamless
> roaming is absolutely essential. It's not just users in homes.

Nonsense. Even though those workplaces have multiple access points, the majority of users sit in one location when they work. They certainly don't need seamless uninterrupted service while they walk from room to room. What people *do* need is seamless uninterrupted service while they're actually working.
(In reply to comment #13)
> So if you've got a problem with latency during the scan because you're using
> VOIP or playing a network game, that's great and we can and will fix that, and
> we have a path (described above) to do so.

I have exactly this issue. My network connection is not dropped, but while it scans in the background the latency increases to about 200 or 300 ms. This happens about every 10 seconds. The icon in the tray changes while it happens, but I can say for sure that the connection is not dropped. I'm using Ubuntu 10.04 with the Broadcom STA driver. I have no other option because it's a wireless N adapter. If you need any further information, please let me know.
Update: as discussed in bug 513820, I'm open to suppressing periodic scanning when you've locked your connection to a specific BSSID, because in that case roaming is not a concern.

*** This bug has been marked as a duplicate of bug 513820 ***