GNOME Bugzilla – Bug 513820
Wireless lags due to scanning
Last modified: 2016-07-01 11:52:16 UTC
Every so often, NetworkManager will scan wireless interfaces for new networks. This is good behavior. However, occasionally users have latency-sensitive tasks, like online videogames. It would be nice if it was possible to temporarily disable scanning without killing NetworkManager and manually configuring interfaces.
*** Bug 519987 has been marked as a duplicate of this bug. ***
Perhaps scanning could be done less frequently if you're connected to a network?
It already is; it follows a back-off algorithm up to two minutes when connected. If the driver has problems scanning in the time allowed, the driver should probably be fixed. There's always going to be a balance between scanning frequently to ensure that roaming and reassociation on transient errors works correctly, and minimizing lag due to the scan when it occurs.
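As a rough illustration of the back-off described above, here is a minimal sketch. The starting interval and the doubling factor are assumptions made for the example; only the two-minute cap comes from the comment, and this is not NetworkManager's actual code.

```python
from itertools import islice

MAX_INTERVAL_S = 120  # the two-minute cap mentioned in the comment above


def scan_intervals(initial_s=20, factor=2, cap_s=MAX_INTERVAL_S):
    """Yield successive delays (in seconds) between periodic scans.

    initial_s and factor are hypothetical values chosen for illustration.
    """
    interval = initial_s
    while True:
        yield interval
        # Grow the delay after every scan, but never beyond the cap.
        interval = min(interval * factor, cap_s)


first_five = list(islice(scan_intervals(), 5))
print(first_five)
```

Once the cap is reached, a connected interface would be scanned once every two minutes, which matches the dropout period reported later in this thread.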
One solution is to back off scanning while there's a lot of active traffic on the interface over a certain period of time.
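A minimal sketch of that idea, assuming the caller samples the interface's cumulative byte counters over a recent window. The function name, the threshold, and the maximum deferral are all hypothetical and would need tuning; nothing here is taken from NetworkManager.

```python
def should_defer_scan(byte_samples, window_s, busy_bytes_per_s=50_000,
                      deferred_s=0, max_defer_s=600):
    """Return True if the next periodic scan should be postponed.

    byte_samples: cumulative rx+tx byte counts, oldest first, taken
                  over the last window_s seconds.
    deferred_s:   how long the pending scan has already been postponed.
    All thresholds are illustrative assumptions.
    """
    if len(byte_samples) < 2 or window_s <= 0:
        return False  # not enough data to judge; scan as usual
    if deferred_s >= max_defer_s:
        return False  # never postpone forever; roaming needs fresh results
    rate = (byte_samples[-1] - byte_samples[0]) / window_s
    return rate >= busy_bytes_per_s


print(should_defer_scan([0, 1_000_000], window_s=10))
```

The upper bound on deferral is there so that a long-running transfer cannot starve the scan list indefinitely.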
*** Bug 581412 has been marked as a duplicate of this bug. ***
I have to ask, why scan at all when you are connected to a wireless network? It seems some drivers suck at scanning and processing network traffic...
Dan, you wrote "If the driver has problems scanning in the time allowed, the driver should probably be fixed". What is "the time allowed"? How long do you expect a scan to take? On my Atheros AR5212 using the ath5k driver a scan usually takes between 3 and 5 seconds, and during that time the network seems to be paused. This totally ruins any latency-sensitive application such as an interactive ssh connection or VOIP. Even a sub-second pause would be bad for VOIP applications. So I'm wondering if you consider a few seconds to be normal or not. Anyway, I'd suggest changing the behaviour of NetworkManager so that it doesn't do any periodic scanning at all when associated with an AP, unless it is explicitly requested by the user (e.g. the user left-clicks on the nm-applet icon to view the list of available wireless networks). A workaround would be the "Pause wireless scanning" option in nm-applet that is talked about in Bug 165933, but it seems to be gone in NetworkManager 0.7.1 in Fedora 9.
Created attachment 138719: This patch disables network-manager scanning if the device is already connected.
I also have this problem. I have created a workaround patch that disables scanning if already connected. This patch is based on a suggestion by Christer Weinigel in the Red Hat bug report and, of course, is not the ideal solution. Bug report in the Red Hat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=490493 Bug report with the patch in Ubuntu Launchpad: https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/373680
Same here, using a simple "don't f-in scan every 2min" patch I did myself before finding this bug report. Christer is right, periodic network disruptions are more than just annoying; with VoIP, streaming, interactive terminals, etc., it's plain devastating.
I have the same problem, and have resorted to locally patching Network Manager as well. (Atheros AR5212, ath5k driver.) I appreciate that the underlying problem may be that the driver is not very robust wrt scanning during operation, but the fact remains that this problem does not manifest unless you're using Network Manager. I've been setting up my wlan from /etc/network/interfaces for years, but switched to using Network Manager because of the VPN support. (Which is very nice!) The result was that I could no longer stream reliably to my media player over wlan. So, I spent a not insignificant amount of time tracing the problem, only to discover that the bug that causes it was logged here a year and a half ago. Let me gently suggest that this problem be dealt with by Network Manager, and hopefully others might be spared the same frustration.
The problem does not exist when you don't use NetworkManager because the other solutions don't provide the feature set that NetworkManager does. The kernel drivers need to be fixed (and have been, as of 2.6.32, with background scanning functionality). If the drivers suck, NM isn't going to work around the drivers. The drivers need to get fixed. That's the only way things get better. ath5k has been somewhat troublesome of late, I'll grant you, but that doesn't mean NM should hack around the problem. The driver simply needs to get fixed. So what's wrong with the media player that it can't buffer the video/music correctly? I can stream netflix or Pandora or Lala or NPR all day long on my computer and I don't get interruptions. The streaming client that is receiving the stream shouldn't be stupid enough to be playing unbuffered media. That's what buffers are for...
Some applications can't buffer. See also http://www.xkcd.com/654/ . It would make a lot of users with buggy drivers happy if it was possible to disable scanning. I'm not going to fix all the drivers, neither are you, and telling people to only buy hardware that works isn't the solution.
Upstream drivers need to get fixed if they have a problem, and they have gotten fixed every time there's been something that we need of them. Often I've done that fixing myself. The point is, if the driver is buggy, it is the thing that needs to get fixed. I write software for working drivers. I don't hack around shit that's broken. And when the drivers are broken, the correct place to report that is the kernel driver itself, so that it can get fixed where the fix belongs: in the driver. That's about all I've got to say about this subject. That said, there are some things NM can do to delay scanning for a little while when there's lots of traffic happening, and do the scanning after a longer delay or when the interface becomes more idle. But that doesn't change the fact that if the driver or stack is broken, that's the place that it needs to get fixed.
I'm sorry, but this all comes across to me as unwillingness to adapt to an imperfect world. I can sympathise with the sentiment that problems need to be fixed at their source, but if you're really going to take this hardliner approach to it, it would be more honest to let your users in on it as well. Having nm-applet pop up an alert if it detects an ath5k-based interface, for instance. "Warning, using Network Manager with this version of the ath5k driver will cause your wlan connection to experience up to 5% downtime!" should be a fair warning. (Ok, perhaps not precisely those words. But I'm not joking. This degradation is so bad -- and well known! -- that it seems rather unfair to not at least inform the user.) That my media player should be able to do better buffering is neither here nor there. The fact of the matter is that -- on the same hardware and software that I was previously using without problem -- switching to Network Manager has caused me to experience 7-second dropouts every two minutes. To me as a user, this amounts to seriously degraded reliability.
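For reference, the downtime figure can be checked against the numbers reported in this thread (7-second dropouts, or 3 to 5 second scans, once per two-minute scan period). The helper name below is made up for the example.

```python
def downtime_fraction(outage_s, period_s):
    """Fraction of time the link is unusable if each scan stalls it."""
    return outage_s / period_s


SCAN_PERIOD_S = 120  # one scan every two minutes while connected

for outage_s in (3, 5, 7):
    share = downtime_fraction(outage_s, SCAN_PERIOD_S)
    print(f"{outage_s} s every {SCAN_PERIOD_S} s -> {share:.1%}")
```

A 7-second stall every two minutes works out to a little under 6% of the time, so the "up to 5%" wording in the proposed warning is in the right range.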
(In reply to comment #15)
> I'm sorry, but this all comes across to me as unwillingness to adapt to an
> imperfect world. I can sympathise with the sentiment that problems need to be
> fixed at their source, but if you're really going to take this hardliner
> approach to it, it would be more honest to let your users in on it as well.
> Having nm-applet pop up an alert if it detects an ath5k-based interface, for
> instance. "Warning, using Network Manager with this version of the ath5k
> driver will cause your wlan connection to experience up to 5% downtime!" should
> be a fair warning.
Unfortunately that's not possible, simply because you can never be sure what distributions have which patches. So just matching kernel=2.6.30 doesn't work, because Fedora might fix the issue with a patch that Mandriva might not apply, or Ubuntu might apply a patch that breaks the driver. All while keeping the same kernel version. And drivers don't generally have their own versions, because the driver version *is* the kernel version. Welcome to open-source... blacklisting this simply won't work. What *will* work is fixing the bug at the source, and not hacking around the problem. Then you apply the fix to the -stable kernels. Then you get your distribution to apply the fix to the bug to an update kernel and push it out to their users. That's how this process works.
> That my media player should be able to do better buffering is neither here nor
> there. The fact of the matter is that -- on the same hardware and software
> that I was previously using without problem -- switching to Network Manager has
> caused me to experience 7-second dropouts every two minutes. To me as a user,
> this amounts to seriously degraded reliability.
Which is fixed by the 2.6.32 kernel with background scanning.
With 2.6.32, there is still a noticeable drop-off in bandwidth and latency while scanning:
64 bytes from icarus.local (192.168.1.140): icmp_seq=212 ttl=64 time=28.3 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=213 ttl=64 time=9.78 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=214 ttl=64 time=74.4 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=215 ttl=64 time=0.764 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=216 ttl=64 time=97.2 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=217 ttl=64 time=4.67 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=219 ttl=64 time=167 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=220 ttl=64 time=88.6 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=221 ttl=64 time=48.8 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=222 ttl=64 time=2.16 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=223 ttl=64 time=6.20 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=224 ttl=64 time=284 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=225 ttl=64 time=187 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=226 ttl=64 time=125 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=227 ttl=64 time=59.9 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=228 ttl=64 time=0.704 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=230 ttl=64 time=12.1 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=231 ttl=64 time=7.56 ms
I'm on a very common network setup: there is one AP that I want to connect to, and one AP only. I set the BSSID of the AP in the NetworkManager config. If I have told NetworkManager to use only that BSSID, is there any advantage to scanning? Under what conditions would information returned from a scan cause NetworkManager to change state?
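One way to quantify a capture like the one above is to pull out the sequence numbers and round-trip times and look for gaps and spikes. The sketch below is a hypothetical helper, not part of any tool mentioned in this thread; it embeds four of the lines from the comment as sample input and assumes the Linux `ping` reply format shown there.

```python
import re

# Four consecutive replies copied from the capture above.
SAMPLE = """\
64 bytes from icarus.local (192.168.1.140): icmp_seq=216 ttl=64 time=97.2 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=217 ttl=64 time=4.67 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=219 ttl=64 time=167 ms
64 bytes from icarus.local (192.168.1.140): icmp_seq=220 ttl=64 time=88.6 ms
"""

REPLY_RE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def summarize(ping_output):
    """Return (missing sequence numbers, worst round-trip time in ms)."""
    replies = [(int(m.group(1)), float(m.group(2)))
               for m in REPLY_RE.finditer(ping_output)]
    if not replies:
        return [], None
    seqs = [seq for seq, _ in replies]
    # Any sequence number between the first and last reply that never
    # came back counts as a lost packet.
    missing = sorted(set(range(min(seqs), max(seqs) + 1)) - set(seqs))
    worst_ms = max(rtt for _, rtt in replies)
    return missing, worst_ms


print(summarize(SAMPLE))
```

In the full capture, sequence numbers 218 and 229 are absent, and the slowest reply took 284 ms against a best case of under 1 ms.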
My 2 cents. If I'm right, background scanning is done in order to make roaming support available and to enable automatic network connections; some drivers have been fixed, some will be and some others maybe won't. More specifically, the drivers based on the old lib80211 stack, such as ipw2200, will likely require additional time and effort (and maybe firmware modifications) to make background scanning unobtrusive. Maybe, in the meanwhile, the background scans could be avoided when:
- no "auto connect" networks have been configured
- a BSSID has been specified for the currently active connection (roaming is excluded in this case, right?)
If I'm not wrong, satisfying these requirements would make background scanning avoidable until requested by the user (i.e. by clicking on the nm-applet icon), thus giving a consistent way out for those users who still need to use drivers whose behaviour is not correct.
Dan, It is time for someone to shit or get off the pot. This ticket is like 2yrs old. If you are still unwilling to add an option to allow/deny background scanning, then close the damn ticket stating so. I am using the broadcom-sta drivers. They are not open-source and not provided by the kernel folks. Chances are they will never be fixed. This will affect any newer DELL user as well. This issue actually causes my connection to drop periodically. For instance my connection has dropped about 7 times now reading and replying to this post. Each time I have to manually connect back to the network.
Dan, Maurizio's suggestion seems right on the money to me. I've just finished reading a whole bunch of tickets (spanning three or four different BTSs) related to this issue, and generally I agree with you both that broken drivers should be fixed and that application developers need to take a stand to force the issue (and I see you have been remarkably successful with such tactics for other problems in the past). I also agree that UI should not present unnecessary and confusing options. However, how is Maurizio wrong in the conclusion that if the user has explicitly specified a BSSID for the currently-associated network then roaming is impossible and therefore scanning is at best unnecessary and at worst inappropriate? I see your position has changed since bug #165933 (five years ago!), although I didn't find a post where you express the reasons for the change. I'm guessing that it's now gone not just in the UI but also in the gconf since the xml I found in my ~/.gconf/system/networking/ doesn't reference anything related to scanning. To add some vaguely useful "me too" data to the thread: I have a Dell Latitude E5500 with its built-in BCM4322. Everything generally worked fine under Ubuntu Intrepid but I started seeing the roaming to and from (none) ((none)) once I upgraded to Lucid (and figured out that I now needed to expressly install bcmwl-kernel-source to get the wl driver that had Just Worked in Intrepid). Since I just upgraded to Lucid yesterday, it's already running 2.6.32 and does the background scanning most of the time without dropping connections; I only suffer high latencies during background scans and seemingly incorrect roaming complaints in syslog. Over the past 5.5 hours (stationary in my home with its single access point and only a few unknown neighboring APs visible), NetworkManager has written 2633 lines to syslog. 
Most of these are roaming either to or from (none) ((none)), while a few represent the three times that the connection has been dropped, presumably while scanning.
Maurizio's suggestion is acceptable; I actually had the same thought a few weeks ago after back-and-forth in bug 597998 which I'll dupe to this one. If you've locked your connection to a single AP's BSSID, it's acceptable to disable the background scanning because you clearly do not need to keep the scan list updated for quick roaming. Note that this just means that *NM* wouldn't trigger scans; the supplicant may still trigger scans at various points if/when it loses the AP or due to driver issues.
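The rule agreed on here can be sketched as a small predicate. This is an illustration only: the `Connection` type and the function name are made up for the example and do not correspond to NetworkManager's internals, and scans started by the supplicant or the driver are outside its scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """Hypothetical stand-in for the active wireless connection."""
    ssid: str
    bssid: Optional[str] = None  # set when the user locks to one AP


def should_periodic_scan(active: Optional[Connection],
                         user_requested: bool = False) -> bool:
    """Decide whether a periodic background scan should be triggered."""
    if user_requested:
        return True   # e.g. the user opened the network list
    if active is None:
        return True   # not connected: scan to find something to join
    # Locked to a single AP: roaming is impossible, so skip the scan.
    return active.bssid is None


locked = Connection(ssid="home", bssid="00:11:22:33:44:55")
print(should_periodic_scan(locked))
```

An explicit user request still overrides the lock, so the list of available networks can be refreshed on demand.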
Hmm, not bug 597998. Trying to find the one I mean.
Bug 580185 is the one I mean. Duping to this one.
*** Bug 580185 has been marked as a duplicate of this bug. ***
Fixed in 0ea9329b6e75e86ab64fc29614c911abc8e3658b (master) and also cherry-picked to 0.8.1.