Bug 580185 - NetworkManager drops the network every 120 seconds
Status: RESOLVED DUPLICATE of bug 513820
Product: NetworkManager
Classification: Platform
Component: general
Version: 0.7.x
Hardware: Other All
Importance: Normal critical
Target Milestone: ---
Assigned To: Dan Williams
QA Contact: Dan Williams
Depends on:
Blocks:
 
 
Reported: 2009-04-24 22:57 UTC by Perry E. Metzger
Modified: 2010-05-22 16:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Requested log. (31.55 KB, text/plain)
2009-04-30 19:25 UTC, Perry E. Metzger

Description Perry E. Metzger 2009-04-24 22:57:19 UTC
Please describe the problem:
I'm running Ubuntu 9.04.

Every 120 seconds, like clockwork, this was happening:

Apr 24 15:11:10 hackworth NetworkManager: <debug> [1240600271.001053] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none)) 
Apr 24 15:11:23 hackworth NetworkManager: <debug> [1240600283.004541] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet) 
Apr 24 15:13:11 hackworth NetworkManager: <debug> [1240600391.003921] periodic_update(): Roamed from BSSID 00:13:46:95:E3:8F (fishnet) to (none) ((none)) 
Apr 24 15:13:23 hackworth NetworkManager: <debug> [1240600403.003504] periodic_update(): Roamed from BSSID (none) ((none)) to 00:13:46:95:E3:8F (fishnet) 

Each time this happens, for about 10 seconds the network becomes completely unusable. This happens to other people, too, see:

https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/291760
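
The two-minute cadence can be checked directly against the syslog excerpt above. What follows is a minimal Python sketch and not part of NetworkManager: the regular expression and the function names (`roam_events`, `drop_intervals`, `outage_durations`) are assumptions written to match the `periodic_update()` lines exactly as they appear in this report.

```python
import re

# Matches the periodic_update() debug lines quoted in this report.
# The pattern is an assumption based on those lines only.
ROAM_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\] periodic_update\(\): Roamed from BSSID "
    r"(?P<old>\S+) \(.*?\) to (?P<new>\S+) \("
)


def roam_events(lines):
    """Yield (timestamp, old_bssid, new_bssid) for each roam line."""
    for line in lines:
        m = ROAM_RE.search(line)
        if m:
            yield float(m.group("ts")), m.group("old"), m.group("new")


def drop_intervals(lines):
    """Seconds between consecutive losses of the AP (roam to "(none)")."""
    drops = [ts for ts, _old, new in roam_events(lines) if new == "(none)"]
    return [round(b - a, 3) for a, b in zip(drops, drops[1:])]


def outage_durations(lines):
    """Seconds spent without an AP after each drop."""
    durations, lost_at = [], None
    for ts, _old, new in roam_events(lines):
        if new == "(none)":
            lost_at = ts
        elif lost_at is not None:
            durations.append(round(ts - lost_at, 3))
            lost_at = None
    return durations
```

Fed the four log lines quoted above, this reports one drop interval of roughly 120 seconds and two outages of roughly 12 seconds each.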

As described on that Launchpad bug page, applying the following patch, which restores the old "hack" in NetworkManager:

http://launchpadlibrarian.net/25829754/dif.txt

fixes the problem for me completely. Note that without the patch, my system is effectively unusable -- on my Atheros a/b/g card, the network scan takes so many seconds that it is pretty much impossible to do interactive work. I understand that the "hack" was perhaps wrong, but without it my network is dead for ten or more seconds every two minutes.


Steps to reproduce:
1. Own a laptop with an older Atheros chip, such as an AR5212 driven by ath5k
2. Run NetworkManager
3. Observe problem
4. Apply the http://launchpadlibrarian.net/25829754/dif.txt patch, observe that the problem vanishes.


Actual results:


Expected results:


Does this happen every time?


Other information:
This has apparently been hitting a lot of people for quite a while. It was apparently discussed on the NetworkManager mailing list (see the launchpad bug report above) and was thought of as minor on the basis that it would only hit a small number of people for a fraction of a second every two minutes. That perception is incorrect -- this is crippling if you have an 802.11a card and you use 802.11a.
Comment 1 Dan Williams 2009-04-30 18:55:29 UTC
Most A-capable cards (ath5k, iwlagn, ipw2915) don't have a problem with scanning and disconnection.  What driver is driving your card?  What *exact* kernel version do you have?  Both 2.6.27 and 2.6.28 have potential issues with hidden SSIDs too, is your access point broadcasting its SSID?

Furthermore, can you do the following and attach the log output from NetworkManager?

1) stop NetworkManager
2) from a root terminal, run:
NM_ACTIVE_AP_DEBUG=1 /usr/sbin/NetworkManager --no-daemon
3) connect to the AP, let the problem happen

then grab the logs from NM and attach.  Thanks!
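
For anyone who prefers to script the capture, the steps above could be wrapped roughly as follows. This is only a sketch: the binary path, the `NM_ACTIVE_AP_DEBUG` variable and the `--no-daemon` flag come from the command above, while the helper names and the assumption that the foreground process writes its debug output to stdout/stderr are mine.

```python
import os
import subprocess


def nm_debug_command(binary="/usr/sbin/NetworkManager"):
    """Return (argv, env) for a foreground run with AP debugging enabled."""
    env = dict(os.environ)
    env["NM_ACTIVE_AP_DEBUG"] = "1"  # variable named in the comment above
    return [binary, "--no-daemon"], env


def capture_nm_log(log_path, binary="/usr/sbin/NetworkManager"):
    """Run NetworkManager in the foreground and save its output to log_path.

    Must be run as root with the regular NetworkManager service stopped.
    Returns the exit status once the process ends.
    """
    argv, env = nm_debug_command(binary)
    with open(log_path, "w") as log:
        # stderr is merged into stdout so all output lands in one file.
        return subprocess.call(argv, env=env, stdout=log,
                               stderr=subprocess.STDOUT)
```

Calling `capture_nm_log("nm-debug.log")` (a hypothetical file name) and then reproducing the drop should leave a log that can be attached here.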
Comment 2 Perry E. Metzger 2009-04-30 19:17:08 UTC
Some of the information you are looking for is in the Ubuntu bug repository on launchpad. However...

0) Not exactly sure how to identify the precise driver running my card. dmesg says: 
ath5k phy0: Atheros AR5212 chip found (MAC: 0x56, PHY: 0x41)
and lsmod | fgrep ath5k gives me a hit for ath5k

1) $ uname -a
Linux hackworth 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
i.e. standard Ubuntu Jaunty

2) My base station is not hidden, it broadcasts its SSID

BTW, from the looks of it on the Ubuntu lists and bug tracker, this is a common problem.

I'll run NetworkManager in debugging mode shortly and get you the log output.
Comment 3 Perry E. Metzger 2009-04-30 19:25:06 UTC
Created attachment 133685
Requested log.
Comment 4 Perry E. Metzger 2009-04-30 19:30:53 UTC
Okay, that log should do it. At this point you have what you requested.
Comment 5 Matthias Lohr 2009-05-26 19:13:55 UTC
I can confirm this bug. If I use another network management tool (like wicd), this error doesn't occur any more (including the disconnects/timeouts).
Comment 6 Nicholas J Kreucher 2009-06-15 21:32:40 UTC
Same here, using ath5k as well.

Anything we can do to help get this resolved?
Comment 7 Dan Williams 2009-09-04 01:26:58 UTC
Note that 2.6.32 will probably make this problem a *lot* better, because while connected, the mac80211-based drivers (which drive most of the modern hardware out there) do staggered scanning and return to the operating channel between segments of the scan, meaning you won't be gone long enough to lose sync with the AP.
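
The difference between a monolithic and a staggered scan can be illustrated with a toy model. This is not the mac80211 code; it is a sketch in which the segment size and the per-channel dwell time are made-up parameters, used only to show why returning to the operating channel between segments shortens the longest absence from the AP.

```python
def scan_schedule(channels, operating, segment=None):
    """Return the order of channel visits for one scan.

    segment=None models a monolithic scan: every other channel is
    visited back to back. A positive segment models a staggered scan:
    after each group of `segment` channels the radio returns to
    `operating`.
    """
    others = [c for c in channels if c != operating]
    if segment is None:
        return others + [operating]
    schedule = []
    for i in range(0, len(others), segment):
        schedule.extend(others[i:i + segment])
        schedule.append(operating)
    return schedule


def longest_absence(schedule, operating, dwell_ms):
    """Longest continuous time (ms) spent away from the operating channel."""
    longest = current = 0
    for channel in schedule:
        current = 0 if channel == operating else current + dwell_ms
        longest = max(longest, current)
    return longest
```

With eleven channels, a hypothetical 100 ms dwell and segments of two channels, the longest absence in this model falls from 1000 ms to 200 ms.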

Also note that there are kernel bugs, including one with nullfunc PS frames during scan, that make scanning while connected somewhat unreliable up until the 2.6.30 kernel.

commit a9a6ffffd05f97e6acbdeafc595e269855829751
Author: Kalle Valo <kalle.valo@nokia.com>
Date:   Wed Mar 18 14:06:44 2009 +0200

    mac80211: don't drop nullfunc frames during software scan

What kernels are everyone running?  If you're not running 2.6.30, you'll want to get it, or have your distro backport that patch to their current kernel.  It's quite a small, low-risk patch, so if they don't accept the backport, they need some help in the risk-analysis department...
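
To answer the kernel question quickly, the release string can be compared against 2.6.30. The helper below is a hedged sketch; its names are hypothetical, and it only looks at the version number, so a distro kernel carrying a backport of the patch would still be reported as older.

```python
import re


def kernel_version(release):
    """Parse the leading numeric part of a kernel release string.

    For example "2.6.28-11-generic" becomes (2, 6, 28).
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part or 0) for part in m.groups())


def has_nullfunc_fix(release, fixed_in=(2, 6, 30)):
    """True if the release is at least the version named in the comment above.

    A distro kernel with the patch backported would report an older
    version and still be fixed, so False only means "check with the distro".
    """
    return kernel_version(release) >= fixed_in
```

On a running system, `has_nullfunc_fix(platform.release())` (after `import platform`) checks the local kernel; the stock Jaunty kernel quoted in comment 2, `2.6.28-11-generic`, does not pass.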
Comment 8 Tobias Mueller 2010-04-02 11:47:28 UTC
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for.
Thanks!
Comment 9 Howard Chu 2010-04-16 23:41:24 UTC
Gah, I've been commenting on the Ubuntu report and forgot this bug existed.

https://bugs.launchpad.net/bugs/291760

As I mentioned there (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/291760/comments/96), the behavior is really terrible on my new Dell Precision M4400 laptop with a Broadcom BCM4322, on a current Ubuntu with a 2.6.33 kernel: it loses the connection every *20* seconds, not just every 120 seconds.

Also, even on my other machines that don't lose the connection so frequently, the scans still mean that packets are lost. I have a lot of status update daemons on my network that send notifications using UDP broadcasts (UPS monitor, Caller ID for incoming phone calls, a few other things). The machines doing this background scanning lose those updates if they occur while a scan is happening. Simply put, if you have any realtime activity on your network, background scanning totally breaks it.

If you insist on keeping this background scanning feature, please at least provide a config switch to turn it off. It does far more harm than good.
Comment 10 John Russell 2010-05-04 21:06:22 UTC
I'm running Ubuntu 10.04 with the stock 2.6.32-21-generic kernel and this is still happening on my MacBook 5.1.


I'm still getting the following:

May  4 16:52:18 macpro-lnx NetworkManager: <debug> [1273006338.003157] periodic_update(): Roamed from BSSID (none) ((none)) to 00:18:74:4A:22:B0 (blizzard)
May  4 16:53:48 macpro-lnx NetworkManager: <debug> [1273006428.003213] periodic_update(): Roamed from BSSID 00:18:74:4A:22:B0 (blizzard) to (none) ((none))

The SSID I'm on is not public, but other than this it works.

Is there any other information I can provide?
Comment 11 Dan Williams 2010-05-05 21:43:46 UTC
If the card is actually *dropping* the connection during a scan, then the card/driver is clearly broken and needs to be fixed in the upstream kernel.  Out-of-kernel drivers will need to be fixed by the vendor.

I expect that in some cases periodic scanning will slightly increase latency during the 2 - 5 seconds the scan could be in progress, but it certainly should not be disassociating you from the AP.  Again, if you get disassociated, there are driver bugs and those bugs need to be fixed in the drivers.
Comment 12 Howard Chu 2010-05-05 22:02:35 UTC
Those 2-5 second scan intervals can also make the difference between smooth realtime streaming of video and broken video streams, and the problem gets much worse for higher quality/higher bitrate media.

Whether the drivers have bugs or not is kind of beside the point. The point is, regardless of the device or driver involved, the current behavior is unsuitable for a number of scenarios and there are a large number of users hitting those scenarios. You're just sitting here going "la-la-la-la-I-can't-hear-you" and ignoring those users. Why are you so opposed to making this behavior configurable?
Comment 13 Dan Williams 2010-05-06 22:44:34 UTC
This is an inaccurate representation of the facts.  I'm on the record on the mailing list or here in bugzilla acknowledging the fact that periodic scanning does cause problems in cases where low latency is required.  And I've proposed various solutions for that issue that do not need configuration by end-users.

Again; if it *doesn't need to be configured* (and most things really don't) then there simply should not be a checkbox or knob for it.

And there are two reasons why this doesn't need to be configured at all:

1) applications can tell the kernel about their latency requirements through socket options and various other mechanisms that are currently working their way through the kernel upstream.  These will also be used for aggressive power management of wifi cards so that the cards enter low-power mode (which will have nothing to do with NM) when they aren't used and when latency is not a problem.  This same mechanism can be used to alter the periodic scanning behavior in NetworkManager.

2) In the absence of the above, traffic patterns can be used to suppress periodic scanning for a time, until the device is not being used.  This functionality depends on the generic internal framework for connection time & usage tracking which is being handled in a separate bug.
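
Point 1 refers to kernel mechanisms that were still in development, and this report does not name them. Purely as an illustration of what "an application stating a latency preference on a socket" can look like, the sketch below uses an option that already exists, `IP_TOS` with the low-delay value. Nothing in this thread says that NetworkManager or the wireless stack consults this option when deciding whether to scan; that connection is an assumption made for the example, and the function name is hypothetical.

```python
import socket

# IPTOS_LOWDELAY as defined in <netinet/ip.h>. The value is spelled out
# here rather than relying on the socket module exporting it by name.
IPTOS_LOWDELAY = 0x10


def open_low_delay_udp_socket():
    """Create an IPv4 UDP socket marked as preferring low delay."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
    return sock
```

A VoIP or game client would create its sockets this way and otherwise use them as usual.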

The combination of these two things should fix issues with periodic scanning.  Periodic scanning is still required (and indeed may well be done by the driver and/or supplicant even without involvement from NetworkManager) to support any type of roaming between access points, which is why with #2 above we'd still do a scan when the device was idle.
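
The traffic-based suppression in #2 could look roughly like the sketch below. It is not NetworkManager code: the class name, the thresholds and the idea of feeding it byte-counter samples are all assumptions, chosen only to show a scan being deferred while the link is busy and allowed again once it has been idle for a while.

```python
class ScanGate:
    """Decide whether a periodic scan may run, based on recent traffic.

    All thresholds are illustrative, not NetworkManager defaults.
    """

    def __init__(self, idle_seconds=30.0, busy_bytes=10000):
        self.idle_seconds = idle_seconds
        self.busy_bytes = busy_bytes
        self.last_busy = None  # time of the last sample that counted as busy

    def record_sample(self, now, bytes_since_last_sample):
        """Feed one traffic counter sample taken at time `now` (seconds)."""
        if bytes_since_last_sample >= self.busy_bytes:
            self.last_busy = now

    def may_scan(self, now):
        """True once the link has been quiet for at least idle_seconds."""
        if self.last_busy is None:
            return True
        return now - self.last_busy >= self.idle_seconds
```

A periodic scan timer would call `may_scan()` when it fires and simply skip that round if the answer is False, so roaming still works on an idle device.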

But this bug, titled "NetworkManager drops the network every 120 seconds" is clearly invalid, because if the driver actually drops the network connection during a scan, the driver absolutely needs fixing.  I'm not going to work around broken, crappy drivers.

So if you've got a problem with latency during the scan because you're using VOIP or playing a network game, that's great and we can and will fix that, and we have a path (described above) to do so.

But if you're getting dropped from the AP every time a scan happens, that's a driver bug, not a bug in NetworkManager.  And it should be fixed in the driver.  Not worked around.
Comment 14 Howard Chu 2010-05-06 23:28:13 UTC
(In reply to comment #13)
> This is an inaccurate representation of the facts.  I'm on the record on the
> mailing list or here in bugzilla acknowledging the fact that periodic scanning
> does cause problems in cases where low latency is required.  And I've proposed
> various solutions for that issue that do not need configuration by end-users.
> 
> Again; if it *doesn't need to be configured* (and most things really don't)
> then there simply should not be a checkbox or knob for it.
> 
> And there are two reasons why this doesn't need to be configured at all:

That's a wonderful vision of the future you've painted, but it's only that - a vision. This bug report was opened over a year ago, and the problem has persisted continually since then.

Applying automation to solve this is certainly a good idea. But that automation doesn't work yet, and in the absence of it, a manual control is needed. You could have added a manual switch a year ago, these bug reports would have closed, and then you could quietly work on the automated solution at your own pace without anyone having anything to complain about. When it was ready you could smoothly phase out the config switch.

Solutions "working their way through the kernel upstream" aren't going to be widely deployed for probably years - after the kernel features stabilize it will still be months to years before the sensitive applications are modified to take advantage of the feature.

"Walk first, then run" - instead of letting people walk, you're saying "wait here, eventually you'll be able to run."

Wifi roaming is probably irrelevant to the majority of users. People mainly use wifi in stationary locations - cafes, homes, and offices. You really need to re-examine these priorities.
Comment 15 Dan Williams 2010-05-06 23:58:10 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > This is an inaccurate representation of the facts.  I'm on the record on the
> > mailing list or here in bugzilla acknowledging the fact that periodic scanning
> > does cause problems in cases where low latency is required.  And I've proposed
> > various solutions for that issue that do not need configuration by end-users.
> > 
> > Again; if it *doesn't need to be configured* (and most things really don't)
> > then there simply should not be a checkbox or knob for it.
> > 
> > And there are two reasons why this doesn't need to be configured at all:
> 
> That's a wonderful vision of the future you've painted, but it's only that - a
> vision. This bug report was opened over a year ago, and the problem has
> persisted continually since then.
> 
> Applying automation to solve this is certainly a good idea. But that automation
> doesn't work yet, and in the absence of it, a manual control is needed. You
> could have added a manual switch a year ago, these bug reports would have
> closed, and then you could quietly work on the automated solution at your own
> pace without anyone having anything to complain about. When it was ready you
> could smoothly phase out the config switch.

Wrong.  You can never "smoothly" phase stuff out.  The best approach is not to add it at all, but handle the situation more gracefully as I've suggested.

> Solutions "working their way through the kernel upstream" aren't going to be
> widely deployed for probably years - after the kernel features stabilize it
> will still be months to years before the sensitive applications are modified to
> take advantage of the feature.

Certainly not years away.  Which is why I posted #2, which can be done immediately and is not dependent on #1.

> "Walk first, then run" - instead of letting people walk, you're saying "wait
> here, eventually you'll be able to run."
> 
> Wifi roaming is probably irrelevant to the majority of users. People mainly use
> wifi in stationary locations - cafes, homes, and offices. You really need to
> re-examine these priorities.

This is completely incorrect.  Any business or educational institution almost invariably has multiple access points in the same SSID, and thus seamless roaming is absolutely essential.  It's not just users in homes.
Comment 16 Howard Chu 2010-05-10 18:30:52 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > That's a wonderful vision of the future you've painted, but it's only that - a
> > vision. This bug report was opened over a year ago, and the problem has
> > persisted continually since then.
> > 
> > Applying automation to solve this is certainly a good idea. But that automation
> > doesn't work yet, and in the absence of it, a manual control is needed. You
> > could have added a manual switch a year ago, these bug reports would have
> > closed, and then you could quietly work on the automated solution at your own
> > pace without anyone having anything to complain about. When it was ready you
> > could smoothly phase out the config switch.
> 
> Wrong.  You can never "smoothly" phase stuff out.  The best approach is not to
> add it at all, but handle the situation more gracefully as I've suggested.

The best approach is to solve users' problems today (or yesterday, in this case) and not tomorrow.

> > Wifi roaming is probably irrelevant to the majority of users. People mainly use
> > wifi in stationary locations - cafes, homes, and offices. You really need to
> > re-examine these priorities.
> 
> This is completely incorrect.  Any business or educational institution almost
> invariably has multiple access points in the same SSID, and thus seamless
> roaming is absolutely essential.  It's not just users in homes.

Nonsense. Even though those workplaces have multiple access points the majority of users sit in one location when they work. They certainly don't need seamless uninterrupted service while they walk from room to room. What people *do* need is seamless uninterrupted service while they're actually working.
Comment 17 Michael R. 2010-05-11 09:20:50 UTC
(In reply to comment #13)

> So if you've got a problem with latency during the scan because you're using
> VOIP or playing a network game, that's great and we can and will fix that, and
> we have a path (described above) to do so.

I have exactly this issue. My network connection is not dropped, but while it scans in the background the latency increases to about 200 or 300 ms. This happens about every 10 seconds. The icon in the tray changes while it happens, but I can say for sure that the connection is not dropped.

I'm using Ubuntu 10.04 with the Broadcom STA driver. I have no alternative because it's a wireless N adapter.

If you need some further information please let me know.
Comment 18 Dan Williams 2010-05-22 16:29:11 UTC
Update: as discussed in bug 513820, I'm open to suppressing periodic scanning when you've locked your connection to a specific BSSID, because in that case roaming is not a concern.
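
The rule proposed here is small enough to state as code. The sketch below is hypothetical and not taken from NetworkManager; in particular, always scanning while unassociated is my assumption and is not something stated in this report.

```python
def should_scan_periodically(locked_bssid, associated):
    """Return True if periodic background scans are still useful.

    locked_bssid: the BSSID the connection is pinned to, or None.
    associated:   whether the device is currently associated with an AP.
    """
    if not associated:
        # Assumption: with no link to disturb, scan results are needed
        # in order to find something to connect to.
        return True
    # A connection pinned to one BSSID cannot roam, so scanning while
    # connected only costs latency.
    return locked_bssid is None
```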

*** This bug has been marked as a duplicate of bug 513820 ***