Bug 766482 - [review] dcbw/wifi-scan: delegate periodic scanning behavior to wpa_supplicant and reduce periodic scans
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: Wi-Fi
Version: unspecified
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: NetworkManager maintainer(s)
QA Contact: NetworkManager maintainer(s)
URL: http://blog.cerowrt.org/post/disablin...
Depends on:
Blocks: nm-review nm-next 778152
 
 
Reported: 2016-05-15 18:57 UTC by Jim Gettys
Modified: 2017-08-11 17:39 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Jim Gettys 2016-05-15 18:57:48 UTC
As described in http://blog.cerowrt.org/post/disabling_channel_scans/
network manager scans every several minutes, whether it needs to or not.

This causes major hiccups in network performance (on the order of five seconds), even if you are in a stable WiFi environment.

This destroys any use of the network (e.g. VoIP), even if there aren't other bad side effects (I suspect the occasional unwanted reconfiguration to a different ESSID in my house is caused by this, but that isn't the purpose of this report).

See Dave Taht's extensive blog posting for more information and how to easily test for this behavior.
Comment 1 Joshua Johnson 2016-05-20 20:49:44 UTC
A workaround for this issue (described on Dan Williams's blog*) is to "lock your WiFi connection profile to the BSSID of your access point". The steps to do this for NetworkManager 1.0.12 (currently included with Fedora 23) are:

1. Open NetworkManager GUI
2. Click the gear icon next to the SSID
3. Click on the "Identity" tab
4. Select or type the BSSID in the "BSSID" dropdown
5. Click "Apply"

* https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/
Comment 2 dave taht 2016-06-04 20:37:38 UTC
There was some discussion on what the "right" behavior should be post 802.11-2012, here:

https://plus.google.com/u/0/107942175615993706558/posts/WA915Pt4SRN

My take on it was (and remains) that the system should not be scanning when signal strength is good, and absolutely should never switch to another SSID without user intervention. Switching to another BSSID for the same SSID when a scan indicates it is better is ok.

Thank you for the pointer to how to lock things to a given BSSID. For the record (as my test clients have no GUIs), the relevant parameters needed to be added to /etc/NetworkManager/system-connections/fqcodel:

[wifi]
# your bssid
bssid=04:F0:21:1F:36:E2
mac-address-blacklist=
mac-address-randomization=0
mode=infrastructure
# still your bssid
seen-bssids=04:F0:21:1F:36:E2;
ssid=FQCODEL
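
For scripted test clients, a quick way to confirm that a profile really is locked is to read the bssid key back out of the [wifi] section. The following is only a minimal sketch: it assumes the profile text can be parsed with Python's generic configparser (the keyfile format is close to, but not identical to, INI), and the function name locked_bssid and the embedded sample text are made up for illustration.

```python
import configparser

# Sample profile text modelled on the [wifi] section above.
PROFILE = """\
[wifi]
bssid=04:F0:21:1F:36:E2
mac-address-blacklist=
mac-address-randomization=0
mode=infrastructure
seen-bssids=04:F0:21:1F:36:E2;
ssid=FQCODEL
"""


def locked_bssid(profile_text):
    """Return the BSSID a profile is locked to, or None if it is not locked."""
    # Only "=" is treated as a delimiter so the colons in the MAC address
    # are never mistaken for a key/value separator.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(profile_text)
    if not parser.has_section("wifi"):
        return None
    value = parser.get("wifi", "bssid", fallback="").strip()
    return value or None


if __name__ == "__main__":
    print(locked_bssid(PROFILE))
```

A profile without a bssid key yields None, which in the terms of this bug means periodic connected scanning is not suppressed by a lock.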


The resulting test series (in progress) of the fq_codel code for ath10k and ath9k is now MUCH saner, and competes much more effectively against the osx box also under test, thank you.
Comment 3 Dan Williams 2016-10-20 15:45:11 UTC
Note that NM will never switch to a different SSID unless the user has told NetworkManager to connect to that SSID at least once.  We cannot require manual user intervention to switch SSIDs since often there are multiple available SSIDs in the same location, usually different due to authentication or band reasons.  For example, many home wifi routers set up different 2.4 and 5 GHz SSIDs (no idea why).  Many enterprises set up different SSIDs for different authentication methods.  In any case, NM never ever connects to an SSID without the user having told NM to do so.

Simplified list of action items from https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/:

1) advocate that UIs (like GNOME Shell, nm-applet, KDE plasma-nm, etc) use RequestScan() when their WiFi network list UI is shown

2) enable wpa_supplicant background scanning (with bgscan_simple) for all cases except locked BSSID (not just 802.1x/Enterprise), using a fairly large "long interval".

3) kill the internal while-connected NM periodic scanning code in favor of #2.  We must still keep the periodic while-disconnected scanning logic so that NM can decide when to connect to saved SSIDs.

The reason for enabling bgscan is that if we do not use some kind of background scanning when signal is low, we'll never "make before break" when needing to roam due to low signal strength.  So when signal falls below a given bgscan_simple threshold, scanning should happen.  When signal is above the threshold, scanning would not happen.

The only problem with this is if your connection to the AP is always below the threshold, then background scanning would happen more frequently than it currently does.  However, in this case either (a) there are multiple SSIDs and scanning *should* be happening to allow roaming, or (b) there is only one SSID and the user could lock to the marginal SSID to disable background scanning.
Comment 4 dave taht 2016-10-25 16:58:29 UTC
I'm not sure if we're talking past each other or not on "2".

"2) enable wpa_supplicant background scanning (with bgscan_simple) for all cases except locked BSSID (not just 802.1x/Enterprise), using a fairly large "long interval"."

My take on all crypto/auth types was that there is no need to initiate a scan until connectivity was poor, and even then it should stay/scan within the connected SSID and not wander off it. Example - I have multiple APs in my apartment complex with the Xfinity SSID, none of which I desire to connect to. I choose "my" SSID and would prefer to lose connectivity and have to manually choose another if something changes. As it was, the channel scan interrupted traffic every 2 minutes (which osx did not do) in a setup where the laptop was within a few feet of my AP and getting good signal, and in certain places in the apt, the xfinity ssid ends up winning, leading to loss of network entirely, even tho the ssid I intentionally connected to is more than good enough.

There's a level of connectivity between total loss of signal and "poor", that would be a region where "make before break" had to take place.
Comment 5 Dan Williams 2016-11-07 22:30:03 UTC
(In reply to dave taht from comment #4)
> I'm not sure if we're talking past each other or not on "2".
> 
> "2) enable wpa_supplicant background scanning (with bgscan_simple) for all
> cases except locked BSSID (not just 802.1x/Enterprise), using a fairly
> large "long interval"."
> 
> My take on all crypto/auth types was that there is no need to initiate a
> scan until connectivity was poor, and even then it should stay/scan within

Yes.  That's exactly what the supplicant's bgscan does.  It has two scan intervals, long and short.  When signal is better than a given threshold it uses the "long" interval.  When signal is worse than the threshold it uses the "short" interval on the theory that it should roam to a better AP very soon as signal is marginal.
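
The two-interval behavior described here can be sketched in a few lines of Python. This is an illustration of the description only, not the supplicant's code: the function name is invented, and how a signal exactly equal to the threshold is treated is an assumption.

```python
def bgscan_interval(signal_dbm, threshold_dbm, short_s, long_s):
    """Pick the next background-scan interval in seconds.

    Signal better (higher) than the threshold selects the long interval;
    otherwise the short interval is used so a roam candidate is found soon.
    Treating "equal to the threshold" as poor is an assumption of this sketch.
    """
    if signal_dbm > threshold_dbm:
        return long_s
    return short_s


if __name__ == "__main__":
    # Example values only: threshold -65 dBm, short=30 s, long=300 s.
    for signal in (-50, -65, -80):
        print(signal, bgscan_interval(signal, -65, 30, 300))
```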

Note that when connected, NM has *always* stayed on the requested SSID unless that connection is lost entirely, in which case it looks through the list of networks it is allowed to connect to and picks the most recently used one (or the highest priority one, if priorities are used).  NM has *never* jumped between SSIDs while connected, only APs in the same SSID (which was actually handled by wpa_supplicant's roaming logic, not NM).
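
As a rough model of that disconnected-state choice (highest priority first, most recently used as the tie-breaker), consider the sketch below. The SavedNetwork type, its field names, and the exact ordering are assumptions made for illustration; NM's real autoconnect logic considers more than this.

```python
from dataclasses import dataclass


@dataclass
class SavedNetwork:
    ssid: str
    priority: int      # hypothetical field: higher value wins
    last_used: float   # hypothetical field: Unix timestamp of last use


def pick_network(saved, visible_ssids):
    """Choose which saved network to connect to after the link is lost.

    Only networks the user has already configured are candidates, which
    mirrors the point that NM never joins an SSID it was not told about.
    """
    candidates = [n for n in saved if n.ssid in visible_ssids]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (n.priority, n.last_used))


if __name__ == "__main__":
    saved = [
        SavedNetwork("home", 0, 100.0),
        SavedNetwork("work", 0, 200.0),
    ]
    print(pick_network(saved, {"home", "work", "xfinitywifi"}))
```

In this model an unsaved SSID such as the neighbours' hotspot is never selected, however strong its signal.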

The only caveat here is that when signal is poor, that's not a great time to find a roaming candidate by jumping away from your operating channel.  So finding a suitable threshold is the key.

> the connected SSID and not wander off it. Example - I have multiple APs in
> my apartment complex with the Xfinity SSID, none of which I desire to
> connect to. I choose "my" SSID and would prefer to lose connectivity and
> have to manually choose another if something changes. As it was, the channel

The supplicant's connected bgscan function (which is what I'm proposing to use here instead of the NM logic) specifically scans for the connected SSID.  Thus any background scan done when signal gets poor will only scan for other APs in the same SSID.  Since this scan is only being done to find roam candidates, I think this satisfies your point here.

> scan interrupted traffic every 2 minutes (which osx did not do) in a setup
> where the laptop was within a few feet of my AP and getting good signal, and
> in certain places in the apt, the xfinity ssid ends up winning, leading to
> loss of network entirely, even tho the ssid I intentionally connected to is
> more than good enough.

What I'm proposing here will only do connected periodic scans when signal is poor and the connection is not locked to a BSSID.  When the signal is poor, the wpa_supplicant background scanning will probe-scan for a specific SSID to find roaming candidates.  This can of course still be disabled by locking the connection to a specific BSSID, indicating that you never want this connection to roam.

> There's a level of connectivity between total loss of signal and "poor",
> that would be a region where "make before break" had to take place.

Yes, that's what I'm proposing here.

NM-controlled periodic scans will still happen regularly when *dis*connected so that NM can pick an available network from its configured network list.
Comment 6 Dan Williams 2016-11-07 22:53:05 UTC
Pushed dcbw/wifi-scan implementing these ideas.

https://cgit.freedesktop.org/NetworkManager/NetworkManager/log/?h=dcbw/wifi-scan

This branch does *not* change:

1) disconnected behavior; periodic scans are still done so NM can find WiFi networks it has been told to connect to automatically

2) connected periodic scans are still suppressed for BSSID-locked connections and shared (eg AP/adhoc) connections

While connected, this branch has these changes:

1) uses the supplicant's bgscan_simple function to handle periodic scanning while connected, instead of NM's internal logic.

2) for WPA-Enterprise and Dynamic WEP configurations, uses the existing background scanning interval of short=30/long=300 with a threshold of -65dBm.  This means that when signal is better than -65dBm the supplicant will probe-scan the current SSID for roam candidates every 300 seconds, and when it's worse than -65 will probe-scan every 30 seconds.  We can debate the threshold level, but it's been -65dBm already for years and in a multi-AP environment that seems to work fairly well so far.

3) for all other connections, enables background scanning with short=30/long=86400 with a threshold of -80dBm.  This means that when signal is quite poor it will probe-scan the current SSID every 30 seconds.  When signal is OK/good it will only probe-scan once per day.

4) disables roaming decisions based on user/dbus scan requests.  Previously these might trigger a within-ESS roam in the supplicant, which caused flip-flopping between APs for no good reason, even when one AP's signal was only a few points better than the other's.
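
The connected-state policy in points 1-3, together with the unchanged suppression for BSSID-locked and shared connections, can be summarized as a small decision function. This is a sketch of the policy as described in this comment, not the branch's code: the function and parameter names are invented, and the returned string follows wpa_supplicant's documented bgscan syntax, "simple:<short interval>:<signal threshold>:<long interval>".

```python
from typing import Optional


def bgscan_config(bssid_locked: bool,
                  shared_mode: bool,
                  enterprise_or_dynamic_wep: bool) -> Optional[str]:
    """Return the bgscan setting for a connection, or None for no bgscan."""
    # Locked or shared (AP/adhoc) connections never roam, so no periodic
    # connected scanning is requested for them.
    if bssid_locked or shared_mode:
        return None
    # WPA-Enterprise and Dynamic WEP keep the long-standing values.
    if enterprise_or_dynamic_wep:
        return "simple:30:-65:300"
    # Everything else: scan often only when signal is quite poor,
    # otherwise roughly once per day (86400 seconds).
    return "simple:30:-80:86400"


if __name__ == "__main__":
    print(bgscan_config(False, False, True))
    print(bgscan_config(False, False, False))
    print(bgscan_config(True, False, False))
```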

David, does the branch appear to address your issues with the current behavior?
Comment 7 Thomas Haller 2016-11-08 08:26:50 UTC
>> devices/wifi: flip meaning of scanning allowed signal

In general, inverse logic like SCANNING_PROHIBITED (or UNMANAGED_FLAGS) hurt my brain. So, not clear to me that this is an improvement, but totally fine with me.


>> devices/wifi: delegate connected periodic scanning to the supplicant (bgo 
    
commit message:

  Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=766482

Usually we mark the bug URL that is fixed by the commit without "Fixes".
On the contrary, we mark bug URLs that do not fix the bug via "Related: http...".

There is a script origin/automation:contrib/rh-utils/find-backports.sh which parses "Fixes: $COMMIT_ID".
And the script origin/automation:contrib/rh-utils/bzutil.py interprets "Related: $URL" comments specially.
Probably both scripts would work with a "Fixes: $URL" too (or could be fixed to do so), but maybe just don't use yet another scheme here?
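
To make the three conventions concrete, here is a toy classifier for a single commit-message line. It is purely illustrative and does not reflect how find-backports.sh or bzutil.py are actually implemented; the function name, the labels, and the regular expressions are all assumptions.

```python
import re


def classify_line(line):
    """Classify one commit-message line under the conventions described above.

    Returns (kind, value), or None if the line matches none of them.
    """
    line = line.strip()
    # "Fixes: $COMMIT_ID" names the commit whose bug this commit fixes.
    m = re.fullmatch(r"Fixes:\s*([0-9a-f]{7,40})\b.*", line)
    if m:
        return ("fixes-commit", m.group(1))
    # "Related: $URL" names a bug that is touched but not fixed.
    m = re.fullmatch(r"Related:\s*(https?://\S+)", line)
    if m:
        return ("related-bug", m.group(1))
    # A bare URL names the bug that the commit fixes.
    if re.fullmatch(r"https?://\S+", line):
        return ("fixed-bug", line)
    return None


if __name__ == "__main__":
    for text in (
        "https://bugzilla.gnome.org/show_bug.cgi?id=766482",
        "Related: https://bugzilla.gnome.org/show_bug.cgi?id=766482",
        "Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=766482",
    ):
        print(classify_line(text))
```

Under these assumed rules a "Fixes: $URL" line matches nothing, which is the sense in which it would be yet another scheme.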



branch lgtm
Comment 8 Dan Williams 2016-11-08 16:03:15 UTC
I still need to fix up the testcases, so the branch isn't quite ready yet.
Comment 9 Dan Williams 2016-11-08 18:17:53 UTC
(In reply to Thomas Haller from comment #7)
> >> devices/wifi: flip meaning of scanning allowed signal
> 
> In general, inverse logic like SCANNING_PROHIBITED (or UNMANAGED_FLAGS) hurt
> my brain. So, not clear to me that this is an improvement, but totally fine
> with me.

Yeah, I know. It was just simpler (I thought) to flip the meaning rather than add another GValue parameter to the request.  I feel like being able to use g_signal_emit() and not having to deal with g_signal_emitv() and return GValue default values is a win though.

> >> devices/wifi: delegate connected periodic scanning to the supplicant (bgo 
>     
> commit message:
> 
>   Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=766482
> 
> Usually we mark the bug URL that is fixed by the commit without "Fixes".
> On the contrary, we mark bug URLs that do not fix the bug via "Related:
> http...".
> 
> There is script origin/automation:contrib/rh-utils/find-backports.sh which
> parses "Fixes: $COMMIT_ID".
> And the script origin/automation:contrib/rh-utils/bzutil.py interprets
> "Related: $URL" comments specially.
> Probably both scripts would work with a "Fixes: $URL" too (or could be fixed
> to do so), but maybe just don't use yet another scheme here?

I switched it to Related, since it doesn't really fix a bug so much as optimize behavior.

Reposted.
Comment 10 Dan Williams 2016-11-09 15:49:29 UTC
Fixups look good, thanks!  Squashed and repushed.
Comment 11 Dan Williams 2016-11-09 16:01:33 UTC
Note that at least GNOME Shell has included request-scan logic from its wifi dialog since June 2016 and version 3.22.1, see bug https://bugzilla.gnome.org/show_bug.cgi?id=767918.
Comment 12 Beniamino Galvani 2016-12-05 17:28:48 UTC
LGTM
Comment 13 Thomas Haller 2017-02-06 11:33:51 UTC
https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/ says:

When you do [[lock connection to BSSID]], NetworkManager understands that you do not want to roam and will disable the periodic scanning behavior.  Explicitly requested scans are still allowed.

looking at current master, that is currently not true.


TODO: check that this is properly handled with the branch
Comment 14 Thomas Haller 2017-08-02 10:53:31 UTC
Branch rebased. See https://mail.gnome.org/archives/networkmanager-list/2017-August/msg00001.html
Comment 15 Thomas Haller 2017-08-02 10:59:38 UTC
+    ip4_method = nm_utils_get_ip_config_method (connection, NM_TYPE_SETTING_IP4_CONFIG);
+    if (!strcmp (ip4_method, NM_SETTING_IP4_CONFIG_METHOD_SHARED))
+         return TRUE;

this wouldn't work correctly for ipv6.method=shared. Also, it may make sense to run AP mode with static IP addressing. I think we should detect AP mode based on nm_setting_wireless_get_mode().
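
The difference between the two checks can be shown with plain dictionaries standing in for connection profiles. This is only a sketch: the dictionary keys and both function names are hypothetical, and only the value strings ("ap", "shared", "manual") are taken from NetworkManager's settings.

```python
def skip_by_ip4_method(conn):
    """The check from the patch: only looks at the IPv4 method."""
    return conn.get("ipv4.method") == "shared"


def skip_by_mode(conn):
    """The suggested check: look at the Wi-Fi mode itself."""
    return conn.get("mode") == "ap"


# An access point with static IPv4 addressing.
AP_STATIC = {"mode": "ap", "ipv4.method": "manual", "ipv6.method": "ignore"}
# An access point that only shares IPv6.
AP_V6_SHARED = {"mode": "ap", "ipv4.method": "disabled", "ipv6.method": "shared"}

if __name__ == "__main__":
    for conn in (AP_STATIC, AP_V6_SHARED):
        print(skip_by_ip4_method(conn), skip_by_mode(conn))
```

Both sample profiles are AP-mode connections, yet only the mode-based check recognizes them.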




Pushed a few fixups.


rest lgtm.
Comment 16 Beniamino Galvani 2017-08-03 16:38:51 UTC
LGTM
Comment 17 Vít Ondruch 2017-08-04 08:09:24 UTC
I was pointed at this BZ from https://bugzilla.redhat.com/show_bug.cgi?id=1471126

(In reply to Dan Williams from comment #3)
> The only problem with this is if your connection to the AP is always below
> the threshold, then background scanning would happen more frequently than it
> currently does.  However, in this case either (a) there are multiple SSIDs
> and scanning *should* be happening to allow roaming, or (b) there is only
> one SSID and the user could lock to the marginal SSID to disable background
> scanning.

I as a regular user don't want to lock anything. So let me quote my question from the above mentioned BZ:

~~~
Is the roaming algorithm really so simple that it chooses the AP just by signal strength? E.g. if strength was between 0-100, and AP 1 had strength 57 while AP 2 had strength 58, would AP 2 be chosen? Is there some range where the current connection has preference? Is there a way to adjust the range?
~~~
Comment 18 Dan Williams 2017-08-07 14:24:14 UTC
(In reply to Vít Ondruch from comment #17)
> I was pointed at this BZ from
> https://bugzilla.redhat.com/show_bug.cgi?id=1471126
> 
> (In reply to Dan Williams from comment #3)
> > The only problem with this is if your connection to the AP is always below
> > the threshold, then background scanning would happen more frequently than it
> > currently does.  However, in this case either (a) there are multiple SSIDs
> > and scanning *should* be happening to allow roaming, or (b) there is only
> > one SSID and the user could lock to the marginal SSID to disable background
> > scanning.
> 
> I as a regular user don't want to lock anything. So let me quote my question
> from the above mentioned BZ:
> 
> ~~~
> Is the roaming algorithm really so simple that it chooses the AP just by
> signal strength? E.g. if strength was between 0-100, and AP 1 had strength
> 57 while AP 2 had strength 58, would AP 2 be chosen? Is there some
> range where the current connection has preference? Is there a way to adjust
> the range?
> ~~~

The in-SSID connected-state roaming has always been controlled by wpa_supplicant, which considers a number of different factors before deciding to switch APs.  That includes signal strength, throughput, radio band (eg 2.4 vs 5) and others.  It also contains some hysteresis, such that it will only roam to an AP that is more than trivially "better" than the current AP.
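
The hysteresis idea alone can be illustrated with the 57 vs 58 example from the question. The sketch below is deliberately simplistic: it looks at signal strength only, whereas the supplicant weighs several factors, and the function name and the margin of 5 units are made-up values, not the supplicant's.

```python
def should_roam(current_strength, candidate_strength, margin=5):
    """Roam only if the candidate beats the current AP by at least `margin`.

    `margin` is a hypothetical hysteresis value chosen for this example.
    """
    return candidate_strength - current_strength >= margin


if __name__ == "__main__":
    print(should_roam(57, 58))  # candidate only trivially better
    print(should_roam(57, 70))  # candidate clearly better
```

With any margin above one unit, an AP at 58 does not displace the current AP at 57.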

NetworkManager handles roaming from one SSID to another, and that is unaffected by this branch/patch.  Scanning when disconnected still happens at the normal back-off interval that NM has used for the past 10 or so years.
Comment 19 Dan Williams 2017-08-07 14:41:19 UTC
(In reply to Thomas Haller from comment #13)
> https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/ says:
> 
> When you do [[lock connection to BSSID]], NetworkManager understands that
> you do not want to roam and will disable the periodic scanning behavior. 
> Explicitly requested scans are still allowed.
> 
> looking at current master, that is currently not true.

It should be.  There are two parts:

(a) that periodic scanning is disabled when locked to a BSSID
-- this is handled by scanning_prohibited() in this block to stop all periodic scanning while connected:

	case NM_DEVICE_STATE_ACTIVATED:
		/* Prohibit periodic scans when connected; we ask the supplicant to
		 * background scan for us, unless the connection is locked to a specific
		 * BSSID.
		 */
		if (periodic)
			return TRUE;
		break;

and then periodic scans are only re-enabled by nm_supplicant_config_add_bgscan() if the connection is *not* locked to a BSSID:

	/* Don't scan when the connection is locked to a specific AP, since
	 * intra-ESS roaming (which requires periodic scanning) isn't being
	 * used due to the specific AP lock. (bgo #513820)
	 */
	if (nm_setting_wireless_get_bssid (s_wifi))
		return TRUE;


(b) that explicitly requested scans are allowed while connected
-- dbus_request_scan_cb() calls check_scanning_prohibited() with periodic=FALSE, then calls request_wireless_scan(periodic=FALSE) which will handle the actual scan.  check_scanning_prohibited() allows a scan when periodic=FALSE.
Comment 20 Dan Williams 2017-08-07 15:38:50 UTC
Fixups LGTM; squashed and addressed Thomas' comment about "check for mode instead of SHARED".  Though this does mean that we will no longer periodically scan in adhoc mode when *not* shared, but this can disrupt connections if the card has beaconing responsibility for the network at that time, so I think it's OK.  (in adhoc mode devices randomly delegate beaconing responsibility for the network to a node and then change it periodically).

Also added one more cleanup commit "devices/wifi: move scan option processing to D-Bus request scan handler".
Comment 21 Thomas Haller 2017-08-11 09:33:41 UTC
pushed trivial fixup!.

Branch lgtm
Comment 22 Dan Williams 2017-08-11 17:39:18 UTC
Squashed and merged to git master as merge commit d989be823569f701beac7911ff59c8ddb0eef38e