After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 776042 - Checking server reachability too often
Checking server reachability too often
Status: RESOLVED FIXED
Product: geary
Classification: Other
Component: engine
master
Other Linux
: Normal normal
: 0.13.0
Assigned To: Geary Maintainers
Geary Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-12-13 12:10 UTC by Gautier Pelloux-Prayer
Modified: 2017-11-17 02:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
network-change minimal listener (400 bytes, text/x-vala)
2016-12-13 12:10 UTC, Gautier Pelloux-Prayer
Details
network-change minimal listener v2 (561 bytes, text/x-vala)
2016-12-14 12:59 UTC, Gautier Pelloux-Prayer
Details
network-change minimal listener v3 (1.13 KB, text/x-vala)
2016-12-30 22:31 UTC, Gautier Pelloux-Prayer
Details

Description Gautier Pelloux-Prayer 2016-12-13 12:10:22 UTC
Created attachment 341877 [details]
network-change minimal listener

I can see dozen of these in logs:

>  [deb] 13:05:32 0,000005 imap-client-session-manager.vala:528: Checking if IMAP host imap.gmail.com/default:993 reachable...
> [msg] 13:05:32 0,000638 imap-client-session-manager.vala:539: IMAP host imap.gmail.com/default:993 considered reachable

This is because of imap-client-session-manager.vala which connects to network monitor "network changed" signal.

However, this signal is actually invoked every time *any* property of the network change, and in my case for some reasons the signal strength of the wifi connection change every second or less, leading to a tons of "reachability checking" in Geary.

I think that something must be done here, but I am unsure what:

1. Either use "bool available" signal's argument and store the previous value in the class. If no change, don't do anything.
2. Use some delay between 2 consecutive checks (30 sec or …).

1. would be cleaner but maybe there are cases where network change signal requires to check again the server reachibility (route change, etc.)?
Comment 1 Niels De Graef 2016-12-13 16:32:45 UTC
I'm not really sure if this is something we want.

First of all, if we look at the description of the network_changed signal:
>  If available is true, then some hosts may be reachable that were not
> reachable before, while others that were reachable before may no longer be
> reachable. If available is false, then no remote hosts are reachable.
That means that if we get the network_changed-signal is fired and available is true, that a server still might have become (un-)reachable. One use case I might think of is setting up or exiting a VPN which restricts what websites you might go to.

Second, I do think that we should keep this for debugging purposes, especially when people attach their log here in Bugzilla and we need to find out whether something is buggy in our code.

Now, maybe we can find a middle ground and add a --no-network-log option to not output these messages? That could also come in handy if we are debugging a log that we are sure has nothing to do with networking code.
Comment 2 Michael Gratton 2016-12-14 01:48:31 UTC
Yeah, looking at the docs for network-changed: "If available is TRUE, then [...] others that were reachable before may no longer be reachable". So it seems we may still need to do a check here.

On the other hand, the fact that signal is being fired so often is problematic, and this has been noticed by at least one other user (Bug 766547).

Gautier, if it is being fired on signal-strength change then that may be a bug. Can you have a look and see if there is an existing bug about that against glib-networking, and if not file it and reference this one?

Maybe having some period of quiescence between network changes is a compromise? It doesn't seem like a bad idea in any case, since if the network is being flaky, we don't want Geary taking connections up and down like a yo-yo. So maybe when network-changed is emitted, we should set a short (1s? 2s?) timeout before checking reachability, and if another is received within that time, the timeout is reset, up to some max (30s?), so that changes are eventually actually picked up in the face many notifications due to fluctuating signal strength.

A few other points:

- It may well be worth doing the same for network-available, for the same reason
- In ::check_endpoint_reachable_async(), the if clause at the end will call adjust_session_pool.begin() even if the network was previously already reachable. This should perhaps check to see if it has instead changed to become reachable before calling that method, which may save some CPU cycles.
- In 0.13, if we can start depending on GLib 2.44 we should also start using NetworkMonitor.get_connectivity(), and only decide to be up once full connectivity is reported
Comment 3 Gautier Pelloux-Prayer 2016-12-14 10:19:49 UTC
Niels, the problem is not the logs spam here, I genuinely think their is a bigger problem underneath. Basically Geary is never going to idle, it is pinging imap servers every 6 secs. I did not make a benchmark but I guess it is not that good for (at least) battery ;).

Michael, I checked and the problem happens on 3 laptops on 3 totally different Wifis. I assume you also encounter the same behavior if your are on wifi (try running "sudo dbus-monitor --system  | grep Strength -B 3 -A 1" - you should see 2 Strength signals emitted every 6 seconds).

I am not not sure where should I report this "bug" (intended behavior?) since I could not find the code that is actually responsible of firing this signal - is it glib project, network category?

BTW this does NOT happens for network-available. Probably because the network availability does not change while network properties does all the time?
Comment 4 Michael Gratton 2016-12-14 12:24:23 UTC
(In reply to Gautier Pelloux-Prayer from comment #3)
> Michael, I checked and the problem happens on 3 laptops on 3 totally
> different Wifis. I assume you also encounter the same behavior if your are
> on wifi (try running "sudo dbus-monitor --system  | grep Strength -B 3 -A 1"
> - you should see 2 Strength signals emitted every 6 seconds).

Hmm, yeah if I run that, I do see an update every 4-5s. However, after coming up with a trivial program that only hooks up to network-changed, I only see that signal firing when the network changes, e.g. if I disconnect/reconnect to wifi. I'll attach that program here so you can check as well.

This might be a dumb question, but are you 100% sure it is the network-changed hander that is causing check_endpoint_reachable to be called? It might be worth setting a gdb breakpoint or some debug logging to confirm.

> I am not not sure where should I report this "bug" (intended behavior?)
> since I could not find the code that is actually responsible of firing this
> signal - is it glib project, network category?

Probably a good place to start, yep. The glib source gio/gnetworkmonitorbase.c contains an abstract implementation of GNetworkMonitor that causes network-changed to be emitted if a network is added or removed. The source gio/gnetworkmonitornetlink.c is a concrete implementation that uses linux's netlink subsystem to detect when that happens. I don't know if that's what your system is using though.
Comment 5 Michael Gratton 2016-12-14 12:32:39 UTC
Oh wait, you already attached a minimal test case. Whoops! AFter getting that to compile, I'm still only seeing network-changed emitted when the network changes:

> mjg@payens:~/Projects/GNOME/geary$ ./network-change

Waited ~10s here, then disabled wifi:

> on_network_changed true
> on_network_changed true
> on_network_changed false

Waited ~5s, then renabled it:

> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed false
> on_network_changed true
> on_network_changed true

Waited ~20s, then:

> ^C
Comment 6 Gautier Pelloux-Prayer 2016-12-14 12:59:20 UTC
Created attachment 341956 [details]
network-change minimal listener v2

Fixing Compilation.
Comment 7 Gautier Pelloux-Prayer 2016-12-14 13:03:32 UTC
I am not sure what is going on exactly here; but indeed today I get only a few periodic callbacks (every 50sec or so), except for network up/down where it goes to 10 in a few seconds. 
I think yesterday something was going wrong. If you don't mind I'd like to monitor it a bit more to see if that is reproducible (maybe it was due to heavy network load, or high CPU usage or…).
Comment 8 Michael Gratton 2016-12-14 14:07:01 UTC
(In reply to Gautier Pelloux-Prayer from comment #7)
> I am not sure what is going on exactly here; but indeed today I get only a
> few periodic callbacks (every 50sec or so), except for network up/down where
> it goes to 10 in a few seconds. 
> I think yesterday something was going wrong. If you don't mind I'd like to
> monitor it a bit more to see if that is reproducible (maybe it was due to
> heavy network load, or high CPU usage or…).

Yeah, no problem.

After a bit of searching, I found this in an older Ubuntu bug: https://bugs.launchpad.net/whoopsie/+bug/991481 :
> Andy Whitcroft points out that the NETLINK_ROUTE socket set up by
> GNetworkMonitor will fire events every time an ARP entry appears
> or disappears.

There's also some good advice there: using `sudo ip monitor` might let you see the same netlink messages that gnetworkmonitornetlink.c sees. Maybe worth running that in parallel with the test case, so you know what is causing changes?
Comment 9 Gautier Pelloux-Prayer 2016-12-30 22:30:45 UTC
I cannot reproduce this issue for now, so I'll close it. I'm still monitoring it so if it happens again I'll reopen it.
Comment 10 Gautier Pelloux-Prayer 2016-12-30 22:31:31 UTC
Created attachment 342646 [details]
network-change minimal listener v3

Just a minor update to the monitoring tool in case someone need it in the future…
Comment 11 Gautier Pelloux-Prayer 2017-01-02 16:49:35 UTC
Interesting: I'm back at home and the problem starts appearing again. This is SSID specific and has to do with my Internet access provider for sure - anyone living in France with Orange might see the problem.
Anyway, I'm reopening this issue and will try to fix it.
Comment 12 Gautier Pelloux-Prayer 2017-02-07 14:28:24 UTC
I posted a bug in glib to check if this is something on this side: Bug 778288.
Comment 13 Michael Gratton 2017-11-17 02:32:42 UTC
FWIW, I just pushed commit 6182f46 to master that delays doing a check for 60s if we are already connected and a check recently happened. If we're not connected, in the process of connecting, or if a check hasn't happened in the last 60s, then it gets triggered straight away. And if the network goes away altogether, that is also noticed immediately as well.

I think this will keep CPU utilisation down as much as is possible with a flapping network, while also ensuring the important transitions (down->up and up->down) are noticed immediately.

I'm happy to haggle over the exact length of that timeout, but it has to be a decent compromise between utilisation and usability. To be honest, if a network is flapping once a minute there's likely to a lot of other kernel and userland processes sucking the battery dry more so than Geary sending out a ping.