GNOME Bugzilla – Bug 776042
Checking server reachability too often
Last modified: 2017-11-17 02:32:42 UTC
Created attachment 341877 [details] network-change minimal listener I can see dozen of these in logs: > [deb] 13:05:32 0,000005 imap-client-session-manager.vala:528: Checking if IMAP host imap.gmail.com/default:993 reachable... > [msg] 13:05:32 0,000638 imap-client-session-manager.vala:539: IMAP host imap.gmail.com/default:993 considered reachable This is because of imap-client-session-manager.vala which connects to network monitor "network changed" signal. However, this signal is actually invoked every time *any* property of the network change, and in my case for some reasons the signal strength of the wifi connection change every second or less, leading to a tons of "reachability checking" in Geary. I think that something must be done here, but I am unsure what: 1. Either use "bool available" signal's argument and store the previous value in the class. If no change, don't do anything. 2. Use some delay between 2 consecutive checks (30 sec or …). 1. would be cleaner but maybe there are cases where network change signal requires to check again the server reachibility (route change, etc.)?
I'm not really sure if this is something we want. First of all, if we look at the description of the network_changed signal: > If available is true, then some hosts may be reachable that were not > reachable before, while others that were reachable before may no longer be > reachable. If available is false, then no remote hosts are reachable. That means that if we get the network_changed-signal is fired and available is true, that a server still might have become (un-)reachable. One use case I might think of is setting up or exiting a VPN which restricts what websites you might go to. Second, I do think that we should keep this for debugging purposes, especially when people attach their log here in Bugzilla and we need to find out whether something is buggy in our code. Now, maybe we can find a middle ground and add a --no-network-log option to not output these messages? That could also come in handy if we are debugging a log that we are sure has nothing to do with networking code.
Yeah, looking at the docs for network-changed: "If available is TRUE, then [...] others that were reachable before may no longer be reachable". So it seems we may still need to do a check here. On the other hand, the fact that signal is being fired so often is problematic, and this has been noticed by at least one other user (Bug 766547). Gautier, if it is being fired on signal-strength change then that may be a bug. Can you have a look and see if there is an existing bug about that against glib-networking, and if not file it and reference this one? Maybe having some period of quiescence between network changes is a compromise? It doesn't seem like a bad idea in any case, since if the network is being flaky, we don't want Geary taking connections up and down like a yo-yo. So maybe when network-changed is emitted, we should set a short (1s? 2s?) timeout before checking reachability, and if another is received within that time, the timeout is reset, up to some max (30s?), so that changes are eventually actually picked up in the face many notifications due to fluctuating signal strength. A few other points: - It may well be worth doing the same for network-available, for the same reason - In ::check_endpoint_reachable_async(), the if clause at the end will call adjust_session_pool.begin() even if the network was previously already reachable. This should perhaps check to see if it has instead changed to become reachable before calling that method, which may save some CPU cycles. - In 0.13, if we can start depending on GLib 2.44 we should also start using NetworkMonitor.get_connectivity(), and only decide to be up once full connectivity is reported
Niels, the problem is not the logs spam here, I genuinely think their is a bigger problem underneath. Basically Geary is never going to idle, it is pinging imap servers every 6 secs. I did not make a benchmark but I guess it is not that good for (at least) battery ;). Michael, I checked and the problem happens on 3 laptops on 3 totally different Wifis. I assume you also encounter the same behavior if your are on wifi (try running "sudo dbus-monitor --system | grep Strength -B 3 -A 1" - you should see 2 Strength signals emitted every 6 seconds). I am not not sure where should I report this "bug" (intended behavior?) since I could not find the code that is actually responsible of firing this signal - is it glib project, network category? BTW this does NOT happens for network-available. Probably because the network availability does not change while network properties does all the time?
(In reply to Gautier Pelloux-Prayer from comment #3) > Michael, I checked and the problem happens on 3 laptops on 3 totally > different Wifis. I assume you also encounter the same behavior if your are > on wifi (try running "sudo dbus-monitor --system | grep Strength -B 3 -A 1" > - you should see 2 Strength signals emitted every 6 seconds). Hmm, yeah if I run that, I do see an update every 4-5s. However, after coming up with a trivial program that only hooks up to network-changed, I only see that signal firing when the network changes, e.g. if I disconnect/reconnect to wifi. I'll attach that program here so you can check as well. This might be a dumb question, but are you 100% sure it is the network-changed hander that is causing check_endpoint_reachable to be called? It might be worth setting a gdb breakpoint or some debug logging to confirm. > I am not not sure where should I report this "bug" (intended behavior?) > since I could not find the code that is actually responsible of firing this > signal - is it glib project, network category? Probably a good place to start, yep. The glib source gio/gnetworkmonitorbase.c contains an abstract implementation of GNetworkMonitor that causes network-changed to be emitted if a network is added or removed. The source gio/gnetworkmonitornetlink.c is a concrete implementation that uses linux's netlink subsystem to detect when that happens. I don't know if that's what your system is using though.
Oh wait, you already attached a minimal test case. Whoops! AFter getting that to compile, I'm still only seeing network-changed emitted when the network changes: > mjg@payens:~/Projects/GNOME/geary$ ./network-change Waited ~10s here, then disabled wifi: > on_network_changed true > on_network_changed true > on_network_changed false Waited ~5s, then renabled it: > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed false > on_network_changed true > on_network_changed true Waited ~20s, then: > ^C
Created attachment 341956 [details] network-change minimal listener v2 Fixing Compilation.
I am not sure what is going on exactly here; but indeed today I get only a few periodic callbacks (every 50sec or so), except for network up/down where it goes to 10 in a few seconds. I think yesterday something was going wrong. If you don't mind I'd like to monitor it a bit more to see if that is reproducible (maybe it was due to heavy network load, or high CPU usage or…).
(In reply to Gautier Pelloux-Prayer from comment #7) > I am not sure what is going on exactly here; but indeed today I get only a > few periodic callbacks (every 50sec or so), except for network up/down where > it goes to 10 in a few seconds. > I think yesterday something was going wrong. If you don't mind I'd like to > monitor it a bit more to see if that is reproducible (maybe it was due to > heavy network load, or high CPU usage or…). Yeah, no problem. After a bit of searching, I found this in an older Ubuntu bug: https://bugs.launchpad.net/whoopsie/+bug/991481 : > Andy Whitcroft points out that the NETLINK_ROUTE socket set up by > GNetworkMonitor will fire events every time an ARP entry appears > or disappears. There's also some good advice there: using `sudo ip monitor` might let you see the same netlink messages that gnetworkmonitornetlink.c sees. Maybe worth running that in parallel with the test case, so you know what is causing changes?
I cannot reproduce this issue for now, so I'll close it. I'm still monitoring it so if it happens again I'll reopen it.
Created attachment 342646 [details] network-change minimal listener v3 Just a minor update to the monitoring tool in case someone need it in the future…
Interesting: I'm back at home and the problem starts appearing again. This is SSID specific and has to do with my Internet access provider for sure - anyone living in France with Orange might see the problem. Anyway, I'm reopening this issue and will try to fix it.
I posted a bug in glib to check if this is something on this side: Bug 778288.
FWIW, I just pushed commit 6182f46 to master that delays doing a check for 60s if we are already connected and a check recently happened. If we're not connected, in the process of connecting, or if a check hasn't happened in the last 60s, then it gets triggered straight away. And if the network goes away altogether, that is also noticed immediately as well. I think this will keep CPU utilisation down as much as is possible with a flapping network, while also ensuring the important transitions (down->up and up->down) are noticed immediately. I'm happy to haggle over the exact length of that timeout, but it has to be a decent compromise between utilisation and usability. To be honest, if a network is flapping once a minute there's likely to a lot of other kernel and userland processes sucking the battery dry more so than Geary sending out a ping.