GNOME Bugzilla – Bug 735325
NM repeatedly updates default route, causing GNetworkMonitor using apps to go crazy
Last modified: 2014-11-07 15:23:09 UTC
I don't use Evolution. I have my calendar synced because I set my google account in GOA, but it's a small calendar with not much data. Recently, when not doing ANYTHING that's related to calendars or evolution in any way, I noticed my computer is overheating (99°C !). According to `top`, evolution-calendar-factory is using 100% CPU during that time for very long periods of time. This wastes energy, overheats laptops, and drains batteries. I attached gdb to the process to see what the hell it's doing, and got this backtrace:
+ Trace 233995
This is extremely annoying and should be fixed soon, preferably before my laptop melts down :)
Created attachment 284342
strace

Attached an strace dump of a few seconds of that process. It keeps coming back after I kill it, that's really not good!
Looks like GNetworkMonitor (or something below it) is stuck in a loop. strace shows something stat()'ing "/etc/resolv.conf", followed by some I/O, followed by a read() call returning EAGAIN, then it all repeats. E-D-S is apparently being spammed with GNetworkMonitor::network-changed signals, and is simply reacting to them. Dan Winship can probably make more sense of the strace.
You might be able to suppress the problem for now by restarting the calendar factory from the command line like so:

GIO_USE_NETWORK_MONITOR=base /usr/libexec/evolution-calendar-factory -r
It appears that GNetworkMonitor::network-changed is spamming other applications as well; I can see it in Software too.
(In reply to comment #2)
> E-D-S is apparently being spammed with GNetworkMonitor::network-changed
> signals, and is simply reacting to them.

Which is because GNetworkMonitor is apparently being spammed with routing change notifications from the kernel, and is simply reacting to them...

The strace output is truncated so I can't fully decode the messages, but the headers at least are there:

recvmsg(9, {msg_name(0)=NULL, msg_iov(1)=[{"l\0\0\0\30\0\0\0\216F[T\243\2\300=\n\0\0\0\376\4\0\1\0\0\0\0\10\0\17\0"..., 108}],

which is:

6c 00 00 00   [ length 108 ]
18 00 00 00   [ type RTM_NEWROUTE, flags 0 ]
8e 46 5b 54   [ serial # ]
a3 02 c0 3d   [ pid 2734866493 ?? ]
0a 00 00 00   [ AF_INET6, src len 0, dst len 0, TOS 0 ]
fe 04 00 01   [ RT_TABLE_MAIN, RTPROT_STATIC, RT_SCOPE_UNIVERSE, RTN_UNICAST ]
00 00 00 00   [ flags 0 ]
08 00 0f 00   ...

(The "pid" field should be 0 according to the docs, but maybe those docs are out of date?)

RTPROT_STATIC indicates that this is a route added by a userland tool, not created by the kernel automatically. src len 0 / dst len 0 means that it's the default route.

So, something appears to be constantly changing the IPv6 default route. (Or else, the kernel is confused and is repeatedly sending out notifications about the default route changing even though it isn't.)

The netlink messages aren't all identical; there seems to be a repeating cycle of four messages: The second one has serial 0, pid 0, and advertises that a cloned route to a single IP address has been created. The one after that is identical to the first message, except with a higher serial number. The last one is an RTM_DELROUTE for the route added by the second message. Then the cycle repeats.
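For anyone who wants to repeat the decoding above on a fuller strace, here is a minimal Python sketch that unpacks the same 32 bytes. It is only an illustration: the function name and the dictionary keys are made up for this example, the constants are the numeric values from the Linux headers, and it assumes the trace came from a little-endian machine (e.g. x86). Under that assumption the pid field reads as 1035993763; the 2734866493 quoted above is the same four bytes read in the opposite byte order.

```python
import struct

# Numeric values from <linux/rtnetlink.h> and <linux/socket.h>.
RTM_NEWROUTE, RTM_DELROUTE = 24, 25
AF_INET6 = 10
RT_TABLE_MAIN = 254
RTPROT_STATIC = 4

# First 32 bytes of the recvmsg() payload quoted from the strace, as hex.
RAW = bytes.fromhex(
    "6c000000 1800 0000 8e465b54 a302c03d "  # struct nlmsghdr (16 bytes)
    "0a000000 fe040001 00000000 "            # struct rtmsg (12 bytes)
    "08000f00"                               # start of the first rtattr
)


def decode_route_message(raw):
    """Decode the nlmsghdr and rtmsg at the start of a netlink route message.

    Hypothetical helper for illustration; assumes little-endian byte order.
    """
    # struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
    length, msg_type, nl_flags, seq, pid = struct.unpack_from("<IHHII", raw, 0)
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol,
    # scope, type (all u8), then u32 flags
    (family, dst_len, src_len, tos,
     table, protocol, scope, rt_type, rt_flags) = struct.unpack_from(
        "<BBBBBBBBI", raw, 16)
    return {
        "length": length,
        "type": msg_type,
        "flags": nl_flags,
        "seq": seq,
        "pid": pid,
        "family": family,
        "dst_len": dst_len,
        "src_len": src_len,
        "table": table,
        "protocol": protocol,
        "scope": scope,
        "rt_type": rt_type,
        "rt_flags": rt_flags,
        # A destination prefix length of 0 matches everything: the default route.
        "is_default_route": dst_len == 0,
    }


if __name__ == "__main__":
    for key, value in decode_route_message(RAW).items():
        print(f"{key:18} {value}")
```

Running the same function over each message in an untruncated trace (strace -s 256) and comparing the "type" field against RTM_NEWROUTE and RTM_DELROUTE would be one way to check the four-message cycle described above.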
Now that I look at it more closely, it happens only when I'm connected to ethernet at work, where I have IPv4 assigned by DHCP and statically configured IPv6... I have no idea why the default route would change like that. Would a more complete strace help diagnose this problem?
(In reply to comment #6)
> Would a more complete strace help diagnose this problem?

Possibly. You could pass "-s 256" to get the full netlink packet contents, and "-f" to get all threads.

It might also be useful to run NetworkManager with debug logging (to see what it's doing/noticing about the routing). Add this to /etc/NetworkManager/NetworkManager.conf:

[logging]
level=DEBUG
domains=DEFAULT

and restart NM.
I can't reproduce this issue anymore. Can anyone else? If not I'll close this bug.
seems like a no
It turns out this is NM's fault; it's constantly requesting updates from the router and then constantly updating things to reflect that.
pushed branch danw/rdisc-spam for review (see also https://bugzilla.redhat.com/show_bug.cgi?id=1151665)
Looks good to me. Tag the commit with the Gnome and RH bug numbers when you merge it?
OK. I built a test package for the Fedora reporters... waiting to hear back on that.
The patch looks right to me.
LGTM too
The previous fix did not fix the bug for the reporter. After more investigation, I found another potential problem, and fixing that did make the bug go away. Branch is now at danw/routing-spam-bgo735325. Note that this means that we are somehow messing up and ending up with a default route in an NMIP6Config somehow. There are more logs in the Fedora bug, but nothing jumps out. (It's not because the router is pushing us a default route.) The patch doesn't fix the default-route-in-NMIP6Config bug, it just makes us not get stuck in a loop when it does happen.
(In reply to comment #16)
> The previous fix did not fix the bug for the reporter. After more
> investigation, I found another potential problem, and fixing that did make the
> bug go away.
>
> Branch is now at danw/routing-spam-bgo735325.
>
> Note that this means that we are somehow messing up and ending up with a
> default route in an NMIP6Config somehow. There are more logs in the Fedora bug,
> but nothing jumps out. (It's not because the router is pushing us a default
> route.) The patch doesn't fix the default-route-in-NMIP6Config bug, it just
> makes us not get stuck in a loop when it does happen.

The second commit looks good to me too. I actually do the same on th/bgo735512_route_metric
Second commit also looks good to me. Ready to merge?
pushed to master and nm-0-9-10