GNOME Bugzilla – Bug 748223
transition offline to online doesn't complete
Last modified: 2017-08-09 09:31:47 UTC
Created attachment 302041 [details]
gdb stack trace output

Seen with 3.16.1 on Arch. When Evolution goes offline and then online (for example, when my laptop suspends and resumes), it usually can't finish going online; there will be one server connection stuck in the bar at the bottom (though not always the same server), e.g.:

  Reconnecting to 'Gmail' (cancelling)

After that point fetching and sending mail works, but loading images (Ctrl-I) doesn't, and toggling online/offline appears to do nothing. I have to close and restart Evolution to restore normal operation. In this state closing doesn't work right away; after 60 seconds a dialog box appears:

  Close Evolution with pending background operations?
  Evolution is taking a long time to shut down, possibly due to network
  connectivity issues. Would you like to cancel all pending operations
  and close immediately, or keep waiting?
  [Close Immediately]  [Keep Waiting]

This problem happens most times that Evolution goes offline, but seems more likely when a server folder is selected (as opposed to one from "On This Computer"). Milan asked for the output of 'gdb --batch --ex "t a a bt"' when running evolution and evolution-data-server built with debugging symbols, which I've attached. Let me know if I can provide other potentially useful information.
Created attachment 302068 [details]
another stack trace

This one shows the same problem, but the "server" that Evolution is stuck reconnecting to is "On This Computer". I've only seen this happen once; every other time it's been a remote IMAP server.
Thanks for the bug report. It seems to me that you've run out of free threads in the GTask thread pool, which is limited to 10 running threads. Some operations use these GTask threads but also require other operations to run in yet another thread, thus one operation can require two threads from the GTask pool (sometimes even more). If enough accounts are configured and enabled, they can use up all the free GTask threads and make the follow-up threads starve in the GTask thread pool. This issue existed in the past, and it keeps coming back with new releases of GLib. What is your exact version of GLib (glib2), please? Could you install the debuginfo package for it and gather the backtrace again? It will show whether this is the GTask thread-starvation issue or not.
Created attachment 302196 [details]
stack trace w/glib2 debug

My version of glib2 is 2.44.0-1.
Thanks for the update. This backtrace looks slightly different. There are 6 imapx_parser_thread threads running, then 4 threads waiting for the services to be connected, and finally 3 IMAPx threads in an IDLE state (which happens when the IMAPx account has "Listen for server change notifications" enabled). There is no starvation in the GTask thread pool, as it had seemed to me from the previous backtrace. I'll try to reproduce it here and get back to you if I find anything.
I tried to reproduce this by enabling and disabling my WiFi connection (the only active one on the machine) and, as always, I wasn't able to. I had 8 accounts enabled, 3 of which require a VPN, which I didn't run, and I didn't get any freeze similar to the one you describe.
I cannot explicitly and reliably "enforce" this to happen on my (affected) machine either. The only pointer I can give so far is that it mostly happens for me - in case it does happen - when I resume from suspend. Simply enabling and disabling WiFi or switching routes (e.g. by enabling and disabling a VPN) doesn't seem to trigger it. A bold (and by no means supported by any research or debugging) assumption would be that it's simply starting too early, before DNS and/or routing work, so threads pile up. The only support for this notion is that it's more likely to fail/hang when the network changed, e.g. it got suspended in network A and resumed in network B. Maybe someone else can confirm this behavioral pattern/observation?
In case it's of any help: I have 12 accounts (including "On This Computer"), but only 4 are active (5 if you count "On This Computer"). 3 accounts are of type IMAPx, configured to use SSL on a dedicated port; they all hit the same server using two different hostnames. 1 active account is of type "None" to allow sending under that name.
(In reply to theseer from comment #6)
> I cannot explicitly and reliably "enforce" this to happen on my (affected)
> machine either.

Nor can I, though it seems to happen most of the time.

> A bold (and by no means supported by any research or debugging) assumption
> would be that it's simply starting to early, not having a DNS and/or route
> working yet so threads pile up. The only support for this notion would be
> that it's more likely to fail/hang when the network changed, e.g. it got
> suspended in network A and resumes in network B.
>
> Maybe someone else can confirm this behavioral pattern/observation?

I can reproduce it with no change in my laptop's network state by toggling Evolution offline then online, though the problem does seem more likely to happen after resume from suspend.
(In reply to theseer from comment #6)
> A bold (and by no means supported by any research or debugging) assumption
> would be that it's simply starting to early, not having a DNS and/or route
> working yet so threads pile up.

I hope that's not the case, because 3.16.x has a timeout set on the network change, so it doesn't start "network discovery" right after the change is noticed, but rather later, to give the connection time to be fully established, and works with it only after it is fully initialized. The timeout is about 5 seconds, if I recall correctly.
Just a small update: I was researching this problem the other day and found a seemingly related issue where the solution was to compare the source files at the raw level (https://mail.gnome.org/archives/evolution-list/2015-April/msg00090.html). Comparing various accounts showed that the "problematic" one had a relatively high value of 7 for concurrent connections. I reduced that to 5, since that was the value for the other accounts, and so far the problem seems to have vanished (vanished == didn't re-occur within the last couple of days).
I don't think I've seen this problem since upgrading to 3.18
It did happen to me even with 3.18, but so far not after adjusting the settings.
This is happening for me in 3.20.3 in accounts with only 3 concurrent connections.
Hi Rann, could you install debuginfo packages for evolution-data-server, evolution, glib2, glib-networking, and possibly also the gnutls library, and then capture a backtrace of the ever-waiting evolution, please? You can get the backtrace with a command like this:

   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt

Please check bt.txt for any private information, like passwords, email addresses, server addresses, and so on. I usually search for "pass" at least (quotes for clarity only).

The evolution-data-server 3.20.4, to be released next week, contains this [1] commit, which may be related, though the backtrace at comment #3 doesn't contain that particular function.

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=594c548fa8
I'd be glad to. I'm running Debian, and have added the Automatic Debug Packages repos (https://wiki.debian.org/AutomaticDebugPackages). However, glib2 does not appear there, so I can't get a debug package for that for now.
(or maybe I'm entirely misunderstanding you!)
Created attachment 330843 [details] Requested backtrace (without glib2 debug package)
Thanks for the update. It can be that your distribution packages the glib library differently, possibly as libglib (since /lib/x86_64-linux-gnu/libglib-2.0.so.0 can be seen in the backtrace). Your backtrace is partly useful, partly weird. But again, it can be that your distribution packages things that way (I know Fedora, which differs from this). I see that the libcamel symbols are missing, even though libcamel is part of evolution-data-server, the same as the IMAPx code, which does show line numbers. Anyway, the backtrace shows two threads disconnecting, one connecting, and three threads reading (or waiting for) data from a server. I'd guess you face the issue fixed by commit [1], but that is part of evolution-data-server 3.20.2 and later, which you should have installed if you use 3.20.3 (unless you have 3.20.3 evolution but an older evolution-data-server).

[1] https://git.gnome.org/browse/evolution-data-server/commit/?h=gnome-3-20&id=b59863d88
Thank you! I have 3.20.3 installed.

$ dpkg -s evolution-data-server
Package: evolution-data-server
Status: install ok installed
Priority: optional
Section: gnome
Installed-Size: 1906
Maintainer: Debian Evolution Maintainers <pkg-evolution-maintainers@lists.alioth.debian.org>
Architecture: amd64
Version: 3.20.3-1

I've installed the debugging symbols for libcamel and libglib. I'll attach the new backtrace in a second.
Created attachment 330847 [details] Updated backtrace with libcamel and libglib debugging symbols
Created attachment 330860 [details] Backtrace with two stuck connections, in case it's useful
Thanks for the update. I see that Thread 19 is waiting for a lock which it already holds; that causes the deadlock. I fixed it for the next release. I'm not closing this bug yet, because I'm not sure whether it fixes the initial report.

Created commit 8bbffdb in eds master (3.21.4+) [1]
Created commit e6dca36 in eds gnome-3-20 (3.20.4+)

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=8bbffdb
(In reply to Milan Crha from comment #22)
> Thanks for the update. I see that the Thread 19 is waiting for a lock which
> it already holds. That causes the deadlock. I fixed it for the next release.
> I'm not closing this bug yet, because I'm not sure whether it fixes the
> initial report.

Carl, Rann, theseer: Does the problem still happen in 3.22 or later?
I can't reproduce this problem with 3.22, and haven't seen it happen in a long time
As I cannot really force this to happen, it's hard to say whether or not the fix did the trick. I can confirm, though, that I at least don't recall any hangs in this regard for quite a while. Currently using 3.24.4 (3.24.4-1.fc26), on Fedora 26 obviously. I'll report back in case it happens again.
Thanks for the quick feedback everybody! Based on the last two comments and comment 22 I declare victory on this ticket for the time being. If this happens again, please file a new ticket.