Bug 774448 - GTask tasks starving in the pool, slowly processing
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gio
Version: 2.50.x
Hardware: Other Linux
Importance: Normal critical
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2016-11-15 04:07 UTC by Russell Stuart
Modified: 2018-05-24 19:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Evolution hung (64.56 KB, image/png), 2016-11-15 04:07 UTC, Russell Stuart
Evolution Background Operation Pending (92.11 KB, image/png), 2016-11-15 04:08 UTC, Russell Stuart
Evolution backtrace when hung as requested (71.71 KB, text/plain), 2016-11-30 06:30 UTC, Russell Stuart
Screenshot when backtrace was taken (1013.75 KB, image/png), 2016-11-30 06:31 UTC, Russell Stuart
Strace of hung evolution when backtrace was taken (464.59 KB, text/plain), 2016-11-30 06:41 UTC, Russell Stuart
lsof of evolution when backtrace was done (68.47 KB, text/plain), 2016-11-30 06:52 UTC, Russell Stuart
debug prints (12.61 KB, text/plain), 2016-12-08 17:33 UTC, Milan Crha
Backtrace (17.78 KB, text/plain), 2016-12-11 08:28 UTC, Russell Stuart
proposed patch (2.82 KB, patch), 2017-03-22 15:28 UTC, Milan Crha

Description Russell Stuart 2016-11-15 04:07:12 UTC
Created attachment 339875
Evolution hung

Version: Debian evolution_3.22.2-1_amd64

Evolution sometimes hangs on startup. I suspect it starts successfully if there is no email to fetch, and always hangs if there is email to fetch.

A screenshot of the hung state is attached as "evolution-hung.png".

It is not completely hung. File --> Quit prompts with "Close Evolution with Pending Background Operations?" after a while, and "Close Immediately" does just that. Whether it restarts without hanging is a lottery, but my guess is that it succeeds if it has finished collecting email.

A screenshot of it asking to exit with pending background operations is attached as "evolution-pending.png".

It also hangs in the same way from time to time during normal operation.
Comment 1 Russell Stuart 2016-11-15 04:08:26 UTC
Created attachment 339876
Evolution Background Operation Pending
Comment 2 Milan Crha 2016-11-29 10:01:55 UTC
Thanks for the bug report. I guess you ran out of free threads, which causes the hang. Could you install debuginfo packages for evolution-data-server, evolution, glib2 and possibly also glib-networking, please?

Then, when evolution exhibits the behaviour again, open a terminal and get a backtrace of it, to see where it is stuck and what it is trying to accomplish. You can get the backtrace with a command like this:
   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution` &>bt.txt
Please check bt.txt for any private information, like passwords, email addresses, server addresses, and so on. I usually search for "pass" at least (quotes for clarity only).

Thanks in advance.
Comment 3 Russell Stuart 2016-11-30 06:30:09 UTC
Created attachment 341022
Evolution backtrace when hung as requested

I recompiled with debugging symbols on.

It is supposedly fetching emails (see attached screen shot), but a tcpdump shows no email related network traffic whatsoever.
Comment 4 Russell Stuart 2016-11-30 06:31:10 UTC
Created attachment 341023
Screenshot when backtrace was taken
Comment 5 Russell Stuart 2016-11-30 06:41:04 UTC
Created attachment 341024
Strace of hung evolution when backtrace was taken
Comment 6 Russell Stuart 2016-11-30 06:52:48 UTC
Created attachment 341025
lsof of evolution when backtrace was done
Comment 7 Russell Stuart 2016-11-30 07:01:20 UTC
I also have a core dump.  At 852MB it's a bit large to post here, but I have it and the debug symbols.
Comment 8 Milan Crha 2016-11-30 15:31:14 UTC
Thanks for the update. The backtrace was the only thing I was looking for, and yours looks fine. It shows 10 different GTask threads (g_task_thread_pool_thread), each of them waiting for another GTask thread to be started and to finish, but GTask itself has a limit of 10 threads, so the follow-up operations starve in the queue. That's a problem in GLib itself.

Personally, I think it would be great if g_task_run_in_thread_sync() were smart enough to recognize that it is already running in a dedicated GTask thread; then it would not try to run another one, but would just run the task "inline", synchronously, within that particular thread, rather than "pausing" the current thread and trying to create a new one. The current behaviour looks backward in this regard.

Your only options are to:
a) compile GLib yourself and increase the thread pool limit for GTask
b) not have so many accounts configured in evolution (I do not think
   you have many; it's GTask that ends up with many tasks)
c) not let the accounts update/refresh on start

but none of those can guarantee that the starvation will not come back.
Comment 9 Milan Crha 2016-11-30 16:47:22 UTC
Russell, by the way, what is your glib2 (glib) version, please? There is some code which should add more threads when the tasks are blocking, which doesn't seem to work in your version.
Comment 10 Dan Winship 2016-11-30 18:33:27 UTC
(In reply to Milan Crha from comment #8)
> Personally, it would be great if the g_task_run_in_thread_sync() could be
> smart enough and if it'll recognize that it's already run in a dedicated
> GTask thread, then it'll not try to run another one, but just run the task
> "inline", synchronously, within that particular thread, rather than "pause"
> the current thread and trying to create a new one. The current behaviour
> looks backward in this regard.

It can't do that because the inner sync task might get cancelled, in which case the call to g_task_run_in_thread_sync() that is running it should immediately return. (IOW, we can't run the task in the same thread in this case for the same reason that g_task_run_in_thread_sync() exists at all.)

But as you note, this problem is supposed to have been fixed as of glib 2.46.2+ (bug 687223).
Comment 11 Russell Stuart 2016-11-30 22:04:27 UTC
> what is your glib2 (glib) version, please?

The debian stretch version (recompiled by me to get symbols, but the source is untouched).  glib2_2.50.2-2_amd64

> this problem is supposed to have been fixed as of glib 2.46.2+

hmmm.
Comment 12 Russell Stuart 2016-11-30 22:07:32 UTC
> c) do not let the accounts update/refresh on start

The hang in the backtrace happened after it had been running for several hours, not at startup :(
Comment 13 Russell Stuart 2016-11-30 22:30:19 UTC
> a) compile GLib yourself and increase the thread pool limit for the GTask

It irritates me enough that I've done this.  I've upped the task limit to 200.  If it hangs again I'll post an update here.

For what it's worth, hiding the status bar reduces the problem somewhat (particularly at startup).  I'm guessing the spinners in the status bar require a task.
Comment 14 Milan Crha 2016-12-01 08:54:39 UTC
(In reply to Russell Stuart from comment #13)
> For what it's worth, hiding the status bar reduces the problem somewhat
> (particularly at startup).  I'm guessing the spinners in the status bar
> require a task.

Nope, they are updated in the main thread, on timeout. They are significantly less "CPU hungry" than spinners in the folder tree (and GtkSpinner as such). Hiding the status bar might be just a coincidence, though I cannot explain why (it's a UI element, thus everything it does it does in the UI thread, the main thread).
Comment 15 Milan Crha 2016-12-08 17:33:57 UTC
Created attachment 341628
debug prints

I tried to investigate it further, and the problem is that there are really many pending tasks, and the running ones are processed so slowly that it looks as if they are not moving at all. I have more than 15 mail accounts enabled in evolution and roughly 50 defined in total. They all do something on startup. I added some debug prints, and the attachment is their output. I can share the code for the prints with you, of course. The number at the beginning is g_get_monotonic_time() / 1000 (thus in milliseconds) at the moment of the print; then comes where it was printed, followed by some hopefully useful information. The log starts after an initial phase of evolution's startup and ends at the point where I gave up, after 88.788 seconds.

As you can see, there are more than 300 pending tasks, but only up to 12 threads in the GTask pool, with the wait-time increasing constantly.
Comment 16 Milan Crha 2016-12-08 17:40:42 UTC
I tried setting G_TASK_POOL_SIZE to 20, and the same startup sequence looks like this:
33323144    task_pool_manager_timeout: tasks_running:21 + 1 max threads; pending:6
33323145 g_task_thread_setup: tasks_running:22 wait-time:115926
33323145    g_task_thread_cleanup: tasks_running:22 - 1 max threads; pending:5
33323261    task_pool_manager_timeout: tasks_running:21 + 1 max threads; pending:14
33323261 g_task_thread_setup: tasks_running:22 wait-time:119403
33323261    g_task_thread_cleanup: tasks_running:22 - 1 max threads; pending:13
33323334    g_task_thread_cleanup: turning off ready time; pending:0
33323335    g_task_thread_cleanup: turning off ready time; pending:0
33323381    task_pool_manager_timeout: tasks_running:21 + 1 max threads; pending:239
33323381 g_task_thread_setup: tasks_running:22 wait-time:122985
33323381    g_task_thread_cleanup: tasks_running:22 - 1 max threads; pending:238
33323464    g_task_thread_cleanup: tasks_running:21 - 1 max threads; pending:239
33323504    task_pool_manager_timeout: tasks_running:20 + 1 max threads; pending:239
33323504 g_task_thread_setup: tasks_running:21 wait-time:126674
33323504    g_task_thread_cleanup: tasks_running:21 - 1 max threads; pending:238
33323577 g_task_thread_setup: tasks_running:20 wait-time:100000
33323577 g_task_thread_setup: tasks_running:20 wait-time:100000
33323577 g_task_thread_setup: tasks_running:20 wait-time:100000
Comment 17 Russell Stuart 2016-12-11 08:28:50 UTC
Created attachment 341730
Backtrace

A Xmas present: another backtrace of a hung evolution.

The differences from last time are that the UI is working (it's just that fetching emails has frozen), and that glib2 has been recompiled, with symbols, but also with gio/gtask.c:G_TASK_POOL_SIZE changed to 100.

So it's likely unrelated to the previous one, but since I have symbols I thought it would be nice to have.
Comment 18 Milan Crha 2016-12-12 17:11:55 UTC
(In reply to Russell Stuart from comment #17)
> So it's likely unrelated so the previous one, but since I have symbols I
> thought it would be nice to have.

Right, it's unrelated. You have debug symbols for glib, but not for evolution-data-server or evolution itself. Even so, there are basically two interesting threads running: one for your IMAPx account (it could also be processing an IDLE command, but that isn't visible in the backtrace), and another waiting for bogofilter, or at least calling some wait from that module. The main thread is showing a dialog, though I do not know which one.

I would start with the bogofilter thread; maybe bogofilter had some issue, or is waiting too long for something. In any case, it's better to use a different bug report for it, if any.
Comment 19 Milan Crha 2017-03-22 15:28:47 UTC
Created attachment 348511
proposed patch

This glib patch makes my evolution run properly with 17 mail accounts enabled. According to my debugging, evolution starts with around 250 pending tasks (I also have many disabled mail accounts there), and this patch raises max_threads to ~90 at a time (after a few iterations), then lowers it again as the tasks finish. Most tasks come from GNetworkManager checking whether the accounts can reach their destination; it's not evolution itself requiring so many tasks on its own. The address availability check is split into multiple GTasks, which is the biggest pain here.

Anyway, the fix can be either this patch or a one-liner that increases the (default) thread pool size: when I increase it from 10 to 20, evolution is fully initialized even quicker than with this more complicated patch. I think that's due to the overhead of adding and removing max-threads to and from the pool during startup.

We spoke about it in other bug(s), but an ideal complementary solution would be to let the application (not any random library) define what min-threads is supposed to be through some public API, because I do not think glib can cover all needs, even though it causes part of the problem itself (especially that GNetworkManager, as mentioned above). My opinion, though.
Comment 20 Russell Stuart 2017-06-01 11:15:45 UTC
Original reporter here.  It's been a while.  Understandable I guess because tracing through a deadlock caused by running out of threads in the pool would be a bitch to debug.

May I humbly suggest a solution to my immediate problem? My immediate problem is that evolution freezes when it has a large number of accounts to monitor. The solution seems simple enough: just limit the number of concurrent account refreshes to whatever your threading model can handle. From my point of view 1 would be just fine, because even though fetching them serially would take a long time, it is still much faster than attempting to fetch 20 and hanging halfway through.
Comment 21 pingo 2017-10-02 14:43:16 UTC
Hi,

I am not a programmer so pardon me. Just want to share my experience and workaround on this.

While managing 5 email accounts in evolution (evo), 80% of the time I opened evo it got stuck due to this problem, requiring the app to be force-closed.

I found a workaround: if evo is put in 'offline mode' before closing/exiting, this problem does not happen. It has been working for me for at least 3 weeks. Adding a delay seems to help the fetching of emails.

Hope this helps.

Thanks
Comment 22 Milan Crha 2017-11-28 13:17:38 UTC
*** Bug 790635 has been marked as a duplicate of this bug. ***
Comment 23 GNOME Infrastructure Team 2018-05-24 19:14:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1223.