After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 746276 - e_client_cache_get_client_sync() gets stuck
e_client_cache_get_client_sync() gets stuck
Status: RESOLVED FIXED
Product: evolution
Classification: Applications
Component: Mailer
3.12.x (obsolete)
Other Linux
: Normal critical
: ---
Assigned To: evolution-mail-maintainers
Evolution QA team
: 746500 747009 747271 747647 748290 752353 753308 766378 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-03-16 11:22 UTC by Brian J. Murrell
Modified: 2016-05-16 14:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
addressbook-factory stack trace (7.66 KB, text/plain)
2015-04-01 10:35 UTC, Brian J. Murrell
Details
evo-gn746276-01.patch (3.24 KB, text/plain)
2015-04-07 06:35 UTC, Milan Crha
Details
evo-gn746276-02.patch (3.66 KB, text/plain)
2015-04-07 16:29 UTC, Milan Crha
Details

Description Brian J. Murrell 2015-03-16 11:22:02 UTC
I frequently get a task stuck "Filtering new messages in 'INBOX' (n% complete)" where n is some value less than 100.  Sometimes the "(%n complete)" part is not included in the message.

When this happens, I am unable to expunge the folder.  The only way to resolve it is to restart evolution.

Here's the stacktrace of the evolution process:

Thread 7 (Thread 0x7f32f6e18700 (LWP 15022))

  • #0 syscall
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
  • #1 g_mutex_lock_slowpath
  • #2 e_mail_ui_session_check_known_address_sync
    at e-mail-ui-session.c line 894
  • #3 mail_ui_session_lookup_addressbook
    at e-mail-ui-session.c line 574
  • #4 junk_test
    at camel-filter-search.c line 901
  • #5 camel_sexp_term_eval
    at camel-sexp.c line 810
  • #6 camel_sexp_eval
    at camel-sexp.c line 1728
  • #7 camel_filter_search_match
  • #8 camel_filter_driver_filter_message
    at camel-filter-driver.c line 1742
  • #9 folder_filter
    at camel-folder.c line 433
  • #10 session_do_job_cb
    at camel-session.c line 197
  • #11 g_task_thread_pool_thread
  • #12 g_thread_pool_thread_proxy
  • #13 g_thread_proxy
  • #14 start_thread
    at pthread_create.c line 310
  • #15 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Thread 2 (Thread 0x7f32f5615700 (LWP 19802))

  • #0 syscall
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
  • #1 g_cond_wait_until
  • #2 g_async_queue_pop_intern_unlocked
  • #3 g_async_queue_timeout_pop
  • #4 g_thread_pool_thread_proxy
  • #5 g_thread_proxy
  • #6 start_thread
    at pthread_create.c line 310
  • #7 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Comment 1 Milan Crha 2015-03-31 13:57:29 UTC
Thanks for a bug report. The backtrace shows two threads running a junk test during the message filtering. Each is run on a different message of the same account. One thread is waiting for a response from some addressbook, another one is waiting on an EClientCache to get an ESource corresponding to some address book. I suppose the problem is that the address book doesn't respond and the junk test code is waiting for it indefinitely - the operation cannot be even cancelled, according to the backtrace. Finding out why the address book is waiting for so long may help to identify the issue.

Or maybe to figure out what else failed here. What is your exact evolution version, please? Evolution 3.12.11 contains a change in the EClientCache [1], which can have influence on this.

[1] https://git.gnome.org/browse/evolution/commit/?id=0e6ad402fdd0
Comment 2 Brian J. Murrell 2015-03-31 14:42:43 UTC
(In reply to Milan Crha from comment #1)
> Finding out why the address
> book is waiting for so long may help to identify the issue.

Indeed.  Any hints on finding that out?
 
> Or maybe to figure out what else failed here. What is your exact evolution
> version, please?

evolution-3.12.11-1.01.fc21

Which is upstream's 3.12.11-1 + the upstream patch to allow expunging in vfolders.

> Evolution 3.12.11 contains a change in the EClientCache
> [1], which can have influence on this.
> 
> [1] https://git.gnome.org/browse/evolution/commit/?id=0e6ad402fdd0

Looking in the tarball that the RPM was built from, that patch seems to be there.
Comment 3 Milan Crha 2015-03-31 17:22:31 UTC
Try to get a backtrace of the running evolution-addressbook-factory, (have installed debuginfo package for evolution-data-server), to see what it does. The command can look like:
>   $ gdb --batch --ex "t a a bt" -pid=`pidof evolution-addressbook-factory` &>bt.txt
Please check the bt.txt for any private information, like passwords, email address, server addresses,... I usually search for "pass" at least (quotes for clarity only).
Comment 4 Milan Crha 2015-04-01 06:04:56 UTC
*** Bug 746500 has been marked as a duplicate of this bug. ***
Comment 5 Milan Crha 2015-04-01 06:57:35 UTC
I tried to reproduce it here, but no luck. I also checked the code, whether the change mentioned in comment #1 could cause any harm (it's one of the possibilities), but the calls to e_client_cache_get_client_sync() seem to be correct, at least on the first look.

I thought there is some issue with the main context, like a wrong thread default context being used for the EBookClient, or the main thread context being blocked for some reason, causing this stuck state, but none of it had been seen here.

I can try to create a debugging patch, which would print some hopefully-useful information on the console, and create a scratch build for Fedora 21, if you'd be willing to give it a try. If you get this stuck state frequently, then it'll be easier for you to provide the debug log. Could I do it, please?
Comment 6 Brian J. Murrell 2015-04-01 10:35:33 UTC
Created attachment 300730 [details]
addressbook-factory stack trace

addressbook-factory stack trace
Comment 7 Brian J. Murrell 2015-04-01 10:39:18 UTC
(In reply to Milan Crha from comment #5)
> 
> I can try to create a debugging patch, which would print some
> hopefully-useful information on the console, and create a scratch build for
> Fedora 21, if you'd be willing to give it a try.

Most certainly!
Comment 8 Milan Crha 2015-04-01 13:20:15 UTC
*** Bug 747009 has been marked as a duplicate of this bug. ***
Comment 9 Milan Crha 2015-04-02 09:58:12 UTC
Thanks for the update. The address book factory is idle, which might mean that the response from it is hidden somewhere. I'll try to cook the debug patches and provide packages for Fedora 21 with them (I will most likely touch both evolution-data-server and evolution).
Comment 10 Milan Crha 2015-04-02 11:06:39 UTC
Here [1] is the scratch build. It's the evolution only, for now. Please download it, install it and run it. It'll print many things into the console. Capture it by usual means, like:

   $ evolution &>log.txt

It logs only pointers, source (book/calendar/...) names and some other hopefully useful information, but nothing really private, I hope.

Thanks in advance.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=9398945
Comment 11 Brian J. Murrell 2015-04-02 11:46:34 UTC
Hrm.  It is strange.  My system is not seeing the koji build you did as upgrading my existing version:

Examining /var/tmp/yum-root-vZb5zL/evolution-3.12.11-1.1.fc21.x86_64.rpm: evolution-3.12.11-1.1.fc21.x86_64
/var/tmp/yum-root-vZb5zL/evolution-3.12.11-1.1.fc21.x86_64.rpm: does not update installed package.
Error: Nothing to do

and rpm even doesn't see it as upgrading the existing evolution installation:

	file /usr/share/locale/zh_TW/LC_MESSAGES/evolution-3.12.mo from install of evolution-3.12.11-1.1.fc21.x86_64 conflicts with file from package evolution-3.12.11-1.01.fc21.x86_64

But my installed evolution looks fine to me:

# rpm -qi evolution
Name        : evolution
Version     : 3.12.11
Release     : 1.01.fc21
Architecture: x86_64

I've never seen RPM this confused before.  Certainly I can force the install/upgrade of the koji build but the above is just slightly concerning.
Comment 12 Milan Crha 2015-04-02 15:27:24 UTC
it will be due to the version. If you downgrade first, then it will work. By the way, what does that "1.01" contain?
Comment 13 Brian J. Murrell 2015-04-02 19:22:49 UTC
(In reply to Milan Crha from comment #12)
> it will be due to the version.

Why is that?  Both packages have the same name.

> If you downgrade first, then it will work.

If the problem were that RPM thinks that your package is older than mine it does not complain of file conflicts but says that a newer package is already installed.

RPM is not understanding that your and my packages are both "evolution" for some strange reason.

> By
> the way, what does that "1.01" contain?

The patch to allow vfolders to be expunged.

Can you attach your debug patch to this bug?
Comment 14 Milan Crha 2015-04-07 06:35:28 UTC
Created attachment 301046 [details]
evo-gn746276-01.patch

The debug patch for evolution.

To answer the rest, it was only my first idea of the things involved. No doubt, I can be wrong.
Comment 15 Milan Crha 2015-04-07 09:55:59 UTC
*** Bug 747271 has been marked as a duplicate of this bug. ***
Comment 16 Milan Crha 2015-04-07 16:29:06 UTC
Created attachment 301083 [details]
evo-gn746276-02.patch

Updated debug patch, which may have the issue addressed. The problem is thread unsafety of the EAsynClosure (or the way it is used here). Bad timing made the EAsyncClosure finish before it was run, thus the call to run didn't know it's actually over.
Comment 17 Milan Crha 2015-04-08 08:13:34 UTC
Just a note, a patch for evolution-data-server from [1] would be a better choice, adding thread safety on each usage of the EAsyncClosure, not only to this exact place in evolution.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1207661#c5
Comment 18 Milan Crha 2015-04-14 18:42:00 UTC
Thanks to Brian I was able to investigate this further and it led me to more findings and related circumstances. The EAsyncClosure thread safety is needed:

Created commit_d22de17 in eds master (3.17.1+) [1]
Created commit_7473c39 in eds gnome-3-16 (3.16.2+)

Here [2] is a test build of evolution-data-server for Fedora 21 with the change applied.

I made a similar change in evolution itself too:

Created commit 07a0565 in evo master (3.17.1+)
Created commit 13d5477 in evo gnome-3-16 (3.16.2+)

[1] https://git.gnome.org/browse/evolution-data-server/commit/?id=d22de17
[2] http://koji.fedoraproject.org/koji/taskinfo?taskID=9478633
Comment 19 Milan Crha 2015-04-20 10:54:39 UTC
*** Bug 747647 has been marked as a duplicate of this bug. ***
Comment 20 Milan Crha 2015-04-23 10:24:07 UTC
*** Bug 748290 has been marked as a duplicate of this bug. ***
Comment 21 Milan Crha 2015-07-15 15:50:39 UTC
*** Bug 752353 has been marked as a duplicate of this bug. ***
Comment 22 Milan Crha 2015-07-15 15:51:36 UTC
I'm reopening this. All the above changes are still wrong, as can be reproduced with steps from bug #752353.
Comment 23 Milan Crha 2015-07-15 16:04:47 UTC
Let's avoid the e_async_closure() in this function completely, for a price of a semi-duplicated code. It helps a lot, thus it worth it.

Created commit a6aa24f in evo master (3.17.4+)
Created commit edd9a7f in evo gnome-3-16 (3.16.5+)
Comment 24 Milan Crha 2015-08-06 13:26:06 UTC
*** Bug 753308 has been marked as a duplicate of this bug. ***
Comment 25 Milan Crha 2016-05-16 14:22:11 UTC
*** Bug 766378 has been marked as a duplicate of this bug. ***