Bug 595480 – Since adding MAPI, evolution-data-server crashes, 100% CPU usage, or OOM kills.

Summary:	Since adding MAPI, evolution-data-server crashes, 100% CPU usage, or OOM kills.


Status:	RESOLVED FIXED

Product:	evolution-mapi
Classification:	Applications
Component:	Contacts (Addressbook)
Version:	0.27.x
Hardware:	Other Linux

Importance:	Normal critical
Target Milestone:	---
Assigned To:	evolution-mapi-maint
QA Contact:	evolution-mapi-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-09-17 15:33 UTC by Nemo
Modified:	2009-11-23 05:51 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
proposed ema patch (for a crash) (1.37 KB, patch) 2009-09-17 16:28 UTC, Milan Crha, committed
proposed ema patch (partial) (4.09 KB, patch) 2009-10-12 10:49 UTC, Milan Crha, committed

Description Nemo 2009-09-17 15:33:08 UTC

Using 0.27.92 plus a few mcrha patches for message headers.
Yesterday, my CPU usage maxed out for a long time due to EDS. Connecting to the EDS process showed this activity every time I connected.

+ Trace 217645

Comment 1 Nemo 2009-09-17 15:34:12 UTC

I'd like to note that the trace parsing didn't realise I had two separate traces in there.  Please expand Trace 217645 above - there is quite a bit more description of the bug in it.

Comment 2 Nemo 2009-09-17 15:46:34 UTC

strings ~/.evolution/addressbook/local/system/addressbook.db | grep 'EMAIL' | sort | uniq | wc -l

yields 75 entries, all of which I recognise.

So I'm not sure what's up with EDS now, and also where all those unrecognised addresses came from.

I'm going to disable MAPI and retest.

Comment 3 Nemo 2009-09-17 15:49:04 UTC

With mapi disabled, EDS works perfectly, personal address list appears to be just fine.  GAL also works perfectly if I switch to that (using OWA).

Note that above I had *not* switched to GAL in the address book dialog.

I do have duplicate addresses in my personal address book that appeared sometime after using mapi that I will have to clean up, but it does not seem serious.

Comment 4 Milan Crha 2009-09-17 16:23:10 UTC

Aha, I see I'm wrong, I thought the build_cache function is missing e_file_cache_freeze_changes/e_file_cache_thaw_changes, but I see it's there.
So the CPU usage is something else.

For the crash itself, two threads are trying to use the global_mapi_session pointer, which might result in this. I added locking there (attaching patch now).
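
For illustration, the kind of locking described here can be sketched as follows. This is a minimal sketch using plain pthreads, not the attached patch: `Session`, `session_swap`, `session_with` and `session_touch` are hypothetical names standing in for the real global_mapi_session handling.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the real session type. */
typedef struct {
    int uses;   /* counts how often the session was used */
} Session;

/* Shared pointer plus the mutex that guards every access to it. */
static Session *global_session = NULL;
static pthread_mutex_t session_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish a new session (or NULL) and return the previous one. */
Session *session_swap(Session *next)
{
    pthread_mutex_lock(&session_lock);
    Session *prev = global_session;
    global_session = next;
    pthread_mutex_unlock(&session_lock);
    return prev;
}

/* Run fn on the current session while holding the lock.
 * Returns -1 when no session is set, otherwise fn's result. */
int session_with(int (*fn)(Session *, void *), void *data)
{
    pthread_mutex_lock(&session_lock);
    int rc = global_session ? fn(global_session, data) : -1;
    pthread_mutex_unlock(&session_lock);
    return rc;
}

/* Example callback: record one use and report the running total. */
int session_touch(Session *s, void *data)
{
    (void) data;
    return ++s->uses;
}
```

Because the pointer is only read or replaced while the mutex is held, one thread cannot tear the session down while another is still in the middle of using it.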

Comment 5 Milan Crha 2009-09-17 16:28:49 UTC

Created attachment 143369
proposed ema patch (for a crash)

for evolution-mapi;

Please try with this patch. Should help with the crash itself.

Comment 6 Nemo 2009-09-17 19:11:57 UTC

Yep. The crash is fixed and the address book works again. However, EDS still periodically hits 100% CPU usage while sucking up every scrap of memory, until it is OOM-killed or I run --force-shutdown.

Comment 7 Milan Crha 2009-09-18 07:09:41 UTC

Could you take a few snapshots (backtraces) while EDS is stuck, please?

When it happens, run something like:
  $ gdb --batch --ex "t a a bt" -pid=PID &>/tmp/bt.log

where PID is the process ID of the running (and CPU-chewing) evolution-data-server.
Say around 3 snapshots at different times? Maybe I overlooked something. Also, do you know how many contacts are stored in your GAL? We can probably figure that out later.

Comment 8 Nemo 2009-09-18 14:06:57 UTC

The first trace that I posted (the one I had posted in IRC) was from repeatedly connecting to the EDS process and doing a backtrace.

Each time, the same trace was displayed, the polling loop and this e_book_backend_cache_add_contact thing.

As for the number of people in the GAL.
No idea. Thousands? Maybe tens of thousands?

Comment 9 Nemo 2009-09-18 15:31:02 UTC

FWIW, I tried your snapshot suggestion.
for i in `seq 10`;do gdb --batch --ex "t a a bt" -pid=$(ps auwx | grep evolution-data-server | grep -v grep | awk '{print $2}') &>/tmp/eds${i}.log;sleep 2;done

I fired this off the instant my CPU, network and memory started skyrocketing.
It wasn't too hard to trigger, I just started a new mail and fiddled w/ autocomplete of addresses, personal vs GAL address lists and such until it happened.  (it happens even without my doing this unfortunately, so MAPI is fairly unreliable right now).
I did --force-shutdown after, to beat the OOM killer to the punch.

All 10 traces were identical and just looked like polling. That is unlike the earlier session where I connected, suspended and resumed by hand, and every single time it was in e_book_backend_cache_add_contact.

Also, the "9644: No such file or directory" message seems odd. I'm pretty sure my way of fetching the PID above is reasonable, and I wouldn't have had much time to act after it went wild.

9644: No such file or directory.
[Thread debugging using libthread_db enabled]
[New Thread 0xb66dcb70 (LWP 8263)]
[New Thread 0xb7f20b70 (LWP 8224)]
0x00fb7422 in __kernel_vsyscall ()

+ Trace 217669

Comment 10 Nemo 2009-09-18 18:14:24 UTC

EDS just started spazzing out again.
This time no network activity and memory usage was relatively stable.
10678 nemo      20   0  972m 864m 9268 S 87.1 43.0   3:21.20 evolution-data-    

The traces were also more varied, and they included similar content as my first trace.

This full trace was the first one.  I'll go over the others to see if they were doing anything at all different.

[Thread debugging using libthread_db enabled]
[New Thread 0xb771eb70 (LWP 10963)]
[New Thread 0xb6f1db70 (LWP 10895)]
[New Thread 0xb2cfeb70 (LWP 10809)]
[New Thread 0xb04f9b70 (LWP 10808)]
[New Thread 0xb0cfab70 (LWP 10807)]
[New Thread 0xb3ef0b70 (LWP 10803)]
[New Thread 0xb5f1bb70 (LWP 10719)]
[New Thread 0xb7f60b70 (LWP 10679)]
0x00404422 in __kernel_vsyscall ()

+ Trace 217674

Thread 3 (Thread 0xb6f1db70 (LWP 10895))

#0 find_object_in_hash
at e-file-cache.c line 306
#1 foreach_hash_func
at e-xml-hash-utils.c line 298
#2 IA__g_hash_table_foreach
at /build/buildd/glib2.0-2.21.6/glib/ghash.c line 1211
#3 e_xmlhash_foreach_key
at e-xml-hash-utils.c line 319
#4 e_file_cache_get_object
at e-file-cache.c line 326
#5 e_file_cache_add_object
#6 e_book_backend_cache_add_contact
at e-book-backend-cache.c line 290
#7 build_cache
at e-book-backend-mapi-gal.c line 166
#8 g_thread_create_proxy
at /build/buildd/glib2.0-2.21.6/glib/gthread.c line 635
#9 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#10 clone
from /lib/tls/i686/cmov/libc.so.6

Thread 2 (Thread 0xb771eb70 (LWP 10963))

#0 strcmp
from /lib/tls/i686/cmov/libc.so.6
#1 find_object_in_hash
at e-file-cache.c line 302
#2 foreach_hash_func
at e-xml-hash-utils.c line 298
#3 IA__g_hash_table_foreach
at /build/buildd/glib2.0-2.21.6/glib/ghash.c line 1211
#4 e_xmlhash_foreach_key
at e-xml-hash-utils.c line 319
#5 e_file_cache_get_object
at e-file-cache.c line 326
#6 e_file_cache_add_object
#7 e_file_cache_replace_object
#8 e_book_backend_cache_add_contact
at e-book-backend-cache.c line 288
#9 build_cache
at e-book-backend-mapi-gal.c line 166
#10 g_thread_create_proxy
at /build/buildd/glib2.0-2.21.6/glib/gthread.c line 635
#11 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#12 clone
from /lib/tls/i686/cmov/libc.so.6

Comment 11 Nemo 2009-09-18 18:21:57 UTC

Most of the other traces were indeed essentially the same, apart from fetching other entries from our huge GAL.

A couple of the differences:
One thread on one of the logs was in:

+ Trace 217675

Thread 3 (Thread 0xb6f1db70 (LWP 10895))

#0 IA__g_unicode_canonical_ordering
at /build/buildd/glib2.0-2.21.6/glib/gunidecomp.c line 90
#1 _g_utf8_normalize_wc
at /build/buildd/glib2.0-2.21.6/glib/gunidecomp.c line 426
#2 IA__g_utf8_collate
at /build/buildd/glib2.0-2.21.6/glib/gunicollate.c line 106
#3 e_name_western_get_one_prefix_at_str
at e-name-western.c line 159
#4 e_name_western_get_prefix_at_str
at e-name-western.c line 193
#5 e_name_western_extract_prefix
#6 e_name_western_parse
#7 e_contact_name_from_string
#8 fn_setter
at e-contact.c line 554
#9 e_contact_set_property
at e-contact.c line 1036
#10 object_set_property
at /build/buildd/glib2.0-2.21.6/gobject/gobject.c line 942
#11 IA__g_object_set_valist
at /build/buildd/glib2.0-2.21.6/gobject/gobject.c line 1431
#12 IA__g_object_set
at /build/buildd/glib2.0-2.21.6/gobject/gobject.c line 1537
#13 e_contact_set
at e-contact.c line 1644
#14 build_cache
at e-book-backend-mapi-gal.c line 163
#15 g_thread_create_proxy
at /build/buildd/glib2.0-2.21.6/glib/gthread.c line 635
#16 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#17 clone
from /lib/tls/i686/cmov/libc.so.6


All the rest were essentially the same with minor address differences.

Comment 12 Milan Crha 2009-09-21 11:05:06 UTC

Thanks for the update. I guess comment #10 has a clue. Two threads are trying to build a local cache of GAL contacts simultaneously. When one thread finishes the operation, it unlocks the cache (the EFileCache doesn't use "recursive" locking), so any further operation runs on an unfrozen cache and saves every change directly to the file.
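
To make the freeze/thaw problem concrete, here is a minimal sketch of a counted ("recursive") freeze, assuming a toy cache model. It is not the EFileCache implementation: `Cache`, `cache_freeze`, `cache_thaw` and `cache_add` are hypothetical, and a real version would also need a lock around the counters.

```c
/* Hypothetical cache model: while frozen, changes are only buffered. */
typedef struct {
    int freeze_count;   /* outstanding freeze calls */
    int pending;        /* changes buffered while frozen */
    int disk_writes;    /* number of times the file was written */
} Cache;

/* Each builder calls this before it starts adding objects. */
void cache_freeze(Cache *c)
{
    c->freeze_count++;
}

/* Only the last thaw flushes the buffered changes in one write. */
void cache_thaw(Cache *c)
{
    if (c->freeze_count == 0)
        return;                 /* unbalanced thaw: nothing to do */
    if (--c->freeze_count == 0 && c->pending > 0) {
        c->disk_writes++;       /* one write for the whole batch */
        c->pending = 0;
    }
}

/* Adding an object writes the file immediately unless frozen. */
void cache_add(Cache *c)
{
    if (c->freeze_count > 0)
        c->pending++;
    else
        c->disk_writes++;
}
```

With a plain on/off flag, the first builder's thaw would leave the second builder writing the file on every add. With a counter, the cache stays frozen until both builders have thawed.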

Bharath, you wrote this part recently, what is your opinion on it?

Comment 13 Milan Crha 2009-09-23 12:18:08 UTC

Comment on attachment 143369
proposed ema patch (for a crash)

Created commit 056b109 in ema master (0.29.1+)
Created commit 31252e2 in ema gnome-2-28 (0.28.1+)

The part of the build_cache thread still applies.

Comment 14 Bharath Acharya 2009-09-24 04:06:22 UTC

Thanks Milan. I'll take a look at the thread issues.

Comment 15 Milan Crha 2009-10-12 10:49:48 UTC

Created attachment 145274
proposed ema patch (partial)

for evolution-mapi;

This is an interim, partial solution to the CPU usage issue, but the whole cache system in ema seems to need a little rewrite. Note that the GAL will probably not work, as this is just a don't-chew-my-CPU hot-fix. Please give it a try, maybe together with some GAL testing. Thanks in advance.

Comment 16 Chenthill P 2009-10-28 06:46:57 UTC

Comment on attachment 145274
proposed ema patch (partial)

I feel the build_thread needs to be locked in authenticate_user, doesn't it?

Comment 17 Chenthill P 2009-10-28 08:58:22 UTC

The whole cache in GAL or in ema? It would be good if you could be more specific.

Comment 18 Milan Crha 2009-10-29 11:54:54 UTC

(In reply to comment #16)
> I feel the build_thread needs to be locked in authenticate_user, doesn't it?

What do you mean by "locked"? The thread is created only in the authenticate_user function, and that function skips creating the thread when one already exists.
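
If authenticate_user could ever be entered by two threads at once, the "is there a thread already?" check and the thread creation would have to happen under one lock. A minimal sketch of that idea follows, assuming pthreads. `Backend`, `backend_claim_build` and `backend_release_build` are hypothetical names, not the ema code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-backend state; the real backend structure differs. */
typedef struct {
    pthread_mutex_t lock;
    bool build_running;   /* true while a build thread owns the cache */
} Backend;

void backend_init(Backend *b)
{
    pthread_mutex_init(&b->lock, NULL);
    b->build_running = false;
}

/* Check and set in one critical section. Returns true only for the
 * single caller that is allowed to create the build thread. */
bool backend_claim_build(Backend *b)
{
    pthread_mutex_lock(&b->lock);
    bool claimed = !b->build_running;
    b->build_running = true;
    pthread_mutex_unlock(&b->lock);
    return claimed;
}

/* Called by the build thread as its very last step. */
void backend_release_build(Backend *b)
{
    pthread_mutex_lock(&b->lock);
    b->build_running = false;
    pthread_mutex_unlock(&b->lock);
}
```

A caller would create the build thread only when `backend_claim_build()` returns true, so two concurrent callers can never both start one.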

(In reply to comment #17)
> The whole cache in GAL or in ema? It would be good if you could be more specific.

I'm not sure what exactly I meant. It might be for ema's GAL only. By the way, the book_view_thread function is not using the cache at all; it always fetches all contacts from the GAL. The proper way might be to use the cache or search for them, rather than reading the whole GAL.

Comment 19 Akhil Laddha 2009-11-02 08:05:56 UTC

see bug 598769 also

Comment 20 Chenthill P 2009-11-02 11:27:36 UTC

authenticate_user can be called by multiple clients/threads at the same time, can't it?

Bharath will be looking into the second part.

Comment 21 Milan Crha 2009-11-02 18:14:46 UTC

(In reply to comment #20)
> authenticate_user can be called by multiple clients/threads at the same time,
> can't it?

There should always be one backend per server/uri, so technically this shouldn't happen, as long as the build_cache thread is stopped properly. What I guess happened to Nemo is that the build thread was kept running while one backend stopped, and the second started to update the cache, but the first build_cache thread just finished, thus unlocked the cache, and then the mess began. (I hope it makes sense.)
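
One way to make sure a build thread cannot outlive its backend is a cancel flag plus a join before the backend goes away. A minimal sketch, assuming pthreads and C11 atomics: `Builder` and its functions are hypothetical, and the loop body merely stands in for caching one contact.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical handle for one cache-building thread. */
typedef struct {
    pthread_t thread;
    atomic_bool cancel;   /* set by the owner to request a stop */
    bool running;         /* touched only by the owning thread */
    long total;           /* number of contacts to process */
    long processed;       /* written by the worker, read after join */
} Builder;

/* Worker: checks the cancel flag before every item. */
static void *builder_main(void *arg)
{
    Builder *b = arg;
    while (b->processed < b->total && !atomic_load(&b->cancel))
        b->processed++;   /* stand-in for caching one contact */
    return NULL;
}

/* Start the worker; returns false if the thread could not be created. */
bool builder_start(Builder *b, long total)
{
    atomic_init(&b->cancel, false);
    b->total = total;
    b->processed = 0;
    b->running = pthread_create(&b->thread, NULL, builder_main, b) == 0;
    return b->running;
}

/* Request a stop and wait until the worker has really finished.
 * Safe to call more than once. */
void builder_stop(Builder *b)
{
    if (!b->running)
        return;
    atomic_store(&b->cancel, true);
    pthread_join(b->thread, NULL);
    b->running = false;
}
```

Calling `builder_stop()` before a backend is stopped would guarantee that no old build thread is left to thaw the cache underneath a newer one.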

Comment 22 Bharath Acharya 2009-11-16 05:20:01 UTC

Committed to master and gnome-2-28 branch
http://bit.ly/IRYbT

I will look at delta fetching for GAL in libmapi. But with this fix and Milan's cache locking, things should be better.

Comment 23 Bharath Acharya 2009-11-23 05:51:03 UTC

Comment on attachment 145274
proposed ema patch (partial)

Committed to master
a05fc8481bfd48c941efb19a3a592bc30f297577

Committed to gnome-2-28 branch
http://git.gnome.org/cgit/evolution-mapi/commit/?h=gnome-2-28&id=598f318109f44b8649269c100f51f98fc554d3c4

Comment 24 Bharath Acharya 2009-11-23 05:51:47 UTC

Closing the report