After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 628139 - Thread-safety issues in libical time zone loading
Thread-safety issues in libical time zone loading
Status: RESOLVED FIXED
Product: evolution-data-server
Classification: Platform
Component: Calendar
2.32.x (obsolete)
Other Linux
: High critical
: ---
Assigned To: evolution-calendar-maintainers
Evolution QA team
: 621169 628304 634024 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2010-08-27 15:28 UTC by David Woodhouse
Modified: 2013-09-14 16:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
libical locking patch (969 bytes, patch)
2010-09-18 13:43 UTC, David Woodhouse
none Details | Review
better patch (2.21 KB, patch)
2010-09-24 10:25 UTC, David Woodhouse
none Details | Review

Description David Woodhouse 2010-08-27 15:28:23 UTC
Or maybe in libical

==6588== Invalid read of size 1
==6588==    at 0x4A06A87: memcpy (mc_replace_strmem.c:497)
==6588==    by 0x3EFB4274F3: icalvalue_set_x (icalderivedvalue.c:871)
==6588==    by 0x3EFB42757D: icalvalue_new_x (icalderivedvalue.c:862)
==6588==    by 0x3EFB420BF5: icalproperty_set_x (icalderivedproperty.c:2781)
==6588==    by 0x3EFB420C4D: icalproperty_new_x (icalderivedproperty.c:2773)
==6588==    by 0x3EFB437908: icaltzutil_fetch_timezone (icaltz-util.c:378)
==6588==    by 0x3EFB438C8C: icaltimezone_load_builtin_timezone (icaltimezone.c:1801)
==6588==    by 0x3EFB438CD4: icaltimezone_get_component (icaltimezone.c:1243)
==6588==    by 0x6D871BA: e_cal_set_default_timezone (e-cal.c:3968)
==6588==    by 0x1110316D: e_memo_shell_sidebar_add_source (e-memo-shell-sidebar.c:937)
==6588==    by 0x11103F8C: memo_shell_sidebar_row_changed_cb (e-memo-shell-sidebar.c:370)
==6588==    by 0x817250D: g_closure_invoke (gclosure.c:766)
==6588==  Address 0x8eb496a is 10 bytes inside a block of size 14 free'd
==6588==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==6588==    by 0x3EFB438B9B: icaltimezone_get_vtimezone_properties (icaltimezone.c:321)
==6588==    by 0x3EFB438CD4: icaltimezone_get_component (icaltimezone.c:1243)
==6588==    by 0x6D871BA: e_cal_set_default_timezone (e-cal.c:3968)
==6588==    by 0xF8F8F65: load_cal_source_thread (authentication.c:229)
==6588==    by 0x7EB1BAB: run_in_thread (gsimpleasyncresult.c:783)
==6588==    by 0x7EA4695: io_job_thread (gioscheduler.c:181)
==6588==    by 0x8825EC3: g_thread_pool_thread_proxy (gthreadpool.c:314)
==6588==    by 0x8823745: g_thread_create_proxy (gthread.c:1897)
==6588==    by 0x359B007760: start_thread (pthread_create.c:301)
==6588==    by 0x359A8E14EC: clone (clone.S:115)
Comment 1 Matthew Barnes 2010-08-27 17:22:08 UTC
Looks like this is coming from libical.  IIUC, libical provides a fixed set of all possible timezone objects.  We don't create them or free them ourselves.
Comment 2 David Woodhouse 2010-09-08 10:18:04 UTC
Perhaps related:
  • #0 raise
    at ../nptl/sysdeps/unix/sysv/linux/raise.c line 64
  • #1 abort
    at abort.c line 92
  • #2 __libc_message
    at ../sysdeps/unix/sysv/linux/libc_fatal.c line 186
  • #3 malloc_printerr
  • #4 icaltimezone_get_vtimezone_properties
    at icaltimezone.c line 321
  • #5 icaltimezone_get_component
    at icaltimezone.c line 1243
  • #6 e_cal_set_default_timezone
    at e-cal.c line 4017
  • #7 e_memo_shell_sidebar_add_source
    at e-memo-shell-sidebar.c line 937
  • #8 memo_shell_sidebar_row_changed_cb
    at e-memo-shell-sidebar.c line 370
  • #9 g_closure_invoke
    at gclosure.c line 766

Comment 3 David Woodhouse 2010-09-18 11:01:39 UTC
The original valgrind trace persists after I update to libical 0.46, fwiw.
Comment 4 David Woodhouse 2010-09-18 12:24:35 UTC
It's a race. We're calling icaltimezone_get_component() simultaneously from multiple threads. And that function will *create* the component for us, for built-in time zones. Twice, if we get the timing right, because of the
  if (!zone->component) 
    icaltimezone_load_builtin_timezone(zone->location);
construct in icaltimezone_get_component().

And then, just for added fun, it *frees* the location we'd set for ourselves in the first place and replaces it with the location string from the component... which it had looked up from the location string we'd given it.

This bug (at least the valgrind trace; I haven't yet confirmed that the crash is really the same thing) occurs when one thread has freed zone->location to replace it with an identical string, while another is in the process of looking up a builtin timezone based on it.

Not entirely sure how to fix it, except for a global lock around icaltimezone_load_builtin_timezone(), along the lines of..

 if (!zone->component) {
    lock();
    if (!zone->component)
      icaltimezone_load_builtin_timezone(zone->location);
    unlock();
 }

... or perhaps if libical doesn't have threading and locking functions, a blunter approach in Evolution, with locking around *every* call to icaltimezone_get_component() ?
Comment 5 Matthew Barnes 2010-09-18 13:37:10 UTC
Would it help to just load all possible builtin timezone data up front before the application starts?  Perhaps in modules/calendar/e-cal-shell-backend.c?  Not sure if that would be too much of a memory hog, but seems like it would eliminate the load-on-demand race.

Looks like we could call icaltimezone_get_builtin_timezones() to get an array of all builtin icaltimezone structs, then call icaltimezone_get_component() on each of them.
Comment 6 David Woodhouse 2010-09-18 13:43:48 UTC
Created attachment 170538 [details] [review]
libical locking patch

Or we could add the necessary locking in libical...?
Comment 7 David Woodhouse 2010-09-18 17:36:06 UTC
That patch is broken -- a caller could think that icaltimezone_load_builtin_timezone() has already happened, when in fact it's still in progress but just happens to already have set zone->component. We need the locking on *every* check.
Comment 8 Akhil Laddha 2010-09-23 03:23:05 UTC
*** Bug 628304 has been marked as a duplicate of this bug. ***
Comment 9 Akhil Laddha 2010-09-24 08:03:31 UTC
Do you think bug 621169 can be a duplicate ?
Comment 10 Akhil Laddha 2010-09-24 09:41:03 UTC
*** Bug 621169 has been marked as a duplicate of this bug. ***
Comment 11 Akhil Laddha 2010-09-24 09:42:34 UTC
https://bugzilla.gnome.org/show_bug.cgi?id=621169#c1 has valgrind traces as well.
Comment 12 David Woodhouse 2010-09-24 10:25:01 UTC
Created attachment 171013 [details] [review]
better patch

This is a saner patch. Rather than trying to be clever and avoid 'unnecessary' locking when the component is already set, I'm shushing the kernel hacker voices in my head and taking the simpler approachi which actually gets it right -- although it means a lock on every call path which needs to check that the component is set.

If we wanted to work around this in Evolution rather than fixing it in libical (which might make sense), then we'd need to find *every* call path which could end up in icaltimezone_load_builtin_timezone() and add our own locking around it.
Comment 13 Matthew Barnes 2010-10-31 15:41:11 UTC
I've worked around the problem as described in comment #5.  Seems to work -- I haven't been able to trigger the crash after starting Evolution in calendar view about 20 times now, whereas before, the crash occured on startup about 30% of the time for me.

This is just an interim solution until David's patch lands in libical and a new release trickles down to distros.

Committed to master for 2.91.2 in:
http://git.gnome.org/browse/evolution/commit/?id=855ae5eed1eeae0cfe271a5a47d6fb8abf21d691

I can't remember if this issue is present in 2.32 or if it's just 2.91.  It should apply cleanly to gnome-2-32 if needed.
Comment 14 Milan Crha 2010-11-05 10:50:46 UTC
Yup, it's for 2.32.0 too.

Created commit 7510623 in evo gnome-2-32 (2.32.1+)
Comment 15 Milan Crha 2011-01-20 11:42:06 UTC
*** Bug 634024 has been marked as a duplicate of this bug. ***