GNOME Bugzilla – Bug 628139
Thread-safety issues in libical time zone loading
Last modified: 2013-09-14 16:53:54 UTC
Or maybe in libical ==6588== Invalid read of size 1 ==6588== at 0x4A06A87: memcpy (mc_replace_strmem.c:497) ==6588== by 0x3EFB4274F3: icalvalue_set_x (icalderivedvalue.c:871) ==6588== by 0x3EFB42757D: icalvalue_new_x (icalderivedvalue.c:862) ==6588== by 0x3EFB420BF5: icalproperty_set_x (icalderivedproperty.c:2781) ==6588== by 0x3EFB420C4D: icalproperty_new_x (icalderivedproperty.c:2773) ==6588== by 0x3EFB437908: icaltzutil_fetch_timezone (icaltz-util.c:378) ==6588== by 0x3EFB438C8C: icaltimezone_load_builtin_timezone (icaltimezone.c:1801) ==6588== by 0x3EFB438CD4: icaltimezone_get_component (icaltimezone.c:1243) ==6588== by 0x6D871BA: e_cal_set_default_timezone (e-cal.c:3968) ==6588== by 0x1110316D: e_memo_shell_sidebar_add_source (e-memo-shell-sidebar.c:937) ==6588== by 0x11103F8C: memo_shell_sidebar_row_changed_cb (e-memo-shell-sidebar.c:370) ==6588== by 0x817250D: g_closure_invoke (gclosure.c:766) ==6588== Address 0x8eb496a is 10 bytes inside a block of size 14 free'd ==6588== at 0x4A04D72: free (vg_replace_malloc.c:325) ==6588== by 0x3EFB438B9B: icaltimezone_get_vtimezone_properties (icaltimezone.c:321) ==6588== by 0x3EFB438CD4: icaltimezone_get_component (icaltimezone.c:1243) ==6588== by 0x6D871BA: e_cal_set_default_timezone (e-cal.c:3968) ==6588== by 0xF8F8F65: load_cal_source_thread (authentication.c:229) ==6588== by 0x7EB1BAB: run_in_thread (gsimpleasyncresult.c:783) ==6588== by 0x7EA4695: io_job_thread (gioscheduler.c:181) ==6588== by 0x8825EC3: g_thread_pool_thread_proxy (gthreadpool.c:314) ==6588== by 0x8823745: g_thread_create_proxy (gthread.c:1897) ==6588== by 0x359B007760: start_thread (pthread_create.c:301) ==6588== by 0x359A8E14EC: clone (clone.S:115)
Looks like this is coming from libical. IIUC, libical provides a fixed set of all possible timezone objects. We don't create them or free them ourselves.
Perhaps related:
+ Trace 223591
The original valgrind trace persists after I update to libical 0.46, fwiw.
It's a race. We're calling icaltimezone_get_component() simultaneously from multiple threads. And that function will *create* the component for us, for built-in time zones. Twice, if we get the timing right, because of the if (!zone->component) icaltimezone_load_builtin_timezone(zone->location); construct in icaltimezone_get_component(). And then, just for added fun, it *frees* the location we'd set for ourselves in the first place and replaces it with the location string from the component... which it had looked up from the location string we'd given it. This bug (at least the valgrind trace; I haven't yet confirmed that the crash is really the same thing) occurs when one thread has freed zone->location to replace it with an identical string, while another is in the process of looking up a builtin timezone based on it. Not entirely sure how to fix it, except for a global lock around icaltimezone_load_builtin_timezone(), along the lines of.. if (!zone->component) { lock(); if (!zone->component) icaltimezone_load_builtin_timezone(zone->location); unlock(); } ... or perhaps if libical doesn't have threading and locking functions, a blunter approach in Evolution, with locking around *every* call to icaltimezone_get_component() ?
Would it help to just load all possible builtin timezone data up front before the application starts? Perhaps in modules/calendar/e-cal-shell-backend.c? Not sure if that would be too much of a memory hog, but seems like it would eliminate the load-on-demand race. Looks like we could call icaltimezone_get_builtin_timezones() to get an array of all builtin icaltimezone structs, then call icaltimezone_get_component() on each of them.
Created attachment 170538 [details] [review] libical locking patch Or we could add the necessary locking in libical...?
That patch is broken -- a caller could think that icaltimezone_load_builtin_timezone() has already happened, when in fact it's still in progress but just happens to already have set zone->component. We need the locking on *every* check.
*** Bug 628304 has been marked as a duplicate of this bug. ***
Do you think bug 621169 can be a duplicate ?
*** Bug 621169 has been marked as a duplicate of this bug. ***
https://bugzilla.gnome.org/show_bug.cgi?id=621169#c1 has valgrind traces as well.
Created attachment 171013 [details] [review] better patch This is a saner patch. Rather than trying to be clever and avoid 'unnecessary' locking when the component is already set, I'm shushing the kernel hacker voices in my head and taking the simpler approachi which actually gets it right -- although it means a lock on every call path which needs to check that the component is set. If we wanted to work around this in Evolution rather than fixing it in libical (which might make sense), then we'd need to find *every* call path which could end up in icaltimezone_load_builtin_timezone() and add our own locking around it.
I've worked around the problem as described in comment #5. Seems to work -- I haven't been able to trigger the crash after starting Evolution in calendar view about 20 times now, whereas before, the crash occured on startup about 30% of the time for me. This is just an interim solution until David's patch lands in libical and a new release trickles down to distros. Committed to master for 2.91.2 in: http://git.gnome.org/browse/evolution/commit/?id=855ae5eed1eeae0cfe271a5a47d6fb8abf21d691 I can't remember if this issue is present in 2.32 or if it's just 2.91. It should apply cleanly to gnome-2-32 if needed.
Yup, it's for 2.32.0 too. Created commit 7510623 in evo gnome-2-32 (2.32.1+)
*** Bug 634024 has been marked as a duplicate of this bug. ***