GNOME Bugzilla – Bug 777307
race condition between gdbus signal callback and g_bus_unwatch_name - can access freed memory
Last modified: 2017-07-21 06:04:39 UTC
Created attachment 343528 [details] [review] Delegate g_free(client) from its context The on_name_owner_changed() can access the variable "client" in free memory, when another thread call g_bus_unwatch_name() asynchronously. The emit_signal_instance_in_idle_cb () checks signal_instance->connection->map_id_to_signal_data with connection lock, and executes real signal handler. In the case of gdbusnamewatching, g_bus_unwaching_name() call g_dbus_connection_signal_unsubscribe() and free client variable. ----------- scenario ------------- 1. [A thread] At emit_signal_instance_in_idle_cb, lock connection 2. [A thread] emit_signal_instance_in_idle_cb() check signal and set "has_subscription = TRUE;" 3. [A thread] unlock connection 4. [B thread] At g_bus_unwatch_name(), calls g_dbus_connection_signal_unsubscribe(). 5. [B thread] do "g_free (client);" 6. [A thread] At emit_signal_instance_in_idle_cb, do "signal_instance->callback". 7. [A thread] on_name_owner_changed is called, and access client variable. But client is already freed on 5 steps.
I believe there's a bug here, but what you wrote implies that something is accessing "client" when its refcount is zero, which I think we should fix instead somehow.
I guess the below is the same report. It's with 2.50.2, when running unit tests for evolution-data-server: > ==20828==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b000007ba0 at pc 0x7f4940354499 bp 0x7f492ba93d80 sp 0x7f492ba93d70 > READ of size 8 at 0x60b000007ba0 thread T1 > #0 0x7f4940354498 in on_connection_disconnected glib-2.50.2/gio/gdbusnamewatching.c:259 > #1 0x7f493cbd3c57 in ffi_call_unix64 (/lib64/libffi.so.6+0x5c57) > #2 0x7f493cbd36b9 in ffi_call (/lib64/libffi.so.6+0x56b9) > #3 0x7f493fa2880c in g_cclosure_marshal_generic glib-2.50.2/gobject/gclosure.c:1490 > #4 0x7f493fa24938 in g_closure_invoke glib-2.50.2/gobject/gclosure.c:804 > #5 0x7f493fa71348 in signal_emit_unlocked_R glib-2.50.2/gobject/gsignal.c:3635 > #6 0x7f493fa6f10b in g_signal_emit_valist glib-2.50.2/gobject/gsignal.c:3391 > #7 0x7f493fa6ff07 in g_signal_emit glib-2.50.2/gobject/gsignal.c:3447 > #8 0x7f49403255dd in emit_closed_in_idle glib-2.50.2/gio/gdbusconnection.c:1358 > #9 0x7f493f37f048 in g_idle_dispatch glib-2.50.2/glib/gmain.c:5545 > #10 0x7f493f375abc in g_main_dispatch glib-2.50.2/glib/gmain.c:3203 > #11 0x7f493f379f4c in g_main_context_dispatch glib-2.50.2/glib/gmain.c:3856 > #12 0x7f493f37a522 in g_main_context_iterate glib-2.50.2/glib/gmain.c:3929 > #13 0x7f493f37b074 in g_main_loop_run glib-2.50.2/glib/gmain.c:4125 > #14 0x7f493e983913 in source_registry_object_manager_thread evolution-data-server/src/libedataserver/e-source-registry.c:1167 > #15 0x7f493f3fb049 in g_thread_proxy glib-2.50.2/glib/gthread.c:784 > #16 0x7f49411506c9 in start_thread (/lib64/libpthread.so.0+0x76c9) > #17 0x7f493e15bf6e in clone (/lib64/libc.so.6+0x107f6e) > > 0x60b000007ba0 is located 80 bytes inside of 104-byte region [0x60b000007b50,0x60b000007bb8) > freed by thread T0 here: > #0 0x7f4942145af0 in free (/usr/lib64/libasan.so.3+0xc6af0) > #1 0x7f493f39047c in g_free glib-2.50.2/glib/gmem.c:189 > #2 0x7f49403536fe in client_unref glib-2.50.2/gio/gdbusnamewatching.c:108 > #3 0x7f4940356c7f in g_bus_unwatch_name glib-2.50.2/gio/gdbusnamewatching.c:845 > #4 0x404310 in main evolution-data-server/src/calendar/libedata-cal/evolution-calendar-factory-subprocess.c:220 > #5 0x7f493e074400 in __libc_start_main (/lib64/libc.so.6+0x20400) > > previously allocated by thread T0 here: > #0 0x7f4942146010 in calloc (/usr/lib64/libasan.so.3+0xc7010) > #1 0x7f493f390386 in g_malloc0 glib-2.50.2/glib/gmem.c:124 > #2 0x7f493f3906c6 in g_malloc0_n glib-2.50.2/glib/gmem.c:355 > #3 0x7f4940355858 in g_bus_watch_name glib-2.50.2/gio/gdbusnamewatching.c:563 > #4 0x40429f in main evolution-data-server/src/calendar/libedata-cal/evolution-calendar-factory-subprocess.c:198 > #5 0x7f493e074400 in __libc_start_main (/lib64/libc.so.6+0x20400) > > Thread T1 created by T0 here: > #0 0x7f49420b0488 in pthread_create (/usr/lib64/libasan.so.3+0x31488) > #1 0x7f493f46862b in g_system_thread_new glib-2.50.2/glib/gthread-posix.c:1170 > #2 0x7f493f3fb31f in g_thread_new_internal glib-2.50.2/glib/gthread.c:874 > #3 0x7f493f3fb179 in g_thread_new glib-2.50.2/glib/gthread.c:827 > #4 0x7f493e984e55 in source_registry_initable_init evolution-data-server/src/libedataserver/e-source-registry.c:1385 > #5 0x7f49401fd473 in g_initable_init glib-2.50.2/gio/ginitable.c:112 > #6 0x7f493e986201 in e_source_registry_new_sync evolution-data-server/src/libedataserver/e-source-registry.c:1767 > #7 0x7f493ee3d289 in subprocess_factory_initable_init evolution-data-server/src/libebackend/e-subprocess-factory.c:160 > #8 0x7f49401fd473 in g_initable_init glib-2.50.2/gio/ginitable.c:112 > #9 0x7f49401fd732 in g_initable_new_valist glib-2.50.2/gio/ginitable.c:228 > #10 0x7f49401fd5a4 in g_initable_new glib-2.50.2/gio/ginitable.c:146 > #11 0x7f4940caff74 in e_subprocess_cal_factory_new evolution-data-server/src/calendar/libedata-cal/e-subprocess-cal-factory.c:174 > #12 0x404241 in main evolution-data-server/src/calendar/libedata-cal/evolution-calendar-factory-subprocess.c:191 > #13 0x7f493e074400 in __libc_start_main (/lib64/libc.so.6+0x20400) > > SUMMARY: AddressSanitizer: heap-use-after-free glib-2.50.2/gio/gdbusnamewatching.c:259 in on_connection_disconnected
Created attachment 344842 [details] [review] gdbus: Fix atomic accesses to global name watch ID
This patch fixes an unrelated issue which I spotted while looking at gdbusnamewatching.c. The thread safety problem in this bug is not trivial to fix, mostly because calls to on_connection_disconnected() and on_name_owner_changed() could effectively happen in any thread (they are emitted in the main thread for the GDBusConnection, which could be different from the thread where g_bus_watch_name() was called). I think we need to either: - do all reads and writes of the Client struct in the stored Client->main_context (so they’re serialised by the main context); or - add a lock to the entire struct (so they’re serialised by the lock). We could fix the immediate problem here by changing on_connection_disconnected() to look up the Client struct from map_id_to_client, just like g_bus_unwatch_name() does. But that won’t fix the other problems in the code, where functions like call_vanished_handler(), which can be run in any thread, change state in the Client.
Review of attachment 344842 [details] [review]: Allison says: this one’s fine, obvs.
Comment on attachment 344842 [details] [review] gdbus: Fix atomic accesses to global name watch ID Attachment 344842 [details] pushed as c131865 - gdbus: Fix atomic accesses to global name watch ID
Created attachment 345006 [details] [review] workaround code
Review of attachment 345006 [details] [review]: I don’t think that introducing new public API is the right thing to do here. The two typical patterns for handling this kind of thing are: a) Hold a strong reference in the signal callback closure itself (i.e. call ref() on the pointer passed in as the user_data to g_dbus_connection_signal_subscribe()). This forces the user_data to stay alive until after the signal is explicitly disconnected. b) Hold a weak reference in the signal callback closure and atomically resolve this to a strong reference or NULL in the callback — i.e. grab a strong reference on the user_data if it still exists; otherwise bail out of the callback. This means the callback can be gracefully executed even after the object is destroyed. I think approach b) is the better approach here, as it means we don’t have to complicate the reference counting and finalisation code for Client by working out when to explicitly disconnect the signal handlers. We can implement weak references on the Client by atomically looking it up in the map_id_to_client hash table. Patch coming.
Created attachment 345030 [details] [review] gdbus: Fix race in name watching on connection teardown If g_dbus_unwatch_name() is called from one thread at the same time as the GDBusConnection is emitting ::disconnected in another thread, there will be a race and the handler for ::disconnected may end up using memory after it’s freed. Fix this by serialising through the map_id_to_client, so that on_connection_disconnected() atomically gets a strong reference to the Client, or NULL.
Review of attachment 345030 [details] [review]: Makes sense to me. ::: gio/gdbusnamewatching.c @@ +251,3 @@ +/* Return a reference to the #Client for @watcher_id, or %NULL if it’s been + * unwatched. This is safe to call from any thread. */ Personal pet peeve of mine: I prefer multiline comments to be lined up as /* bla * bla bla * bla */ Feel free to ignore
Thanks for the review. Pushed without changing the comment because I can’t be bothered and am on the fence about formatting. Noted it for future though. Attachment 345030 [details] pushed as 4dd1b17 - gdbus: Fix race in name watching on connection teardown
Downstream bug report with glib2-2.52.3: https://bugzilla.redhat.com/show_bug.cgi?id=1473339 You might consider adding the change to the stable branch as well, unless you think it's too risky for it and requires more testing.