GNOME Bugzilla – Bug 788368
Race condition in GDbusObjectManagerClient
Last modified: 2017-10-11 15:36:51 UTC
We see frequent crashes of nmcli in the GLib code when the object manager client is initialized. For example: Thread 1: #0 in _int_malloc (av=av@entry=0x7f8470000020, bytes=bytes@entry=128) at malloc.c:3782 #1 in __GI___libc_malloc (bytes=128) at malloc.c:2897 #2 in g_realloc (mem=0x0, n_bytes=128) at gmem.c:159 #3 in g_ptr_array_maybe_expand (array=array@entry=0x7f847800b4e0, len=len@entry=1) at garray.c:1118 #4 in g_ptr_array_add (array=0x7f847800b4e0, data=0x7f847000aec0) at garray.c:1382 #5 in g_main_context_check (context=context@entry=0x7f847000ad60, max_priority=0, fds=fds@entry=0x7f847000afb0, n_fds=n_fds@entry=1) at gmain.c:3817 #6 in g_main_context_iterate (context=0x7f847000ad60, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3924 #7 in g_main_loop_run (loop=0x7f847000ac60) at gmain.c:4123 #8 in initable_init (initable=0x7f8470009a50, cancellable=0x0, error=0x7f8486763a98) at gdbusproxy.c:1949 #9 in g_initable_new_valist (object_type=<optimized out>, first_property_name=first_property_name@entry=0x7f8490ccaf90 "g-connection", var_args=var_args@entry=0x7f8486763910, cancellable=cancellable@entry=0x0, error=error@entry=0x7f8486763a98) at ginitable.c:228 #10 in g_initable_new (object_type=object_type@entry=94165753751728, cancellable=cancellable@entry=0x0, error=error@entry=0x7f8486763a98, first_property_name=first_property_name@entry=0x7f8490ccaf90 "g-connection") at ginitable.c:146 #11 in add_interfaces (manager=manager@entry=0x55a4ac4e3460 [GDBusObjectManagerClient], object_path=0x7f847c06d5e0 "/org/freedesktop/NetworkManager/IP4Config/75", ifaces_and_properties=<optimized out>, name_owner=name_owner@entry=0x7f8478019310 ":1.1680") at gdbusobjectmanagerclient.c:1495 #12 in process_get_all_result (manager=0x55a4ac4e3460 [GDBusObjectManagerClient], value=<optimized out>, name_owner=0x7f8478019310 ":1.1680") at gdbusobjectmanagerclient.c:1647 #13 in initable_init (initable=0x55a4ac4e3460, cancellable=0x0, error=0x7f8486763d20) at gdbusobjectmanagerclient.c:1400 #14 in async_init_thread (task=0x55a4ac4e9840 [GTask], source_object=<optimized out>, task_data=<optimized out>, cancellable=<optimized out>) at gasyncinitable.c:257 ... Thread 2: #0 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 in __GI_abort () at abort.c:90 #2 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f848ffd4608 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 in _int_free (ar_ptr=0x7f8478000020, ptr=<optimized out>, str=0x7f848ffd46a0 "free(): invalid next size (normal)", action=3) at malloc.c:5023 #4 in _int_free (av=0x7f8478000020, p=<optimized out>, have_lock=0) at malloc.c:3845 #5 in g_free (mem=0x7f847801ca30) at gmem.c:189 #6 in g_hash_table_resize (hash_table=hash_table@entry=0x55a4ac4cb580 = {...}) at ghash.c:619 #7 in g_hash_table_insert_node (hash_table=0x55a4ac4cb580 = {...}) at ghash.c:645 #8 in g_hash_table_insert_node (hash_table=hash_table@entry=0x55a4ac4cb580 = {...}, node_index=node_index@entry=4, key_hash=key_hash@entry=3982537675, new_key=new_key@entry=0x55a4ac4e97c0, new_value=new_value@entry=0x55a4ac4e2160, keep_new_key=keep_new_key@entry=0, reusing_key=reusing_key@entry=0) at ghash.c:978 #9 in g_hash_table_insert_internal (hash_table=0x55a4ac4cb580 = {...}, key=0x55a4ac4e97c0, value=0x55a4ac4e2160, keep_new_key=0) at ghash.c:1229 #10 in add_interfaces (manager=manager@entry=0x55a4ac4e3460 [GDBusObjectManagerClient], object_path=0x55a4ac4e9640 "/org/freedesktop/NetworkManager/IP6Config/85", ifaces_and_properties=<optimized out>, name_owner=0x7f8478019310 ":1.1680") at gdbusobjectmanagerclient.c:1558 #11 in on_control_proxy_g_signal (proxy=<optimized out>, sender_name=<optimized out>, signal_name=0x55a4ac4e7bb0 "InterfacesAdded", parameters=0x7f8478009070, user_data=0x55a4ac4e3460) at gdbusobjectmanagerclient.c:1672 #12 in ffi_call_unix64 () at ../src/x86/unix64.S:76 #13 in ffi_call (cif=cif@entry=0x7ffd3f42aea0, fn=<optimized out>, rvalue=0x7ffd3f42ae10, avalue=avalue@entry=0x7ffd3f42ad90) at ../src/x86/ffi64.c:522 #18 in <emit signal ??? on instance 0x7f8478004400 [GDBusProxy]> (instance=instance@entry=0x7f8478004400, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3447 #14 in g_cclosure_marshal_generic (closure=0x7f8478019f70, return_gvalue=0x0, n_param_values=<optimized out>, param_values=<optimized out>, invocation_hint=<optimized out>, marshal_data=0x0) at gclosure.c:1490 #15 in g_closure_invoke (closure=0x7f8478019f70, return_value=return_value@entry=0x0, n_param_values=4, param_values=param_values@entry=0x7ffd3f42b0a0, invocation_hint=invocation_hint@entry=0x7ffd3f42b040) at gclosure.c:804 #16 in signal_emit_unlocked_R (node=node@entry=0x7f8478018290, detail=detail@entry=0, instance=instance@entry=0x7f8478004400, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7ffd3f42b0a0) at gsignal.c:3635 #17 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7ffd3f42b260) at gsignal.c:3391 #19 in on_signal_received (connection=<optimized out>, sender_name=0x7f847c0721f0 ":1.1680", object_path=<optimized out>, interface_name=<optimized out>, signal_name=0x7f847c0722a0 "InterfacesAdded", parameters=0x7f8478009070, user_data=0x7f8478015d60) at gdbusproxy.c:917 #20 in emit_signal_instance_in_idle_cb (data=data@entry=0x7f847c074950) at gdbusconnection.c:3705 #21 in g_idle_dispatch (source=0x7f847c04d0d0, callback=0x7f8490c4a740 <emit_signal_instance_in_idle_cb>, user_data=0x7f847c074950) at gmain.c:5543 ... It seems to me that the call to: g_hash_table_insert (manager->priv->map_object_path_to_object_proxy, ...) in add_interfaces() in not protected by locks and can race with other operations on the hash table in other threads.
Created attachment 360695 [details] [review] [PATCH] gio: fix race condition in GDBusObjectManagerClient
Review of attachment 360695 [details] [review]: Some searching in the history led me to https://git.gnome.org/browse/glib/commit/?id=4e7ef619ebe025bd97d2a3ef8dd8c1b7baa903b2 and from the looks of it, indeed this was just an oversight.
Review of attachment 360695 [details] [review]: I’d prefer it if the g_hash_table_insert() call were moved higher up in the function, inside the existing critical section. Introducing a second critical section to add_interfaces() seems like it has more potential to cause problems.
Created attachment 360761 [details] [review] [PATCH v2] gio: fix race condition in GDBusObjectManagerClient (In reply to Philip Withnall from comment #3) > I’d prefer it if the g_hash_table_insert() call were moved higher up in the > function, inside the existing critical section. Introducing a second > critical section to add_interfaces() seems like it has more potential to > cause problems. Makes sense, patch updated.
Review of attachment 360761 [details] [review]: That looks good to me. Does it fix the nmcli problem you were seeing?
(In reply to Philip Withnall from comment #5) > That looks good to me. Does it fix the nmcli problem you were seeing? We have a test suite [1] that runs nightly on internal infrastructure and in September we got 10 crash reports that look related to this issue. Now tests are running with the patched version; I have to wait some days to confirm that the fix is working. [1] https://github.com/NetworkManager/NetworkManager-ci
Ah, excellent. Let’s wait a week or two and see if any failures appear there before pushing to master. Thanks a lot, that CI stuff is very helpful.
Beniamino, have any problems showed up in the CI testing over the last week?
(In reply to Philip Withnall from comment #8) > Beniamino, have any problems showed up in the CI testing over the last week? No, I haven't seen any problem since we switched to the patched version.
(In reply to Beniamino Galvani from comment #9) > (In reply to Philip Withnall from comment #8) > > Beniamino, have any problems showed up in the CI testing over the last week? > > No, I haven't seen any problem since we switched to the patched version. OK, I’ll push it then. Thanks a lot for the patch and for testing. :-)