GNOME Bugzilla – Bug 753874
[review] fix races in manager/prop_filter()
Last modified: 2015-08-21 15:32:58 UTC
please review
th/prop-filter-bgo753874
> manager: don't invoke non-thread-safe operations during prop_filter() >+ reply = g_dbus_message_new_method_error (pfd->message, >+ NM_PERM_DENIED_ERROR, >+ (error_message = "Object doesn't exist.")); NM_PERM_DENIED_ERROR isn't the right error here. It should be 'org.freedesktop.DBus.Error.UnknownObject'. But alternatively, you could just add a mutex to NMBusManager so that you can make the call from the filter thread, and then just let gdbus return the error itself if the object isn't there. > manager: check for object type in do_set_property_check() (Likewise, the error here should be 'org.freedesktop.DBus.Error.InvalidArgs') > manager: fix race subscribing prop_filter() I think this is a bug in g_dbus_connection_remove_filter(); it ought to be guaranteed that if you remove a filter while it's being run, that the GDestroyNotify doesn't run until after the filter finishes running. And this patch does not fix the problem; if you remove the filter after on_worker_message_received() has copied the filter list but before it invokes our filter, then the user_data will get g_slice_free()ed, and so the g_weak_ref_get() in prop_filter() will read freed memory.
(In reply to Dan Winship from comment #2) > > manager: don't invoke non-thread-safe operations during prop_filter() > > >+ reply = g_dbus_message_new_method_error (pfd->message, > >+ NM_PERM_DENIED_ERROR, > >+ (error_message = "Object doesn't exist.")); > > NM_PERM_DENIED_ERROR isn't the right error here. It should be > 'org.freedesktop.DBus.Error.UnknownObject'. But alternatively, you could > just add a mutex to NMBusManager so that you can make the call from the > filter thread, and then just let gdbus return the error itself if the object > isn't there. Yes, we could make NMBusManager thread-safe. But I think this is simpler, isn't it? We have very few functions that are used multi-threaded (prop_filter() being one of them). It seems simpler to keep the thread-safe API as small as possible. Error fixed. > > manager: check for object type in do_set_property_check() > > (Likewise, the error here should be 'org.freedesktop.DBus.Error.InvalidArgs') fixed. > > manager: fix race subscribing prop_filter() > > I think this is a bug in g_dbus_connection_remove_filter(); it ought to be > guaranteed that if you remove a filter while it's being run, that the > GDestroyNotify doesn't run until after the filter finishes running. > > And this patch does not fix the problem; if you remove the filter after > on_worker_message_received() has copied the filter list but before it > invokes our filter, then the user_data will get g_slice_free()ed, and so the > g_weak_ref_get() in prop_filter() will read freed memory. Oh right. This is a race and indeed a bug in g_dbus_connection_remove_filter() because you cannot unsubscribe safely. I added a semi-workaround for that issue. I didn't see that issue. The patch was for another race: even if g_dbus_connection_remove_filter() would delay calling the destroy-function until all filters ran, it could still invoke the filter with a disposed/disposing/post-finalized NMManager instance. Repushed. How about now?
(In reply to Thomas Haller from comment #3) > (In reply to Dan Winship from comment #2) > > > manager: fix race subscribing prop_filter() > > > > I think this is a bug in g_dbus_connection_remove_filter(); it ought to be > > guaranteed that if you remove a filter while it's being run, that the > > GDestroyNotify doesn't run until after the filter finishes running. FTR: it's bug 704568.
(In reply to Thomas Haller from comment #3) > I didn't see that issue. The patch was for another race: even if > g_dbus_connection_remove_filter() would delay calling the destroy-function > until all filters ran, it could still invoke the filter with a > disposed/disposing/post-finalized NMManager instance. Ah, right. Though unlikely, the new code could still hit g_dbus_connection_remove_filter()'s race condition though: (worker thread) on_worker_message_received() filters[n] = connection->filters[n] (main thread) g_dbus_connection_remove_filter() _set_prop_filter_free() g_idle_add (_set_prop_filter_free2) ... _set_prop_filter_free2() g_slice_free() message = filters[n].func (...) g_weak_ref_get() So maybe it's not worth trying to work around this while the bug still exists in glib, and you should just drop the last fixup. Everything else looks good.
merged as http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d46a2c1af396a5d5b9cd95865f36544a8282076c