GNOME Bugzilla – Bug 793727
Use-after-free in emit_network_changed() of gnetworkmonitorbase.c:361
Last modified: 2018-04-11 14:47:47 UTC
I've got the below use-after-free when testing some changes in evolution-data-server (unrelated changes to gio, as far as I can tell). The associated runtime warning was: (evolution-addressbook-factory:3136): GLib-GObject-CRITICAL **: g_object_ref: assertion 'object->ref_count > 0' failed and it crashed, because I had set: G_DEBUG=fatal-warnings
+ Trace 238423
Thread 1 (Thread 0x7f29f8abbac0 (LWP 3136))
Created attachment 368782 [details] gnm-crash.c This is a reproducer. It turns out that calling g_network_monitor_get_default() can cause race condition when called from a dedicated thread, not from the main thread (or better from the thread where the current main loop runs). The first line of the file contains a command line how to compile and run the test. As it's a race condition, it doesn't always trigger, thus the 'for' in execution.
Created attachment 368783 [details] [review] patch with debug prints This is a patch which helps to see what it does in the background. I thought that simple if (g_source_is_destroyed (g_main_current_source ())) return; would do the trick, but no. With this applied, I see sequences like: 0x18f40b0 g_network_monitor_base_init: 0x7f3948009f00 0x18f40b0 g_network_monitor_base_finalize: 0x7f3948009f00 0x18f3e20 emit_network_changed: already destroyed 0x7f3948009f00 0x18f40b0 g_network_monitor_base_init: 0x7f394800ad90 which proves that the added check in emit_network_changed() does a good thing, but it's still not enough, because I also see sequence: 0x25bd0b0 g_network_monitor_base_init: 0x7f313c009ee0 0x25bd0b0 g_network_monitor_base_finalize: 0x7f313c009ee0 (process:25465): GLib-GObject-CRITICAL **: g_object_ref: assertion 'object->ref_count > 0' failed 0x25bd0b0 g_network_monitor_base_init: 0x7f313c00add0 Segmentation fault (core dumped) Note it's g_network_monitor_get_default() creating an instance, then immediately killing it, then creating another instance again.
Created attachment 368789 [details] [review] proposed patch Instead of relying on whatever main context to deliver the on-idle to unset the 'initializing' boolean of the Base interface, rather chain up in the GInitable::init() of the descendants and unset the 'initializing' there.
Review of attachment 368789 [details] [review]: This looks reasonable to me. Can you split out the changes to get the classes to chain up their initable_init() calls please? And g_win32_network_monitor_initable_init() in gwin32networkmonitor.c needs to get the same change. ::: gio/gnetworkmonitorbase.c @@ -83,2 @@ monitor->priv->initializing = TRUE; - queue_network_changed (monitor); I think you still need something here: the queue_network_changed() call was presumably there to ensure that monitor->priv->is_available was initialised to the right value. That still needs to be done. ::: gio/gnetworkmonitorportal.c @@ +184,3 @@ g_network_monitor_portal_initable_iface_init (GInitableIface *iface) { + initable_parent_iface = g_type_interface_peek_parent (iface); Just put the g_type_interface_peek_parent() call in initable_init().
Review of attachment 368789 [details] [review]: A unit test which is an adapted version of your reproducer (that’s useful, thanks) would be good too. It could do the loop in main(), and it doesn’t matter if it doesn’t always trigger the conditions for a crash. Over many CI cycles, any remaining bugs will be caught.
(In reply to Philip Withnall from comment #4) > Can you split out the changes to get the > classes to chain up their initable_init() calls please? Why? One thing doesn't work without the other. It makes more sense to do it at once, from my point of view. > And g_win32_network_monitor_initable_init() in gwin32networkmonitor.c needs > to get the same change. I see, that's a very new thing, it wasn't part of the sources when I created the patch (even I created it on 2.54 sources, then my master branch checkout didn't have it too, probably because I didn't update it for some time). > ::: gio/gnetworkmonitorbase.c > @@ -83,2 @@ > monitor->priv->initializing = TRUE; > - queue_network_changed (monitor); > > I think you still need something here: the queue_network_changed() call was > presumably there to ensure that monitor->priv->is_available was initialised > to the right value. That still needs to be done. Let's see. This is in g_network_monitor_base_init(), which is called before any descendant. It means that both monitor->priv->have_ipv4_default_route and monitor->priv->have_ipv6_default_route are FALSE, the same as monitor->priv->is_available, thus the only purpose of the queue_network_changed() in the g_network_monitor_base_init() is to schedule the idle callback, but that causes the race condition. The descendants update these values through g_network_monitor_base_add_network() and g_network_monitor_base_remove_network(), when necessary. > ::: gio/gnetworkmonitorportal.c > @@ +184,3 @@ > g_network_monitor_portal_initable_iface_init (GInitableIface *iface) > { > + initable_parent_iface = g_type_interface_peek_parent (iface); > > Just put the g_type_interface_peek_parent() call in initable_init(). Do you mean to not store it as a global variable, but always gather it for every single instance of the Base descendant in the initable_init() in a way like this? return g_type_interface_peek_parent (G_INITABLE_GET_IFACE (initable))-> init (initable, cancellable, error); I'm fine with that, just the change makes things behave conceptually differently from the current expectations of G_DEFINE_TYPE macro family, which provide a global variable for parent structures (and do not repeat the lookup of the parent structure for each instance). (In reply to Philip Withnall from comment #5) > ... It could do the loop in main()... I'm afraid not. As the call is made with g_network_monitor_get_default() then once it returns something (and doesn't crash) it will always return the same instance till the end of the program. I do not know whether there's any legitimate way to free such default instance other than closing the application. I mean, I can workaround the crash by calling g_network_monitor_get_default() in the main(), but I hesitate to add any such workaround into the sources, because it doesn't scale long-term and for others.
Created attachment 369377 [details] [review] proposed patch ][ This is updated patch. The only change is that I covered also the win32 network monitor. I gave explanation for other concerns above. I tried with the: > Just put the g_type_interface_peek_parent() call in initable_init(). but it causes significant CPU usage with the above test program, while using the approach as it was had no CPU usage impact.
As I committed the change for bug #793031 and this is not in the sources, then I added a workaround for this bug into evolution-data-server in a hope that we'll be able to have this in glib sources by 2.58.0: https://git.gnome.org/browse/evolution-data-server/commit/?id=1bdbb4cd2
ping Philip, could we get to some resolution for this bug report, before the bugzilla gets into the read-only state, please?
Review of attachment 369377 [details] [review]: OK, this looks good, but I would still like a unit test. (In reply to Milan Crha from comment #6) > (In reply to Philip Withnall from comment #5) > > ... It could do the loop in main()... > > I'm afraid not. As the call is made with g_network_monitor_get_default() > then once it returns something (and doesn't crash) it will always return the > same instance till the end of the program. I do not know whether there's any > legitimate way to free such default instance other than closing the > application. I mean, I can workaround the crash by calling > g_network_monitor_get_default() in the main(), but I hesitate to add any > such workaround into the sources, because it doesn't scale long-term and for > others. Use g_test_subprocess() then.
Created attachment 370736 [details] [review] proposed patch ]I[ The same as before, only with added installable test. The 333 runs is rather an artificial number. I've got a crash at 9th run, but also after 200 runs. Thus when you run the test on unpatched GLib, then if you do not get the test failure it doesn't mean it's not there. Similarly with patched GLib, but due to the fix removing the crash condition it's unlikely to receive the crash due to the previous conditions. Please, in case I made anything wrong with respect of naming or such, you've my permission to change it just before committing to the sources. I really do not want to delay this even more due to some nitpicks or habits I'm not aware of in the glib code base. I also do not expect me doing the commit, for the similar reasons. Thanks for understanding.
Review of attachment 370736 [details] [review]: I’ll fix these problems up before pushing. ::: tests/Makefile.am @@ +95,3 @@ slice_color_SOURCES = slice-color.c memchunks.c slice_threadinit_LDADD = $(top_builddir)/gthread/libgthread-2.0.la $(LDADD) +network_monitor_test_LDADD = $(top_builddir)/gio/libgio-2.0.la $(LDADD) Needs to be added to tests/meson.build too. ::: tests/network-monitor-test.c @@ +50,3 @@ + +static void +test_network_monitor (void) This needs a comment explaining the purpose of the test, plus a `g_test_bug ("793727")` to point people back to the bug. @@ +52,3 @@ +test_network_monitor (void) +{ + gint ii; guint
git-formatted patches with a full commit message would be helpful in future, too. The test should be in gio/tests/, not tests/ — GNetworkMonitor is part of GIO, not GLib.
Modified and pushed to master as ca0add4b8a59313517a482a6664ec5ad37182c49.
(In reply to Philip Withnall from comment #12) > I’ll fix these problems up before pushing. Thanks. > Needs to be added to tests/meson.build too. It didn't look similarly as the Makefile.am, thus I rather didn't touch it. > guint Not a big deal in this case, just a matter of taste and habits :) (Just for consistency, I'm marking the patch from 'committed' to 'reviewed', because you didn't commit it as is, you made some build & code changes in it.)
> 138/194 network-monitor-race TIMEOUT 30.04 s Maybe lower the MAX_RUNS, 333*100ms means the test will run at least 33.3s, or change the timeout for this test, whichever is easier to do.
commit a90c578952219e740f24d9f2560f54f19ae7e906 (HEAD -> master, origin/master, origin/HEAD) Author: Philip Withnall <withnall@endlessm.com> Date: Wed Apr 11 15:45:10 2018 +0100 tests: Lower number of iterations in network-monitor-race While 333 runs is very likely to reproduce the bug, Milan has previously reproduced it with as few as 9 runs. Since this test will be run by the CI machinery quite often, a lower number of runs each CI run will still probably catch any regressions over time. This reduces the total test runtime from 33s to 2s. https://bugzilla.gnome.org/show_bug.cgi?id=793727 Signed-off-by: Philip Withnall <withnall@endlessm.com> Reviewed-by: nobody gio/tests/network-monitor-race.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)