After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 793727 - Use-after-free in emit_network_changed() of gnetworkmonitorbase.c:361
Use-after-free in emit_network_changed() of gnetworkmonitorbase.c:361
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
2.54.x
Other Linux
: Normal critical
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks: 793031
 
 
Reported: 2018-02-22 17:26 UTC by Milan Crha
Modified: 2018-04-11 14:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gnm-crash.c (845 bytes, text/plain)
2018-02-22 18:23 UTC, Milan Crha
  Details
patch with debug prints (1.26 KB, patch)
2018-02-22 18:29 UTC, Milan Crha
none Details | Review
proposed patch (4.27 KB, patch)
2018-02-22 19:03 UTC, Milan Crha
none Details | Review
proposed patch ][ (5.29 KB, patch)
2018-03-06 11:36 UTC, Milan Crha
none Details | Review
proposed patch ]I[ (8.04 KB, patch)
2018-04-10 15:27 UTC, Milan Crha
reviewed Details | Review

Description Milan Crha 2018-02-22 17:26:47 UTC
I've got the below use-after-free when testing some changes in evolution-data-server (unrelated changes to gio, as far as I can tell). The associated runtime warning was:

(evolution-addressbook-factory:3136): GLib-GObject-CRITICAL **: g_object_ref: assertion 'object->ref_count > 0' failed

and it crashed, because I had set: G_DEBUG=fatal-warnings

Thread 1 (Thread 0x7f29f8abbac0 (LWP 3136))

  • #0 waitpid
  • #1 do_system
  • #2 bugbuddy_segv_handle
    at gnome-segvhanlder.c line 180
  • #3 <signal handler called>
  • #4 _g_log_abort
    at gmessages.c line 554
  • #5 g_logv
    at gmessages.c line 1362
  • #6 g_log
    at gmessages.c line 1403
  • #7 g_return_if_fail_warning
    at gmessages.c line 2702
  • #8 g_object_ref
    at gobject.c line 3197
  • #9 emit_network_changed
    at gnetworkmonitorbase.c line 361
  • #10 g_idle_dispatch
    at gmain.c line 5486
  • #11 g_main_dispatch
    at gmain.c line 3142
  • #12 g_main_context_dispatch
    at gmain.c line 3795
  • #13 g_main_context_iterate
    at gmain.c line 3868
  • #14 g_main_loop_run
    at gmain.c line 4064
  • #15 dbus_server_run_server
    at .../evolution-data-server/src/libebackend/e-dbus-server.c line 247
  • #16 ffi_call_unix64
  • #17 ffi_call
  • #18 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #19 _g_closure_invoke_va
    at gclosure.c line 867
  • #20 g_signal_emit_valist
    at gsignal.c line 3300
  • #21 g_signal_emit
    at gsignal.c line 3447
  • #22 e_dbus_server_run
    at .../evolution-data-server/src/libebackend/e-dbus-server.c line 441
  • #23 main
    at ...evolution-data-server/src/services/evolution-addressbook-factory/evolution-addressbook-factory.c line 103

Comment 1 Milan Crha 2018-02-22 18:23:01 UTC
Created attachment 368782 [details]
gnm-crash.c

This is a reproducer. It turns out that calling g_network_monitor_get_default() can cause race condition when called from a dedicated thread, not from the main thread (or better from the thread where the current main loop runs).

The first line of the file contains a command line how to compile and run the test. As it's a race condition, it doesn't always trigger, thus the 'for' in execution.
Comment 2 Milan Crha 2018-02-22 18:29:25 UTC
Created attachment 368783 [details] [review]
patch with debug prints

This is a patch which helps to see what it does in the background. I thought that simple

   if (g_source_is_destroyed (g_main_current_source ()))
      return;

would do the trick, but no. With this applied, I see sequences like:

   0x18f40b0 g_network_monitor_base_init: 0x7f3948009f00
   0x18f40b0 g_network_monitor_base_finalize: 0x7f3948009f00
   0x18f3e20 emit_network_changed: already destroyed 0x7f3948009f00
   0x18f40b0 g_network_monitor_base_init: 0x7f394800ad90

which proves that the added check in emit_network_changed() does a good thing, but it's still not enough, because I also see sequence:

   0x25bd0b0 g_network_monitor_base_init: 0x7f313c009ee0
   0x25bd0b0 g_network_monitor_base_finalize: 0x7f313c009ee0

   (process:25465): GLib-GObject-CRITICAL **: g_object_ref: assertion
   'object->ref_count > 0' failed
   0x25bd0b0 g_network_monitor_base_init: 0x7f313c00add0
   Segmentation fault (core dumped)

Note it's g_network_monitor_get_default() creating an instance, then immediately killing it, then creating another instance again.
Comment 3 Milan Crha 2018-02-22 19:03:15 UTC
Created attachment 368789 [details] [review]
proposed patch

Instead of relying on whatever main context to deliver the on-idle to unset the 'initializing' boolean of the Base interface, rather chain up in the GInitable::init() of the descendants and unset the 'initializing' there.
Comment 4 Philip Withnall 2018-02-27 11:56:05 UTC
Review of attachment 368789 [details] [review]:

This looks reasonable to me. Can you split out the changes to get the classes to chain up their initable_init() calls please? And g_win32_network_monitor_initable_init() in gwin32networkmonitor.c needs to get the same change.

::: gio/gnetworkmonitorbase.c
@@ -83,2 @@
   monitor->priv->initializing = TRUE;
-  queue_network_changed (monitor);

I think you still need something here: the queue_network_changed() call was presumably there to ensure that monitor->priv->is_available was initialised to the right value. That still needs to be done.

::: gio/gnetworkmonitorportal.c
@@ +184,3 @@
 g_network_monitor_portal_initable_iface_init (GInitableIface *iface)
 {
+  initable_parent_iface = g_type_interface_peek_parent (iface);

Just put the g_type_interface_peek_parent() call in initable_init().
Comment 5 Philip Withnall 2018-02-27 11:57:36 UTC
Review of attachment 368789 [details] [review]:

A unit test which is an adapted version of your reproducer (that’s useful, thanks) would be good too. It could do the loop in main(), and it doesn’t matter if it doesn’t always trigger the conditions for a crash. Over many CI cycles, any remaining bugs will be caught.
Comment 6 Milan Crha 2018-02-27 15:21:42 UTC
(In reply to Philip Withnall from comment #4)
> Can you split out the changes to get the
> classes to chain up their initable_init() calls please?

Why? One thing doesn't work without the other. It makes more sense to do it at once, from my point of view.

> And g_win32_network_monitor_initable_init() in gwin32networkmonitor.c needs
> to get the same change.

I see, that's a very new thing, it wasn't part of the sources when I created the patch (even I created it on 2.54 sources, then my master branch checkout didn't have it too, probably because I didn't update it for some time).

> ::: gio/gnetworkmonitorbase.c
> @@ -83,2 @@
>    monitor->priv->initializing = TRUE;
> -  queue_network_changed (monitor);
> 
> I think you still need something here: the queue_network_changed() call was
> presumably there to ensure that monitor->priv->is_available was initialised
> to the right value. That still needs to be done.

Let's see. This is in g_network_monitor_base_init(), which is called before any descendant. It means that both monitor->priv->have_ipv4_default_route and
monitor->priv->have_ipv6_default_route are FALSE, the same as monitor->priv->is_available, thus the only purpose of the queue_network_changed() in the g_network_monitor_base_init() is to schedule the idle callback, but that causes the race condition. The descendants update these values through g_network_monitor_base_add_network() and g_network_monitor_base_remove_network(), when necessary.

> ::: gio/gnetworkmonitorportal.c
> @@ +184,3 @@
>  g_network_monitor_portal_initable_iface_init (GInitableIface *iface)
>  {
> +  initable_parent_iface = g_type_interface_peek_parent (iface);
> 
> Just put the g_type_interface_peek_parent() call in initable_init().

Do you mean to not store it as a global variable, but always gather it for every single instance of the Base descendant in the initable_init() in a way like this?

   return g_type_interface_peek_parent (G_INITABLE_GET_IFACE (initable))->
        init (initable, cancellable, error);

I'm fine with that, just the change makes things behave conceptually differently from the current expectations of G_DEFINE_TYPE macro family, which provide a global variable for parent structures (and do not repeat the lookup of the parent structure for each instance).

(In reply to Philip Withnall from comment #5)
> ... It could do the loop in main()...

I'm afraid not. As the call is made with g_network_monitor_get_default() then once it returns something (and doesn't crash) it will always return the same instance till the end of the program. I do not know whether there's any legitimate way to free such default instance other than closing the application. I mean, I can workaround the crash by calling g_network_monitor_get_default() in the main(), but I hesitate to add any such workaround into the sources, because it doesn't scale long-term and for others.
Comment 7 Milan Crha 2018-03-06 11:36:44 UTC
Created attachment 369377 [details] [review]
proposed patch ][

This is updated patch. The only change is that I covered also the win32 network monitor. I gave explanation for other concerns above. I tried with the:

> Just put the g_type_interface_peek_parent() call in initable_init().

but it causes significant CPU usage with the above test program, while using the approach as it was had no CPU usage impact.
Comment 8 Milan Crha 2018-03-12 15:12:22 UTC
As I committed the change for bug #793031 and this is not in the sources, then I added a workaround for this bug into evolution-data-server in a hope that we'll be able to have this in glib sources by 2.58.0:
https://git.gnome.org/browse/evolution-data-server/commit/?id=1bdbb4cd2
Comment 9 Milan Crha 2018-03-27 13:31:52 UTC
ping Philip, could we get to some resolution for this bug report, before the bugzilla gets into the read-only state, please?
Comment 10 Philip Withnall 2018-04-10 11:19:53 UTC
Review of attachment 369377 [details] [review]:

OK, this looks good, but I would still like a unit test.

(In reply to Milan Crha from comment #6)
> (In reply to Philip Withnall from comment #5)
> > ... It could do the loop in main()...
> 
> I'm afraid not. As the call is made with g_network_monitor_get_default()
> then once it returns something (and doesn't crash) it will always return the
> same instance till the end of the program. I do not know whether there's any
> legitimate way to free such default instance other than closing the
> application. I mean, I can workaround the crash by calling
> g_network_monitor_get_default() in the main(), but I hesitate to add any
> such workaround into the sources, because it doesn't scale long-term and for
> others.

Use g_test_subprocess() then.
Comment 11 Milan Crha 2018-04-10 15:27:17 UTC
Created attachment 370736 [details] [review]
proposed patch ]I[

The same as before, only with added installable test. The 333 runs is rather an artificial number. I've got a crash at 9th run, but also after 200 runs. Thus when you run the test on unpatched GLib, then if you do not get the test failure it doesn't mean it's not there. Similarly with patched GLib, but due to the fix removing the crash condition it's unlikely to receive the crash due to the previous conditions.

Please, in case I made anything wrong with respect of naming or such, you've my permission to change it just before committing to the sources. I really do not want to delay this even more due to some nitpicks or habits I'm not aware of in the glib code base. I also do not expect me doing the commit, for the similar reasons. Thanks for understanding.
Comment 12 Philip Withnall 2018-04-11 13:48:36 UTC
Review of attachment 370736 [details] [review]:

I’ll fix these problems up before pushing.

::: tests/Makefile.am
@@ +95,3 @@
 slice_color_SOURCES = slice-color.c memchunks.c
 slice_threadinit_LDADD = $(top_builddir)/gthread/libgthread-2.0.la $(LDADD)
+network_monitor_test_LDADD = $(top_builddir)/gio/libgio-2.0.la $(LDADD)

Needs to be added to tests/meson.build too.

::: tests/network-monitor-test.c
@@ +50,3 @@
+
+static void
+test_network_monitor (void)

This needs a comment explaining the purpose of the test, plus a `g_test_bug ("793727")` to point people back to the bug.

@@ +52,3 @@
+test_network_monitor (void)
+{
+  gint ii;

guint
Comment 13 Philip Withnall 2018-04-11 14:03:36 UTC
git-formatted patches with a full commit message would be helpful in future, too.

The test should be in gio/tests/, not tests/ — GNetworkMonitor is part of GIO, not GLib.
Comment 14 Philip Withnall 2018-04-11 14:04:53 UTC
Modified and pushed to master as ca0add4b8a59313517a482a6664ec5ad37182c49.
Comment 15 Milan Crha 2018-04-11 14:24:34 UTC
(In reply to Philip Withnall from comment #12)
> I’ll fix these problems up before pushing.

Thanks.

> Needs to be added to tests/meson.build too.

It didn't look similarly as the Makefile.am, thus I rather didn't touch it.

> guint

Not a big deal in this case, just a matter of taste and habits :)

(Just for consistency, I'm marking the patch from 'committed' to 'reviewed', because you didn't commit it as is, you made some build & code changes in it.)
Comment 16 Milan Crha 2018-04-11 14:29:26 UTC
> 138/194 network-monitor-race                    TIMEOUT 30.04 s

Maybe lower the MAX_RUNS, 333*100ms means the test will run at least 33.3s, or change the timeout for this test, whichever is easier to do.
Comment 17 Philip Withnall 2018-04-11 14:47:47 UTC
commit a90c578952219e740f24d9f2560f54f19ae7e906 (HEAD -> master, origin/master, origin/HEAD)
Author: Philip Withnall <withnall@endlessm.com>
Date:   Wed Apr 11 15:45:10 2018 +0100

    tests: Lower number of iterations in network-monitor-race
    
    While 333 runs is very likely to reproduce the bug, Milan has previously
    reproduced it with as few as 9 runs. Since this test will be run by the
    CI machinery quite often, a lower number of runs each CI run will still
    probably catch any regressions over time.
    
    This reduces the total test runtime from 33s to 2s.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=793727
    
    Signed-off-by: Philip Withnall <withnall@endlessm.com>
    Reviewed-by: nobody

 gio/tests/network-monitor-race.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)