After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 674473 - Crash when restarting NetworkManager
Crash when restarting NetworkManager
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: API
git master
Other Linux
: Normal critical
: ---
Assigned To: Dan Williams
Dan Williams
Depends on:
Blocks:
 
 
Reported: 2012-04-20 14:52 UTC by Cosimo Cecchi
Modified: 2012-04-23 22:51 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
NMClient: don't force dispose active connections when NM restarts (2.14 KB, patch)
2012-04-20 16:43 UTC, Giovanni Campagna
none Details | Review
NMActiveConnection: clear out devices on dispose (985 bytes, patch)
2012-04-20 16:44 UTC, Giovanni Campagna
reviewed Details | Review
NMObjectCache: don't reuse disposed objects (1.43 KB, patch)
2012-04-20 16:45 UTC, Giovanni Campagna
none Details | Review
NMClient: don't force dispose active connections when NM restarts (2.14 KB, patch)
2012-04-20 16:46 UTC, Giovanni Campagna
none Details | Review
libnm-glib: NULL out priv fields on dispose() (19.81 KB, patch)
2012-04-23 16:38 UTC, Dan Winship
committed Details | Review
libnm-glib: discard devices and active connections when NM goes down (3.92 KB, patch)
2012-04-23 16:39 UTC, Dan Winship
committed Details | Review
Clear object cache when NM stops (1.96 KB, patch)
2012-04-23 22:06 UTC, Dan Williams
committed Details | Review

Description Cosimo Cecchi 2012-04-20 14:52:43 UTC
I got this while updating NetworkManager; note that the daemon is restarted upon upgrade

Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal 11, Segmentation fault.

Thread 1 (Thread 0x7f3ccc89e9c0 (LWP 1000))

  • #0 g_type_check_instance_cast
    at gtype.c line 3994
  • #1 gjs_value_from_g_argument
    at gi/arg.c line 2468
  • #2 gjs_array_from_carray_internal
    at gi/arg.c line 1985
  • #3 gjs_array_from_boxed_array
    at gi/arg.c line 2078
  • #4 gjs_value_from_g_argument
    at gi/arg.c line 2543
  • #5 gjs_invoke_c_function
    at gi/function.c line 988
  • #6 function_call
    at gi/function.c line 1245
  • #7 CallJSNative
    at jscntxtinlines.h line 701
  • #8 js::Invoke
    at jsinterp.cpp line 696
  • #9 js::Interpret
    at jsinterp.cpp line 4810
  • #10 js::RunScript
    at jsinterp.cpp line 653
  • #11 js::Invoke
    at jsinterp.cpp line 740
  • #12 js_fun_apply
    at jsfun.cpp line 2205
  • #13 CallJSNative
    at jscntxtinlines.h line 701
  • #14 js::Interpret
    at jsinterp.cpp line 4799
  • #15 js::RunScript
    at jsinterp.cpp line 653
  • #16 js::Invoke
    at jsinterp.cpp line 740
  • #17 js_fun_apply
    at jsfun.cpp line 2205
  • #18 CallJSNative
    at jscntxtinlines.h line 701
  • #19 js::Interpret
    at jsinterp.cpp line 4799
  • #20 js::RunScript
    at jsinterp.cpp line 653
  • #21 js::Invoke
    at jsinterp.cpp line 740
  • #22 js_fun_apply
    at jsfun.cpp line 2205
  • #23 CallJSNative
    at jscntxtinlines.h line 701
  • #24 js::Interpret
    at jsinterp.cpp line 4799
  • #25 js::RunScript
    at jsinterp.cpp line 653
  • #26 js::Invoke
    at jsinterp.cpp line 740
  • #27 js::CallOrConstructBoundFunction
    at jsfun.cpp line 2319
  • #28 CallJSNative
    at jscntxtinlines.h line 701
  • #29 js::Invoke
    at jsinterp.cpp line 703
  • #30 js::ExternalInvoke
    at jsinterp.cpp line 863
  • #31 JS_CallFunctionValue
    at jsapi.cpp line 5145
  • #32 gjs_call_function_value
    at gjs/jsapi-util.c line 1183
  • #33 gjs_closure_invoke
    at gi/closure.c line 267
  • #34 closure_marshal
    at gi/value.c line 125
  • #35 g_closure_invoke
    at gclosure.c line 777
  • #36 signal_emit_unlocked_R
    at gsignal.c line 3547
  • #37 g_signal_emit_valist
    at gsignal.c line 3296
  • #38 g_signal_emit
    at gsignal.c line 3352
  • #39 g_object_dispatch_properties_changed
    at gobject.c line 1041
  • #40 g_object_notify_by_spec_internal
    at gobject.c line 1133
  • #41 g_object_notify
    at gobject.c line 1175
  • #42 deferred_notify_cb
    at nm-object.c line 393
  • #43 g_main_dispatch
    at gmain.c line 2515
  • #44 g_main_context_dispatch
    at gmain.c line 3052
  • #45 g_main_context_iterate
    at gmain.c line 3123
  • #46 g_main_loop_run
    at gmain.c line 3317
  • #47 meta_run
    at core/main.c line 555
  • #48 main
    at main.c line 333

Comment 1 Giovanni Campagna 2012-04-20 16:38:49 UTC
Found the bug (also reported as https://bugzilla.redhat.com/show_bug.cgi?id=806649), it is in libnm-glib, so reassigning.
Btw, I found multiple ways to fix it, each with different problems. Up to dcbw to decide.
Comment 2 Giovanni Campagna 2012-04-20 16:43:23 UTC
Created attachment 212438 [details] [review]
NMClient: don't force dispose active connections when NM restarts

Disposing of objects breaks expectations of API users (it removes
signal connections, for example), and may leave them with disposed
(and thus totally unusable) objects that won't go away until
finalized.

----

This is the simplest fix possible. It leaves a disposed NMActiveConnection
around, though, so we get criticals trying to disconnect signals afterwards,
and the object is kept in cache, despite not getting updates from DBus.
Comment 3 Giovanni Campagna 2012-04-20 16:44:29 UTC
Created attachment 212439 [details] [review]
NMActiveConnection: clear out devices on dispose

The devices property may still be accessed after dispose (since that
is forced by NMClient when NM is restarted), and not clearing causes
an invalid memory access and a segfault.

-----

This is the simplest fix possible. It leaves a disposed NMActiveConnection
around, though, so we get criticals trying to disconnect signals afterwards,
and the object is kept in cache, despite not getting updates from DBus.

-----

Obviously, this is the patch I meant.
Comment 4 Giovanni Campagna 2012-04-20 16:45:59 UTC
Created attachment 212440 [details] [review]
NMObjectCache: don't reuse disposed objects

If a object is disposed, it still exists in the cache until finalized
(since GDatalist members in GObject are cleared in finalize()), but
cannot be used. So use a weak ref instead, as those are cleared in
dispose().

------

This is another possibility. It ensures NMActiveConnection is out
of cache after disposal (so nm_active_connection_get_devices() is
back to work and the icon updates itself correctly), but apparently
doesn't prevent the object from being used for a short while (we still
get the criticals).
Comment 5 Giovanni Campagna 2012-04-20 16:46:58 UTC
Created attachment 212441 [details] [review]
NMClient: don't force dispose active connections when NM restarts

Disposing of objects breaks expectations of API users (it removes
signal connections, for example), and may leave them with disposed
(and thus totally unusable) objects that won't go away until
finalized.

------

This is the third, and most invasive, fix. It simply avoids disposing
the objects at all, leaving that to the GC or to manual memory management
from the user. It fixes all noticeable problems, including criticals,
but could cause leaks.
Comment 6 Dan Williams 2012-04-20 21:06:37 UTC
The problem with #3 is that NMDevice and NMActiveConnection both reference each other, so they'll never be freed if we don't break that.  The change to break circular ref and ensure that we correctly disposed things was made in 1138f136f8c9b126b5090906362db1c7519d39d7.  Before that change, objects weren't being disposed, and thus weren't being removed from the object cache, which caused duplicate devices to be returned in nm_client_get_devices() for a variety of reasons, and also caused crashes because the old object (which should have been disposed when NM quit) would have the same object path as a new object from the newly started NM.  I'm pretty sure I tried clearing out the object cache while developing 1138f136 and that had some issues too.

In reality the only cached top-level object is NMClient, and when the NMClient goes away everything else should go away as well.  Not quite sure how to fix this yet...
Comment 7 Dan Williams 2012-04-20 21:13:00 UTC
I think at least we need to ensure the cache is cleaned so that internally we don't get old objects.  In reality the cache is probably too simplistic with it's removal of objects when they're destroyed; that behavior fails if clients have refcounting problems and NM subsequently restarts.  So I guess we should re-investigate (a) making sure the ref cycle is broken between NMDevice and NMActiveConnection, and (b) dropping the entire cache on the floor when NM restarts.
Comment 8 Dan Winship 2012-04-23 14:39:09 UTC
Comment on attachment 212439 [details] [review]
NMActiveConnection: clear out devices on dispose

NULLing things on dispose is probably a good idea regardless of what else we do. We'd want to fix more than just that one specific field though.
Comment 9 Dan Winship 2012-04-23 16:38:43 UTC
Created attachment 212616 [details] [review]
libnm-glib: NULL out priv fields on dispose()

In some situations, objects might get used after being disposed, so
clear out their various priv fields so we don't try to access unreffed
objects, freed strings, etc.
Comment 10 Dan Winship 2012-04-23 16:39:33 UTC
Created attachment 212617 [details] [review]
libnm-glib: discard devices and active connections when NM goes down

When NMClient changes state to "not running", don't just unref all the
devices and connections: emit notify::active-connections and
device-removed signals too, so the app will drop its copies of them.

======

another approach, which I think should work...
Comment 11 Dan Williams 2012-04-23 17:02:32 UTC
Review of attachment 212616 [details] [review]:

Looks good.
Comment 12 Dan Williams 2012-04-23 17:04:11 UTC
Review of attachment 212617 [details] [review]:

Looks good too.
Comment 13 Dan Winship 2012-04-23 20:13:20 UTC
still needs another patch to empty out the NMObject cache
Comment 14 Dan Williams 2012-04-23 22:06:08 UTC
Created attachment 212636 [details] [review]
Clear object cache when NM stops
Comment 15 Dan Winship 2012-04-23 22:14:12 UTC
Comment on attachment 212636 [details] [review]
Clear object cache when NM stops

looks right
Comment 16 Dan Williams 2012-04-23 22:50:27 UTC
Review of attachment 212636 [details] [review]:

48981a6166208152890b6c57af19cc3c5db5837f
Comment 17 Dan Williams 2012-04-23 22:51:53 UTC
With latest git libnm-glib, I can restart NM all day and it doesn't appear to affect the shell or nm-applet.  I do get some clutter warnings in ~/.xsession-errors that might be related to the Shell indicator but I can't be certain since they aren't prefixed with a process ID or name.