GNOME Bugzilla – Bug 780861
Crash in GnomeWallClock
Last modified: 2017-09-25 23:56:11 UTC
Created attachment 349159 [details] [review] [PATCH] wall-clock: Use g_signal_connect_object Had some gnome-shell crashes that I think were caused by the on_schema_change handler getting called with a freed GnomeWallClock.
+ Trace 237316
The attached patch replaces g_signal_connect with g_signal_connect_object to ensure this cannot happen (barring multithreading). Also, in case it ever happens, add a guard to update_clock to catch the disposed-but-not-finalized case.
Review of attachment 349159 [details] [review]: I'm pretty sure this wouldn't fix the problem. The objects are correctly unref'ed in dispose, so the signals should be disconnected when the last reference is gone. The problem is probably that there's more than one reference being held. In your case, I wonder if the backtrace is correct. I think the problem might be update_clock()'s _gnome_datetime_source_new() which isn't disconnected if the trigger happens after dispose() but before finalize(). In any case, it would be worth creating a test application to try and reproduce the problem.
Full backtrace (done on GNOME 3.20.x): Program terminated with signal SIGABRT, Aborted.
+ Trace 237326
Thread 1 (Thread 0x7f6b895b4ac0 (LWP 1813))
I had two related crashes today: Program terminated with signal SIGSEGV, Segmentation fault.
+ Trace 237327
Created attachment 349300 [details] [review] wallclock: Don't update clock for non-clock settings changes
Comment on attachment 349300 [details] [review] wallclock: Don't update clock for non-clock settings changes Attachment 349300 [details] pushed as 2a74887 - wallclock: Don't update clock for non-clock settings changes
I don't understand how the finalized GSettings can still emit changed signals.
*** Bug 777661 has been marked as a duplicate of this bug. ***
Created attachment 349390 [details] [review] wallclock: Fix crash in update_clock() when destroyed By removing unneded separate dispose() function and something something something because I don't understand how that would make a difference. Did we receive the "changed" signal between dispose() and finalize()? But we already destroyed the GSettings in dispose()!
*** Bug 736775 has been marked as a duplicate of this bug. ***
Interestingly this is the same exact crash I had reported in bug 736775. I have a patch there too, but we couldn't figure out why it was needed.
My guess is that we end up with a race between the dispose and an idle signal that was popped from the GDBus thread. Would be nice to know if somebody has a better reproducer than "use gnome-shell from GNOME 3.20" (that was Frédéric's version, and he seems to be getting the crash very often). I never see this now, and I'm using: dconf-0.26.0-2.fc26.x86_64 glib2-2.52.0-1.fc26.x86_64 There weren't any changes in gnome-wall-clock.c that would (should?) have an incidence on the code in the past 3 years.
I applied the two patches from Bastien and things were stable until this morning, with a new crash: (gdb) bt
+ Trace 237356
gnome-wall-clock.c:341 is: 341 now = g_date_time_new_now (self->priv->timezone); (gdb) print self->priv $1 = (GnomeWallClockPrivate *) 0x9fa46d0 (gdb) print self->priv->timezone $2 = (GTimeZone *) 0x0
At this point, the question clearly is why the GSettings object still exists, and why it's still receiving signals pertaining to it. I've not been able to reproduce this problem locally.
Fwiw I see this (or something similar) on 3.24 all the time. Never spontaneous - happens on system resume/suspend.
Ok, I think I understand what's happening. TL;DR: it's not safe to assume GSettings instances are only owned by user code. When the crash happens, the GnomeWallClock instance is disposed and finalized but the GSettings instance has an extra reference added in a different thread with this stack trace:
+ Trace 237378
Reading through g_settings_backend_dispatch_signal() we can see that it calls g_main_context_invoke() with the closure that owns this extra reference which means that that reference is held until the loop in the main thread goes idle and by that time the "single" owner of the GSettings instance is already finalized. To get gnome-desktop fixed right now we can use g_connect_object() or something with the same effect but I also think this GSettings behavior should at least be documented (cc:ing desrt@desrt.ca)
Review of attachment 349390 [details] [review]: this doesn't fix it
Review of attachment 349390 [details] [review]: I'll clean this up though, we don't need to keep separate dispose and finalize.
Created attachment 350197 [details] [review] wallclock: Remove unneeded separate dispose() function
Attachment 350197 [details] pushed as 1329895 - wallclock: Remove unneeded separate dispose() function
In the meantime, would it be OK to commit https://bug736775.bugzilla-attachments.gnome.org/attachment.cgi?id=286333 from the duplicate bug 736775 as a workaround?
Looking at the code here... https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n299 And at this comment here... https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n204 I wonder if we really need to hold a reference during dispatch. If the GSettings instance is finalized while the dispatch closure is being invoked, we could as well add another weak reference to keep track of it and avoid calling back into the watch function. But perhaps I'm missing something? Allison, what do you think?
*** Bug 785277 has been marked as a duplicate of this bug. ***
*** Bug 786285 has been marked as a duplicate of this bug. ***
https://bugzilla.redhat.com/show_bug.cgi?id=1485231 Is this ^^ related issue?
BTW ABRT FAF migh be wrong, but this looks to be longstanding issue: https://retrace.fedoraproject.org/faf/problems/bthash/?bth=066c1d02c0b517242e80d683d734d5311f25c385&bth=0f0db8c14654302d82f6db0e6f473b85c59d152e&bth=3268c158d8b5f37834b3a5396f2f0f715bc5c6ad&bth=4ceda7cc60cad889bb88ad2133d16a51cd969820&bth=7696b2cee37b6e7335000d449c93f07d0e92bedf&bth=955cb8fcd82ea32afbcaec0068a4fc89b3d42f5b&bth=b080d1d0c6964b1da1931f2f0e7100e51e424bcb&bth=b6875029433acda057eed2ba207eadcb6b0ec758&bth=dd3fe30b2cd61e1f4b782ce3c99a875534e86c9c&bth=efee915d937d6f444c2aa2b33d868994d2408dee
(In reply to Cosimo Cecchi from comment #21) > Looking at the code here... > https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n299 > > And at this comment here... > https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n204 > > I wonder if we really need to hold a reference during dispatch. If the > GSettings instance is finalized while the dispatch closure is being invoked, > we could as well add another weak reference to keep track of it and avoid > calling back into the watch function. > > But perhaps I'm missing something? Allison, what do you think? Do you want to try it out? Would be good to get some forward movement on this.
Created attachment 358609 [details] [review] GSettingsBackend: use a GWeakRef during dispatch Instead of a full reference, which causes problems for clients that expect a GSettings instance to stop firing signals once they drop the last reference. --- OK, here's my attempt; things seem to work fine and tests pass, but I'm not sure how I could create a test to reproduce this specific situation we're trying to fix.
Review of attachment 358609 [details] [review]: This looks reasonable to me apart from the one minor nitpick below. However, I suspect that any user code which expects its signal handlers for a GSettings instance to be automatically disconnected when it drops its last reference to the GSettings object is flawed. The user code can’t know that it holds the last strong reference to the GSettings, so should explicitly disconnect its signal handlers. ::: gio/gsettingsbackend.c @@ +209,3 @@ * dispatching is as follows: * + * 1) hold a thread-safe GWearRef on the GSettings during an outstanding Nitpick: s/GWearRef/GWeakRef/
(In reply to Philip Withnall from comment #28) > Review of attachment 358609 [details] [review] [review]: > > This looks reasonable to me apart from the one minor nitpick below. However, > I suspect that any user code which expects its signal handlers for a > GSettings instance to be automatically disconnected when it drops its last > reference to the GSettings object is flawed. The user code can’t know that > it holds the last strong reference to the GSettings, so should explicitly > disconnect its signal handlers. When I do g_settings_new() and later g_object_unref(settings), I kind of expect to be the one removing the last reference to it. This assumption is there in huge quantities in a number of applications (including gnome-settings-daemon and gnome-control-center). If this is a known flaw, it needs to be documented, and place front and center of the documentation. Right now, the user code would only be flawed because the code doesn't work the way it should...
Created attachment 359407 [details] [review] GSettingsBackend: use a GWeakRef during dispatch -- Fixed the typo.
(In reply to Bastien Nocera from comment #29) > When I do g_settings_new() and later g_object_unref(settings), I kind of > expect to be the one removing the last reference to it. This assumption is > there in huge quantities in a number of applications (including > gnome-settings-daemon and gnome-control-center). Mmmm, yes (and hence we should push this patch), but also that kind of assumption about being the only bit of code touching the refcount of an object typically signifies unnecessarily tight coupling of code. In any case, this patch seems like an improvement. Rather than blocking on desrt returning from her indefinite holiday to comment on the patch, let’s go ahead with it.
Review of attachment 359407 [details] [review]: Go go go.
Attachment 359407 [details] pushed as 88a3967 - GSettingsBackend: use a GWeakRef during dispatch Thanks, pushing this.