After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 780861 - Crash in GnomeWallClock
Crash in GnomeWallClock
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gsettings
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Allison Karlitskaya (desrt)
gtkdev
: 736775 777661 785277 786285 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2017-04-03 07:45 UTC by Jan Alexander Steffens (heftig)
Modified: 2017-09-25 23:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] wall-clock: Use g_signal_connect_object (1.96 KB, patch)
2017-04-03 07:45 UTC, Jan Alexander Steffens (heftig)
needs-work Details | Review
wallclock: Don't update clock for non-clock settings changes (961 bytes, patch)
2017-04-05 13:40 UTC, Bastien Nocera
committed Details | Review
wallclock: Fix crash in update_clock() when destroyed (1.80 KB, patch)
2017-04-06 17:53 UTC, Bastien Nocera
needs-work Details | Review
wallclock: Remove unneeded separate dispose() function (1.54 KB, patch)
2017-04-21 12:35 UTC, Bastien Nocera
committed Details | Review
GSettingsBackend: use a GWeakRef during dispatch (3.93 KB, patch)
2017-08-28 16:36 UTC, Cosimo Cecchi
none Details | Review
GSettingsBackend: use a GWeakRef during dispatch (3.93 KB, patch)
2017-09-08 16:30 UTC, Cosimo Cecchi
committed Details | Review

Description Jan Alexander Steffens (heftig) 2017-04-03 07:45:03 UTC
Created attachment 349159 [details] [review]
[PATCH] wall-clock: Use g_signal_connect_object

Had some gnome-shell crashes that I think were caused by the
on_schema_change handler getting called with a freed GnomeWallClock.

  • #0 g_time_zone_ref
    at gtimezone.c line 267
  • #1 g_date_time_alloc
    at gdatetime.c line 429
  • #2 g_date_time_from_instant
    at gdatetime.c line 524
  • #3 g_date_time_new_from_timeval
    at gdatetime.c line 647
  • #4 g_date_time_new_now
    at gdatetime.c line 705
  • #5 update_clock
    at gnome-wall-clock.c line 350
  • #9 <emit signal changed:cursor-blink-timeout on instance 0x22374d0 [GSettings]>
    at gsignal.c line 3447
  • #10 g_settings_real_change_event
    at gsettings.c line 386
  • #11 ffi_call_unix64
  • #12 ffi_call
  • #13 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #14 _g_closure_invoke_va
    at gclosure.c line 867
  • #15 g_signal_emit_valist
    at gsignal.c line 3300
  • #16 g_signal_emit
    at gsignal.c line 3447
  • #17 settings_backend_path_changed
    at gsettings.c line 461
  • #18 g_settings_backend_invoke_closure
    at gsettingsbackend.c line 267
  • #19 g_main_dispatch
    at gmain.c line 3203
  • #20 g_main_context_dispatch
    at gmain.c line 3856
  • #21 g_main_context_iterate
    at gmain.c line 3929
  • #22 g_main_loop_run
    at gmain.c line 4125
  • #23 meta_run
  • #24 main

The attached patch replaces g_signal_connect with g_signal_connect_object to ensure this cannot happen (barring multithreading).

Also, in case it ever happens, add a guard to update_clock to catch the
disposed-but-not-finalized case.
Comment 1 Bastien Nocera 2017-04-03 08:40:40 UTC
Review of attachment 349159 [details] [review]:

I'm pretty sure this wouldn't fix the problem. The objects are correctly unref'ed in dispose, so the signals should be disconnected when the last reference is gone. The problem is probably that there's more than one reference being held.

In your case, I wonder if the backtrace is correct. I think the problem might be update_clock()'s _gnome_datetime_source_new() which isn't disconnected if the trigger happens after dispose() but before finalize().

In any case, it would be worth creating a test application to try and reproduce the problem.
Comment 2 Frederic Crozat 2017-04-05 11:57:08 UTC
Full backtrace (done on GNOME 3.20.x):

Program terminated with signal SIGABRT, Aborted.

Thread 1 (Thread 0x7f6b895b4ac0 (LWP 1813))

  • #0 __GI_raise
    at ../sysdeps/unix/sysv/linux/raise.c line 55
  • #1 __GI_abort
    at abort.c line 78
  • #2 __libc_message
    at ../sysdeps/posix/libc_fatal.c line 175
  • #3 malloc_printerr
  • #4 update_clock
    at gnome-wall-clock.c line 365
  • #5 g_cclosure_marshal_VOID__STRINGv
    at gmarshal.c line 1794
  • #6 _g_closure_invoke_va
    at gclosure.c line 867
  • #7 g_signal_emit_valist
    at gsignal.c line 3294
  • #8 g_signal_emit
    at gsignal.c line 3441
  • #9 g_settings_real_change_event
    at gsettings.c line 386
  • #10 ffi_call_unix64
    at ../../../libffi/src/x86/unix64.S line 122
  • #11 ffi_call_int
    at ../../../libffi/src/x86/ffi64.c line 656
  • #12 ffi_call
    at ../../../libffi/src/x86/ffi64.c line 663
  • #13 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #14 _g_closure_invoke_va
    at gclosure.c line 867
  • #15 g_signal_emit_valist
    at gsignal.c line 3294
  • #16 g_signal_emit
    at gsignal.c line 3441
  • #17 settings_backend_path_changed
    at gsettings.c line 461
  • #18 g_settings_backend_invoke_closure
    at gsettingsbackend.c line 267
  • #19 g_main_context_dispatch
    at gmain.c line 3154
  • #20 g_main_context_dispatch
    at gmain.c line 3769
  • #21 g_main_context_iterate
    at gmain.c line 3840
  • #22 g_main_loop_run
    at gmain.c line 4034
  • #23 meta_run
  • #24 main
    at main.c line 471

Comment 3 Frederic Crozat 2017-04-05 13:00:58 UTC
I had two related crashes today:

Program terminated with signal SIGSEGV, Segmentation fault.
  • #0 update_clock
    at gnome-wall-clock.c line 345
  • #0 update_clock
    at gnome-wall-clock.c line 345
  • #1 g_cclosure_marshal_VOID__STRINGv
    at gmarshal.c line 1794
  • #2 _g_closure_invoke_va
    at gclosure.c line 867
  • #3 g_signal_emit_valist
    at gsignal.c line 3294
  • #4 g_signal_emit
    at gsignal.c line 3441
  • #5 g_settings_real_change_event
    at gsettings.c line 386
  • #6 ffi_call_unix64
    at ../../../libffi/src/x86/unix64.S line 122
  • #7 ffi_call_int
    at ../../../libffi/src/x86/ffi64.c line 656
  • #8 ffi_call
    at ../../../libffi/src/x86/ffi64.c line 663
  • #9 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #10 _g_closure_invoke_va
    at gclosure.c line 867
  • #11 g_signal_emit_valist
    at gsignal.c line 3294
  • #12 g_signal_emit
    at gsignal.c line 3441
  • #13 settings_backend_path_changed
    at gsettings.c line 461
  • #14 g_settings_backend_invoke_closure
    at gsettingsbackend.c line 267
  • #15 g_main_context_dispatch
    at gmain.c line 3154
  • #16 g_main_context_dispatch
    at gmain.c line 3769
  • #17 g_main_context_iterate
    at gmain.c line 3840
  • #18 g_main_loop_run
    at gmain.c line 4034
  • #19 meta_run
    at core/main.c line 537
  • #20 main
    at main.c line 471
  • #0 g_time_zone_ref
    at gtimezone.c line 267
  • #0 g_time_zone_ref
    at gtimezone.c line 267
  • #1 g_date_time_alloc
    at gdatetime.c line 429
  • #2 g_date_time_from_instant
    at gdatetime.c line 524
  • #3 g_date_time_new_now
    at gdatetime.c line 647
  • #4 g_date_time_new_now
    at gdatetime.c line 705
  • #5 update_clock
    at gnome-wall-clock.c line 350
  • #6 g_cclosure_marshal_VOID__STRINGv
    at gmarshal.c line 1794
  • #7 _g_closure_invoke_va
    at gclosure.c line 867
  • #8 g_signal_emit_valist
    at gsignal.c line 3294
  • #9 g_signal_emit
    at gsignal.c line 3441
  • #10 g_settings_real_change_event
    at gsettings.c line 386
  • #11 ffi_call_unix64
    at ../../../libffi/src/x86/unix64.S line 122
  • #12 ffi_call_int
    at ../../../libffi/src/x86/ffi64.c line 656
  • #13 ffi_call
    at ../../../libffi/src/x86/ffi64.c line 663
  • #14 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #15 _g_closure_invoke_va
    at gclosure.c line 867
  • #16 g_signal_emit_valist
    at gsignal.c line 3294
  • #17 g_signal_emit
    at gsignal.c line 3441
  • #18 settings_backend_path_changed
    at gsettings.c line 461
  • #19 g_settings_backend_invoke_closure
    at gsettingsbackend.c line 267
  • #20 g_main_context_dispatch
    at gmain.c line 3154
  • #21 g_main_context_dispatch
    at gmain.c line 3769
  • #22 g_main_context_iterate
    at gmain.c line 3840
  • #23 g_main_loop_run
    at gmain.c line 4034
  • #24 meta_run
    at core/main.c line 537
  • #25 main
    at main.c line 471

Comment 4 Bastien Nocera 2017-04-05 13:40:18 UTC
Created attachment 349300 [details] [review]
wallclock: Don't update clock for non-clock settings changes
Comment 5 Bastien Nocera 2017-04-05 13:41:13 UTC
Comment on attachment 349300 [details] [review]
wallclock: Don't update clock for non-clock settings changes

Attachment 349300 [details] pushed as 2a74887 - wallclock: Don't update clock for non-clock settings changes
Comment 6 Bastien Nocera 2017-04-05 13:49:34 UTC
I don't understand how the finalized GSettings can still emit changed signals.
Comment 7 Bastien Nocera 2017-04-06 17:40:20 UTC
*** Bug 777661 has been marked as a duplicate of this bug. ***
Comment 8 Bastien Nocera 2017-04-06 17:53:28 UTC
Created attachment 349390 [details] [review]
wallclock: Fix crash in update_clock() when destroyed

By removing unneded separate dispose() function and something something
something because I don't understand how that would make a difference.
Did we receive the "changed" signal between dispose() and finalize()?
But we already destroyed the GSettings in dispose()!
Comment 9 Cosimo Cecchi 2017-04-07 22:32:43 UTC
*** Bug 736775 has been marked as a duplicate of this bug. ***
Comment 10 Cosimo Cecchi 2017-04-07 22:33:54 UTC
Interestingly this is the same exact crash I had reported in bug 736775. I have a patch there too, but we couldn't figure out why it was needed.
Comment 11 Bastien Nocera 2017-04-07 23:09:20 UTC
My guess is that we end up with a race between the dispose and an idle signal that was popped from the GDBus thread. Would be nice to know if somebody has a better reproducer than "use gnome-shell from GNOME 3.20" (that was Frédéric's version, and he seems to be getting the crash very often).

I never see this now, and I'm using:
dconf-0.26.0-2.fc26.x86_64
glib2-2.52.0-1.fc26.x86_64

There weren't any changes in gnome-wall-clock.c that would (should?) have an incidence on the code in the past 3 years.
Comment 12 Frederic Crozat 2017-04-12 08:32:34 UTC
I applied the two patches from Bastien and things were stable until this morning, with a new crash:

(gdb) bt
  • #0 g_time_zone_ref
    at gtimezone.c line 267
  • #1 g_date_time_alloc
    at gdatetime.c line 429
  • #2 g_date_time_from_instant
    at gdatetime.c line 524
  • #3 g_date_time_new_now
    at gdatetime.c line 647
  • #4 g_date_time_new_now
    at gdatetime.c line 705
  • #5 update_clock
    at gnome-wall-clock.c line 341
  • #6 g_cclosure_marshal_VOID__STRINGv
    at gmarshal.c line 1794
  • #7 _g_closure_invoke_va
    at gclosure.c line 867
  • #8 g_signal_emit_valist
    at gsignal.c line 3294
  • #9 g_signal_emit
    at gsignal.c line 3441
  • #10 g_settings_real_change_event
    at gsettings.c line 386
  • #11 ffi_call_unix64
    at ../../../libffi/src/x86/unix64.S line 122
  • #12 ffi_call_int
    at ../../../libffi/src/x86/ffi64.c line 656
  • #13 ffi_call
    at ../../../libffi/src/x86/ffi64.c line 663
  • #14 g_cclosure_marshal_generic_va
    at gclosure.c line 1604
  • #15 _g_closure_invoke_va
    at gclosure.c line 867
  • #16 g_signal_emit_valist
    at gsignal.c line 3294
  • #17 g_signal_emit
    at gsignal.c line 3441
  • #18 settings_backend_path_changed
    at gsettings.c line 461
  • #19 g_settings_backend_invoke_closure
    at gsettingsbackend.c line 267
  • #20 g_main_context_dispatch
    at gmain.c line 3154
  • #21 g_main_context_dispatch
    at gmain.c line 3769
  • #22 g_main_context_iterate
    at gmain.c line 3840
  • #23 g_main_loop_run
    at gmain.c line 4034
  • #24 meta_run
    at core/main.c line 537
  • #25 main
    at main.c line 471

gnome-wall-clock.c:341 is:
341		now = g_date_time_new_now (self->priv->timezone);

(gdb) print self->priv 
$1 = (GnomeWallClockPrivate *) 0x9fa46d0
(gdb) print self->priv->timezone 
$2 = (GTimeZone *) 0x0
Comment 13 Bastien Nocera 2017-04-12 11:16:04 UTC
At this point, the question clearly is why the GSettings object still exists, and why it's still receiving signals pertaining to it.

I've not been able to reproduce this problem locally.
Comment 14 Luis Villa 2017-04-16 19:58:00 UTC
Fwiw I see this (or something similar) on 3.24 all the time. Never spontaneous - happens on system resume/suspend.
Comment 15 Rui Matos 2017-04-19 16:58:04 UTC
Ok, I think I understand what's happening.

TL;DR: it's not safe to assume GSettings instances are only owned by user code.

When the crash happens, the GnomeWallClock instance is disposed and finalized but the GSettings instance has an extra reference added in a different thread with this stack trace:

  • #0 g_object_ref
    at gobject.c line 3047
  • #1 g_settings_backend_dispatch_signal
    at gsettingsbackend.c line 306
  • #2 dconf_engine_watch_established
    from /usr/lib64/gio/modules/libdconfsettings.so
  • #3 dconf_gdbus_method_call_done
    from /usr/lib64/gio/modules/libdconfsettings.so
  • #4 g_task_return_now
    at gtask.c line 1121
  • #5 g_task_return
    at gtask.c line 1179
  • #6 g_dbus_connection_call_done
    at gdbusconnection.c line 5708
  • #7 g_task_return_now
    at gtask.c line 1121
  • #8 complete_in_idle_cb
    at gtask.c line 1135
  • #9 g_idle_dispatch
    at gmain.c line 5543
  • #10 g_main_dispatch
    at gmain.c line 3201
  • #11 g_main_context_dispatch
    at gmain.c line 3854
  • #12 g_main_context_iterate
    at gmain.c line 3927
  • #13 g_main_context_iteration
    at gmain.c line 3988
  • #14 dconf_gdbus_worker_thread
    from /usr/lib64/gio/modules/libdconfsettings.so
  • #15 g_thread_proxy
    at gthread.c line 784
  • #16 start_thread
    from /lib64/libpthread.so.0
  • #17 clone
    from /lib64/libc.so.6

Reading through g_settings_backend_dispatch_signal() we can see that it calls g_main_context_invoke() with the closure that owns this extra reference which means that that reference is held until the loop in the main thread goes idle and by that time the "single" owner of the GSettings instance is already finalized.

To get gnome-desktop fixed right now we can use g_connect_object() or something with the same effect but I also think this GSettings behavior should at least be documented (cc:ing desrt@desrt.ca)
Comment 16 Rui Matos 2017-04-19 17:01:08 UTC
Review of attachment 349390 [details] [review]:

this doesn't fix it
Comment 17 Bastien Nocera 2017-04-21 12:12:17 UTC
Review of attachment 349390 [details] [review]:

I'll clean this up though, we don't need to keep separate dispose and finalize.
Comment 18 Bastien Nocera 2017-04-21 12:35:03 UTC
Created attachment 350197 [details] [review]
wallclock: Remove unneeded separate dispose() function
Comment 19 Bastien Nocera 2017-04-21 12:36:03 UTC
Attachment 350197 [details] pushed as 1329895 - wallclock: Remove unneeded separate dispose() function
Comment 20 Cosimo Cecchi 2017-04-24 19:44:49 UTC
In the meantime, would it be OK to commit https://bug736775.bugzilla-attachments.gnome.org/attachment.cgi?id=286333 from the duplicate bug 736775 as a workaround?
Comment 21 Cosimo Cecchi 2017-04-24 20:00:58 UTC
Looking at the code here... https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n299

And at this comment here...
https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n204

I wonder if we really need to hold a reference during dispatch. If the GSettings instance is finalized while the dispatch closure is being invoked, we could as well add another weak reference to keep track of it and avoid calling back into the watch function.

But perhaps I'm missing something? Allison, what do you think?
Comment 22 Rui Matos 2017-07-25 13:53:34 UTC
*** Bug 785277 has been marked as a duplicate of this bug. ***
Comment 23 Cosimo Cecchi 2017-08-14 18:29:01 UTC
*** Bug 786285 has been marked as a duplicate of this bug. ***
Comment 24 Vít Ondruch 2017-08-25 07:27:20 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1485231

Is this ^^ related issue?
Comment 26 Bastien Nocera 2017-08-25 08:55:00 UTC
(In reply to Cosimo Cecchi from comment #21)
> Looking at the code here...
> https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n299
> 
> And at this comment here...
> https://git.gnome.org/browse/glib/tree/gio/gsettingsbackend.c#n204
> 
> I wonder if we really need to hold a reference during dispatch. If the
> GSettings instance is finalized while the dispatch closure is being invoked,
> we could as well add another weak reference to keep track of it and avoid
> calling back into the watch function.
> 
> But perhaps I'm missing something? Allison, what do you think?

Do you want to try it out? Would be good to get some forward movement on this.
Comment 27 Cosimo Cecchi 2017-08-28 16:36:36 UTC
Created attachment 358609 [details] [review]
GSettingsBackend: use a GWeakRef during dispatch

Instead of a full reference, which causes problems for clients that
expect a GSettings instance to stop firing signals once they drop the
last reference.

---

OK, here's my attempt; things seem to work fine and tests pass, but I'm not sure how I could create a test to reproduce this specific situation we're trying to fix.
Comment 28 Philip Withnall 2017-09-08 14:15:45 UTC
Review of attachment 358609 [details] [review]:

This looks reasonable to me apart from the one minor nitpick below. However, I suspect that any user code which expects its signal handlers for a GSettings instance to be automatically disconnected when it drops its last reference to the GSettings object is flawed. The user code can’t know that it holds the last strong reference to the GSettings, so should explicitly disconnect its signal handlers.

::: gio/gsettingsbackend.c
@@ +209,3 @@
    * dispatching is as follows:
    *
+   *  1) hold a thread-safe GWearRef on the GSettings during an outstanding

Nitpick: s/GWearRef/GWeakRef/
Comment 29 Bastien Nocera 2017-09-08 14:30:13 UTC
(In reply to Philip Withnall from comment #28)
> Review of attachment 358609 [details] [review] [review]:
> 
> This looks reasonable to me apart from the one minor nitpick below. However,
> I suspect that any user code which expects its signal handlers for a
> GSettings instance to be automatically disconnected when it drops its last
> reference to the GSettings object is flawed. The user code can’t know that
> it holds the last strong reference to the GSettings, so should explicitly
> disconnect its signal handlers.

When I do g_settings_new() and later g_object_unref(settings), I kind of expect to be the one removing the last reference to it. This assumption is there in huge quantities in a number of applications (including gnome-settings-daemon and gnome-control-center).

If this is a known flaw, it needs to be documented, and place front and center of the documentation.

Right now, the user code would only be flawed because the code doesn't work the way it should...
Comment 30 Cosimo Cecchi 2017-09-08 16:30:29 UTC
Created attachment 359407 [details] [review]
GSettingsBackend: use a GWeakRef during dispatch

--

Fixed the typo.
Comment 31 Philip Withnall 2017-09-08 17:14:03 UTC
(In reply to Bastien Nocera from comment #29)
> When I do g_settings_new() and later g_object_unref(settings), I kind of
> expect to be the one removing the last reference to it. This assumption is
> there in huge quantities in a number of applications (including
> gnome-settings-daemon and gnome-control-center).

Mmmm, yes (and hence we should push this patch), but also that kind of assumption about being the only bit of code touching the refcount of an object typically signifies unnecessarily tight coupling of code.

In any case, this patch seems like an improvement. Rather than blocking on desrt returning from her indefinite holiday to comment on the patch, let’s go ahead with it.
Comment 32 Philip Withnall 2017-09-08 17:15:20 UTC
Review of attachment 359407 [details] [review]:

Go go go.
Comment 33 Cosimo Cecchi 2017-09-08 18:25:17 UTC
Attachment 359407 [details] pushed as 88a3967 - GSettingsBackend: use a GWeakRef during dispatch

Thanks, pushing this.