GNOME Bugzilla – Bug 782688
Crashes trying to set keyboard map when logging in
Last modified: 2019-10-07 14:35:35 UTC
I just got a gnome-shell crash, trying to get my 3 displays to work (one does not display anything anymore).

Unrecoverable failure in required component org.gnome.Shell.desktop
Process 1968 (gnome-shell) crashed in xkb_keymap_ref()
Process 1968 (gnome-shell) of user 1000 dumped core.

Stack trace of thread 1968:
#0 0x00007fdcee1e6b53 xkb_keymap_ref (libxkbcommon.so.0)
#1 0x00007fdceed8fe1a clutter_evdev_set_keyboard_map (libmutter-clutter-0.so)
#2 0x00007fdcefd7ff63 meta_backend_native_set_keymap (libmutter-0.so.0)
#3 0x00007fdce83dbbde ffi_call_unix64 (libffi.so.6)
#4 0x00007fdce83db54f ffi_call (libffi.so.6)
#5 0x00007fdcf2ad42ec n/a (libgjs.so.0)
#6 0x00007fdcf2ad5a96 n/a (libgjs.so.0)
#7 0x00007fdcf40c4f85 n/a (n/a)
#8 0x000055a83c5db7b0 n/a (n/a)
#9 0x00007fdcc1021a7d n/a (n/a)
I got another trace a few moments later:

Process 3230 (gnome-shell) of user 1000 dumped core.

Stack trace of thread 3230:
#0 0x00007feadd108e81 _g_log_abort (libglib-2.0.so.0)
#1 0x00007feadd109ebc g_log_default_handler (libglib-2.0.so.0)
#2 0x0000562b573ca945 default_log_handler (gnome-shell)
#3 0x00007feadd10a14d g_logv (libglib-2.0.so.0)
#4 0x00007feadd10a2bf g_log (libglib-2.0.so.0)
#5 0x00007feae1b56a5e x_io_error (libmutter-0.so.0)
#6 0x00007feadbb52a5e _XIOError (libX11.so.6)
#7 0x00007feadbb50422 _XReadEvents (libX11.so.6)
#8 0x00007feadbb37b54 XIfEvent (libX11.so.6)
#9 0x00007feae1b1e7cb meta_display_get_current_time_roundtrip (libmutter-0.so.0)
#10 0x00007feae1b6a4f1 meta_wayland_surface_destroy_window (libmutter-0.so.0)
#11 0x00007feae1b6ad10 wl_surface_destructor (libmutter-0.so.0)
#12 0x00007feadca8ff80 destroy_resource (libwayland-server.so.0)
#13 0x00007feade756b89 wl_map_for_each (libwayland-client.so.0)
#14 0x00007feadca9006d wl_client_destroy (libwayland-server.so.0)
#15 0x00007feadca90128 wl_client_connection_data (libwayland-server.so.0)
#16 0x00007feadca91c52 wl_event_loop_dispatch (libwayland-server.so.0)
#17 0x00007feae1b56317 wayland_event_source_dispatch (libmutter-0.so.0)
#18 0x00007feadd103277 g_main_context_dispatch (libglib-2.0.so.0)
#19 0x00007feadd103618 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#20 0x00007feadd103932 g_main_loop_run (libglib-2.0.so.0)
#21 0x00007feae1b28bbc meta_run (libmutter-0.so.0)
#22 0x0000562b573ca4a7 main (gnome-shell)
#23 0x00007feadb53f5fe __libc_start_main (libc.so.6)
#24 0x0000562b573ca5ba _start (gnome-shell)

Stack trace of thread 3234:
#0 0x00007feadb623ced poll (libc.so.6)
#1 0x00007feadd103599 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#2 0x00007feadd1036ac g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007feab6632f3d dconf_gdbus_worker_thread (libdconfsettings.so)
#4 0x00007feadd12a586 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007feadb8f736d start_thread (libpthread.so.0)
#6 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3267:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3268:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3270:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3271:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3272:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3273:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 8274:
#0 0x00007feadb62a7a9 syscall (libc.so.6)
#1 0x00007feadd1487da g_cond_wait_until (libglib-2.0.so.0)
#2 0x00007feadd0d7b31 g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
#3 0x00007feadd12af24 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#4 0x00007feadd12a586 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007feadb8f736d start_thread (libpthread.so.0)
#6 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3231:
#0 0x00007feadb623ced poll (libc.so.6)
#1 0x00007feadd103599 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#2 0x00007feadd1036ac g_main_context_iteration (libglib-2.0.so.0)
#3 0x00007feadd1036f1 glib_worker_main (libglib-2.0.so.0)
#4 0x00007feadd12a586 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007feadb8f736d start_thread (libpthread.so.0)
#6 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3269:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 8215:
#0 0x00007feadb62a7a9 syscall (libc.so.6)
#1 0x00007feadd1487da g_cond_wait_until (libglib-2.0.so.0)
#2 0x00007feadd0d7b31 g_async_queue_pop_intern_unlocked (libglib-2.0.so.0)
#3 0x00007feadd12af24 g_thread_pool_thread_proxy (libglib-2.0.so.0)
#4 0x00007feadd12a586 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007feadb8f736d start_thread (libpthread.so.0)
#6 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3232:
#0 0x00007feadb623ced poll (libc.so.6)
#1 0x00007feadd103599 g_main_context_iterate.isra.25 (libglib-2.0.so.0)
#2 0x00007feadd103932 g_main_loop_run (libglib-2.0.so.0)
#3 0x00007feadec28b16 gdbus_shared_thread_func (libgio-2.0.so.0)
#4 0x00007feadd12a586 g_thread_proxy (libglib-2.0.so.0)
#5 0x00007feadb8f736d start_thread (libpthread.so.0)
#6 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3266:
#0 0x00007feadb623ced poll (libc.so.6)
#1 0x00007feae5124b71 poll_func (libpulse.so.0)
#2 0x00007feae5116530 pa_mainloop_poll (libpulse.so.0)
#3 0x00007feae5116bc0 pa_mainloop_iterate (libpulse.so.0)
#4 0x00007feae5116c50 pa_mainloop_run (libpulse.so.0)
#5 0x00007feae5124ab9 thread (libpulse.so.0)
#6 0x00007feadb2f1078 internal_thread_func (libpulsecommon-10.0.so)
#7 0x00007feadb8f736d start_thread (libpthread.so.0)
#8 0x00007feadb62fe0f __clone (libc.so.6)

Stack trace of thread 3274:
#0 0x00007feadb8fd7db pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007fead5ff9580 PR_WaitCondVar (libnspr4.so)
#2 0x00007fead9cc80b1 _ZN2js12HelperThread10threadLoopEv (libmozjs-38.so)
#3 0x00007fead5ffeecb _pt_root (libnspr4.so)
#4 0x00007feadb8f736d start_thread (libpthread.so.0)
#5 0x00007feadb62fe0f __clone (libc.so.6)
Any messages in the journal from around the time of the first crash? It looks like you tried to set a keymap that xkbcommon did not understand. The second one looks like an Xwayland crash.
I didn't do anything related to keymaps. I have this message, which might be related to my Apple keyboard:

apple 0005:05AC:0256.0009: unknown main item tag 0x0

Also some Thunderbolt errors:

thunderbolt 0000:07:00.0: resetting error on 0:c.
I get the first one (the xkb_keymap_ref issue) a lot these days, mostly on login, right after entering my password. My session tries to load, then crashes and goes back to the login screen. Usually I can log in the second time without issues. I get the same stack trace each time.
Could you get a full backtrace next time? If coredumpctl is used, get it with:

coredumpctl gdb <pid-of-coredump-entry>

then type "backtrace full". What are your input settings? I.e. what input method, keyboard layouts etc. have you configured?
Created attachment 352343 [details]
Full backtrace

Here is the full backtrace, fresh from this morning :)
My keyboard layout is Swedish, I have only one input source. For the input method I'm not sure, I didn't change anything so it should be the default one in Fedora 26.
Can't reproduce this, even with input identical to what seems to be passed according to that backtrace. Also, judging by all the different code paths leading to the crash, you should have gotten something in the log prefixed with "xkbcommon: ERROR: ". Did you check "journalctl --system" too? FWIW, the crash can easily be avoided with a simple NULL check, and probably should be, as the call might come from the outside (outside of libmutter), but I still want to find the cause so it can be fixed properly.
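For illustration, the kind of NULL check meant here can be sketched in plain C. This is a hypothetical stand-in (the struct and the name keymap_ref_safe are invented), not the actual libmutter or libxkbcommon code:

```c
#include <stddef.h>

/* Minimal stand-in for a refcounted keymap; the real xkb_keymap type
 * in libxkbcommon is opaque. */
struct keymap {
    int refcnt;
};

/* Hypothetical defensive ref: returns NULL instead of dereferencing a
 * NULL pointer, mirroring the "simple NULL check" suggested above. */
struct keymap *keymap_ref_safe(struct keymap *km)
{
    if (km == NULL)
        return NULL;      /* avoid the crash seen in xkb_keymap_ref() */
    km->refcnt++;
    return km;
}
```

Of course, as noted above, this only papers over the crash; the interesting question is why a NULL keymap reaches the call in the first place.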
Yes you are right, here are the errors:

xkbcommon: ERROR: Couldn't look up rules 'evdev', model 'pc105+inet', layout 'se,us', variant ',', options ''
xkbcommon: ERROR: /home/mlavault/.xkb
xkbcommon: ERROR: 1 include paths could not be added:
xkbcommon: ERROR: /usr/share/X11/xkb
xkbcommon: ERROR: 1 include paths searched:
xkbcommon: ERROR: Couldn't find file "rules/evdev" in include paths
I assume you used journalctl -r here? (It looks reversed; in the future it might be a good idea to paste from journalctl -e instead, to get the lines in the correct order.) Anyway, does the /usr/share/X11/xkb/rules/evdev file exist? What are the file's permissions?
Yes, sorry about that, journalctl is still pretty new to me. Yes, the file exists; here are the permissions:

-rw-r--r-- 1 root root 42941 12 mai 15:06 /usr/share/X11/xkb/rules/evdev
Created attachment 352346 [details] [review]
libxkbcommon-warn-if-not-enoent.patch

Do you have the possibility to test the attached libxkbcommon patch? It'll print any error fopen() returns when trying to open the keymap, as long as it's not ENOENT (i.e. file not found). As the file exists and has adequate permissions, it must be some other error.
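Conceptually, the patch boils down to something like the following sketch (the helper name open_rules_file is invented here; this is not the actual libxkbcommon code):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open the rules file; report any fopen() failure other than
 * ENOENT ("No such file or directory") to stderr, just like the patch
 * reports it to the journal. */
FILE *open_rules_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL && errno != ENOENT)
        fprintf(stderr, "xkbcommon-sketch: couldn't open \"%s\": %s\n",
                path, strerror(errno));
    return f;
}
```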
I can try: I can compile xkbcommon with the patch applied, but then how would I install it?
A risky method would be to override the system one by setting the prefix to /usr. You can also build an RPM with the patch and install that. Otherwise I can look into creating a copr repo with that patch that you can temporarily enable.
If you could create a copr repo, that would be fantastic! I'm afraid of breaking something if I do it myself. Thanks a lot for your time and responsiveness on this issue, it is hugely appreciated :)
Here it is: https://copr.fedorainfracloud.org/coprs/jadahl/gnomebug-782688/

You can either download the RPM manually and rpm -i it from here: https://copr-be.cloud.fedoraproject.org/results/jadahl/gnomebug-782688/fedora-26-x86_64/00555099-libxkbcommon/

Or add the copr repo and AFAIK just update:

dnf config-manager --add-repo https://copr.fedorainfracloud.org/coprs/jadahl/gnomebug-782688/repo/fedora-26/jadahl-gnomebug-782688-fedora-26.repo

Using the second method, just don't forget to remove the repo later, as I won't update it!
You'll also have to restart gdm for this. Easiest way to do that is to reboot.
Thanks! So the real error is this one:

xkbcommon: ERROR: Couldn't open file "/usr/share/X11/xkb/rules/evdev": Too many open files

Which seems weird to me. I don't see why it would load such a huge amount of files.
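The "Too many open files" error (EMFILE) can be reproduced in isolation by lowering the process's file-descriptor limit; once the limit is hit, even a file that exists with perfect permissions fails to open, which matches what happens to rules/evdev above. This is an illustrative, Linux/POSIX-only snippet, unrelated to any project code:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

/* Lower RLIMIT_NOFILE to zero, attempt an fopen() of a file that is
 * always present and readable, restore the limit, and return the errno
 * observed. With the limit at zero, no new descriptor can be allocated,
 * so the open fails with EMFILE regardless of the file's permissions. */
int provoke_emfile(void)
{
    struct rlimit old, low;
    int saved;

    getrlimit(RLIMIT_NOFILE, &old);
    low = old;
    low.rlim_cur = 0;               /* forbid any new file descriptor */
    setrlimit(RLIMIT_NOFILE, &low);

    errno = 0;
    FILE *f = fopen("/dev/null", "r");
    saved = errno;
    if (f)
        fclose(f);

    setrlimit(RLIMIT_NOFILE, &old); /* restore the original limit */
    return saved;
}
```

So the error says nothing about rules/evdev itself; something else in the process has exhausted the fd limit.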
That was my suspicion, as we have a similar issue in bug 782690, except with no way to reproduce it. If you don't mind, I'll create a new libxkbcommon package that'll print the content of /proc/<pid>/fd (the open files) to the log when it fails to open rules/evdev due to too many open files.
Of course, happy to help!
A new build is ready with libxkbcommon-0.7.1-5.fc26. It should print lots of stuff (what it does is run "ls -l /proc/<pid>/fd") into the journal when it happens. Copy all of it and attach it as a text file to this bug.
I tried the new build and cannot reproduce it anymore. I don't see any messages and don't seem to have the crash. Did you add some safety checks?
Created attachment 352351 [details] [review]
libxkbcommon-list-open-files-on-error.patch

(In reply to Maël Lavault from comment #22)
> I tried the new build and cannot reproduce anymore. I don't see any messages
> and don't seems to have the crash. Did you add some safety checks ?

No additional checks. The only difference is that, if fopen fails with "too many open files", "ls -l /proc/..." will be run and the output printed to stdout (i.e. the journal). It should crash the same way it did before.
Ok, I'll monitor this in the coming days and keep you updated as soon as I get another crash.
For some reason I haven't been able to reproduce the crash lately...
Created attachment 352854 [details]
Too many open files

Ok, I finally managed to reproduce it. See the attached file.
I have applications that use autostart. It might be related to Atom starting up.
(In reply to Maël Lavault from comment #26)
> Created attachment 352854 [details]
> Too many open files
> 
> Ok I finally managed to reproduce it. See attached file.

Ok, so there seems to be an abnormal number of timerfds (847 open timerfd instances). Now we "just" need to find out who is opening all these timerfds.
How can I find this information?
(In reply to Maël Lavault from comment #29)
> How can I find this information ?

This is a problem with gnome-shell or one of its dependencies. I haven't managed to find time to look further than that there are timerfds in libgnome-desktop (related to time and date), in libinput, and somewhere else, IIRC.
Created attachment 353550 [details] [review] background: free WallClock explicitly when it is not needed
Created attachment 353551 [details] [review]
[DEBUG] wallclock: Move the removal of GSource to dispose()

This patch includes some lines for printing debug messages.
I am not sure that Maël experienced the same use case as me, but it could be reproduced by switching VTs (Ctrl+Alt+F5, then Ctrl+Alt+F1; 2 more timerfds created each time). This symptom is gone after applying these patches.

But there is one thing strange to me: even after I explicitly called run_dispose() in gnome-shell to trigger the wall clock's dispose() in gnome-desktop, the wall clock's destroy() is not called. Isn't the GObject's destroy() called automatically after dispose() is called?

You can check it easily with the debug messages I inserted.
(In reply to Hyungwon Hwang from comment #33)
> I am not sure that Maël experienced the same use-case of me. But it could be
> reproduced by changing the VTs (Ctrl+Alt+F5 - Ctrl+Alt+F1 (2 more timerfd
> created) ...)
> 
> This symptom has gone after applying these patches.
> 
> But there is one thing strange for me. Even after I explicitly called
> <run_dispose() in gnome-shell> to call <dispose() of wallclock in
> gnome-desktop>, <destroy() of wallclock in gnome-desktop> is not called.
> Isn't the Gobject's destroy() called automatically after dispose() is called?
> 
> You might check it easily with the debug messages I inserted.

Both dispose and finalize are called when the GObject is destroyed. I assume this is done by the JavaScript garbage collector, once the JS object has no references left to it. Is the finalize function of the wall clock never called at all for you? If so, it sounds like the Background JS object references are never properly unset, and the object is thus never destroyed by the GC.
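As a reference for the ordering being discussed, a plain-C stand-in (not the real GObject machinery) behaves like this: when the last reference is dropped, dispose runs first to release external resources (such as GSources), and finalize runs right after to free memory:

```c
/* Tiny stand-in for a refcounted object with dispose/finalize hooks,
 * mimicking the GObject destruction contract. */
struct obj {
    int refcnt;
    int disposed;   /* set when dispose ran (release resources) */
    int finalized;  /* set when finalize ran (free memory) */
};

static void obj_dispose(struct obj *o)  { o->disposed = 1; }
static void obj_finalize(struct obj *o) { o->finalized = 1; }

/* Like g_object_unref(): nothing happens until the last reference goes
 * away; then dispose and finalize both run, in that order. As long as
 * the JS side still holds a reference, neither is called, which is why
 * a lazy GC keeps the timerfd alive. */
void obj_unref(struct obj *o)
{
    if (--o->refcnt == 0) {
        obj_dispose(o);
        obj_finalize(o);
    }
}
```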
Yes, at that time it wasn't called at all. This time I triggered GC explicitly by running imports.system.gc() in Looking Glass, and I could see that destroy() was called when the GC ran. I guess the problem the first time was that a lot of timerfds were created but the GC didn't run, because a timerfd uses very little memory. In my opinion, freeing the timerfd by calling run_dispose() is a safe way to avoid this kind of situation. It would be good to get advice from the gnome-desktop developers.
Created attachment 353558 [details] [review] wallclock: Move the removal of GSource to dispose()
Yeah, that does indeed make sense. I can see another place where the wall clock has an explicit run_dispose(), so I guess it was originally intended as a countermeasure for delayed GC. However, it seems that in bug 780861 the content of dispose was moved to finalize as part of a crash fix, so we need to make sure we don't reintroduce the crash that that bug fixed.
Review of attachment 353558 [details] [review]: This would revert 1329895396bae1999a9a90d0b27fe260e4a0d693. See https://bugzilla.gnome.org/show_bug.cgi?id=780861 I don't think this makes any sense. If the problem is that GnomeWallClock uses too much resources internally, maybe you'd want to share instances of it, rather than create them and try to find a way to dispose of them. This is a problem to be solved in JS, not in the C code...
Review of attachment 353551 [details] [review]: Rejecting the debug patch.
(In reply to Bastien Nocera from comment #38)
> Review of attachment 353558 [details] [review] [review]:
> 
> This would revert 1329895396bae1999a9a90d0b27fe260e4a0d693. See
> https://bugzilla.gnome.org/show_bug.cgi?id=780861
> 
> I don't think this makes any sense. If the problem is that GnomeWallClock
> uses too much resources internally, maybe you'd want to share instances of
> it, rather than create them and try to find a way to dispose of them.
> 
> This is a problem to be solved in JS, not in the C code...

There are 4 instances of GnomeWallClock, of only two different types. I don't think that's what's leaking the timerfds. Might be worth making those objects singletons if it's going to be a resource problem.
Ignoring the other places in gnome-shell, there is 1 instance of GnomeWallClock per background, and one background per logical monitor, and they are destroyed and regenerated each time the monitor configuration changes (e.g. on VT switches). It doesn't explain why 847 timerfds leak, though, unless we somehow GC extremely rarely. Anyhow, using a single wall clock for all of gnome-shell would at least avoid any chance of the wall clock being the cause of the timerfd leak.
Yes, that seems not to be the right way to address the root cause. Even if 847 timerfds were leaked from GnomeWallClock, I'd still have to find out why that code was called so often and why the GC didn't run. What about gathering more info about it? Maël, do you still experience this these days? If so, could you tell me more about the situation when it happens?
So it only seems to happen after boot, right when I try to log in for the first time. I cannot reproduce it 100% of the time; it doesn't happen a lot, and when it does, gnome-shell crashes and goes back to the login screen, where I can log in again. It usually works well the second time. I suspect it might have something to do with DisplayPort support; I get a lot of bugs from it (crashes, flickering, screens that go black, ...). It might crash silently and recreate displays a lot, which would lead to timerfd leaks (this is just a hypothesis). I have 3 screens on 2 mini-DP ports (with daisy chaining), plus the internal screen of my MacBook Pro, which is deactivated (but still shows a grey background somehow).
We can check whether that is the issue. What you'd need to do is to run

udevadm monitor -s drm > udev-drm.log

from another VT, then try to log in from GDM and see if it reproduces. You should get log entries for every 'hotplug' event in the same way as mutter would see them.
Hey, I just found this thread after trying to debug the same problem here: https://bugzilla.redhat.com/show_bug.cgi?id=1441490 If it's useful, I can reproduce it consistently by running gnome-shell inside valgrind on Wayland. Also, I found that commenting out all references to this._clock (the gnome-desktop wall clock) in js/ui/background.js prevented the problem from happening.
I think I found the underlying cause - see bug 788110. The issue is that sometimes the "changed" event is emitted by Gio.Settings objects even though a change has not occurred. The logic in background.js assumes that this is not the case so sometimes when too many of these events occur it enters an infinite loop which happens to instantiate a WallClock along the way.
I'm also seeing this bug, see https://bugzilla.redhat.com/show_bug.cgi?id=1507656 for details. This issue has been present for a few releases, at least since 3.22. I get this crash sporadically on login, maybe every 1 or 2 out of 10 tries.

(In reply to Jonas Ådahl from comment #2)
> Any messages in the journal from around the time of the first crash? It
> looks like you tried to set a keymap that xkbcommon did not understand.

Usually, logging in works fine with exactly the same configuration. My active keyboard layout is:

$ localectl
   System Locale: LANG=de_DE.UTF-8
       VC Keymap: de-neo
      X11 Layout: de,de
     X11 Variant: neo,nodeadkeys

Furthermore, there are another German (de nodeadkeys) and an en-us layout. Still I get this:
+ Trace 238203
so it looks like something in the API makes 4 layouts out of 3. This looks wrong to me. Also, the fact that "keymap" is 0x0 here looks wrong too. > The second one looks like an Xwayland crash. I'm seeing the Xwayland crash too, see https://bugzilla.redhat.com/show_bug.cgi?id=1510078. (In reply to Maël Lavault from comment #4) > I get the first one (xkb_keymap_ref issue) a lot theses days, mostly on > login, right after entering my password. My session try to load and then > crashes and go back to the login screen. Usually I can login the second time > without issues. I have the same stacktrace each time. Same here. (In reply to Maël Lavault from comment #43) > So it only seems to happen after boot, right when I try to login for the > first time. I cannot reproduce it 100% of the time, it is doesn't happens a > lot and when it does, gnome shell crashes and go back to login screen where > i can login again. It usually works well the second time. Same here. > I suspect it might have something to do with displayport support, I get a > log of bugs from it (crashes, flickering, screen that goes black, ...). It > might crash silently and recreate display a lot, which would lead to timerfd > leaks (this is just an hypothesis). I have 3 screens on 2 minidp port (with > daisy chaining) and the internal screen of my macbook pro which is > deactivated (but still shows a grey background somehow). I doubt this bug is related to your monitor setup. I have one single fullHD (1920x1080p) HDMI monitor connected to an Intel iGPU. (In reply to Daniel Playfair Cal from comment #46) > I think I found the underlying cause - see bug 788110. > > The issue is that sometimes the "changed" event is emitted by Gio.Settings > objects even though a change has not occurred. The logic in background.js > assumes that this is not the case so sometimes when too many of these events > occur it enters an infinite loop which happens to instantiate a WallClock > along the way. That makes sense. 
My log files also look like gnome-shell is looping on extension disabling/enabling. Most notably, I'm getting thousands of messages like this, but with different signal IDs and sometimes a different instance pointer:
> gnome-shell[2656]: gsignal.c:2641: instance '0x55c51f11b0d0' has no handler with id
Hmm, I think there are two reasons that spurious changed events are emitted. One is that when a setting is changed (in the same process, e.g. from JS), dconf does not check whether the value is different before emitting a changed signal. There is a patch to address that behaviour here: https://bugzilla.gnome.org/show_bug.cgi?id=789639. Perhaps that will solve this problem? The other is that when another process sets a setting (whether or not the value changes) while a watch request for a key is in progress, dconf emits a changed signal for all keys. There is a bug here: https://bugzilla.gnome.org/show_bug.cgi?id=790640 and I am experimenting with different ways to patch it. It's a slow process though, since I'm new to dconf/gsettings/DBus etc. I don't recognise that gsignal message, but I often get lots of the same warning from some piece of C code that is part of whatever infinite loop I've ended up in.
*** Bug 792284 has been marked as a duplicate of this bug. ***
I'm sorry, I already had this bug open some days ago and forgot about the tab. In addition to what I wrote in #792284, I can add that I'm using the internal display of my X220 (LVDS) and an external monitor (VGA). Thanks
Peter, could you please try the two patches here: https://bugzilla.gnome.org/show_bug.cgi?id=789639 Attachments 365361 and 365362 - I think they will fix it. Other interesting info: - Are you using the BTRFS filesystem? specifically for "~/.config/dconf"? - If you run the shell in valgrind, does the crash occur consistently?
Thanks! Sounds interesting. I cannot tell you when I will have time to test, but I will report back then. I'm using EXT4 with data=ordered and noatime.
After this happened once again, I tried to provoke the issue with valgrind as Daniel mentioned:

XDG_SESSION_TYPE=wayland valgrind --leak-check=no --log-file=gnomevalgrind.txt gnome-shell

Of course slow, but it didn't lead to the (expected) crash. Should I use another way to run Valgrind with GNOME Shell?
Did the shell successfully start and become interactive? I had to recompile mesa with the --enable-valgrind option, otherwise the valgrind log was filled with massive quantities of spurious warnings from the graphics stack. I also needed the patch from here (https://bugzilla.gnome.org/show_bug.cgi?id=790640, attachment 366936 [details] [review]) to prevent infinite loops in valgrind when starting the shell (depends on what extensions you have installed). Otherwise maybe the slowness caused by valgrind prevents whatever race condition causes the issue :( There's also newer versions of those patches I mentioned before which fixed this problem for me: https://bugzilla.gnome.org/show_bug.cgi?id=789639 (attachments 366937,366938). If you're experiencing this crash regularly perhaps you could try running with them and seeing if they still occur?
*** Bug 787422 has been marked as a duplicate of this bug. ***
Bumped the version to something slightly more recent to avoid coming across as an obsolete bug.
Still happens to me on gnome-shell-3.28.3-1.fc28.x86_64 and mutter-3.28.3-3.fc28.x86_64. Same stacktrace as in comment 6. This was also reported in https://gitlab.gnome.org/GNOME/gnome-shell/issues/118 which was moved to https://gitlab.gnome.org/GNOME/mutter/issues/76#note_290964 which points to https://gitlab.gnome.org/GNOME/dconf/merge_requests/1
+ Trace 238682
Thread 1 (Thread 0x7f00e7f84240 (LWP 4768))
Please reopen if anyone can still reproduce this issue on dconf >= 0.29.1