GNOME Bugzilla – Bug 789501
Crashes in meta_monitor_manager_read_current_state and init_output when unplugging external monitor
Last modified: 2018-02-19 15:03:03 UTC
When unplugging an external monitor, both gnome-shell processes crash. I'm running Arch Linux with GNOME 3.26.1 on Wayland. I am using a Dell XPS 9560 laptop with one internal 3840x2160 screen and two external screens. I unplugged an external screen that was connected to the laptop's USB-C port through a USB-C to DisplayPort adapter, and the crash happened immediately. Backtrace of the shell process:
+ Trace 238110
Backtrace of the gdm process (the coredump didn't complete and there was a full system hang, so this is from the journal):
#0 0x00007f4c69c5db0c init_output (libmutter-1.so.0)
#1 0x00007f4c69bd132c meta_monitor_manager_read_current_state (libmutter-1.so.0)
#2 0x00007f4c69c5be48 handle_hotplug_event (libmutter-1.so.0)
#3 0x00007f4c6608a17e ffi_call_unix64 (libffi.so.6)
#4 0x00007f4c66089aef ffi_call (libffi.so.6)
#5 0x00007f4c6b9f084b g_cclosure_marshal_generic (libgobject-2.0.so.0)
#6 0x00007f4c6b9e76f5 g_closure_invoke (libgobject-2.0.so.0)
#7 0x00007f4c6b9fb0b0 n/a (libgobject-2.0.so.0)
#8 0x00007f4c6b9ff696 g_signal_emit_valist (libgobject-2.0.so.0)
#9 0x00007f4c6ba00920 g_signal_emit (libgobject-2.0.so.0)
#10 0x00007f4c5fdc94c9 n/a (libgudev-1.0.so.0)
#11 0x00007f4c6b7170be g_main_context_dispatch (libglib-2.0.so.0)
#12 0x00007f4c6b718f69 n/a (libglib-2.0.so.0)
#13 0x00007f4c6b719f42 g_main_loop_run (libglib-2.0.so.0)
#14 0x00007f4c69c04f6c meta_run (libmutter-1.so.0)
#15 0x00005563dda6de8c n/a (gnome-shell)
#16 0x00007f4c6c436f6a __libc_start_main (libc.so.6)
#17 0x00005563dda6dfba n/a (gnome-shell)
Looks like this could only ever happen if we find a connected output that doesn't advertise any modes at all. I'll attach a patch that works around that.
Created attachment 362301 [details] [review] monitor-manager/kms: Treat connectors without modes as disconnected There seems to be a kernel race when one disconnects an external monitor connected to a DisplayPort via a USB-C adapter. The race results in a connector being reported as connected, but without any modes supported. Mitigate this by considering a connector without modes equivalent to not being connected.
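For readers following the thread, here is a minimal sketch of the idea behind this first patch, not the patch itself. `SketchConnector` and `sketch_connector_is_usable` are hypothetical names invented for illustration; the real code works on the connector state that mutter reads from KMS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connector state read from the kernel. */
typedef struct {
    bool connected; /* connection status as reported by the kernel */
    int  n_modes;   /* number of modes the connector advertises */
} SketchConnector;

/* Treat "connected, but no modes" the same as "not connected", so that
 * later code never tries to read a mode from an empty list. */
static bool
sketch_connector_is_usable (const SketchConnector *connector)
{
    assert (connector != NULL);

    return connector->connected && connector->n_modes > 0;
}
```

With a check of this kind, a connector caught in the race (reported as connected, zero modes) is simply ignored until a later hotplug event reports it with a usable mode list.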
Thanks. Hmmm, that patch seems to have improved things, but I still get crashes, although it makes sense that it's a kernel bug. I have this in my log, for example (lots of other red messages...). I unplugged the monitor and the shell did not crash, which is new, but when I plugged it in this happened (no coredump):
#0 0x00007fda8389bb0c init_output (libmutter-1.so.0)
#1 0x00007fda8380f32c meta_monitor_manager_read_current_state (libmutter-1.so.0)
#2 0x00007fda83899e48 handle_hotplug_event (libmutter-1.so.0)
#3 0x00007fda7fcc817e ffi_call_unix64 (libffi.so.6)
#4 0x00007fda7fcc7aef ffi_call (libffi.so.6)
#5 0x00007fda8562e84b g_cclosure_marshal_generic (libgobject-2.0.so.0)
#6 0x00007fda856256f5 g_closure_invoke (libgobject-2.0.so.0)
#7 0x00007fda856390b0 n/a (libgobject-2.0.so.0)
#8 0x00007fda8563d696 g_signal_emit_valist (libgobject-2.0.so.0)
#9 0x00007fda8563e920 g_signal_emit (libgobject-2.0.so.0)
#10 0x00007fda79a074c9 n/a (libgudev-1.0.so.0)
#11 0x00007fda853550be g_main_context_dispatch (libglib-2.0.so.0)
Created attachment 362303 [details] System log before crash and hard system lock This happened after booting a second time, after much flickering of all of the screens. I got a hard lock, the caps lock light flashed for a while, and then everything was completely unresponsive.
I'd need a backtrace from a coredump for the above crash in init_output(), as I can't see where in that function it stumbled. The crash that is visible in the journal seems to be Xwayland exiting incorrectly though. Either way, it might be a good idea to report this on the Intel driver as well, since it seems to behave improperly.
Review of attachment 362301 [details] [review]: we could still create the output and add just the common modes
Created attachment 362535 [details] [review] monitor-manager-kms: Don't add outputs without modes There seems to be a kernel race when one disconnects an external monitor connected to a DisplayPort via a USB-C adapter. The race results in a connector being reported as connected, but without any modes supported. This had the side effect that we tried to set a preferred mode to the first listed mode, but as no modes were available, we instead tried to dereference the first element of a NULL array, causing a segmentation fault. Mitigate this by skipping adding the output if no supported modes are advertised and the output doesn't support scaling, while moving the fallback path for calculating a preferred output mode to after possibly adding the common modes, to avoid the involuntary NULL dereference.
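The ordering described in this commit message can be sketched roughly as follows. This is an illustrative reduction under stated assumptions, not the code from the attachment: `SketchMode`, `sketch_should_add_output` and `sketch_pick_preferred_mode` are hypothetical names, and the real mode list is built by mutter's KMS backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mode record. */
typedef struct {
    int  width;
    int  height;
    bool preferred; /* flagged as preferred by the connector */
} SketchMode;

/* Skip the output only when it advertises no modes and cannot be given
 * the common modes either (that is, it does not support scaling). */
static bool
sketch_should_add_output (size_t n_advertised, bool supports_scaling)
{
    return n_advertised > 0 || supports_scaling;
}

/* Meant to run only after any common modes were appended to the list,
 * so the "first listed mode" fallback never reads from an empty array. */
static const SketchMode *
sketch_pick_preferred_mode (const SketchMode *modes, size_t n_modes)
{
    assert (modes != NULL || n_modes == 0);

    if (n_modes == 0)
        return NULL; /* nothing to fall back to */

    for (size_t i = 0; i < n_modes; i++)
        if (modes[i].preferred)
            return &modes[i];

    return &modes[0]; /* fallback: first listed mode */
}
```

The point of the reordering is that the fallback to the first listed mode is only evaluated once the list is known to be complete, and an output with an empty list is never added in the first place.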
So here is a new patch with your suggestion. I accidentally pushed it to gnome-3-26 already though while pushing another fix. I'll leave it there for now, while attaching it here anyway to have it post-mortem-reviewed.
Review of attachment 362535 [details] [review]: right, looks fine
*** Bug 789797 has been marked as a duplicate of this bug. ***
Created attachment 362874 [details] [review] gpu-kms: Don't add outputs without modes There seems to be a kernel race when one disconnects an external monitor connected to a DisplayPort via a USB-C adapter. The race results in a connector being reported as connected, but without any modes supported. This had the side effect that we tried to set a preferred mode to the first listed mode, but as no modes were available, we instead tried to dereference the first element of a NULL array, causing a segmentation fault. Mitigate this by skipping adding the output if no supported modes are advertised and the output doesn't support scaling, while moving the fallback path for calculating a preferred output mode to after possibly adding the common modes, to avoid the involuntary NULL dereference. ---- Slightly changed version for the master branch.
Comment on attachment 362535 [details] [review] monitor-manager-kms: Don't add outputs without modes Marking already landed patch as committed.
Ping
Hey, it seems that the shell no longer crashes when I unplug the same external monitor. If I plug it back in, the shell sometimes crashes, but usually at times when there is also a hard system lock. I reported an i915 bug here for that: https://bugs.freedesktop.org/show_bug.cgi?id=103474 The coredump didn't complete due to the hard lock, but this is what happened after booting with an external display plugged in, and then pulling the cable out and back in twice:
#0 0x00007f4eab03ac40 raise (libpthread.so.0)
#1 0x0000561b0a7a0399 dump_gjs_stack_on_signal_handler (gnome-shell)
#2 0x00007f4eab03ada0 __restore_rt (libpthread.so.0)
#3 0x00007f4ea9f480c0 _g_log_abort (libglib-2.0.so.0)
#4 0x00007f4ea9f481be g_log_structured_array (libglib-2.0.so.0)
#5 0x00007f4ea9f4826d g_log_default_handler (libglib-2.0.so.0)
#6 0x0000561b0a7a0191 default_log_handler (gnome-shell)
#7 0x00007f4ea9f488fd g_logv (libglib-2.0.so.0)
#8 0x00007f4ea9f48a02 g_log (libglib-2.0.so.0)
#9 0x00007f4ea8472a9e x_io_error (libmutter-1.so.0)
#10 0x00007f4ea715e39e _XIOError (libX11.so.6)
#11 0x00007f4ea715bd62 _XReadEvents (libX11.so.6)
#12 0x00007f4ea7143444 XIfEvent (libX11.so.6)
#13 0x00007f4ea84335db meta_display_get_current_time_roundtrip (libmutter-1.so.0)
#14 0x00007f4ea8487858 meta_wayland_surface_destroy_window (libmutter-1.so.0)
#15 0x00007f4ea84880b3 wl_surface_destructor (libmutter-1.so.0)
#16 0x00007f4e9fc09c12 n/a (libwayland-server.so.0)
#17 0x00007f4e9fc0e619 n/a (libwayland-server.so.0)
#18 0x00007f4e9fc09cff wl_client_destroy (libwayland-server.so.0)
#19 0x00007f4e9fc09db9 n/a (libwayland-server.so.0)
#20 0x00007f4e9fc0b9b2 wl_event_loop_dispatch (libwayland-server.so.0)
#21 0x00007f4ea8472247 wayland_event_source_dispatch (libmutter-1.so.0)
#22 0x00007f4ea9f39b4d g_main_dispatch (libglib-2.0.so.0)
#23 0x00007f4ea9f39c4b g_main_context_dispatch (libglib-2.0.so.0)
#24 0x00007f4ea9f3b2be g_main_context_iterate (libglib-2.0.so.0)
#25 0x00007f4ea9f3bf48 g_main_loop_run (libglib-2.0.so.0)
#26 0x00007f4ea843e1dc meta_run (libmutter-1.so.0)
#27 0x0000561b0a7a0c6a main (gnome-shell)
#28 0x00007f4eaac91f6a __libc_start_main (libc.so.6)
#29 0x0000561b0a7a005a _start (gnome-shell)
Review of attachment 362874 [details] [review]: looks fine
I confirm that applying the patch in the attachment https://bugzilla.gnome.org/attachment.cgi?id=362301&action=diff, on top of the mutter-3.26.2 Fedora-27 source rpm, solved the problem of gnome-shell crashing when I turn off my 4k Dell P2715Q monitor using an RX-560 graphics card (originally reported in: https://bugs.freedesktop.org/show_bug.cgi?id=100745#c10)
(In reply to Dimitrios Liappis from comment #16)
> I confirm that applying the patch in the attachment
> https://bugzilla.gnome.org/attachment.cgi?id=362301&action=diff, on top of
> the mutter-3.26.2 Fedora-27 source rpm, solved the problem of gnome-shell
> crashing when I turn off my 4k Dell P2715Q monitor using an RX-560 graphics
> card (originally reported in:
> https://bugs.freedesktop.org/show_bug.cgi?id=100745#c10)

Are you saying that the patch that landed does not fix the issue, but applying the patch you mentioned makes the problem go away? If so you are experiencing another bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207
Comment on attachment 362874 [details] [review] gpu-kms: Don't add outputs without modes Attachment 362874 [details] pushed as d092e91 - gpu-kms: Don't add outputs without modes
(In reply to Jonas Ådahl from comment #17)
> Are you saying that the patch that landed does not fix the issue, but
> applying the patch you mentioned makes the problem go away?

I am on FC27 with the latest `mutter-3.26.2-2.fc27.x86_64` (https://bodhi.fedoraproject.org/updates/FEDORA-2017-39b370bebf) package. When you refer to the patch "that landed", I assume you mean https://git.gnome.org/browse/mutter/commit/?id=d092e91, right? Obviously I am doing something wrong, but when I inspect the *source* rpm for the FC27 package, I don't see any `src/backends/native/meta-gpu-kms.c` file either inside the `mutter-3.26.2.tar.xz` tarball or as .patch files under SPECS/. Since the Fedora mutter package hasn't been updated for 2 months, I presume it doesn't contain any of the fixes attached here.

> If so you are
> experiencing another bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207

I see, thanks. I will report there instead.
For the 3.26 release, the patch in question is https://gitlab.gnome.org/GNOME/mutter/commit/91873142165f44234cc6d20d0202408fb36bef51 which you should already have if you run 3.26.2. If your backtrace ends up in the move_resize() function and not anywhere in src/backends/ it's the other bug.
> If your backtrace ends up in the move_resize() function and not anywhere in src/backends/ it's the other bug.

For completeness' sake: indeed, my stack trace ended in "meta_window_move_resize_request", and I've reported some progress in the other bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207#c11 , thanks!