Bug 789501 - Crashes in meta_monitor_manager_read_current_state and init_output when unplugging external monitor
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: wayland
Version: 3.26.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: mutter-maint
QA Contact: mutter-maint
Duplicates: 789797
Depends on:
Blocks:
Reported: 2017-10-25 23:35 UTC by Daniel Playfair Cal
Modified: 2018-02-19 15:03 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
monitor-manager/kms: Treat connectors without modes as disconnected (2.04 KB, patch)
2017-10-26 03:34 UTC, Jonas Ådahl
reviewed
System log before crash and hard system lock (37.77 KB, text/plain)
2017-10-26 04:14 UTC, Daniel Playfair Cal
monitor-manager-kms: Don't add outputs without modes (2.98 KB, patch)
2017-10-30 10:59 UTC, Jonas Ådahl
committed
gpu-kms: Don't add outputs without modes (5.83 KB, patch)
2017-11-03 07:56 UTC, Jonas Ådahl
committed

Description Daniel Playfair Cal 2017-10-25 23:35:24 UTC
When unplugging an external monitor, both gnome-shell processes crash.

I'm running Arch Linux with GNOME 3.26.1 on Wayland. I am using a Dell XPS 9560 laptop with one internal 3840x2160 screen and two external screens. I unplugged an external screen that was connected from the laptop's USB-C port to DisplayPort via an adapter, and the crash happened immediately.

Backtrace of the shell process:

  • #0 init_output
    at backends/native/meta-monitor-manager-kms.c line 750
  • #1 init_outputs
    at backends/native/meta-monitor-manager-kms.c line 1094
  • #2 meta_monitor_manager_kms_read_current
    at backends/native/meta-monitor-manager-kms.c line 1131
  • #3 meta_monitor_manager_read_current_state
    at backends/meta-monitor-manager.c line 2569
  • #4 handle_hotplug_event
    at backends/native/meta-monitor-manager-kms.c line 1500
  • #5 on_uevent
    at backends/native/meta-monitor-manager-kms.c line 1516
  • #6 ffi_call_unix64
    at ../src/x86/unix64.S line 76
  • #7 ffi_call
    at ../src/x86/ffi64.c line 525
  • #8 g_cclosure_marshal_generic
  • #9 g_closure_invoke
  • #10 0x00007f6ff48080b0 in
  • #11 g_signal_emit_valist
  • #12 g_signal_emit
  • #13 0x00007f6fe8bd64c9 in
  • #14 g_main_context_dispatch
  • #15 0x00007f6ff4525f69 in
  • #16 g_main_loop_run
  • #17 meta_run
    at core/main.c line 648
  • #18 0x0000562442ea4e8c in
  • #19 __libc_start_main
  • #20 0x0000562442ea4fba in

Backtrace of the gdm process (the coredump didn't complete and there was a full system hang, so this is from the journal):

#0  0x00007f4c69c5db0c init_output (libmutter-1.so.0)
#1  0x00007f4c69bd132c meta_monitor_manager_read_current_state (libmutter-1.so.0)
#2  0x00007f4c69c5be48 handle_hotplug_event (libmutter-1.so.0)
#3  0x00007f4c6608a17e ffi_call_unix64 (libffi.so.6)
#4  0x00007f4c66089aef ffi_call (libffi.so.6)
#5  0x00007f4c6b9f084b g_cclosure_marshal_generic (libgobject-2.0.so.0)
#6  0x00007f4c6b9e76f5 g_closure_invoke (libgobject-2.0.so.0)
#7  0x00007f4c6b9fb0b0 n/a (libgobject-2.0.so.0)
#8  0x00007f4c6b9ff696 g_signal_emit_valist (libgobject-2.0.so.0)
#9  0x00007f4c6ba00920 g_signal_emit (libgobject-2.0.so.0)
#10 0x00007f4c5fdc94c9 n/a (libgudev-1.0.so.0)
#11 0x00007f4c6b7170be g_main_context_dispatch (libglib-2.0.so.0)
#12 0x00007f4c6b718f69 n/a (libglib-2.0.so.0)
#13 0x00007f4c6b719f42 g_main_loop_run (libglib-2.0.so.0)
#14 0x00007f4c69c04f6c meta_run (libmutter-1.so.0)
#15 0x00005563dda6de8c n/a (gnome-shell)
#16 0x00007f4c6c436f6a __libc_start_main (libc.so.6)
#17 0x00005563dda6dfba n/a (gnome-shell)
Comment 1 Jonas Ådahl 2017-10-26 03:33:08 UTC
Looks like this could only ever happen if we find a connected output that doesn't advertise any modes at all. I'll attach a patch that works around that.
Comment 2 Jonas Ådahl 2017-10-26 03:34:36 UTC
Created attachment 362301 [details] [review]
monitor-manager/kms: Treat connectors without modes as disconnected

There seems to be a kernel race when one disconnects an external
monitor connected to a DisplayPort via a USB-C adapter. The race
results in a connector being reported as connected, but without any
modes supported.

Mitigate this by considering a connector without modes equivalent to
not being connected.
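
The mitigation described in this comment can be sketched roughly as follows. This is only an illustration, not the attached patch: `SketchConnector`, `SketchConnection` and `sketch_connector_is_usable()` are hypothetical names made up for the example and do not correspond to mutter's or libdrm's actual types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connector data read from the kernel;
 * these are not mutter's or libdrm's real types. */
typedef enum {
  SKETCH_CONNECTED,
  SKETCH_DISCONNECTED
} SketchConnection;

typedef struct {
  SketchConnection connection; /* connection state as reported */
  size_t n_modes;              /* number of advertised modes */
} SketchConnector;

/* Treat a connector as usable only if it is reported as connected AND
 * advertises at least one mode. A "connected" connector with zero modes
 * (the suspected kernel race) is handled like a disconnected one. */
bool
sketch_connector_is_usable (const SketchConnector *connector)
{
  if (connector == NULL)
    return false;

  return connector->connection == SKETCH_CONNECTED &&
         connector->n_modes > 0;
}
```

With such a check, a hotplug handler would skip the racy connector instead of trying to build an output from it.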
Comment 3 Daniel Playfair Cal 2017-10-26 04:12:48 UTC
Thanks

Hmmm, that patch seems to have improved things but I still get crashes, although it makes sense that it's a kernel bug. I have this in my log for example (lots of other red messages...)

I unplugged the monitor and the shell did not crash which is new, but when I plugged it in this happened (no coredump):

#0  0x00007fda8389bb0c init_output (libmutter-1.so.0)
#1  0x00007fda8380f32c meta_monitor_manager_read_current_state (libmutter-1.so.0)
#2  0x00007fda83899e48 handle_hotplug_event (libmutter-1.so.0)
#3  0x00007fda7fcc817e ffi_call_unix64 (libffi.so.6)
#4  0x00007fda7fcc7aef ffi_call (libffi.so.6)
#5  0x00007fda8562e84b g_cclosure_marshal_generic (libgobject-2.0.so.0)
#6  0x00007fda856256f5 g_closure_invoke (libgobject-2.0.so.0)
#7  0x00007fda856390b0 n/a (libgobject-2.0.so.0)
#8  0x00007fda8563d696 g_signal_emit_valist (libgobject-2.0.so.0)
#9  0x00007fda8563e920 g_signal_emit (libgobject-2.0.so.0)
#10 0x00007fda79a074c9 n/a (libgudev-1.0.so.0)
#11 0x00007fda853550be g_main_context_dispatch (libglib-2.0.so.0)
Comment 4 Daniel Playfair Cal 2017-10-26 04:14:32 UTC
Created attachment 362303 [details]
System log before crash and hard system lock

This happened after booting a second time, after much flickering of all of the screens. I got a hard lock, the caps lock light flashed for a while, and then everything was completely unresponsive.
Comment 5 Jonas Ådahl 2017-10-26 04:21:10 UTC
I'd need a backtrace from a coredump for the above crash in init_output(), as I can't see where in that function it stumbled.

The crash that is visible in the journal seems to be Xwayland exiting incorrectly though.

Either way, it might be a good idea to report this on the Intel driver as well, since it seems to behave improperly.
Comment 6 Rui Matos 2017-10-26 07:51:46 UTC
Review of attachment 362301 [details] [review]:

we could still create the output and add just the common modes
Comment 7 Jonas Ådahl 2017-10-30 10:59:39 UTC
Created attachment 362535 [details] [review]
monitor-manager-kms: Don't add outputs without modes

There seems to be a kernel race when one disconnects an external
monitor connected to a DisplayPort via a USB-C adapter. The race
results in a connector being reported as connected, but without any
modes supported.

This had the side effect that we tried to set a preferred mode to
the first listed mode, but as no modes were available, we instead tried
to dereference the first element of a NULL array, causing a
segmentation fault.

Mitigate this by skipping adding output if no supported modes are
advertised and the output doesn't support scaling, while moving the
fallback path for calculating a preferred output mode to after possibly
adding the common modes, to avoid the involuntary NULL dereference.
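
The ordering described in this commit message can be sketched roughly as below. It is a simplified illustration under stated assumptions, not the committed code: all `Sketch*` types and functions, the fixed-size mode array and the two "common" modes are hypothetical, and the preferred mode is tracked by index rather than by pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MODES 8

/* Hypothetical types for illustration; not mutter's real structures. */
typedef struct {
  int width;
  int height;
} SketchMode;

typedef struct {
  SketchMode modes[SKETCH_MAX_MODES];
  size_t n_modes;
  bool has_preferred;      /* did the connector flag a preferred mode? */
  size_t preferred_index;
  bool supports_scaling;
} SketchOutput;

/* Assumed example list; the real set of common modes is not shown here. */
static const SketchMode sketch_common_modes[] = {
  { 1024, 768 },
  { 800, 600 },
};

void
sketch_add_common_modes (SketchOutput *output)
{
  size_t n_common = sizeof sketch_common_modes / sizeof sketch_common_modes[0];

  for (size_t i = 0; i < n_common && output->n_modes < SKETCH_MAX_MODES; i++)
    output->modes[output->n_modes++] = sketch_common_modes[i];
}

/* Returns false when the output should be skipped entirely. */
bool
sketch_init_output (SketchOutput *output)
{
  /* No advertised modes and no way to add common ones: skip the output. */
  if (output->n_modes == 0 && !output->supports_scaling)
    return false;

  if (output->supports_scaling)
    sketch_add_common_modes (output);

  /* The fallback runs only after common modes may have been added, so
   * at least one mode exists here and index 0 is valid. */
  if (!output->has_preferred)
    {
      output->preferred_index = 0;
      output->has_preferred = true;
    }

  return true;
}
```

The point of the sketch is the order of the steps: the "first mode as preferred" fallback is reached only once the mode list is known to be non-empty.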
Comment 8 Jonas Ådahl 2017-10-30 11:01:37 UTC
So here is a new patch with your suggestion. I accidentally pushed it to gnome-3-26 already though while pushing another fix. I'll leave it there for now, while attaching it here anyway to have it post-mortem-reviewed.
Comment 9 Rui Matos 2017-10-30 14:37:44 UTC
Review of attachment 362535 [details] [review]:

right, looks fine
Comment 10 Jonas Ådahl 2017-11-02 02:53:48 UTC
*** Bug 789797 has been marked as a duplicate of this bug. ***
Comment 11 Jonas Ådahl 2017-11-03 07:56:38 UTC
Created attachment 362874 [details] [review]
gpu-kms: Don't add outputs without modes

There seems to be a kernel race when one disconnects an external
monitor connected to a DisplayPort via a USB-C adapter. The race
results in a connector being reported as connected, but without any
modes supported.

This had the side effect that we tried to set a preferred mode to
the first listed mode, but as no modes were available, we instead tried
to dereference the first element of a NULL array, causing a
segmentation fault.

Mitigate this by skipping adding output if no supported modes are
advertised and the output doesn't support scaling, while moving the
fallback path for calculating a preferred output mode to after possibly
adding the common modes, to avoid the involuntary NULL dereference.

----

Slightly changed version for the master branch.
Comment 12 Jonas Ådahl 2017-11-03 08:44:44 UTC
Comment on attachment 362535 [details] [review]
monitor-manager-kms: Don't add outputs without modes

Marking already landed patch as committed.
Comment 13 Jonas Ådahl 2017-11-30 03:30:37 UTC
Ping
Comment 14 Daniel Playfair Cal 2017-11-30 12:30:18 UTC
Hey,

It seems to be no longer the case that when I unplug the same external monitor the shell crashes.

If I plug it back in, sometimes the shell crashes, but usually at times when there is also a hard system lock. I reported an i915 bug here for that: https://bugs.freedesktop.org/show_bug.cgi?id=103474

The coredump didn't complete due to the hard lock, but this is what happened after booting with an external display plugged in, and then pulling the cable out and back in twice:

#0  0x00007f4eab03ac40 raise (libpthread.so.0)
#1  0x0000561b0a7a0399 dump_gjs_stack_on_signal_handler (gnome-shell)
#2  0x00007f4eab03ada0 __restore_rt (libpthread.so.0)
#3  0x00007f4ea9f480c0 _g_log_abort (libglib-2.0.so.0)
#4  0x00007f4ea9f481be g_log_structured_array (libglib-2.0.so.0)
#5  0x00007f4ea9f4826d g_log_default_handler (libglib-2.0.so.0)
#6  0x0000561b0a7a0191 default_log_handler (gnome-shell)
#7  0x00007f4ea9f488fd g_logv (libglib-2.0.so.0)
#8  0x00007f4ea9f48a02 g_log (libglib-2.0.so.0)
#9  0x00007f4ea8472a9e x_io_error (libmutter-1.so.0)
#10 0x00007f4ea715e39e _XIOError (libX11.so.6)
#11 0x00007f4ea715bd62 _XReadEvents (libX11.so.6)
#12 0x00007f4ea7143444 XIfEvent (libX11.so.6)
#13 0x00007f4ea84335db meta_display_get_current_time_roundtrip (libmutter-1.so.0)
#14 0x00007f4ea8487858 meta_wayland_surface_destroy_window (libmutter-1.so.0)
#15 0x00007f4ea84880b3 wl_surface_destructor (libmutter-1.so.0)
#16 0x00007f4e9fc09c12 n/a (libwayland-server.so.0)
#17 0x00007f4e9fc0e619 n/a (libwayland-server.so.0)
#18 0x00007f4e9fc09cff wl_client_destroy (libwayland-server.so.0)
#19 0x00007f4e9fc09db9 n/a (libwayland-server.so.0)
#20 0x00007f4e9fc0b9b2 wl_event_loop_dispatch (libwayland-server.so.0)
#21 0x00007f4ea8472247 wayland_event_source_dispatch (libmutter-1.so.0)
#22 0x00007f4ea9f39b4d g_main_dispatch (libglib-2.0.so.0)
#23 0x00007f4ea9f39c4b g_main_context_dispatch (libglib-2.0.so.0)
#24 0x00007f4ea9f3b2be g_main_context_iterate (libglib-2.0.so.0)
#25 0x00007f4ea9f3bf48 g_main_loop_run (libglib-2.0.so.0)
#26 0x00007f4ea843e1dc meta_run (libmutter-1.so.0)
#27 0x0000561b0a7a0c6a main (gnome-shell)
#28 0x00007f4eaac91f6a __libc_start_main (libc.so.6)
#29 0x0000561b0a7a005a _start (gnome-shell)
Comment 15 Rui Matos 2017-11-30 17:07:17 UTC
Review of attachment 362874 [details] [review]:

looks fine
Comment 16 Dimitrios Liappis 2018-02-04 19:58:20 UTC
I confirm that applying the patch in the attachment https://bugzilla.gnome.org/attachment.cgi?id=362301&action=diff, on top of the mutter-3.26.2 Fedora-27 source rpm, solved the problem of gnome-shell crashing when I turn off my 4k Dell P2715Q monitor using an RX-560 graphics card (originally reported in: https://bugs.freedesktop.org/show_bug.cgi?id=100745#c10)
Comment 17 Jonas Ådahl 2018-02-05 02:47:05 UTC
(In reply to Dimitrios Liappis from comment #16)
> I confirm that applying the patch in the attachment
> https://bugzilla.gnome.org/attachment.cgi?id=362301&action=diff, on top of
> the mutter-3.26.2 Fedora-27 source rpm, solved the problem of gnome-shell
> crashing when I turn off my 4k Dell P2715Q monitor using an RX-560 graphics
> card (originally reported in:
> https://bugs.freedesktop.org/show_bug.cgi?id=100745#c10)

Are you saying that the patch that landed does not fix the issue, but applying the patch you mentioned makes the problem go away? If so you are experiencing another bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207
Comment 18 Jonas Ådahl 2018-02-05 02:55:18 UTC
Comment on attachment 362874 [details] [review]
gpu-kms: Don't add outputs without modes

Attachment 362874 [details] pushed as d092e91 - gpu-kms: Don't add outputs without modes
Comment 19 Dimitrios Liappis 2018-02-05 18:38:35 UTC
(In reply to Jonas Ådahl from comment #17)

> Are you saying that the patch that landed does not fix the issue, but
> applying the patch you mentioned makes the problem go away? 

I am on FC27 with the latest `mutter-3.26.2-2.fc27.x86_64` (https://bodhi.fedoraproject.org/updates/FEDORA-2017-39b370bebf) package.
When you refer to the patch "that landed" I assume you mean https://git.gnome.org/browse/mutter/commit/?id=d092e91, right? Obviously I am doing something wrong, but when I inspect the *source* rpm for the FC27 package, I don't see any `src/backends/native/meta-gpu-kms.c` file either inside the `mutter-3.26.2.tar.xz` tarball or as .patch files under SPECS/

Since the Fedora mutter package hasn't been updated for 2 months, I presume it doesn't contain any of the fixes attached here.

> If so you are
> experiencing another bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207

I see, thanks. I will report there instead.
Comment 20 Jonas Ådahl 2018-02-06 03:08:25 UTC
For the 3.26 release, the patch in question is https://gitlab.gnome.org/GNOME/mutter/commit/91873142165f44234cc6d20d0202408fb36bef51 which you should already have if you run 3.26.2. If your backtrace ends up in the move_resize() function and not anywhere in src/backends/ it's the other bug.
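
The rule of thumb above can be sketched as a small triage helper. This is a rough illustration only: the `SketchFrame` and `SketchTriage` types and `sketch_triage_backtrace()` are hypothetical, and the substring matching on "backends/" and "move_resize" is an assumption about how one might apply the rule to a symbolized backtrace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one symbolized backtrace frame. */
typedef struct {
  const char *function; /* e.g. "init_output", or NULL if unknown */
  const char *file;     /* source path, or NULL if unknown */
} SketchFrame;

typedef enum {
  SKETCH_TRIAGE_BACKENDS,    /* crash passes through backends code */
  SKETCH_TRIAGE_MOVE_RESIZE, /* move_resize path, likely the other bug */
  SKETCH_TRIAGE_UNKNOWN
} SketchTriage;

SketchTriage
sketch_triage_backtrace (const SketchFrame *frames, size_t n_frames)
{
  bool in_backends = false;
  bool in_move_resize = false;

  for (size_t i = 0; i < n_frames; i++)
    {
      if (frames[i].file != NULL &&
          strstr (frames[i].file, "backends/") != NULL)
        in_backends = true;

      if (frames[i].function != NULL &&
          strstr (frames[i].function, "move_resize") != NULL)
        in_move_resize = true;
    }

  /* Frames in backends code take precedence, matching the rule that the
   * other bug applies only when nothing under backends is involved. */
  if (in_backends)
    return SKETCH_TRIAGE_BACKENDS;
  if (in_move_resize)
    return SKETCH_TRIAGE_MOVE_RESIZE;

  return SKETCH_TRIAGE_UNKNOWN;
}
```

A result of `SKETCH_TRIAGE_BACKENDS` only says the crash went through backends code; it does not by itself prove the crash is the one fixed here.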
Comment 21 Dimitrios Liappis 2018-02-19 15:03:03 UTC
> If your backtrace ends up in the move_resize() function and not anywhere in src/backends/ it's the other bug.

For completeness' sake, my stack trace did indeed end in "meta_window_move_resize_request", and I've reported some progress in the other bug: https://bugzilla.gnome.org/show_bug.cgi?id=790207#c11, thanks!