After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 786929 - Attaching a monitor to laptop while in suspend and then waking up laptop will reliably crash gnome-shell
Attaching a monitor to laptop while in suspend and then waking up laptop will...
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: general
3.24.x
Other Linux
: Normal major
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2017-08-28 17:02 UTC by el
Modified: 2018-07-05 18:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[gnome-3-24] monitor-manager: Don't free old state until logical monitors are replaced (6.63 KB, patch)
2017-09-06 03:24 UTC, Jonas Ådahl
reviewed Details | Review
[gnome-3-26] monitor-manager: Don't free old state until logical monitors are replaced (7.35 KB, patch)
2017-10-27 11:52 UTC, Jonas Ådahl
committed Details | Review
backends: Move MetaOutput::crtc field into private struct (26.73 KB, patch)
2017-11-03 10:31 UTC, Jonas Ådahl
accepted-commit_now Details | Review
backends: Add logical monitor -> monitor -> output -> crtc ref chain (5.42 KB, patch)
2017-11-03 10:31 UTC, Jonas Ådahl
reviewed Details | Review

Description el 2017-08-28 17:02:35 UTC
Attaching a monitor to laptop while in suspend and then waking up laptop will reliably crash gnome-shell under the X11 session (haven't tested Wayland). Details can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1472724
Comment 1 Jonas Ådahl 2017-09-04 03:38:54 UTC
Seems related to tablet input settings (logical monitor <-> tablet device mapping). Moving to mutter.
Comment 2 Jonas Ådahl 2017-09-06 03:24:23 UTC
Created attachment 359248 [details] [review]
[gnome-3-24] monitor-manager: Don't free old state until logical monitors are replaced

Logical monitors keep pointers around to monitor objects, which themself
keep pointers aronud to outputs, which keeps pointer to modes and CRTCs.
To avoid causing crashes when using the logical monitor API (which
might use monitor APIs etc) between a hot plug and the time the logical
monitors are regenerated (which is done synchronously in the native
backend but asynchronously in the X11 backend), postpone the freeing of
the state that logical monitor references, until the logical monitor
state is regenerated.

On the native backend, this should have no significant effect, as the
logical state is always regenerated immediately after the hardware
state is updated, but on X11, this should fix race conditions when
events being processed between the hot plug and the hot plug result
tries to access the yet to be up to date state.

-----

I cannot reproduce the issue, but I suspect this is the issue. In other words,
the series of events I imagine happens:

1) Hot plug event from the X server

MetaMonitorManager updates the hardware state, and issues a Xrandr configuration

2) Input device added event from the X server

This will try to configure various things, for example it seems like a tablet
device is paired with a logical monitor. When it tries to pair, it'll use the
old logical monitor state, before the hot plug. Since that means it'll try
match various EDID state, the matching code will peek at the monitors in the
logical monitors to see if it can find a match. This is where it seems to crash
since the logical monitor represents an out of date state.

3) The xrandr configuration event would be received.

I assume we'd eventually get here if we didn't crash after 2), and I assume the
input settings paths will be triggered again, since the logical state is now
finally up to date.


If possible, please try this patch to see if it helps. It's made against
3.24.3, but I can rebase it to whatever patch needed.
Comment 3 Marco Trevisan (Treviño) 2017-10-20 07:05:19 UTC
Review of attachment 359248 [details] [review]:

As discussed, this looks sane to me even as a first step for fixing the crashes in 3.36.
Having better ownership would be indeed cleaner, but I see this needs a bigger refactor.
Comment 4 Jonas Ådahl 2017-10-27 11:52:34 UTC
Created attachment 362402 [details] [review]
[gnome-3-26] monitor-manager: Don't free old state until logical monitors are replaced

Logical monitors keep pointers around to monitor objects, which themself
keep pointers aronud to outputs, which keeps pointer to modes and CRTCs.
To avoid causing crashes when using the logical monitor API (which
might use monitor APIs etc) between a hot plug and the time the logical
monitors are regenerated (which is done synchronously in the native
backend but asynchronously in the X11 backend), postpone the freeing of
the state that logical monitor references, until the logical monitor
state is regenerated.

On the native backend, this should have no significant effect, as the
logical state is always regenerated immediately after the hardware
state is updated, but on X11, this should fix race conditions when
events being processed between the hot plug and the hot plug result
tries to access the yet to be up to date state.

----

Rebased it on top of gnome-3-26, added a descriptive comment, and removed a
TODO. I'll rework this patch for master too.
Comment 5 Rui Matos 2017-10-27 14:59:08 UTC
Review of attachment 362402 [details] [review]:

looks fine to me
Comment 6 Jonas Ådahl 2017-10-30 10:58:24 UTC
Comment on attachment 362402 [details] [review]
[gnome-3-26] monitor-manager: Don't free old state until logical monitors are replaced

Patch pushed to gnome-3-26. Not closing as the equivalent patch is not yet on master.
Comment 7 Jonas Ådahl 2017-11-03 10:31:15 UTC
Created attachment 362883 [details] [review]
backends: Move MetaOutput::crtc field into private struct

No functional changes. This is only done so that changes to reference
counting can done more reliably.
Comment 8 Jonas Ådahl 2017-11-03 10:31:20 UTC
Created attachment 362884 [details] [review]
backends: Add logical monitor -> monitor -> output -> crtc ref chain

Make it so that each logical monitor has a reference to all the
monitors that are assigned to it.

All monitors has a reference to each output that belongs to it.

Each output has a reference to any CRTC it has been assigned.
Comment 9 Jonas Ådahl 2017-11-30 03:20:34 UTC
Ping.
Comment 10 Florian Müllner 2017-11-30 16:09:19 UTC
Review of attachment 362884 [details] [review]:

Commit message nit:
"All monitors has a references" should be either "Each monitor has a reference" or "All monitors have a reference"

::: src/backends/meta-logical-monitor.c
@@ +259,3 @@
   MetaLogicalMonitor *logical_monitor = META_LOGICAL_MONITOR (object);
 
+  if (logical_monitor->monitors)

Any reason for the check? NULL is a valid GList, so I expect g_list_free_full () to work unconditionally ...

@@ +273,3 @@
   GObjectClass *object_class = G_OBJECT_CLASS (klass);
 
+  object_class->dispose = meta_logical_monitor_dispose;

Mmh, I usually go by the rule of thumb to use dispose() for ref-counted resources and finalize() for non-ref-counted ones - by that, finalize was correct here
(Not that it really matters, it's not part of a ref cycle anyway - just curious about that change)

::: src/backends/meta-monitor.c
@@ +363,3 @@
+  MetaMonitorPrivate *priv = meta_monitor_get_instance_private (monitor);
+
+  if (priv->outputs)

Same question as for meta-logical-monitor.c

@@ +398,3 @@
   GObjectClass *object_class = G_OBJECT_CLASS (klass);
 
+  object_class->dispose = meta_monitor_dispose;

Same question as for meta-logical-monitor.c
Comment 11 Florian Müllner 2017-11-30 16:31:27 UTC
Review of attachment 362883 [details] [review]:

::: src/backends/meta-output.c
@@ +51,3 @@
+{
+  MetaOutputPrivate *priv = meta_output_get_instance_private (output);
+  

Trailing whitespace
Comment 12 Jonas Ådahl 2017-12-11 09:04:01 UTC
(In reply to Florian Müllner from comment #10)
> Review of attachment 362884 [details] [review] [review]:
> 
> Commit message nit:
> "All monitors has a references" should be either "Each monitor has a
> reference" or "All monitors have a reference"
> 
> ::: src/backends/meta-logical-monitor.c
> @@ +259,3 @@
>    MetaLogicalMonitor *logical_monitor = META_LOGICAL_MONITOR (object);
>  
> +  if (logical_monitor->monitors)
> 
> Any reason for the check? NULL is a valid GList, so I expect
> g_list_free_full () to work unconditionally ...

Mostly for clarity, as I find it odd to (re)set something NULL to NULL.

> 
> @@ +273,3 @@
>    GObjectClass *object_class = G_OBJECT_CLASS (klass);
>  
> +  object_class->dispose = meta_logical_monitor_dispose;
> 
> Mmh, I usually go by the rule of thumb to use dispose() for ref-counted
> resources and finalize() for non-ref-counted ones - by that, finalize was
> correct here
> (Not that it really matters, it's not part of a ref cycle anyway - just
> curious about that change)

The finalize->dispose move was because I went from freeing resources to unrefing (only that they were packaged in lists).

> 
> ::: src/backends/meta-monitor.c
> @@ +363,3 @@
> +  MetaMonitorPrivate *priv = meta_monitor_get_instance_private (monitor);
> +
> +  if (priv->outputs)
> 
> Same question as for meta-logical-monitor.c
> 
> @@ +398,3 @@
>    GObjectClass *object_class = G_OBJECT_CLASS (klass);
>  
> +  object_class->dispose = meta_monitor_dispose;
> 
> Same question as for meta-logical-monitor.c

Same here about starting to unref and not just free things.
Comment 13 Florian Müllner 2017-12-11 09:30:21 UTC
(In reply to Jonas Ådahl from comment #12)
> (In reply to Florian Müllner from comment #10)
> > Any reason for the check? NULL is a valid GList, so I expect
> > g_list_free_full () to work unconditionally ...
> 
> Mostly for clarity, as I find it odd to (re)set something NULL to NULL.

Fair enough.


> The finalize->dispose move was because I went from freeing resources to
> unrefing (only that they were packaged in lists).

That sounds questionable - the resource directly owned by the object is still the list, which is still not ref-counted. I see it similarly to objects that are always unreffed, even where all their resources are non-refcounted.
Comment 14 Jonas Ådahl 2017-12-14 03:29:49 UTC
(In reply to Florian Müllner from comment #13)
> (In reply to Jonas Ådahl from comment #12)
> > The finalize->dispose move was because I went from freeing resources to
> > unrefing (only that they were packaged in lists).
> 
> That sounds questionable - the resource directly owned by the object is
> still the list, which is still not ref-counted. I see it similarly to
> objects that are always unreffed, even where all their resources are
> non-refcounted.

Isn't it still ref counted though? MetaLogicalMonitor owns references to its monitors; so does MetaMonitorManager. MetaMonitor own references to its MetaOutputs; so does MetaGpu. MetaOutput owns references to its MetaCrtc; so does MetaGpu. The fact that its in a GList doesn't sound like an important detail here.
Comment 15 Marco Trevisan (Treviño) 2018-04-16 22:55:58 UTC
I guess at this point both patches are fine to go in, right?
Comment 16 Marco Trevisan (Treviño) 2018-04-17 01:09:24 UTC
Ouch, these don't apply cleanly anymore... I've rebased it and reproposed in https://gitlab.gnome.org/GNOME/mutter/merge_requests/81
Comment 17 Marco Trevisan (Treviño) 2018-06-28 14:06:05 UTC
This has been fixed in commit 768ec15ea and commit d9c18fd5