GNOME Bugzilla – Bug 786929
Attaching a monitor to laptop while in suspend and then waking up laptop will reliably crash gnome-shell
Last modified: 2018-07-05 18:39:13 UTC
Attaching a monitor to laptop while in suspend and then waking up laptop will reliably crash gnome-shell under the X11 session (haven't tested Wayland). Details can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=1472724
Seems related to tablet input settings (logical monitor <-> tablet device mapping). Moving to mutter.
Created attachment 359248 [details] [review] [gnome-3-24] monitor-manager: Don't free old state until logical monitors are replaced Logical monitors keep pointers around to monitor objects, which themself keep pointers aronud to outputs, which keeps pointer to modes and CRTCs. To avoid causing crashes when using the logical monitor API (which might use monitor APIs etc) between a hot plug and the time the logical monitors are regenerated (which is done synchronously in the native backend but asynchronously in the X11 backend), postpone the freeing of the state that logical monitor references, until the logical monitor state is regenerated. On the native backend, this should have no significant effect, as the logical state is always regenerated immediately after the hardware state is updated, but on X11, this should fix race conditions when events being processed between the hot plug and the hot plug result tries to access the yet to be up to date state. ----- I cannot reproduce the issue, but I suspect this is the issue. In other words, the series of events I imagine happens: 1) Hot plug event from the X server MetaMonitorManager updates the hardware state, and issues a Xrandr configuration 2) Input device added event from the X server This will try to configure various things, for example it seems like a tablet device is paired with a logical monitor. When it tries to pair, it'll use the old logical monitor state, before the hot plug. Since that means it'll try match various EDID state, the matching code will peek at the monitors in the logical monitors to see if it can find a match. This is where it seems to crash since the logical monitor represents an out of date state. 3) The xrandr configuration event would be received. I assume we'd eventually get here if we didn't crash after 2), and I assume the input settings paths will be triggered again, since the logical state is now finally up to date. If possible, please try this patch to see if it helps. It's made against 3.24.3, but I can rebase it to whatever patch needed.
Review of attachment 359248 [details] [review]: As discussed, this looks sane to me even as a first step for fixing the crashes in 3.36. Having better ownership would be indeed cleaner, but I see this needs a bigger refactor.
Created attachment 362402 [details] [review] [gnome-3-26] monitor-manager: Don't free old state until logical monitors are replaced Logical monitors keep pointers around to monitor objects, which themself keep pointers aronud to outputs, which keeps pointer to modes and CRTCs. To avoid causing crashes when using the logical monitor API (which might use monitor APIs etc) between a hot plug and the time the logical monitors are regenerated (which is done synchronously in the native backend but asynchronously in the X11 backend), postpone the freeing of the state that logical monitor references, until the logical monitor state is regenerated. On the native backend, this should have no significant effect, as the logical state is always regenerated immediately after the hardware state is updated, but on X11, this should fix race conditions when events being processed between the hot plug and the hot plug result tries to access the yet to be up to date state. ---- Rebased it on top of gnome-3-26, added a descriptive comment, and removed a TODO. I'll rework this patch for master too.
Review of attachment 362402 [details] [review]: looks fine to me
Comment on attachment 362402 [details] [review] [gnome-3-26] monitor-manager: Don't free old state until logical monitors are replaced Patch pushed to gnome-3-26. Not closing as the equivalent patch is not yet on master.
Created attachment 362883 [details] [review] backends: Move MetaOutput::crtc field into private struct No functional changes. This is only done so that changes to reference counting can done more reliably.
Created attachment 362884 [details] [review] backends: Add logical monitor -> monitor -> output -> crtc ref chain Make it so that each logical monitor has a reference to all the monitors that are assigned to it. All monitors has a reference to each output that belongs to it. Each output has a reference to any CRTC it has been assigned.
Ping.
Review of attachment 362884 [details] [review]: Commit message nit: "All monitors has a references" should be either "Each monitor has a reference" or "All monitors have a reference" ::: src/backends/meta-logical-monitor.c @@ +259,3 @@ MetaLogicalMonitor *logical_monitor = META_LOGICAL_MONITOR (object); + if (logical_monitor->monitors) Any reason for the check? NULL is a valid GList, so I expect g_list_free_full () to work unconditionally ... @@ +273,3 @@ GObjectClass *object_class = G_OBJECT_CLASS (klass); + object_class->dispose = meta_logical_monitor_dispose; Mmh, I usually go by the rule of thumb to use dispose() for ref-counted resources and finalize() for non-ref-counted ones - by that, finalize was correct here (Not that it really matters, it's not part of a ref cycle anyway - just curious about that change) ::: src/backends/meta-monitor.c @@ +363,3 @@ + MetaMonitorPrivate *priv = meta_monitor_get_instance_private (monitor); + + if (priv->outputs) Same question as for meta-logical-monitor.c @@ +398,3 @@ GObjectClass *object_class = G_OBJECT_CLASS (klass); + object_class->dispose = meta_monitor_dispose; Same question as for meta-logical-monitor.c
Review of attachment 362883 [details] [review]: ::: src/backends/meta-output.c @@ +51,3 @@ +{ + MetaOutputPrivate *priv = meta_output_get_instance_private (output); + Trailing whitespace
(In reply to Florian Müllner from comment #10) > Review of attachment 362884 [details] [review] [review]: > > Commit message nit: > "All monitors has a references" should be either "Each monitor has a > reference" or "All monitors have a reference" > > ::: src/backends/meta-logical-monitor.c > @@ +259,3 @@ > MetaLogicalMonitor *logical_monitor = META_LOGICAL_MONITOR (object); > > + if (logical_monitor->monitors) > > Any reason for the check? NULL is a valid GList, so I expect > g_list_free_full () to work unconditionally ... Mostly for clarity, as I find it odd to (re)set something NULL to NULL. > > @@ +273,3 @@ > GObjectClass *object_class = G_OBJECT_CLASS (klass); > > + object_class->dispose = meta_logical_monitor_dispose; > > Mmh, I usually go by the rule of thumb to use dispose() for ref-counted > resources and finalize() for non-ref-counted ones - by that, finalize was > correct here > (Not that it really matters, it's not part of a ref cycle anyway - just > curious about that change) The finalize->dispose move was because I went from freeing resources to unrefing (only that they were packaged in lists). > > ::: src/backends/meta-monitor.c > @@ +363,3 @@ > + MetaMonitorPrivate *priv = meta_monitor_get_instance_private (monitor); > + > + if (priv->outputs) > > Same question as for meta-logical-monitor.c > > @@ +398,3 @@ > GObjectClass *object_class = G_OBJECT_CLASS (klass); > > + object_class->dispose = meta_monitor_dispose; > > Same question as for meta-logical-monitor.c Same here about starting to unref and not just free things.
(In reply to Jonas Ådahl from comment #12) > (In reply to Florian Müllner from comment #10) > > Any reason for the check? NULL is a valid GList, so I expect > > g_list_free_full () to work unconditionally ... > > Mostly for clarity, as I find it odd to (re)set something NULL to NULL. Fair enough. > The finalize->dispose move was because I went from freeing resources to > unrefing (only that they were packaged in lists). That sounds questionable - the resource directly owned by the object is still the list, which is still not ref-counted. I see it similarly to objects that are always unreffed, even where all their resources are non-refcounted.
(In reply to Florian Müllner from comment #13) > (In reply to Jonas Ådahl from comment #12) > > The finalize->dispose move was because I went from freeing resources to > > unrefing (only that they were packaged in lists). > > That sounds questionable - the resource directly owned by the object is > still the list, which is still not ref-counted. I see it similarly to > objects that are always unreffed, even where all their resources are > non-refcounted. Isn't it still ref counted though? MetaLogicalMonitor owns references to its monitors; so does MetaMonitorManager. MetaMonitor own references to its MetaOutputs; so does MetaGpu. MetaOutput owns references to its MetaCrtc; so does MetaGpu. The fact that its in a GList doesn't sound like an important detail here.
I guess at this point both patches are fine to go in, right?
Ouch, these don't apply cleanly anymore... I've rebased it and reproposed in https://gitlab.gnome.org/GNOME/mutter/merge_requests/81
This has been fixed in commit 768ec15ea and commit d9c18fd5