GNOME Bugzilla – Bug 789153
GNOME Shell silently goes headless on some ATI (radeon) config
Last modified: 2017-10-19 07:45:34 UTC
Most of the thread is in https://bugs.launchpad.net/ubuntu/+source/gnome-shell/+bug/1723577. GNOME Shell is at 3.26.1 with db43c45b12fbae462857e59dce6849c6aa965091 cherry-picked. The issue started with this 3.26.1 update: GNOME Shell failed to start for some users with ATI cards (at least 3 of the few people testing the release on ATI hardware). Forcing Xorg with WaylandEnable=false works for them. After some bisecting, reverting 023b50e7a7002226d6176bede22930d96da074a9 and 5c37facc083078564faaeec4aa58084857c56ee1, plus removing the cherry-pick, was confirmed to work by an impacted user.

From the logs in the Launchpad bug, it's clear that GNOME Shell wasn't segfaulting but was running silently. The current theory is that GNOME Shell used to segfault under Wayland, so the session fell back to Xorg, which "worked" for the user. With the headless patches, the Shell no longer segfaults but runs with no primary monitor attached, leaving the user with no visible UI in gdm.
relevant irc chatter:

<seb128> hey there
<seb128> the gnome-shell 3.26.0 to 3.26.1 update made gdm not display the login screen for some Ubuntu users with ATI video cards, https://bugs.launchpad.net/ubuntu/+source/gdm3/+bug/1723577
<seb128> does anyone has an idea of what debug info could be useful or what commit/change could create the issue?
<seb128> (using the radeon driver)
<halfline> seb128: nice
<halfline> Oct 17 13:13:19 valeryan24-desktop gnome-shell[1206]: Failed to use linear monitor configuration: Invalid mode 775501120x892744761 (0,000001) for monitor 'ACI ASUS VW220'
<halfline> Oct 17 13:13:19 valeryan24-desktop gnome-shell[1206]: Failed to use fallback monitor configuration: Invalid mode 775501120x892744761 (0,000001) for monitor 'ACI ASUS VW220'
<seb128> hey halfline
<halfline> man i thought my 4k monitor was nice
<halfline> i want one of those !
<seb128> lol
<ebassi> 900M monitors are all the rage
<seb128> halfline, do you see any recent commit that could lead to those errors?
<borschty> not without a gpu capable of driving those though
<seb128> similar error in the log from another user
<didrocks> (we have 3 of them known right now, all radeon)
<ebassi> borschty: What, you don't have an nvidia 100080i?
<halfline> seb128: i can scan through the commit log and see if anything pops out at me
<halfline> but best person to chat to is jadahl
<halfline> seb128: you guys are using wayland at the login screen ?
<seb128> halfline, right, it's probably after work hours for him right?
<seb128> halfline, yes, login and default session
<halfline> he's on taiwan time, but oddly is around this time sometimes :-)
<seb128> halfline, we are trying to remote debug with non technical users, throwing some deb builds with commit reverted atm (like the headless one) to see if we can figure out which one is creating the issue
<seb128> halfline, the new version of Ubuntu is going out tomorrow, we are trying to figure out if that's a blocker, unsure what configurations are impacted exactly :-/
<halfline> what does sudo -u gdm dbus-run-session gsettings get org.gnome.mutter experimental-features say ?
<halfline> err wait
<halfline> you need DCONF_PROFILE=gdm too
<halfline> sudo -u gdm dbus-run-session env DCONF_PROFILE=gdm gsettings get org.gnome.mutter experimental-features
<halfline> maybe Debian-gdm instead of -u gdm not sure
<seb128> no it's gdm
<seb128> @as []
<seb128> halfline, ^
<jadahl> halfline: that doesn't look right
<halfline> jadahl: hey!
<jadahl> how did you get a 900M monitor?
<halfline> jadahl: radeon users https://bugs.launchpad.net/ubuntu/+source/gdm3/+bug/1723577
<ebassi> jadahl: More importantly: how did you get it on the Moon, in order to be at the minimum safe distance for your eyes? And doesn't the latency kill it?
<halfline> my best guess so far is monitor_priv->preferred_mode is freed?
<jadahl> ebassi: old friend of mr armstrong
<halfline> well this code is interesting
<halfline> if (crtc_mode == output->preferred_mode)
<halfline> monitor_priv->preferred_mode = mode;
<halfline> …
<halfline> if (!meta_monitor_add_mode (monitor, mode))
<halfline> meta_monitor_mode_free (mode);
<halfline> so if meta_monitor_add_mode fails, monitor_priv->preferred_mode will be left freed
<halfline> looks like it will fail if the mode is already in the hash
<jadahl> halfline: that could be it
<jadahl> i'll write a patch, one moment
<halfline> cool
<didrocks> so, it's the headless commits causing it
<halfline> oh you found a patch to revert that makes the problem go away?
<didrocks> the test diff is https://launchpadlibrarian.net/341293952/gnome-shell_3.26.1-0ubuntu3_3.26.1-0ubuntu4~revertheadless1.diff.gz
<jadahl> not sure how the headless can affect that
<didrocks> I reverted both 023b50e7a7002226d6176bede22930d96da074a9 and 5c37facc083078564faaeec4aa58084857c56ee1
<didrocks> the user confirms that with those reverts, gdm shows up the gnome shell UI
<halfline> hmm okay
<halfline> so it could be there are two problems
<didrocks> yeah, could be
<halfline> it could be this invalid mode warning is something on the side
<halfline> that doesn't actually stop things from working
<didrocks> indeed, and so, GNOME Shell with those commits think it's headless and don't fallback
<didrocks> (which matches the user logs, we didn't see any crash)
<jadahl> that wouldn't explain the failure to find a working config
<halfline> oh but this warning above does right?
<jadahl> right, that probably means there are two modes with different flags and we choose the wrong one to expose
<halfline> so the issue is, because of this bug above, no working config is produced, which before would make gnome-shell crash
<halfline> and fall back to Xorg
<halfline> but now
<halfline> gnome-shell doesn't crash, but instead assumes it's headless
<halfline> and so doesn't fall back to Xorg
<halfline> but also doesn't show anything
<didrocks> that's my guess…
<jadahl> ah, so the headless bug fix exposed the issue's gravity
<jadahl> makes sense
<jadahl> any chance to get the output of "sudo modetest" ?
<didrocks> in the failing case?
<jadahl> in any case
<halfline> on a failing machine
<jadahl> it'll get me the modes and flags etc from kms
<halfline> (but not necessarily a failing boot)
<jadahl> yes, on a failing machine
<didrocks> ok ;) asking (/!\ it's on a forum, so slow to turn around)
<jadahl> didrocks: is there a gnome bugzilla about the issue?
<didrocks> jadahl: not yet, will file one soon. Just trying to do a temporary revert for our release and grab more users info.
<jadahl> didrocks: fwiw, I have a test case now that gets the "same" error as pasted above, so it's probably that issue
<didrocks> jadahl: feel free to edit/amend: https://bugzilla.gnome.org/show_bug.cgi?id=789153
<bugbot> Bug 789153: gnome-shell, major, Normal, ---, gnome-shell-maint, NEW , GNOME Shell silently goes headless on some ATI (radeon) config
<didrocks> jadahl: oh, good!
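
For readers following the chat, here is a minimal sketch of the ordering hazard halfline points out: the preferred pointer is assigned before the add, so a rejected (duplicate) mode gets freed while still being referenced. All names below (SketchMonitor, sketch_monitor_add_mode, and so on) are hypothetical simplifications, not mutter's real types or functions, and deduplication is on resolution only. The sketch records the preferred mode only after the add succeeds, so a rejected duplicate leaves the pointer NULL instead of dangling. This is an illustration of the hazard, not the fix that was pushed for this bug.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not mutter's real types. */
typedef struct {
  int width;
  int height;
  unsigned int flags;
} SketchCrtcMode;

typedef struct {
  int width;
  int height;
} SketchMonitorMode;

#define SKETCH_MAX_MODES 16

typedef struct {
  SketchMonitorMode *modes[SKETCH_MAX_MODES];
  size_t n_modes;
  SketchMonitorMode *preferred_mode;
} SketchMonitor;

/* Takes ownership of `mode` on success. Returns false, leaving ownership
 * with the caller, if an equal mode is already present or the table is full. */
static bool
sketch_monitor_add_mode (SketchMonitor *monitor, SketchMonitorMode *mode)
{
  for (size_t i = 0; i < monitor->n_modes; i++)
    {
      if (monitor->modes[i]->width == mode->width &&
          monitor->modes[i]->height == mode->height)
        return false;
    }
  if (monitor->n_modes == SKETCH_MAX_MODES)
    return false;
  monitor->modes[monitor->n_modes++] = mode;
  return true;
}

/* Builds monitor modes from CRTC modes. The preferred pointer is set only
 * after the add succeeded, so it can never refer to a freed mode. */
static void
sketch_generate_modes (SketchMonitor *monitor,
                       const SketchCrtcMode *crtc_modes,
                       size_t n_crtc_modes,
                       const SketchCrtcMode *preferred)
{
  for (size_t i = 0; i < n_crtc_modes; i++)
    {
      SketchMonitorMode *mode = malloc (sizeof *mode);

      if (mode == NULL)
        return;
      mode->width = crtc_modes[i].width;
      mode->height = crtc_modes[i].height;

      if (!sketch_monitor_add_mode (monitor, mode))
        {
          free (mode); /* duplicate: nothing refers to it yet */
          continue;
        }
      if (&crtc_modes[i] == preferred)
        monitor->preferred_mode = mode;
    }
}

/* Frees every mode owned by the monitor and resets it. */
static void
sketch_monitor_clear (SketchMonitor *monitor)
{
  for (size_t i = 0; i < monitor->n_modes; i++)
    free (monitor->modes[i]);
  monitor->n_modes = 0;
  monitor->preferred_mode = NULL;
}
```

With two 800x600 CRTC modes that differ only in flags and the second one marked preferred, this sketch ends up with one monitor mode and a NULL preferred_mode. That is detectable, unlike a pointer to freed memory, but the preferred mode is still missing from the list.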
Moving to mutter.
Created attachment 361815 [details] [review]
monitor/normal: Prefer modes with same flags as preferred mode

When generating MetaMonitorMode's, prefer CRTC modes that have the same set of flags as the preferred mode. Not only is this probably a better set of configurable modes, it also guarantees that the preferred mode is added.

This fixes a crash when the preferred mode was not the first mode with the same resolution, refresh rate and set of handled modes.
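
The selection rule described in this patch can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the attachment: ExampleCrtcMode and example_pick_crtc_mode are hypothetical names, the flags are plain integers, and matching is done on exact width, height and refresh rate. Among CRTC modes that would map to the same monitor mode, the one whose flags equal the preferred mode's flags wins; otherwise the first match is used.

```c
#include <stddef.h>

/* Hypothetical, simplified CRTC mode; not mutter's MetaCrtcMode. */
typedef struct {
  int width;
  int height;
  float refresh_rate;
  unsigned int flags;
} ExampleCrtcMode;

/* Returns the CRTC mode to expose for the given resolution and refresh
 * rate, or NULL if none matches. A mode whose flags equal
 * `preferred_flags` is chosen over any other match; failing that, the
 * first matching mode is returned. */
static const ExampleCrtcMode *
example_pick_crtc_mode (const ExampleCrtcMode *modes,
                        size_t n_modes,
                        int width,
                        int height,
                        float refresh_rate,
                        unsigned int preferred_flags)
{
  const ExampleCrtcMode *fallback = NULL;

  for (size_t i = 0; i < n_modes; i++)
    {
      const ExampleCrtcMode *mode = &modes[i];

      if (mode->width != width ||
          mode->height != height ||
          mode->refresh_rate != refresh_rate)
        continue;

      if (mode->flags == preferred_flags)
        return mode; /* same flags as the preferred mode: best match */

      if (fallback == NULL)
        fallback = mode; /* remember the first match in case none is better */
    }

  return fallback;
}
```

Because the preferred mode trivially has the same flags as itself, a rule of this shape always selects it when its resolution and refresh rate are being considered, whatever its position in the list.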
Created attachment 361816 [details] [review]
monitor-unit-tests: Check non-first preferred modes

Check that if there are multiple modes with the same ID (resolution, refresh rate and handled flags) we correctly add the preferred mode to the list of monitor modes.
Review of attachment 361815 [details] [review]: right, makes sense
Review of attachment 361816 [details] [review]:

sure

::: src/tests/monitor-unit-tests.c
@@ +2782,3 @@
+ .height = 600,
+ .refresh_rate = 60.0,
+ //.flags = META_CRTC_MODE_FLAG_PHSYNC,

did you intend to do something more w this? meta_monitor_normal_generate_modes() won't let this flag show up here but you already added the & in the comparison above to ignore it so perhaps just add a comment here?
(In reply to Rui Matos from comment #6)
> Review of attachment 361816 [details] [review] [review]:
>
> sure
>
> ::: src/tests/monitor-unit-tests.c
> @@ +2782,3 @@
> + .height = 600,
> + .refresh_rate = 60.0,
> + //.flags = META_CRTC_MODE_FLAG_PHSYNC,
>
> did you intend to do something more w this?
> meta_monitor_normal_generate_modes() won't let this flag show up here but
> you already added the & in the comparison above to ignore it so perhaps just
> add a comment here?

That was just me forgetting that I filtered out the non-handled flags on the MetaMonitorMode, then forgetting to remove the commented out flag I had added in error.
Attachment 361815 [details] pushed as 4ad8c4b - monitor/normal: Prefer modes with same flags as preferred mode
Attachment 361816 [details] pushed as 12381d5 - monitor-unit-tests: Check non-first preferred modes
Now also on gnome-3-26.
For reference, I've prepared some packages with those fixes (and reintroduced the headless fixes in GNOME Shell that we reverted for the release) for the impacted users to confirm. I'll keep you posted.