GNOME Bugzilla – Bug 781723
DELL 5K Display (UP2715k) stopped working in 3.24: runtime check failed: (mode->spec.refresh_rate == 0.0f ...
Last modified: 2021-07-05 13:47:24 UTC
Created attachment 350401 [details] Error Message
Gnome-Shell/Mutter refuses to start after upgrading from 3.22 to 3.24.
I use a Dell 5K display, which:
- uses two DisplayPort (DP) connections
- is handled as two 2560x2880 displays merged into one 5120x2880 screen
GNOME 3.22 handles this perfectly. Without any manual intervention it recognizes the display at its native resolution (5120x2880).
But under 3.24 I get the following error message when trying to start gnome-shell:
---------------------- see attachment ---------------------------------------
(gnome-shell:1227): mutter-WARNING **: (backends/meta-monitor.c:560):create_tiled_monitor_mode: runtime check failed: (mode->spec.refresh_rate == 0.0f || (mode->spec.refresh_rate == preferred_crtc_mode->refresh_rate))
+ Segmentation Fault
-----------------------------------------------------------------------------
I'm using Antergos (Arch):
Kernel 4.10.11-1
mutter 3.24.1+1+geb394f19d-1
gnome-shell 3.24.1+2+g45c2627d4-1
Could you attach a backtrace of the segmentation fault? Could you also attach the output of the command "modetest", which should be part of a package possibly called drm-utils or something similar.
I really want to help. But I'm only a Linux User / PHP Developer and not very familiar with the Linux internals.
I used the first hour of my work day to find out how to backtrace a segmentation fault. But found no answer ;-(
Also a drm-utils package seems not to exist for Arch Linux. I have also screened all packages containing "drm" in their names - none of these provide the command modetest.
As I have rolled back my production system to Gnome 3.22, I have set up a second system with Gnome 3.24 for testing purposes. So I can provide you with anything that a "stupid hand on the machine" can do ;-)
Created attachment 350439 [details] dmesg output I have attached the output of dmesg.
(In reply to Sebastian from comment #2)
> I really want to help. But I'm only a Linux User / PHP Developer and not
> very familiar with the Linux internals.
I'm not very familiar with how things are done on Arch, and it seems quite complicated to get debug symbols, but let's try something else first.
> I used the first hour of my work day to find out how to backtrace a
> segmentation fault. But found no answer ;-(
Sorry about that :(
> Also a drm-utils package seems not to exist for Arch Linux. I have also
> screened all packages containing "drm" in their names - none of these
> provide the command modetest.
That is unfortunate, but maybe not too bad.
> As I have rolled back my production system to Gnome 3.22, I have set up a
> second system with Gnome 3.24 for testing purposes. So I can provide you
> with anything that a "stupid hand on the machine" can do ;-)
That would be helpful, as I don't have the hardware you have to reproduce this issue. Could you try to build the tool manually? Below is a list of commands that should do that.
  git clone git://anongit.freedesktop.org/mesa/drm
  cd drm
  ./autogen.sh
  make -j9
Then, to run modetest:
  ./tests/modetest/modetest > modetest.txt
Then attach modetest.txt here. Let me know if any of the steps above fail.
Additional observations:
1.) Starting Gnome-Shell with the 5K-Display attached -> FAILS
2.) Starting Gnome-Shell and attaching the 5K-Display afterwards -> WORKS
3.) Restarting Gnome-Shell after 2.) -> FAILS
(In reply to Sebastian from comment #5)
> Additional observations:
>
> 1.) Starting Gnome-Shell with the 5K-Display attached -> FAILS
>
> 2.) Starting Gnome-Shell and attaching the 5K-Display afterwards -> WORKS
>
> 3.) Restarting Gnome-Shell after 2.) -> FAILS
That is indeed an interesting observation. Some follow-up questions:
Does the 5K display work as expected after 2.)?
Do you run GNOME under X11 or under Wayland? Does it make any difference?
Created attachment 350441 [details] modetest output I have attached the output of modetest.
> Do the 5K display work as expected after 2.)? Yes. It works like expected. (like before with Gnome 3.22) > Do you run gnome under X11 or under Wayland? Under X11. > Does it make any difference? I have not tried Wayland yet. But I will have a look at it now.
(In reply to Sebastian from comment #7)
> Created attachment 350441 [details]
> modetest output
>
> I have attached the output of modetest.
What hardware and driver versions are you using? Does it make any difference if you run 'sudo ./tests/modetest/modetest > modetest.txt'?
Using your rolled back 3.22 version, or after having attached the monitor as in your step 2.), could you run:
  gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > resources.txt
and attach resources.txt?
Created attachment 350446 [details] resources.txt (gdbus call / DisplayConfig) GNOME 3.22
I have attached the output of:
------
gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > resources.txt
------
on the rolled back GNOME 3.22
GPU: nVidia Geforce GTX Titan X
Driver: nVidia 378.13
X-Server: 1.19.3 (11903000)
I have also tried nouveau -> complete crash when I connect the 5K-Display (works fine without it).
If you run nouveau, does modetest give any more information? (you should be able to run that command from the command line without X being started). Could you also attach the result of 'xrandr --verbose' when things are working?
Created attachment 350448 [details] resources.txt (gdbus call / DisplayConfig) GNOME 3.24 (nouveau)
> Does it make any difference if you > run 'sudo ./tests/modetest/modetest > modetest.txt' ? No. Leads to the same result. > If you run nouveau, does modetest give any more information? No. Same result as the attached modetest output.
Created attachment 350449 [details] OUTPUT OF: xrandr --verbose (Gnome 3.22)
I have tried to roll back only mutter from 3.24 -> 3.22 (everything else still on 3.24) = no improvement (same problems as with everything on 3.24).
Even after also rolling back gnome-shell, gnome-session and gnome-settings-daemon, the problem still exists.
(In reply to Sebastian from comment #15)
> Created attachment 350449 [details]
> OUTPUT OF: xrandr --verbose (Gnome 3.22)
Looking at this output, what I can see is that the preferred mode of the secondary tile output is 848x480, which seems odd. The preferred mode of the primary tile output is also odd: 2560x1440.
Looking at the mutter 3.22 code, it looks like it should completely fail to set up the tiled modes, as it will (at least as far as I can tell) just set the preferred modes.
So something, at some point, has configured the monitor to use the non-preferred modes.
Is there anything relevant in the logs when running on 3.22? Especially if you remove (or temporarily rename) ~/.config/monitors.xml, causing mutter to try to create default configurations, and then try again.
Another thing that would be interesting is to check whether 'xrandr --verbose' produces a different result when starting X with the monitor connected, compared to connecting it after X has started.
BTW, did you try running modetest with sudo when using nouveau?
BTW, getting the actual backtraces would be very helpful. Here seems to be a page about how to do so on Arch: https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces
> So something, at some point, has configured the monitor to use the
> non-preferred modes.
Maybe Xorg or the nVidia driver? Before mid 2016 the monitor was not "merged" into one screen. I don't remember exactly, but those resolutions (848x480 and 2560x1440) sound very familiar to me ... I think at that time it was set up like this (one display -> 2560x1440 | one display -> 848x480), and I had to adjust both to 2560x2880 via the Gnome-Display-Settings. That still left me with two separate displays.
> BTW, did you try running modetest with sudo when using nouveau? Yes. Same result ;-(
Created attachment 350466 [details] monitor.xml deleted / gnome restarted (journalctl -f)
Created attachment 350467 [details] Gnome Display Settings adjusted (journalctl -f)
Created attachment 350468 [details] Gnome Display Settings Dialog after deleting monitors.xml + restarting gnome
> Is there anything relevant in the logs when running on 3.22?
What I have done:
-> START first log file (journalctl -f)
1. deleted monitors.xml (nothing happens)
2. restarted Gnome ([Alt]+[F2], then "r")
-> END first log file (delete-monitors-xml-restart.txt)
-> START second log file
3. The 5K-Display was now at some strange resolution (I guess 848x480)
4. I used the Gnome Display Settings (see screenshot attached)
5. the correct resolution 5120x2880 was the default option in the pull-down menu
6. I clicked apply (German: "anwenden")
7. everything was back to normal
-> END second log file (adjust-gnome-display-settings.txt)
So to summarize: I know what causes the warning, and could work around it by handling situations where the driver (nvidia) provides data that makes less sense than one would expect.
What is still unclear is why anything is crashing. I've tried to reproduce the issue, but without your particular hardware configuration I can only try to mimic the error, and when doing that under X11, nothing crashes.
To get any further on that part, I'd need a backtrace, to get some kind of clue about where it is crashing. At first you can try
  $ gdb gnome-shell
  ...
  (gdb) run
  ...
  *crashes*
  ...
  (gdb) backtrace
This will most likely produce something, but quite likely not something very usable. For it to be usable, you need to have debug symbols. Here is a guide on how to get them by recompiling: https://wiki.archlinux.org/index.php/Step-by-step_debugging_guide#Improved_gdb_output
You'd have to recompile the mutter package with debug symbols, but also doing it with the gnome-shell package would help.
> You'd have to recompile the mutter package with debug symbols, but also
> doing it with the gnome-shell package would help.
This requires some knowledge I don't have at this time. I think we have to postpone our investigation for two weeks, as I go on holiday tomorrow and will have no access to the machines.
When I'm back, I will try some things first:
1. install other distributions already using Gnome 3.24 (Fedora / openSUSE Tumbleweed) to rule out distribution-related problems.
2. install the latest nvidia beta driver
It would also be interesting to test an AMD GPU, to rule out nvidia-related problems. I will try to get one. My onboard Intel GPU is not capable of 5K.
Thanks for your help and patience so far!
+ Perhaps I should also test KDE/XFCE with nVidia, to see if they also segfault.
+ As I have weekly image backups of my system, I can roll back to any state and limit the updates to just go from Gnome 3.22 to 3.24, without updating anything else that is not a dependency.
(In reply to Sebastian from comment #26)
> > You'd have to recompile the mutter package with debug symbols, but also
> > doing it with the gnome-shell package would help.
>
> This requires some knowledge I don't have at this time. I think we have
> to postpone our investigation for two weeks, as I go on holiday tomorrow
> and will have no access to the machines.
>
> When I'm back, I will try some things first:
>
> 1. install other distributions already using Gnome 3.24 (Fedora / openSUSE
> Tumbleweed) to rule out distribution-related problems.
This might make things easier, as (at least on Fedora, but probably on openSUSE too) debug symbols can be installed with just a command.
> 2. install the latest nvidia beta driver
>
> It would also be interesting to test an AMD GPU, to rule out nvidia-related
> problems. I will try to get one. My onboard Intel GPU is not capable of 5K.
>
> Thanks for your help and patience so far!
Thanks yourself. In the meantime I'll prepare the fixes I already have enough data to write.
(In reply to Sebastian from comment #27)
> + Perhaps I should also test KDE/XFCE with nVidia, to see if they also
> segfault.
That is unlikely, as this is an issue with gnome-shell/mutter, which had various changes involving tiled monitors go in between 3.22 and 3.24.
> which had various changes involving tiled monitors go in between 3.22 and 3.24. Ahhh ... okay. > debug symbols can be installed with just a command Great! Then I will set up a Fedora Test System first after my holidays.
Created attachment 350779 [details] [review] monitor: Fix naming of spec generation function
It was at one point referred to as 'id', but was changed to 'spec', but the name of this function was not updated.
Created attachment 350780 [details] [review] monitor: Generate spec struct earlier
By generating the spec struct earlier, code executed later can use the fields in the spec.
Created attachment 350781 [details] [review] monitor: Move tiled CRTC mode identification into helper
It'll be used in more places later.
Created attachment 350782 [details] [review] monitor: Move get_suggested() behind behind vfunc
Only support suggested monitor positioning if the monitor is non-tiled. Normally this functionality is used by virtual machines to provide a hint of how to place the virtual monitors, and they don't tend to use tiled monitors anyway.
Created attachment 350783 [details] [review] monitor: Handle tiled monitors where (0, 0) is not the main output
In some circumstances, the origin tile (0, 0) is not the one that should be used to drive the monitor when using a non-tiled mode. Update MetaMonitorTiled to support this case. It also seems that the preferred mode might be some low resolution or bogus mode on these monitors, so also adapt MetaMonitorTiled to ignore the preferred mode of a tiled monitor if the preferred mode doesn't use both tiles.
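The second half of that description can be illustrated with a minimal sketch. It is not the code from the patch: the `Sketch*` types and function names below are hypothetical stand-ins, and the rule "a preferred mode counts as tiled only if it covers the whole tile on every output" is an assumption drawn from the description above. The example values (2560x1440 and 848x480 preferred modes on 2560x2880 tiles) are the ones discussed earlier in this report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT mutter's real types. */
typedef struct {
  int width;
  int height;
} SketchMode;

typedef struct {
  int tile_width;          /* width of this output's tile */
  int tile_height;         /* height of this output's tile */
  const SketchMode *modes; /* modes advertised on this output */
  size_t n_modes;
  size_t preferred_index;  /* mode the driver marks as preferred */
} SketchTileOutput;

/* A mode can take part in a tiled mode only if it covers the whole tile. */
static bool
sketch_mode_fills_tile (const SketchTileOutput *output,
                        const SketchMode       *mode)
{
  return mode->width == output->tile_width &&
         mode->height == output->tile_height;
}

/* True only if every output's preferred mode covers its tile. */
bool
sketch_preferred_mode_is_tiled (const SketchTileOutput *outputs,
                                size_t                  n_outputs)
{
  for (size_t i = 0; i < n_outputs; i++)
    {
      const SketchTileOutput *output = &outputs[i];

      if (output->preferred_index >= output->n_modes)
        return false;
      if (!sketch_mode_fills_tile (output,
                                   &output->modes[output->preferred_index]))
        return false;
    }

  return n_outputs > 0;
}

/* Index of the first mode that covers the tile, or -1 if there is none. */
int
sketch_find_tile_mode (const SketchTileOutput *output)
{
  for (size_t i = 0; i < output->n_modes; i++)
    {
      if (sketch_mode_fills_tile (output, &output->modes[i]))
        return (int) i;
    }

  return -1;
}
```

With the values from this report, `sketch_preferred_mode_is_tiled()` reports false, so a caller would skip the driver's preferred modes and fall back to `sketch_find_tile_mode()` on each output.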
Created attachment 350784 [details] [review] tests/monitor-unit-tests: Check meta_monitor_is_active()
Created attachment 350785 [details] [review] tests/monitor-unit-tests: Check handling of odd tiled monitors
Add tests for handling tiled monitors where the origin tile output is not the main output.
The "monitor: Handle tiled monitors where (0, 0) is not the main output" patch should add support to the MetaMonitor abstraction layer for the type of monitor Sebastian seems to have. It doesn't fix any crash in the X11 session, so there it should only fix the warning mentioned in this report, but it would fix a crash in the Wayland session, as the Wayland session relies on MetaMonitor's current mode being valid.
Review of attachment 350779 [details] [review]: ok
Review of attachment 350780 [details] [review]: ok
Review of attachment 350781 [details] [review]: sure
Review of attachment 350782 [details] [review]: makes sense
Review of attachment 350784 [details] [review]: ++
Review of attachment 350783 [details] [review]: seems fine
Review of attachment 350785 [details] [review]:
lgtm
::: src/tests/monitor-unit-tests.c
@@ +2810,3 @@
+static void
+meta_test_monitor_custom_tited_non_preferred_config (void)
nit: s/tited/tiled/
Created attachment 351599 [details] output of gdb with debug symbols / trying to run gnome-shell
So, guys. I am back from vacation in a small Bulgarian village with a shaky internet connection. Just enjoying all the high tech stuff in my Berlin home.
I have built with debug symbols under Arch Linux:
1. mutter
2. gnome-shell
See the attached gdb-output.txt. I hope this is helpful.
Hmm, strange. I wonder why it removes it that early. Could you reproduce it again, but also enter the gdb command "backtrace full" when you get there? (i.e. when you see "Thread 1 received signal SIGSEGV... (gdb)")
Created attachment 351608 [details] gdb output / running gnome-shell / backtrace full +done+
Created attachment 351610 [details] [review] monitor: Don't get the monitor manager from the backend
We will both create and destroy monitors during initialization (when using the X11 backend), so don't try to access the monitor manager from the backend, but store a pointer to it instead. It's store it in MetaMonitor even though only MetaMonitorTiled uses it, mostly because it makes more sense to store such a pointer there.
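The pattern described in that commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not mutter's code: the `Sketch*` types, the counter and the global `sketch_backend_manager` are hypothetical, and the sketch assumes the backend's manager slot is not yet usable while the first monitors are created and destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT mutter's real types. */
typedef struct {
  int n_tiled_monitors;
} SketchManager;

typedef struct {
  SketchManager *manager; /* stored at construction time */
} SketchMonitor;

/* Stand-in for the backend's manager slot. In this sketch it is assumed to
 * be set only after initialization has finished, so it is still NULL while
 * the first monitors are being created and destroyed. */
SketchManager *sketch_backend_manager = NULL;

void
sketch_monitor_init (SketchMonitor *monitor,
                     SketchManager *manager)
{
  assert (manager != NULL);
  monitor->manager = manager;
  manager->n_tiled_monitors++;
}

void
sketch_monitor_dispose (SketchMonitor *monitor)
{
  /* Use the stored pointer. Going through sketch_backend_manager here would
   * dereference NULL if this runs during initialization. */
  monitor->manager->n_tiled_monitors--;
  monitor->manager = NULL;
}
```

Creating and disposing a monitor works here even though `sketch_backend_manager` is never set, because the monitor never consults it.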
Yeah!
I have applied only your last patch and have rebuilt / reinstalled mutter.
And the bug is gone.
I can restart gnome-shell (gnome-shell --replace) and boot into Gnome without any problems.
Thanks for taking care of this issue!
(In reply to Sebastian from comment #49)
> Yeah!
>
> I have applied only your last patch and have rebuilt / reinstalled mutter.
>
> And the bug is gone.
>
> I can restart gnome-shell (gnome-shell --replace) and boot into Gnome
> without any problems.
>
> Thanks for taking care of this issue!
Thanks for testing and for helping out with the debugging!
Review of attachment 351610 [details] [review]: Nit: "It's store it in" LGTM otherwise
Review of attachment 351610 [details] [review]:
::: src/backends/meta-monitor.c
@@ +1006,2 @@
+  meta_monitor_manager_tiled_monitor_removed (monitor_priv->monitor_manager,
+                                              monitor);
Instead of having this direct call to the MonitorManager here, we could have the MonitorManager use gobject weak refs (or a specific signal) to be notified when (tiled) monitors are finalized. Anyway, we're already doing it and this fixes a crash, so fine.
Attachment 350779 [details] pushed as 0f5ac1d - monitor: Fix naming of spec generation function
Attachment 350780 [details] pushed as 317517f - monitor: Generate spec struct earlier
Attachment 350781 [details] pushed as a6678a2 - monitor: Move tiled CRTC mode identification into helper
Attachment 350782 [details] pushed as d8adfa9 - monitor: Move get_suggested() behind behind vfunc
Attachment 350783 [details] pushed as a3b4ee5 - monitor: Handle tiled monitors where (0, 0) is not the main output
Attachment 350784 [details] pushed as 57d07bd - tests/monitor-unit-tests: Check meta_monitor_is_active()
Attachment 350785 [details] pushed as 3254103 - tests/monitor-unit-tests: Check handling of odd tiled monitors
Attachment 351610 [details] pushed as dfdc15c - monitor: Don't get the monitor manager from the backend
Jonas, this wasn't pushed to master. On purpose?
(In reply to Rui Matos from comment #54) > Jonas, this wasn't pushed to master. On purpose? The master branch version of these patches depends on the not yet reviewed patches in bug 777732 (which on my branch are based on the patches of bug 782152).
Those fixes went into mutter 3.24.3, right?
After the hotfixes did a good job on my system, the problems are back with 3.24.3.
I tried to record a video: https://goo.gl/photos/9fSbASYZaRBank7k6
The left half of the MST-Display is displayed without any wallpaper (background), and if I move windows to it, it first shows strange behavior and then the shell restarts.
They did indeed. Could you get a backtrace of the crash? Was there any change in nvidia driver version too?
> Was there any change in nvidia driver version too?
No. I have done the following to find the source of the problem:
1. Full update (including nVidia and over 700 packages held back)
ROLLBACK from backup
2. Only updated mutter
The problem appears in both scenarios.
I was not able to get a backtrace, because when running gnome-shell with gdb the system becomes completely unresponsive as soon as I move windows to the left half of the MST-Screen.
I will attach 3 files:
1. shell output when starting gnome-shell
2. screenshot (screen layout)
3. photo of my screens
-> In 3 I have marked (in red) what I think the cause of the problem is. It seems that the start of the X screen is moved too far to the right ... we now have an empty left half of the left monitor (which is a kind of outside-screen area and leads to the crash?). You can see it when looking at the gnome-topbar. It is spread across the right half of the left monitor and the left half of the right monitor. And what should be the right half of the right monitor is not visible at all (virtually in the "air" ;-)
Created attachment 354483 [details] output after starting gnome-shell with gdb
Created attachment 354484 [details] Screenshot of the screen-layout
Created attachment 354485 [details] photo of the screens with annotations
(In reply to Sebastian from comment #62)
> Created attachment 354485 [details]
> photo of the screens with annotations
1 -> Gnome-Top-Bar spread across monitors
2 -> empty / non-functional screen area
So to confirm: with 3.24.1 plus "monitor: Don't get the monitor manager from the backend" things work fine, but with 3.24.3 they do not?
If you set a non-tiled mode, do things go back to normal?
Are you running gdb from a terminal in the same session or from another computer? If you run this from another machine via ssh, when it freezes, could you hit Ctrl-C and then run "thread apply all backtrace full" and attach the output?
If you run gnome-shell under gdb on the same computer, when gnome-shell crashes, it'll stop and wait for you to enter commands. This will most likely freeze your session. If you did run this from the same session, you could do the following:
  echo "core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
  ulimit -c unlimited
  gnome-shell --replace
This will create a core dump called something like 'core.gnome-shell.1234' when gnome-shell crashes (IIRC it'll be saved in your home directory for gnome-shell no matter where you started it from). Note that it'll override any distribution crash reporting tool.
Once you have the core dump, run
  gdb gnome-shell path-to-core-dump
then
  backtrace full
to get the backtrace.
> So to confirm, 3.24.1 with "monitor: Don't get the monitor manager from the > backend" things work fine, but 3.24.3 they do not? Yes, exactly. And for me it looks like a new kind of problem introduced with the patches following 3.24.1.
Created attachment 354488 [details] full backtrace of gnome-shell
> If you set a non-tiled mode, does things go back to normal?
It is not possible to set the Dell 5K monitor to non-tiled. It is a multi-stream display using two DP connections, and the resulting 2x2560x2880 should be merged into one 5120x2880 by the OS. (works with Windows 10, Mac OS, Gnome < 3.24 and Gnome 3.24.1 + your patch ;-)
But if I disconnect that 5K monitor, things go back to normal. My single-stream monitors (NEC 4K + BENQ FHD) work just fine.
That looks like something that should be fixed by bug 783630 (although that patch is correct, I placed a statement in the wrong place, as you can see from the review).
Could it be the same? Is there an arrangement like that, with a gap in between? (That could also be answered by attaching a new "gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > resources.txt" attachment, taken before dragging the window.)
Created attachment 354492 [details] Gnome-Display-Settings Screenshot
No, there is no gap between the screens. (see screenshot of my gnome-display-settings)
Created attachment 354493 [details] output of gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources
> Could it be the same?
It might be an explanation for the crash when moving the window. But it does not explain why the screen looks so weird.
As you can see in my screenshots/photos, something must be wrong in setting up the screens (topbar spread across two monitors, only half displayed on one ... empty/offscreen area on the left half of the left monitor).
... maybe that weird "setup" results in something that is very similar to "a gap" in the screen setup (some kind of "offscreen area").
Created attachment 355391 [details] [review] monitor-manager: Let the MetaMonitor derive the whole layout
Instead of letting MetaMonitor derive the logical monitor size, then using the main monitor for the position, just let MetaMonitor derive the whole layout including the position. This means it can deal with tiled monitors better, for example when the main output (the output always active when the monitor is active) is not the origin output (the output with tile position (0, 0)).
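One way to picture "deriving the whole layout" is as the bounding box of all tile rectangles, so that the position no longer depends on which output is the main one. The sketch below is an assumption about the idea, not the code from the patch; `SketchRect` and `sketch_derive_layout()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in; NOT mutter's real rectangle type. */
typedef struct {
  int x;
  int y;
  int width;
  int height;
} SketchRect;

/* Derive position and size together as the bounding box of all tiles.
 * Taking the position from the main output alone would be off by one tile
 * width whenever the main output is not the origin tile. */
SketchRect
sketch_derive_layout (const SketchRect *tiles,
                      size_t            n_tiles)
{
  SketchRect layout = { 0, 0, 0, 0 };
  int right;
  int bottom;

  if (n_tiles == 0)
    return layout;

  layout.x = tiles[0].x;
  layout.y = tiles[0].y;
  right = tiles[0].x + tiles[0].width;
  bottom = tiles[0].y + tiles[0].height;

  for (size_t i = 1; i < n_tiles; i++)
    {
      if (tiles[i].x < layout.x)
        layout.x = tiles[i].x;
      if (tiles[i].y < layout.y)
        layout.y = tiles[i].y;
      if (tiles[i].x + tiles[i].width > right)
        right = tiles[i].x + tiles[i].width;
      if (tiles[i].y + tiles[i].height > bottom)
        bottom = tiles[i].y + tiles[i].height;
    }

  layout.width = right - layout.x;
  layout.height = bottom - layout.y;

  return layout;
}
```

For two 2560x2880 tiles where the main output is listed first but sits at x = 2560, the derived layout still starts at x = 0 and spans 5120x2880.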
Created attachment 355392 [details] [review] monitor-unit-tests: Check tiled monitors with non-origin main output
Test that a tiled monitor with tile (0, 0) as the non-main output, where main output is defined as the output that is active as long as the monitor is active.
Review of attachment 355391 [details] [review]: looks correct
Review of attachment 355392 [details] [review]: sure
Attachment 355391 [details] pushed as 32fd1e8 - monitor-manager: Let the MetaMonitor derive the whole layout
Attachment 355392 [details] pushed as 2bdd97e - monitor-unit-tests: Check tiled monitors with non-origin main output
Patches pushed to both master and gnome-3-24. Waiting to resolve this until it has been confirmed that they actually fix the issue.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/mutter/-/issues/ Thank you for your understanding and your help.