GNOME Bugzilla – Bug 739178
background corrupted on resume from suspend with nvidia blob driver
Last modified: 2019-05-17 07:49:41 UTC
Created attachment 289315 [details] corrupted gnome shell background on resume from suspend gnome-shell background and lock screen background are corrupted with random pixels. cf attachment this bug is open on archlinux bug tracker and debian bug tracker https://bugs.archlinux.org/task/42511 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=765436 it seems to be nvidia proprietary driver related
*** Bug 739203 has been marked as a duplicate of this bug. ***
Sorry, i forgot to mention : - Archlinux with kernel 3.17.1 - Xorg server 1.16.1 - nvidia binary driver 340.46 nothing suspicious in logs and as said in the other report, atl-r + r restore the backgrounds.
one more update. it seams that the background picture is literaly erased from memory and not restored. by using the nvidia-smi utility 1- before sleep : memory usage 115MiB / 1023MiB 2- after sleep : 45MiB / 1023MiB 3- after sleep + reload shell : 97MiB / 1023MiB i've done this three steps a couple of time and the result is always the same a decrease of video memory consumption.
Any chance for you to reproduce this with the nouveau driver?
i will try. one workaround for the moment is having an animated wallpaper (default gnome one) so when a wakeup occured, the wallpapper will change if the wallpapper delay is reached :-)
ok, so i installed nouveau and the background is not corrupted after wakeup. it's only with nvidia binary driver. i will give nouveau a try.
conclusion, after three or four sleep/wakeup since i installed the nouveau driver i don't have any backgrounds corruption (in screen shield and gnome-shell). plus things is that wakeup is almost instantaneous with "nouveau" and performances for standard usage are decent.
Closing this as not a gnome bug then. Thank you for testing.
We usually leave bugs in the driver open until the underlying driver bug is fixed and released.
Is it filed upstream? It's a common bug among nvidia blob users.
*** Bug 740299 has been marked as a duplicate of this bug. ***
here is the same question on nvidia forum https://devtalk.nvidia.com/default/topic/787748/linux/-nvidia340xx-archlinux64-gnome3-14-the-background-of-desktop-and-lockscreen-mess-after-resume-from-/
The nvidia forum post by sandipt mentions: "Our current software architecture doesn't preserve the contents of FBOs across modeswitches or power events. As a result, our implementation relies on applications re-rendering to their FBOs every frame. This is a fundamental limitation of our design that will be fixed in the future." Is is possible for gnome-shell to somehow work around this, by redrawing the background on resume from suspend? Also, this code path was working fine on 3.12, what changed on the gnome-shell side?
*** Bug 741457 has been marked as a duplicate of this bug. ***
I have the same probleme under Archlinux with AMD catalyst proprietary driver. And same probleme here https://bbs.archlinux.org/viewtopic.php?pid=1486134
Same here, under SolydX with Gnome, after updating from Gnome Shell 3.12 to Gnome Shell 3.14. My graphic card is Nvidia, with propietary drivers.
I ust had a report of the same case for openSUSE, where NVidia also chimed in (see https://bugzilla.suse.com/show_bug.cgi?id=914149#c18 ) ### quote ### My best guess would be that the desktop background picture is being stored as an OpenGL Frame Buffer Object (FBO), and that whatever program is responsible for drawing it is counting on the FBO's contents to be preserved across the suspend/resume cycle. FBO contents aren't guaranteed to persist across modeswitch events (suspend/resume will incur a VT switch away from X and back to it, and the VT switches in turn will incur modesets). If an FBO is being reused across a modeswitch, its contents need to be refreshed. ### end quote ###
I'm having this issue on Fedora 21 with NVIDIA driver, two monitors. From what I have read, it's clearly a GNOME bug. https://devtalk.nvidia.com/default/topic/795854/linux/background-gets-corrupted-after-suspend-on-gnome-3-14-346-22-gtx-750-ti-/
The whole reason we use an FBO is so that we don't have to repaint it on every frame. That's considerably expensive. We weren't expecting drivers to lose FBO contents at arbitrary points without a single way of letting us know that that was the case. If the NVIDIA driver team can give us a workaround other than "paint the FBO every frame", then I'd be fine to implement it.
Wouldn't it be a simple fix to just repaint the FBO after resume?
Well, mode switches, but we don't know when that happens.
I have the same probleme under : -Distro: Fedora release 21 (Twenty One) -Display Server: Fedora X.org 116.3 -Kernel: 3.18.7-200.fc21.x86_64 -Desktop: Gnome 3.14.3 -GLX Renderer: GeForce GT 330/PCIe/SSE2 GLX Version: 3.3.0 NVIDIA 340.76 atl-r + r restore the backgrounds. Что за беда?
*** Bug 744889 has been marked as a duplicate of this bug. ***
Just to confirm that I have been experiencing the same issue with GeForce GTX 660 Ti, driver version 346.47 and Fedora 21. I have been experiencing this issue for some time and the only time I'm aware of having seen it is after a resume. Could I suggest building in a repaint of the buffer on resume as it is likely to resolve 99% of the issues people are experiencing.
(In reply to mikey from comment #24) > Just to confirm that I have been experiencing the same issue with GeForce > GTX 660 Ti, driver version 346.47 and Fedora 21. > > I have been experiencing this issue for some time and the only time I'm > aware of having seen it is after a resume. Could I suggest building in a > repaint of the buffer on resume as it is likely to resolve 99% of the issues > people are experiencing. I'm in favor of this. It's a hack but a good one. My background is so depressing right now. Don't go all Bug 650371 on this one, please - it's clearly broken.
(In reply to Alex Hultman from comment #25) > (In reply to mikey from comment #24) > > Just to confirm that I have been experiencing the same issue with GeForce > > GTX 660 Ti, driver version 346.47 and Fedora 21. > > > > I have been experiencing this issue for some time and the only time I'm > > aware of having seen it is after a resume. Could I suggest building in a > > repaint of the buffer on resume as it is likely to resolve 99% of the issues > > people are experiencing. > > I'm in favor of this. There's no polls here. > It's a hack but a good one. It's also not possible. See comment 21. > My background is so > depressing right now. Don't go all Bug 650371 on this one, please - it's > clearly broken. It's a bug in the NVidia driver. I would suggest you contact them and point them here instead of trying to get gnome-shell developers to implement bug fixes which really are work-arounds.
> It's a bug in the NVidia driver. As a suffering user that sentiment is discouraging and lowers my respect for the gnome community, despite knowing it holds truth. As far as i've understood it, the nvidia driver has always behaved that way, and it is gnome that made the change that broke things. I certainly appreciate all efforts to report the bug upstream to nvidia and push for them to fix it, but any temporary hack to address the worse aspect of the bug: refresh after resume; i'm betting would be hugely appreciated by all effected gnome users. Even a simple recipe that users can add to a resume event to force the refresh would be appreciated.
(In reply to Mick Wever from comment #27) > As far as i've understood it, the nvidia driver has always behaved that way, > and it is gnome that made the change that broke things. There are some contracts drawn up when you advertise particular OpenGL features. They're the only ones that don't implement them properly. > I certainly appreciate all efforts to report the bug upstream to nvidia and > push for them to fix it, but any temporary hack to address the worse aspect > of the bug: We're waiting on NVidia to tell us what that work-around should be (comment 19). > refresh after resume; i'm betting would be hugely appreciated by > all effected gnome users. Refresh after resume still isn't possible to detect from the graphics stack (see comment 21, again). > Even a simple recipe that users can add to a resume event to force the > refresh would be appreciated. Use the nouveau drivers, they don't have that problem (comment 5).
> Even a simple recipe that users can add to a resume event to force the > refresh would be appreciated. Would a custom hook in systemd be a possibility for users to add themselves? ref https://wiki.archlinux.org/index.php/Power_management#Hooks_in_.2Fusr.2Flib.2Fsystemd.2Fsystem-sleep
(In reply to Bastien Nocera from comment #28) > (In reply to Mick Wever from comment #27) > > As far as i've understood it, the nvidia driver has always behaved that way, > > and it is gnome that made the change that broke things. > > There are some contracts drawn up when you advertise particular OpenGL > features. They're the only ones that don't implement them properly. > > > I certainly appreciate all efforts to report the bug upstream to nvidia and > > push for them to fix it, but any temporary hack to address the worse aspect > > of the bug: > > We're waiting on NVidia to tell us what that work-around should be (comment > 19). > > > refresh after resume; i'm betting would be hugely appreciated by > > all effected gnome users. > > Refresh after resume still isn't possible to detect from the graphics stack > (see comment 21, again). > > > Even a simple recipe that users can add to a resume event to force the > > refresh would be appreciated. > > Use the nouveau drivers, they don't have that problem (comment 5). I posted a link with a quote from NVIDIA's Aaron Plattner (comment 18). In case you haven't read it: This is a known interaction problem between the way GNOME stores its backgrounds and the fact that FBO contents are not preserved across suspend / resume cycles. In general, applications should avoid relying on FBO contents being preserved across mode switches or VT switches. ---- So the workaround is already out there, I have found similar answers from NVIDIA in other threads. Don't rely on FBO contents being preserved across mode switches. And no - nouveau is not an option, at all. GNOME 3.12 worked fine.
(In reply to Alex Hultman from comment #30) > NVIDIA in other threads. Don't rely on FBO contents being preserved across > mode switches. I'd love to do that, but I don't know when modeswitches happen. If NVIDIA gives me a way to detect when modeswitches happen, even if it's super-specific to their architecture and driver, I will be more than delighted to write some code to repaint the FBO as a workaround. Keep in mind that this is a blatant violation of the OpenGL specification, who claim that FBOs should survive with valid contents for the duration of the FBO. But I understand that writing OpenGL drivers is legitimately hard, and I have a good understanding of why this happens in their architecture. I'd be more than happy to add a hack workaround once I have a way to do it. If there's anything I can do to help this out from the GNOME side, please let me know. > And no - nouveau is not an option, at all. GNOME 3.12 worked fine. GNOME 3.12 had major performance issues with certain background setups, which is why we went to using an FBO to store precached background pixels.
Thanks Jasper for all the explanations. I’ll try to get in touch with nVidia through our hardware support.
Created attachment 299832 [details] [review] meta-background: Add a function to refresh all background instances We need to reload the FBOs under some circumstances, this adds a way to easily do so.
Created attachment 299833 [details] [review] Refresh all background instances after suspend if needed NVIDIA drivers don't preserve FBO contents across suspend / resume cycles which results in broken backgrounds. We can work around that by forcing a refresh when coming out of suspend. -- There, this hack works fine in my testing. Also, despite what Aaron Plattner says, IME VT switching doesn't wipe the background and mode switches are taken care of anyway since we already reload the FBOs on monitors-changed.
Review of attachment 299832 [details] [review]: Sure.
Review of attachment 299833 [details] [review]: OK, I figured it was also on VT switches, which is why I hesitated a bit more. I don't have an NVIDIA machine to actually test with myself, but this looks fine from a blind review. I would add a comment above the hack with a link to this bug or the NVIDIA forums or something to let people know why we're doing this.
Created attachment 299853 [details] [review] Refresh all background instances after suspend if needed -- Ok, added a reference to this bug in a comment to make it clear what's going on. So, how do we want to deal with this? I can ask for a code freeze break if it's deemed worth it and safe enough, otherwise it can go into 3.16.1. This also applies cleanly to the 3.14 branches, so I'll push it there as well if I get an ack.
Review of attachment 299853 [details] [review]: OK. Let's put this in 3.16.1.
(In reply to Rui Matos from comment #37) > This also applies cleanly to the 3.14 branches, so I'll push it there > as well if I get an ack. You don't need an ack for 3.14 - as some stuff has accumulated on the stable branch lately, I'm planning to do a final release these days, so please push it there before 3.16.1.
Review of attachment 299853 [details] [review]: ::: src/shell-util.c @@ +387,3 @@ + return FALSE; + + if (g_strcmp0 (get_gl_vendor (), "NVIDIA")) Aren't we missing a '== 0' here?
Created attachment 299931 [details] [review] Refresh all background instances after suspend if needed -- (In reply to Debarshi Ray from comment #40) > Aren't we missing a '== 0' here? strcmp, The Deceiver, strikes again. And of course, this means that the vendor string was wrong too, is should be "NVIDIA Corporation". Thanks for catching it!
Wouldn't it be better to always redraw the FBO on resumes, regardless if we are using NVIDIA or not. String comparison is maybe a little too specific and haxxy. I mean - a redraw per resume isn't that much of an overhead.
That makes the patch a lot simpler too: 14 const LoginManager = imports.misc.loginManager; 253 // NVIDIA drivers don't preserve FBO contents across 254 // suspend/resume, see 255 // https://bugzilla.gnome.org/show_bug.cgi?id=739178 257 LoginManager.getLoginManager().connect('prepare-for-sleep', 258 function(lm, suspending) { 259 if (suspending) 260 return; 261 Meta.Background.refresh_all(); 262 });
Created attachment 299935 [details] [review] Refresh background on resume
(In reply to Alex Hultman from comment #42) > Wouldn't it be better to always redraw the FBO on resumes, regardless if we > are using NVIDIA or not. Why do more work than we need to? > String comparison is maybe a little too specific > and haxxy. It's what everyone does, including game engines, etc.
I'm not arguing. Your patch will be much welcomed to my workstation. But no - string comparison is not whats done in game engines or what "everyone" does (I'm most certainly not that kind of "everyone"). Anyhow - you know more GNOME than I do and this patch of yours works so lets just commit it and be happy.
afaik when it comes to match driver vendors, that is done with string comparisons in the opengl world...
Will push to master after 3.16 is out. 46d57bf..6b3be51 gnome-3-14 -> gnome-3-14 Attachment 299832 [details] pushed as 6b3be51 - meta-background: Add a function to refresh all background instances
4b0cbaf..d19e78a gnome-3-14 -> gnome-3-14 Attachment 299931 [details] pushed as d19e78a - Refresh all background instances after suspend if needed
Comment on attachment 299832 [details] [review] meta-background: Add a function to refresh all background instances Attachment 299832 [details] pushed as 7c5fe42 - meta-background: Add a function to refresh all background instances
Attachment 299931 [details] pushed as c3bf4a3 - Refresh all background instances after suspend if needed
My problem is solved [maxim@localhost ~]$ inxi -S -G System: Host: localhost.localdomain Kernel: 3.19.3-200.fc21.x86_64 x86_64 (64 bit) Desktop: Gnome 3.14.4 Distro: Fedora release 21 (Twenty One) Graphics: Card: NVIDIA G92 [GeForce GT 330] Display Server: Fedora X.org 116.3 driver: N/A Resolution: 1280x1024@60.02hz GLX Renderer: GeForce GT 330/PCIe/SSE2 GLX Version: 3.3.0 NVIDIA 340.76
The patch for 3.14.4 does fix the problem for images as wallpapers, but for colors as wallpapers I'm still getting garbage. Just thought I'd throw that in here, I haven't looked into why or anything.
works a charm. (also on an integrated 'Intel HD Graphics 5500', from which i suffered the same problem (which doesn't match the discussions above))
(In reply to John Chadwick from comment #53) > The patch for 3.14.4 does fix the problem for images as wallpapers, but for > colors as wallpapers I'm still getting garbage. Just thought I'd throw that > in here, I haven't looked into why or anything. Commenting in closed bugs isn't a good way to attract attention. Anyway, I couldn't reproduce this. How are you setting the background color? Which GPU and driver version are you using?
*** Bug 748785 has been marked as a duplicate of this bug. ***
Can someone give instructions to apply the workaround patches and rebuild the gnome packages. Thanks.
This appears to not work if I have 'systemctl hibernate' scheduled from a /etc/cron.d/hibernate-at-4am script. I still see the desktop corruption. However, if I alt-f2 then 'systemctl hibernate', I see no corruption.
I upgraded my Debian from Wheezy to Jessie and the problem has gone with the upgraded Gnome. Not sure the fix will be backported
For your information, here are the Gnome Shell versions for Debian: Debian Wheezy : 3.4.2 (bug) Debian Jessie : 3.14.2 (fixed)
Did the fix make it to 3.16.3 as well? I am seeing this again in Fedora 22.
(In reply to Julian Sikorski from comment #61) > Did the fix make it to 3.16.3 as well? Yes, 3.16.1 to be exact.
(In reply to Florian Müllner from comment #62) > (In reply to Julian Sikorski from comment #61) > > Did the fix make it to 3.16.3 as well? > > Yes, 3.16.1 to be exact. Then it is still broken. I have to alt+f2 -> r or set a new background wallpaper after resuming from hibernate.
(In reply to Hussam Al-Tayeb from comment #63) > Then it is still broken. I have to alt+f2 -> r > or set a new background wallpaper after resuming from hibernate. Does it work if instead of 'systemctl hibernate' you call the logind dbus interface? Something like: gdbus call --system --dest org.freedesktop.login1 --object-path /org/freedesktop/login1 --method org.freedesktop.login1.Manager.Hibernate false
Yes. That worked. The background looked fine. The up arrow in the lockscreen looked fuzzy but after unlocking, the background looked fine. Is this way of hibernating something I can use through a cron job in (/etc/cron.d/) without requiring my user is logged on?
(In reply to Hussam Al-Tayeb from comment #65) > Is this way of hibernating something I can use through a cron job in > (/etc/cron.d/) without requiring my user is logged on? I'm actually not sure. Try it? Anyway, at this point, we're getting offtopic. This is how systemd works. The API we use to know that we're resuming from suspend/hibernate is provided by logind and its docs[1] say: "Note that this will only be sent out for suspend/resume cycles done via logind, i.e. generally only for high-level user-induced suspend cycles, and not automatic, low-level kernel induced ones which might exist on certain devices with more aggressive power management." I don't know of any lower level API that we could use to know when the machine is resuming. [1] http://www.freedesktop.org/wiki/Software/systemd/inhibit/
(In reply to Florian Müllner from comment #62) > (In reply to Julian Sikorski from comment #61) > > Did the fix make it to 3.16.3 as well? > > Yes, 3.16.1 to be exact. For me it still happens with 3.16.3 - not every time, however. I suspend my laptop (suspend to RAM) either by closing the lid, or by letting it run until it suspends.
Would it be possible to tell we resumed from hibernate when clock changes? For example, I hibernated at 11:55 and resumed at 12:55. The kernel is reporting a different date or time so redraw the background.
yes, timerfd() will fire any time the system is resumed if the TFD_TIMER_CANCEL_ON_SET flag is used. Though, it's clearly cleaner to use the systemd apis (except I guess they don't cover all cases)
Does only the user background get refreshed or the greeter one as well?
Hi again. There is another place which needs to be refreshed upon resume from hibernate. The up-arrow in the lockscreen gets corrupted. Should I open a new bug report? Thank you!
It doesn't work that well for me. I would re-open but I cannot. I see many corruptions in background and the login shield even when I haven't suspended. After the monitor blanks for a while I can get corruptions.
My login shield is completely corrupted and won't update even if I pull it up and down and login and logout. I use 358.16 currently.
Please report it to nvidia, especially if it worked with previous driver versions.
(In reply to Rui Matos from comment #74) > Please report it to nvidia, especially if it worked with previous driver > versions. I think he is referring to the background in the gdm session. That one still doesn't get refreshed. Only the user session's backgrounds are refreshed.
A new extension, NV_robustness_video_memory_purge, was designed and implemented by NVIDIA to notify applications of events where the contents of video memory were lost. This shall enable the use of a workaround as described in comment #19: "If NVIDIA gives me a way to detect when modeswitches happen, even if it's super-specific to their architecture and driver, I will be more than delighted to write some code to repaint the FBO as a workaround." This extension will be implemented in the upcoming r367 beta driver version (expected to be released within 2 weeks). The text of the extension is pending official publication by Khronos, and has been provided to RedHat through separate channels. -- Arthur Huillet NVIDIA Linux graphics
(In reply to Arthur Huillet from comment #76) > A new extension, NV_robustness_video_memory_purge, was designed and > implemented by NVIDIA to notify applications of events where the contents of > video memory were lost. > This shall enable the use of a workaround as described in comment #19: > "If NVIDIA gives me a way to detect when modeswitches happen, even if it's > super-specific to their architecture and driver, I will be more than > delighted to write some code to repaint the FBO as a workaround." > > This extension will be implemented in the upcoming r367 beta driver version > (expected to be released within 2 weeks). > The text of the extension is pending official publication by Khronos, and > has been provided to RedHat through separate channels. > > -- > Arthur Huillet > NVIDIA Linux graphics Perhaps nvidia can post a patch for mutter/gnome-shell to use the new function so users such as myself try it? :)
(In reply to Arthur Huillet from comment #76) > A new extension, NV_robustness_video_memory_purge, was designed and > implemented by NVIDIA to notify applications of events where the contents of > video memory were lost. > This shall enable the use of a workaround as described in comment #19: > "If NVIDIA gives me a way to detect when modeswitches happen, even if it's > super-specific to their architecture and driver, I will be more than > delighted to write some code to repaint the FBO as a workaround." > > This extension will be implemented in the upcoming r367 beta driver version > (expected to be released within 2 weeks). > The text of the extension is pending official publication by Khronos, and > has been provided to RedHat through separate channels. > > -- > Arthur Huillet > NVIDIA Linux graphics By the way, I tried the nvidia 367.18 driver on my nvidia 630 GT (a fermi GF108 card) and it was locking up with XID errors 5 minutes after booting. reverting to the 364.19 driver fixed it. But I did see the new extension in the release notes.
[When replying, please avoid full quotes of previous comments for no reason, but instead only keep relevant lines. Thank you!]
Created attachment 329063 [details] [review] cogl-winsys-glx: Allow creating a GL context with robustness enabled This API allows callers to specify that they're able to handle a GL context with robustness enabled. We make this opt-in since robustness contexts change how glGetError() behaves. -- The context creating bits aren't right according to the spec[0] yet because the currently available NVIDIA driver that implements this doesn't work like that yet. I'm told it will change to the spec'ed API on a future release. I'm attaching this as is now for review of the general approach and for anyone that wants to test with the 367.18 driver. [0] https://www.opengl.org/registry/specs/NV/robustness_video_memory_purge.txt
Created attachment 329064 [details] [review] cogl-context: Add a cogl_get_graphics_reset_status API If the driver supports the GL_ARB_robustness extension we can expose the graphics reset status this way.
Created attachment 329065 [details] [review] cogl: Ignore GL_CONTEXT_LOST when checking for GL errors When using a context with robustness, glGetError() may return GL_CONTEXT_LOST at any time and this error doesn't get cleared until the application calls glGetGraphicsResetStatus() . This means that our error checking can't call glGetError() in a loop without checking for that return value and returning in that case.
Created attachment 329066 [details] [review] compositor: Handle GL context reset errors Emit a signal so that interested parties can recreate their FBOs and queue a full scene graph redraw to ensure we don't end up showing graphical artifacts. This relies on the GL driver supporting the ARB_robustness extension and cogl creating a suitable GL context. For now we only make use of it with the X backend since the only driver with which this is useful is NVIDIA.
Created attachment 329067 [details] [review] MetaBackground: invalidate contents on loss of graphics context We use FBOs so we need to cause them to be recreated if we get a GL context lost error.
Created attachment 329068 [details] [review] MetaSurfaceActorX11: invalidate the stex on loss of graphics context MetaShapedTexture uses FBOs when mipmapping so we need to cause them to be recreated if we get a GL context lost error.
Created attachment 329071 [details] [review] Revert "meta-background: Add a function to refresh all background instances" We now handle GL context lost errors properly so we don't need this anymore. This reverts commit 7c5fe42835ea303a44a837646cfb8f7e1a72e681.
Created attachment 329073 [details] [review] Revert "Refresh all background instances after suspend if needed" This is now handled entirely in cogl and mutter. This reverts commit c3bf4a325d5b0c2b0b0115e8177d62e33c847749.
Created attachment 329074 [details] [review] main: Reload theme on loss of graphics context The theme machinery uses FBOs in some cases (mainly for shadows) which need to be reloaded if we get a GL context lost error.
Will those patches add a dependency on the new driver for people using nvidia cards? Pre-Fermi cards are on a legacy driver branch that is at feature freeze and won't get the new extension.
Nvidia fixed my xorg driver crash. I will try those patches.
Created attachment 329790 [details] [review] cogl-winsys-glx: Allow creating a GL context with robustness enabled This API allows callers to specify that they're able to handle a GL context with robustness enabled. We make this opt-in since robustness contexts change how glGetError() behaves.
Created attachment 329791 [details] [review] cogl-context: Add a cogl_get_graphics_reset_status API If the driver supports the GL_ARB_robustness extension we can expose the graphics reset status this way.
Created attachment 329792 [details] [review] cogl: Ignore GL_CONTEXT_LOST when checking for GL errors When using a context with robustness, glGetError() may return GL_CONTEXT_LOST at any time and this error doesn't get cleared until the application calls glGetGraphicsResetStatus() . This means that our error checking can't call glGetError() in a loop without checking for that return value and returning in that case.
Created attachment 329793 [details] [review] restart: Make meta_restart() work without a message In some cases there's no meaningful message to show.
Created attachment 329794 [details] [review] compositor: Handle GL context reset errors -- As discussed with Owen on irc, I changed this to restart the shell on any other robustness errors since there's isn't much we can do according to the spec as we can't easily tear down a GL context and create a new one.
Created attachment 329795 [details] [review] MetaBackground: invalidate contents on loss of graphics context We use FBOs so we need to cause them to be recreated if we get a GL context lost error.
Created attachment 329796 [details] [review] MetaSurfaceActorX11: invalidate the stex on loss of graphics context MetaShapedTexture uses FBOs when mipmapping so we need to cause them to be recreated if we get a GL context lost error.
Review of attachment 329071 [details] [review]: let's keep this in for users who are stuck with older drivers
Review of attachment 329073 [details] [review]: ditto
Ok, the 367.27 driver is out, complying with the spec, and these rebased patches now work with it.
Ok, I tried the patches. The background refreshes after resuming. The floating arrow in the lock screen is still corrupted though. Can that be refreshed as well or is it a different issue?
Created attachment 329802 [details] [review] screenShield: Stop using an offscreen buffer for the arrow actor This isn't a performance critical actor and the NVIDIA driver discards offscreen buffers in some cases which would require us to go through extra hoops to handle here which isn't worth it. -- (In reply to Hussam Al-Tayeb from comment #101) > The floating arrow in the lock screen is still corrupted though. Try adding this gnome-shell patch too.
I applied that patch, restarted, and the floating arrow is still corrupted after resuming from hibernate. So this particular issue is still there. However, the refreshed background much faster now after resuming from hibernate.
Created attachment 329812 [details] [review] screenShield: Make the arrow redraw on style-changed We need to invalidate the underlying ClutterCanvas for it to re-create the PBO and emit the draw signal. -- Can you test with this one on top too? Note that I can't reproduce this particular issue here, but I can see why it's failing.
Review of attachment 329812 [details] [review]: Wait, StDrawingArea already does this on its style_changed() vfunc so this really shouldn't be needed
(In reply to Rui Matos from comment #105) > Review of attachment 329812 [details] [review] [review]: > > Wait, StDrawingArea already does this on its style_changed() vfunc so this > really shouldn't be needed It did however fix my issue.
Created attachment 329855 [details] [review] screenShield: Chain up Arrow's vfuncs Style changes weren't reaching the drawing area widget. Paint was being chained directly which works but the correct thing to do is chain up and let St.Widget's implementation do the propagation. -- Thanks for testing. This should be the right fix.
(In reply to Rui Matos from comment #107) > Created attachment 329855 [details] [review] [review] > screenShield: Chain up Arrow's vfuncs > > Style changes weren't reaching the drawing area widget. Paint was > being chained directly which works but the correct thing to do is > chain up and let St.Widget's implementation do the propagation. > -- > > Thanks for testing. This should be the right fix. I reverted the old patch in my local tree and applied this one. The arrow shows correctly after resuming from hibernate. Looks perfect!
Review of attachment 329855 [details] [review]: I don't think chaining up the paint is correct - if we look at st-widget.c, we see: static void st_widget_paint (ClutterActor *actor) { st_widget_paint_background (ST_WIDGET (actor)); /* Chain up so we paint children. */ CLUTTER_ACTOR_CLASS (st_widget_parent_class)->paint (actor); } So if you chain up afterwards, you draw the background on *top* of whatever drawing you are doing. Compare st-label.c or st-icon.c - their paint implementations call st_widget_draw_background() as the first thing, then they do custom drawing, then they call paint on their child.
Review of attachment 329795 [details] [review]: Looks fine
Review of attachment 329793 [details] [review]: Should add (allow none) annotation to meta_restart(), good to commit with that.
Review of attachment 329793 [details] [review]: [ I mean allow-none with the dash ]
Review of attachment 329794 [details] [review]: ::: src/compositor/compositor.c @@ +1103,3 @@ +static void +redraw_all_children (ClutterActor *actor) It would be quite useful to document why this rather than clutter_actor_queue_redraw(stage); - my thought is that this will cause all the FBO caches for offscreen effects to be redrawn - but that's not sufficient - the fbo's actually need to be destroyed and recreated which this won't cause. @@ +1149,3 @@ + since we don't enable robustness in that case. */ + g_assert (!meta_is_wayland_compositor ()); + meta_restart (NULL); Suggest a break; here - we typically use a break; on the last statement of the case, even if it's a default; statement that will stay last. (Without a break; it reads to me like meta_restart() is a non-returning function - and the restart is async.) ::: src/core/display.c @@ +345,3 @@ + display_signals[GL_CONTEXT_LOST] = + g_signal_new ("gl-context-lost", I think this name is now wrong, since it doesn't apply to a generic context that you have to recreate, it refers to a purged context that you can recover by following certain steps. Suggest gl-video-memory-purged since that describes what those who connect need to do. (gl-context-purged also would make sense in the context of the extension but is less descriptive.)
Review of attachment 329802 [details] [review]: I can't figure out why Giovanni added this in the first place - we already cache both the cairo drawing for the arrow and the shadow - so drawing this widget should be about as cheap as possible. As far as I can tell, things should work fine without, but just make sure that you've checked that the arroow shadow draws correctly now.
Review of attachment 329796 [details] [review]: ::: src/compositor/meta-surface-actor-x11.c @@ +431,3 @@ + + meta_shaped_texture_set_texture (stex, NULL); + meta_shaped_texture_set_texture (stex, priv->texture); Conceptually, this at the wrong level - MetaSurfaceActorX11 doesn't "know" that MetaShapedTexture uses a MetaTextureTower or that MetaTextureTower uses a FBO. *With* a comment added here along the lines of: /* Setting the texture to NULL will cause all the FBO's cached by the * shaped texture's MetaTextureTower to be discarded and recreated. * I'm OK with this because we're really just working around something. If FBO's had this behavior per the GL spec then I think the callback would be best on CoglOffscreen or CoglContext and not on the MetaDisplay.
Review of attachment 329790 [details] [review]: ::: cogl/cogl/cogl-glx.h @@ +78,3 @@ +void +cogl_glx_set_allow_robustness (CoglBool allow); This obviously is a low-overhead way of arranging things, and avoids patching Clutter, but I don't think it fits in well with how Cogl works - there are no similar global functions that affect the behavior. I'd expect cogl_[xlib?]_renderer_set_allow_robustness() or something like that. ::: cogl/cogl/winsys/cogl-winsys-glx.c @@ +1042,3 @@ + GL_TRUE, + GLX_CONTEXT_RESET_NOTIFICATION_STRATEGY_ARB, + GLX_LOSE_CONTEXT_ON_RESET_ARB, Hmm, so this is going to X error unless *both* robustness and GL_NV_robustness_video_memory_purge are present and can be enabled. This makes the name set_allow_robustness() questionable. I'm a bit confused about whether enabling robustness is useful without the video_memory_purge() I'd suggest either: * Only request reset_video_memory_purge if the extension is available * Or rename enable_robustness to request_reset_on_video_memory_purge() - but then it's unclear why you get all the other context-loss notifications - so I like this less. In general, I think I'd rather see the code check for the robustness extension before trying to enable this stuff - generating an X error on every startup is harmless but ugly.
Review of attachment 329791 [details] [review]: ::: cogl/cogl/cogl-context.c @@ +792,3 @@ + +CoglGraphicsResetStatus +cogl_get_graphics_reset_status (CoglContext *context) Needs some basic docs, at least to describe that this requires enabling the extension. ::: cogl/cogl/cogl-context.h @@ +380,3 @@ + COGL_GRAPHICS_RESET_STATUS_INNOCENT_CONTEXT_RESET, + COGL_GRAPHICS_RESET_STATUS_UNKNOWN_CONTEXT_RESET, + COGL_GRAPHICS_RESET_STATUS_PURGED_CONTEXT_RESET, Needs docs
Created attachment 330156 [details] [review] cogl-winsys-glx: Add support for NV_robustness_video_memory_purge This adds API to allow callers to specify that they're interested in video memory purge errors. -- (In reply to Owen Taylor from comment #116) > I'd expect cogl_[xlib?]_renderer_set_allow_robustness() Ok, changed to something similar > Hmm, so this is going to X error unless *both* robustness and > GL_NV_robustness_video_memory_purge are present and can be enabled. This > makes the name set_allow_robustness() questionable. I'm a bit confused about > whether enabling robustness is useful without the video_memory_purge() I was confused too and asked Arthur who clarified that indeed as of the latest spec version we don't need to enable robust buffer access. We only need to check for the availability of the GetGraphicsResetStatusARB api and enable both GLX_CONTEXT_RESET_NOTIFICATION_STRATEGY_ARB and GLX_GENERATE_RESET_ON_VIDEO_MEMORY_PURGE_NV. > I'd suggest either: > > * Only request reset_video_memory_purge if the extension is available This is what I did here, and I also changed the cogl api as you suggest to reflect the clarification above.
Created attachment 330158 [details] [review] cogl-context: Add a cogl_get_graphics_reset_status API If the driver supports the GL_ARB_robustness extension we can expose the graphics reset status this way. -- Added documentation
Created attachment 330159 [details] [review] cogl: Ignore GL_CONTEXT_LOST when checking for GL errors -- rebased
Created attachment 330161 [details] [review] clutter/x11: Add API to request video memory purges to be reported -- This is the clutter call
Created attachment 330162 [details] [review] restart: Make meta_restart() work without a message -- Added allow-none
Created attachment 330163 [details] [review] compositor: Handle GL video memory purged errors -- (In reply to Owen Taylor from comment #113) > +static void > +redraw_all_children (ClutterActor *actor) > > It would be quite useful to document why this rather than > clutter_actor_queue_redraw(stage); - my thought is that this will cause all > the FBO caches for offscreen effects to be redrawn - but that's not > sufficient - the fbo's actually need to be destroyed and recreated which > this won't cause. This was actually a leftover of an earlier version where I was still experimenting. I don't think it's needed and replaced it with a simple full stage redraw. Also addressed your other points.
Created attachment 330164 [details] [review] MetaBackground: invalidate contents on video memory purged errors -- rebased
Created attachment 330165 [details] [review] MetaSurfaceActorX11: invalidate the stex on video memory purged errors -- Added a comment as suggested
Created attachment 330166 [details] [review] main: Reload theme on video memory purge errors -- rebased
Created attachment 330168 [details] [review] screenShield: Stop using an offscreen buffer for the arrow actor -- (In reply to Owen Taylor from comment #114) > I can't figure out why Giovanni added this in the first place - we already > cache both the cairo drawing for the arrow and the shadow - so drawing this > widget should be about as cheap as possible. As far as I can tell, things > should work fine without, but just make sure that you've checked that the > arroow shadow draws correctly now. Yes, it seems to work fine without this, including the shadow.
Created attachment 330169 [details] [review] screenShield: Chain up Arrow's style_changed vfunc -- (In reply to Owen Taylor from comment #109) > Review of attachment 329855 [details] [review] [review]: > > I don't think chaining up the paint is correct Right. Changed it to only chain up style_changed
(In reply to Rui Matos from comment #127) > Yes, it seems to work fine without this, including the shadow. Actually, after a closer look, the arrow shadows weren't being painted properly, see bug 767954. It still works fine with this patch though.
Review of attachment 330165 [details] [review]: Looks good
Review of attachment 330166 [details] [review]: Looks fine. (I shudder at the thought of doing more IO on resume from suspend, but it's nicely encapsulated to only NVIDIA driver versions that need it.)
Review of attachment 330168 [details] [review]: OK
Review of attachment 330169 [details] [review]: OK
Review of attachment 330162 [details] [review]: OK
Review of attachment 330161 [details] [review]: Looks fine
Review of attachment 330163 [details] [review]: Looks good
Review of attachment 330164 [details] [review]: Looks good
Review of attachment 330158 [details] [review]: Looks good
Review of attachment 330156 [details] [review]: Looks good except for docs/comments ::: cogl/cogl/cogl-xlib-renderer.h @@ +176,3 @@ +void +cogl_xlib_renderer_request_reset_on_video_memory_purge (CoglRenderer *renderer, + CoglBool enable); I'd like to see docs here. I think this is the proper place to explain: - that certain NVIDIA drivers lose frame buffer contents on suspend, etc. - that the NV_robustness_video_memory_purge extension allows getting notification - that if the extension isn't present, the driver doesn't have that issue ANd also describe how you find out about the reset. If enabling this can cause other resets to be reported, that should be mentioned here as something that a user needs to be aware of. ::: cogl/cogl/winsys/cogl-winsys-glx.c @@ +1031,3 @@ None }; + static const int attrib_list_reset_on_purge[] = I'd like a comment here that has a short explanation of the relationship between NV_robustness_video_memory_purge and the robustness extension - basically what you found out from ahuillet @@ +1050,3 @@ + if (display->renderer->xlib_want_reset_on_video_memory_purge && + strstr (glx_renderer->glXQueryExtensionsString (xlib_renderer->xdpy, Need to comment why you aren't using COGL_WINSYS_FEATURE
Created attachment 330457 [details] [review] cogl-winsys-glx: Add support for NV_robustness_video_memory_purge -- Added the requested docs and comments.
(In reply to Rui Matos from comment #120) > Created attachment 330159 [details] [review] [review] > cogl: Ignore GL_CONTEXT_LOST when checking for GL errors This one doesn't have the review flag yet, please don't forget about it
Review of attachment 330457 [details] [review]: Looks good
Review of attachment 330159 [details] [review]: Looks good
Attachment 330158 [details] pushed as 0f2be43 - cogl-context: Add a cogl_get_graphics_reset_status API Attachment 330159 [details] pushed as d4d2bf0 - cogl: Ignore GL_CONTEXT_LOST when checking for GL errors Attachment 330161 [details] pushed as 3691eb6 - clutter/x11: Add API to request video memory purges to be reported Attachment 330162 [details] pushed as 7ed14e0 - restart: Make meta_restart() work without a message Attachment 330163 [details] pushed as 7f6bcea - compositor: Handle GL video memory purged errors Attachment 330164 [details] pushed as 53993ba - MetaBackground: invalidate contents on video memory purged errors Attachment 330165 [details] pushed as cc6efeb - MetaSurfaceActorX11: invalidate the stex on video memory purged errors Attachment 330457 [details] pushed as 87f9927 - cogl-winsys-glx: Add support for NV_robustness_video_memory_purge
Attachment 330166 [details] pushed as 358f64d - main: Reload theme on video memory purge errors Attachment 330168 [details] pushed as a7562b4 - screenShield: Stop using an offscreen buffer for the arrow actor Attachment 330169 [details] pushed as b5dd4d1 - screenShield: Chain up Arrow's style_changed vfunc
I hate to mention bad news but this is broken again after today's mutter checkins. Background gets corrupted but floating arrow still renders correctly.
Reopening given the previous comment.
System: Host: Kernel: 3.16.0-4-amd64 x86_64 (64 bit) Desktop: Gnome 3.14.4 Distro: Debian GNU/Linux 8 Graphics: Card: NVIDIA GK107 [GeForce GTX 650] Display Server: X.Org 1.16.4 driver: nvidia Resolution: 1440x900@59.89hz, 1680x1050@59.88hz GLX Renderer: GeForce GTX 650/PCIe/SSE2 GLX Version: 4.4.0 NVIDIA 340.96 I have this problem. Any advice about the current status? Thanks in advance, AA
NVIDIA driver 375.10 seems to fix the corrupted background on switching VT. I tried like a dozen times already. The background corruption still happens however on resuming from hibernate.
FWIW, the following two commits broke the build when using the GLES2 driver, as neither GL_CONTEXT_LOST nor GL_GUILTY_CONTEXT_RESET_ARB are defined in the headers (at least not in the headers provided by mesa 13.0.4): * cogl: Ignore GL_CONTEXT_LOST when checking for GL errors https://git.gnome.org/browse/mutter/commit/?id=d4d2bf0f6c1737256b921c4f1dedd3a95138cab9 * cogl-context: Add a cogl_get_graphics_reset_status API https://git.gnome.org/browse/mutter/commit/?id=0f2be43af479f5bae41b00fc857324b27550e410
I've filed bug 781398 to deal with the GLES2 issue. See patches attached there, thanks!
Right now only the animated arrow is corrupted. Wallpaper refreshes correctly. So some improvement :)
Created https://gitlab.gnome.org/GNOME/mutter/merge_requests/13 that should fix the regression.
(In reply to Jonas Ådahl from comment #153) > Created https://gitlab.gnome.org/GNOME/mutter/merge_requests/13 that should > fix the regression. Wow. Thank you very very much for getting back to this bug. I'm cloning mutter sources now and will try the patch.
(In reply to Hussam Al-Tayeb from comment #154) > (In reply to Jonas Ådahl from comment #153) > > Created https://gitlab.gnome.org/GNOME/mutter/merge_requests/13 that should > > fix the regression. > > Wow. Thank you very very much for getting back to this bug. I'm cloning > mutter sources now and will try the patch. Looks fixed with the patch. Thanks again Jonas!
(In reply to Hussam Al-Tayeb from comment #155) > (In reply to Hussam Al-Tayeb from comment #154) > > (In reply to Jonas Ådahl from comment #153) > > > Created https://gitlab.gnome.org/GNOME/mutter/merge_requests/13 that should > > > fix the regression. > > > > Wow. Thank you very very much for getting back to this bug. I'm cloning > > mutter sources now and will try the patch. > > Looks fixed with the patch. Thanks again Jonas! Thanks for testing, always appreciated!
Could you provide a release version / date for this fix?
It appears the fix went into 3.28.0. But it's not working today in 3.32.x and 3.33.x so please see this new bug instead: https://gitlab.gnome.org/GNOME/gnome-shell/issues/1084 or if you are using Ubuntu: https://bugs.launchpad.net/bugs/1809407