GNOME Bugzilla – Bug 747489
No access to the Visual used to build the CoglOnscreen
Last modified: 2015-07-17 22:42:14 UTC
This is a GDK/X11 backend bug. Starting from GDK 3.16, the X11 visual selected for the system and RGBA GdkVisual is not the first one, but the most GLX-compatible. This means that the visuals selected by GDK, Cogl, and Clutter no longer match, and the nVidia driver is apparently either fairly strict in issuing BadMatch errors, or sorts the visuals in a different order, thus leading to a BadMatch error. The whole thing gets fairly complicated because neither Clutter nor Cogl has accessors to retrieve the X11 Visual (or the Visual ID) it uses, thus making synchronization impossible. We'll probably need to sneak some API into 1.22 to fix this mess.
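A minimal C sketch of why driver ordering matters. This is a self-contained model, not GLX: `FakeFBConfig`, `pick_first_with_depth_stencil()` and the data arrays are hypothetical, and the "take the first config that has alpha, depth and stencil" rule is an assumption about how Cogl chooses, based on the description in this bug. It shows that the same set of configs, reported in two different orders, yields two different X11 Visual IDs; if GDK created the window with one visual and Cogl renders with a config for another, the driver can answer with BadMatch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified model of a GLXFBConfig. */
typedef struct {
    unsigned long visual_id; /* X11 Visual ID the config maps to */
    int alpha_size;
    int depth_size;
    int stencil_size;
} FakeFBConfig;

/* Assumed Cogl-like rule: return the FIRST config that offers an alpha
 * channel plus depth and stencil buffers, or NULL if none does. */
static const FakeFBConfig *
pick_first_with_depth_stencil (const FakeFBConfig *configs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (configs[i].alpha_size > 0 &&
            configs[i].depth_size > 0 &&
            configs[i].stencil_size > 0)
            return &configs[i];
    }
    return NULL;
}

/* Illustrative data: the same three configs, in the order two different
 * drivers might report them. */
static const FakeFBConfig driver_order_a[] = {
    { 0x21, 8, 24, 8 },
    { 0x22, 8,  0, 0 },
    { 0x23, 8, 16, 8 },
};

static const FakeFBConfig driver_order_b[] = {
    { 0x23, 8, 16, 8 },
    { 0x22, 8,  0, 0 },
    { 0x21, 8, 24, 8 },
};
```

With `driver_order_a` the rule picks visual 0x21; with `driver_order_b` it picks 0x23, even though both lists contain exactly the same configs.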
Or maybe not. We set the Visual on the window when we create it via GDK, but we ignore that visual and we just use the one from Cogl. We need to ensure that Cogl uses the visual from GDK when the window is foreign.
Created attachment 301110 [details] [review] renderer: Add support for foreign Xlib visuals If Cogl is being used with another toolkit the visual selection may be decided by that toolkit.
Created attachment 301111 [details] [review] glx: Add foreign visual support when selecting FBConfig If the CoglRenderer has a foreign visual we give precedence to finding a GLXFBConfig for that visual.
Created attachment 301112 [details] [review] glx: Plug a leak The glXGetVisualFromFBConfig() function returns a newly allocated XVisualInfo object that we need to free.
Created attachment 301114 [details] [review] Bump Cogl requirement We will need to use additional, semi-private API.
Created attachment 301115 [details] [review] gdk: Set the foreign visual on the Cogl renderer Since release 3.16, GDK does not pick the first visual available, but it will try to look up the most GLX-compatible visual it can find. This means that the visual chosen by GDK and the one chosen by Cogl are going to differ. Cogl 1.20.1 added new API to allow embedding toolkits to define a "foreign" Visual, so we can use that API inside Clutter. This commit should fix the BadMatch crashes on the nVidia binary Linux driver for applications using Clutter and Clutter-GTK.
The first 3 patches are for Cogl, and add new API to set the foreign Visual on the CoglRenderer object. The last 2 patches are for Clutter, and use the newly added Cogl API to ensure that Cogl uses the correct visual coming from GDK.
Created attachment 301136 [details] [review] gdk: Set the foreign visual on the Cogl renderer / v2 Now with more debugging messages.
Created attachment 301137 [details] [review] gdk/stage: Check the visual of foreign windows If the visual does not match the system or RGBA ones then we cannot guarantee that the GdkWindow will be compatible with Cogl rendering.
Created attachment 301138 [details] [review] glx: Add foreign visual support when selecting FBConfig / v2 Add more debugging messages.
After some back and forth on IRC, and some testing, the patches turned out not to be enough. GDK does not ask for stencil/depth buffers explicitly, whereas Cogl does. This makes the nvidia driver return a different set of visuals. GDK has code that checks whether a GLX visual is compatible with the system/ARGB one, and Cogl should adopt the same check.
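The idea behind attachment 301111, and the reason it fell short, can be modelled in a few lines of C. This is a hypothetical sketch, not Cogl's code: `FakeFBConfig`, `choose_fbconfig()`, the fallback behaviour and the use of 0 to mean "no foreign visual set" are all assumptions. The embedder's visual is tried first, but only among configs that also satisfy Cogl's depth/stencil requirement; if the driver exposes that visual only without depth and stencil buffers, the search falls back to a different visual, and the mismatch with GDK's window comes back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a GLXFBConfig. */
typedef struct {
    unsigned long visual_id;
    int depth_size;
    int stencil_size;
} FakeFBConfig;

/* Assumed Cogl requirement: both a depth and a stencil buffer. */
static bool
meets_requirements (const FakeFBConfig *c)
{
    return c->depth_size > 0 && c->stencil_size > 0;
}

/* Give precedence to the foreign visual (0 = none set); otherwise fall back
 * to the first config meeting the requirements. *used_foreign reports which
 * path was taken. */
static const FakeFBConfig *
choose_fbconfig (const FakeFBConfig *configs, size_t n,
                 unsigned long foreign_visual_id, bool *used_foreign)
{
    *used_foreign = false;

    if (foreign_visual_id != 0) {
        for (size_t i = 0; i < n; i++) {
            if (configs[i].visual_id == foreign_visual_id &&
                meets_requirements (&configs[i])) {
                *used_foreign = true;
                return &configs[i];
            }
        }
    }

    for (size_t i = 0; i < n; i++) {
        if (meets_requirements (&configs[i]))
            return &configs[i];
    }
    return NULL;
}

/* Illustrative data: GDK picked visual 0x22, which has no depth/stencil. */
static const FakeFBConfig example_configs[] = {
    { 0x21, 24, 8 },
    { 0x22,  0, 0 },
};
```

Passing GDK's visual (0x22) as the foreign visual still returns the config for 0x21, which is the GDK/Cogl disagreement that keeps the crash alive.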
*** Bug 747760 has been marked as a duplicate of this bug. ***
There was a stupid boolean flag check that got reversed, which meant that Clutter/Cogl was relying on drivers returning an ARGB visual first by default even when we weren't asking — which apparently is the case for the intel drivers. I just pushed the one-line fix to the master and clutter-1.22 branches, so I'd like some testing to see if that also fixes this issue with the binary nVidia drivers.
As noted on IRC already, here again for reference. After testing the current patches rebased on master, including the fix mentioned in the above comment, a matching visual could still not be found, leading to the same crash. The reason is still mainly the visual mismatch between the GDK visual and the one Clutter/Cogl wants, which differ in their stencil/depth buffer attributes. The proposed solutions were either to change GDK to request a visual with stencil/depth buffers, or to do what the x11 backend does and replace the specific window's visual with a new one that uses stencil/depth buffers. I hope this description is roughly correct.
*** Bug 743897 has been marked as a duplicate of this bug. ***
*** Bug 749303 has been marked as a duplicate of this bug. ***
As this causes several applications (like totem) to crash whenever the nvidia drivers are used (for instance, in a fresh Fedora 22 install + nvidia drivers), I suggest raising the Importance of this issue.
(In reply to rhi from comment #17) > As this causes several applications (like totem) to crash whenever the > nvidia drivers are used (for instance, in a fresh Fedora 22 install + nvidia > drivers), I suggest raising the Importance of this issue. Sure, we can raise it. Since there's nobody working on Cogl full time, it won't do what you may think the "Importance" field does. So, let me summarise the issue:
* GDK changed the way its visuals are selected in order to add support for OpenGL
* the visual GDK selects by default for the paint context it uses internally does not have stencil and depth buffers
* Cogl selects its own visuals, and has always done so
* the visual Cogl selects by default does have stencil and depth buffers
No, we cannot change Cogl to drop the need for a stencil and depth buffer: Cogl uses them. We may change the default for GDK to request a stencil and depth buffer; it's not going to cost us much, and chances are that any GdkGLContext created by the user will need those buffers. My original approach was to add new API to Cogl to select a "foreign" visual, so that Cogl would use the visual selected by GDK; the main issue is that the code selecting a visual inside Cogl is fairly complicated, and the indirection we'd have to punch through makes it hard to implement cleanly. On top of that, it's not really enough, because Cogl needs the GLXFBConfig rather than the visual, and going from one to the other is not entirely trivial. This whole mess stems from the fact that Cogl and GDK are not integrated; Cogl won't use the GDK OpenGL support (as it predates it by a good number of years), so it won't be able to draw to an offscreen framebuffer; we could modify Clutter and Clutter-GTK to achieve that, but I don't have any spare cycles for that. As a note: I also don't have any nVidia hardware to test this work, so I'd be coding it blind.
I'll have to drop the 'ASSIGNED' state; I had hoped to get something done before 1.22 got released, but I couldn't get it to work in time. If somebody with hardware and time wants to take on this issue, I'd be glad to provide assistance. Just drop by the #clutter IRC channel on irc.gnome.org.
Unfortunately, I don't have the time or a testing machine for extensive tests. However, I have an nVidia card in my (production) machine, so if I can provide further information, please let me know. The error indeed only occurs with the nVidia binary drivers.
Same issue. Has anyone reported it to nVidia? It looks like we can hardly do anything without their support.
(In reply to Eugene Yashchenko from comment #20) > Same issue. Has anyone reported it to nVidia? It looks like we can hardly do > anything without their support. nVidia has nothing to do with this: it's a client library issue, so there's no point in involving them at all.
Just to clarify: the reason this only happens on nVidia is that the nVidia binary blob driver returns the list of available visuals in a different order than Mesa. This is perfectly valid, and compliant with the GLX specification.
Then why not revert https://git.gnome.org/browse/clutter/patch/?id=60dbeb9425f51fc602ba2fe89b2a968ef4b527ed for the 1.22.x branch? It seems this won't be solved in this cycle, since major changes appear to be needed :/ Thanks a lot
(In reply to Pacho Ramos from comment #23) > Then why not revert > https://git.gnome.org/browse/clutter/patch/ > ?id=60dbeb9425f51fc602ba2fe89b2a968ef4b527ed > > for the 1.22.x branch? It seems this won't be solved in this cycle, since > major changes appear to be needed :/ No, I won't revert it. If using the X11 backend "fixes" the issue for you, you're free to use the CLUTTER_BACKEND environment variable as a stop-gap measure. It's highly unlikely that it'll work as soon as you start using a Clutter-GTK application, though, since the visual of choice will still be different between GDK and Clutter/Cogl. Software is hard. The proper way to solve problems is to fix issues, not sweep them under the rug.
(In reply to Emmanuele Bassi (:ebassi) from comment #24) > > Software is hard. The proper way to solve problems is to fix issues, not > sweep them under the rug. Is there any way to make GDK switch to the visual that Cogl has chosen?
(In reply to Eugene Yashchenko from comment #25) > (In reply to Emmanuele Bassi (:ebassi) from comment #24) > > > > Software is hard. The proper way to solve problems is to fix issues, not > > sweep them under the rug. > > Is there any way to make GDK switch to the visual that Cogl has chosen? Not really, no, since it's the GdkVisual that is going to be used by every widget. Since Clutter/Cogl are the embedded toolkits, the other way around would be the usual way to do this, but as I said above it's not enough. The proper way would be to write a Cogl winsys backend for GDK; then Clutter would use it when using GDK as the windowing system, and that would render using a GdkGLContext on an off-screen framebuffer, like GtkGLArea does. This would be a sizeable chunk of work, though.
(In reply to Emmanuele Bassi (:ebassi) from comment #24) > Software is hard. The proper way to solve problems is to fix issues, not > sweep them under the rug. Well, keeping the apps crashing on purpose won't help fix the issue either :|, especially keeping it broken on a "stable" branch that is going to be used by a lot of downstreams that need to deal with nvidia drivers (which aren't even the culprit in this case)
(In reply to Pacho Ramos from comment #27) > (In reply to Emmanuele Bassi (:ebassi) from comment #24) > > Software is hard. The proper way to solve problems is to fix issues, not > > sweep them under the rug. > > > Well, keeping the apps crashing on purpose won't help fix the issue > either :|, especially keeping it broken on a "stable" branch that is going > to be used by a lot of downstreams that need to deal with nvidia drivers > (which aren't even the culprit in this case) I wrote what is needed to fix it; I cannot even confirm that using the X11 backend is going to work around the issue in the case of clutter-gtk, so confirmation of that would be helpful. I cannot prevent downstreams from reverting a commit in a distro patch; what I will not do is revert it upstream.
Created attachment 304943 [details] [review] gdk: Use the Cogl visual on Xlib winsys GDK 3.16 started selecting different visuals, to best comply with the requirements for OpenGL, and this has broken Clutter on GLX drivers that are fairly picky in how they select visuals and GLXFBConfigs. GDK selects GLXFBConfigs that do not include depth or stencil buffers; Cogl, on the other hand, needs both depth and stencil buffers, and keeps selecting the first available visual, assuming that the GLX driver will give us the best compliant one, as per specification. Sadly, some drivers will return incompatible configurations, and then bomb out when you try to embed Clutter inside GTK+, because of mismatched visuals. Cogl has an old, deprecated, Clutter-only API that allows us to retrieve the XVisualInfo mapping to the GLXFBConfig it uses; this means we should look up the GdkVisual for it when creating our own GdkWindows, instead of relying on the RGBA and system GdkVisuals exposed by GDK, at least on X11.
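The final fix reverses the direction: instead of Cogl adopting GDK's visual, the GdkWindow adopts the visual behind Cogl's GLXFBConfig. On X11 with GDK 3, the real lookup from an X Visual ID to a GdkVisual is `gdk_x11_screen_lookup_visual()`. The sketch below models only the decision logic, with a hypothetical type (`FakeGdkVisual`) and a hypothetical helper (`choose_window_visual()`); the fallback to the RGBA visual when the ID is not found is an assumption, not something the patch description states.

```c
#include <stddef.h>

/* Hypothetical stand-in for a GdkVisual: only the X11 Visual ID matters here. */
typedef struct {
    unsigned long xid;
    int depth;
} FakeGdkVisual;

/* Find the screen visual whose X11 ID matches the one Cogl used for its
 * GLXFBConfig; fall back (assumption) to the toolkit's RGBA visual. */
static const FakeGdkVisual *
choose_window_visual (const FakeGdkVisual *screen_visuals, size_t n,
                      unsigned long cogl_visual_xid,
                      const FakeGdkVisual *rgba_fallback)
{
    for (size_t i = 0; i < n; i++) {
        if (screen_visuals[i].xid == cogl_visual_xid)
            return &screen_visuals[i];
    }
    return rgba_fallback;
}

/* Illustrative list of visuals available on a screen. */
static const FakeGdkVisual example_screen_visuals[] = {
    { 0x20, 24 },
    { 0x21, 32 },
    { 0x22, 32 },
};
```

Because the window is created with the visual Cogl already chose, GDK and Cogl agree by construction, regardless of the order in which the driver lists its configs.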
Created attachment 304944 [details] [review] gdk: Add function to retrieve the GdkVisual Straight from Cogl. This allows us to propagate the GdkVisual Cogl and Clutter use to embedding toolkits, like GTK+.
Created attachment 304945 [details] [review] build: Bump the required Clutter version We need new API to get the GdkVisual.
Created attachment 304946 [details] [review] embed: Use the GDK visual from Clutter Instead of relying on the widget one. This is similar to what we do when running with the X11 backend.
Cogl has a deprecated, Clutter-only API that allows us to retrieve the XVisualInfo mapping to the GLXFBConfig used when creating the GLXContext; this means that Clutter can retrieve the GdkVisual and expose it for clutter-gtk. The reason clutter-gtk may work on X11 is that, on X11, GtkClutterEmbed already overrides the widget's visual. This needs new API inside Clutter's GDK backend, which I'd be willing to backport to the stable branch, even if it's not really nice from an ABI standpoint.
attachment 304943 [details] [review] and attachment 304944 [details] [review] are patches for clutter. attachment 304945 [details] [review] and attachment 304946 [details] [review] are patches for clutter-gtk. Testing on nVidia machines is welcome.
Hey Emmanuele! I've rebuilt clutter/clutter-gtk on Fedora 22 with your patches and everything seems to be in order! Clutter applications now start without crashing. Thanks a lot!
Thanks for testing, much appreciated. I'll commit to master, and backport to 1.22 even if it adds a new symbol. I'll make sure to note the change in the release notes, so that distributions will pick up the new releases.
Attachment 304943 [details] pushed to the clutter master and clutter-1.22 branches
Attachment 304944 [details] pushed to the clutter master and clutter-1.22 branches
Attachment 304945 [details] pushed to the clutter-gtk master branch
Attachment 304946 [details] pushed to the clutter-gtk master branch
I will do a release for both as soon as possible, but may have to wait until next week.
This problem has been fixed in our software repository. The fix will go into the next software release. Once that release is available, you may want to check for a software upgrade provided by your Linux distribution.
(In reply to Emmanuele Bassi (:ebassi) from comment #34) > attachment 304943 [details] [review] and attachment 304944 [details] [review] > are patches for clutter. > > attachment 304945 [details] [review] and attachment 304946 [details] [review] > are patches for clutter-gtk. > > Testing on nVidia machines is welcome. I just tested this with gnome-maps (I reverted the commit where we force the X11 backend to work around this bug for the moment) and with cheese. Both seem to work normally.
Thanks a lot, Emmanuele... and I would like to apologize for perhaps being too rude to you :(, sorry about that
*** Bug 751162 has been marked as a duplicate of this bug. ***
*** Bug 743085 has been marked as a duplicate of this bug. ***
parole and totem on ubuntu are still failing even with 1.22.4. See https://bugs.launchpad.net/ubuntu/+source/parole/+bug/1462445 and https://bugzilla.xfce.org/show_bug.cgi?id=11825
(In reply to Jackson Doak from comment #42) > parole and totem on ubuntu are still failing even with 1.22.4. See > https://bugs.launchpad.net/ubuntu/+source/parole/+bug/1462445 This has no information about the versions of Clutter or Clutter-GTK being used… > and https://bugzilla.xfce.org/show_bug.cgi?id=11825 … and this refers to an Intel GPU, so it's not this bug; it's also something I cannot reproduce, and the crash apparently happens with the X11 backend, which points to another cause entirely, most likely a Mesa issue.