GNOME Bugzilla – Bug 615696
Shaders using GLSL 1.20 without #version.
Last modified: 2010-05-11 09:21:53 UTC
After integrating the gst-plugins-gl shaders into piglit for regular parse testing on Mesa, I found that 3 of them don't compile because they use GLSL 1.20 features (array constructors) without declaring #version 120. The 1.30 spec is quite clear that without a #version directive, the shader is parsed as 1.10. Current Mesa and the next Mesa GLSL parser both reject this code. It's easy to avoid the 1.20 requirement by expanding the array setup. I implemented this fix, plus several improvements to gstglfilterblur's implementation, in git://people.freedesktop.org/~anholt/gst-plugins-gl
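For reference, this is the kind of change I mean (the kernel values here are just illustrative, not the actual gst-plugins-gl ones):

/* GLSL 1.20 array constructor, requires "#version 120" */
float kernel[9] = float[9] (0.0,  1.0, 0.0,
                            1.0, -4.0, 1.0,
                            0.0,  1.0, 0.0);

/* 1.10-friendly expansion, done inside main() */
float kernel[9];
kernel[0] = 0.0;  kernel[1] =  1.0; kernel[2] = 0.0;
kernel[3] = 1.0;  kernel[4] = -4.0; kernel[5] = 1.0;
kernel[6] = 0.0;  kernel[7] =  1.0; kernel[8] = 0.0;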
Hey Eric, thanks for looking at this. Having a compiler developer check the shaders is a lot better than just guessing what could have happened by looking at the result (when I wrote the shaders the Mesa GLSL stuff was pretty young and bleeding edge).

About blur and laplacian, I thought I had ported them to use the fragment sources in gst/gl/effects/gstgleffectssources.c. If you look at the convolution fragments there, the #version 120 requirement had already been removed together with the array constructors. The kernel[i] != 0 thing was meant for a generic convolution source, as a little optimization: I thought avoiding a texture lookup could be worth some extra comparisons. Same for norm_offset in the blur shaders; they were meant to be as generic as possible.

Now the question is: is the current approach with all the shaders in one file the best one? In other words, is it better to optimize blur and laplacian for their particular kernels (remove offsets, remove the != 0 comparisons), or just to port them to the generic, unoptimized convolution sources in effectssources.c? It was done this way because most of the shaders are shared between at least a couple of effects, sometimes with different parameters, and I thought it was worth having everything in one place instead of duplicating code (apparently I forgot about filterblur and filterlaplacian, though).
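Roughly, the generic convolution fragment I have in mind looks like this (a sketch from memory, not the exact code in gstgleffectssources.c; it uses a plain sampler2D and the offsets aren't scaled to the texture size, just to keep it short):

uniform sampler2D tex;
uniform float kernel[9];
uniform float norm_const;
uniform float norm_offset;

void main ()
{
  /* 3x3 neighbourhood around the current texel */
  vec2 offset[9];
  offset[0] = vec2 (-1.0, -1.0); offset[1] = vec2 (0.0, -1.0); offset[2] = vec2 (1.0, -1.0);
  offset[3] = vec2 (-1.0,  0.0); offset[4] = vec2 (0.0,  0.0); offset[5] = vec2 (1.0,  0.0);
  offset[6] = vec2 (-1.0,  1.0); offset[7] = vec2 (0.0,  1.0); offset[8] = vec2 (1.0,  1.0);

  vec4 sum = vec4 (0.0);
  for (int i = 0; i < 9; i++) {
    if (kernel[i] != 0.0)   /* skip the texture lookup for zero taps */
      sum += texture2D (tex, gl_TexCoord[0].st + offset[i]) * kernel[i];
  }
  gl_FragColor = sum / norm_const + norm_offset;
}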
I don't know how to test performance using gst-launch, but I suspect there was a big impact from making these changes. The 965+ takes about a 30% speed hit if there is conditional execution; the current Mesa compiler doesn't turn your conditionals into unconditional execution with conditional moves to avoid that, and pre-965 we can't do conditional execution at all.

The loops are getting unrolled, though, and the arrays turned into scalars (loops are so painful to hardware that Mesa will unroll just about anything you hand it; the one piece of real-world code I've seen that doesn't get unrolled also doesn't execute correctly!). Texture fetches do call out to the shared texture fetch unit, but given the locality of access the latency should be pretty decent, probably comparable to the cost of the divide I optimized out.

Do you expect the convolution kernels to change over time? If not, then you could bake the constants into your shader sources at compile time instead of using uniforms, which would give the compiler the chance to do all the optimizations I was doing by hand.
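A rough sketch of what I mean, with made-up 3-tap weights (and offsets left unscaled to the texture size, just to keep it short):

/* uniform version: the compiler has to assume every tap can be non-zero */
uniform sampler2D tex;
uniform float kernel[3];
uniform float norm_const;

void main ()
{
  vec4 sum = vec4 (0.0);
  for (int i = 0; i < 3; i++)
    sum += texture2D (tex, gl_TexCoord[0].st + vec2 (0.0, float (i - 1))) * kernel[i];
  gl_FragColor = sum / norm_const;
}

/* baked version: the weights are written literally into the source, so zero
   taps (and their texture fetches) and the normalization divide go away */
uniform sampler2D tex;

void main ()
{
  gl_FragColor = texture2D (tex, gl_TexCoord[0].st + vec2 (0.0, -1.0)) * 0.25
               + texture2D (tex, gl_TexCoord[0].st)                    * 0.5
               + texture2D (tex, gl_TexCoord[0].st + vec2 (0.0,  1.0)) * 0.25;
}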
On my Ironlake laptop, the gstfilterblur changes took this pipeline:

gst-launch videotestsrc num-buffers=500 ! glupload ! glfilterblur ! glimagesink sync=false

from 10.7 seconds of wall time to 3.6 seconds.
Hey, sorry for the delay but, as always, I have very little spare time... I tested the optimized filterblur with the nvidia proprietary drivers, and the performance gain was not as high (something like 3 seconds unoptimized and 2.3 optimized). Still high enough to merge your improvements anyway, given the big improvement with the Mesa compiler.

I managed to upgrade to Fedora 13 (with experimental nouveau 3D support), and I can see some shaders failing because of array constructors, plus a couple of errors about unsupported opcodes (74, IF and 99, BGNLOOP) with the xray, sin and glow effects. The glow one can easily be solved by removing the "if kernel[i] != 0" blocks from the convolution shaders; the other ones are more tricky and I need more time to figure out what's happening. I'll try to work on this (merge your work and fix the other shaders) as soon as I have a couple of free hours.

I have a doubt: our shaders don't change over time, but I've always used uniforms as function parameters to reuse the same shader to do different things based on the given uniforms. IIRC the orange book seemed to support this usage, but you suggest baking these "constant" parameters into the shader source at compile time. Is this suggestion just a workaround for the Mesa compiler not working properly, or is it a general-purpose thing that should always be done with fragment shaders (i.e. limit uniform usage to non-constant parameters)? I can probably easily do it, but I feel like I'm losing some part of GLSL's coolness this way.
Looks like nouveau is claiming support for GLSL without actually supporting even the required features of the GLSL in GLES2. The new compiler will be making that easier to support, but until then I keep the 915 version of the same problem hidden under a driconf option :)

I haven't read the orange book thoroughly (I didn't find it to be a great starting point for writing shaders, really), but there are a few ways that numbers get into shaders:

- Varyings (change per fragment or primitive)
- Uniforms (change per draw call)
- Constants (change per compilation of the source)

In figuring out how to get your data into the shader, the question is how often the numbers change. The more constant you can make them, the fewer resources they consume.

On the 965, varyings take up precious URB space, which means we can't get as much parallelism between stages of the pipeline because they end up blocking on each other. Not a big deal for gstreamer, where you have basically no vertex shading going on and only the fragment stage is busy, which consumes no URB. Uniforms and varyings take up register space, which is shared between threads on a core, so we are less able to hide the latency of texture fetches or mathbox (sqrt, reciprocal, pow, exp, log, etc.) calls through hyperthreading. Constants can be compiled right into the instruction stream (if they aren't optimized out!), which only consumes icache space, and even then no more than uniform access does and less than varying access does. There's about 24kb of icache, and the optimized fragment shaders were around 800b.

Now, we're dreaming of making Mesa super smart and watch the uniforms used for a shader, then spawn a thread to go off and recompile a version where constant uniforms are turned into constants so we can have that version on hand if you keep your uniforms constant. That's a loooong way down the road, though.

The next thing I wanted to check for this patch series was whether the if statement was helping even for the kernel that was 4/9 0.0s. My bet is that removing the if statement will be a big win.
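A trivial illustration of the three (names made up, not from the gst-gl sources):

varying vec2 v_coord;        /* per fragment: interpolated from the vertex shader */
uniform float u_brightness;  /* per draw call: set with glUniform1f() */
const float c_gain = 2.0;    /* per compilation: the compiler is free to fold it in */
uniform sampler2D tex;

void main ()
{
  gl_FragColor = texture2D (tex, v_coord) * u_brightness * c_gain;
}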
(In reply to comment #5)
> Looks like nouveau is claiming support for GLSL without actually supporting
> even the required features of the GLSL in GLES2. The new compiler will be
> making that easier to support, but until then I keep the 915 version of the
> same problem hidden under a driconf option :)

About nouveau, they actually tell you there is no support for anything 3D related and they want no bug reports about it :) So I guess it's just a matter of waiting for them to support what's missing. About i915, thank you for pointing me to driconf :) As you might already know, I've finally found a good use for my netbook and started (and now finished) porting all the shaders in gst-gl to work there too.

> Now, we're dreaming of making Mesa super smart and watch the uniforms used for
> a shader, then spawn a thread to go off and recompile a version where constant
> uniforms are turned into constants so we can have that version on hand if you
> keep your uniforms constant. That's a loooong way down the road, though.

Ok, I've removed "constant" uniforms in some shaders but not all of them yet, since I'd like to move all the shaders into dedicated files and bake in the constant parameters at runtime, but that will be a long-term task as I really don't have time now for not-so-funny things like these :P Thank you again for all the precious information that helped me better understand how everything works.

> The next thing I wanted to check for this patch series was whether the if
> statement was helping even for the kernel that was 4/9 0.0s. My bet is that
> removing the if statement will be a big win.

I've pushed a bunch of changes to make everything work on i915: removed array constructors, removed if statements, removed atan calls (which let some subtle IFs sneak in), optimized shaders that used too many instructions, avoided extra indirections, and probably more I don't remember. Hope I didn't forget anything or introduce any regressions :P Making everything work with a limited instruction subset and limited hardware was indeed great, it made me spot a lot of stupid things I did when I wrote the shaders and forced me to make everything run faster :) So I think this bug is fixed, please let me know if you find anything wrong or want to teach me anything else :)