GNOME Bugzilla – Bug 695928
consider supporting multi-texturing, or pre-blending for background actor
Last modified: 2014-08-28 20:44:47 UTC
Right now we put each component of a background (its background color/gradient, each visible frame of the slide show) in its own actor. This gives us a great deal of flexibility in the shell to do transition effects however we want. My assumption was that there would be little performance impact from doing it this way, because all blending happens on the GPU, in the GPU's memory. Owen pointed out, though, that there are performance implications simply because there is so much pixel data and so much piecewise blending. Given that the flexibility isn't ~free like I thought it was, and given that we aren't really doing much more than simple blending anyway (and aren't using the flexibility), it may make sense to put multiple textures on one MetaBackground object and do the blending with multi-texturing, or by pre-blending the result into a new texture.
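The trade-off can be illustrated with a small CPU-side model. It is purely illustrative: the real work happens in GPU shaders through Cogl, and `Traffic`, `vignette_factor`, `draw_per_actor` and `draw_multitexture` are hypothetical names, not mutter or Cogl API. It assumes single-channel textures in [0, 1], fully opaque slide frames, a cross-fade in which the second frame is drawn with opacity t, and a vignette modeled as a per-pixel multiplicative darkening (the real shader may differ). Both paths produce the same pixels, but the per-actor path reads the framebuffer once per actor, so it touches memory twice as often.

```python
from dataclasses import dataclass


@dataclass
class Traffic:
    """Counts per-pixel memory accesses (texture samples or framebuffer)."""
    reads: int = 0
    writes: int = 0


def vignette_factor(x, y, width, height, strength=0.5):
    """Hypothetical radial darkening: 1.0 at the center, 1 - strength at the corners."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    denom = (cx * cx + cy * cy) or 1.0
    return 1.0 - strength * ((x - cx) ** 2 + (y - cy) ** 2) / denom


def draw_per_actor(tex_a, tex_b, t, traffic):
    """Current approach: one actor per frame, each blended onto the framebuffer."""
    h, w = len(tex_a), len(tex_a[0])
    fb = [[0.0] * w for _ in range(h)]  # stands in for undefined stage contents
    for opacity, tex in ((1.0, tex_a), (t, tex_b)):
        for y in range(h):
            for x in range(w):
                dst = fb[y][x]
                src = tex[y][x] * vignette_factor(x, y, w, h)
                traffic.reads += 2  # framebuffer read + actor texture read
                fb[y][x] = src * opacity + dst * (1.0 - opacity)
                traffic.writes += 1
    return fb


def draw_multitexture(tex_a, tex_b, t, traffic):
    """Proposed approach: sample both textures, blend, vignette, write once."""
    h, w = len(tex_a), len(tex_a[0])
    fb = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mixed = tex_a[y][x] * (1.0 - t) + tex_b[y][x] * t
            traffic.reads += 2  # two texture samples, no framebuffer read
            fb[y][x] = mixed * vignette_factor(x, y, w, h)
            traffic.writes += 1
    return fb


tex_a = [[0.1, 0.5, 0.9], [0.3, 0.7, 0.2]]
tex_b = [[0.8, 0.4, 0.6], [0.0, 1.0, 0.5]]
old, new = Traffic(), Traffic()
fb_old = draw_per_actor(tex_a, tex_b, 0.25, old)
fb_new = draw_multitexture(tex_a, tex_b, 0.25, new)
print(old, new)  # Traffic(reads=24, writes=12) Traffic(reads=12, writes=6)
```

Because the cross-fade and this vignette model are both linear, applying the vignette per actor or once after blending gives the same result. That is what lets the separate actors be collapsed without a visible change, at the cost of the per-actor transition flexibility described above.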
IRC conversation:
<owen> Jasper: do we have an open bug somewhere to replace the two-actor-double-draw for the bg cross-fade with doing it in the shader? Doing that would be a pretty huge memory bandwidth saving for each overview redraw
<owen> Jasper: Because you get rid of a read and write per pixel of the screen. Extra credit is to use cogl_pipeline_set_blend(pipeline, "RGBA = SRC_COLOR") to get rid of *another* read and write for each pixel of the screen
<halfline> owen: bg cross-fade in the overview?
<owen> halfline: in the overview or normal view, it's probably more critical in the overview since in the normal view we're normally not showing the background
<halfline> are you talking about with slide shows?
<owen> halfline: if I understand correctly from yesterday's discussion we're crossfading most of the time, not just at defined periods
<halfline> right, slide shows were primarily designed to have long transition periods that span hours
<halfline> so your suggestion is, rather than have two actors, have one actor that does multi-texturing to generate the frame?
<owen> halfline: yes.
<owen> Think about the reads and writes
<owen> Currently what we're doing is a) read undefined stage contents, b) read first actor texture, c) compute vignette, d) blend, write back to frame buffer, e) read frame buffer contents, f) read second actor, g) compute vignette, h) blend, i) write back to frame buffer
<owen> All we need to do is a) read first actor texture, b) read second actor texture, c) blend, d) compute vignette, e) write back to frame buffer
<halfline> hmm, that really surprises me
<owen> So for a 2Mpix display we are doing 12MB per frame of memory traffic instead of 6MB that we could be doing
<owen> (One reason why tile-based rendering can be a win is because you can do these repeated read-write cycles in a small amount of fast memory)
<halfline> oh we're talking all gpu memory here though
<owen> halfline: what about it surprises you? That it would be a performance win? Or that I think it works that way when your code works some different way? :-)
<owen> halfline: gpu memory is memory too....
<owen> and it's not *necessarily* very fast memory
<halfline> at first i misunderstood you and thought you were saying we were pulling the texture into main memory, doing blending in the cpu and rewriting it out, and that made no sense to me
<halfline> so the primary reason i made each part its own actor is so that we have the flexibility to do whatever kind of transitions / animations we want to do
<halfline> you're saying you believe doing it this way has dramatic performance consequences, which I didn't realize
<owen> Numbers above are off by a factor of 4
<halfline> there's no bug about the proposed optimization, but if you think it's going to be a big win, then certainly having the extra flexibility isn't critical
<owen> Multiplying by 4 bytes per pixel, and 60fps, we're talking about a difference of 2.8GB/sec => 1.4GB/sec
<halfline> and we could make each background support multiple textures
<halfline> does framebuffer compression mitigate this problem at all?
<owen> The slowest memory bandwidth you have on vaguely contemporary hardware is about 5GB/sec, so we're not going to make things 10x faster, but it's definitely significant
<owen> halfline: no
<mclasen_afk> we don't want to make it faster than 60fps anyway, right?
<Jasper> It's not.
<halfline> i'll file a bug
<owen> mclasen: yeah, but we'd only have that problem if this was the *only* thing involved in drawing a frame :-)
<mclasen> right
<Jasper> owen, I asked halfline about this and he said he wanted to leave the opportunity for slideshows open or something.
<Jasper> So you could have a new image grow in from the center.
<Jasper> I didn't really buy this given his "changing image all throughout the day" thing
<owen> halfline: framebuffer compression is basically about scanout: it's a way that, once you have a static image, especially one with big compressibility, the scanout engine of the card doesn't have to suck bandwidth and hence power reading it
<halfline> Jasper: no, you were asking about backgrounds and their associated backdrops
<halfline> same thing applies though
<Jasper> halfline, huh?
<halfline> you weren't asking about slides in a slide show
<Jasper> I was.
<Jasper> I was asking why we had three actors instead of two in a slideshow.
<halfline> you were asking about an image on top of its background color
<Jasper> We talked past each other then. But we can put it down to one actor if we want. Multitexturing supports three layers.
<halfline> anyway, rationale for separate actors was extra flexibility
<halfline> i didn't think it would have that much of a performance impact since it's all on the card
<halfline> since it does have a performance impact, and we don't really need the flexibility
<halfline> it makes sense to me to drop it
<halfline> i'm writing up a bug now
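Owen's arithmetic can be checked directly. This is a back-of-the-envelope calculation, not a measurement: it counts 6 memory accesses per pixel for the two-actor path (4 reads, 2 writes) and 3 for a single multi-textured pass (2 reads, 1 write), at 4 bytes per pixel and 60 fps on a 2-megapixel display. `background_bandwidth` is a hypothetical helper.

```python
def background_bandwidth(pixels, accesses_per_pixel, bytes_per_pixel=4, fps=60):
    """Bytes per second moved just to draw the background (hypothetical helper)."""
    return pixels * accesses_per_pixel * bytes_per_pixel * fps


PIXELS = 2_000_000   # "2Mpix display" from the discussion
PER_ACTOR = 6        # 4 reads (stage, texture A, framebuffer, texture B) + 2 writes
MULTITEXTURE = 3     # 2 reads (texture A, texture B) + 1 write
SLOWEST_BUS = 5e9    # owen's ~5 GB/s lower bound for contemporary hardware

current = background_bandwidth(PIXELS, PER_ACTOR)
proposed = background_bandwidth(PIXELS, MULTITEXTURE)
print(f"{current / 1e9:.2f} GB/s -> {proposed / 1e9:.2f} GB/s")  # 2.88 GB/s -> 1.44 GB/s
print(f"{current / SLOWEST_BUS:.0%} -> {proposed / SLOWEST_BUS:.0%} of a 5 GB/s bus")  # 58% -> 29% of a 5 GB/s bus
```

The "12MB vs 6MB" figure is really 12M vs 6M pixel accesses per frame; at 4 bytes each that is 48 MB vs 24 MB per frame, which owen rounds to 2.8 GB/s vs 1.4 GB/s. Against the roughly 5 GB/s floor he mentions, the background alone would drop from over half to under a third of the available bandwidth.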
more discussion:
<Jasper> halfline, do you want to work on multitexturing, or should I?
<tomeu> owen: yeah, so for the sake of keeping the api docs a bit cleaner, are you ok with moving theme.h out from meta/?
<tomeu> or maybe just don't pass it to gtk-doc?
<Jasper> tomeu, does preview-widget still exist?
<tomeu> the code is there, don't know if it's used though
<Jasper> tomeu, I think you should be able to rip that out
<owen> tomeu: I'm fine with that.... hopefully we're moving toward the direction of killing the old theme code in favor of gtk+-based css, so depending on old theming is a poor idea if anybody is doing it
<halfline> Jasper: i'm not going to work on it today if that's what you mean
<Jasper> halfline, OK
<owen> Jasper: I was thinking more about 3.10
<Jasper> owen, OK
<Jasper> owen, I didn't know that when we wanted to blend actors we pulled them back from the framebuffer.
<owen> Jasper: (unless we have a reason to believe that overview performance has regressed in a critical fashion. If only I had actually finished shell-perf.gnome.org to a useful state...)
<halfline> Jasper: well it's not getting pulled all the way back. the point is, i think, that something has to read the pixel data to do the blending
<halfline> and it's for every redrawn pixel
<halfline> and it's for ever actor in the stack
<halfline> *every
<owen> Jasper: as halfline says, every time you do a blend the gpu has to read back from the framebuffer; it's all pipelined and async for maximum performance, but you don't escape the fundamental memory bandwidth equation
<halfline> mclasen proposed a different idea in meat-space btw
<halfline> which i mentioned on the bug report
<owen> halfline: pre-render would be quite a win for llvmpipe
<Jasper> pre-render?
<halfline> do the blending up front and stuff the result in a texture
<halfline> and just use that one texture
<Jasper> So do the blending on the CPU?
<owen> Jasper: you could use an FBO, and still do everything GPU-side
<owen> Jasper: it's a straightforward memory/speed tradeoff
<Jasper> owen, redirecting to an FBO would be faster?
<Jasper> I thought FBOs were slow.
<halfline> you would have to keep a copy of the original textures around to reblend when the opacity changes
<owen> Jasper: using an FBO to render a texture once, then using that for subsequent frames
<owen> Jasper: it's obviously going to be slower for the maximally-fast cross-fade case
<owen> I'm not sure if it's a good idea or not; it would be good to have some measurements in place when someone works on this
<halfline> we could have a meta_background_flatten(background1, background2) API or so
<owen> halfline: I think the prerender is most useful if you put the vignette into it, but then you have to figure out what you do with the thumbnail previews
<owen> since the vignette is what I expect to be really slow with llvmpipe
<halfline> we could leave the thumbnail previews alone, or flatten them too
<halfline> seems less important, since it's a lot fewer pixels
<Jasper> owen, well, we want to tweak the vignette parameters as we open the overview
<halfline> ah that's a very good point
<halfline> the vignette smoothly morphs in right now
<halfline> we could keep them separate until the animation is finished
<halfline> then flatten them
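The pre-render idea can be sketched as a cache. This is a hypothetical CPU model of the `meta_background_flatten()` API floated in the discussion, not existing mutter code; `FlattenedBackground` and `vignette_factor` are illustrative names. It keeps the original textures so it can re-blend, bakes the cross-fade and vignette into one cached texture, and rebuilds only when its inputs change. As an added assumption, cross-fade progress is quantized to 256 levels, since a smaller opacity step moves an 8-bit channel by less than about one level; without some such threshold, a continuously advancing cross-fade would invalidate the cache on every frame.

```python
def vignette_factor(x, y, width, height, strength):
    """Hypothetical radial darkening: 1.0 at the center, 1 - strength at the corners."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    denom = (cx * cx + cy * cy) or 1.0
    return 1.0 - strength * ((x - cx) ** 2 + (y - cy) ** 2) / denom


class FlattenedBackground:
    """Hypothetical model of a meta_background_flatten()-style cache.

    Blends two slide frames (plus the vignette) into one cached texture and
    reuses it until the quantized progress or the vignette strength changes.
    The originals are kept so the cache can be rebuilt.
    """

    def __init__(self, tex_a, tex_b, levels=256):
        self.tex_a, self.tex_b = tex_a, tex_b
        self.levels = levels  # assumed quantization of the cross-fade progress
        self._key = None
        self._cache = None
        self.rebuilds = 0

    def texture(self, progress, vignette_strength):
        key = (round(progress * self.levels), vignette_strength)
        if key != self._key:
            t = key[0] / self.levels
            h, w = len(self.tex_a), len(self.tex_a[0])
            self._cache = [
                [(self.tex_a[y][x] * (1.0 - t) + self.tex_b[y][x] * t)
                 * vignette_factor(x, y, w, h, vignette_strength)
                 for x in range(w)]
                for y in range(h)
            ]
            self._key = key
            self.rebuilds += 1
        return self._cache


tex_a = [[0.2, 0.4], [0.6, 0.8]]
tex_b = [[0.9, 0.7], [0.5, 0.3]]

# Hour-long slideshow cross-fade at 60 fps with a settled vignette.
frames = 60 * 60 * 60
slideshow = FlattenedBackground(tex_a, tex_b)
for frame in range(frames):
    slideshow.texture(frame / frames, vignette_strength=0.5)

# Overview opening: the vignette morphs in over 15 frames at fixed progress.
overview = FlattenedBackground(tex_a, tex_b)
for i in range(15):
    overview.texture(0.5, vignette_strength=i / 14)

print(slideshow.rebuilds, overview.rebuilds)  # 257 15
```

Under these assumptions an hour-long cross-fade costs a few hundred rebuilds instead of 216,000 per-frame blends, while a vignette that changes every frame (as when the overview opens) defeats the cache entirely, which is why flattening only after the animation finishes is suggested above.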
*** This bug has been marked as a duplicate of bug 735637 ***