Bug 781153 - Performance issues when disabling client side decorations with "GTK_CSD=0"
Status: RESOLVED OBSOLETE
Product: gtk+
Classification: Platform
Component: Backend: Win32
Version: 3.22.x
Hardware: Other Windows
Importance: Normal major
Target Milestone: ---
Assigned To: gtk-win32 maintainers
QA Contact: gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2017-04-10 21:13 UTC by Eduard Braun
Modified: 2018-05-02 18:22 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Modified version of "fishbowl.ui" that allows disabling CSD (2.78 KB, application/xml)
2017-04-10 21:13 UTC, Eduard Braun

Description Eduard Braun 2017-04-10 21:13:15 UTC
Created attachment 349623 [details]
Modified version of "fishbowl.ui" that allows disabling CSD

When disabling client side decorations via the environment variable "GTK_CSD=0" drawing performance seems significantly reduced.

This issue was initially discovered in MSYS2 builds of Inkscape trunk (see [1]) where disabling client side decorations causes severe performance regressions when moving objects but also has an overall negative impact on UI performance.

The issue can also be reproduced and quantified with the "Fishbowl" benchmark in "gtk3-demo" (one has to use a modified version of "fishbowl.ui" that removes the titlebar customizations; otherwise CSDs will be forced, see attachment):
- with "GTK_CSD=1": ~950 icons at ~55 fps
- with "GTK_CSD=1": ~150 icons at ~55 fps


[1] http://inkscape.13.x6.nabble.com/slow-dragging-on-Windows-tp4979387.html
Comment 1 Eduard Braun 2017-04-10 21:20:58 UTC
Sorry, typo, the comparison should obviously read:
- with "GTK_CSD=1": ~950 icons at ~55 fps
- with "GTK_CSD=0": ~150 icons at ~55 fps
Comment 2 LRN 2017-04-12 13:38:36 UTC
First question that comes to mind: is this a regression? Was there ever a GTK3 version where GTK_CSD=0 worked faster?
Comment 3 Eduard Braun 2017-05-02 01:08:38 UTC
> First question that comes to mind: is this a regression? Was there ever a GTK3 version where GTK_CSD=0 worked faster?

I don't know. It's obviously something that is extremely hard to test as getting a working gtk stack for Windows is not at all trivial. The last time I tried to build an older gtk3 version with MSYS2 I immediately ran into problems.

Who'd be "the guy" to talk to? This is obviously a critical issue and somebody with some inside knowledge of gtk could obviously do a much better job of at least pointing towards possible issues. I for my part could not even try to guess here...

I'm more than willing to help with debugging / testing patches, but diving into gtk is just too big a task for me to take on right now. :-/
Comment 4 LRN 2017-05-02 03:33:06 UTC
(In reply to Eduard Braun from comment #3)
> Who'd be "the guy" to talk to? 

That would be me, unfortunately.

Try running with the environment variables GDK_WIN32_LAYERED=0 and GTK_CSD=1, then see if it makes any difference (as opposed to just GTK_CSD=1).
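A minimal sketch of how the four combinations of the two switches could be set up for a test run. It assumes that "gtk3-demo" is on PATH, that it was built with the modified "fishbowl.ui", and that "--run=fishbowl" starts the benchmark directly; the helper names are made up for illustration. The icon count still has to be read off the window by hand.

```python
import itertools
import os
import subprocess

# Assumption: gtk3-demo is on PATH and "--run=fishbowl" opens the benchmark directly.
DEMO_COMMAND = ["gtk3-demo", "--run=fishbowl"]


def build_environment(csd, layered, base=None):
    """Return a copy of the environment with both switches set to "0" or "1"."""
    env = dict(os.environ if base is None else base)
    env["GTK_CSD"] = "1" if csd else "0"
    env["GDK_WIN32_LAYERED"] = "1" if layered else "0"
    return env


def all_combinations():
    """Return the four (csd, layered) combinations of the two switches."""
    return list(itertools.product([True, False], repeat=2))


def launch(csd, layered):
    """Start the demo for one combination (hypothetical helper, not called here)."""
    return subprocess.Popen(DEMO_COMMAND, env=build_environment(csd, layered))


if __name__ == "__main__":
    # Only list the combinations; call launch() manually for each one.
    for csd, layered in all_combinations():
        print("GTK_CSD=%d GDK_WIN32_LAYERED=%d" % (csd, layered))
```

Running one combination at a time (rather than all four in parallel) keeps the runs from competing for the same CPU core.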
Comment 5 Eduard Braun 2017-05-02 18:07:36 UTC
Here are some results from GTK 3.22.10

GTK_CSD=1, GDK_WIN32_LAYERED=1 => ~950 icons
GTK_CSD=1, GDK_WIN32_LAYERED=0 => ~360 icons

GTK_CSD=0, GDK_WIN32_LAYERED=1 => ~480 icons
GTK_CSD=0, GDK_WIN32_LAYERED=0 => ~430 icons

I could not reproduce the extremely low performance of "GTK_CSD=0" for whatever reason (it certainly was consistent the last time, not only on a single test run), but it's still pretty obvious.

Also please note that "fishbowl" does not seem to be the best tool to benchmark gtk performance: the variation in icon count over time is significant (I'd put an error bar of at least +/-50 items on the values above), and the frame rate goes up and down, too (in the worst case this might also be thermal throttling of my processor). So if you know a better way to measure drawing performance, please let me know!
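To see which of the four results actually differ once the error bar is taken into account, here is a small sketch. It treats the "+/-50 items" estimate as a fixed interval around each value, which is a simplification (the comment calls it a lower bound), and the function name is made up for illustration.

```python
# Icon counts from the GTK 3.22.10 runs above,
# keyed by (GTK_CSD, GDK_WIN32_LAYERED).
ICONS = {
    (1, 1): 950,
    (1, 0): 360,
    (0, 1): 480,
    (0, 0): 430,
}

# The "+/-50 items" estimate from the comment (assumed to apply equally to every run).
ERROR_BAR = 50


def distinguishable(a, b, error=ERROR_BAR):
    """True if the intervals [a-error, a+error] and [b-error, b+error] do not overlap."""
    return abs(a - b) > 2 * error


if __name__ == "__main__":
    keys = sorted(ICONS)
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            verdict = distinguishable(ICONS[first], ICONS[second])
            print(first, second, "differ" if verdict else "within error")
```

Under this reading, the layered CSD case clearly stands apart from the other three, while the two GTK_CSD=0 results (480 vs 430) cannot be told apart.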
Comment 6 Eduard Braun 2017-05-03 02:02:07 UTC
I was also able to compile several versions of gtk now. Results vary notably:

GTK_CSD=0

    3.14.15 - 130
    3.16.7  - 170
    3.18.9  - 100
    3.20.10 - 100
    3.22.10 - 100
    
GTK_CSD=1

    3.14.15 - --- (CSD not enabled by environment variable)
    3.16.7  - --- (CSD not enabled by environment variable)
    3.18.9  -  10 (something seems really wrong...)
    3.20.10 - 430
    3.22.10 - 500

* note that the benchmark and its parameters were different than above (as I had to create a version of fishbowl that actually works on all of the tested versions)

It seems obvious that CSD has only been usable since 3.20.
Without CSD the only notable change was a slight increase of performance from 3.14 to 3.16 and then a larger decrease from 3.16 to 3.18 from where it seems to have stayed more or less the same.

The one constant is that with GTK_CSD=0 performance is *a lot* worse than with CSD enabled.
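The per-version numbers above can be turned into a CSD/non-CSD ratio with a few lines. The values are copied from the two lists; None marks the versions where the environment variable did not enable CSD. The function name is made up for illustration.

```python
# Fishbowl icon counts per GTK version, copied from the lists above.
NO_CSD = {"3.14.15": 130, "3.16.7": 170, "3.18.9": 100, "3.20.10": 100, "3.22.10": 100}

# None: CSD could not be enabled via the environment variable in that version.
CSD = {"3.14.15": None, "3.16.7": None, "3.18.9": 10, "3.20.10": 430, "3.22.10": 500}


def csd_speedup(version):
    """Return icons(CSD) / icons(no CSD) for a version, or None if CSD was unavailable."""
    with_csd = CSD[version]
    if with_csd is None:
        return None
    return with_csd / NO_CSD[version]


if __name__ == "__main__":
    for version in NO_CSD:
        print(version, csd_speedup(version))
```

The ratio is above 4 for 3.20.10 and 3.22.10, and well below 1 for 3.18.9, which is the one tested version where CSD was slower.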
Comment 7 Eduard Braun 2017-05-13 15:20:19 UTC
Maybe helpful observation:
The low performance with GTK_CSD=0 might not be caused by a higher demand for resources, but a non-optimal usage thereof.

I just checked the fishbowl benchmark again:
- with GTK_CSD=1 usage of one CPU core maxes at around 95%
- with GTK_CSD=0 CPU usage is below 70%

Maybe the issue is more in scheduling/timings than an actual performance problem?

Could the default v-sync of Windows' DWM cause an issue?
Comment 8 Christoph Reiter (lazka) 2017-10-02 13:03:19 UTC
One difference is due to gdk_win32_window_begin_paint() returning TRUE in case of CSD=0. This results in pixman time doubling in the profiler for fishbowl.
Comment 9 LRN 2017-10-02 14:34:43 UTC
Fascinating. What if you modify gdk_win32_window_begin_paint() to return FALSE for non-GL non-layered windows? Does that affect the performance?
Comment 10 Christoph Reiter (lazka) 2017-10-02 16:01:19 UTC
It improves things a bit: NO-CSD: 600, NO-CSD-no-buffer: 825, CSD: 1100

(but, as far as I see this removes double buffering?)
Comment 11 LRN 2017-10-02 16:28:46 UTC
Returning FALSE prevents GTK from creating (and destroying, once it's not needed anymore) a new surface in each paint cycle.

Layered windows have a dedicated cache surface (which is used instead), because Windows WM does not keep old window contents around for such windows (thus making it impossible to make a partial repaint by painting over old stuff - there's no old stuff to paint on top of). There's special logic in the code that ensures that the cache surface is big enough, and that it's not created/destroyed on each repaint.

It could be that performance boost for CSD windows comes just from this.

And yes, currently returning FALSE there removes double buffering for non-CSD windows (as i described above, double buffering for CSD windows is handled in a special way).
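The difference between a per-paint surface and a dedicated cache surface can be illustrated with a grow-only buffer. This is only a sketch of the idea described above, not GDK's actual data structure; the class and its fields are hypothetical, and unlike a real implementation it does not copy the old contents when it grows.

```python
class CacheSurface:
    """Hypothetical stand-in for a persistent backing surface (not GDK code)."""

    def __init__(self):
        self.width = 0
        self.height = 0
        self.pixels = bytearray()
        self.allocations = 0  # how many times the buffer was (re)created

    def ensure_size(self, width, height, bytes_per_pixel=4):
        """Reallocate only when the window outgrows the cached buffer."""
        if width <= self.width and height <= self.height:
            return False  # existing buffer is big enough, keep it
        # Grow to cover both the old and the requested size.
        # Simplification: old contents are dropped here instead of being copied.
        self.width = max(width, self.width)
        self.height = max(height, self.height)
        self.pixels = bytearray(self.width * self.height * bytes_per_pixel)
        self.allocations += 1
        return True


if __name__ == "__main__":
    surface = CacheSurface()
    for _ in range(100):  # 100 repaints at the same window size
        surface.ensure_size(1920, 1080)
    print("allocations:", surface.allocations)
```

With a per-paint surface the same 100 repaints would mean 100 allocations and 100 frees; with the cache there is a single allocation until the window grows.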
Comment 12 Eduard Braun 2017-10-24 12:53:33 UTC
I could reproduce the performance boost Christoph described which improved things a bit even in Inkscape. Unfortunately it does not really affect the major degradation compared to gtk2.

One thing I noticed: Performance quickly degrades with growing window sizes and (this part might be specific to Inkscape code but maybe it helps to track down the underlying issue in gtk) at some window size drawing basically stops altogether.
This means the objects on canvas are not redrawn anymore until one stops modifying them (I usually move around a rectangle). While Inkscape has code to delay redrawing if it would take too long (i.e. do not update the full canvas at a high rate while moving a large object across the whole screen) the behavior did not change when I specifically disabled said code.

Is there code in gtk which could block redrawing in certain scenarios? E.g. a delay time between consecutive repaints that could be exceeded?
Comment 13 Eduard Braun 2017-10-24 13:03:53 UTC
Maybe something related to bug 685460?
Comment 14 LRN 2018-03-26 21:17:32 UTC
I've looked into this a bit. Timed the renderer. Here are the pieces of code that had timing info (unless stated otherwise, not including the last mentioned line):

gdk_window_begin_paint_internal:
0)
> impl_class = GDK_WINDOW_IMPL_GET_CLASS (window->impl);
> ...
> window->current_paint.region = cairo_region_copy (region);

1)
>  window->current_paint.region = cairo_region_copy (region);
> ...
>  if (needs_surface)

2)
> if (needs_surface)
> ...
> if (!cairo_region_is_empty (window->current_paint.region))

3)
> if (!cairo_region_is_empty (window->current_paint.region))
> into gdk_window_clear_backing_region:
> ...
> cairo_set_operator (cr, CAIRO_OPERATOR_SOURCE);

4)
> cairo_set_operator (cr, CAIRO_OPERATOR_SOURCE);
> ...
> <exit function>

5)
> <exit from gdk_window_begin_paint_internal>
> ...
> <enter gdk_window_end_paint_internal>

gdk_window_end_paint_internal:
6)
> impl_class = GDK_WINDOW_IMPL_GET_CLASS (window->impl);
> ...
> if (window->current_paint.surface_needs_composite)

7)
> if (window->current_paint.surface_needs_composite)
> ...
> gdk_window_free_current_paint (window);

8)
> gdk_window_free_current_paint (window);
> ...
> <exit function>

In other words:
(0) backend begin_paint() implementation
(1) light region copying and one big GL-only section
(2) creation of an intermediate cairo surface (for GL) or just ref of existing backend surface (non-GL)
(3) set-up phase of the gdk_window_clear_backing_region()
(4) background clearing phase of the gdk_window_clear_backing_region()
(5) actual drawing in GTK layer
(6) backend end_paint () implementation (layered windows upload data to DWM here)
(7) composing intermediate cairo surface back onto the window (via GL - for GL, via cairo_paint - for non-GL)
(8) some invalidation code, no idea what it does
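The comment does not say how the timing was collected. One plausible way to get per-phase numbers like these is a checkpoint timer that records the time elapsed since the previous checkpoint. The sketch below shows the idea in Python for brevity; the class is hypothetical, and inside GDK the same thing would presumably be done in C with a monotonic clock such as g_get_monotonic_time().

```python
import time


class PhaseTimer:
    """Record the time spent between consecutive checkpoints."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # any callable returning seconds as a float
        self._last = clock()
        self.phases = []  # list of (label, milliseconds)

    def checkpoint(self, label):
        """Close the phase that started at the previous checkpoint and name it."""
        now = self._clock()
        self.phases.append((label, (now - self._last) * 1000.0))
        self._last = now


if __name__ == "__main__":
    timer = PhaseTimer()
    sum(range(100000))  # stand-in for "set-up phase"
    timer.checkpoint("set-up")
    sum(range(1000000))  # stand-in for "background clearing phase"
    timer.checkpoint("background clearing")
    for label, ms in timer.phases:
        print("%s: %.3f ms" % (label, ms))
```

Each checkpoint corresponds to one of the marker lines quoted above, so a phase covers everything between two markers, excluding the last mentioned line.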

Here's timing information (for a window maximized on my 4K desktop - i.e. slightly smaller than 4K due to taskbar; non-CSD window is also a bit smaller because it has no headerbar):
(0) always negligible
(1) ~0.03ms for anything not using layered windows (GL or not GL), ~0.10 for layered windows and for non-layered windows with an intermediate surface (so, pretty much negligible)
(2) very fast (0.04ms) for layered windows, almost a no-op for non-layered non-GL windows, 6-8ms for GL windows (reasonably fast)
(3) always negligible
(4) non-layered non-GL - 35ms, layered - 5-10ms, GL - no-op
(5) always the same (around 3-4ms for one floating icon)
(6) layered - 12-13ms, all others - no-op
(7) GL - 20-22ms, non-layered window with an intermediate surface - 33ms, all others - no-op
(8) negligible

"non-layered non-GL" is what you get with GTK_CSD=0 and when begin_paint() returns FALSE for non-layered windows.
performance: 22fps

"non-layered windows with an intermediate surface" is what you get with GTK_CSD=0 and when begin_paint() returns TRUE for non-layered windows.
performance: 20fps

"GL" is what you get with GDK_GL=always (doesn't matter whether you're using CSD or non-CSD windows, the impact is negligible)
performance: 28fps

"layered" is what you get with GTK_CSD=1
performance: 34fps
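As a sanity check, the per-phase times can be summed into a frame time and an upper bound on the frame rate for three of the configurations. This is a rough sketch under stated assumptions: ranges are replaced by their midpoint, "negligible" and "no-op" phases count as 0, and the intermediate-surface case is left out because phases (2) and (4) are not given for it.

```python
# Per-phase times in milliseconds, taken from the timing list above.
# Assumptions: ranges -> midpoint, "negligible"/"no-op" phases -> omitted (0 ms).
PHASES_MS = {
    "non-layered non-GL": {1: 0.03, 4: 35.0, 5: 3.5},
    "GL": {1: 0.03, 2: 7.0, 5: 3.5, 7: 21.0},
    "layered": {1: 0.10, 2: 0.04, 4: 7.5, 5: 3.5, 6: 12.5},
}

# Frame rates reported for the same configurations.
MEASURED_FPS = {"non-layered non-GL": 22, "GL": 28, "layered": 34}


def frame_time_ms(config):
    """Sum of all timed phases for one paint cycle."""
    return sum(PHASES_MS[config].values())


def fps_ceiling(config):
    """Frame rate if nothing but the timed phases took any time."""
    return 1000.0 / frame_time_ms(config)


if __name__ == "__main__":
    for config in PHASES_MS:
        print("%s: %.2f ms per frame, at most %.1f fps (measured: %d fps)"
              % (config, frame_time_ms(config), fps_ceiling(config), MEASURED_FPS[config]))
```

The ceilings come out in the same order as the measured frame rates and a few fps above them, which suggests that the timed phases account for most, but not all, of each frame.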

Notably, drawing into a software cairo surface, turning it into a texture and then painting via GL is *faster* than painting on a window DC directly.
Drawing into a software cairo surface and then painting it (also with cairo - presumably using BitBlt) on a window DC is slightly slower than painting on a window DC directly.
Layered windows are the fastest (all drawing is software, but Window Manager does the compositing internally)

Because of the low performance (<60fps) fishbowl is only drawing 1 icon at all times. I.e. most of the delays are due to slowness of putting pixels into the window, not due to complexity of the scene (fishbowl repaints itself completely on every frame, so most of the time is spent filling in the background).

My conclusion is that cairo just can't paint stuff onto window DCs fast enough (i can't tell whether it's painting or blitting when doing cairo_paint(); there's a condition inside the cairo backend, but i don't know which branch it is taking).

I've tried to estimate the time spent on converting cairo image to a texture vs the time spent on rendering that texture by cheating and just commenting out the inner part of gdk_gl_texture_quads() function (starting from g_new() and ending with g_free()). Phase (7) time for GL windows went down from 20-22ms to roughly 13ms (obviously, windows are not drawn because of that...).

Presumably, doing a 100%-GL drawing will, therefore, be much faster. Cairo has an OpenGL backend (AFAIU - for drawing on textures), but i'm not sure whether it will be faster than doing software drawing and then converting.

I don't know what kind of performance GTK2 would demonstrate in such circumstances (there's no fishbowl for gtk2...), but its drawing mechanism is similar to the "non-layered non-GL" case, except that it uses nominally 24-bit cairo surface format (without alpha). It's difficult to approximate, as fishbowl repaints the whole bowl for each frame (normal applications tend to be less wasteful; hexchat (GTK2) is a good example - resizing its window with a channel open is slow at max size, but doing the same with an empty new server window is quite fast). But resizing/moving a very big rectangle in maximized inkscape window seems kind of slow-ish (hard to tell, because it doesn't have double-buffering...).

This seems to match the behaviour of Microsoft applications (Explorer, Paint) that they demonstrate when i make their windows cover almost the whole desktop and then resize these windows a bit. They slow down with the increased size. Firefox, drawing with DirectX (AFAIK), fares much better (even if it has not-exactly-trivial page content to fit at the same time). Windows Settings application (and other "modern" MS applications) seems to be using accelerated drawing of some kind (it has WS_EX_LAYERED child, but that doesn't mean anything - it might be using it as "layered light", with SetLayeredWindowAttributes() call)...or not, can't really tell. But it's somewhat fast-ish even when it has not-very-trivial content.

I blame Microsoft. And maybe cairo developers. But mostly Microsoft.
Comment 15 Eduard Braun 2018-03-26 22:01:46 UTC
Wow, thank you so much for the extensive investigation!

If I understand the numbers correctly it means we're actually spending the most time *deleting* stuff in the non-layered non-GL case (35ms in (4)), which also seems to be the biggest difference between GTK_CSD=1 and GTK_CSD=0? It would inherently limit the refresh rate to below 29 FPS (for doing "nothing")! Any chance of speeding this part up?

The other notable impact then seems to be (7) with another 33 ms in the case of a non-layered window with an intermediate surface. Do I understand correctly that this is the precise speedup we were able to observe previously (comments 8-13)? So basically again a "no-op" compositing/destroying a surface?


I'm not sure if Yale Zhang contacted you in the meantime? He did some investigations regarding Inkscape code lately and also pinned down some performance-critical parts in the GDK code and attempted to patch some things to make it faster. I think your observations are very similar in certain points. For the discussion (it's quite lengthy unfortunately and also discusses a lot of Inkscape code which might not be overly interesting in this context), see [1].

Maybe you could team up to get this figured out? As always, I'm happy to help with testing if possible, but the internals are a bit over my head right now.

[1] http://inkscape.13.x6.nabble.com/slow-sluggish-drawing-with-pencil-amp-calligraphy-tool-solved-td4981276.html#a4981304
Comment 16 LRN 2018-03-27 00:30:51 UTC
(In reply to Eduard Braun from comment #15)
> If I understand the numbers correctly it means we're actually spending the
> most time *deleting* stuff in the non-layered non-GL case (35ms in (4)),
> which also seems to be the biggest difference between GTK_CSD=1 and
> GTK_CSD=0? It would inherently limit the refresh rate to below 29 FPS (for
> doing "nothing")! Any chance of speeding this part up?

I've tried to comment out the code that paints the background there. This led to the time being moved to some other place (phase 5, if i remember correctly).

> The other notable impact then seems to be (7) with another 33 ms in the case
> of a non-layered window with an intermediate surface. Do I understand
> correctly that this is the precise speedup we were able to observe
> previously (comments 8-13)? So basically again a "no-op"
> compositing/destroying a surface?
Well, it's not exactly useless, as this is, basically, double-buffering (i haven't noticed any tearing when intermediate surface is not used, but then fishbowl isn't set up to make it obvious). But - yes, it's technically not needed, since having cairo draw directly on the window DC is slightly faster.

> I'm not sure if Yale Zhang contacted you in the meantime?
He didn't, AFAICS.

> [1]
> http://inkscape.13.x6.nabble.com/slow-sluggish-drawing-with-pencil-amp-
> calligraphy-tool-solved-td4981276.html#a4981304

I haven't read that yet, but "sluggish" suggests a lag, not a fps drop.
Comment 17 Eduard Braun 2018-03-27 08:09:53 UTC
> Well, it's not exactly useless, as this is, basically, double-buffering

Does GTK+ even need to do its own double buffering here? I seem to remember Windows' DWM does something like that by default - maybe it already takes care of this step for us?


> I haven't read that yet, but "sluggish" suggests a lag, not a fps drop.

Ignore the title, the "sluggish" part referred to a slightly different issue which was caused by motion event compression, yet the thread also contains everything else related to Yale's investigations (I linked the specific message where it gets interesting GTK-wise above).
Comment 18 Christoph Reiter (lazka) 2018-03-27 08:17:12 UTC
(I did some benchmarks with gvsbuild (msvc stack) last week and things were slower there)
Comment 19 GNOME Infrastructure Team 2018-05-02 18:22:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gtk/issues/801.