After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 746468 - gtkcairoblur is slow
gtkcairoblur is slow
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: .General
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2015-03-19 15:38 UTC by Alexander Larsson
Modified: 2015-03-23 08:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add performance test for gtkcairoblur (2.78 KB, patch)
2015-03-19 15:39 UTC, Alexander Larsson
committed Details | Review
gtkcairoblur: Minor restructure (2.03 KB, patch)
2015-03-19 15:39 UTC, Alexander Larsson
committed Details | Review
gtkcairoblur: Unroll inner loop for common radius values (4.23 KB, patch)
2015-03-19 15:39 UTC, Alexander Larsson
committed Details | Review
Add gtk_run_in_parallel helper (5.67 KB, patch)
2015-03-19 18:02 UTC, Alexander Larsson
none Details | Review
gtkcairoblur: Use gtk_run_in_parallel to compute the blurs (4.31 KB, patch)
2015-03-19 18:02 UTC, Alexander Larsson
none Details | Review
a quick demo (2.28 KB, patch)
2015-03-19 19:13 UTC, Allison Karlitskaya (desrt)
none Details | Review
demo with "chunk size" API (2.65 KB, text/plain)
2015-03-19 19:37 UTC, Allison Karlitskaya (desrt)
  Details
Add gtk_thread_pool_run_parallel helper (6.26 KB, patch)
2015-03-19 20:26 UTC, Alexander Larsson
none Details | Review
gtkcairoblur: Use gtk_thread_pool_run_parallel to compute the blurs (4.31 KB, patch)
2015-03-19 20:26 UTC, Alexander Larsson
none Details | Review
shadow-box: Bail out blur early if radius is 1px (1.19 KB, patch)
2015-03-20 14:02 UTC, Alexander Larsson
committed Details | Review
shadow-box: Blur only horizontally/vertically for the non-corner parts (11.08 KB, patch)
2015-03-20 14:02 UTC, Alexander Larsson
none Details | Review
box-shadow: For top/bottom and left/right parts, repeat a single line (6.32 KB, patch)
2015-03-20 14:02 UTC, Alexander Larsson
none Details | Review
shadow-box: Blur only horizontally/vertically for the non-corner parts (10.97 KB, patch)
2015-03-20 14:10 UTC, Alexander Larsson
committed Details | Review
box-shadow: For top/bottom and left/right parts, repeat a single line (6.32 KB, patch)
2015-03-20 14:10 UTC, Alexander Larsson
committed Details | Review

Description Alexander Larsson 2015-03-19 15:38:36 UTC
blurring keeps showing up in the profiles, lets make it faster.
Comment 1 Alexander Larsson 2015-03-19 15:39:11 UTC
Created attachment 299844 [details] [review]
Add performance test for gtkcairoblur

This just creates a large cairo surface and times bluring it
at various values of radius.
Comment 2 Alexander Larsson 2015-03-19 15:39:18 UTC
Created attachment 299845 [details] [review]
gtkcairoblur: Minor restructure

This just moves get_box_filter_size to the top and makes
it a macro (so it can be used as a constant later).
Comment 3 Alexander Larsson 2015-03-19 15:39:24 UTC
Created attachment 299846 [details] [review]
gtkcairoblur: Unroll inner loop for common radius values

This unrolls the inner blur loop for radius 1-10, allowing
the compiler to use a divide-by-constant operation instead
of a generic division.

Here is the blur-performance output before:

Radius  1: 124.95 msec, 32.01 kpixels/msec:
Radius  2: 117.27 msec, 34.11 kpixels/msec:
Radius  3: 123.57 msec, 32.37 kpixels/msec:
Radius  4: 118.17 msec, 33.85 kpixels/msec:
Radius  5: 119.32 msec, 33.52 kpixels/msec:
Radius  6: 124.17 msec, 32.21 kpixels/msec:
Radius  7: 121.04 msec, 33.05 kpixels/msec:
Radius  8: 130.64 msec, 30.62 kpixels/msec:
Radius  9: 119.47 msec, 33.48 kpixels/msec:
Radius 10: 117.95 msec, 33.91 kpixels/msec:
Radius 11: 122.38 msec, 32.68 kpixels/msec:
Radius 12: 121.92 msec, 32.81 kpixels/msec:
Radius 13: 125.45 msec, 31.89 kpixels/msec:
Radius 14: 121.63 msec, 32.89 kpixels/msec:
Radius 15: 120.18 msec, 33.28 kpixels/msec:

And after:

Radius  1: 42.26 msec, 94.65 kpixels/msec:
Radius  2: 59.15 msec, 67.62 kpixels/msec:
Radius  3: 60.29 msec, 66.35 kpixels/msec:
Radius  4: 64.53 msec, 61.99 kpixels/msec:
Radius  5: 60.07 msec, 66.59 kpixels/msec:
Radius  6: 62.43 msec, 64.07 kpixels/msec:
Radius  7: 60.36 msec, 66.27 kpixels/msec:
Radius  8: 59.59 msec, 67.13 kpixels/msec:
Radius  9: 76.17 msec, 52.51 kpixels/msec:
Radius 10: 79.41 msec, 50.37 kpixels/msec:
Radius 11: 118.92 msec, 33.64 kpixels/msec:
Radius 12: 121.31 msec, 32.97 kpixels/msec:
Radius 13: 118.30 msec, 33.81 kpixels/msec:
Radius 14: 116.82 msec, 34.24 kpixels/msec:
Radius 15: 116.99 msec, 34.19 kpixels/msec:

I.e. almost double performance for the unrolled radius values.
Comment 4 Alexander Larsson 2015-03-19 18:02:04 UTC
Created attachment 299860 [details] [review]
Add gtk_run_in_parallel helper

This is a helper that schedules the same job in N copies
on different threads, where N is limited by the number of CPUs
(as well as user input). The last job is run on the originating
cpu, and when that is done we block until all jobs are done.

The idea here is that you want to split up your job in equally sized
chunks, and run one chunk per processor, to maximize the amount of
parallelism.

The current implementation is pretty simple, using a shared
GThreadPool and busy waiting for joining the jobs. A more complex
implementation using custom threads and cpu affinity might have
less overhead.
Comment 5 Alexander Larsson 2015-03-19 18:02:10 UTC
Created attachment 299861 [details] [review]
gtkcairoblur: Use gtk_run_in_parallel to compute the blurs

This splits the row-blurring parts into N_CPU horizontal
strips and blurs each on a different CPU.

blur-performance before patch:

Radius  1: 37.02 msec, 108.05 kpixels/msec:
Radius  2: 54.15 msec, 73.87 kpixels/msec:
Radius  3: 53.32 msec, 75.02 kpixels/msec:
Radius  4: 55.95 msec, 71.49 kpixels/msec:
Radius  5: 54.13 msec, 73.90 kpixels/msec:
Radius  6: 54.49 msec, 73.41 kpixels/msec:
Radius  7: 53.86 msec, 74.27 kpixels/msec:
Radius  8: 55.00 msec, 72.73 kpixels/msec:
Radius  9: 67.52 msec, 59.25 kpixels/msec:
Radius 10: 71.39 msec, 56.03 kpixels/msec:
Radius 11: 106.47 msec, 37.57 kpixels/msec:
Radius 12: 105.90 msec, 37.77 kpixels/msec:
Radius 13: 105.72 msec, 37.84 kpixels/msec:
Radius 14: 105.46 msec, 37.93 kpixels/msec:
Radius 15: 107.28 msec, 37.29 kpixels/msec:

after patch:

Radius  1: 23.19 msec, 172.46 kpixels/msec:
Radius  2: 34.17 msec, 117.06 kpixels/msec:
Radius  3: 34.14 msec, 117.17 kpixels/msec:
Radius  4: 34.16 msec, 117.11 kpixels/msec:
Radius  5: 33.14 msec, 120.71 kpixels/msec:
Radius  6: 35.54 msec, 112.54 kpixels/msec:
Radius  7: 34.74 msec, 115.15 kpixels/msec:
Radius  8: 34.45 msec, 116.10 kpixels/msec:
Radius  9: 38.43 msec, 104.10 kpixels/msec:
Radius 10: 42.32 msec, 94.52 kpixels/msec:
Radius 11: 59.12 msec, 67.66 kpixels/msec:
Radius 12: 58.30 msec, 68.60 kpixels/msec:
Radius 13: 59.83 msec, 66.86 kpixels/msec:
Radius 14: 57.89 msec, 69.10 kpixels/msec:
Radius 15: 56.31 msec, 71.03 kpixels/msec:
Comment 6 Allison Karlitskaya (desrt) 2015-03-19 19:13:53 UTC
Created attachment 299868 [details] [review]
a quick demo

(after some discussion on IRC) I wouldn't mind adding something vaguely like this to GLib if you think it would help here...
Comment 7 Allison Karlitskaya (desrt) 2015-03-19 19:15:35 UTC
some more background on the IRC discussion:

splitting into fixed-sized chunks is a bad idea due to the possibility that one chunk may finish before another.  even tasks that 'seem' to be the same size can be subject to cache behaviour and contention with other processes in the system.

i think a more dynamic approach to the solution would be better: one that hands out individual rows (or small chunks of rows) to be processed at a time.
Comment 8 Allison Karlitskaya (desrt) 2015-03-19 19:37:23 UTC
Created attachment 299870 [details]
demo with "chunk size" API

here's an updated example that lets the user specify a chunk size.  this makes it easier to handle the case of the last chunk (which may be smaller than the others)

i'm not 100% convinced we want to encode this into the API (and it does add a tiny bit of extra work) but if we are going to have some API to facilitate this, I'm pretty sure this is what it would look like
Comment 9 Alexander Larsson 2015-03-19 20:26:32 UTC
Created attachment 299872 [details] [review]
Add gtk_thread_pool_run_parallel helper

This is a helper that schedules the same job in N copies
on different threads, where N is limited by the number of CPUs
(as well as user input). The last job is run on the originating
cpu, and when that is done we block until all jobs are done.

The idea here is that you want to split up your job in equally sized
chunks, and run one chunk per processor, to maximize the amount of
parallelism.
Comment 10 Alexander Larsson 2015-03-19 20:26:38 UTC
Created attachment 299873 [details] [review]
gtkcairoblur: Use gtk_thread_pool_run_parallel to compute the blurs

This splits the row-blurring parts into chunks of rows (at least 2000
pixels per chunk) and runs them in parallel.

blur-performance before patch:

Radius  1: 37.02 msec, 108.05 kpixels/msec:
Radius  2: 54.15 msec, 73.87 kpixels/msec:
Radius  3: 53.32 msec, 75.02 kpixels/msec:
Radius  4: 55.95 msec, 71.49 kpixels/msec:
Radius  5: 54.13 msec, 73.90 kpixels/msec:
Radius  6: 54.49 msec, 73.41 kpixels/msec:
Radius  7: 53.86 msec, 74.27 kpixels/msec:
Radius  8: 55.00 msec, 72.73 kpixels/msec:
Radius  9: 67.52 msec, 59.25 kpixels/msec:
Radius 10: 71.39 msec, 56.03 kpixels/msec:
Radius 11: 106.47 msec, 37.57 kpixels/msec:
Radius 12: 105.90 msec, 37.77 kpixels/msec:
Radius 13: 105.72 msec, 37.84 kpixels/msec:
Radius 14: 105.46 msec, 37.93 kpixels/msec:
Radius 15: 107.28 msec, 37.29 kpixels/msec:

after patch:

Radius  1: 23.19 msec, 172.46 kpixels/msec:
Radius  2: 34.17 msec, 117.06 kpixels/msec:
Radius  3: 34.14 msec, 117.17 kpixels/msec:
Radius  4: 34.16 msec, 117.11 kpixels/msec:
Radius  5: 33.14 msec, 120.71 kpixels/msec:
Radius  6: 35.54 msec, 112.54 kpixels/msec:
Radius  7: 34.74 msec, 115.15 kpixels/msec:
Radius  8: 34.45 msec, 116.10 kpixels/msec:
Radius  9: 38.43 msec, 104.10 kpixels/msec:
Radius 10: 42.32 msec, 94.52 kpixels/msec:
Radius 11: 59.12 msec, 67.66 kpixels/msec:
Radius 12: 58.30 msec, 68.60 kpixels/msec:
Radius 13: 59.83 msec, 66.86 kpixels/msec:
Radius 14: 57.89 msec, 69.10 kpixels/msec:
Radius 15: 56.31 msec, 71.03 kpixels/msec:
Comment 11 Alexander Larsson 2015-03-19 20:30:45 UTC
New version of the threaded blurring based on slightly modified (and with gtk_ namespace) version of ryans approach.
Comment 12 Alexander Larsson 2015-03-20 14:02:04 UTC
Created attachment 299950 [details] [review]
shadow-box: Bail out blur early if radius is 1px

For radius 1px the current implementation rounds down to a 1 px box
filter which is a no-op. Rather than creating useless shadow masks
in this case we bail out blurring early.

Another alternative would be to make radius 1px round up to a 2 px box
filter, but that would change the rendering of Adwaita which is probably
not a great idea this late in the cycle.
Comment 13 Alexander Larsson 2015-03-20 14:02:09 UTC
Created attachment 299951 [details] [review]
shadow-box: Blur only horizontally/vertically for the non-corner parts

There is no need to e.g. blur in the x-direction for the top part
of a box shadow. Also, there is no need to extend the mask in the
non-blurred direction.
Comment 14 Alexander Larsson 2015-03-20 14:02:15 UTC
Created attachment 299952 [details] [review]
box-shadow: For top/bottom and left/right parts, repeat a single line

Since these part really are the same in all of the x or y direction
and we don't blur in that direction we can just blur one line and
repeat it during drawing.
Comment 15 Alexander Larsson 2015-03-20 14:03:38 UTC
With the last 3 patches and the first ones (not the parallel crack) I get shadow rendering at about 5% cpu time during widget-factory renderer, rather than 35% or so like it was before.
Comment 16 Alexander Larsson 2015-03-20 14:10:17 UTC
Created attachment 299953 [details] [review]
shadow-box: Blur only horizontally/vertically for the non-corner parts

There is no need to e.g. blur in the x-direction for the top part
of a box shadow. Also, there is no need to extend the mask in the
non-blurred direction.
Comment 17 Alexander Larsson 2015-03-20 14:10:22 UTC
Created attachment 299954 [details] [review]
box-shadow: For top/bottom and left/right parts, repeat a single line

Since these part really are the same in all of the x or y direction
and we don't blur in that direction we can just blur one line and
repeat it during drawing.
Comment 18 Alexander Larsson 2015-03-20 14:11:10 UTC
New version that fixed a leftover debug chunk.
Comment 19 Alexander Larsson 2015-03-20 14:32:09 UTC
Marked the threading crack obsolete. Its interesting code, but should not be needed anymore for shadows.
Comment 20 Matthias Clasen 2015-03-21 00:08:23 UTC
I compared screenshots of widget-factory with and without these patches, and found only a single pixel slightly different.
Comment 21 Matthias Clasen 2015-03-21 00:18:32 UTC
Review of attachment 299844 [details] [review]:

sure
Comment 22 Matthias Clasen 2015-03-21 00:19:58 UTC
Review of attachment 299845 [details] [review]:

ok
Comment 23 Matthias Clasen 2015-03-21 00:21:20 UTC
Review of attachment 299950 [details] [review]:

::: gtk/gtkcssshadowvalue.c
@@ +316,3 @@
+
+  /* The code doesn't actually do any blurring for radius 1, as it ends up
+     with box filter size 1 */

Pet peeve of mine: I prefer

/* This style
 * of comments.
 */
Comment 24 Matthias Clasen 2015-03-21 00:28:02 UTC
Review of attachment 299846 [details] [review]:

please fix those cosmetics up while committing

::: gtk/gtkcairoblur.c
@@ +81,3 @@
+        sum += row[i];                                          \
+                                                                \
+    if (i >= offset)                                            \

Indentation hickup here: The if should be lined up with the one above. This is all inside the loop body.

@@ +93,2 @@
+  /* We unroll the values for d for radius 1-10 to avoid
+     a generic divide operation */

Prefer stars lined up here

@@ +105,3 @@
+    case get_box_filter_size(9): BLUR_ROW_KERNEL(get_box_filter_size(9));
+    case get_box_filter_size(10): BLUR_ROW_KERNEL(get_box_filter_size(10));
+    default: BLUR_ROW_KERNEL(d);

And spaces before ( here
Comment 25 Matthias Clasen 2015-03-21 00:34:06 UTC
Review of attachment 299953 [details] [review]:

Looks good

::: gtk/gtkcairoblur.c
@@ +183,1 @@
 {

Should line up the parameters here
Comment 26 Matthias Clasen 2015-03-21 00:34:06 UTC
Review of attachment 299953 [details] [review]:

Looks good

::: gtk/gtkcairoblur.c
@@ +183,1 @@
 {

Should line up the parameters here
Comment 27 Matthias Clasen 2015-03-21 00:36:43 UTC
Review of attachment 299950 [details] [review]:

I notice that _gtk_surface_blur also has a check for radius == 0, which could be similarly changed to <= 1
Comment 28 Matthias Clasen 2015-03-21 00:44:28 UTC
Review of attachment 299954 [details] [review]:

::: gtk/gtkcssshadowvalue.c
@@ +330,3 @@
                                     gboolean           blur_x,
+                                    gboolean           blur_y,
+                                    gboolean           repeat)

Three booleans in a row, the api is starting to get a bit ugly. Should we make that flags, just for readability ?

gtk_css_shadow_value_start_drawing (shadow, cr, BLUR_X|REPEAT)

is a lot more understandable than

gtk_css_shadow_value_start_drawing (shadow, cr, TRUE, FALSE, TRUE)
Comment 29 Alexander Larsson 2015-03-21 20:50:01 UTC
Attachment 299844 [details] pushed as 9ba185b - Add performance test for gtkcairoblur
Attachment 299845 [details] pushed as ae21e08 - gtkcairoblur: Minor restructure
Attachment 299846 [details] pushed as d0dc1f5 - gtkcairoblur: Unroll inner loop for common radius values
Attachment 299950 [details] pushed as 9c2a16f - shadow-box: Bail out blur early if radius is 1px
Attachment 299953 [details] pushed as 967cb56 - shadow-box: Blur only horizontally/vertically for the non-corner parts
Attachment 299954 [details] pushed as 8e03262 - box-shadow: For top/bottom and left/right parts, repeat a single line
Comment 30 Alexander Larsson 2015-03-21 20:52:11 UTC
I fixed this according to the comments and pushed to master, but we probably want them in 3-16 too. Question is, do we merge them now or wait until 3.16.1?
Comment 31 Ting-Wei Lan 2015-03-22 16:29:13 UTC
Attachment 299846 [details] fails to compile with clang 3.4:

gtkcairoblur.c:97:10: error: expression is not an integer constant expression
    case get_box_filter_size (2): BLUR_ROW_KERNEL (get_box_filter_size (2));
         ^~~~~~~~~~~~~~~~~~~~~~~
gtkcairoblur.c:39:44: note: expanded from macro 'get_box_filter_size'
#define get_box_filter_size(radius) ((int)(GAUSSIAN_SCALE_FACTOR * radius))
                                    ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gtkcairoblur.c:37:39: note: expanded from macro 'GAUSSIAN_SCALE_FACTOR'
#define GAUSSIAN_SCALE_FACTOR ((3.0 * sqrt(2 * G_PI) / 4))
                                      ^
Comment 32 Matthias Clasen 2015-03-22 17:10:10 UTC
(In reply to Alexander Larsson from comment #30)
> I fixed this according to the comments and pushed to master, but we probably
> want them in 3-16 too. Question is, do we merge them now or wait until
> 3.16.1?

I'll cherry pick them after .0 is released.
Comment 33 Matthias Clasen 2015-03-22 17:35:16 UTC
(In reply to Ting-Wei Lan from comment #31)
> Attachment 299846 [details] fails to compile with clang 3.4:
> 
> gtkcairoblur.c:97:10: error: expression is not an integer constant expression
>     case get_box_filter_size (2): BLUR_ROW_KERNEL (get_box_filter_size (2));
>          ^~~~~~~~~~~~~~~~~~~~~~~
> gtkcairoblur.c:39:44: note: expanded from macro 'get_box_filter_size'
> #define get_box_filter_size(radius) ((int)(GAUSSIAN_SCALE_FACTOR * radius))
>                                     ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> gtkcairoblur.c:37:39: note: expanded from macro 'GAUSSIAN_SCALE_FACTOR'
> #define GAUSSIAN_SCALE_FACTOR ((3.0 * sqrt(2 * G_PI) / 4))

It built fine with clang 3.5 here
Comment 34 Matthias Clasen 2015-03-22 17:36:18 UTC
(In reply to Matthias Clasen from comment #33)

> It built fine with clang 3.5 here

Let me take that back, I managed to mangle my CC= line
Comment 35 Matthias Clasen 2015-03-22 17:50:18 UTC
Pushed a fix
Comment 36 Alexander Larsson 2015-03-23 08:00:11 UTC
What is we use a define for "sqrt(2 * G_PI)" in the get_box_filter_size instead. Then it may work as a constant for clang?

I guess it doesn't really matter much though.