Bug 694917 – Drawing much slower than GIMP 2.8

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 694917 - Drawing much slower than GIMP 2.8


Summary:	Drawing much slower than GIMP 2.8


Status:	RESOLVED OBSOLETE

Product:	GIMP
Classification:	Other
Component:	General
Version:	git master
Hardware:	Other All

Importance:	Normal blocker
Target Milestone:	2.10
Assigned To:	GIMP Bugs
QA Contact:	GIMP Bugs

URL:
Whiteboard:

Duplicates:	782751 794301 794503 795017 (view as bug list)
Depends on:
Blocks:	141797 783108

Reported:	2013-03-01 07:36 UTC by Sodhi, Cedric
Modified:	2018-05-24 13:36 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Code to read the Windows performance counter (3.53 KB, text/plain) 2018-01-03 15:33 UTC, programmer_ceds		Details
Functions to read the Windows Performance Counter (3.81 KB, patch) 2018-01-03 18:04 UTC, programmer_ceds	needs-work	Details \| Review
attempt to improve painting with colored brush (12.42 KB, patch) 2018-01-16 19:49 UTC, Massimo	none	Details \| Review
attempt to improve painting with color brush (8.63 KB, patch) 2018-03-22 11:14 UTC, Massimo	committed	Details \| Review
the script-fu script (2.34 KB, text/plain) 2018-03-22 11:16 UTC, Massimo		Details
attempt to improve painting with small brushes, big images (1.29 KB, patch) 2018-03-29 15:04 UTC, Massimo	committed	Details \| Review
Brush 512 px used on my qualitative testing (256.04 KB, image/x-gimp-gbr) 2018-03-31 12:42 UTC, jose americo gobbo		Details
Screenshot of perf result (150.50 KB, image/png) 2018-03-31 13:44 UTC, Jehan		Details
app: do not copy needlessly paint_mask GimpTempBuf. (3.30 KB, patch) 2018-03-31 14:41 UTC, Jehan	committed	Details \| Review
cache_lookup() called from various places. (416.35 KB, image/png) 2018-03-31 15:20 UTC, Jehan		Details
Plug-in to test stroke time (750 bytes, text/x-python) 2018-04-01 16:56 UTC, Jehan		Details
4000x5000 canvas with a big active vectors (81.61 KB, image/x-xcf) 2018-04-01 17:02 UTC, Jehan		Details
4000x5000 canvas with a big active vectors (81.61 KB, image/x-xcf) 2018-04-01 17:06 UTC, Jehan		Details
Script to run the same test in console mode. (415 bytes, text/x-scheme) 2018-04-01 17:29 UTC, Jehan		Details
Graph Comparison for 7000 px brushstroke (25.79 KB, image/png) 2018-04-03 01:01 UTC, jose americo gobbo		Details
Perf output with bigger canvas and bigger brush (141.34 KB, image/png) 2018-04-03 22:56 UTC, Jehan		Details
Profile on a layer with alpha. (326.31 KB, application/pdf) 2018-04-03 23:38 UTC, Jehan		Details
Artefacts on painting with new parallelized code. (317.81 KB, image/png) 2018-04-04 23:59 UTC, Jehan		Details
Speed Performance last update GIMP 2.10 rc1 (commit 8e7a34297fe) (15.44 KB, application/vnd.oasis.opendocument.spreadsheet) 2018-04-08 17:45 UTC, jose americo gobbo		Details
Graph Speed Performance 2.10 rc1 (commit 8e7a34297fe) (46.43 KB, image/png) 2018-04-08 17:46 UTC, jose americo gobbo		Details
Spreadsheet of last test revised... adding some notes (commit 8e7a34297fe) (15.34 KB, application/vnd.oasis.opendocument.spreadsheet) 2018-04-08 23:28 UTC, jose americo gobbo		Details
Default x Legacy with 32f and 8i (99.68 KB, image/png) 2018-04-09 18:50 UTC, jose americo gobbo		Details
Spreadsheet of last test revised GIMP commit f77edcdf193e (24.61 KB, application/vnd.oasis.opendocument.spreadsheet) 2018-04-09 19:06 UTC, jose americo gobbo		Details

Description Sodhi, Cedric 2013-03-01 07:36:24 UTC

Brushes which have reasonable performance in 2.8.4 are rendered very slowly in git in any mode (8bpc to 32bpc).

With a spacing of 1px, fuzzy, rectangular brush, 25px in diameter, no dynamics, dragging out a line of 200 px length takes 3 to 4 seconds but happens instantaenously on 2.8.4.

Adding dynamics or using a bigger brush is out of the question. Waiting times with normal production-like settings go up to a minute for even the shortest lines.

Comment 1 Michael Natterer 2013-03-01 08:35:39 UTC

Good that you mention it, we haven't noticed.

Comment 2 Michael Schumacher 2013-03-01 12:44:48 UTC

I guess we don't need a bug about painting slowness yet , at least not at this stage in the development of 2.9?

Comment 3 Sodhi, Cedric 2013-03-01 13:33:56 UTC

I did not know that this was commonly known. I was surprised by the performance after I was told to use master for 32bpc.

Comment 4 Alexia Death 2013-03-26 16:30:32 UTC

Setting env variable GEGL_SWAP=ram helps somewhat, but not entierly. And yes this is very much a known development version bug.

Comment 5 Michael Schumacher 2015-06-06 16:56:25 UTC

We should check this for the current versions.

Comment 6 Michael Natterer 2017-01-03 14:44:18 UTC

There have been lots of optimizations in the meantime. Anyone willing
to do a quick test? If it's not considerably slower we should close
this as FIXED.

Comment 7 Joao S. O. Bueno 2017-01-03 14:49:11 UTC

I'built it on Saturday -- 
it felt ... "incredibly slow".  Sorry for the news.  

Maybe there is some optimization I am not turning on, but I don't think this bug can be closed.

Comment 8 Michael Natterer 2017-01-03 15:19:40 UTC

Much slower than the same brush with the same settings in 2.8?

Comment 9 Elle Stone 2017-01-03 18:17:25 UTC

On my own system (32gb ram and a relatively recent, fast processor), there doesn't seem to be any lag with a 25px brush, no matter how complicated the brush or the dynamics and regardless of which version of GIMP that I use. 

But with a larger brush (say around 180px) the lag is very noticeable on 2.9, (but not on 2.8), and even more noticeable if the brush is larger than 256px. And a complicated dynamics makes the problem worse. Whereas on recent versions of Krita (such as 3.1.1) the brush has to be really, really large to see a lag.

People are starting to work with larger canvases, which means using larger brushes, especially for things like creating soft background washes or making gradual fade-out masking when editing a photograph. With GIMP-2.9, I have to size the brushes and hence the canvas to what GIMP can handle without too much lag.

There was a time when Krita brushes were considerably slower than they are now, and I know there was a huge push towards making painting faster. It might be worth looking at the Krita code.

Comment 10 Jehan 2017-03-21 14:55:10 UTC

This is much better than it used to be and usable on common sizes for screen (even the very big UHD ones), but you can still see a lag difference.

I just tested on my computer, both 2.8 and master. I opened a 4K UHD canvas (3840x2160) with a default 20px brush. Just crossing the canvas quickly with the mouse: on master, you can perceive a (small) lag of the actual drawing appearing after the mouse on the canvas (whereas 2.8 feels more instant, the drawing follows the mouse better).

Testing higher resolution, the 8K UHD (7680x4320), the lag is even more noticeable, with the drawing really late behind a fast mouse (whereas on GIMP 2.8, it still feels instant).

So yes, the difference is still quite noticeable, even though there was a lot of progress.

On our use case, we work a lot for the screen, so lower resolutions/dimensions are OK, but when working for print purpose, you could much easily go into very high dimensions. So we should work on improving this.

Comment 11 Jehan 2017-05-10 20:38:55 UTC

Additional note: I believe this is not a blocker for 2.10 though. That's not horrible slowness as we demonstrate daily. Improving it before 2.10 would be nice though if you have an idea what to do, but not if that were to push the release for months.

Comment 12 Jehan 2017-12-04 23:08:21 UTC

Setting to blocker because the difference between painting in 2.8 and 2.10 is still quite huge. So that's quite a regression. Therefore if someone could have a look into this, it would be very good.

Yet if that is the last blocker standing before a release, the "block" could be relaxed. It does not make it impossible to paint.
Also finding what to speed up and how will not be an easy task.

Comment 13 programmer_ceds 2018-01-03 15:32:12 UTC

One thing that might be worth considering is the mix of gfloat and gdouble declarations throughout GIMP (and possibly GEGL and BABL - which I haven't checked).

For example the first of the following sections of code takes approx 860000 Windows performance counts whereas the second takes around 470000 performance counts:

gfloat f1 = 1.0;
gfloat f2 = 1.0;

for (i = 0; i < 100000000; i++)
{
 f2 += f1;
 f1 += 0.1;
}

gdouble d1 = 1.0;
gdouble d2 = 1.0;

for (i = 0; i < 100000000; i++)
{
 d2 += d1;
 d1 += 0.1;
}

To start with I thought that this suggested that 32-bit FP number handling under 64-bit Windows 7 was slower but the problem was actually the lack of a typecast in the first example.

Changing the first example to use f1 += (gfloat)0.1; makes the time for the first loop the same as for the second as the 0.1 was forced to be a float rather than the double assumed by gcc by default. This removed the instruction required to convert the (assumed) double to a float.

100 million was chosen for the loop in order to get some idea of how long the floating point operations would take for a brush with a size of 1000 pixels drawn 100 times - whilst this brush size is extreme and the drawing is glacially slow it should make any improvements to the speed more apparent. I also set the spacing to 1.

I have attached the code for a function that reads the Windows performance counter into a 64-bit variable and code for a second function that gives an elapsed count from the original count to the current time. These values are then displayed using gimp_tool_message() when the mouse button is released. (Unless I am mistaken the current timing facilities in GIMP only resolve to the millisecond which is inadequate for these purposes). Not being familiar with Linux I don't know if there is a similar API call.

The performance counter readings can be converted to a actual times but for the current purposes relative values are sufficient. Obviously the counts will be affected by any other programs that are running but repeating the tests gives an idea of where the time is being taken.

Anyway my point is that if there are any loops that are repeated many times and mix gfloat and gdouble variables there will be a time penalty involved in converting between the two formats.

Obviously using 64-bit floating point variables throughout would double the size of the working buffers but I am not sure why GIMP (and GEGL and BABL?) couldn't standardise on 32-bit floats throughout - do you really need the extra precision when specifying opacity for example? Maybe there are some places where the extra precision is required but perhaps this should be handled carefully.

I have spent some time looking through the code attempting to determine where the time is going but as yet haven't had that eureka moment. I have found that do_layer_blend() is taking about 44% of the total time for drawing a line but am ashamed to admit that I can't determine which function is being called by layer_mode.function but did find that this itself was taking about 40% of the time taken by do_layer_blend().

Comment 14 programmer_ceds 2018-01-03 15:33:53 UTC

Created attachment 366246 [details]
Code to read the Windows performance counter

Functions to read the Windows performance counter so allowing timing with a much higher resolution than the existing GIMP timing functions

Comment 15 Jehan 2018-01-03 16:10:02 UTC

> Functions to read the Windows performance counter so allowing timing with a much higher resolution than the existing GIMP timing functions

About your attachment, I understand this is alternative (better) implementation of GIMP_TIMER_START/END on Win32. I'd appreciate a proper git-formatted patch (the part on gimp-utils.*) and I would just push it.

> I have spent some time looking through the code attempting to determine where the time is going but as yet haven't had that eureka moment.

Ok. If you find anything, we'd appreciate patches as well.
The performance of painting tools is still a terrible regression compared to 2.8.

Comment 16 programmer_ceds 2018-01-03 18:04:15 UTC

Created attachment 366258 [details] [review]
Functions to read the Windows Performance Counter

Functions to read the Windows Performance Counter to allow higher resolution timing than is possible with the current GIMP timing functions.

Comment 17 programmer_ceds 2018-01-03 18:07:56 UTC

OK, hopefully the patch file is in the correct format.

I would still be interested to know how to determine the function called in do_layer_blend() - I am editing a single layer with the mode set to Normal and the paintbrush mode set to Normal (legacy)

Comment 18 Jehan 2018-01-03 18:41:19 UTC

Review of attachment 366258 [details] [review]:

Sorry, I think I wasn't clear.
We can't have conditional public functions like here working only on one OS (well we theoretically can, but I would not allow this as this would make a mess!). But we can have a single function signature with different implementations depending on OS. :-)

What I wanted is to have alternative implementations for GIMP_TIMER_START()/GIMP_TIMER_END().
Now I see you have different signatures, like you return a timer with your start function (why do you call it now() by the way? I find it misleading. start() would be better), but I don't mind that we change the signature of gimp_timer_start() and that we make them real (more complex) functions with different signatures.

This being said, when I time code this way, I usually use g_get_monotonic_time(), which is supposed to get much more accurate time by using the system clock. I indeed see we don't use it in our macro. Have you checked the implementation to see if it doesn't do what you need already?

> I would still be interested to know how to determine the function called in do_layer_blend()

I can't really help you unless I look more closely on this code. I'm hoping you will do this. ;-)

Comment 19 Jehan 2018-01-03 18:47:33 UTC

Actually it looks like glib code uses also QueryPerformanceCounter() though the call is different (they don't do this GetProcAddress() call. Does that change anything?).

https://git.gnome.org/browse/glib/tree/glib/gmain.c#n2746

If that is similar, I'd suggest our code uses g_get_monotonic_time().
Actually that's also always what I used until now and didn't even know we had these 2 GIMP_TIMER_* macros. :-)

Comment 20 programmer_ceds 2018-01-04 18:06:16 UTC

I hadn't spotted that g_get_monotonic_time() existed. The main difference between a call of this function and the one that I was using is that g_get_monotonic_time(0 converts the time to usec - on my system the performance frequency call returns a value of 2929755(counts per second). This gives a value of 0.341 which is used to multiply the performance count in order to convert it to usec - thereby reducing the resolution by a factor of approximately 3 - probably this doesn't matter given the numbers that we are talking about. So ignore my functions and carry on as before.

Comment 21 Jehan 2018-01-05 01:10:29 UTC

Also for the records, I just checked the implementation of `g_timer_*()` because I was wondering if we should update GIMP_TIMER_* by using g_get_monotonic_time(). Well it turns out GTimer already uses this internally.

So in the end, our macros already use the proper Win32 implementation (QueryPerformanceCounter() so, it would seem).

But now let's get back to accelerating the painting in GIMP!

Comment 22 programmer_ceds 2018-01-12 17:57:40 UTC

I'd like to get back to looking at this but since I pulled the latest sources from git I can't build GIMP. I get stuck with poppler-glib and poppler-data as noted on the GIMP users site - text posted there shown below.

What am I doing wrong please?



"I am trying to build the version of GIMP that requires poppler-glib.

I have tried cloning from git

git clone https://github.com/danigm/poppler.git

I have then changed to the poppler folder and run ./autogen.sh --prefix=$PREFIX

then ./configure --prefix=$PREFIX

When I run make I get errors of the form that cairo_pattern_mesh_set_control_point is undefined and a suggestion that cairo_mesh_pattern_set_control_point be used instead.

Any suggestions as to what I am doing wrong?

Also any information on satisfying the poppler-data requirement.

I have sorted the other recent requirements for an updated glib and mypaint-brushes but am struggling here."

Comment 23 Jehan 2018-01-12 18:38:34 UTC

I have answered on the gimp-user mailing list. Also you may want to rather ask these questions on the gimp-developer list.

Also please next time, do not post this kind of unrelated comment on a bug report, this makes them very hard to understand in the future (this one has already more than 20 comments!). Ask on the dev mailing list, or on IRC. On last resort, you may even ask me by email directly (I'd really prefer you to NOT do this and try to find your answer on mailing list and IRC, but that's still better than polluting a bug report :P).
I hope you understand. :-)
Thanks!

Comment 24 Massimo 2018-01-16 19:49:55 UTC

Created attachment 366901 [details] [review]
attempt to improve painting with colored brush

Played with perf:

Test case is R'G'B' u8 (2000x2000)
paint brush
Pepper brush, size 200, spacing 1, dynamics Pressure Opacity

The attached patch seems to improve

Comment 25 Jehan 2018-01-17 12:29:53 UTC

Thanks Massimo!

Doing quick tests, I could still reproduce brush lags on a 4K image, but I guess the improvement is simply subtler than that.
I don't have time to review the code just now because I need to head out. I just noticed the "foo" commit message. Is it to say that you don't consider this pushable, and this is just a POC?

Comment 26 Massimo 2018-01-17 13:36:32 UTC

(In reply to Jehan from comment #25)
> Thanks Massimo!
> 
> Doing quick tests, I could still reproduce brush lags on a 4K image, but I
> guess the improvement is simply subtler than that.

Obviously the comparison is with 2.8. 
Does 2.8 have a lag on the same 4k image?

You can always set up a brush/image big enough to see a lag.

> I don't have time to review the code just now because I need to head out. I
> just noticed the "foo" commit message. Is it to say that you don't consider
> this pushable, and this is just a POC?

Yes it is a WIP. 
To be less subjective I added code to time the stroke of a path with
emulate dynamics and the paintbrush paint method and stroking a
long path (zigzagging over the image) here goes from 10.x s to 5.x s.

Comment 27 Jehan 2018-01-17 21:14:34 UTC

> Does 2.8 have a lag on the same 4k image?

No it doesn't, which was my main issue.

Of course, I know I would see a lag at some point, but I chose 4K exactly because *on my computer at least* (on another computer, depending on its material configuration, it could be different of course), GIMP 2.8 shows no lag whereas master shows crazy lag. My steps were quite simple: just draw randomly and quickly like crazy for 30 seconds (basically draw circles continuously without ever raising the pen) on a 4K canvas and release: in GIMP 2.8, the strokes are following the pen; in master, strokes are lagging behind more and more, and after 30 seconds, when I raise the pen, I just the stroke being drawn for a few more seconds before it reaches the last pen position.

> To be less subjective I added code to time the stroke of a path with
> emulate dynamics and the paintbrush paint method and stroking a
> long path (zigzagging over the image) here goes from 10.x s to 5.x s.

Is this code fully automatized? Could we make it into a unit test?
Of course, that would be a bit a problem since once again, the actual time taken for the stroke would also depends on the hardware. But maybe we could still set a minimum time taken by the stroke or something, so that we could easily detect if something has gone bad in painting?

Or maybe, even better, we could just have a separate script (not one of the unit tests run by `make check`) and we would run it on a server (always the same) daily/weekly/whateverly on newer code to detect any issue brought by new code (or at the opposite an improvement).

Comment 28 Michael Natterer 2018-01-18 00:23:47 UTC

If it depends on the length of the continuous stroke (in distance, or
time, or whatever is the slowing factor), maybe we should focus on
that too.

Massimo's patch clearly improves painting and scaling individual dabs,
but should be completely unrelated to at what time/distance/whatever
in the stroke the function is called.

We *also* need to look for something that becomes more expensive the
longer the stroke is, perhaps a list that is traversed all the time
or whatever becomes more expensive over time/distance/whatever.

Maybe profiling allocations during a stroke is a good approach?

Comment 29 Massimo 2018-01-18 10:32:56 UTC

(In reply to Jehan from comment #27)
> > Does 2.8 have a lag on the same 4k image?
> 
> No it doesn't, which was my main issue.
> 
> Of course, I know I would see a lag at some point, but I chose 4K exactly
> because *on my computer at least* (on another computer, depending on its
> material configuration, it could be different of course), GIMP 2.8 shows no
> lag whereas master shows crazy lag. My steps were quite simple: just draw
> randomly and quickly like crazy for 30 seconds (basically draw circles
> continuously without ever raising the pen) on a 4K canvas and release: in
> GIMP 2.8, the strokes are following the pen; in master, strokes are lagging
> behind more and more, and after 30 seconds, when I raise the pen, I just the
> stroke being drawn for a few more seconds before it reaches the last pen
> position.
> 

I'm on X11 and have color management enabled, rulers shown and SWM, 
I am using an external monitor, the laptop monitor is off.

I must say that I see a lag in 2.8 like in 2.9 if not greater.

So probably I'm not the one to evaluate the relative performance
of 2.8 vs 2.9. I cannot count how many seconds it takes to end painting
after 30 seconds of random circling.

> 
> Is this code fully automatized? Could we make it into a unit test?
> Of course, that would be a bit a problem since once again, the actual time
> taken for the stroke would also depends on the hardware. But maybe we could
> still set a minimum time taken by the stroke or something, so that we could
> easily detect if something has gone bad in painting?
> 

Probably I was not clear, I added a pair of GIMP_TIMER_START/END
around gimp_paint_core_stroke_vectors in app/vectors/gimpvectors.c
and saved an xcf with a long path.

Then I built GIMP with/out the patch and compared the time it takes
to

Edit->Stroke Path (Stroke with a paint tool (Paintbrush), and Emulate
brush dynamics)


After having saved tool options (pepper brush, size 200, spacing 1))
it basically is

<Ctrl>1 <Alt>E K <Enter> <Alt>S wait <Ctrl>Q D


And even for this test, compared to 2.8 a patched 2.9 is faster.

Comment 30 Jehan 2018-01-18 13:52:27 UTC

> I must say that I see a lag in 2.8 like in 2.9 if not greater.
> 
> So probably I'm not the one to evaluate the relative performance
> of 2.8 vs 2.9. I cannot count how many seconds it takes to end painting
> after 30 seconds of random circling.

Didn't you say earlier that a same stroke went from taking ~ 10 sec to draw to 5 sec to draw after your patch?
That looks like pretty perceptible lag to me. ;-)

To be clear, my steps earlier were mainly *before* your patch. I made very quick tests with your patch in, but really couldn't make *real* time until now to sit down for an hour to really make a deeper comparison. Things are a bit hectic here and I may not make this time for a few days (but I'll try!).
So don't take my previous statement at high value. I may have said shit (I should not make comments after quick tests! New lesson of the day!).

In any case, you are likely the one to improve things since you already went further than I (I did try to profile GIMP, time various functions, test various code changes, but I was quite unsuccessful)!

> And even for this test, compared to 2.8 a patched 2.9 is faster.

At least, before your patch, 2.9 was slower than 2.8, right? We all agree on this part at least? :-)

If so, my quick test was most likely shit. I'll find the time to test better.

Comment 31 Elle Stone 2018-01-18 15:07:27 UTC

(In reply to Massimo from comment #29)
> After having saved tool options (pepper brush, size 200, spacing 1))

GIMP 2.9 painting slows down for me when I'm using a brush size larger than 256px, especially when combined with a small spacing. I don't mean painting slows down just a little, I mean it slows down to the point where I try to avoid using a large brush, in other words, I tailor "what I want to paint" to the need to avoid large brushes.

Not all brushes get super-slow when the size is >256px. It seems to depend on the complexity of the stain and also on the dynamics - some dynamics make the brush faster than disabling all the dynamics options ("Dynamics Off") and some make the brush slower.

I'm not sure if the default GIMP brushes have suitably large brushes for showing just how slow painting can get when the brush size is larger than 256px. I'm not even talking about super-large brushes, rather brushes around 512px.

Setting GIMP 2.8 to match GIMP 2.9 in terms of "Environment" and available brushes, using Color Management, SWM, etc, and picking a large brush (close to 512px) with a fair amount of detail, using "Dynamics Off", and drawing large circles and rectangles on a 4000x4000px image, 2.9 is slower than 2.8. I don't have a timer, so I don't know how much slower.

Comment 32 Ell 2018-01-19 15:06:14 UTC

FWIW, the dashboard can measure CPU activity time now, which can be used to get rough numbers for how long certain things take.

Comment 33 Jehan 2018-01-23 15:00:35 UTC

Massimo > sorry I haven't had the time to review your patch, but I will do. That's still high on my priority list. Don't hesitate to continue working on it though and propose an updated version if you said it was WIP.

Comments from Mitch's comment 28 can be useful too.

In any case, I pretty much trust what you will come up with. I have been stuck on this one forever.

Comment 34 Jehan 2018-01-23 15:03:23 UTC

P.S.: and as you said, 2.8 is the base reference, so if we could at least be as fast, that's good already to not make it a blocker.

Of course, if we manage to improve it to be faster (maybe after 2.10 if not now), that would be even better. ;-)
After all, the more we go, the bigger are the images worked on commonly. So it only makes sense to continuously improve painting.

Comment 35 Massimo 2018-01-23 17:32:56 UTC

(In reply to Jehan from comment #33)
> Massimo > sorry I haven't had the time to review your patch, but I will do.
> That's still high on my priority list. Don't hesitate to continue working on
> it though and propose an updated version if you said it was WIP.
> 

I said that it was a WIP because my initial intention was to
avoid the modulus operation in this loop:

https://git.gnome.org/browse/gimp/tree/app/paint/gimpbrushcore.c#n1718

and reading the code it seemed it was looping like to copy a pattern,
but in practice 'width' seems to be at most 2 pixels wider than 
'pixmap_width', so I don't know if it is  useful to call babl_process
and then copy its output or let babl_process convert the whole line.

Anyway I lost interest in trying to optimize this code, the expectations
are too high.

Comment 36 jose americo gobbo 2018-03-10 13:25:01 UTC

Only to refresh a bit the comparison between 2.8.22 and 2.9.9 commit 8d9580fd2b.
My test was made on A3 format 300 ppi in both cases.
I have made a stroke line using Shift key along with the major side.
On 2.8.22 was necessary 0,5 seconds circa instead on 2.9.9 circa 2 seconds.
The relation is almost 4 times faster on 2.8.22 than 2.9.9, perhaps, with a machine with more RAM or more processors the time could be shorter, but the relation between the two versions not necessarily would be changed.
In 2.9.9 we have new templates until A0 300 ppi, so, perhaps is interesting begin to talk about which computer specifications are necessary to use an A0 format with a good performance to painting tasks.

Comment 37 Jehan 2018-03-20 16:27:47 UTC

Americo > have you tried the same comparison before and after applying Massimo's patch? If you could try, this would be useful. Thanks.

Comment 38 Massimo 2018-03-22 11:14:28 UTC

Created attachment 370003 [details] [review]
attempt to improve painting with color brush

I modified and split the previous patch in 3 commits attached to this comment as a patchset.
So they can be reviewed/reverted/etc separately and I'm going to attach to the next
comment a script-fu useful to show the effect of each change (they affect
mostly or only painting/stroking with color brushes).

On my system it goes from nearly 9 seconds to more then 4.67 seconds,
whereas gimp-console-2.8 takes nearly 9 seconds, YMMV.


Following is the copy paste from the terminal of the commands
used, starting in the gimp source tree with master checked out,
built and installed, to apply the patch on a new git branch,
build each new commit, run the script, compute the hash of the output png and
finally get back to the 'master' git branch, delete the 'test' git branch
and run the same script with gimp-console-2.8 (where I have installed it).


[massimo@pluto gimp (master)]$ git checkout -b test origin/master
Branch test set up to track remote branch master from origin.
Switched to a new branch 'test'
[massimo@pluto gimp (test)]$ git am test-stroke.patch
Applying: foo-fish
Applying: foo-area-format
Applying: foo-modulus
[massimo@pluto gimp (test)]$ for j in {3..0..-1} ; do LC_ALL=C git checkout test~$j |& grep \ is && make -Capp >&/dev/null && LC_ALL=C app/gimp-console-2.9 -b '(load "test-stroke.scm")' && md5sum tmp-2.9.9.png; done
HEAD is now at 829d44d900... app: make "color" parameter of gimp_palettes_add_color_history() const
This is a development version of GIMP.  Debug messages may appear here.

2.9.9
gimp-edit-stroke-vectors: ~ 9.258006096s
Exported tmp-2.9.9.png
plug-in 'script-fu' aborted before sending its procedure return values
746488e2c628f7fa3b5f6ce349474efc  tmp-2.9.9.png
HEAD is now at 5b09ef0a34... foo-fish
This is a development version of GIMP.  Debug messages may appear here.

2.9.9
gimp-edit-stroke-vectors: ~ 8.391999006s
Exported tmp-2.9.9.png
plug-in 'script-fu' aborted before sending its procedure return values
746488e2c628f7fa3b5f6ce349474efc  tmp-2.9.9.png
HEAD is now at d2976573c0... foo-area-format
This is a development version of GIMP.  Debug messages may appear here.

2.9.9
gimp-edit-stroke-vectors: ~ 5.473551989s
Exported tmp-2.9.9.png
plug-in 'script-fu' aborted before sending its procedure return values
746488e2c628f7fa3b5f6ce349474efc  tmp-2.9.9.png
HEAD is now at c0ee3c96c0... foo-modulus
This is a development version of GIMP.  Debug messages may appear here.

2.9.9
gimp-edit-stroke-vectors: ~ 4.676603079s
Exported tmp-2.9.9.png
plug-in 'script-fu' aborted before sending its procedure return values
746488e2c628f7fa3b5f6ce349474efc  tmp-2.9.9.png
[massimo@pluto gimp ((c0ee3c9...))]$ git checkout master && git branch -D test
La precedente posizione di HEAD era c0ee3c96c0... foo-modulus
Si è passati al branch 'master'
Your branch is up-to-date with 'origin/master'.
Ramo test eliminato (era c0ee3c96c0).
[massimo@pluto gimp (master)]$ LD_LIBRARY_PATH=$HOME/prefix/lib64 LC_ALL=C ~/tmp/bin/gimp-console-2.8 -b '(load "test-stroke.scm")'
2.8.23
gimp-edit-stroke-vectors: ~ 9.428511143s
Exported tmp-2.8.23.png
[massimo@pluto gimp (master)]$

Comment 39 Massimo 2018-03-22 11:16:02 UTC

Created attachment 370004 [details]
the script-fu script

Notes:
Besides timing the effects of the previous patchset, it shows that to obtain
output similar to gimp-2.8 it is necessary to edit scripts.

To do so, it uses (gimp-version) so when gimp-2.9.10 is out it has to be adapted

Comment 40 Michael Natterer 2018-03-22 17:36:26 UTC

Thank Massimo! But I think you forgot to attach 2 or the 3 patches?

Comment 41 Massimo 2018-03-22 17:52:17 UTC

(In reply to Michael Natterer from comment #40)
> Thank Massimo! But I think you forgot to attach 2 or the 3 patches?

Well that's all. It is three commits in a single file

 git format-patch --stdout -3 > test-stroke.patch

and when applied with git am they appear as three commits

Comment 42 Jehan 2018-03-25 17:40:59 UTC

*** Bug 794503 has been marked as a duplicate of this bug. ***

Comment 43 Ell 2018-03-25 19:25:54 UTC

Review of attachment 370003 [details] [review]:

Thanks.  Squashed and pushed to master, with a few tweaks:

commit f561231e1f59494907103b30bf3d14373c0986ae
Author: Massimo Valentini <mvalentini@src.gnome.org>
Date:   Thu Jan 18 10:24:51 2018 +0100

    app: various speedups to gimp_brush_core_color_area_with_pixmap()
    
    In gimp_brush_core_color_area_with_pixmap(), use the native area
    format when painting the brush, instead of always going through
    "RGBA float", and create the pixmap -> area fish only once, instead
    of once per scanrow.
    
    In gimp_brush_core_paint_line_pixmap_mask(), avoid modulus
    calculation at each pixel.
    
    See bug #694917.

 app/paint/gimpbrushcore.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------
 1 file changed, 72 insertions(+), 47 deletions(-)

Comment 44 Jehan 2018-03-26 00:44:06 UTC

For the record, I tested your test script:

* both GIMP 2.8 and master takes about between 13 and 17 seconds to run it (so let's say an average around 15 seconds).

* Once your script is in, GIMP master takes between 7 to 9 seconds.

So there is clearly an improvement here! Thanks Massimo for the patches.

BUT I still see huge lag when painting interactively on canvas! These tests make me think that probably the issue is not happening in painting core but in the display code which is simply not able to follow up with the canvas updates fast enough.

Maybe trying to optimize the painting code was the wrong idea to begin with!

Comment 45 Jehan 2018-03-26 00:46:50 UTC

(When I say *wrong idea*, of course I mean: it's awesome it got optimized and more optimizations are welcome of course! ;p I just meant that this may not be the culprit code for the current slow painting issue)

Comment 46 Jehan 2018-03-26 01:11:21 UTC

To illustrate, I just made a video right now.

First I draw with GIMP 2.8. As you can see, it is quite fast, and drawing is following the mouse closely enough.

Then I draw with GIMP master (even after Massimo's commits). Nearly immediately the drawing is lagging behind the mouse pointer. At some point (around 0:44), I stop drawing and even put the cursor over the Brushes dock so that it is well visible outside the canvas, and we can see the canvas being drawn for more than 10 seconds afterwards!

https://youtu.be/pBEqmvLLcQ0

Of course, in real life, you don't need to continuously scramble your pen over a full canvas of 5000px at high speed. Also the problem is not that terrible on Aryeom's computer, probably because it is a faster one (though I still have a quite ok computer, not at all an antiquity!).
So this is not a blocker. Yet there is clearly a regression somewhere.

Also considering the painting scripting tests, maybe the issue is indeed not in painting but only in canvas updating.

Comment 47 Massimo 2018-03-26 15:15:08 UTC

(In reply to Jehan from comment #46)
> To illustrate, I just made a video right now.
> 
> First I draw with GIMP 2.8. As you can see, it is quite fast, and drawing is
> following the mouse closely enough.
> 
> Then I draw with GIMP master (even after Massimo's commits). Nearly
> immediately the drawing is lagging behind the mouse pointer. At some point
> (around 0:44), I stop drawing and even put the cursor over the Brushes dock
> so that it is well visible outside the canvas, and we can see the canvas
> being drawn for more than 10 seconds afterwards!
> 
> https://youtu.be/pBEqmvLLcQ0
> 

Although I don't experience such a lag with a 20 px "2. Hardness 050" brush
on a 5000x4000 image, I ran perf to profile painting like in your video and
it seems there is quite a lot of time spent invalidating 
(allocating/freeing/lookingup) a GEGL graph.

One simple thing to try is in gegl_transform_get_invalidated_by_change
which seems to cost more than 3% of "my test" run.

Now GIMP uses a gegl:transform in its projection/layer graph,
but often it is a nop because the layer offset is zero. This
means the gegl:transform matrix is the identity, so here:

https://git.gnome.org/browse/gegl/tree/operations/transform/transform-core.c#n990

it is useful to move the creation/destruction of the buffer sampler
after the

if (... is_identity (...)) return region;

cause 'context_rect' is not used before.


In my tests the cost reduces from >3% to 0.76% and
gegl_buffer_sampler_new_at_level disappears from perf output.

I don't notice the effect, but it should be correct and hardly worse
than the current code.

Comment 48 Massimo 2018-03-27 07:41:20 UTC

(In reply to Massimo from comment #47)
> 
> Although I don't experience such a lag with a 20 px "2. Hardness 050" brush
> on a 5000x4000 image, I ran perf to profile painting like in your video and
> it seems there is quite a lot of time spent invalidating 
> (allocating/freeing/lookingup) a GEGL graph.
> 

Retrying today it seems invalidation cost is now reduced to a reasonable
value.

Looking at the code and perf output a possible improvement seems to be here:


https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.c#n92

Doesn't seem wise to always create this cairo buffer if it is only used
in an internal else branch. 

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.c#n240

Perhaps it is possible to create it only when needed or perform directly 
the color_transform on the cairo_data

Comment 49 Øyvind Kolås (pippin) 2018-03-27 22:13:36 UTC

(In reply to Massimo from comment #47)
> One simple thing to try is in gegl_transform_get_invalidated_by_change
> which seems to cost more than 3% of "my test" run.
...
> https://git.gnome.org/browse/gegl/tree/operations/transform/transform-core.
> c#n990
> 
> it is useful to move the creation/destruction of the buffer sampler
> after the
...
> In my tests the cost reduces from >3% to 0.76% and
> gegl_buffer_sampler_new_at_level disappears from perf output.

This improvement has been applied to GEGL master:

commit 34f217b7d4297c7c49e16fd27ff601e9a0d07c83
Author: Øyvind Kolås <pippin@gimp.org>
Date:   Wed Mar 28 00:07:21 2018 +0200

    transform: sometimes do less work in gegl_transform_get_invalidated_by_change
    
    As spotted by massimo in bug #694917 - the construction of this object is not
    neccesary in the case where we return early (finding the context rect could
    further also be special cased to always avoid the overhead of
    constructing/destroying a gobject).

Comment 50 Øyvind Kolås (pippin) 2018-03-27 22:38:02 UTC

(In reply to Massimo from comment #48)
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.
> c#n92
> 
> Doesn't seem wise to always create this cairo buffer if it is only used
> in an internal else branch. 
> 
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.
> c#n240
> 
> Perhaps it is possible to create it only when needed or perform directly 
> the color_transform on the cairo_data

Indeed - and this isn't a cairo-buffer it is a linear GeglBuffer wrapper around
the already used pixel_data fished out of a cairo surface - this temporary interfacing object can indeed be created/destroyed in a local scope.

commit a7b0d55fc5f7d6569ed98154e0311c96225d1e22
Author: Øyvind Kolås <pippin@gimp.org>
Date:   Wed Mar 28 00:35:08 2018 +0200

    app: only create GeglBuffer wrapper for cairo-data when used
    
    As suggested by massimo in bug #694917, move unconditional creation/destruction
    of a wrapper GeglBuffer object from top-level scope of the function to the
    single conditional scope where it is used.

Comment 51 Massimo 2018-03-28 12:03:21 UTC

Looking at GIMP code executed a lot while painting, there is the function
rtree_insert(), which looks weird:

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n152

    if (node->w >= w && w < 2 * node->w && node->h >= h && h < 2 * node->h)

Unless I'm not yet well awake, it seems to me that the second and fourth
conditions are true when the first and third are.
          
Assuming w>0 (guaranteed by gimp_display_xfer_get_surface caller()) and
node->w>0 and neglecting overflows:
          
 node->w           >= w   
 node->w + node->w >= w + node->w 
 2 * node->w       > w + 0
 2 * node->w > w
        
That would mean that the second loop is identical to the first and it makes
me wonder if there is a typo and the intent was
          
    if (w <= node->w && node->w < 2 * w ...)
                         
or something entirely different.
        
Anyway if it is correct it should be possible to simplify it

Comment 52 Massimo 2018-03-28 19:37:35 UTC

I think that here: 

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116

it is wrong/harmful to insert zero sized rectangles in the tree
because they can never be selected and just make the list longer.

So I'd say rtree_node_create() should just return NULL when w or h are 0

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n70

Comment 53 Massimo 2018-03-29 10:21:00 UTC

Another function used a lot while painting is GEGL gegl_downscale_2x2_u8_nl()

https://git.gnome.org/browse/gegl/tree/gegl/gegl-algorithms.c#n540

Here I think lut accesses for the alpha channel could/should be spared.

Comment 54 Michael Natterer 2018-03-29 10:29:40 UTC

gimpdisplayxfer.c was sortof "donated" to us by Chris Wilson because
were were apparently doing something quite wrong/inefficient.

I think you are now the next best thing to an expert we have :)

Comment 55 Chris Wilson 2018-03-29 10:56:58 UTC

(In reply to Massimo from comment #51)
> Looking at GIMP code executed a lot while painting, there is the function
> rtree_insert(), which looks weird:
> 
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n152
> 
>     if (node->w >= w && w < 2 * node->w && node->h >= h && h < 2 * node->h)
> 
> Unless I'm not yet well awake, it seems to me that the second and fourth
> conditions are true when the first and third are.
>           
> Assuming w>0 (guaranteed by gimp_display_xfer_get_surface caller()) and
> node->w>0 and neglecting overflows:
>           
>  node->w           >= w   
>  node->w + node->w >= w + node->w 
>  2 * node->w       > w + 0
>  2 * node->w > w
>         
> That would mean that the second loop is identical to the first and it makes
> me wonder if there is a typo and the intent was
>           
>     if (w <= node->w && node->w < 2 * w ...)
>                          
> or something entirely different.

That's the intent. The first pass was meant to be a "best fit". So only split this node if approximately the same size as the desired region.

so node->w has to be wide enough for the request w, but not too wide:

so w >= node->w && (node->w - w) < w

(Now why only splitting into 2 children, I can't remember.)

Comment 56 Chris Wilson 2018-03-29 11:01:55 UTC

(In reply to Massimo from comment #52)
> I think that here: 
> 
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116

rtree_node_insert() ?

Inside rtree_node_insert() we are asking whether or not we need to split this node. If the residual is empty, we just use the current node as the leaf.

> it is wrong/harmful to insert zero sized rectangles in the tree
> because they can never be selected and just make the list longer.
> 
> So I'd say rtree_node_create() should just return NULL when w or h are 0

Yes, an empty request wasn't considered.

Comment 57 Chris Wilson 2018-03-29 11:06:10 UTC

I'm sure there are much better algorithms for quick packing than the current rtree; the intent was to be able to reuse the SHM image for transport (avoiding having to overwrite active regions as that requires a RTT to the server) so rtree was just a means to parcel up the remaining space in the SHM image, and then being able to swap over to a new SHM image batching up the requests/replies.

Comment 58 Massimo 2018-03-29 12:30:30 UTC

(In reply to Chris Wilson from comment #56)
> (In reply to Massimo from comment #52)
> > I think that here: 
> > 
> > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116
> 
> rtree_node_insert() ?
> 

Yes, see:

  if (((node->w - w) | (node->h - h)) > 1)

I mean when say (node->w - w) == 0 and (node->h - h) == 2

(2 | 0) > 1 is true and in the else branch 

          node->children[1] = rtree_node_create (rtree, prev,
                                                 node->x + w, node->y,
                                                 ww, h);

will insert an empty rectangle in front of the non empty rectangle
which will never be used and always checked prior the non empty

This seems to happen often when painting with a brush that will have to 
update often a different area of the same size.

> Inside rtree_node_insert() we are asking whether or not we need to split
> this node. If the residual is empty, we just use the current node as the
> leaf.
> 
> > it is wrong/harmful to insert zero sized rectangles in the tree
> > because they can never be selected and just make the list longer.
> > 
> > So I'd say rtree_node_create() should just return NULL when w or h are 0
> 
> Yes, an empty request wasn't considered.

Comment 59 Chris Wilson 2018-03-29 12:49:02 UTC

(In reply to Massimo from comment #58)
> (In reply to Chris Wilson from comment #56)
> > (In reply to Massimo from comment #52)
> > > I think that here: 
> > > 
> > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116
> > 
> > rtree_node_insert() ?
> > 
> 
> Yes, see:
> 
>   if (((node->w - w) | (node->h - h)) > 1)
> 
> I mean when say (node->w - w) == 0 and (node->h - h) == 2

Gotcha. You can skip the empty child, and skip the NULL.

Comment 60 jose americo gobbo 2018-03-29 14:56:07 UTC

I have an issue with my test in comment #36:
I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl + Shift' key to tracing a brushstroke along the major side of an A3 format and we have the same performance before/after of the last patches (Massimo and Pippin).
But, when you are painting freely the performance was improved.
So, when you are using 'Ctrl + Shift', is this a way different from the free painting?

Comment 61 Massimo 2018-03-29 15:04:37 UTC

Created attachment 370303 [details] [review]
attempt to improve painting with small brushes, big images

(In reply to Massimo from comment #52)
> I think that here: 
> 
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116
> 
> it is wrong/harmful to insert zero sized rectangles in the tree
> because they can never be selected and just make the list longer.
> 
> So I'd say rtree_node_create() should just return NULL when w or h are 0
> 
> https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n70


Attaching a patch to be tested that keeps the same rectangle packing
behaviour, so to behave exactly as before for what concerns batching
the updates, but should be lighter when looping to find the first
good rectangle to use.
     
In my test (painting 10 s, brush size 20 px, image size 5000x4000 RGBu8)
'perf' shows a reduction of cycles percentage from 4.47% to 1.89%, I
don't see lag with it, I don't without it.

Comment 62 Massimo 2018-03-29 16:14:28 UTC

(In reply to jose americo gobbo from comment #60)
> I have an issue with my test in comment #36:
> I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl +
> Shift' key to tracing a brushstroke along the major side of an A3 format and
> we have the same performance before/after of the last patches (Massimo and
> Pippin).
> But, when you are painting freely the performance was improved.
> So, when you are using 'Ctrl + Shift', is this a way different from the free
> painting?

Your test exercises code that is used when painting, but it knows in advance
the path to stroke and doesn't interleave drawing with giving the feedback. 

So it has something in common with free painting and something specific.

Comment 63 Jehan 2018-03-30 22:21:43 UTC

(In reply to Massimo from comment #61)
> Created attachment 370303 [details] [review] [review]
> attempt to improve painting with small brushes, big images
>
> Attaching a patch to be tested that keeps the same rectangle packing
> behaviour, so to behave exactly as before for what concerns batching
> the updates, but should be lighter when looping to find the first
> good rectangle to use.

Awesome work. I reviewed, and that looks good. Tested also and I don't see anything obviously broken.
Let's push!
      
> In my test (painting 10 s, brush size 20 px, image size 5000x4000 RGBu8)
> 'perf' shows a reduction of cycles percentage from 4.47% to 1.89%, I
> don't see lag with it, I don't without it.

Unfortunately I don't see any sensible improvements on speed. But I'm sure we will find the issue at some point! :-)

Comment 64 Jehan 2018-03-30 22:22:17 UTC

Review of attachment 370303 [details] [review]:

Per previous comment.

commit 2aa4426d4f30f576bb04c90ef573fdc4a88ee046 (HEAD -> master, origin/master, origin/HEAD)
Author: Massimo Valentini <mvalentini@src.gnome.org>
Date:   Thu Mar 29 16:50:15 2018 +0200

    app: improvements in code executed a lot while painting.
    
    This keeps the same rectangle packing behaviour, so to behave exactly as
    before for what concerns batching the updates, but should be lighter
    when looping to find the first good rectangle to use.
    
    In rtree_insert(), some conditions in the if tests are implied by
    previous conditions. And therefore the 2 successive for loops are
    actually identical.
    
    In rtree_node_insert(), it is wrong/harmful to insert zero sized
    rectangles in the tree because they can never be selected and just make
    the list longer. So rtree_node_create() should just return NULL when w
    or h are 0.
    
    See bug 694917, comments 51 to 61.

 app/display/gimpdisplayxfer.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comment 65 Jehan 2018-03-30 22:34:16 UTC

(In reply to Massimo from comment #53)
> Another function used a lot while painting is GEGL gegl_downscale_2x2_u8_nl()
> 
> https://git.gnome.org/browse/gegl/tree/gegl/gegl-algorithms.c#n540
> 
> Here I think lut accesses for the alpha channel could/should be spared.

I also see a lot of time being spend there. 5.40% is spent (including 5.28% in self!). So any optimization there could definitely be worth it!

Comment 66 Ell 2018-03-31 12:27:57 UTC

The following commit improves painting speed by ~20%, although the actual number most likely depends pretty heavily on the specific details of the test performed:

commit 49285463e6c0cc03c361902df4e3006fa135dde3
Author: Ell <ell_se@yahoo.com>
Date:   Sat Mar 31 08:13:36 2018 -0400

    app: align projection update area to coarse grid
    
    When adding a rectangle to a projection's update area, align the
    rectangle to a coarse grid, to reduce the complexity of the overall
    area.  We currently align the rectangle to a 32x32 grid, which
    seems to be a good tradeoff between the overhead of processing a
    complex area, and the overhead of processing a large area.

 app/core/gimpprojection.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

You can measure the impact using the CPU activity timer in the dashboard, by painting for a set amount of time, and seeing how long it takes to catch up.

Comment 67 jose americo gobbo 2018-03-31 12:41:17 UTC

(In reply to Massimo from comment #62)
> (In reply to jose americo gobbo from comment #60)
> > I have an issue with my test in comment #36:
> > I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl +
> > Shift' key to tracing a brushstroke along the major side of an A3 format and
> > we have the same performance before/after of the last patches (Massimo and
> > Pippin).
> > But, when you are painting freely the performance was improved.
> > So, when you are using 'Ctrl + Shift', is this a way different from the free
> > painting?
> 
> Your test exercises code that is used when painting, but it knows in advance
> the path to stroke and doesn't interleave drawing with giving the feedback. 
> 
> So it has something in common with free painting and something specific.

I think very interestingly these improvements around the fast painting with small brushes with low spacing... but, we must also add tests with large size brushes (512 px, for instance) also with spacing between 3 to 10% in large formats as A3 and A2 with 300ppi.
For instance, I did a qualitative comparison with big size brush (512 px, .gbr and .gih) in a 5000 x 7000 pixels between GIMP 2.8.22 and 2.10 rc1 and we have a great difference in the painting performance (painting freely on the canvas with 12.5% zoom).

The qualitative data of my testing in a 7000 x 5000 pixels document was:

Spacing		GIMP 2.8.22			GIMP 2.10 RC1
1		slow with lag but acceptable	slow++++ not acceptable
3		Regular with discrete lag	slow+++ not acceptable
5		Fast, Excellent			slow++ acceptable (+/-)
10		Fast, Excellent			slow+ acceptable
20		Fast, Excellent			slow acceptable
---

To compare qualitatively painting performing is possible to say that GIMP 2.8.22 is working with spacing 1% as GIMP 2.10 rc1 is working with spacing 20%.

Comment 68 jose americo gobbo 2018-03-31 12:42:15 UTC

Created attachment 370379 [details]
Brush 512 px used on my qualitative testing

Comment 69 jose americo gobbo 2018-03-31 12:54:06 UTC

I must do a short note about https://bugzilla.gnome.org/show_bug.cgi?id=768118 (Spacing Dynamic) and these tests:
I think useful verify if is possible to fix this bug to test the spacing effects dynamically, too.

Comment 70 Massimo 2018-03-31 12:56:06 UTC

(In reply to jose americo gobbo from comment #68)
> Created attachment 370379 [details]
> Brush 512 px used on my qualitative testing

With brush/images that big it might be worth testing increasing
these values:

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n58


which is already possible setting this env variable

https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n212

Comment 71 Elle Stone 2018-03-31 13:08:14 UTC

How does one set GIMP_DISPLAY_RENDER_BUF_SIZE? I'm guessing with something like:

GIMP_DISPLAY_RENDER_BUF_SIZE=XXX
export GIMP_DISPLAY_RENDER_BUF_SIZE

assuming one is using a script to start GIMP.

So what is the default GIMP_DISPLAY_RENDER_BUF_SIZE, and what would be a good value to try?

Comment 72 Jehan 2018-03-31 13:44:54 UTC

Created attachment 370382 [details]
Screenshot of perf result

(In reply to Ell from comment #66)
> The following commit improves painting speed by ~20%, although the actual
> number most likely depends pretty heavily on the specific details of the
> test performed:
> 
> commit 49285463e6c0cc03c361902df4e3006fa135dde3

I believe it may be faster indeed though I still see some clear lag (more than 2.8) and also right now I am not testing with the exact same setup (I don't have my small tablet at hands with which I did previous tests; graphics tablets generate a lot of events and therefore it may make lags worse, I guess). So I'll test tonight again with the tablet in similar conditions.

Looking at the perf report, I can also see 2.5% is spent in self of gimp_operation_normal_process_sse4(), and 1.49% in self of conv_rgb8_rgbaF() (babl).

gegl_downscale_2x2_u8_nl() is still one of the biggest blocks with 5.1% now in self.

The biggest cost altogether is g_hash_table_lookup(), with 5.61% in self, but obviously this is called in various places. It is mostly costly from cache_lookup() (GEGL), i.e. 2.12%.

combine_paint_mask_to_canvas_mask() also is costy with 1.91% of cycles.

I am attaching a screenshot of the perf result. I would upload the `perf.data` file itself, but after a test, it doesn't seem like this file contains the debug symbols and the report is quite useless without. If you need my report of running perf and if you know what I should do/upload for you to get useful info from it, tell me what/how please. ;-)

Hope these info help!

Comment 73 Massimo 2018-03-31 14:00:39 UTC

(In reply to Elle Stone from comment #71)
> How does one set GIMP_DISPLAY_RENDER_BUF_SIZE? I'm guessing with something
> like:
> 
> GIMP_DISPLAY_RENDER_BUF_SIZE=XXX
> export GIMP_DISPLAY_RENDER_BUF_SIZE
> 
> assuming one is using a script to start GIMP.
> 
> So what is the default GIMP_DISPLAY_RENDER_BUF_SIZE, and what would be a
> good value to try?

I did not try but it should be

GIMP_DISPLAY_RENDER_BUF_SIZE=widthxheight

the current value would be

GIMP_DISPLAY_RENDER_BUF_SIZE=256x256

values worth trying would be power of 2, so 512x256 512x512 
or 1024x256

Comment 74 Jehan 2018-03-31 14:41:39 UTC

Created attachment 370386 [details] [review]
app: do not copy needlessly paint_mask GimpTempBuf.

Looking at combine_paint_mask_to_canvas_mask() code, it looks like we were calling gimp_temp_buf_copy() needlessly. I don't see any kind of write operation, it's read-only.
Is there any reason why we copy the temp buffer first?

Could someone review and tell me if that makes sense? With this change, perf report of combine_paint_mask_to_canvas_mask() are a bit better (still 1.5% but better than before).

Comment 75 Jehan 2018-03-31 14:57:39 UTC

Review of attachment 370386 [details] [review]:

Ok. Ell reviewed on IRC. So I push.

commit 04a798d786617257ffa2b75ad24a14afd7b5c087 (HEAD -> master, origin/master, origin/HEAD)
Author: Jehan <jehan@girinstud.io>
Date:   Sat Mar 31 16:36:55 2018 +0200

    app: do not copy needlessly paint_mask GimpTempBuf.
    
    We are not doing any write operation on this mask data so copying all
    the data just to read it and unreffing it in the end is only a cost on
    performance.
    See also bug 694917.

 app/paint/gimppaintcore-loops.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

Comment 76 jose americo gobbo 2018-03-31 15:18:02 UTC

Only to illustrate the performance between 2.8.22 and 2.10 rc1 follow two short screencast videos:

1) GIMP 2.8.22 > https://www.youtube.com/watch?v=ubIlFxM4vgQ
2) GIMP 2.10 rc1 > https://www.youtube.com/watch?v=ML_DVEUOnK4

Document 7000 x 5000 pixels
Paintbrush tool
Brush 512 px
No Dynamics
Mouse

Comment 77 Jehan 2018-03-31 15:20:23 UTC

Created attachment 370388 [details]
cache_lookup() called from various places.

Another screenshot detailing most expensive call stacks of the cache_lookup(), happening either in paint core, projection flush, display shell, etc.

In case it helps.

Comment 78 Jehan 2018-03-31 15:22:18 UTC

(In reply to jose americo gobbo from comment #76)
> Only to illustrate the performance between 2.8.22 and 2.10 rc1 follow two
> short screencast videos:
> 
> 1) GIMP 2.8.22 > https://www.youtube.com/watch?v=ubIlFxM4vgQ
> 2) GIMP 2.10 rc1 > https://www.youtube.com/watch?v=ML_DVEUOnK4
> 
> Document 7000 x 5000 pixels
> Paintbrush tool
> Brush 512 px
> No Dynamics
> Mouse

Yep, definitely not good.

Comment 79 Michael Natterer 2018-03-31 15:29:11 UTC

Americo, how large are your tile caches on 2.8 and 2.10?

Comment 80 jose americo gobbo 2018-03-31 16:01:35 UTC

In both cases, I used 1Gib. But I have tested also with 3Gib, but the performance doesn't change or rather is not possible see any difference.

Comment 81 Jehan 2018-04-01 16:56:30 UTC

Created attachment 370417 [details]
Plug-in to test stroke time

For testing paint speed, I just wrote a quick&dirty plug-in which will simply stroke the current path with current brush and foreground color. In the end, it will display the time spent (look in the status bar).

Plug-in entry is found under Debug > Test stroke path.

Comment 82 Jehan 2018-04-01 17:02:14 UTC

Created attachment 370418 [details]
4000x5000 canvas with a big active vectors

So I was testing with this 4000x5000 image just now, which has a Paths layer with a lot of vectors.

My tests on GIMP 2.10 (brush Hardness 050 and size 20) were returning between 6 to 9 seconds.
The same XCF loaded on 2.8 (same brush and size) were returning in 2 to 4 seconds.

Comment 83 Jehan 2018-04-01 17:06:16 UTC

Created attachment 370420 [details]
4000x5000 canvas with a big active vectors

So I was testing with this 4000x5000 image just now, which has a Paths layer with a lot of vectors.

My tests on GIMP 2.10 (brush Hardness 050 and size 20) were returning between 6 to 9 seconds.
The same XCF loaded on 2.8 (same brush and size) were returning in 2 to 4 seconds.

Comment 84 Jehan 2018-04-01 17:29:23 UTC

Created attachment 370421 [details]
Script to run the same test in console mode.

And for the record, running this same plug-in on the same image, same brush, etc. with gimp-console runs in about 2 seconds in GIMP 2.8 and 3 seconds in master (test repeated many times as were my previous ones):

>  $ gimp-console-2.8 -b '(load "test-stroke2.scm")(gimp-quit TRUE)' 
> Test stroke path-Warning: Stroke done in 2.12259793282 seconds
> $ app/gimp-console-2.10 -b '(load "test-stroke2.scm")(gimp-quit TRUE)'
Test stroke path-Warning: Stroke done in 3.19817018509 seconds

So for non-interactive running, painting is quite fast (though still a bit slower than 2.8 in this test, but would be acceptable). This same action takes about 5/6 more seconds with the GUI on master (whereas on 2.8, time spent is barely different in console or with the GUI)!

Comment 85 jose americo gobbo 2018-04-01 20:21:08 UTC

I am not able to enable these scripts :( But, I have tested by hand directly using a 7000 x 5000 px document with a path in 7000 px side.
The brush used was of attachment 370379 [details] (512 px) with 'Dynamics Off'.

With 2.10 rc1, updated now, the time elapsing was 4.45 sec.
With 2.8.22 the time elapsing was 1 sec.

I suggest to use big brush, 512 px or more, because painting with small brush in a big canvas is not usual when the zoom is 33% or less.
And the point that we must compare is how is working 2.10 rc with big size brushes when we compare with GIMP 2.8.22.

Comment 86 Elle Stone 2018-04-01 21:42:23 UTC

I agree with Americo that these tests and improvements are very nice as far as they go but large brushes are much more the problem than small brushes. 

My sense is that painting using GIMP-2.9/2.10 already is faster than it used to be. And getting large brushes to work smoothly on large canvases with smaller spacings might not happen for 2.10, and maybe that's not even the goal of this thread. But for people with fast machines who want to paint using GIMP-2.10, the real sticking point likely won't be painting with small brushes with large and larger spacings, but rather painting with medium and large brushes with small and smaller spacings. 

For example, I tried replicating Americo's results with the vbr brush from his videos in Comment 76 . On my machine - intel core i7 with 32gb ram - the zigzags were easy and fast to draw at the original spacing of around 25. But change to 12 and drawing slows down. Change to 6, and it's very, very slow - too slow to be useable. 

I did try:

GIMP_DISPLAY_RENDER_BUF_SIZE=widthxheight

comparing 128x128 with 512x512, but really didn't see any practical difference. Maybe the syntax I uses is wrong? 

And maybe people with "reasonably much RAM" aren't setting up the relevant items in Preferences correctly? I know I don't have a clue whether my settings are correct or not, and also don't know how the swap and tmp files play into helping GIMP be faster.

Comment 87 Jehan 2018-04-01 22:37:21 UTC

(In reply to Elle Stone from comment #86)
> I agree with Americo that these tests and improvements are very nice as far
> as they go but large brushes are much more the problem than small brushes. 

Well both small and large brushes are a problem. Probably using large brushes is even slower which is likely the same bug exacerbated. :-/
 
> My sense is that painting using GIMP-2.9/2.10 already is faster than it used
> to be. And getting large brushes to work smoothly on large canvases with
> smaller spacings might not happen for 2.10, and maybe that's not even the
> goal of this thread. But for people with fast machines who want to paint
> using GIMP-2.10, the real sticking point likely won't be painting with small
> brushes with large and larger spacings, but rather painting with medium and
> large brushes with small and smaller spacings. 

All use cases are possible. It is not because you work on bigger canvases that it means you also use bigger brushes. You may work with accuracy and details on very big images. ;-)

Anyway what I mean is that I don't think it is worth trying to divide into separate problems. When we will find the right optimization, it will improve any use case because there is a much larger problem than anything specific to brush size or brush spacing (if we still have a problem, this time only on big size/small spacing after we fix this, then we can focus on more specific issues; my guess though is that it will mostly be fixed or at least greatly improved in same time).
 
> And maybe people with "reasonably much RAM" aren't setting up the relevant
> items in Preferences correctly? I know I don't have a clue whether my
> settings are correct or not

Yeah I'm not sure if there is any kind of "intelligence" for the default value (do we have any code setting reasonable default settings depending on hardware?), but if we don't have, would be nice to have (at some point).

> and also don't know how the swap and tmp files
> play into helping GIMP be faster.

Well mostly the rule is: try never to swap! I can tell you will know when you swap. This suddenly becomes extremely extreeeemeeeelyyyy slow. :P
Also now you have the dashboard where you can see when you start to swap.

Anyway here this is not the problem. We have slowness without swapping.

Comment 88 Elle Stone 2018-04-01 23:28:04 UTC

(In reply to Jehan from comment #87)
> (In reply to Elle Stone from comment #86)

> > My sense is that painting using GIMP-2.9/2.10 already is faster than it used
> > to be. And getting large brushes to work smoothly on large canvases with
> > smaller spacings might not happen for 2.10, and maybe that's not even the
> > goal of this thread. But for people with fast machines who want to paint
> > using GIMP-2.10, the real sticking point likely won't be painting with small
> > brushes with large and larger spacings, but rather painting with medium and
> > large brushes with small and smaller spacings. 
> 
> All use cases are possible. It is not because you work on bigger canvases
> that it means you also use bigger brushes. You may work with accuracy and
> details on very big images. ;-)

Well, yes, I suspect most people do both! I settled on roughly 4000px by 6000px as the largest size canvas that my system can comfortably handle when using GIMP. I use large brushes (>=512px) mostly for background washes, and small to very small brushes for sketching, and other size brushes for other tasks, etc.

> Anyway what I mean is that I don't think it is worth trying to divide into
> separate problems. When we will find the right optimization, it will improve
> any use case because there is a much larger problem than anything specific
> to brush size or brush spacing (if we still have a problem, this time only
> on big size/small spacing after we fix this, then we can focus on more
> specific issues; my guess though is that it will mostly be fixed or at least
> greatly improved in same time).
>  

That sounds good!

Comment 89 Ell 2018-04-02 21:11:49 UTC

The following commit in master, and the referenced GEGL commit, reduce the amount of data processed by the GEGL zoom tile handler (corresponding to the gegl_downscale_foo() calls), which improves painting performance when zoomed out.  The actual impact varies depending on the specific use case.

commit 4acdc7392af74d09a281f334d06841e32c006547
Author: Ell <ell_se@yahoo.com>
Date:   Sat Mar 31 11:49:01 2018 -0400

    app: use gegl_tile_handler_damage_rect() in TileHandlerValidate
    
    Use the recently-added gegl_tile_handler_damage_rect() function
    during GimpTileHandlerValidate invalidation, instead of manually
    voiding the tile pyramid.  This function avoids voiding mipmapped
    tiles entirely when only a subarea of the tile needs to be redrawn.
    
    See GEGL commit 3210f4ffc3c569a2acd9483811cb141070112bc6.

 app/gegl/gimptilehandlervalidate.c | 28 +---------------------------
 app/gegl/gimptilehandlervalidate.h |  1 -
 2 files changed, 1 insertion(+), 28 deletions(-)

Comment 90 jose americo gobbo 2018-04-03 01:01:50 UTC

Created attachment 370465 [details]
Graph Comparison for 7000 px brushstroke

Test made with brush, attachment 370379 [details] (512px), No Dyna, horizontal brustroke along 7000 px.

Comment 91 jose americo gobbo 2018-04-03 19:05:43 UTC

I did another test only with 2.10 rc1. To compare the speedy painting with 8bit (perceptual) and 32f (linear).

The test settings was that:
Minimal number of undo levels: 1
Maximum undo memory: 10 Mb
Tile cache size: 4Gib
Maximum new image size: 500 Mb
Number of threads to use: 1
OpenCl disabled

---
Processor Quadcore Intel with 8Gib memory
Brush 512 px
No Dyna
---

With this settings, GIMP spent to trace a brushstroke long 3000 px:
8bit Perceptual > 6s
32f Linear > 18s

Comment 92 jose americo gobbo 2018-04-03 19:16:28 UTC

> With this settings, GIMP spent to trace a brushstroke long 3000 px:
> 8bit Perceptual > 6s (Normal Legacy blend)
> 32f Linear > 18s (Normal Legacy blend)
with 32f linear and Default Normal blend > 6s

Comment 93 Jehan 2018-04-03 19:40:56 UTC

Americo, when you say you use 2.10 rc1, are you using master and updating regularly GIMP? Because there were tremendous improvements between the RC1 released and master already.

I can definitely see them when just testing and now my test from comment 82 takes a little bit under 6 seconds. This is still twice slower than 2.8 (but used to be 3 times slower, so…).
Now of course, with big brushes and small spacing, your tests suggest that the difference of speed increases. But I just wanted to make sure that is done on master.

Master is still not as good as was 2.8, but code is moving a lot so any test has to be done on master code, otherwise this is not really meaningful.
Thanks!

Comment 94 jose americo gobbo 2018-04-03 19:54:58 UTC

(In reply to Jehan from comment #93)
> Americo, when you say you use 2.10 rc1, are you using master and updating
> regularly GIMP? Because there were tremendous improvements between the RC1
> released and master already.

Yes, the 2.10 rc1 was the last updated, a half hour ago.
To complete the test:
8bit Perceptual > Legacy Normal Blend = 6s
8bit Perceptual > Default Normal Blend = 8s
8bit Linear > Legacy Normal Blend = 18s
8bit Linear > Default Normal Blend = 5s (seems a bit more fast).

Comment 95 Jehan 2018-04-03 22:56:58 UTC

Created attachment 370504 [details]
Perf output with bigger canvas and bigger brush

Ok so I decided to up the game, with a 10000x8000 canvas and a bigger than 300px brush. That was crazy slow, as expected (I compared also with 2.8: it had some lag, but nothing compared to master).

Here is the output of perf running on GIMP.
As you can see, nearly 20% is in self of combine_paint_mask_to_canvas_mask()! And 14% in gimp_operation_normal_process_sse4()!

Slightly lesser offender (but still!) are: canvas_buffer_to_paint_buf_alpha() with 6,5%, conv_rgba8_rgb8(), conv_rgb8_rgba8(), conv_rgbaF_rgba8() and conv_rgba8_rgbaF(), all around 5% (so many conversions occurring!).

Then gegl_downscale_2x2_u8_rgba() still around 4.4%.

If we managed to optimize these all, or maybe cut the calls (are these called more often than they should?), that would definitely improve the perfs.

Comment 96 Øyvind Kolås (pippin) 2018-04-03 23:05:25 UTC

Please also profile when drawing on a buffer with an alpha channel - based on what you say here it seems like this is being done on a layer with no alpha, and that more direct babl fast paths are already available for the case of painting on layers with alpha.

Comment 97 Jehan 2018-04-03 23:38:03 UTC

Created attachment 370505 [details]
Profile on a layer with alpha.

(In reply to Øyvind Kolås (pippin) from comment #96)
> Please also profile when drawing on a buffer with an alpha channel - based
> on what you say here it seems like this is being done on a layer with no
> alpha, and that more direct babl fast paths are already available for the
> case of painting on layers with alpha.

Indeed that was a single layer image with no opacity.
I did a new profiling when painting on an alpha layer.

The 2 same functions are still the biggest offenders but reversed with gimp_operation_normal_process_sse4() being now 18% of the cycles and combine_paint_mask_to_canvas_mask() "only" 11.2%.

I also realize that we can get even more details on the lines where the most cycles are spent in these functions. So I give this information too by making a 3-page PDF, where the 2 additional pages give these details for the top functions.

Comment 98 jose americo gobbo 2018-04-04 00:26:05 UTC

*** Bug 794301 has been marked as a duplicate of this bug. ***

Comment 99 jose americo gobbo 2018-04-04 00:52:10 UTC

*** Bug 782751 has been marked as a duplicate of this bug. ***

Comment 100 Ell 2018-04-04 23:17:59 UTC

Some paint loops are now parallelized in master, which improves performance for big brushes, especially with expensive modes or complex dynamics:

commit c8d4c079a204dded5429f336e394e492a581226a
Author: Ell <ell_se@yahoo.com>
Date:   Wed Apr 4 16:57:35 2018 -0400

    app: parallelize gimpbrush-transform.cc
    
    Use gimp_parallel_distribute_foo() to parallelize the brush
    transform functions.

 app/core/gimpbrush-transform.cc | 683 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------
 1 file changed, 365 insertions(+), 318 deletions(-)

commit 3df757ffd7311e621817aa7aa66e61efecf80de6
Author: Ell <ell_se@yahoo.com>
Date:   Wed Apr 4 17:02:34 2018 -0400

    app: parallelize gimppaintcore-loops.cc
    
    Ditto.

 app/paint/gimppaintcore-loops.cc | 430 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------
 1 file changed, 229 insertions(+), 201 deletions(-)

commit cb239e60f6ef91ba66373a285ab2f5cbeb700eb8
Author: Ell <ell_se@yahoo.com>
Date:   Wed Apr 4 17:12:21 2018 -0400

    app: parallelize gimp-gegl-loops.cc
    
    Ditto.

 app/gegl/gimp-gegl-loops.cc | 827 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------
 1 file changed, 448 insertions(+), 379 deletions(-)

Comment 101 Jehan 2018-04-04 23:59:33 UTC

Created attachment 370537 [details]
Artefacts on painting with new parallelized code.

I see some weird artefacts with the multi-threading. Cf. the screenshot using the classical "Hardness 050". The strokes should be all smooth (if you set threads to 1 in the preferences, you get this back), but they are like "dirty" with some vertical strokes.

Comment 102 jose americo gobbo 2018-04-05 01:31:57 UTC

I did again the test with 7000 px brushstroke with 512px brush (no dyna) and we have now of 22% faster.

Comment 103 Ell 2018-04-05 06:21:26 UTC

(In reply to Jehan from comment #101)
> Created attachment 370537 [details]
> Artefacts on painting with new parallelized code.
> 
> I see some weird artefacts with the multi-threading. Cf. the screenshot
> using the classical "Hardness 050". The strokes should be all smooth (if you
> set threads to 1 in the preferences, you get this back), but they are like
> "dirty" with some vertical strokes.

Should be fixed in master:

commit 082a6404c9613beeef4feafdb91ff43994c096dc
Author: Ell <ell_se@yahoo.com>
Date:   Thu Apr 5 02:14:29 2018 -0400

    app: fix iterator area in canvas_buffer_to_paint_buf_alpha()
    
    The entire ROI was processed in all threads, instead of only the
    thread-specific area, causing artifacts.

 app/paint/gimppaintcore-loops.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comment 104 Ell 2018-04-05 21:46:57 UTC

More parallelization.  This mostly impacts painting with dynamics:

commit 4336e2e52a5bd5dfdc31ccdf2e0594cf93b3ca90
Author: Ell <ell_se@yahoo.com>
Date:   Thu Apr 5 17:30:51 2018 -0400

    app: parallelize gimpbrushcore-loops.cc
    
    Use gimp_parallel_distribute_foo() to parallelize the brush core
    loops.

 app/paint/gimpbrushcore-loops.cc | 204 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
 1 file changed, 126 insertions(+), 78 deletions(-)

Comment 105 Simon Budig 2018-04-06 14:26:33 UTC

*** Bug 795017 has been marked as a duplicate of this bug. ***

Comment 106 jose americo gobbo 2018-04-07 13:55:56 UTC

I did again a new speed performance test with the last GIMP 2.10 rc1:

Video 1 with GIMP 2.8.22: https://www.youtube.com/watch?v=-LJhXkOFTVc
Video 2 with GIMP 2.10 rc1: https://www.youtube.com/watch?v=4ZvF0KRV-tk

The document size is 7000 x 2500 px
Brush Size: 512 px
Spacing: 1
Dynamics Off

---
When I compare GIMP 2.8.22 with the last GIMP 2.10 rc1 with these conditions:
To have the similar performance on GIMP 2.10 rc1 is necessary to use spacing equal 10 instead spacing 1.

The time elapsing to paint the brushstroke along the 7000 px:
GIMP 2.8.22: 1s circa
GIMP 2.10 rc1: 10s circa

Comment 107 Ell 2018-04-08 14:25:27 UTC

Painting is now done in a separate thread, so that updating the display doesn't stall it:

commit 8e7a34297fe993c1e8de9d85ecd61c73a1e4afd4
Author: Ell <ell_se@yahoo.com>
Date:   Sun Apr 8 09:28:23 2018 -0400

    app: move painting to a separate thread
    
    Add gimppainttool-paint.[ch], which takes care of painting during
    motion events in GimpPaintTool.  Perform the actual painting in a
    separate thread, so that display updates, which can have a
    significant synchronization overhead, don't stall painting.
    
    Allow specific paint tools to opt-out of a separate paint thread,
    and avoid it in GimpMybrushTool, since it doesn't seem to work.
    
    The separate paint thread can be explicitly disabled by setting the
    GIMP_NO_PAINT_THREAD environment variable.

 app/tools/Makefile.am           |   2 +
 app/tools/gimpmybrushtool.c     |   5 ++-
 app/tools/gimppainttool-paint.c | 312 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 app/tools/gimppainttool-paint.h |  30 +++++++++++++
 app/tools/gimppainttool.c       |  19 ++++----
 app/tools/gimppainttool.h       |   2 +
 6 files changed, 359 insertions(+), 11 deletions(-)

This should be a pretty significant boost, at least in display-bound cases.  I'm seeing 2.10 being about as fast as, or faster than, 2.8 is many cases now (under equivalent conditions).  Fingers crossed that you guys report similar results -- please test!

And another display-related optimization:

commit db50c72c24fffd07d3bdcadc6a633b864d065fac
Author: Ell <ell_se@yahoo.com>
Date:   Sun Apr 8 09:16:53 2018 -0400

    app: align display paint area to a coarse grid
    
    Align rectangles added to the display paint area, in
    gimp_display_paint_area(), to a coarse grid, to reduce the
    complexity of ther overall area.  This is similar to commit
    49285463e6c0cc03c361902df4e3006fa135dde3, however the alignment
    happens in display space, instead of image space.

 app/display/gimpdisplay.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comment 108 jose americo gobbo 2018-04-08 17:45:16 UTC

Created attachment 370661 [details]
Speed Performance last update GIMP 2.10 rc1 (commit 8e7a34297fe)

I did a test more extensive and following the Pippin advice (testing with alpha and not).
The improvement is great on 8i precision comparing with previous 2.10 rc1 releases, but yet is great the difference with the 2.8.22 performance, circa 5 times slower.
Instead with 32f precision with alpha, the performance grows and is 3 times circa slower than 2.8.22 performance.

Comment 109 jose americo gobbo 2018-04-08 17:46:30 UTC

Created attachment 370662 [details]
Graph Speed Performance 2.10 rc1 (commit 8e7a34297fe)

Comment 110 jose americo gobbo 2018-04-08 23:15:02 UTC

(In reply to jose americo gobbo from comment #109)
> Created attachment 370662 [details]
> Graph Speed Performance 2.10 rc1 (commit 8e7a34297fe)

All data of this test for the different precisions are referred only to 'legacy' blend mode.

Comment 111 jose americo gobbo 2018-04-08 23:28:23 UTC

Created attachment 370668 [details]
Spreadsheet of last test revised... adding some notes (commit 8e7a34297fe)

Comment 112 jose americo gobbo 2018-04-09 18:50:10 UTC

Created attachment 370706 [details]
Default x Legacy with 32f and 8i

Graph Comparison between Default and Legacy Blends to 32f (Perceptual and Linear) and 8i (Perceptual).

Conditions of the test:
Brush size 512 px (attachment 370379 [details])
7000 x 2500 px document
Brushstroke 7000 px along the document width (with Ctrl+Shift on Paintbrush)
Dynamics Off
4 processors / 4 threads (Gimp 2.8.22/GIMP 2.10 rc1)
Memory 8 Gib
Tile Cache 4 Gib

Comment 113 jose americo gobbo 2018-04-09 19:06:04 UTC

Created attachment 370707 [details]
Spreadsheet of last test revised GIMP commit f77edcdf193e

The spreadsheet has the complete data of the tests with Legacy and Default Blend modes verified with 32f (Perceptual and Linear) and 8i (Perceptual).
I verified that:

8i
---
+ With Perceptual/Legacy is faster than Perceptual/Default blend mode.

32f
---
+ With Perceptual, the Legacy blend is faster than Default blend.
+ With Linear, the Default blend is faster than Legacy blend mode.
+ The Linear/Default and Perceptual/Legacy blends are very similar as speed performance.

Test conditions:
Brush size 512 (attachment 370379 [details])
7000 x 2500 px document
Brushstroke 7000 px along document width (Ctrl+Shift with Paintbrush)
Dynamics Off
4 processors / 4 threads (GIMP 2.8.22/GIMP 2.10 rc1)
8 Gib Memory
4 Gib Tile Cache

Comment 114 Ell 2018-04-19 09:51:59 UTC

When it comes to simple painting using no dynamics, and cheap blend modes, there's a good chance we're, at least partially, memory bound.  The following commit should improve painting speed in such cases (I'm seeing at least ~10% speedup, but this number can probably vary depending on circumstances):

commit f2a1fd5bf0e170e79bc3c2f1f162d2f73ecb9747
Author: Ell <ell_se@yahoo.com>
Date:   Sat Apr 14 19:02:21 2018 -0400

    app: refactor gimppaintcore-loops to coalesce iteration
    
    The gimppaintcore-loops functions perform very little actual
    computational work (in case of do_layer_blend(), at least for
    simple blend modes), which makes the cost of buffer iteration, and
    memory bandwidth, nonnegligible factors.  Since these functions are
    usually called in succession, acessing the same region of the same
    buffers, using the same foramts, coalescing them into a single
    function, which performs all the necessary processing in a single
    step, can improve performance when these functions are the
    bottleneck.
    
    Add a gimp_paint_core_loops_process() function, which does just
    that: it takes a set of algorithms to run, and a set of parameters,
    and performs all of them in one go.  The individual functions are
    kept for convenience, but are merely wrappers around
    gimp_paint_core_loops_process().
    
    Be warned: the implementation uses unholy C++ from outer space, in
    order to make this (sort of) managable.  See the comments for more
    details.

 app/paint/gimppaintcore-loops.cc | 1494 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
 app/paint/gimppaintcore-loops.h  |  116 ++++++----
 app/paint/gimppaintcore.c        |   75 +++----
 3 files changed, 1315 insertions(+), 370 deletions(-)

Comment 115 GNOME Infrastructure Team 2018-05-24 13:36:10 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gimp/issues/454.