GNOME Bugzilla – Bug 694917
Drawing much slower than GIMP 2.8
Last modified: 2018-05-24 13:36:10 UTC
Brushes which have reasonable performance in 2.8.4 are rendered very slowly in git in any mode (8bpc to 32bpc). With a spacing of 1px, fuzzy, rectangular brush, 25px in diameter, no dynamics, dragging out a line of 200 px length takes 3 to 4 seconds but happens instantaenously on 2.8.4. Adding dynamics or using a bigger brush is out of the question. Waiting times with normal production-like settings go up to a minute for even the shortest lines.
Good that you mention it, we haven't noticed.
I guess we don't need a bug about painting slowness yet , at least not at this stage in the development of 2.9?
I did not know that this was commonly known. I was surprised by the performance after I was told to use master for 32bpc.
Setting env variable GEGL_SWAP=ram helps somewhat, but not entierly. And yes this is very much a known development version bug.
We should check this for the current versions.
There have been lots of optimizations in the meantime. Anyone willing to do a quick test? If it's not considerably slower we should close this as FIXED.
I'built it on Saturday -- it felt ... "incredibly slow". Sorry for the news. Maybe there is some optimization I am not turning on, but I don't think this bug can be closed.
Much slower than the same brush with the same settings in 2.8?
On my own system (32gb ram and a relatively recent, fast processor), there doesn't seem to be any lag with a 25px brush, no matter how complicated the brush or the dynamics and regardless of which version of GIMP that I use. But with a larger brush (say around 180px) the lag is very noticeable on 2.9, (but not on 2.8), and even more noticeable if the brush is larger than 256px. And a complicated dynamics makes the problem worse. Whereas on recent versions of Krita (such as 3.1.1) the brush has to be really, really large to see a lag. People are starting to work with larger canvases, which means using larger brushes, especially for things like creating soft background washes or making gradual fade-out masking when editing a photograph. With GIMP-2.9, I have to size the brushes and hence the canvas to what GIMP can handle without too much lag. There was a time when Krita brushes were considerably slower than they are now, and I know there was a huge push towards making painting faster. It might be worth looking at the Krita code.
This is much better than it used to be and usable on common sizes for screen (even the very big UHD ones), but you can still see a lag difference. I just tested on my computer, both 2.8 and master. I opened a 4K UHD canvas (3840x2160) with a default 20px brush. Just crossing the canvas quickly with the mouse: on master, you can perceive a (small) lag of the actual drawing appearing after the mouse on the canvas (whereas 2.8 feels more instant, the drawing follows the mouse better). Testing higher resolution, the 8K UHD (7680x4320), the lag is even more noticeable, with the drawing really late behind a fast mouse (whereas on GIMP 2.8, it still feels instant). So yes, the difference is still quite noticeable, even though there was a lot of progress. On our use case, we work a lot for the screen, so lower resolutions/dimensions are OK, but when working for print purpose, you could much easily go into very high dimensions. So we should work on improving this.
Additional note: I believe this is not a blocker for 2.10 though. That's not horrible slowness as we demonstrate daily. Improving it before 2.10 would be nice though if you have an idea what to do, but not if that were to push the release for months.
Setting to blocker because the difference between painting in 2.8 and 2.10 is still quite huge. So that's quite a regression. Therefore if someone could have a look into this, it would be very good. Yet if that is the last blocker standing before a release, the "block" could be relaxed. It does not make it impossible to paint. Also finding what to speed up and how will not be an easy task.
One thing that might be worth considering is the mix of gfloat and gdouble declarations throughout GIMP (and possibly GEGL and BABL - which I haven't checked). For example the first of the following sections of code takes approx 860000 Windows performance counts whereas the second takes around 470000 performance counts: gfloat f1 = 1.0; gfloat f2 = 1.0; for (i = 0; i < 100000000; i++) { f2 += f1; f1 += 0.1; } gdouble d1 = 1.0; gdouble d2 = 1.0; for (i = 0; i < 100000000; i++) { d2 += d1; d1 += 0.1; } To start with I thought that this suggested that 32-bit FP number handling under 64-bit Windows 7 was slower but the problem was actually the lack of a typecast in the first example. Changing the first example to use f1 += (gfloat)0.1; makes the time for the first loop the same as for the second as the 0.1 was forced to be a float rather than the double assumed by gcc by default. This removed the instruction required to convert the (assumed) double to a float. 100 million was chosen for the loop in order to get some idea of how long the floating point operations would take for a brush with a size of 1000 pixels drawn 100 times - whilst this brush size is extreme and the drawing is glacially slow it should make any improvements to the speed more apparent. I also set the spacing to 1. I have attached the code for a function that reads the Windows performance counter into a 64-bit variable and code for a second function that gives an elapsed count from the original count to the current time. These values are then displayed using gimp_tool_message() when the mouse button is released. (Unless I am mistaken the current timing facilities in GIMP only resolve to the millisecond which is inadequate for these purposes). Not being familiar with Linux I don't know if there is a similar API call. The performance counter readings can be converted to a actual times but for the current purposes relative values are sufficient. Obviously the counts will be affected by any other programs that are running but repeating the tests gives an idea of where the time is being taken. Anyway my point is that if there are any loops that are repeated many times and mix gfloat and gdouble variables there will be a time penalty involved in converting between the two formats. Obviously using 64-bit floating point variables throughout would double the size of the working buffers but I am not sure why GIMP (and GEGL and BABL?) couldn't standardise on 32-bit floats throughout - do you really need the extra precision when specifying opacity for example? Maybe there are some places where the extra precision is required but perhaps this should be handled carefully. I have spent some time looking through the code attempting to determine where the time is going but as yet haven't had that eureka moment. I have found that do_layer_blend() is taking about 44% of the total time for drawing a line but am ashamed to admit that I can't determine which function is being called by layer_mode.function but did find that this itself was taking about 40% of the time taken by do_layer_blend().
Created attachment 366246 [details] Code to read the Windows performance counter Functions to read the Windows performance counter so allowing timing with a much higher resolution than the existing GIMP timing functions
> Functions to read the Windows performance counter so allowing timing with a much higher resolution than the existing GIMP timing functions About your attachment, I understand this is alternative (better) implementation of GIMP_TIMER_START/END on Win32. I'd appreciate a proper git-formatted patch (the part on gimp-utils.*) and I would just push it. > I have spent some time looking through the code attempting to determine where the time is going but as yet haven't had that eureka moment. Ok. If you find anything, we'd appreciate patches as well. The performance of painting tools is still a terrible regression compared to 2.8.
Created attachment 366258 [details] [review] Functions to read the Windows Performance Counter Functions to read the Windows Performance Counter to allow higher resolution timing than is possible with the current GIMP timing functions.
OK, hopefully the patch file is in the correct format. I would still be interested to know how to determine the function called in do_layer_blend() - I am editing a single layer with the mode set to Normal and the paintbrush mode set to Normal (legacy)
Review of attachment 366258 [details] [review]: Sorry, I think I wasn't clear. We can't have conditional public functions like here working only on one OS (well we theoretically can, but I would not allow this as this would make a mess!). But we can have a single function signature with different implementations depending on OS. :-) What I wanted is to have alternative implementations for GIMP_TIMER_START()/GIMP_TIMER_END(). Now I see you have different signatures, like you return a timer with your start function (why do you call it now() by the way? I find it misleading. start() would be better), but I don't mind that we change the signature of gimp_timer_start() and that we make them real (more complex) functions with different signatures. This being said, when I time code this way, I usually use g_get_monotonic_time(), which is supposed to get much more accurate time by using the system clock. I indeed see we don't use it in our macro. Have you checked the implementation to see if it doesn't do what you need already? > I would still be interested to know how to determine the function called in do_layer_blend() I can't really help you unless I look more closely on this code. I'm hoping you will do this. ;-)
Actually it looks like glib code uses also QueryPerformanceCounter() though the call is different (they don't do this GetProcAddress() call. Does that change anything?). https://git.gnome.org/browse/glib/tree/glib/gmain.c#n2746 If that is similar, I'd suggest our code uses g_get_monotonic_time(). Actually that's also always what I used until now and didn't even know we had these 2 GIMP_TIMER_* macros. :-)
I hadn't spotted that g_get_monotonic_time() existed. The main difference between a call of this function and the one that I was using is that g_get_monotonic_time(0 converts the time to usec - on my system the performance frequency call returns a value of 2929755(counts per second). This gives a value of 0.341 which is used to multiply the performance count in order to convert it to usec - thereby reducing the resolution by a factor of approximately 3 - probably this doesn't matter given the numbers that we are talking about. So ignore my functions and carry on as before.
Also for the records, I just checked the implementation of `g_timer_*()` because I was wondering if we should update GIMP_TIMER_* by using g_get_monotonic_time(). Well it turns out GTimer already uses this internally. So in the end, our macros already use the proper Win32 implementation (QueryPerformanceCounter() so, it would seem). But now let's get back to accelerating the painting in GIMP!
I'd like to get back to looking at this but since I pulled the latest sources from git I can't build GIMP. I get stuck with poppler-glib and poppler-data as noted on the GIMP users site - text posted there shown below. What am I doing wrong please? "I am trying to build the version of GIMP that requires poppler-glib. I have tried cloning from git git clone https://github.com/danigm/poppler.git I have then changed to the poppler folder and run ./autogen.sh --prefix=$PREFIX then ./configure --prefix=$PREFIX When I run make I get errors of the form that cairo_pattern_mesh_set_control_point is undefined and a suggestion that cairo_mesh_pattern_set_control_point be used instead. Any suggestions as to what I am doing wrong? Also any information on satisfying the poppler-data requirement. I have sorted the other recent requirements for an updated glib and mypaint-brushes but am struggling here."
I have answered on the gimp-user mailing list. Also you may want to rather ask these questions on the gimp-developer list. Also please next time, do not post this kind of unrelated comment on a bug report, this makes them very hard to understand in the future (this one has already more than 20 comments!). Ask on the dev mailing list, or on IRC. On last resort, you may even ask me by email directly (I'd really prefer you to NOT do this and try to find your answer on mailing list and IRC, but that's still better than polluting a bug report :P). I hope you understand. :-) Thanks!
Created attachment 366901 [details] [review] attempt to improve painting with colored brush Played with perf: Test case is R'G'B' u8 (2000x2000) paint brush Pepper brush, size 200, spacing 1, dynamics Pressure Opacity The attached patch seems to improve
Thanks Massimo! Doing quick tests, I could still reproduce brush lags on a 4K image, but I guess the improvement is simply subtler than that. I don't have time to review the code just now because I need to head out. I just noticed the "foo" commit message. Is it to say that you don't consider this pushable, and this is just a POC?
(In reply to Jehan from comment #25) > Thanks Massimo! > > Doing quick tests, I could still reproduce brush lags on a 4K image, but I > guess the improvement is simply subtler than that. Obviously the comparison is with 2.8. Does 2.8 have a lag on the same 4k image? You can always set up a brush/image big enough to see a lag. > I don't have time to review the code just now because I need to head out. I > just noticed the "foo" commit message. Is it to say that you don't consider > this pushable, and this is just a POC? Yes it is a WIP. To be less subjective I added code to time the stroke of a path with emulate dynamics and the paintbrush paint method and stroking a long path (zigzagging over the image) here goes from 10.x s to 5.x s.
> Does 2.8 have a lag on the same 4k image? No it doesn't, which was my main issue. Of course, I know I would see a lag at some point, but I chose 4K exactly because *on my computer at least* (on another computer, depending on its material configuration, it could be different of course), GIMP 2.8 shows no lag whereas master shows crazy lag. My steps were quite simple: just draw randomly and quickly like crazy for 30 seconds (basically draw circles continuously without ever raising the pen) on a 4K canvas and release: in GIMP 2.8, the strokes are following the pen; in master, strokes are lagging behind more and more, and after 30 seconds, when I raise the pen, I just the stroke being drawn for a few more seconds before it reaches the last pen position. > To be less subjective I added code to time the stroke of a path with > emulate dynamics and the paintbrush paint method and stroking a > long path (zigzagging over the image) here goes from 10.x s to 5.x s. Is this code fully automatized? Could we make it into a unit test? Of course, that would be a bit a problem since once again, the actual time taken for the stroke would also depends on the hardware. But maybe we could still set a minimum time taken by the stroke or something, so that we could easily detect if something has gone bad in painting? Or maybe, even better, we could just have a separate script (not one of the unit tests run by `make check`) and we would run it on a server (always the same) daily/weekly/whateverly on newer code to detect any issue brought by new code (or at the opposite an improvement).
If it depends on the length of the continuous stroke (in distance, or time, or whatever is the slowing factor), maybe we should focus on that too. Massimo's patch clearly improves painting and scaling individual dabs, but should be completely unrelated to at what time/distance/whatever in the stroke the function is called. We *also* need to look for something that becomes more expensive the longer the stroke is, perhaps a list that is traversed all the time or whatever becomes more expensive over time/distance/whatever. Maybe profiling allocations during a stroke is a good approach?
(In reply to Jehan from comment #27) > > Does 2.8 have a lag on the same 4k image? > > No it doesn't, which was my main issue. > > Of course, I know I would see a lag at some point, but I chose 4K exactly > because *on my computer at least* (on another computer, depending on its > material configuration, it could be different of course), GIMP 2.8 shows no > lag whereas master shows crazy lag. My steps were quite simple: just draw > randomly and quickly like crazy for 30 seconds (basically draw circles > continuously without ever raising the pen) on a 4K canvas and release: in > GIMP 2.8, the strokes are following the pen; in master, strokes are lagging > behind more and more, and after 30 seconds, when I raise the pen, I just the > stroke being drawn for a few more seconds before it reaches the last pen > position. > I'm on X11 and have color management enabled, rulers shown and SWM, I am using an external monitor, the laptop monitor is off. I must say that I see a lag in 2.8 like in 2.9 if not greater. So probably I'm not the one to evaluate the relative performance of 2.8 vs 2.9. I cannot count how many seconds it takes to end painting after 30 seconds of random circling. > > Is this code fully automatized? Could we make it into a unit test? > Of course, that would be a bit a problem since once again, the actual time > taken for the stroke would also depends on the hardware. But maybe we could > still set a minimum time taken by the stroke or something, so that we could > easily detect if something has gone bad in painting? > Probably I was not clear, I added a pair of GIMP_TIMER_START/END around gimp_paint_core_stroke_vectors in app/vectors/gimpvectors.c and saved an xcf with a long path. Then I built GIMP with/out the patch and compared the time it takes to Edit->Stroke Path (Stroke with a paint tool (Paintbrush), and Emulate brush dynamics) After having saved tool options (pepper brush, size 200, spacing 1)) it basically is <Ctrl>1 <Alt>E K <Enter> <Alt>S wait <Ctrl>Q D And even for this test, compared to 2.8 a patched 2.9 is faster.
> I must say that I see a lag in 2.8 like in 2.9 if not greater. > > So probably I'm not the one to evaluate the relative performance > of 2.8 vs 2.9. I cannot count how many seconds it takes to end painting > after 30 seconds of random circling. Didn't you say earlier that a same stroke went from taking ~ 10 sec to draw to 5 sec to draw after your patch? That looks like pretty perceptible lag to me. ;-) To be clear, my steps earlier were mainly *before* your patch. I made very quick tests with your patch in, but really couldn't make *real* time until now to sit down for an hour to really make a deeper comparison. Things are a bit hectic here and I may not make this time for a few days (but I'll try!). So don't take my previous statement at high value. I may have said shit (I should not make comments after quick tests! New lesson of the day!). In any case, you are likely the one to improve things since you already went further than I (I did try to profile GIMP, time various functions, test various code changes, but I was quite unsuccessful)! > And even for this test, compared to 2.8 a patched 2.9 is faster. At least, before your patch, 2.9 was slower than 2.8, right? We all agree on this part at least? :-) If so, my quick test was most likely shit. I'll find the time to test better.
(In reply to Massimo from comment #29) > After having saved tool options (pepper brush, size 200, spacing 1)) GIMP 2.9 painting slows down for me when I'm using a brush size larger than 256px, especially when combined with a small spacing. I don't mean painting slows down just a little, I mean it slows down to the point where I try to avoid using a large brush, in other words, I tailor "what I want to paint" to the need to avoid large brushes. Not all brushes get super-slow when the size is >256px. It seems to depend on the complexity of the stain and also on the dynamics - some dynamics make the brush faster than disabling all the dynamics options ("Dynamics Off") and some make the brush slower. I'm not sure if the default GIMP brushes have suitably large brushes for showing just how slow painting can get when the brush size is larger than 256px. I'm not even talking about super-large brushes, rather brushes around 512px. Setting GIMP 2.8 to match GIMP 2.9 in terms of "Environment" and available brushes, using Color Management, SWM, etc, and picking a large brush (close to 512px) with a fair amount of detail, using "Dynamics Off", and drawing large circles and rectangles on a 4000x4000px image, 2.9 is slower than 2.8. I don't have a timer, so I don't know how much slower.
FWIW, the dashboard can measure CPU activity time now, which can be used to get rough numbers for how long certain things take.
Massimo > sorry I haven't had the time to review your patch, but I will do. That's still high on my priority list. Don't hesitate to continue working on it though and propose an updated version if you said it was WIP. Comments from Mitch's comment 28 can be useful too. In any case, I pretty much trust what you will come up with. I have been stuck on this one forever.
P.S.: and as you said, 2.8 is the base reference, so if we could at least be as fast, that's good already to not make it a blocker. Of course, if we manage to improve it to be faster (maybe after 2.10 if not now), that would be even better. ;-) After all, the more we go, the bigger are the images worked on commonly. So it only makes sense to continuously improve painting.
(In reply to Jehan from comment #33) > Massimo > sorry I haven't had the time to review your patch, but I will do. > That's still high on my priority list. Don't hesitate to continue working on > it though and propose an updated version if you said it was WIP. > I said that it was a WIP because my initial intention was to avoid the modulus operation in this loop: https://git.gnome.org/browse/gimp/tree/app/paint/gimpbrushcore.c#n1718 and reading the code it seemed it was looping like to copy a pattern, but in practice 'width' seems to be at most 2 pixels wider than 'pixmap_width', so I don't know if it is useful to call babl_process and then copy its output or let babl_process convert the whole line. Anyway I lost interest in trying to optimize this code, the expectations are too high.
Only to refresh a bit the comparison between 2.8.22 and 2.9.9 commit 8d9580fd2b. My test was made on A3 format 300 ppi in both cases. I have made a stroke line using Shift key along with the major side. On 2.8.22 was necessary 0,5 seconds circa instead on 2.9.9 circa 2 seconds. The relation is almost 4 times faster on 2.8.22 than 2.9.9, perhaps, with a machine with more RAM or more processors the time could be shorter, but the relation between the two versions not necessarily would be changed. In 2.9.9 we have new templates until A0 300 ppi, so, perhaps is interesting begin to talk about which computer specifications are necessary to use an A0 format with a good performance to painting tasks.
Americo > have you tried the same comparison before and after applying Massimo's patch? If you could try, this would be useful. Thanks.
Created attachment 370003 [details] [review] attempt to improve painting with color brush I modified and split the previous patch in 3 commits attached to this comment as a patchset. So they can be reviewed/reverted/etc separately and I'm going to attach to the next comment a script-fu useful to show the effect of each change (they affect mostly or only painting/stroking with color brushes). On my system it goes from nearly 9 seconds to more then 4.67 seconds, whereas gimp-console-2.8 takes nearly 9 seconds, YMMV. Following is the copy paste from the terminal of the commands used, starting in the gimp source tree with master checked out, built and installed, to apply the patch on a new git branch, build each new commit, run the script, compute the hash of the output png and finally get back to the 'master' git branch, delete the 'test' git branch and run the same script with gimp-console-2.8 (where I have installed it). [massimo@pluto gimp (master)]$ git checkout -b test origin/master Branch test set up to track remote branch master from origin. Switched to a new branch 'test' [massimo@pluto gimp (test)]$ git am test-stroke.patch Applying: foo-fish Applying: foo-area-format Applying: foo-modulus [massimo@pluto gimp (test)]$ for j in {3..0..-1} ; do LC_ALL=C git checkout test~$j |& grep \ is && make -Capp >&/dev/null && LC_ALL=C app/gimp-console-2.9 -b '(load "test-stroke.scm")' && md5sum tmp-2.9.9.png; done HEAD is now at 829d44d900... app: make "color" parameter of gimp_palettes_add_color_history() const This is a development version of GIMP. Debug messages may appear here. 2.9.9 gimp-edit-stroke-vectors: ~ 9.258006096s Exported tmp-2.9.9.png plug-in 'script-fu' aborted before sending its procedure return values 746488e2c628f7fa3b5f6ce349474efc tmp-2.9.9.png HEAD is now at 5b09ef0a34... foo-fish This is a development version of GIMP. Debug messages may appear here. 2.9.9 gimp-edit-stroke-vectors: ~ 8.391999006s Exported tmp-2.9.9.png plug-in 'script-fu' aborted before sending its procedure return values 746488e2c628f7fa3b5f6ce349474efc tmp-2.9.9.png HEAD is now at d2976573c0... foo-area-format This is a development version of GIMP. Debug messages may appear here. 2.9.9 gimp-edit-stroke-vectors: ~ 5.473551989s Exported tmp-2.9.9.png plug-in 'script-fu' aborted before sending its procedure return values 746488e2c628f7fa3b5f6ce349474efc tmp-2.9.9.png HEAD is now at c0ee3c96c0... foo-modulus This is a development version of GIMP. Debug messages may appear here. 2.9.9 gimp-edit-stroke-vectors: ~ 4.676603079s Exported tmp-2.9.9.png plug-in 'script-fu' aborted before sending its procedure return values 746488e2c628f7fa3b5f6ce349474efc tmp-2.9.9.png [massimo@pluto gimp ((c0ee3c9...))]$ git checkout master && git branch -D test La precedente posizione di HEAD era c0ee3c96c0... foo-modulus Si è passati al branch 'master' Your branch is up-to-date with 'origin/master'. Ramo test eliminato (era c0ee3c96c0). [massimo@pluto gimp (master)]$ LD_LIBRARY_PATH=$HOME/prefix/lib64 LC_ALL=C ~/tmp/bin/gimp-console-2.8 -b '(load "test-stroke.scm")' 2.8.23 gimp-edit-stroke-vectors: ~ 9.428511143s Exported tmp-2.8.23.png [massimo@pluto gimp (master)]$
Created attachment 370004 [details] the script-fu script Notes: Besides timing the effects of the previous patchset, it shows that to obtain output similar to gimp-2.8 it is necessary to edit scripts. To do so, it uses (gimp-version) so when gimp-2.9.10 is out it has to be adapted
Thank Massimo! But I think you forgot to attach 2 or the 3 patches?
(In reply to Michael Natterer from comment #40) > Thank Massimo! But I think you forgot to attach 2 or the 3 patches? Well that's all. It is three commits in a single file git format-patch --stdout -3 > test-stroke.patch and when applied with git am they appear as three commits
*** Bug 794503 has been marked as a duplicate of this bug. ***
Review of attachment 370003 [details] [review]: Thanks. Squashed and pushed to master, with a few tweaks: commit f561231e1f59494907103b30bf3d14373c0986ae Author: Massimo Valentini <mvalentini@src.gnome.org> Date: Thu Jan 18 10:24:51 2018 +0100 app: various speedups to gimp_brush_core_color_area_with_pixmap() In gimp_brush_core_color_area_with_pixmap(), use the native area format when painting the brush, instead of always going through "RGBA float", and create the pixmap -> area fish only once, instead of once per scanrow. In gimp_brush_core_paint_line_pixmap_mask(), avoid modulus calculation at each pixel. See bug #694917. app/paint/gimpbrushcore.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------- 1 file changed, 72 insertions(+), 47 deletions(-)
For the record, I tested your test script: * both GIMP 2.8 and master takes about between 13 and 17 seconds to run it (so let's say an average around 15 seconds). * Once your script is in, GIMP master takes between 7 to 9 seconds. So there is clearly an improvement here! Thanks Massimo for the patches. BUT I still see huge lag when painting interactively on canvas! These tests make me think that probably the issue is not happening in painting core but in the display code which is simply not able to follow up with the canvas updates fast enough. Maybe trying to optimize the painting code was the wrong idea to begin with!
(When I say *wrong idea*, of course I mean: it's awesome it got optimized and more optimizations are welcome of course! ;p I just meant that this may not be the culprit code for the current slow painting issue)
To illustrate, I just made a video right now. First I draw with GIMP 2.8. As you can see, it is quite fast, and drawing is following the mouse closely enough. Then I draw with GIMP master (even after Massimo's commits). Nearly immediately the drawing is lagging behind the mouse pointer. At some point (around 0:44), I stop drawing and even put the cursor over the Brushes dock so that it is well visible outside the canvas, and we can see the canvas being drawn for more than 10 seconds afterwards! https://youtu.be/pBEqmvLLcQ0 Of course, in real life, you don't need to continuously scramble your pen over a full canvas of 5000px at high speed. Also the problem is not that terrible on Aryeom's computer, probably because it is a faster one (though I still have a quite ok computer, not at all an antiquity!). So this is not a blocker. Yet there is clearly a regression somewhere. Also considering the painting scripting tests, maybe the issue is indeed not in painting but only in canvas updating.
(In reply to Jehan from comment #46) > To illustrate, I just made a video right now. > > First I draw with GIMP 2.8. As you can see, it is quite fast, and drawing is > following the mouse closely enough. > > Then I draw with GIMP master (even after Massimo's commits). Nearly > immediately the drawing is lagging behind the mouse pointer. At some point > (around 0:44), I stop drawing and even put the cursor over the Brushes dock > so that it is well visible outside the canvas, and we can see the canvas > being drawn for more than 10 seconds afterwards! > > https://youtu.be/pBEqmvLLcQ0 > Although I don't experience such a lag with a 20 px "2. Hardness 050" brush on a 5000x4000 image, I ran perf to profile painting like in your video and it seems there is quite a lot of time spent invalidating (allocating/freeing/lookingup) a GEGL graph. One simple thing to try is in gegl_transform_get_invalidated_by_change which seems to cost more than 3% of "my test" run. Now GIMP uses a gegl:transform in its projection/layer graph, but often it is a nop because the layer offset is zero. This means the gegl:transform matrix is the identity, so here: https://git.gnome.org/browse/gegl/tree/operations/transform/transform-core.c#n990 it is useful to move the creation/destruction of the buffer sampler after the if (... is_identity (...)) return region; cause 'context_rect' is not used before. In my tests the cost reduces from >3% to 0.76% and gegl_buffer_sampler_new_at_level disappears from perf output. I don't notice the effect, but it should be correct and hardly worse than the current code.
(In reply to Massimo from comment #47) > > Although I don't experience such a lag with a 20 px "2. Hardness 050" brush > on a 5000x4000 image, I ran perf to profile painting like in your video and > it seems there is quite a lot of time spent invalidating > (allocating/freeing/lookingup) a GEGL graph. > Retrying today it seems invalidation cost is now reduced to a reasonable value. Looking at the code and perf output a possible improvement seems to be here: https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.c#n92 Doesn't seem wise to always create this cairo buffer if it is only used in an internal else branch. https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render.c#n240 Perhaps it is possible to create it only when needed or perform directly the color_transform on the cairo_data
(In reply to Massimo from comment #47) > One simple thing to try is in gegl_transform_get_invalidated_by_change > which seems to cost more than 3% of "my test" run. ... > https://git.gnome.org/browse/gegl/tree/operations/transform/transform-core. > c#n990 > > it is useful to move the creation/destruction of the buffer sampler > after the ... > In my tests the cost reduces from >3% to 0.76% and > gegl_buffer_sampler_new_at_level disappears from perf output. This improvement has been applied to GEGL master: commit 34f217b7d4297c7c49e16fd27ff601e9a0d07c83 Author: Øyvind Kolås <pippin@gimp.org> Date: Wed Mar 28 00:07:21 2018 +0200 transform: sometimes do less work in gegl_transform_get_invalidated_by_change As spotted by massimo in bug #694917 - the construction of this object is not neccesary in the case where we return early (finding the context rect could further also be special cased to always avoid the overhead of constructing/destroying a gobject).
(In reply to Massimo from comment #48) > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render. > c#n92 > > Doesn't seem wise to always create this cairo buffer if it is only used > in an internal else branch. > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayshell-render. > c#n240 > > Perhaps it is possible to create it only when needed or perform directly > the color_transform on the cairo_data Indeed - and this isn't a cairo-buffer it is a linear GeglBuffer wrapper around the already used pixel_data fished out of a cairo surface - this temporary interfacing object can indeed be created/destroyed in a local scope. commit a7b0d55fc5f7d6569ed98154e0311c96225d1e22 Author: Øyvind Kolås <pippin@gimp.org> Date: Wed Mar 28 00:35:08 2018 +0200 app: only create GeglBuffer wrapper for cairo-data when used As suggested by massimo in bug #694917, move unconditional creation/destruction of a wrapper GeglBuffer object from top-level scope of the function to the single conditional scope where it is used.
Looking at GIMP code executed a lot while painting, there is the function rtree_insert(), which looks weird: https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n152 if (node->w >= w && w < 2 * node->w && node->h >= h && h < 2 * node->h) Unless I'm not yet well awake, it seems to me that the second and fourth conditions are true when the first and third are. Assuming w>0 (guaranteed by gimp_display_xfer_get_surface caller()) and node->w>0 and neglecting overflows: node->w >= w node->w + node->w >= w + node->w 2 * node->w > w + 0 2 * node->w > w That would mean that the second loop is identical to the first and it makes me wonder if there is a typo and the intent was if (w <= node->w && node->w < 2 * w ...) or something entirely different. Anyway if it is correct it should be possible to simplify it
I think that here: https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116 it is wrong/harmful to insert zero sized rectangles in the tree because they can never be selected and just make the list longer. So I'd say rtree_node_create() should just return NULL when w or h are 0 https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n70
Another function used a lot while painting is GEGL gegl_downscale_2x2_u8_nl() https://git.gnome.org/browse/gegl/tree/gegl/gegl-algorithms.c#n540 Here I think lut accesses for the alpha channel could/should be spared.
gimpdisplayxfer.c was sortof "donated" to us by Chris Wilson because were were apparently doing something quite wrong/inefficient. I think you are now the next best thing to an expert we have :)
(In reply to Massimo from comment #51) > Looking at GIMP code executed a lot while painting, there is the function > rtree_insert(), which looks weird: > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n152 > > if (node->w >= w && w < 2 * node->w && node->h >= h && h < 2 * node->h) > > Unless I'm not yet well awake, it seems to me that the second and fourth > conditions are true when the first and third are. > > Assuming w>0 (guaranteed by gimp_display_xfer_get_surface caller()) and > node->w>0 and neglecting overflows: > > node->w >= w > node->w + node->w >= w + node->w > 2 * node->w > w + 0 > 2 * node->w > w > > That would mean that the second loop is identical to the first and it makes > me wonder if there is a typo and the intent was > > if (w <= node->w && node->w < 2 * w ...) > > or something entirely different. That's the intent. The first pass was meant to be a "best fit". So only split this node if approximately the same size as the desired region. so node->w has to be wide enough for the request w, but not too wide: so w >= node->w && (node->w - w) < w (Now why only splitting into 2 children, I can't remember.)
(In reply to Massimo from comment #52) > I think that here: > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116 rtree_node_insert() ? Inside rtree_node_insert() we are asking whether or not we need to split this node. If the residual is empty, we just use the current node as the leaf. > it is wrong/harmful to insert zero sized rectangles in the tree > because they can never be selected and just make the list longer. > > So I'd say rtree_node_create() should just return NULL when w or h are 0 Yes, an empty request wasn't considered.
I'm sure there are much better algorithms for quick packing than the current rtree; the intent was to be able to reuse the SHM image for transport (avoiding having to overwrite active regions as that requires a RTT to the server) so rtree was just a means to parcel up the remaining space in the SHM image, and then being able to swap over to a new SHM image batching up the requests/replies.
(In reply to Chris Wilson from comment #56) > (In reply to Massimo from comment #52) > > I think that here: > > > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116 > > rtree_node_insert() ? > Yes, see: if (((node->w - w) | (node->h - h)) > 1) I mean when say (node->w - w) == 0 and (node->h - h) == 2 (2 | 0) > 1 is true and in the else branch node->children[1] = rtree_node_create (rtree, prev, node->x + w, node->y, ww, h); will insert an empty rectangle in front of the non empty rectangle which will never be used and always checked prior the non empty This seems to happen often when painting with a brush that will have to update often a different area of the same size. > Inside rtree_node_insert() we are asking whether or not we need to split > this node. If the residual is empty, we just use the current node as the > leaf. > > > it is wrong/harmful to insert zero sized rectangles in the tree > > because they can never be selected and just make the list longer. > > > > So I'd say rtree_node_create() should just return NULL when w or h are 0 > > Yes, an empty request wasn't considered.
(In reply to Massimo from comment #58) > (In reply to Chris Wilson from comment #56) > > (In reply to Massimo from comment #52) > > > I think that here: > > > > > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116 > > > > rtree_node_insert() ? > > > > Yes, see: > > if (((node->w - w) | (node->h - h)) > 1) > > I mean when say (node->w - w) == 0 and (node->h - h) == 2 Gotcha. You can skip the empty child, and skip the NULL.
I have an issue with my test in comment #36: I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl + Shift' key to tracing a brushstroke along the major side of an A3 format and we have the same performance before/after of the last patches (Massimo and Pippin). But, when you are painting freely the performance was improved. So, when you are using 'Ctrl + Shift', is this a way different from the free painting?
Created attachment 370303 [details] [review] attempt to improve painting with small brushes, big images (In reply to Massimo from comment #52) > I think that here: > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n116 > > it is wrong/harmful to insert zero sized rectangles in the tree > because they can never be selected and just make the list longer. > > So I'd say rtree_node_create() should just return NULL when w or h are 0 > > https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n70 Attaching a patch to be tested that keeps the same rectangle packing behaviour, so to behave exactly as before for what concerns batching the updates, but should be lighter when looping to find the first good rectangle to use. In my test (painting 10 s, brush size 20 px, image size 5000x4000 RGBu8) 'perf' shows a reduction of cycles percentage from 4.47% to 1.89%, I don't see lag with it, I don't without it.
(In reply to jose americo gobbo from comment #60) > I have an issue with my test in comment #36: > I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl + > Shift' key to tracing a brushstroke along the major side of an A3 format and > we have the same performance before/after of the last patches (Massimo and > Pippin). > But, when you are painting freely the performance was improved. > So, when you are using 'Ctrl + Shift', is this a way different from the free > painting? Your test exercises code that is used when painting, but it knows in advance the path to stroke and doesn't interleave drawing with giving the feedback. So it has something in common with free painting and something specific.
(In reply to Massimo from comment #61) > Created attachment 370303 [details] [review] [review] > attempt to improve painting with small brushes, big images > > Attaching a patch to be tested that keeps the same rectangle packing > behaviour, so to behave exactly as before for what concerns batching > the updates, but should be lighter when looping to find the first > good rectangle to use. Awesome work. I reviewed, and that looks good. Tested also and I don't see anything obviously broken. Let's push! > In my test (painting 10 s, brush size 20 px, image size 5000x4000 RGBu8) > 'perf' shows a reduction of cycles percentage from 4.47% to 1.89%, I > don't see lag with it, I don't without it. Unfortunately I don't see any sensible improvements on speed. But I'm sure we will find the issue at some point! :-)
Review of attachment 370303 [details] [review]: Per previous comment. commit 2aa4426d4f30f576bb04c90ef573fdc4a88ee046 (HEAD -> master, origin/master, origin/HEAD) Author: Massimo Valentini <mvalentini@src.gnome.org> Date: Thu Mar 29 16:50:15 2018 +0200 app: improvements in code executed a lot while painting. This keeps the same rectangle packing behaviour, so to behave exactly as before for what concerns batching the updates, but should be lighter when looping to find the first good rectangle to use. In rtree_insert(), some conditions in the if tests are implied by previous conditions. And therefore the 2 successive for loops are actually identical. In rtree_node_insert(), it is wrong/harmful to insert zero sized rectangles in the tree because they can never be selected and just make the list longer. So rtree_node_create() should just return NULL when w or h are 0. See bug 694917, comments 51 to 61. app/display/gimpdisplayxfer.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
(In reply to Massimo from comment #53) > Another function used a lot while painting is GEGL gegl_downscale_2x2_u8_nl() > > https://git.gnome.org/browse/gegl/tree/gegl/gegl-algorithms.c#n540 > > Here I think lut accesses for the alpha channel could/should be spared. I also see a lot of time being spend there. 5.40% is spent (including 5.28% in self!). So any optimization there could definitely be worth it!
The following commit improves painting speed by ~20%, although the actual number most likely depends pretty heavily on the specific details of the test performed: commit 49285463e6c0cc03c361902df4e3006fa135dde3 Author: Ell <ell_se@yahoo.com> Date: Sat Mar 31 08:13:36 2018 -0400 app: align projection update area to coarse grid When adding a rectangle to a projection's update area, align the rectangle to a coarse grid, to reduce the complexity of the overall area. We currently align the rectangle to a 32x32 grid, which seems to be a good tradeoff between the overhead of processing a complex area, and the overhead of processing a large area. app/core/gimpprojection.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) You can measure the impact using the CPU activity timer in the dashboard, by painting for a set amount of time, and seeing how long it takes to catch up.
(In reply to Massimo from comment #62) > (In reply to jose americo gobbo from comment #60) > > I have an issue with my test in comment #36: > > I did again the comparison (GIMP 2.10 rc versus GIMP 2.8.22) with 'Ctrl + > > Shift' key to tracing a brushstroke along the major side of an A3 format and > > we have the same performance before/after of the last patches (Massimo and > > Pippin). > > But, when you are painting freely the performance was improved. > > So, when you are using 'Ctrl + Shift', is this a way different from the free > > painting? > > Your test exercises code that is used when painting, but it knows in advance > the path to stroke and doesn't interleave drawing with giving the feedback. > > So it has something in common with free painting and something specific. I think very interestingly these improvements around the fast painting with small brushes with low spacing... but, we must also add tests with large size brushes (512 px, for instance) also with spacing between 3 to 10% in large formats as A3 and A2 with 300ppi. For instance, I did a qualitative comparison with big size brush (512 px, .gbr and .gih) in a 5000 x 7000 pixels between GIMP 2.8.22 and 2.10 rc1 and we have a great difference in the painting performance (painting freely on the canvas with 12.5% zoom). The qualitative data of my testing in a 7000 x 5000 pixels document was: Spacing GIMP 2.8.22 GIMP 2.10 RC1 1 slow with lag but acceptable slow++++ not acceptable 3 Regular with discrete lag slow+++ not acceptable 5 Fast, Excellent slow++ acceptable (+/-) 10 Fast, Excellent slow+ acceptable 20 Fast, Excellent slow acceptable --- To compare qualitatively painting performing is possible to say that GIMP 2.8.22 is working with spacing 1% as GIMP 2.10 rc1 is working with spacing 20%.
Created attachment 370379 [details] Brush 512 px used on my qualitative testing
I must do a short note about https://bugzilla.gnome.org/show_bug.cgi?id=768118 (Spacing Dynamic) and these tests: I think useful verify if is possible to fix this bug to test the spacing effects dynamically, too.
(In reply to jose americo gobbo from comment #68) > Created attachment 370379 [details] > Brush 512 px used on my qualitative testing With brush/images that big it might be worth testing increasing these values: https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n58 which is already possible setting this env variable https://git.gnome.org/browse/gimp/tree/app/display/gimpdisplayxfer.c#n212
How does one set GIMP_DISPLAY_RENDER_BUF_SIZE? I'm guessing with something like: GIMP_DISPLAY_RENDER_BUF_SIZE=XXX export GIMP_DISPLAY_RENDER_BUF_SIZE assuming one is using a script to start GIMP. So what is the default GIMP_DISPLAY_RENDER_BUF_SIZE, and what would be a good value to try?
Created attachment 370382 [details] Screenshot of perf result (In reply to Ell from comment #66) > The following commit improves painting speed by ~20%, although the actual > number most likely depends pretty heavily on the specific details of the > test performed: > > commit 49285463e6c0cc03c361902df4e3006fa135dde3 I believe it may be faster indeed though I still see some clear lag (more than 2.8) and also right now I am not testing with the exact same setup (I don't have my small tablet at hands with which I did previous tests; graphics tablets generate a lot of events and therefore it may make lags worse, I guess). So I'll test tonight again with the tablet in similar conditions. Looking at the perf report, I can also see 2.5% is spent in self of gimp_operation_normal_process_sse4(), and 1.49% in self of conv_rgb8_rgbaF() (babl). gegl_downscale_2x2_u8_nl() is still one of the biggest blocks with 5.1% now in self. The biggest cost altogether is g_hash_table_lookup(), with 5.61% in self, but obviously this is called in various places. It is mostly costly from cache_lookup() (GEGL), i.e. 2.12%. combine_paint_mask_to_canvas_mask() also is costy with 1.91% of cycles. I am attaching a screenshot of the perf result. I would upload the `perf.data` file itself, but after a test, it doesn't seem like this file contains the debug symbols and the report is quite useless without. If you need my report of running perf and if you know what I should do/upload for you to get useful info from it, tell me what/how please. ;-) Hope these info help!
(In reply to Elle Stone from comment #71) > How does one set GIMP_DISPLAY_RENDER_BUF_SIZE? I'm guessing with something > like: > > GIMP_DISPLAY_RENDER_BUF_SIZE=XXX > export GIMP_DISPLAY_RENDER_BUF_SIZE > > assuming one is using a script to start GIMP. > > So what is the default GIMP_DISPLAY_RENDER_BUF_SIZE, and what would be a > good value to try? I did not try but it should be GIMP_DISPLAY_RENDER_BUF_SIZE=widthxheight the current value would be GIMP_DISPLAY_RENDER_BUF_SIZE=256x256 values worth trying would be power of 2, so 512x256 512x512 or 1024x256
Created attachment 370386 [details] [review] app: do not copy needlessly paint_mask GimpTempBuf. Looking at combine_paint_mask_to_canvas_mask() code, it looks like we were calling gimp_temp_buf_copy() needlessly. I don't see any kind of write operation, it's read-only. Is there any reason why we copy the temp buffer first? Could someone review and tell me if that makes sense? With this change, perf report of combine_paint_mask_to_canvas_mask() are a bit better (still 1.5% but better than before).
Review of attachment 370386 [details] [review]: Ok. Ell reviewed on IRC. So I push. commit 04a798d786617257ffa2b75ad24a14afd7b5c087 (HEAD -> master, origin/master, origin/HEAD) Author: Jehan <jehan@girinstud.io> Date: Sat Mar 31 16:36:55 2018 +0200 app: do not copy needlessly paint_mask GimpTempBuf. We are not doing any write operation on this mask data so copying all the data just to read it and unreffing it in the end is only a cost on performance. See also bug 694917. app/paint/gimppaintcore-loops.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
Only to illustrate the performance between 2.8.22 and 2.10 rc1 follow two short screencast videos: 1) GIMP 2.8.22 > https://www.youtube.com/watch?v=ubIlFxM4vgQ 2) GIMP 2.10 rc1 > https://www.youtube.com/watch?v=ML_DVEUOnK4 Document 7000 x 5000 pixels Paintbrush tool Brush 512 px No Dynamics Mouse
Created attachment 370388 [details] cache_lookup() called from various places. Another screenshot detailing most expensive call stacks of the cache_lookup(), happening either in paint core, projection flush, display shell, etc. In case it helps.
(In reply to jose americo gobbo from comment #76) > Only to illustrate the performance between 2.8.22 and 2.10 rc1 follow two > short screencast videos: > > 1) GIMP 2.8.22 > https://www.youtube.com/watch?v=ubIlFxM4vgQ > 2) GIMP 2.10 rc1 > https://www.youtube.com/watch?v=ML_DVEUOnK4 > > Document 7000 x 5000 pixels > Paintbrush tool > Brush 512 px > No Dynamics > Mouse Yep, definitely not good.
Americo, how large are your tile caches on 2.8 and 2.10?
In both cases, I used 1Gib. But I have tested also with 3Gib, but the performance doesn't change or rather is not possible see any difference.
Created attachment 370417 [details] Plug-in to test stroke time For testing paint speed, I just wrote a quick&dirty plug-in which will simply stroke the current path with current brush and foreground color. In the end, it will display the time spent (look in the status bar). Plug-in entry is found under Debug > Test stroke path.
Created attachment 370418 [details] 4000x5000 canvas with a big active vectors So I was testing with this 4000x5000 image just now, which has a Paths layer with a lot of vectors. My tests on GIMP 2.10 (brush Hardness 050 and size 20) were returning between 6 to 9 seconds. The same XCF loaded on 2.8 (same brush and size) were returning in 2 to 4 seconds.
Created attachment 370420 [details] 4000x5000 canvas with a big active vectors So I was testing with this 4000x5000 image just now, which has a Paths layer with a lot of vectors. My tests on GIMP 2.10 (brush Hardness 050 and size 20) were returning between 6 to 9 seconds. The same XCF loaded on 2.8 (same brush and size) were returning in 2 to 4 seconds.
Created attachment 370421 [details] Script to run the same test in console mode. And for the record, running this same plug-in on the same image, same brush, etc. with gimp-console runs in about 2 seconds in GIMP 2.8 and 3 seconds in master (test repeated many times as were my previous ones): > $ gimp-console-2.8 -b '(load "test-stroke2.scm")(gimp-quit TRUE)' > Test stroke path-Warning: Stroke done in 2.12259793282 seconds > $ app/gimp-console-2.10 -b '(load "test-stroke2.scm")(gimp-quit TRUE)' Test stroke path-Warning: Stroke done in 3.19817018509 seconds So for non-interactive running, painting is quite fast (though still a bit slower than 2.8 in this test, but would be acceptable). This same action takes about 5/6 more seconds with the GUI on master (whereas on 2.8, time spent is barely different in console or with the GUI)!
I am not able to enable these scripts :( But, I have tested by hand directly using a 7000 x 5000 px document with a path in 7000 px side. The brush used was of attachment 370379 [details] (512 px) with 'Dynamics Off'. With 2.10 rc1, updated now, the time elapsing was 4.45 sec. With 2.8.22 the time elapsing was 1 sec. I suggest to use big brush, 512 px or more, because painting with small brush in a big canvas is not usual when the zoom is 33% or less. And the point that we must compare is how is working 2.10 rc with big size brushes when we compare with GIMP 2.8.22.
I agree with Americo that these tests and improvements are very nice as far as they go but large brushes are much more the problem than small brushes. My sense is that painting using GIMP-2.9/2.10 already is faster than it used to be. And getting large brushes to work smoothly on large canvases with smaller spacings might not happen for 2.10, and maybe that's not even the goal of this thread. But for people with fast machines who want to paint using GIMP-2.10, the real sticking point likely won't be painting with small brushes with large and larger spacings, but rather painting with medium and large brushes with small and smaller spacings. For example, I tried replicating Americo's results with the vbr brush from his videos in Comment 76 . On my machine - intel core i7 with 32gb ram - the zigzags were easy and fast to draw at the original spacing of around 25. But change to 12 and drawing slows down. Change to 6, and it's very, very slow - too slow to be useable. I did try: GIMP_DISPLAY_RENDER_BUF_SIZE=widthxheight comparing 128x128 with 512x512, but really didn't see any practical difference. Maybe the syntax I uses is wrong? And maybe people with "reasonably much RAM" aren't setting up the relevant items in Preferences correctly? I know I don't have a clue whether my settings are correct or not, and also don't know how the swap and tmp files play into helping GIMP be faster.
(In reply to Elle Stone from comment #86) > I agree with Americo that these tests and improvements are very nice as far > as they go but large brushes are much more the problem than small brushes. Well both small and large brushes are a problem. Probably using large brushes is even slower which is likely the same bug exacerbated. :-/ > My sense is that painting using GIMP-2.9/2.10 already is faster than it used > to be. And getting large brushes to work smoothly on large canvases with > smaller spacings might not happen for 2.10, and maybe that's not even the > goal of this thread. But for people with fast machines who want to paint > using GIMP-2.10, the real sticking point likely won't be painting with small > brushes with large and larger spacings, but rather painting with medium and > large brushes with small and smaller spacings. All use cases are possible. It is not because you work on bigger canvases that it means you also use bigger brushes. You may work with accuracy and details on very big images. ;-) Anyway what I mean is that I don't think it is worth trying to divide into separate problems. When we will find the right optimization, it will improve any use case because there is a much larger problem than anything specific to brush size or brush spacing (if we still have a problem, this time only on big size/small spacing after we fix this, then we can focus on more specific issues; my guess though is that it will mostly be fixed or at least greatly improved in same time). > And maybe people with "reasonably much RAM" aren't setting up the relevant > items in Preferences correctly? I know I don't have a clue whether my > settings are correct or not Yeah I'm not sure if there is any kind of "intelligence" for the default value (do we have any code setting reasonable default settings depending on hardware?), but if we don't have, would be nice to have (at some point). > and also don't know how the swap and tmp files > play into helping GIMP be faster. Well mostly the rule is: try never to swap! I can tell you will know when you swap. This suddenly becomes extremely extreeeemeeeelyyyy slow. :P Also now you have the dashboard where you can see when you start to swap. Anyway here this is not the problem. We have slowness without swapping.
(In reply to Jehan from comment #87) > (In reply to Elle Stone from comment #86) > > My sense is that painting using GIMP-2.9/2.10 already is faster than it used > > to be. And getting large brushes to work smoothly on large canvases with > > smaller spacings might not happen for 2.10, and maybe that's not even the > > goal of this thread. But for people with fast machines who want to paint > > using GIMP-2.10, the real sticking point likely won't be painting with small > > brushes with large and larger spacings, but rather painting with medium and > > large brushes with small and smaller spacings. > > All use cases are possible. It is not because you work on bigger canvases > that it means you also use bigger brushes. You may work with accuracy and > details on very big images. ;-) Well, yes, I suspect most people do both! I settled on roughly 4000px by 6000px as the largest size canvas that my system can comfortably handle when using GIMP. I use large brushes (>=512px) mostly for background washes, and small to very small brushes for sketching, and other size brushes for other tasks, etc. > Anyway what I mean is that I don't think it is worth trying to divide into > separate problems. When we will find the right optimization, it will improve > any use case because there is a much larger problem than anything specific > to brush size or brush spacing (if we still have a problem, this time only > on big size/small spacing after we fix this, then we can focus on more > specific issues; my guess though is that it will mostly be fixed or at least > greatly improved in same time). > That sounds good!
The following commit in master, and the referenced GEGL commit, reduce the amount of data processed by the GEGL zoom tile handler (corresponding to the gegl_downscale_foo() calls), which improves painting performance when zoomed out. The actual impact varies depending on the specific use case. commit 4acdc7392af74d09a281f334d06841e32c006547 Author: Ell <ell_se@yahoo.com> Date: Sat Mar 31 11:49:01 2018 -0400 app: use gegl_tile_handler_damage_rect() in TileHandlerValidate Use the recently-added gegl_tile_handler_damage_rect() function during GimpTileHandlerValidate invalidation, instead of manually voiding the tile pyramid. This function avoids voiding mipmapped tiles entirely when only a subarea of the tile needs to be redrawn. See GEGL commit 3210f4ffc3c569a2acd9483811cb141070112bc6. app/gegl/gimptilehandlervalidate.c | 28 +--------------------------- app/gegl/gimptilehandlervalidate.h | 1 - 2 files changed, 1 insertion(+), 28 deletions(-)
Created attachment 370465 [details] Graph Comparison for 7000 px brushstroke Test made with brush, attachment 370379 [details] (512px), No Dyna, horizontal brustroke along 7000 px.
I did another test only with 2.10 rc1. To compare the speedy painting with 8bit (perceptual) and 32f (linear). The test settings was that: Minimal number of undo levels: 1 Maximum undo memory: 10 Mb Tile cache size: 4Gib Maximum new image size: 500 Mb Number of threads to use: 1 OpenCl disabled --- Processor Quadcore Intel with 8Gib memory Brush 512 px No Dyna --- With this settings, GIMP spent to trace a brushstroke long 3000 px: 8bit Perceptual > 6s 32f Linear > 18s
> With this settings, GIMP spent to trace a brushstroke long 3000 px: > 8bit Perceptual > 6s (Normal Legacy blend) > 32f Linear > 18s (Normal Legacy blend) with 32f linear and Default Normal blend > 6s
Americo, when you say you use 2.10 rc1, are you using master and updating regularly GIMP? Because there were tremendous improvements between the RC1 released and master already. I can definitely see them when just testing and now my test from comment 82 takes a little bit under 6 seconds. This is still twice slower than 2.8 (but used to be 3 times slower, so…). Now of course, with big brushes and small spacing, your tests suggest that the difference of speed increases. But I just wanted to make sure that is done on master. Master is still not as good as was 2.8, but code is moving a lot so any test has to be done on master code, otherwise this is not really meaningful. Thanks!
(In reply to Jehan from comment #93) > Americo, when you say you use 2.10 rc1, are you using master and updating > regularly GIMP? Because there were tremendous improvements between the RC1 > released and master already. Yes, the 2.10 rc1 was the last updated, a half hour ago. To complete the test: 8bit Perceptual > Legacy Normal Blend = 6s 8bit Perceptual > Default Normal Blend = 8s 8bit Linear > Legacy Normal Blend = 18s 8bit Linear > Default Normal Blend = 5s (seems a bit more fast).
Created attachment 370504 [details] Perf output with bigger canvas and bigger brush Ok so I decided to up the game, with a 10000x8000 canvas and a bigger than 300px brush. That was crazy slow, as expected (I compared also with 2.8: it had some lag, but nothing compared to master). Here is the output of perf running on GIMP. As you can see, nearly 20% is in self of combine_paint_mask_to_canvas_mask()! And 14% in gimp_operation_normal_process_sse4()! Slightly lesser offender (but still!) are: canvas_buffer_to_paint_buf_alpha() with 6,5%, conv_rgba8_rgb8(), conv_rgb8_rgba8(), conv_rgbaF_rgba8() and conv_rgba8_rgbaF(), all around 5% (so many conversions occurring!). Then gegl_downscale_2x2_u8_rgba() still around 4.4%. If we managed to optimize these all, or maybe cut the calls (are these called more often than they should?), that would definitely improve the perfs.
Please also profile when drawing on a buffer with an alpha channel - based on what you say here it seems like this is being done on a layer with no alpha, and that more direct babl fast paths are already available for the case of painting on layers with alpha.
Created attachment 370505 [details] Profile on a layer with alpha. (In reply to Øyvind Kolås (pippin) from comment #96) > Please also profile when drawing on a buffer with an alpha channel - based > on what you say here it seems like this is being done on a layer with no > alpha, and that more direct babl fast paths are already available for the > case of painting on layers with alpha. Indeed that was a single layer image with no opacity. I did a new profiling when painting on an alpha layer. The 2 same functions are still the biggest offenders but reversed with gimp_operation_normal_process_sse4() being now 18% of the cycles and combine_paint_mask_to_canvas_mask() "only" 11.2%. I also realize that we can get even more details on the lines where the most cycles are spent in these functions. So I give this information too by making a 3-page PDF, where the 2 additional pages give these details for the top functions.
*** Bug 794301 has been marked as a duplicate of this bug. ***
*** Bug 782751 has been marked as a duplicate of this bug. ***
Some paint loops are now parallelized in master, which improves performance for big brushes, especially with expensive modes or complex dynamics: commit c8d4c079a204dded5429f336e394e492a581226a Author: Ell <ell_se@yahoo.com> Date: Wed Apr 4 16:57:35 2018 -0400 app: parallelize gimpbrush-transform.cc Use gimp_parallel_distribute_foo() to parallelize the brush transform functions. app/core/gimpbrush-transform.cc | 683 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------ 1 file changed, 365 insertions(+), 318 deletions(-) commit 3df757ffd7311e621817aa7aa66e61efecf80de6 Author: Ell <ell_se@yahoo.com> Date: Wed Apr 4 17:02:34 2018 -0400 app: parallelize gimppaintcore-loops.cc Ditto. app/paint/gimppaintcore-loops.cc | 430 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------ 1 file changed, 229 insertions(+), 201 deletions(-) commit cb239e60f6ef91ba66373a285ab2f5cbeb700eb8 Author: Ell <ell_se@yahoo.com> Date: Wed Apr 4 17:12:21 2018 -0400 app: parallelize gimp-gegl-loops.cc Ditto. app/gegl/gimp-gegl-loops.cc | 827 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------- 1 file changed, 448 insertions(+), 379 deletions(-)
Created attachment 370537 [details] Artefacts on painting with new parallelized code. I see some weird artefacts with the multi-threading. Cf. the screenshot using the classical "Hardness 050". The strokes should be all smooth (if you set threads to 1 in the preferences, you get this back), but they are like "dirty" with some vertical strokes.
I did again the test with 7000 px brushstroke with 512px brush (no dyna) and we have now of 22% faster.
(In reply to Jehan from comment #101) > Created attachment 370537 [details] > Artefacts on painting with new parallelized code. > > I see some weird artefacts with the multi-threading. Cf. the screenshot > using the classical "Hardness 050". The strokes should be all smooth (if you > set threads to 1 in the preferences, you get this back), but they are like > "dirty" with some vertical strokes. Should be fixed in master: commit 082a6404c9613beeef4feafdb91ff43994c096dc Author: Ell <ell_se@yahoo.com> Date: Thu Apr 5 02:14:29 2018 -0400 app: fix iterator area in canvas_buffer_to_paint_buf_alpha() The entire ROI was processed in all threads, instead of only the thread-specific area, causing artifacts. app/paint/gimppaintcore-loops.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
More parallelization. This mostly impacts painting with dynamics: commit 4336e2e52a5bd5dfdc31ccdf2e0594cf93b3ca90 Author: Ell <ell_se@yahoo.com> Date: Thu Apr 5 17:30:51 2018 -0400 app: parallelize gimpbrushcore-loops.cc Use gimp_parallel_distribute_foo() to parallelize the brush core loops. app/paint/gimpbrushcore-loops.cc | 204 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------- 1 file changed, 126 insertions(+), 78 deletions(-)
*** Bug 795017 has been marked as a duplicate of this bug. ***
I did again a new speed performance test with the last GIMP 2.10 rc1: Video 1 with GIMP 2.8.22: https://www.youtube.com/watch?v=-LJhXkOFTVc Video 2 with GIMP 2.10 rc1: https://www.youtube.com/watch?v=4ZvF0KRV-tk The document size is 7000 x 2500 px Brush Size: 512 px Spacing: 1 Dynamics Off --- When I compare GIMP 2.8.22 with the last GIMP 2.10 rc1 with these conditions: To have the similar performance on GIMP 2.10 rc1 is necessary to use spacing equal 10 instead spacing 1. The time elapsing to paint the brushstroke along the 7000 px: GIMP 2.8.22: 1s circa GIMP 2.10 rc1: 10s circa
Painting is now done in a separate thread, so that updating the display doesn't stall it: commit 8e7a34297fe993c1e8de9d85ecd61c73a1e4afd4 Author: Ell <ell_se@yahoo.com> Date: Sun Apr 8 09:28:23 2018 -0400 app: move painting to a separate thread Add gimppainttool-paint.[ch], which takes care of painting during motion events in GimpPaintTool. Perform the actual painting in a separate thread, so that display updates, which can have a significant synchronization overhead, don't stall painting. Allow specific paint tools to opt-out of a separate paint thread, and avoid it in GimpMybrushTool, since it doesn't seem to work. The separate paint thread can be explicitly disabled by setting the GIMP_NO_PAINT_THREAD environment variable. app/tools/Makefile.am | 2 + app/tools/gimpmybrushtool.c | 5 ++- app/tools/gimppainttool-paint.c | 312 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ app/tools/gimppainttool-paint.h | 30 +++++++++++++ app/tools/gimppainttool.c | 19 ++++---- app/tools/gimppainttool.h | 2 + 6 files changed, 359 insertions(+), 11 deletions(-) This should be a pretty significant boost, at least in display-bound cases. I'm seeing 2.10 being about as fast as, or faster than, 2.8 is many cases now (under equivalent conditions). Fingers crossed that you guys report similar results -- please test! And another display-related optimization: commit db50c72c24fffd07d3bdcadc6a633b864d065fac Author: Ell <ell_se@yahoo.com> Date: Sun Apr 8 09:16:53 2018 -0400 app: align display paint area to a coarse grid Align rectangles added to the display paint area, in gimp_display_paint_area(), to a coarse grid, to reduce the complexity of ther overall area. This is similar to commit 49285463e6c0cc03c361902df4e3006fa135dde3, however the alignment happens in display space, instead of image space. app/display/gimpdisplay.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
Created attachment 370661 [details] Speed Performance last update GIMP 2.10 rc1 (commit 8e7a34297fe) I did a test more extensive and following the Pippin advice (testing with alpha and not). The improvement is great on 8i precision comparing with previous 2.10 rc1 releases, but yet is great the difference with the 2.8.22 performance, circa 5 times slower. Instead with 32f precision with alpha, the performance grows and is 3 times circa slower than 2.8.22 performance.
Created attachment 370662 [details] Graph Speed Performance 2.10 rc1 (commit 8e7a34297fe)
(In reply to jose americo gobbo from comment #109) > Created attachment 370662 [details] > Graph Speed Performance 2.10 rc1 (commit 8e7a34297fe) All data of this test for the different precisions are referred only to 'legacy' blend mode.
Created attachment 370668 [details] Spreadsheet of last test revised... adding some notes (commit 8e7a34297fe)
Created attachment 370706 [details] Default x Legacy with 32f and 8i Graph Comparison between Default and Legacy Blends to 32f (Perceptual and Linear) and 8i (Perceptual). Conditions of the test: Brush size 512 px (attachment 370379 [details]) 7000 x 2500 px document Brushstroke 7000 px along the document width (with Ctrl+Shift on Paintbrush) Dynamics Off 4 processors / 4 threads (Gimp 2.8.22/GIMP 2.10 rc1) Memory 8 Gib Tile Cache 4 Gib
Created attachment 370707 [details] Spreadsheet of last test revised GIMP commit f77edcdf193e The spreadsheet has the complete data of the tests with Legacy and Default Blend modes verified with 32f (Perceptual and Linear) and 8i (Perceptual). I verified that: 8i --- + With Perceptual/Legacy is faster than Perceptual/Default blend mode. 32f --- + With Perceptual, the Legacy blend is faster than Default blend. + With Linear, the Default blend is faster than Legacy blend mode. + The Linear/Default and Perceptual/Legacy blends are very similar as speed performance. Test conditions: Brush size 512 (attachment 370379 [details]) 7000 x 2500 px document Brushstroke 7000 px along document width (Ctrl+Shift with Paintbrush) Dynamics Off 4 processors / 4 threads (GIMP 2.8.22/GIMP 2.10 rc1) 8 Gib Memory 4 Gib Tile Cache
When it comes to simple painting using no dynamics, and cheap blend modes, there's a good chance we're, at least partially, memory bound. The following commit should improve painting speed in such cases (I'm seeing at least ~10% speedup, but this number can probably vary depending on circumstances): commit f2a1fd5bf0e170e79bc3c2f1f162d2f73ecb9747 Author: Ell <ell_se@yahoo.com> Date: Sat Apr 14 19:02:21 2018 -0400 app: refactor gimppaintcore-loops to coalesce iteration The gimppaintcore-loops functions perform very little actual computational work (in case of do_layer_blend(), at least for simple blend modes), which makes the cost of buffer iteration, and memory bandwidth, nonnegligible factors. Since these functions are usually called in succession, acessing the same region of the same buffers, using the same foramts, coalescing them into a single function, which performs all the necessary processing in a single step, can improve performance when these functions are the bottleneck. Add a gimp_paint_core_loops_process() function, which does just that: it takes a set of algorithms to run, and a set of parameters, and performs all of them in one go. The individual functions are kept for convenience, but are merely wrappers around gimp_paint_core_loops_process(). Be warned: the implementation uses unholy C++ from outer space, in order to make this (sort of) managable. See the comments for more details. app/paint/gimppaintcore-loops.cc | 1494 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------- app/paint/gimppaintcore-loops.h | 116 ++++++---- app/paint/gimppaintcore.c | 75 +++---- 3 files changed, 1315 insertions(+), 370 deletions(-)
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gimp/issues/454.