GNOME Bugzilla – Bug 160451
[Enhancement] Improve speed of burn compositing by 3 - Patch included
Last modified: 2005-03-05 17:35:23 UTC
By using a precalculated table, the burn compositing generic function is 3X or + faster than mmx or sse. Caveat, when using specific optimization (mmx,sse ...) burn compositing is slower ... Here is test on a duron mobile 1Ghz : fredo@miss:/mnt/cds/gimpcvs/gimp/app/composite ¤ ./gimp-composite-mmx-test 21:48 Running gimp_composite_mmx tests... addition_va8_va8_va8 2.2477 0.9773 2.3000 addition_rgba8_rgba8_rgba8 6.7972 2.4825 2.7380 burn_rgba8_rgba8_rgba8 5.9359 14.0415 0.4227* darken_rgba8_rgba8_rgba8 6.3550 2.2009 2.8875 difference_rgba8_rgba8_rgba8 7.2503 5.1972 1.3950 grain_extract_rgba8_rgba8_rgba8 5.9045 2.2010 2.6827 grain_merge_rgba8_rgba8_rgba8 5.3515 2.0103 2.6620 lighten_rgba8_rgba8_rgba8 6.2262 2.0275 3.0708 multiply_rgba8_rgba8_rgba8 4.6579 2.0282 2.2965 scale_rgba8_rgba8_rgba8 3.4975 1.5678 2.2309 screen_rgba8_rgba8_rgba8 6.0242 2.0430 2.9486 subtract_rgba8_rgba8_rgba8 4.2412 2.1595 1.9640 swap_rgba8_rgba8_rgba8 6.7306 2.3925 2.8132 fredo@miss:/mnt/cds/gimpcvs/gimp/app/composite ¤ ./gimp-composite-sse-test 21:52 Running gimp_composite_sse tests... addition_rgba8_rgba8_rgba8 5.2444 2.1373 2.4537 burn_rgba8_rgba8_rgba8 5.2905 11.4912 0.4604* darken_rgba8_rgba8_rgba8 6.1715 2.2014 2.8035 difference_rgba8_rgba8_rgba8 6.4453 2.1421 3.0088 grain_extract_rgba8_rgba8_rgba8 5.5560 1.9910 2.7906 grain_merge_rgba8_rgba8_rgba8 5.3849 1.9868 2.7104 lighten_rgba8_rgba8_rgba8 6.2213 2.1318 2.9183 multiply_rgba8_rgba8_rgba8 4.6002 2.0144 2.2837 scale_rgba8_rgba8_rgba8 3.5357 1.5478 2.2844 screen_rgba8_rgba8_rgba8 5.9192 2.0715 2.8574 subtract_rgba8_rgba8_rgba8 4.1289 2.2879 1.8047 swap_rgba8_rgba8_rgba8 5.9772 1.9347 3.0895 And here is the patch : Index: app/composite/gimp-composite-generic.c =================================================================== RCS file: /cvs/gnome/gimp/app/composite/gimp-composite-generic.c,v retrieving revision 1.26 diff -p -u -u -p -r1.26 gimp-composite-generic.c --- app/composite/gimp-composite-generic.c 30 Oct 2004 18:42:09 -0000 1.26 +++ app/composite/gimp-composite-generic.c 4 Dec 2004 19:42:26 -0000 @@ -69,6 +69,7 @@ static guchar add_lut[511]; static gint32 random_table[RANDOM_TABLE_SIZE]; +static guchar burn_lut[256][256]; /* * @@ -873,18 +874,10 @@ gimp_composite_burn_any_any_any_generic const guint alpha = (has_alpha1 || has_alpha2) ? MAX(bytes1, bytes2) - 1 : bytes1; guint b; - /* FIXME: Is the burn effect supposed to be dependant on the sign of this - * temporary variable? */ - gint tmp; - while (length--) { for (b = 0; b < alpha; b++) - { - tmp = (255 - src1[b]) << 8; - tmp /= src2[b] + 1; - dest[b] = (guchar) CLAMP(255 - tmp, 0, 255); - } + dest[b] = burn_lut[src1[b]][src2[b]]; if (has_alpha1 && has_alpha2) dest[alpha] = MIN(src1[alpha], src2[alpha]); else if (has_alpha2) @@ -1454,7 +1447,28 @@ gimp_composite_generic_init (void) random_table[i] = g_rand_int (gr); g_rand_free (gr); + + /* generate a table for burn compositing */ + int a,b; + + /* FIXME: Is the burn effect supposed to be dependant on the sign of this + * temporary variable? */ + int tmp; + + for (a=0;a<256;a++) + for (b=0;b<256;b++) + { + tmp = (255 - a) << 8; + tmp /= b + 1; + tmp = (255-tmp); + if ( tmp<0 ) + tmp=0 ; + if ( tmp>255 ) + tmp=255 ; + burn_lut[a][b] = tmp; + } + /* generate a table for saturate add */ for (i = 0; i < 256; i++) add_lut[i] = i;
It would be nice if you could attach the patch as an attachment to this bug-report.
Created attachment 34542 [details] [review] Reduce composite burn effect time by 3 This patch is only against app/composite/gimp-composite-generic.c
Hi Sven, It's my first report with bugzilla, I didn't found the submit form the first time. I hope my patch is clean now ... I bought a powerpc, I hope add altivec support for next year (first half january)
Well, the patch clearly doesn't follow the GIMP coding style which is outlined in the file HACKING as found in the GIMP source tree. I am also not sure if this is worth a 64 kByte lookup table.
I'm disappointed for the style as I read HACKING and set up my vim to don't produce tabs. For the name of variable, I just follow the file in app/composite. I'll take a look at that. Should I repost the patch once cleaned ? I thought about this 64k table before posting it. I suppose people (like me) work on heavy image (~MB). If I had a factor 2, I would'nt bother to post it. But here is close to x4 witch is very good. My debugging version of gimp takes about 30MB in memory, I work on still camera image, 64Ko is very few in comparison. On the other hand, this compositing function is not often used.
Created attachment 34551 [details] [review] Reduce composite burn effect time by 3 (Cleaned up) Clean ups : int -> gint. aligning two misaligned lines. removing space before ';'
Created attachment 34761 [details] [review] cleaned up patch Here's a cleaned up version of the patch that follows the GIMP coding style and doesn't make use of C99 features which aren't supported by all compilers GIMP is supposed to work with.
I'd like to hear Helvetix opinion on this patch. After applying it, the generic routines are a lot faster than the MMX and SSE optimized ones so we would probably have to change those as well. I think we can safely do this change in the 2.2 cycle (either before 2.2.0 is released or for 2.2.1).
I've considered this before, but not being sure about the memory tradeoff (the table is not small) I didn't pursue it. But given that memory in contemporary machines are 512Mb or 1Gb, ~64k seems a small enough price to pay. Plus, I can make this optional at run-time, if people find that useful to do. I think this is a reasonable patch and I will include this (or something like it) in the next set of updates I'm working on now.
IMO the size of the LUT is reasonable and there's no point in making this a run-time option. I would not even make it a compile-time option.
I fully agree, but I'm partial here ... There is a way to divide the size of the LUT by 2 but it will significantly slow down (~15%).
Let's bump this off the 2.2 milestone. Now that we branched again, we should first do such changes in the HEAD branch.
Comment on attachment 34761 [details] [review] cleaned up patch Seems like this has been accepted, but not yet committed.
Helvetix said he would incorporate the patch into his changes to the compositing code.
Helvetix, can you please comment on the state of your work on integrating this change? I don't think it is fair to have people wait several months for the inclusion of their patches.
On the other hand, I have thought a bit more. 1) My tests was only with the default value for the regression test so, it worked on a large set, and thus is bandwith limited. So i didn't test on small set 2) Small sets are more frequent. It may slow the process for small brushes. And on a real usage basis feel slower with frequent context switch ... So, I can wait more on this (contrary to altivec where I made both small and large test case) I know gimp will move to gegl, and thus my work can be see as short/middle term for the gimp and not important (it is for this bug). Now, I hesitate to going further to make regression test more complete. Maybe with charts for small/large sets and even a simulation of 'real' gimp usage via gimp command line.
And I forgot the main reason I wrote the precedent comment ... How we manage different speeds on different function ??? Burn is faster with a lut, but activiting mmx will discard this benefit (and obviously we must use mmx). Maybe we need a dynamic mechanism to choose the most efficient functions at launch time ?
I've considered a dynamic checker which assigns different compositing functions based on the current systems performance. That would be a lot of work for very little payoff, albeit pretty fancy. I don't understand your comment about activating mxx will discard this benefit.
the burn 'generic' compositing function is faster than the mmx, but *all* other mmx function is faster than generic. Since in my patch I didn't comment the burn mmx function in gimp-composite-mmx.h, when using mmx, we use the slow burn mmx function
I see. Thank you for the clarification. The right thing to do here is to implement the generic burn function using your method, then simply remove the other implementations (in mmx, sse, sse2, etc.)
Isn't there a way to switch the accelerated burn functions off, other than removing them?
I've incorporated Frederic Leroy's patch nearly verbatim and have made the correct adjustments to not use the current mmx/sse implementations (which do not enjoy the benefit of the look-up-table). Thanks to Frederic for the good work. This bug should be closed.
There is no way to switch off any of the individual compositing functions other than to comment them out, regenerate and recompile.