GNOME Bugzilla – Bug 374854
possible optimizations in Script-Fu
Last modified: 2018-05-24 12:04:31 UTC
It appears that Script-Fu parsing its scripts takes up about one third of the startup time of gimp (with warm caches). I had a look at this with sysprof and found that about two third of that time is spent in oblist_find_by_name(). This function spends almost all of its time comparing strings. It does so in a case-insensitive way that is suitable for UTF-8 encoded strings. In particular it calls g_utf8_casefold() on both strings and compares them with g_utf8_collate(). The latter needs to do normalize the strings before comparing them. If we could store the normalized casefolded string in the table that oblist_find_by_name() searches in, we could avoid doing the casefolding and normalization for at least one of the strings. A lot of other places also call stricmp(), but almost always with lowercase ASCII strings which can be assumed to be identical to their casefolded and normalized version. As far as I can see the stored strings are even already normalized (not case-folded though), see store_string(). Now if we could also case-fold them, we could cut down startup time noticeably with a simple change. Other optimizations are possible as well. For example it appears as if the string length passed to store_string() is a character count that needs to be calculated using g_utf8_strlen(). The value is then immidiately converted back to a bytes count in store_string(). It is also questionable whether it makes sense to copy all glib-allocated strings to newly malloced buffers. It may make sense to do all memory allocation using g_malloc() and friends and then just use the strings that glib allocated for us. This would avoid some redundant memory allocations and copies.
I'd like to advocate that Script-fu (as implemented with TinyScheme) employ case sensitivity and only recognize standard ASCII (i.e., normalized versions of UTF-8) codes for its symbols and identifiers. Being as Script-fu is a scripting language for the GIMP, there should be little expectation that it will ever be used to run extensive "external" programs or make use of extensive external libraries. There may be a few runtime extensions that might be useful; but if they are well-written, they will most assuredly not rely case insensitivity and most likely would not employ non-standard ASCII characters (it is usually the opposite, the libraries are very particular NOT to use special features). It would seem to be quite common for Scheme implementations to forego the case insensitivity that is part of the R5RS specification. This website lists some common Scheme interpreters that are using case sensitivity for their symbols: http://www196.pair.com/lisovsky/scheme/casesens/index.html The reason to "demand" case sensitivity for a Scheme interpreter is, of course, speed. I performed a couple of crude benchmarks and discovered that replacing the 'stricmp' function call with 'strcmp' resulted in about a 40% improvement in execution time for the "script-fu-refresh" function (12 seconds versus 7 seconds from the console, 30 seconds versus 22 seconds for the splash screen portions). I realize that speed is not of primary concern for Script-fu; but when weighed against the nearly negligible benefit of case insensitivity, it would seem to be worth the trade. Making scripts depend on case insensitivity is poor programming practice and it would seem that even now, the case insensitivity of TinyScheme is somewhat questionable, as the following console transcript would indicate: => (define One 1) One => One 1 => ONE 1 => one Error: eval: unbound variable: one -------------------------- Limiting the character set to "standard" ASCII (normalized UTF-8) is a separate issue; but it also affords an opportunity for optimization with little demand being placed on script authors. While it might be "cute" to define 'lambda' to actually be the Greek symbol (or other such "tricks" using UTF-8), if nothing else such a practice might cause difficulty with using the GIMP's Script-fu console to interactively work with Script-fu. The only downside I can see to case sensitivity and UTF-8 coding for symbols would be with regards to maintainability of Script-fu with the ongoing development of the TinyScheme project. That is not to trivialize that downside; it might very well be sufficient justification to retain case insensitivity. Nonetheless, if the maintainability aspect of this decision can be resolved then I think it would be worth being non-compliant with R5RS for the purposes of GIMP scripting. Thanks for your time and consideration.
I tend to agree with comment #1. But I'd like to point out a few points. (1) Unless symbols are limited to ASCII, they would still have to be normalized for comparison. strcmp() can only be used reliably on ASCII strings. (2) I fully agree that case-insensitivity in a a programming language is more harmful than useful. IMO we should get rid of it even though that means violating the standard. (3) The 40% speedup that you have seen can IMO easily be achieved by the optimizations that I outlined. This does not depend on the question of whether we do want to support non-ASCII symbols (yes, please) or being case-insensitive.
Created attachment 78242 [details] [review] Patch to optimize oblist_find_by_name() and when UTF-8 string compares are used. The GIMP splash screen says "extension_script_fu" on my machine for abou 18.5 to 19 seconds in the Script-Fu. With this patch, the time is down to about 16.2 to 16.5 seconds. Using sysprof, the time spent in oblist_find_by_name() is reduced by about 50%.
If you want to use strcasecmp, please use g_ascii_strcasecmp() as strcasecmp() is not available on all supported platforms and has completely broken behaviour with respect to locales. Do you need to define stricmp at all after these changes? Also you must not use free() on memory that was allocated using glib. If possible, you should use g_malloc() and g_free() (and their variants) all over the place. But if you absolutely need to mix memory allocators, please make sure that you are using the correct function for releasing the allocated memory.
Created attachment 78313 [details] [review] Updated tinyscheme optimization patch
Don't know how this this reversion slipped in to the patch file but line 167 of the new patch should read: strkey(a) = g_utf8_collate_key(p1, -1); and not strkey(x) = g_utf8_collate_key(p1, -1);
Looks good to me. The hash function is still problematic (as discussed on irc earlier today). But that's not part of this optimization. I should probably open a new bug report for it.
2006-12-13 Kevin Cozens <kcozens@cvs.gnome.org> * plug-ins/script-fu/tinyscheme/scheme-private.h * plug-ins/script-fu/tinyscheme/scheme.c: Optimizations for string comparisons. Time spent in oblist_find_by_name() reduced by ~50% during startup by use of stored collation keys. Fixes bug #374854. Fix for hash_fn() will be commited shortly.
Unfortunately I have to reopen the bug because the implementation is wrong. The current code inserts into the hash table using the name. Lookup in the hash table is done using the key. This only happens to work because for all the symbols in our scripts the key is identical to the name. I will attach a patch that should fix the problem. While writing this patch I found that when mk_symbol() is run for a new symbol, the key is generated twice. Once when doing the lookup in oblist_find_by_name() and second time when this lookup fails and oblist_add_by_name() is used. However, after having another look at the code, I am pretty sure that the implementation is still buggy even after this patch is applied. mk_symbol() seems always to be called with a string that is passed through g_utf8_strdown(). I strongly suggest to make a number of test cases with symbols that have upper and lower case characters outside the ASCII range.
Created attachment 78348 [details] [review] fix insertion into hash table
The implementation regarding optimization isn't wrong but incomplete. Without the hash_fn() changes it will not properly handled UTF-8 coded identifiers. The attached patch is missing two changes in calls to hash_fn(). I haven't commited the changes from my tree since I experience a hang of Script-Fu when trying to run scripts. I'm looking in to the problem.
Setting on the 2.4 milestone because it doesn't make sense to ship with this change half-way implemented.
Created attachment 84696 [details] [review] Additional changes related to use of collation key. I have reviewed the passing of the collation key to hash_fn(). This patch fixes some errors I found in the calling of hash_fn(). It also includes changes where hash_fn() was still being called with a name instead of collation key. However, it also results in Script-Fu going in to an infinite loop. This occurs either during start up of GIMP or when clicking the "OK" button in a Script-Fu dialog box. I'm not sure the DDD/gdb packages are working as well as they used to on my machine since I udpated to FC6 and it is making it harder to debug the problem. I don't think the effort involved in completing the optimization changes is worth the couple of seconds of startup time it will save. I would rather back out the changes and leave optimization until after 2.4 is out. There are three other Script-Fu issues that are more important which should be dealt with first.
*** Bug 318256 has been marked as a duplicate of this bug. ***
Created attachment 88272 [details] strace log when trace.scm is refreshed trace.scm is a simple script with only two functions and corresponding PDB structures. It is 441 bytes long. (define (script-fu-trace-[on|off]) (tracing [0|1])) The strace log shows how many time is spend with llseek/lseek/read sequences. This is related to Solaris FILE* implementation. In linux, the time is spend using _llseek. The plugin lost a lot of time here. A detailed analysis show that we use llseek(file,0, SEEK_CUR) = ftell(), then lseek(file, -offset, SEEK_CUR) = lseek() and finally read() = fgetc() (on Solaris, a complete buffer is read by FILE* implementation at each fgetc()). The scheme parser then lost character already read.
Created attachment 88273 [details] strace log when trace.scm is refreshed with backchar() updated Updating plug-ins/script-fu/tinyscheme/scheme.c:backchar() reduce significantly the parsing time.
Created attachment 88275 [details] [review] Patch for tinyscheme parser backchar() against 2.3.16
Looks good to me, waiting for a comment from Kevin...
2007-05-17 Kevin Cozens <kcozens@cvs.gnome.org> * plug-ins/script-fu/tinyscheme/scheme.c: Applied patch from Eric Lamarque that optimizes backchar() use. See bug #347854. The backchar() patch saves approximately 0.1 seconds during GIMP start up on my new machine. I estimate it would save around 10 seconds on my old 266MHz Pentium II. It would be interesting to see if the change could be used when reading from string buffers. Changing summary line as this report now includes information about other possible optimizations for Script-Fu than just stricmp().
Kevin, is this currently in a state that can be released? Are there any optimizations that might have to be reverted because they aren't finished yet? If not, we can probably close this bug as FIXED.
I think we need to consider reverting the changes made regarding the storing of collation keys. As Sven has stated, the changes are not complete without the changes for hash_fn(). The changes shouldn't be that hard but initial attempts have shown they aren't quite as simple as I first thought. Also, saving the collation keys hasn't shown as big an improvement to start up time as originally thought. The optimization made by Eric to backchar() seem to have provided more of a reduction in start up time. We can revisit the issue again after a 2.4 release. I still have in mind the idea of reworking Script-Fu to make it behave as other scripting language plug-ins (such as pygimp and gimp-perl). It would provide the biggest time savings when starting GIMP with a warm cache.
There is nothing that needs to be done in hash_fn(). The hash function will work just fine with UTF-8 encoded keys. And if it doesn't, then you could simply use g_strhash() or copy it. What needs to be done is to fix the use of the hash table, as pointed out in comment #9. Perhaps reverting is indeed the best thing we can do here. The optimization should not have been attempted without a test case as pointed out in comment #9.
Created attachment 93413 [details] [review] patch to revert the changes done in revision 21536 Please review the attached patch carefully and test that it doesn't introduce any regressions.
I was referring to the use of hash_fn() and what it is passed and not to any need for changes in the hash_fn() function itself. I referenced your comments about hash_fn() but missed a word or two so it wasn't clear in my previous comment. I will review the patch to revert the changes and compare them against the patch that was originally applied. That should prevent any regressions from slipping in.
Applied patch from Sven Neumann. We can revisit this issue after the 2.4 release of GIMP. Moving this to the Future milestone for now since some planned changes for Script-Fu could make this issue obsolete. 2007-08-11 Kevin Cozens <kcozens@cvs.gnome.org> * plug-ins/script-fu/tinyscheme/scheme.c: Commited patch from Sven Neumann that reverts the optimization patch which stored collation keys. See bug #374854.
It would be nice if the proposed changes could be reconsidered for 2.6. There are some really low-hanging fruits here and it would help to reduce startup time.
It appears to dangerous to change this code without the test cases suggested in comment #9. So let's bump it to the Future milestone until someone adds those test cases to Script-Fu.
The infinite loop mentioned in comments #11 and #13 happens whenever there is an error while interpreting scheme's code attached to the '*error-hook*', in that case the '*error-hook*' is invoked again and if the error persists the infinite loop starts. The problem is in scheme-wrapper.c where the old_constants[] "APPLY" is added to scheme's environment. Now, it does not hide the built-in 'apply' function just because the two identifiers live in two different bins of the hash table, but after the application of the patches attached to this bug report, the constant hides the built-in, and as soon as a script, directly or indirecly, invokes 'apply' the '*error-hook*' is invoked which again calls 'apply' (through 'throw').
*** Bug 643158 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gimp/issues/221.