After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 374854 - possible optimizations in Script-Fu
possible optimizations in Script-Fu
Status: RESOLVED OBSOLETE
Product: GIMP
Classification: Other
Component: Script-Fu
git master
Other All
: Normal enhancement
: Future
Assigned To: GIMP Bugs
GIMP Bugs
: 318256 643158 (view as bug list)
Depends on:
Blocks: 643158
 
 
Reported: 2006-11-13 20:42 UTC by Sven Neumann
Modified: 2018-05-24 12:04 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to optimize oblist_find_by_name() and when UTF-8 string compares are used. (5.15 KB, patch)
2006-12-12 22:25 UTC, Kevin Cozens
needs-work Details | Review
Updated tinyscheme optimization patch (5.19 KB, patch)
2006-12-13 20:03 UTC, Kevin Cozens
committed Details | Review
fix insertion into hash table (1.35 KB, patch)
2006-12-14 10:09 UTC, Sven Neumann
needs-work Details | Review
Additional changes related to use of collation key. (2.14 KB, patch)
2007-03-16 03:14 UTC, Kevin Cozens
rejected Details | Review
strace log when trace.scm is refreshed (7.57 KB, text/plain)
2007-05-16 10:47 UTC, Eric Lamarque
  Details
strace log when trace.scm is refreshed with backchar() updated (334 bytes, text/plain)
2007-05-16 10:54 UTC, Eric Lamarque
  Details
Patch for tinyscheme parser backchar() against 2.3.16 (852 bytes, patch)
2007-05-16 10:57 UTC, Eric Lamarque
committed Details | Review
patch to revert the changes done in revision 21536 (4.71 KB, patch)
2007-08-10 06:35 UTC, Sven Neumann
committed Details | Review

Description Sven Neumann 2006-11-13 20:42:49 UTC
It appears that Script-Fu parsing its scripts takes up about one third of the startup time of gimp (with warm caches). I had a look at this with sysprof and found that about two third of that time is spent in oblist_find_by_name(). This function spends almost all of its time comparing strings. It does so in a case-insensitive way that is suitable for UTF-8 encoded strings. In particular it calls g_utf8_casefold() on both strings and compares them with g_utf8_collate(). The latter needs to do normalize the strings before comparing them.

If we could store the normalized casefolded string in the table that oblist_find_by_name() searches in, we could avoid doing the casefolding and normalization for at least one of the strings. A lot of other places also call stricmp(), but almost always with lowercase ASCII strings which can be assumed to be identical to their casefolded and normalized version.

As far as I can see the stored strings are even already normalized (not case-folded though), see store_string(). Now if we could also case-fold them, we could cut down startup time noticeably with a simple change.

Other optimizations are possible as well. For example it appears as if the string length passed to store_string() is a character count that needs to be calculated using g_utf8_strlen(). The value is then immidiately converted back to a bytes count in store_string().

It is also questionable whether it makes sense to copy all glib-allocated strings to newly malloced buffers. It may make sense to do all memory allocation using g_malloc() and friends and then just use the strings that glib allocated for us. This would avoid some redundant memory allocations and copies.
Comment 1 saulgoode 2006-11-20 03:34:17 UTC
I'd like to advocate that Script-fu (as implemented with TinyScheme) employ case sensitivity and only recognize standard ASCII (i.e., normalized versions of UTF-8) codes for its symbols and identifiers.

Being as Script-fu is a scripting language for the GIMP, there should be little expectation that it will ever be used to run extensive "external" programs or make use of extensive external libraries. There may be a few runtime extensions that might be useful; but if they are well-written, they will most assuredly not rely case insensitivity and most likely would not employ non-standard ASCII characters (it is usually the opposite, the libraries are very particular NOT to use special features). 

It would seem to be quite common for Scheme implementations to forego the case insensitivity that is part of the R5RS specification. This website lists some common Scheme interpreters that are using case sensitivity for their symbols: http://www196.pair.com/lisovsky/scheme/casesens/index.html

The reason to "demand" case sensitivity for a Scheme interpreter is, of course, speed. I performed a couple of crude benchmarks and discovered that replacing the 'stricmp' function call with 'strcmp' resulted in about a 40% improvement in execution time for the "script-fu-refresh" function (12 seconds versus 7 seconds from the console, 30 seconds versus 22 seconds for the splash screen portions).

I realize that speed is not of primary concern for Script-fu; but when weighed against the nearly negligible benefit of case insensitivity, it would seem to be worth the trade.

Making scripts depend on case insensitivity is poor programming practice and it would seem that even now, the case insensitivity of TinyScheme is somewhat questionable, as the following console transcript would indicate:

=> (define One 1)
One
=> One
1
=> ONE
1
=> one
Error: eval: unbound variable: one 

--------------------------

Limiting the character set to "standard" ASCII (normalized UTF-8) is a separate issue; but it also affords an opportunity for optimization with little demand being placed on script authors. While it might be "cute" to define 'lambda' to actually be the Greek symbol (or other such "tricks" using UTF-8), if nothing else such a practice might cause difficulty with using the GIMP's Script-fu console to interactively work with Script-fu. 


The only downside I can see to case sensitivity and UTF-8 coding for symbols would be with regards to maintainability of Script-fu with the ongoing development of the TinyScheme project. That is not to trivialize that downside; it might very well be sufficient justification to retain case insensitivity. Nonetheless, if the maintainability aspect of this decision can be resolved then I think it would be worth being non-compliant with R5RS for the purposes of GIMP scripting.

Thanks for your time and consideration.






Comment 2 Sven Neumann 2006-11-20 09:00:34 UTC
I tend to agree with comment #1. But I'd like to point out a few points.

(1) Unless symbols are limited to ASCII, they would still have to be normalized for comparison. strcmp() can only be used reliably on ASCII strings.

(2) I fully agree that case-insensitivity in a a programming language is more harmful than useful. IMO we should get rid of it even though that means violating the standard.

(3) The 40% speedup that you have seen can IMO easily be achieved by the optimizations that I outlined. This does not depend on the question of whether we do want to support non-ASCII symbols (yes, please) or being case-insensitive.
Comment 3 Kevin Cozens 2006-12-12 22:25:22 UTC
Created attachment 78242 [details] [review]
Patch to optimize oblist_find_by_name() and when UTF-8 string compares are used.

The GIMP splash screen says "extension_script_fu" on my machine for abou 18.5 to 19 seconds in the Script-Fu. With this patch, the time is down to about 16.2 to 16.5 seconds. Using sysprof, the time spent in oblist_find_by_name() is reduced by about 50%.
Comment 4 Sven Neumann 2006-12-13 08:24:04 UTC
If you want to use strcasecmp, please use g_ascii_strcasecmp() as strcasecmp() is not available on all supported platforms and has completely broken behaviour with respect to locales. Do you need to define stricmp at all after these changes?

Also you must not use free() on memory that was allocated using glib. If possible, you should use g_malloc() and g_free() (and their variants) all over the place. But if you absolutely need to mix memory allocators, please make sure that you are using the correct function for releasing the allocated memory.
Comment 5 Kevin Cozens 2006-12-13 20:03:48 UTC
Created attachment 78313 [details] [review]
Updated tinyscheme optimization patch
Comment 6 Kevin Cozens 2006-12-13 20:30:51 UTC
Don't know how this this reversion slipped in to the patch file but line 167 of the new patch should read:
    strkey(a) = g_utf8_collate_key(p1, -1);
and not
    strkey(x) = g_utf8_collate_key(p1, -1);
Comment 7 Sven Neumann 2006-12-13 20:36:20 UTC
Looks good to me. The hash function is still problematic (as discussed on irc earlier today). But that's not part of this optimization. I should probably open a new bug report for it.
Comment 8 Kevin Cozens 2006-12-13 22:05:46 UTC
2006-12-13  Kevin Cozens  <kcozens@cvs.gnome.org>

        * plug-ins/script-fu/tinyscheme/scheme-private.h
        * plug-ins/script-fu/tinyscheme/scheme.c: Optimizations for string
        comparisons. Time spent in oblist_find_by_name() reduced by ~50%
        during startup by use of stored collation keys. Fixes bug #374854.

Fix for hash_fn() will be commited shortly.
Comment 9 Sven Neumann 2006-12-14 10:07:06 UTC
Unfortunately I have to reopen the bug because the implementation is wrong. The current code inserts into the hash table using the name. Lookup in the hash table is done using the key. This only happens to work because for all the symbols in our scripts the key is identical to the name.

I will attach a patch that should fix the problem. While writing this patch I found that when mk_symbol() is run for a new symbol, the key is generated twice. Once when doing the lookup in oblist_find_by_name() and second time when this lookup fails and oblist_add_by_name() is used.

However, after having another look at the code, I am pretty sure that the implementation is still buggy even after this patch is applied. mk_symbol() seems always to be called with a string that is passed through g_utf8_strdown(). I strongly suggest to make a number of test cases with symbols that have upper and lower case characters outside the ASCII range.
Comment 10 Sven Neumann 2006-12-14 10:09:03 UTC
Created attachment 78348 [details] [review]
fix insertion into hash table
Comment 11 Kevin Cozens 2006-12-14 16:41:54 UTC
The implementation regarding optimization isn't wrong but incomplete. Without the hash_fn() changes it will not properly handled UTF-8 coded identifiers. The attached patch is missing two changes in calls to hash_fn(). I haven't commited the changes from my tree since I experience a hang of Script-Fu when trying to run scripts. I'm looking in to the problem.
Comment 12 Sven Neumann 2007-01-03 15:15:07 UTC
Setting on the 2.4 milestone because it doesn't make sense to ship with this change half-way implemented.
Comment 13 Kevin Cozens 2007-03-16 03:14:49 UTC
Created attachment 84696 [details] [review]
Additional changes related to use of collation key.

I have reviewed the passing of the collation key to hash_fn(). This patch fixes some errors I found in the calling of hash_fn(). It also includes changes where hash_fn() was still being called with a name instead of collation key.

However, it also results in Script-Fu going in to an infinite loop. This occurs either during start up of GIMP or when clicking the "OK" button in a Script-Fu dialog box.

I'm not sure the DDD/gdb packages are working as well as they used to on my machine since I udpated to FC6 and it is making it harder to debug the problem. I don't think the effort involved in completing the optimization changes is worth the couple of seconds of startup time it will save. I would rather back out the changes and leave optimization until after 2.4 is out. There are three other Script-Fu issues that are more important which should be dealt with first.
Comment 14 Eric Lamarque 2007-04-16 16:40:53 UTC
*** Bug 318256 has been marked as a duplicate of this bug. ***
Comment 15 Eric Lamarque 2007-05-16 10:47:57 UTC
Created attachment 88272 [details]
strace log when trace.scm is refreshed

trace.scm is a simple script with only two functions and corresponding PDB structures. It is 441 bytes long.

(define (script-fu-trace-[on|off])
  (tracing [0|1]))

The strace log shows how many time is spend with llseek/lseek/read sequences. This is related to Solaris FILE* implementation. In linux, the time is spend using _llseek.

The plugin lost a lot of time here. A detailed analysis show that we use llseek(file,0, SEEK_CUR) = ftell(), then lseek(file, -offset, SEEK_CUR) = lseek() and finally read() = fgetc() (on Solaris, a complete buffer is read by FILE* implementation at each fgetc()).

The scheme parser then lost character already read.
Comment 16 Eric Lamarque 2007-05-16 10:54:17 UTC
Created attachment 88273 [details]
strace log when trace.scm is refreshed with backchar() updated

Updating plug-ins/script-fu/tinyscheme/scheme.c:backchar() reduce significantly the parsing time.
Comment 17 Eric Lamarque 2007-05-16 10:57:54 UTC
Created attachment 88275 [details] [review]
Patch for tinyscheme parser backchar() against 2.3.16
Comment 18 Sven Neumann 2007-05-16 20:19:54 UTC
Looks good to me, waiting for a comment from Kevin...
Comment 19 Kevin Cozens 2007-05-17 20:24:12 UTC
2007-05-17  Kevin Cozens  <kcozens@cvs.gnome.org>

        * plug-ins/script-fu/tinyscheme/scheme.c: Applied patch from
        Eric Lamarque that optimizes backchar() use. See bug #347854.

The backchar() patch saves approximately 0.1 seconds during GIMP start up on my new machine. I estimate it would save around 10 seconds on my old 266MHz Pentium II. It would be interesting to see if the change could be used when reading from string buffers.

Changing summary line as this report now includes information about other possible optimizations for Script-Fu than just stricmp().
Comment 20 Sven Neumann 2007-07-26 09:29:23 UTC
Kevin, is this currently in a state that can be released? Are there any optimizations that might have to be reverted because they aren't finished yet? If not, we can probably close this bug as FIXED.
Comment 21 Kevin Cozens 2007-08-10 03:11:53 UTC
I think we need to consider reverting the changes made regarding the storing of collation keys. As Sven has stated, the changes are not complete without the changes for hash_fn(). The changes shouldn't be that hard but initial attempts have shown they aren't quite as simple as I first thought.

Also, saving the collation keys hasn't shown as big an improvement to start up time as originally thought. The optimization made by Eric to backchar() seem to have provided more of a reduction in start up time.

We can revisit the issue again after a 2.4 release. I still have in mind the idea of reworking Script-Fu to make it behave as other scripting language plug-ins (such as pygimp and gimp-perl). It would provide the biggest time savings when starting GIMP with a warm cache.
Comment 22 Sven Neumann 2007-08-10 06:24:25 UTC
There is nothing that needs to be done in hash_fn(). The hash function will work just fine with UTF-8 encoded keys. And if it doesn't, then you could simply use g_strhash() or copy it. What needs to be done is to fix the use of the hash table, as pointed out in comment #9.

Perhaps reverting is indeed the best thing we can do here. The optimization should not have been attempted without a test case as pointed out in comment #9.
Comment 23 Sven Neumann 2007-08-10 06:35:06 UTC
Created attachment 93413 [details] [review]
patch to revert the changes done in revision 21536

Please review the attached patch carefully and test that it doesn't introduce any regressions.
Comment 24 Kevin Cozens 2007-08-10 18:42:45 UTC
I was referring to the use of hash_fn() and what it is passed and not to any need for changes in the hash_fn() function itself. I referenced your comments about hash_fn() but missed a word or two so it wasn't clear in my previous comment.

I will review the patch to revert the changes and compare them against the patch that was originally applied. That should prevent any regressions from slipping in.
Comment 25 Kevin Cozens 2007-08-11 14:54:25 UTC
Applied patch from Sven Neumann. We can revisit this issue after the 2.4 release of GIMP. Moving this to the Future milestone for now since some planned changes for Script-Fu could make this issue obsolete.

2007-08-11  Kevin Cozens  <kcozens@cvs.gnome.org>

        * plug-ins/script-fu/tinyscheme/scheme.c: Commited patch from
        Sven Neumann that reverts the optimization patch which stored
        collation keys. See bug #374854.
Comment 26 Sven Neumann 2008-06-05 06:45:59 UTC
It would be nice if the proposed changes could be reconsidered for 2.6. There are some really low-hanging fruits here and it would help to reduce startup time.
Comment 27 Sven Neumann 2008-07-16 06:51:58 UTC
It appears to dangerous to change this code without the test cases suggested in comment #9. So let's bump it to the Future milestone until someone adds those test cases to Script-Fu.
Comment 28 Massimo 2010-08-02 12:58:29 UTC
The infinite loop mentioned in comments #11 and #13 happens
whenever there is an error while interpreting scheme's code
attached to the '*error-hook*', in that case the '*error-hook*'
is invoked again and if the error persists the infinite loop
starts.

The problem is in scheme-wrapper.c where the old_constants[] 
"APPLY" is added to scheme's environment.

Now, it does not hide the built-in 'apply' function just because 
the two identifiers live in two different bins of the hash table,
but after the application of the patches attached to this bug report,
the constant hides the built-in, and as soon as a script, directly 
or indirecly, invokes 'apply' the '*error-hook*' is invoked which 
again calls 'apply' (through 'throw').
Comment 29 Michael Natterer 2011-02-28 22:54:46 UTC
*** Bug 643158 has been marked as a duplicate of this bug. ***
Comment 30 GNOME Infrastructure Team 2018-05-24 12:04:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gimp/issues/221.