After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 476356 - Script containing invalid UTF-8 characters is not loaded
Script containing invalid UTF-8 characters is not loaded
Status: RESOLVED FIXED
Product: GIMP
Classification: Other
Component: Script-Fu
git master
Other All
: Normal normal
: 2.4
Assigned To: GIMP Bugs
GIMP Bugs
Depends on:
Blocks:
 
 
Reported: 2007-09-12 21:55 UTC by Michael Schumacher
Modified: 2007-09-21 23:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
script with japanese comments (101.96 KB, text/x-scheme)
2007-09-12 21:59 UTC, Michael Schumacher
  Details
patch to implement proper handling of utf-8. (3.00 KB, patch)
2007-09-17 02:35 UTC, Simon Budig
none Details | Review
Fix of the previous patch (2.63 KB, patch)
2007-09-17 02:38 UTC, Simon Budig
none Details | Review
Simplified and bug-fixed version of previous patch. (2.56 KB, patch)
2007-09-21 22:18 UTC, Kevin Cozens
committed Details | Review

Description Michael Schumacher 2007-09-12 21:55:00 UTC
Steps to reproduce:

1. get photoeffects script from http://users.telenet.be/ev1/gimpphotoeffects_en.html (I'll attach it, too)
2. install it
3. run gimp or refresh scripts
4. notice that you do not get the expected menu entires (<Image>->Script-Fu->Photo Effects)

but

5. remove all japanese comments
6. notice that scripts starts to work


Is this an encoding issue? Firefox claims that the script is in Shift_JIS, file says "Non-ISO extended ASCII English text".
Comment 1 Michael Schumacher 2007-09-12 21:59:14 UTC
Created attachment 95484 [details]
script with japanese comments
Comment 2 Kevin Cozens 2007-09-12 23:28:33 UTC
The problem is an encoding issue. Comment lines containing characters that are not valid ASCII and are not valid UTF-8 will cause a problem. The inchar() routine of TinyScheme needs to be updated to ignore over bad characters.
Comment 3 Sven Neumann 2007-09-13 06:35:41 UTC
If fixing the routing handling the comments is not simple, we should simply require scripts to be in UTF-8 encoding.
Comment 4 Michael Natterer 2007-09-13 10:22:16 UTC
I thought that was a requirement anyway since we switched to tiny scheme?
Comment 5 Raphaël Quinet 2007-09-13 13:45:34 UTC
Since we switched to tinyscheme, the scripts are expected to be in UTF-8. But we could still be tolerant and ignore invalid characters in comments (in non-functional parts of the script). Otherwise, some users might consider this as a regression because their scripts that were (incorrectly) encoded using a local character set were working with previous GIMP versions and are not working anymore. 

But I agree with Sven in comment #3: if it is too hard to be tolerant, then we could instead be more strict and require all scripts to be in UTF-8. We could validate the script once before parsing it and display a sensible error message if the script encoding is not UTF-8.
Comment 6 Kevin Cozens 2007-09-13 18:02:01 UTC
I was just reviewing how UTF-8 characters are coded. Based on what I have just read it should be simple enough to modify the basic_inchar() routine of TinyScheme to ignore invalid UTF-8 characters.

basic_inchar() is called each time a new character is needed from the input source (either a file or string buffer). It should ignore any bytes until either the top bit is 0 (byte < 128) or the top two bits are set (byte >= 192). Both conditions indicate the start of a (possibly) valid character. 

Doing the above would let basic_inchar() return valid characters at the expense of being unable to have a script read a binary file. In order to read a binary file one would need to know when a script is being parsed and when a script is being run. 
Comment 7 Kevin Cozens 2007-09-14 18:40:08 UTC
Changed summary line to provide a better indication of the problem.
Comment 8 Simon Budig 2007-09-17 02:35:44 UTC
Created attachment 95699 [details] [review]
patch to implement proper handling of utf-8.

This patch implements proper UTF8 parsing and discards invalid byte sequences silently.

Note that this does not address enabling binary I/O.
Comment 9 Simon Budig 2007-09-17 02:38:42 UTC
Created attachment 95700 [details] [review]
Fix of the previous patch

Sorry, the previous patch had parts of an unrelated fix in it.
Comment 10 Sven Neumann 2007-09-17 07:05:38 UTC
Since you are already doing all the checks in your code, you should probably use g_utf8_get_char() instead of g_utf8_get_char_validated().
Comment 11 Kevin Cozens 2007-09-21 22:18:21 UTC
Created attachment 95992 [details] [review]
Simplified and bug-fixed version of previous patch.

Updated version of previous patch. Fixes a couple of minor errors in the previous patch and slightly simplifies it.
Comment 12 Kevin Cozens 2007-09-21 23:53:48 UTC
2007-09-21  Kevin Cozens  <kcozens@cvs.gnome.org>

        * plug-ins/script-fu/tinyscheme/scheme.c (basic_inchar): Applied
        modified patch from Simon Budig. Any bytes read from a file which
        are not valid UTF-8 characters will be ignored. Fixes bug #476356.