GNOME Bugzilla – Bug 129867
Regexp evaluation corrupts stack on Tru64 and Aix
Last modified: 2004-07-30 20:59:37 UTC
glib-2.2.3 atk-1.2.4 pango-1.2.5 gtk+-2.2.4 gettext-0.13.1 This problem has happened with every single devel version of the GIMP I've built in the last several months (at least back to 1.3.15). I've held off reporting it because I wanted to try get some more useful information for the bug report (e.g. a stack trace). It may be a while before I can do that, though, so I thought I should at least make a report. Note that I've built gimp and all its prereqs on Solaris in 64 bit mode, and I have *not* seen the problem there. Then again, Solaris is a lot more forgiving about mixed 32 and 64 bit code (and things like memory alignment for intrinsic types) than Tru64 is. The problem is that gimp will SEGV with a "FLTBOUNDS", which is a "memory bounds violation". It happens while "Starting Extensions" and "extension_script_fu" are displayed in the splash. Whatever's going wrong is completely corrupting the stack. I've spent some time stepping with Tru64's debugger (gdb only works so-so on Tru64) but I haven't yet had the patience to step as deep into the code as the problem apparently exists. I will try find some time to look into this and get a real stack trace, but it may be a couple weeks. Until then, I can tell you that `truss' on the gimp shows that gimp reads the following from script-fu: 194701: read(4, 0x00000001408114A0, 25) = 25 194701: g i m p _ p r o c e d u r a l _ d b _ q u e r y\0 (pid 194701 is the gimp), and then it makes the following additional reads: 194701: read(4, 0x0000000140124FE8, 4) = 4 194701: read(4, 0x0000000140808440, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x0000000140801260, 3) = 3 194701: read(4, 0x0000000140808468, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x000000014080A760, 3) = 3 194701: read(4, 0x0000000140808490, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x0000000140813A40, 3) = 3 194701: read(4, 0x00000001408084B8, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x00000001408230A0, 3) = 3 194701: read(4, 0x00000001408084E0, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x00000001402475A0, 3) = 3 194701: read(4, 0x0000000140808508, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x0000000140823000, 3) = 3 194701: read(4, 0x0000000140808530, 4) = 4 194701: read(4, 0x000000011FFFB640, 4) = 4 194701: read(4, 0x0000000140823020, 3) = 3 194701: Incurred fault #32, FLTBOUNDS %pc = 0x0000000000000000 addr = 0x000000011FFFB6A0 194701: Received signal #11, SIGSEGV [caught] The SEGV always happens after exactly that number of additional read()s. I will try get an actual stack-trace some time in the next couple weeks.
A stack trace would really be helpful here. Or at least a report if the problem still shows up with the latest development releases (gimp-2.0pre3 at the time of this writing).
I'm building 2.0pre3 now, and will report back tomorrow on whether the problem is still present. It's been impossible to get a stack trace *after* the SEGV (the stack is completely corrupt), so my only recourse is to catch the problem just before it happens, which involves a lot of stepping with the debugger. Assuming the problem is still present, I won't have time to spend on it until this weekend, so the earliest I would be able to have a stack trace to you would be Monday night of next week (2004-Feb-16).
If there happen to be any warnings before the segfault, that would give you a chance to catch the problem more easily. You can then use the --g-fatal-warnings command-line option.
Thanks for the tip Sven. I'll definitely give that a try, if the problem is still present.
I built 2.0pre3, and the problem still happens. There are no warnings when starting the gimp (other than the new bug regarding pluginrc, which has apparently already been fixed in CVS), so the --g-fatal-warnings didn't help in this case. I will try find some time this coming weekend to step with the debugger up to the point where the problem happens. I will report back when I have any additional information.
I finally had some time to spend on this. The stack trace from *just before* the SEGV is always: >0 0x30008843f5c in g_ptr_array_add(farray=0x140080040, data=0x1400a8b00) "garray.c":611
+ Trace 44895
ladebug is the Tru64 debugger (gdb doesn't work very well on Tru64): (ladebug) print farray 0x140080040 (ladebug) print *farray struct _GPtrArray { pdata = 0x14074be80; len = 0; } (ladebug) print *(farray->pdata) 0x0 (ladebug) print *((GRealPtrArray *) farray) struct _GRealPtrArray { pdata = 0x14074be80; len = 0; alloc = 64; } (ladebug) W 601 } 602 603 void 604 g_ptr_array_add (GPtrArray* farray, 605 gpointer data) 606 { 607 GRealPtrArray* array = (GRealPtrArray*) farray; 608 609 g_return_if_fail (array); 610 > 611 g_ptr_array_maybe_expand (array, 1); 612 613 array->pdata[array->len++] = data; 614 } (ladebug) s Thread received signal SEGV The "step " command was not completed. pc address 0x0 is invalid; substituting RA stopped at [ 0x0] I've also tried this with gimp-2.0pre4 and with glib-2.3.5, and the problem remains the same. The problem happens in the same spot, always. I suppose it's possible that the problem is being caused somewhere in the gimp, but since the crash is happening in a glib routine, I think for now it's best to assume that this is either a subtle problem in glib or it's a problem in the development environment I'm using. I can understand why the compiler could optimize away `array' inside g_ptr_array_add(), and even why it might choose to inline all of g_ptr_array_maybe_expand() (I'm *guessing* that's what it's doing) but I don't understand why that should be causing the consistent SEGV I'm seeing. Should this be reassigned to glib, or be kept with the gimp for now?
I asked someone I know that builds a lot of opensource software if he could build gimp 2.0.0 on Tru64 UNIX, to see if anyone else is seeing this problem or if it's just me. He's seeing the same crash. I'm currently using glib-2.4.0 atk-1.6.0 pango-1.4.0 gtk+-2.4.0 gimp-2.0.0 but as I reported above, it's been going on for quite a while and I've tried many different prerelease versions of gimp, along with various combinations of the most important prereqs. Should "Version:" be updated on bug, to indicate that it applies to current gimp too? Should this be reassigned to the glib developers? Thanks! Tim
I spent some time in the debugger last night, and made some significant progress on this. My prior debugging attempts had been compiled with both optimization and debugging, and I think some of the inlining was making it harder to find the real problem. I don't understand why, but the stack is being corrupted by the calls to `regcomp' in procedural_db_query_invoker (procedural_db_cmds.c). Before the regcomp calls, the stack looks like: (ladebug) where >0 0x12031fd10 in procedural_db_query_invoker(gimp=0x1400d1f80, args=0x140799240) "procedural_db_cmds.c":474
+ Trace 47201
That's all that's left of the stack. So, when procedural_db_query_invoker finishes and returns, and procedural_db_execute finishes and returns, we get a SEGV because we're returning to an invalid routine that may not even be part of our process space. gimp is using a custom regex library for all non-glibc systems. I have verified that if I manually define HAVE_GLIBC_REGEX and remove all references to regexrepl/libregex.a from the Makefiles, so that gimp uses the regcomp/regexec/regerror/regfree that's part of the system C library, the SEGV doesn't happen, and 20 minutes of playing with the gimp and script-fu wasn't able to turn up any problems. So, what does regexrepl/libregex.a do that a UNIX95 or UNIX98 regcomp/regexec/regerror/regfree doesn't do? The autoconf test that sets HAVE_GLIBC_REGEX is strange -- it's basically checking "is this a glibc2 system", not for anything regex-related. No sure why, can someone explain?
GIMP provides it's own GNUish regex because system regex implementations, despite the standard, can be quite idiosyncratic. And there are some old (and perhaps new?) platforms that provide the regex functions, but they are stubs, and not actually implemented. If you change line 1002 to be #undef MATCH_MAY_ALLOCATE, does it still crash for you? Either way, we should sync up with the latest code from GNU grep.
I'm recompiling right now, I'll have a report tomorrow. I agree it would be good to sync up regexrepl/ with current GNU grep. More later.
I recompiled with MATCH_MAY_ALLOCATE undefined, and it did not make a difference (still a crash).
Yosh, you mentioned getting the regex files from recent GNU grep and trying them. The most recent version(s) of grep have never been put back on ftp.gnu.org after the security incident last summer, and the maintainer (bero@redhat.com) didn't answer my email. Do you know where the canonical versions of regex.c and regex.h are? Looks like even grep's versions come from elsewhere (glibc?).
I'll probably just grab them from debian's version. The glibc stuff has a bunch of extra cruft that is hard to integrate.
Moving all bugs remaining on the 2.0.3 milestone to 2.0.4.
Why is this on a 2.0.x milestone? This looks like 2.2 material to me. Changing milestone appropriately. Dave.
What makes it look more appropriate for 2.2 than 2.0.x?
Sorry Tim - I perhaps acted hastily. For me, merging in more recent versions of the regex files goes beyong a simple bug-fix, and it might introduce other problems. So that operation seems more appropriate in an unstable branch (we expect this branch to be the stable branch in a couple of months, in any case). Reading the comments, there is obviously a problem with the existing regex files, and fixing that bug would of course be a 2.0 type thing. Looking at comment #8, it seems you have verified that using an external regex library fixes the problem, so if Sven thinks it's fine, I guess that this merge could go into both branches, rather than hunt for the problem in our regex code. I was simply swept away with enthustiasm after reading the description.
I agree with Dave that this is too much of a change for the stable branch. It will definitely have to go into the HEAD branch first. We can consider to backport it later if there's a need. Leave it on the 2.2 milestone.
In comment #8, I determined that using the system/POSIX (non-GNU) regex libraries solved the problem for me. I'm not 100% certain that updating to the latest GNU regex code will solve the problem, but based on visual inspection of the code that's in grep-2.5, I'm betting it will. You're right that updating the regex files might introduce other problems (for non-Linux platforms, since the (GNU) regex code in libc is used on Linux). I think the risk is pretty minimal, but your intuition might be correct. How about if we leave this on the 2.2 milestone and I try come up with a minimal patch to the regex files that fixes the problem for the 2.0.x series?
I am not reluctant to backport a full code update to 2.0.x but I'd like to see the change appearing in a 2.1.x release first.
The latest version of regex should not be pulled from grep, but gnulib: http://savannah.gnu.org/projects/gnulib
AIX 5.2 gives the same error. Defining HAVE_GLIBC_REGEX to use the system regex library gets Gimp to run as on Tru64 UNIX.
BTW, I just *quickly* replaced the files in regexrepl/ with the ones from gnulib and I still get the reported failure on Tru64 UNIX. I didn't do a fully rebuild, just those modules dependent on regexrepl/.
You sure the gnulib has the same version grep does then?
The gnulib version should always the latest. The version from grep doesn't match it as they haven't pulled in those changes (probably for no really good reason). I just checked the version in grep and it's older than the version in gnulib.
Perhaps the description should be changed to reflect the bug.
The original summary on the bug I submitted was gimp SEGV when communicating with script-fu on Tru64 Yosh changed it to the "sync regex" description after I found that it was the regexrepl code that was corrupting the stack. Either way is fine, though I think having Tru64 and AIX in the summary makes it easier for people to find this if they're searching bugzilla for problems that affect their (non-Linux) system. Having "sync regex" in the summary makes it easy to remember what we think the resolution might be, without having to ready 27+ comments... ;-)
I got the impression from comment #25 by The Written Word that the problem might be a little more complicated than that. Anyway, the efforts by you guys to fix this are greatly appreciated. Obviously enough, not many people have Aix or Tru65 machines lying around... Thanks a lot.
I'm attaching three small patches. Only the Makefile.am is strictly required, but I highly recommend you consider the others as well. Albert, will you test with these (you'll need to rerun automake after the most important one) and verify they fix the situation on AIX? gimp-2.0.3-regexrepl-Makefile.am.patch : change app/Makefile.am so that $REGEXREPL is listed right after the libgimp* libraries, before the external dependencies. The external dependencies may actually pull in -lc, so with REGEXREPL at the end of the link line, the libc version of reg* functions would be used. That's what's happening here -- we're getting the stuff from regex.h included, but the actual functions we link to are the libc versions. gimp-2.0.3-regex-__STDC__.patch : Change two occurrences (one each in regex.c and regex.h) of #if __STDC__ to #if defined(__STDC__) Reason: Some commercial C compilers, when executing in "ANSI + extensions" mode, will define __STDC__, but set it to 0. It's better to just test if __STDC__ is defined at all. gimp-2.0.3-osf-re_max_failures.patch : By inspection of regex.c, it's clear that the comment doesn't match the code. Someone already fixed the re_max_failures values for the first half of the #ifdef (INT_IS_16BIT, not the case for any modern platform), but they missed fixing it in the second half of the #ifdef (the part we actually use). I've checked regex.c from grep 2.5, and this same fix has been incorporated into that version.
Created attachment 30084 [details] [review] change dependency order in app/Makefile.am so $(REGEXREPL) is linked earlier
Created attachment 30085 [details] [review] test #if defined(__STDC__), not #if __STDC__
Created attachment 30086 [details] [review] apply the re_max_failures fix to the #else case that was missed
Good catch. I'll check in a bit. BTW, one of the things I tested yesterday was using the regex.h from GIMP but the regex implementation from libc. This caused the same crash on AIX and Tru64. So, you might be right.
BTW, I think you should patch plug-ins/script-fu/Makefile.am as well. I'll test with a reordered REGEXREPL for that as well.
With just app/Makefile.am I no longer get the main app to crash, but you're absolutely right that the same change should be made to plug-ins/script-fu/Makefile.am. I'll attach the trivial patch for that.
Created attachment 30090 [details] [review] apply the same link reordering to plug-ins/script-fu/Makefile.am Albert pointed out that my first patch was incomplete. This second part fixes the other part of the gimp that uses an external REGEXREPL library -- script-fu.
BTW, the Apache people ran into something similar: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26088 Odd that I don't see the problem on HP-UX 10.20, 11.00, or 11i though.
Ok, patch id#30084 and id#30090 fixes the problem! Thanks Tim. Kudos!
Thanks for the feedback! I agree that it's strange that this doesn't affect more platforms. I'm tempted to build the prereqs and the gimp on IRIX and see what happens without the patches, but may not get to doing that. If you get a chance, test the other two patches too.
The following platforms didn't have problems: IRIX 6.5.21 HP-UX 10.20, 11.00, 11i Solaris 2.5.1-9/SPARC Redhat Linux 7.1, 9 RHEL 2.1, 3.0 Patch id#30085 and #30086 look good. Tested on the above platforms in addition to Tru64 UNIX 4.0d, 5.1, AIX 4.3.3, 5.1, and 5.2 without problems.
Applied to both branches. I am closing this as FIXED now. Please feel free to open a new bug report that deals with syncing our source with an upstream version. In the long run however glib is supposed to provide regex support. 2004-07-30 Sven Neumann <sven@gimp.org> Applied a bunch of small changes contributed by Tim Mooney to fix stack corruption on Tru64 and Aix (bug #129867). * app/Makefile.am * plug-ins/script-fu/Makefile.am: changed the dependency order so that $(REGEXREPL) is linked earlier. * regexrepl/regex.[ch]: fixed check for __STDC__, merged upstream fix for re_max_failures value.