GNOME Bugzilla – Bug 327479
warp plugin segfaults on 64bit arches
Last modified: 2006-08-08 17:41:08 UTC
Steps to reproduce:
1. Prepare a warpable layer
2. Use the warp filter
3. Segfault

Stack trace:
/usr/lib64/gimp/2.0/plug-ins/warp: fatal error: Segmentation fault
/usr/lib64/gimp/2.0/plug-ins/warp (pid:4579): [E]xit, [H]alt, show [S]tack trace or [P]roceed: S
#0 0x00002aaaac6edff4 in waitpid () from /lib/tls/libc.so.6
#1 0x00002aaaac4e2c58 in g_on_error_stack_trace ()
#2 0x00002aaaac4e31ba in g_on_error_query () from /usr/lib/libglib-2.0.so.0
#3 0x00002aaaab016471 in gimp_attach_new_parasite ()
#4 <signal handler called>
#5 0x000000000040521f in ?? ()
#6 0x0000000000405777 in ?? ()
#7 0x0000000000406862 in ?? ()
#8 0x00000000004032ba in ?? ()
#9 0x00002aaaab016a62 in gimp_attach_new_parasite ()
#10 0x00002aaaab016783 in gimp_attach_new_parasite ()
#11 0x00002aaaab0146db in gimp_main () from /usr/lib/libgimp-2.0.so.0
#12 0x0000000000402f18 in ?? ()
#13 0x00002aaaac67e674 in __libc_start_main () from /lib/tls/libc.so.6
#14 0x0000000000402e6a in ?? ()
#15 0x00007fffffc5d418 in ?? ()
#16 0x000000000000001c in ?? ()
#17 0x0000000000000006 in ?? ()
#18 0x00007fffffc5e851 in ?? ()
#19 0x00007fffffc5e873 in ?? ()
#20 0x00007fffffc5e879 in ?? ()
#21 0x00007fffffc5e87b in ?? ()
#22 0x00007fffffc5e87d in ?? ()
#23 0x00007fffffc5e882 in ?? ()
#24 0x0000000000000000 in ?? ()

Other information:
I know this trace isn't very meaningful. Since the warp plugin works on ppc, the issue is probably SIMD related. I'll try to get a trace from a debug gimp as soon as I can access an amd64 box.
Eventually I managed to get a decent stack trace
+ Trace 65366
Same issue again with current CVS. There is probably a misalignment in the SIMD code; on ppc it works fine.
For the record, could you tell us what distro you are using for these tests? OS is set to "all" here, but I presume you are talking about Linux. Thanks.
I guess that's more or less all you need:

Portage 2.1_rc2-r1 (default-linux/amd64/2005.1, gcc-3.4.6, glibc-2.4-r2, 2.6.16-gentoo-r2 x86_64)
=================================================================
System uname: 2.6.16-gentoo-r2 x86_64 AMD Athlon(tm) 64 Processor 3500+
Gentoo Base System version 1.12.0_pre19
ccache version 2.4 [disabled]
dev-lang/python: 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.4-r1
dev-util/confcache: 0.4.2
sys-apps/sandbox: 1.2.18.1
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r2
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r3
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-mtune=k8 -mcpu=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="-mtune=k8 -mcpu=k8 -O2 -pipe"
Unset: ASFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, MAKEOPTS, PORTAGE_RSYNC_EXTRA_OPTS
I was saying to myself, I bet he is using Gentoo. But I didn't dare write it down... Gentoo has a history of producing bugs that do not show up for other distros, and are very difficult to resolve because of the variability in how things are built.
Umm, following up on an IRC conversation with Brix yesterday, maybe I should explain that I would not have written comment #5 had I any way of knowing that you are actually a Gentoo developer. When I read this bug report (and your other one), as far as I could tell the situation was that (1) you were getting bugs that other people hadn't reported and I couldn't reproduce; (2) you were using Gentoo; and (3) you were not aware that using Gentoo was relevant, because you didn't mention the fact. I gave a response that was appropriate to that situation, in my opinion. The biggest problem with Gentoo, from a GIMP developer's point of view, has been that bugs appearing in Gentoo often have not been reproducible in other distros and have often turned out not to be GIMP problems -- but if Gentoo developers are able to help figure things out, that definitely makes things a lot easier and is definitely welcome.
I can reproduce this on Gentoo AMD64. This is the stack trace I get:

This is a development version of GIMP. Debug messages may appear here.
/usr/lib64/gimp/2.0/plug-ins/warp: fatal error: Segmentation fault
/usr/lib64/gimp/2.0/plug-ins/warp (pid:10400): [E]xit, [H]alt, show [S]tack trace or [P]roceed: s
+ Trace 68446
Unfortunately this does not seem to be the full output, although my system is built with full debug information.
Created attachment 66224
Full backtrace

Here is the full backtrace with all the variables _and_ the line numbers. It seems the default script truncates the data, so this should be more helpful. I have to inform you that my system is not what can be considered "stable", but with this information it should be easy to find out whether the problem is in the source or in codegen.
It might be informative to try the following, if you can: go into gimp/plug-ins/common, edit the Makefile to remove the optimization from the CFLAGS line (if it is not -O2, please mention this), then as root do "make install-warp", which will cause only the warp plugin to be rebuilt. It would be interesting to know whether it still crashes, and if it does, the stack trace you get will be more helpful.
Created attachment 66228
Full backtrace (without optimisations on warp plugin)

Here it is; this is with -ggdb only. My usual CFLAGS are

CFLAGS="-march=athlon64 -Os -ftracer -pipe -ftree-vectorize -Wformat=2 -Wno-error -Wno-pointer-sign -g -ggdb -Wstrict-aliasing=2"

so not exactly a "standard" setup, but with GCC 4.1 this was never a problem for me (I never experienced occasional crashes that weren't caused by the sources).
Bug reproduced with current CVS on ppc64. The toolchain is the same and again the distribution is Gentoo.
As far as I can see the crash is on this line in diff_prepare_row:

    data[-pixel_rgn->bpp + b] = data[b];

and if I'm not mistaken the problem is that pixel_rgn->bpp is a guint - unsigned int - which is 4 bytes on amd64. The "-pixel_rgn->bpp" part is calculated in 32 bits, which yields a positive integer close to 2^32 (positive since bpp is unsigned). I don't know exactly what happens to the types when b is added (it is a signed int), but the end result seems to be that we get a 64-bit integer with a value close to 2^32 and therefore read outside the "data" array.

One way to fix it is to cast the bpp to signed int first:

    data[-(gint)pixel_rgn->bpp + b] = data[b];
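The arithmetic described above can be sketched in isolation. This is a minimal illustration only, assuming a 32-bit unsigned int (as on amd64 and ppc64 Linux) and using plain C types in place of GLib's guint and gint. The function names offset_unsigned and offset_signed are hypothetical and not part of the warp plugin; they just compute the index each form of the expression would produce, without touching any array.

```c
#include <assert.h>
#include <stdint.h>

/* Index as computed by the original expression "-bpp + b".
 * The negation is done in unsigned int, b is converted to unsigned int,
 * and the sum wraps modulo 2^32 (assumption: unsigned int is 32 bits).
 * Widening that result to 64 bits zero-extends it. */
static uint64_t offset_unsigned(unsigned int bpp, int b)
{
    assert(bpp > 0);
    return (uint64_t)(-bpp + b);
}

/* Index as computed with the cast from the suggested fix,
 * "-(int)bpp + b": the whole expression is evaluated in signed int,
 * so widening to 64 bits sign-extends and keeps the negative value. */
static int64_t offset_signed(unsigned int bpp, int b)
{
    assert(bpp > 0);
    return (int64_t)(-(int)bpp + b);
}
```

With bpp = 4 and b = 0, offset_unsigned gives 4294967292 while offset_signed gives -4. On a 32-bit target the large value in practice wraps around in pointer arithmetic and behaves like -4, which would be consistent with the crash only showing up on 64-bit arches.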
Created attachment 68345
Suggested patch (cast pixel_rgn->bpp to gint in diff_prepare_row)

Here's a simple patch implementing what I said in the previous comment. It seems to work fine as far as I can see: the crash is gone and the output seems reasonable, but I don't have any expected output to compare with.
That patch looks fine, please commit. (And another great example of why it is almost *always* a bad idea to use unsigned when not absolutely needed.)
I don't have CVS write access, could you please apply the patch? (Apologies for the late reply)
Applied to both branches. Thanks a lot for the patch.