GNOME Bugzilla – Bug 669157
Crashes when C calls Python callback with Python3
Last modified: 2012-05-19 08:21:23 UTC
Created attachment 206560: Test code causing the crash with python3. The attached test code uses column.set_cell_data_func to register a cell data func. It works with python2.7 but crashes with python3. The crash happens when Gtk tries to call the cell_data_func, in
Trace 229574
(gdb) print info->func
$3 = (GtkCellLayoutDataFunc) 0x7ffff7fdffc0

I've tested pygobject-3.0.3, both a self-compiled version and the Debian version.
Thanks for reporting this bug. I just tried your sample with the latest version from master and everything worked fine. Could you please try it with version 3.1.0 or later?
I've tested this in Ubuntu (pygobject-3.1.0), and the crash still exists. But it seems related to Debian/Ubuntu, since it doesn't appear in Arch Linux (pygobject-3.0.3).
Iven, I tested it on current Ubuntu 12.04 (the only version which has 3.1.0) on amd64, and your example works just fine. Which Ubuntu version/architecture are you using?
(In reply to comment #3)
> Iven, I tested it on current Ubuntu 12.04 (the only version which has 3.1.0)
> on amd64, and your example works just fine. Which Ubuntu version/architecture
> are you using?

I'm also using Ubuntu 12.04 amd64. Are you using python3? Python2 doesn't have this problem.
Right, sorry. I did use "python" (2.7) in my previous test. I confirm the crash with python3.
I don't get the same backtrace as you, but this example program does crash. It also happens in current jhbuild, so it doesn't look like a Debian/Ubuntuism. I also noticed that when building with Python 3, the testCallbackDestroyNotify test case crashes with SIGILL. Your example is also kind of a callback, so it's possible that these are related. 3.0.3 and 3.0.0 also crash in jhbuild in that case, so it's not a recent regression in pygobject itself; presumably something in g-i or glib changed underneath. I'll have a closer look at this.
I tried downgrading g-i and glib, but not much luck, the test case keeps crashing. Iven, which versions of gobject-introspection, glib, and GTK are you using on Arch?
I'm using:
gobject-introspection 1.30.0
glib2 2.30.2
gtk3 3.2.3
Debian sid also has glib2 2.30.2 and gtk3 3.2.3, and the test crashes there. I grabbed the upstream tarballs for gobject-introspection 1.30.0 and gtk3 3.2.3 and ran this in a Debian sid environment, and the attached test code still crashes. This is with Python 3.2.2. So perhaps Arch Linux has some patches which make this work? Or perhaps it uses a different python3 version?
Just tried it on Fedora 16, no crash there:
pygobject 3.0.3
glib 2.30.2
gobject-introspection 1.30.0
Gtk+ 3.2.3
Python 3.2.1
Tried on Ubuntu 11.10, segfaults:
pygobject 3.0.0 (with some patches backported from 3.0.x)
glib 2.30.0
gobject-introspection 1.30.0
Gtk+ 3.2.0
Python 3.2.2

I also tried Python 3.2.1 on both Ubuntu 11.10 and 12.04, no difference. As a data point, this sometimes crashes with a SIGSEGV and sometimes with a SIGILL, but under gdb always with SIGILL. Somewhere in g_closure_invoke() ... g_hash_table_foreach(). Need to leave for today, I hope I can find some time to debug this soon. (Any help appreciated, of course).
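To tell the SIGSEGV and SIGILL cases apart outside of gdb, the exit status of the test process can be decoded. The following is only a minimal sketch: `describe_exit` and `run_and_describe` are hypothetical helper names, not part of pygobject or its test suite, and the negative-return-code convention applies to POSIX systems.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Describe a subprocess return code.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            # Signal number without a symbolic name on this platform.
            return "killed by signal %d" % -returncode
    return "exited with %d" % returncode


def run_and_describe(argv):
    """Run argv (hypothetical helper) and describe how it ended."""
    return describe_exit(subprocess.call(argv))
```

Running `run_and_describe(["python3", "test.py"])` repeatedly (with the path of the attached test code substituted for the placeholder `test.py`) would show how often each signal occurs.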
Got another duplicate, bug 674475. This is already exposed by the test suite: test_callback_destroy_notify (test_everything.TestCallbacks) ... make[2]: *** [check-local] Segmentation fault (core dumped)
*** Bug 674475 has been marked as a duplicate of this bug. ***
*** Bug 669968 has been marked as a duplicate of this bug. ***
Keeping some notes for myself. I debugged this a little further. gdb'ing at the time when the crash happens is unfortunately rather useless. You get up to the point when it tries to call callback() and then a SIGILL/SIGSEGV, but at this point it's a typeless pointer, so it's not easy to say whether the callback pointer points to something sensible.

However, this only affects functions which get a callback and a GDestroyNotify; callbacks with other scopes work fine.

Drilling down a little further, the segfault goes away if I hack the end of _pygi_marshal_from_py_interface_callback() to replace

    if (destroy_cache) {
        PyGICClosure *destroy_notify = _pygi_destroy_notify_create ();
        state->in_args[destroy_cache->c_arg_index].v_pointer = destroy_notify->closure;
    }

with

    if (destroy_cache) {
        state->in_args[destroy_cache->c_arg_index].v_pointer = NULL;
    }

or even further, hacking _pygi_destroy_notify_create() to disable this line

    destroy_notify->closure = g_callable_info_prepare_closure (
        (GICallableInfo*) glib_destroy_notify, &destroy_notify->cif,
        _pygi_destroy_notify_callback_closure, NULL);

and replace it with

    destroy_notify->closure = NULL;

Interestingly it still crashes if I leave the g_callable_info_prepare_closure() call active and set destroy_notify->closure = NULL immediately afterwards. I.e. it's not the _calling_ of destroy_notify->closure that is the problem: it does not even get that far, I checked by adding a printf. Also, the destroy notify would only be called after calling callback(), but the crash already happens at the callback() call itself.
Sounds like a memory allocation error in g_callable_info_prepare_closure which causes the stack to get trashed during subsequent calls. It might be happening in other systems where you are not seeing the crash but the way they allocate memory may be slightly different so the crash is not triggered. Try valgrinding the test cases.
I got back to this for a bit now. Some more observations:

* check.valgrind does not print out any errors during that one test case, until the crash happens (it does print tons of errors, but they also occur in other tests)
* g_callable_info_prepare_closure is called hundreds of times in e.g. test_callback_userdata() without any problem.
* As soon as the first DestroyNotify callback is being prepared by g_callable_info_prepare_closure(), the segfault happens; NB that this happens BEFORE the destroy_notify or any of the _free functions are called; it happens when trying to call the actual callback.
* After disabling the destroy_notify creation as in comment 15, the test_everything.TestCallbacks.test_callback_userdata_refcount() case fails: it successfully calls the callback 100 times, but then SIGBUSes in regress_test_callback_thaw_notifications().
* This does not happen on current Fedora 17 beta, which has pretty much the same versions of glib, g-i, and pygobject as I have here.

I'm beginning to think that this is some weird compiler version/compiler flags issue. F17 has an older libffi, but I tried an older one as well, and libffi has a gazillion tests. I'll continue working on this as time permits, but if someone else has any idea, I'm all ears :)
(In reply to comment #17)
> * After disabling the destroy_notify creation as in comment 15, the
> test_everything.TestCallbacks.test_callback_userdata_refcount() case fails: it
> successfully calls the callback 100 times, but then SIGBUSes in
> regress_test_callback_thaw_notifications().

Sorry, that was not true, it fails the first time. This test case does not crash when calling Everything.test_callback_user_data() instead of Everything.test_callback_destroy_notify(). So this confirms a problem with building the destroy_notify FFI closure.
I have another hour to spend on this, and I am making some progress. I looked at various versions of libffi, and that doesn't make a difference. I am now checking variations of python3.2. Keeping notes:

* Original upstream Python 3.2.3:
    ./configure --with-system-ffi --with-valgrind --with-system-expat
  test case works, no crash

* Python 3.2.3 with Debian/Ubuntu patches applied (mostly the same package, same patches):
    (configure same as above)
  test case works, no crash

* Original upstream Python 3.2.3 with Debian/Ubuntu compiler and configure options:
    CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security " LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions --with-dbmliborder=bdb:gdbm --with-wide-unicode --with-computed-gotos --with-system-expat --with-system-ffi --with-fpectl
  (this just drops --prefix=/usr)
  test case works, no crash

* Python 3.2.3 with Debian/Ubuntu patches applied and Debian/Ubuntu compiler and configure options (configure same as above):
  test case works, no crash

Argh .. so it's not the libffi version, not the Debian/Ubuntu patches, not the compiler flags (linker/security optimizations, etc.).
More tests:

* Python 3.2.3 with Debian/Ubuntu patches applied (mostly the same package, same patches) and running an extra make -j4 profile-opt step to rebuild with profile guided optimizations (it's what the Debian packages do): test case works, no crash
* python3.2 Debian package built with DEB_BUILD_OPTIONS=nocheck,noopt (which also disables profile guided optimizations): test case fails, crashing

So PGO does not seem to be the culprit either.

* building upstream python 3.2.3 with a separate build tree: test case works
Kees pointed out that I forgot -D_FORTIFY_SOURCE=2 before, so testing this:

* Original upstream Python 3.2.3 with Debian/Ubuntu compiler and configure options:
    CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions

The test case still works, so it's not that.
I went through the Debian packaging again, and collected the entirety of build/link flags now:

    CPPFLAGS="-D_FORTIFY_SOURCE=2" CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -g1 -flto -fuse-linker-plugin" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions --with-dbmliborder=bdb:gdbm --with-wide-unicode --with-computed-gotos --with-system-expat --with-system-ffi --with-fpectl --prefix=/usr --enable-shared
    make -j4
    make install DESTDIR=/tmp/p
    vi /tmp/p/usr/bin/python3.2-config   # adjust #! path to /tmp/p

When I run the tests with

    git clean -fdx; PYTHON=/tmp/p/usr/bin/python3.2 ./autogen.sh CFLAGS="-g -O0" && make -j4 && TEST_NAMES=test_everything.TestCallbacks make check

I get the crash. However, when I have it use the locally built python library, it works:

    LD_LIBRARY_PATH=/tmp/p/usr/lib/ TEST_NAMES=test_everything.TestCallbacks make check

Debian/Ubuntu installs the static build into /usr/bin/python3.2, i.e. that does not link to libpython3.2mu.so.1.0 itself. When running the tests with that, libpython is never loaded (as expected). So we can rule out the existence of the shared library and the new compiler flags "-g1 -flto -fuse-linker-plugin" as well.
Interesting observation: The failure depends on the python3.2 installation that was used when _building_ pygobject, not when _running_ it.

When I build pygobject against a configuration that works, then the test case still succeeds even when I run the tests against /usr/bin/python3.2 (i.e. the packaged one which triggers the bug). Conversely, if I configure/build pygobject against a "broken" python 3.2, then running it with a python interpreter from a "good" build still fails.

Comparing python3.2-config outputs for the working vs. failing python3.2 configuration:

--cflags:
  work: -I/tmp/pw/usr/include/python3.2mu -I/tmp/pw/usr/include/python3.2mu -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -g1 -flto -fuse-linker-plugin
  fail: -I/tmp/p/usr/include/python3.2mu -I/tmp/p/usr/include/python3.2mu -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security

--ldflags:
  work: -L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm -lpython3.2mu -Xlinker -export-dynamic
  fail: -L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm -lpython3.2mu -Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions

The other values are identical. However, as far as I can see, pygobject only uses python3.2-config --includes, which is identical for both builds. It does not use the sysconfig module heavily either, only in configure to determine the site-packages dir. Both the output of configure and "make V=1" are identical (except for the different $PYTHON paths) for both cases.
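Comparing long flag strings by eye is error-prone, so a small helper can list the tokens unique to each side. This is only a sketch: `flag_diff` is a hypothetical helper name, and the two strings are the --ldflags values quoted in this comment.

```python
import shlex


def flag_diff(work, fail):
    """Return (only_in_work, only_in_fail) for two flag strings.

    Order of first appearance is preserved and duplicates are dropped.
    """
    work_tokens = shlex.split(work)
    fail_tokens = shlex.split(fail)

    def unique_to(tokens, other):
        seen = set()
        result = []
        for token in tokens:
            if token not in other and token not in seen:
                seen.add(token)
                result.append(token)
        return result

    return (unique_to(work_tokens, set(fail_tokens)),
            unique_to(fail_tokens, set(work_tokens)))


# --ldflags output of the working and the failing build, as quoted above.
WORK_LDFLAGS = ("-L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm "
                "-lpython3.2mu -Xlinker -export-dynamic")
FAIL_LDFLAGS = ("-L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm "
                "-lpython3.2mu -Xlinker -export-dynamic "
                "-Wl,-O1 -Wl,-Bsymbolic-functions")

only_work, only_fail = flag_diff(WORK_LDFLAGS, FAIL_LDFLAGS)
```

For these two strings, `only_work` is empty and `only_fail` holds the two extra `-Wl,` options.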
(In reply to comment #23)
> Interesting observation: The failure depends on the python3.2 installation that
> was used when _building_ pygobject, not when _running_ it.
>
> When I build pygobject against a configuration that works, then the test case
> still succeeds even when I run the tests against /usr/bin/python3.2 (i. e. the
> packaged one which triggers the bug). Conversely, if I configure/build
> pygobject against a "broken" python 3.2, then running it with a python
> interpreter from a "good" build still fails.

OK, this was a red herring: my previous attempt to hack tests/Makefile.am (replacing $(PYTHON) with $$PYTHON and setting it when running "make check") does not work. When I replace $(PYTHON) in the runtests.py call with the actual path (e.g. /tmp/p/usr/bin/python3.2), it fails with the "bad" python 3.2 build and works with the "good" python 3.2 build.

So back to the previous state: There is something wrong with how Debian/Ubuntu builds the python 3.2 package, or it uses an option which causes this bug in the upstream code.
The Debian package runs this snippet between configure and make, to configure static (builtin) vs. dynamic (extensions) modules:

    egrep \
      "^#($$(awk -v ORS='|' '$$2 ~ /^extension$$/ {print $$1}' debian/PVER-minimal.README.Debian.in)XX)" \
      Modules/Setup.dist \
      | sed -e 's/^#//' -e 's/-Wl,-Bdynamic//;s/-Wl,-Bstatic//' \
      >> $(1)/Modules/Setup.local

When I disable this part, the local build works. That code results in this Modules/Setup.local:

----------------- 8< ---------------------
# Edit this file for local setup changes
array arraymodule.c # array objects
math mathmodule.c _math.c # -lm # math library functions, e.g. sin()
_struct _struct.c # binary structure packing/unpacking
time timemodule.c _time.c # -lm # time operations and variables
_random _randommodule.c # Random number generator
atexit atexitmodule.c # Register functions to be run at interpreter-shutdown
_elementtree -DUSE_PYEXPAT_CAPI _elementtree.c # elementtree accelerator
_pickle _pickle.c # pickle accelerator
_datetime _datetimemodule.c # datetime accelerator
_bisect _bisectmodule.c # Bisection algorithms
_heapq _heapqmodule.c # Heap queue algorithm
unicodedata unicodedata.c # static Unicode character database
fcntl fcntlmodule.c # fcntl(2) and ioctl(2)
spwd spwdmodule.c # spwd(3)
grp grpmodule.c # grp(3)
select selectmodule.c # select(2); not on ancient System V
_socket socketmodule.c
_ssl _ssl.c -lssl -lcrypto
_posixsubprocess _posixsubprocess.c # POSIX subprocess module helper
_hashlib _hashopenssl.c -lssl -lcrypto
syslog syslogmodule.c # syslog daemon interface
binascii binascii.c
_ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c _ctypes/malloc_closure.c -lffi
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
pyexpat pyexpat.c -lexpat
----------------- 8< ---------------------

Bisecting this quickly leads to the _ctypes module configuration.
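For readers who find the egrep/awk/sed pipeline hard to parse, its effect can be modelled roughly as follows. This is a sketch under assumptions: `select_setup_lines` is a hypothetical name, and the README is assumed to consist of whitespace-separated "module kind" lines, which is what the awk expression suggests but which I have not verified against the real file.

```python
import re


def select_setup_lines(readme_lines, setup_dist_lines):
    """Model of the packaging snippet (assumed input formats, see above).

    1. awk: collect module names whose second field is exactly "extension".
    2. egrep: keep Setup.dist lines starting with "#<name>".
    3. sed: drop the leading "#" and the first -Wl,-Bdynamic / -Wl,-Bstatic.
    """
    names = []
    for line in readme_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "extension":
            names.append(fields[0])
    if not names:
        return []

    # Prefix match, like the unanchored tail of the egrep pattern.
    pattern = re.compile("^#(" + "|".join(re.escape(n) for n in names) + ")")

    selected = []
    for line in setup_dist_lines:
        if pattern.search(line):
            line = line[1:]
            line = line.replace("-Wl,-Bdynamic", "", 1)
            line = line.replace("-Wl,-Bstatic", "", 1)
            selected.append(line)
    return selected
```

The selected lines are what gets appended to Modules/Setup.local, which is how the _ctypes line ends up there.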
So a small reproducer based on the upstream Python 3.2 build is:

* Configure and build pygobject with PYTHON=python3.2 (or specify the full path for a local python build)

    $ cd /tmp/
    $ tar xf python3.2_3.2.3.orig.tar.gz
    $ cd python3.2-3.2.3
    $ ./configure --with-wide-unicode   # to be compatible with the Debian/Ubuntu python 3.2
    $ echo '_ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c _ctypes/malloc_closure.c -lffi' >> Modules/Setup.local
    $ make -j4

Then in the pygobject build tree, run

    $ (cd tests; TEST_NAMES=test_everything.TestCallbacks PYTHONPATH=..:. LD_LIBRARY_PATH=./.libs GI_TYPELIB_PATH=. /tmp/python3.2-3.2.3/python ./runtests.py)

which reproduces the failure.

Conversely, if I change the ctypes line in debian/PVER-minimal.README.Debian.in from

    _ctypes extension

to

    _ctypes builtin

and build/install the packages, everything works.

At this point I'm convinced enough that this is a bug in python3.2, or in how Debian configures the ctypes module, so I'm closing the pygobject side and will continue this on the Debian/Ubuntu bugs:

https://launchpad.net/bugs/909292
http://bugs.debian.org/665359
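To check how a given interpreter carries _ctypes, something like the following can be run with it. This is a sketch: `module_linkage` is a hypothetical helper, it needs a newer Python 3 than the 3.2 discussed here, and its labels follow Python's own terminology (compiled into the interpreter vs. loaded from a shared object), which I have not mapped onto the labels used in the Debian README.

```python
import importlib.machinery
import importlib.util
import sys


def module_linkage(name):
    """Classify how the running interpreter provides a top-level module.

    Returns "builtin" (compiled into the interpreter), "extension"
    (separate shared object), "missing", or "other" (e.g. pure Python).
    """
    if name in sys.builtin_module_names:
        return "builtin"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"
    return "other"
```

Calling `module_linkage("_ctypes")` under a "good" and a "bad" interpreter should show whether the two builds differ in this respect.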
*** Bug 676283 has been marked as a duplicate of this bug. ***
*** Bug 676325 has been marked as a duplicate of this bug. ***
*** Bug 676264 has been marked as a duplicate of this bug. ***