Bug 669157 - Crashes when C calls Python callback with Python3
Status: RESOLVED NOTGNOME
Product: pygobject
Classification: Bindings
Component: introspection
Version: 3.0.x
Hardware: Other Linux
Importance: Urgent critical
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Duplicates: 669968 674475 676264 676283 676325
Depends on:
Blocks:
 
 
Reported: 2012-02-01 10:21 UTC by kalanzun
Modified: 2012-05-19 08:21 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Test code causing the crash with python3 (908 bytes, text/x-python)
2012-02-01 10:21 UTC, kalanzun

Description kalanzun 2012-02-01 10:21:26 UTC
Created attachment 206560
Test code causing the crash with python3

The attached test code uses column.set_cell_data_func to register a cell data func. It works with python2.7 but crashes with python3. The crash happens when Gtk tries to call the cell_data_func, in

  • #1 apply_cell_attributes
    at gtk/gtkcellarea.c line 1284

(gdb) print info->func
$3 = (GtkCellLayoutDataFunc) 0x7ffff7fdffc0

I've tested pygobject-3.0.3, both a self compiled version and the debian version.
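
The attachment itself is not reproduced in this report. As a rough illustration of the pattern described, here is a minimal sketch of registering a Python cell data func on a Gtk.TreeViewColumn. The function name build_window, the column title, and the row contents are made up for this sketch and are not taken from the attachment; it assumes PyGObject with GTK 3 is installed.

```python
def build_window():
    """Build a window whose tree column renders text through a Python callback."""
    # Imported lazily so this sketch can be loaded on machines without GTK.
    import gi
    gi.require_version("Gtk", "3.0")
    from gi.repository import Gtk

    def cell_data_func(column, cell, model, tree_iter, data):
        # GTK calls this from C for every row it renders; this is the
        # C-to-Python callback path that the report is about.
        cell.set_property("text", model.get_value(tree_iter, 0).upper())

    # Hypothetical sample rows, purely for illustration.
    store = Gtk.ListStore(str)
    for name in ("foo", "bar"):
        store.append([name])

    renderer = Gtk.CellRendererText()
    column = Gtk.TreeViewColumn("Name", renderer)
    column.set_cell_data_func(renderer, cell_data_func, None)

    view = Gtk.TreeView(model=store)
    view.append_column(column)

    window = Gtk.Window()
    window.connect("destroy", Gtk.main_quit)
    window.add(view)
    return window
```

Calling build_window().show_all() followed by Gtk.main() would make GTK invoke cell_data_func as soon as the rows are drawn.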
Comment 1 Sebastian Pölsterl 2012-02-11 10:46:36 UTC
Thanks for reporting this bug. I just tried your sample with latest version from master and everything worked fine. Could you please try it with version 3.1.0 or later.
Comment 2 Iven 2012-02-13 05:24:13 UTC
I've tested this on Ubuntu (pygobject-3.1.0), and the crash still occurs.

But it seems related to Debian/Ubuntu, since it doesn't appear on Arch Linux (pygobject-3.0.3).
Comment 3 Martin Pitt 2012-02-13 06:23:41 UTC
Iven, I tested it on current Ubuntu 12.04 (the only version which has 3.1.0) on amd64, and your example works just fine. Which Ubuntu version/architecture are you using?
Comment 4 Iven 2012-02-13 06:28:07 UTC
(In reply to comment #3)
> Iven, I tested it on current Ubuntu 12.04 (the only version which has 3.1.0)
> on amd64, and your example works just fine. Which Ubuntu version/architecture
> are you using?

I'm also using Ubuntu 12.04 amd64. Are you using python3? Python2 doesn't have this problem.
Comment 5 Martin Pitt 2012-02-13 06:55:50 UTC
Right, sorry. I did use "python" (2.7) in my previous test. I confirm the crash with python3.
Comment 6 Martin Pitt 2012-02-15 08:13:22 UTC
I don't get the same backtrace as you, but this example program does crash. It also happens in current jhbuild, so it doesn't look like a Debian/Ubuntuism.

I also noticed that when building with Python 3, the testCallbackDestroyNotify test case crashes with SIGILL. Your example is also a kind of callback, so it's possible that these are related.

3.0.3 and 3.0.0 also crash in jhbuild in that case, so it's not a recent regression in pygobject itself; presumably something in g-i or glib changed underneath. I'll have a closer look at this.
Comment 7 Martin Pitt 2012-02-15 09:02:10 UTC
I tried downgrading g-i and glib, but no luck: the test case keeps crashing.

Iven, which versions of gobject-introspection, glib, and GTK are you using on Arch?
Comment 8 Iven 2012-02-15 09:31:03 UTC
I'm using:

gobject-introspection 1.30.0
glib2 2.30.2
gtk3 3.2.3
Comment 9 Martin Pitt 2012-02-15 11:42:53 UTC
Debian sid also has glib2 2.30.2 and gtk3 3.2.3, and the test crashes there.

I grabbed the upstream tarballs for gobject-introspection 1.30.0 and gtk3 3.2.3, and ran this in a Debian sid environment, and the attached test code still crashes. This is with Python 3.2.2. So perhaps Arch Linux has some patches which make this work? Or perhaps it uses a different python3 version?
Comment 10 Sebastian Pölsterl 2012-02-15 12:22:10 UTC
Just tried it on Fedora 16, no crash there:

pygobject 3.0.3
glib 2.30.2
gobject-introspection 1.30.0
Gtk+ 3.2.3
Python 3.2.1
Comment 11 Martin Pitt 2012-02-15 16:51:46 UTC
Tried on Ubuntu 11.10, segfaults:

pygobject 3.0.0 (with some patches backported from 3.0.x)
glib 2.30.0
gobject-introspection 1.30.0
Gtk+ 3.2.0
Python 3.2.2

I also tried Python 3.2.1 on both Ubuntu 11.10 and 12.04, no difference.

As a data point, this sometimes crashes with a SIGSEGV and sometimes with a SIGILL, but under gdb always with SIGILL. Somewhere in g_closure_invoke() ... g_hash_table_foreach().

Need to leave for today, I hope I can find some time to debug this soon. (Any help appreciated, of course).
Comment 12 Martin Pitt 2012-04-23 16:47:22 UTC
Got another duplicate, bug 674475. This is already exposed by the test suite:

test_callback_destroy_notify (test_everything.TestCallbacks) ... make[2]: *** [check-local] Segmentation fault (core dumped)
Comment 13 Martin Pitt 2012-04-23 16:47:28 UTC
*** Bug 674475 has been marked as a duplicate of this bug. ***
Comment 14 Martin Pitt 2012-04-23 16:49:40 UTC
*** Bug 669968 has been marked as a duplicate of this bug. ***
Comment 15 Martin Pitt 2012-04-25 14:20:27 UTC
Keeping some notes for myself.

I debugged this a little further. gdb'ing at the time when the crash happens is unfortunately rather useless. You get up to the point when it tries to call callback() and then a SIGILL/SIGSEGV, but at this point it's a typeless pointer, so it's not easy to say whether the callback pointer points to something sensible.

However, this only affects functions which get a callback and a GDestroyNotify; callbacks with other scopes work fine.

Drilling down a little further, the segfault goes away if I hack the end of _pygi_marshal_from_py_interface_callback() to replace


    if (destroy_cache) {
        PyGICClosure *destroy_notify = _pygi_destroy_notify_create ();
        state->in_args[destroy_cache->c_arg_index].v_pointer = destroy_notify->closure;
    }

with

    if (destroy_cache) {
        state->in_args[destroy_cache->c_arg_index].v_pointer = NULL;
    }

or even further, hacking _pygi_destroy_notify_create() to disable this line

        destroy_notify->closure = g_callable_info_prepare_closure ( (GICallableInfo*) glib_destroy_notify,
                                                                    &destroy_notify->cif,
                                                                    _pygi_destroy_notify_callback_closure,
                                                                    NULL);

and replace it with

        destroy_notify->closure = NULL;

Interestingly, it still crashes if I leave the g_callable_info_prepare_closure() call active and set destroy_notify->closure = NULL immediately afterwards. I.e. the problem is not the _calling_ of destroy_notify->closure: it does not even get that far, which I checked by adding a printf. Also, the destroy notify would only be called after callback() has been called, but the crash happens at the call to callback() itself.
Comment 16 johnp 2012-04-25 15:43:22 UTC
Sounds like a memory allocation error in g_callable_info_prepare_closure which causes the stack to get trashed during subsequent calls.  It might be happening in other systems where you are not seeing the crash but the way they allocate memory may be slightly different so the crash is not triggered.  Try valgrinding the test cases.
Comment 17 Martin Pitt 2012-04-26 08:10:30 UTC
I got back to this for a bit now. Some more observations:

 * check.valgrind does not print out any errors during that one test case, until the crash happens (it does print tons of errors, but they also occur in other tests)

 * g_callable_info_prepare_closure is called hundreds of times in e.g. test_callback_userdata() without any problem.

 * As soon as the first DestroyNotify callback is being prepared by g_callable_info_prepare_closure(), the segfault happens; NB that this happens BEFORE the destroy_notify or any of the _free functions are called; it happens when trying to call the actual callback.

 * After disabling the destroy_notify creation as in comment 15, the test_everything.TestCallbacks.test_callback_userdata_refcount() case fails: it successfully calls the callback 100 times, but then SIGBUSes in regress_test_callback_thaw_notifications().

 * This does not happen on current Fedora 17 beta, which has pretty much the same versions of glib, g-i, and pygobject as I have here. I'm beginning to think that this is some weird compiler version/compiler flags issue. F17 has an older libffi, but I tried an older one as well, and libffi has a gazillion tests.

I'll continue working on this as time permits, but if someone else has any idea, I'm all ears :)
Comment 18 Martin Pitt 2012-04-26 08:24:57 UTC
(In reply to comment #17)
>  * After disabling the destroy_notify creation as in comment 15, the
> test_everything.TestCallbacks.test_callback_userdata_refcount() case fails: it
> successfully calls the callback 100 times, but then SIGBUSes in
> regress_test_callback_thaw_notifications().

Sorry, that was not true, it fails the first time.

This test case does not crash when calling Everything.test_callback_user_data() instead of Everything.test_callback_destroy_notify(). So this confirms a problem with building the destroy_notify FFI closure.
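
For readers unfamiliar with the term: an FFI closure is a small piece of generated machine code that lets C call back into a higher-level function through libffi. Python's own ctypes module builds closures of this kind whenever a Python function is handed to C as a callback. The sketch below shows that mechanism with the C library's qsort(); it is only an illustration of the concept, not the pygobject code path, and it assumes a POSIX system where ctypes.CDLL(None) exposes the C library. The name sort_with_c is made up for this example.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) gives access to the already loaded C library.
libc = ctypes.CDLL(None)

# C signature of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def sort_with_c(values):
    """Sort integers with C qsort(), which calls back into Python to compare."""
    calls = []

    def compare(a, b):
        # Runs inside qsort(): C jumps into this Python function via the closure.
        calls.append((a[0], b[0]))
        return (a[0] > b[0]) - (a[0] < b[0])

    array = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(compare)  # keep a reference so the closure stays alive
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array), len(calls)
```

If the closure machinery is broken, a call like sort_with_c([5, 1, 7]) is the kind of place where the process would die inside C rather than raise a Python exception.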
Comment 19 Martin Pitt 2012-04-27 07:56:23 UTC
I have another hour to spend on this, and making some progress. I looked at various versions of libffi, and that doesn't make a difference.

I am now checking variations of python3.2. Keeping notes:

 * Original upstream Python 3.2.3: 

   ./configure --with-system-ffi --with-valgrind --with-system-expat

   test case works, no crash

 * Python 3.2.3 with Debian/Ubuntu patches applied (mostly the same package, same patches): (configure same as above)

  test case works, no crash

 * Original upstream Python 3.2.3 with Debian/Ubuntu compiler and configure options:

  CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security " LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions --with-dbmliborder=bdb:gdbm --with-wide-unicode --with-computed-gotos --with-system-expat  --with-system-ffi --with-fpectl

  (this just drops --prefix=/usr)

  test case works, no crash

 * Python 3.2.3 with Debian/Ubuntu patches applied and Debian/Ubuntu compiler and configure options (configure same as above):

  test case works, no crash

Argh .. so it's not the libffi version, not the Debian/Ubuntu patches, not the compiler flags (linker/security optimizations, etc.).
Comment 20 Martin Pitt 2012-04-27 09:36:20 UTC
More tests:

 * Python 3.2.3 with Debian/Ubuntu patches applied (mostly the same package,
same patches) and running an extra

  make -j4 profile-opt

  step to rebuild with profile guided optimizations (it's what the Debian packages do):

  test case works, no crash

 * python3.2 Debian package built with DEB_BUILD_OPTIONS=nocheck,noopt (which also disables profile guided optimizations):

  test case fails, crashing

So PGO does not seem to be the culprit either.

 * building upstream python 3.2.3 with a separate build tree:

  test case works
Comment 21 Martin Pitt 2012-05-02 09:42:09 UTC
Kees pointed out that I forgot -D_FORTIFY_SOURCE=2 before, so testing this:

 * Original upstream Python 3.2.3 with Debian/Ubuntu compiler and configure
options:

  CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions

The test case still works, so it's not that.
Comment 22 Martin Pitt 2012-05-03 05:39:17 UTC
I went through the Debian packaging again, and collected the entirety of build/link flags now:

CPPFLAGS="-D_FORTIFY_SOURCE=2" CFLAGS="-g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -g1 -flto -fuse-linker-plugin" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro" ./configure --enable-ipv6 --enable-loadable-sqlite-extensions --with-dbmliborder=bdb:gdbm --with-wide-unicode --with-computed-gotos --with-system-expat  --with-system-ffi --with-fpectl --prefix=/usr --enable-shared

make -j4 
make install DESTDIR=/tmp/p
vi /tmp/p/usr/bin/python3.2-config   # adjust #! path to /tmp/p

When I run the tests with

  git clean -fdx; PYTHON=/tmp/p/usr/bin/python3.2 ./autogen.sh CFLAGS="-g -O0" && make -j4 && TEST_NAMES=test_everything.TestCallbacks make check

I get the crash. However, when I have it use the locally built python library, it works:

  LD_LIBRARY_PATH=/tmp/p/usr/lib/ TEST_NAMES=test_everything.TestCallbacks make check

Debian/Ubuntu installs the static build into /usr/bin/python3.2, i. e. that does not link to libpython3.2mu.so.1.0 itself. When running the tests with that, libpython is never loaded (as expected).

So we can rule out the existence of the shared library and the new compiler flags "-g1 -flto -fuse-linker-plugin" as well.
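
To double-check from inside a running interpreter whether a shared libpython is mapped at all, something like the following could be used. This is a Linux-only sketch that reads /proc/self/maps; the function name mapped_libpython is made up here, and an empty result is only a hint that the interpreter was linked statically.

```python
import os


def mapped_libpython(maps_path="/proc/self/maps"):
    """Return the libpython shared objects mapped into this process.

    Returns None when the maps file is unavailable (non-Linux systems).
    """
    if not os.path.exists(maps_path):
        return None
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode pathname
            fields = line.rstrip("\n").split(None, 5)
            if len(fields) == 6 and "libpython" in os.path.basename(fields[5]):
                found.add(fields[5])
    return sorted(found)
```

Running it under the interpreter in question and under one started with LD_LIBRARY_PATH set shows which libpython, if any, each of them actually loads.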
Comment 23 Martin Pitt 2012-05-03 07:49:34 UTC
Interesting observation: The failure depends on the python3.2 installation that was used when _building_ pygobject, not when _running_ it.

When I build pygobject against a configuration that works, then the test case still succeeds even when I run the tests against /usr/bin/python3.2 (i. e. the packaged one which triggers the bug). Conversely, if I configure/build pygobject against a "broken" python 3.2, then running it with a python interpreter from a "good" build still fails.

Comparing python3.2-config outputs for the working vs. failing python3.2 configuration:

--cflags:
work:
-I/tmp/pw/usr/include/python3.2mu -I/tmp/pw/usr/include/python3.2mu -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -g1 -flto -fuse-linker-plugin

fail:
-I/tmp/p/usr/include/python3.2mu -I/tmp/p/usr/include/python3.2mu -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security

--ldflags:
work:
-L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm -lpython3.2mu -Xlinker -export-dynamic

fail:
-L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm -lpython3.2mu -Xlinker -export-dynamic -Wl,-O1 -Wl,-Bsymbolic-functions

The other values are identical.

However, as far as I can see, pygobject only uses python3.2-config --includes, which is identical for both builds. It does not use the sysconfig module heavily either, only in configure to determine the site-packages dir.

Both the output of configure and "make V=1" are identical (except for the different $PYTHON paths) for both cases.
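
Comparing such flag lists by eye is error-prone. A small helper along these lines makes the difference explicit; it is just a convenience sketch (flag_diff is a made-up name), shown here with the --ldflags values quoted above.

```python
import shlex


def flag_diff(work, fail):
    """Return (flags only in the working build, flags only in the failing build)."""
    work_flags = shlex.split(work)
    fail_flags = shlex.split(fail)
    only_work = [flag for flag in work_flags if flag not in fail_flags]
    only_fail = [flag for flag in fail_flags if flag not in work_flags]
    return only_work, only_fail


WORK_LDFLAGS = (
    "-L/usr/lib/python3.2/config-3.2mu -lpthread -ldl -lutil -lm "
    "-lpython3.2mu -Xlinker -export-dynamic"
)
FAIL_LDFLAGS = WORK_LDFLAGS + " -Wl,-O1 -Wl,-Bsymbolic-functions"

print(flag_diff(WORK_LDFLAGS, FAIL_LDFLAGS))
# → ([], ['-Wl,-O1', '-Wl,-Bsymbolic-functions'])
```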
Comment 24 Martin Pitt 2012-05-03 08:00:20 UTC
(In reply to comment #23)
> Interesting observation: The failure depends on the python3.2 installation that
> was used when _building_ pygobject, not when _running_ it.
> 
> When I build pygobject against a configuration that works, then the test case
> still succeeds even when I run the tests against /usr/bin/python3.2 (i. e. the
> packaged one which triggers the bug). Conversely, if I configure/build
> pygobject against a "broken" python 3.2, then running it with a python
> interpreter from a "good" build still fails.

OK, this was a red herring, my previous attempt to hack tests/Makefile.am (replacing $(PYTHON) with $$PYTHON and setting it when running "make check") does not work. When I replace $(PYTHON) in the runtests.py call with the actual path (e. g. /tmp/p/usr/bin/python3.2) it fails with the "bad" python 3.2 build and works with the "good" python 3.2 build.

So back to the previous state: There is something wrong with how Debian/Ubuntu builds the python 3.2 package, or it uses an option which causes this bug in the upstream code.
Comment 25 Martin Pitt 2012-05-03 10:04:47 UTC
The Debian package runs this snippet between configure and make, to configure static (builtin) vs. dynamic (extensions) modules:

        egrep \
          "^#($$(awk -v ORS='|' '$$2 ~ /^extension$$/ {print $$1}' debian/PVER-minimal.README.Debian.in)XX)" \
          Modules/Setup.dist \
            | sed -e 's/^#//' -e 's/-Wl,-Bdynamic//;s/-Wl,-Bstatic//' \
            >> $(1)/Modules/Setup.local

When I disable this part, the local build works.

That code results in this Modules/Setup.local:

----------------- 8< ---------------------
# Edit this file for local setup changes
array arraymodule.c     # array objects
math mathmodule.c _math.c # -lm # math library functions, e.g. sin()
_struct _struct.c       # binary structure packing/unpacking
time timemodule.c _time.c # -lm # time operations and variables
_random _randommodule.c # Random number generator
atexit atexitmodule.c      # Register functions to be run at interpreter-shutdown
_elementtree -DUSE_PYEXPAT_CAPI _elementtree.c  # elementtree accelerator
_pickle _pickle.c       # pickle accelerator
_datetime _datetimemodule.c     # datetime accelerator
_bisect _bisectmodule.c # Bisection algorithms
_heapq _heapqmodule.c   # Heap queue algorithm
unicodedata unicodedata.c    # static Unicode character database
fcntl fcntlmodule.c     # fcntl(2) and ioctl(2)
spwd spwdmodule.c               # spwd(3) 
grp grpmodule.c         # grp(3)
select selectmodule.c   # select(2); not on ancient System V
_socket socketmodule.c
_ssl _ssl.c -lssl -lcrypto
_posixsubprocess _posixsubprocess.c  # POSIX subprocess module helper
_hashlib _hashopenssl.c -lssl -lcrypto
syslog syslogmodule.c           # syslog daemon interface
binascii binascii.c
_ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c _ctypes/malloc_closure.c  -lffi
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
pyexpat pyexpat.c -lexpat
----------------- 8< ---------------------

Bisecting this quickly leads to the _ctypes module configuration.
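
The bisection can be sketched as follows. This assumes exactly one line of Setup.local is responsible and that the crash depends only on whether that line is present. The crashes argument is a stand-in: in practice it would write the given lines to Modules/Setup.local, rebuild Python, and run the failing test. The names find_culprit and crashes are made up for this sketch.

```python
def find_culprit(lines, crashes):
    """Bisect a list of Setup.local lines down to the one that triggers the crash.

    `crashes(subset)` must return True if a build using only `subset` crashes.
    """
    candidates = list(lines)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        # Keep whichever half still reproduces the crash.
        candidates = first_half if crashes(first_half) else candidates[middle:]
    return candidates[0]


# Stand-in predicate for illustration only; a real one would rebuild and test.
def fake_crashes(subset):
    return any(line.startswith("_ctypes ") for line in subset)


modules = [
    "array arraymodule.c",
    "math mathmodule.c _math.c",
    "_struct _struct.c",
    "_socket socketmodule.c",
    "_ssl _ssl.c -lssl -lcrypto",
    "_ctypes _ctypes/_ctypes.c -lffi",
    "zlib zlibmodule.c -lz",
    "pyexpat pyexpat.c -lexpat",
]
print(find_culprit(modules, fake_crashes))
# → _ctypes _ctypes/_ctypes.c -lffi
```

With 26 lines in the file, this needs about five rebuilds instead of one per module.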


So a small reproducer based on the upstream Python 3.2 build is:

 * Configure and build pygobject with PYTHON=python3.2  (or specify the full path for a local python build)

$ cd /tmp/
$ tar xf python3.2_3.2.3.orig.tar.gz
$ cd python3.2-3.2.3
$ ./configure --with-wide-unicode    # to be compatible with the Debian/Ubuntu python 3.2
$ echo '_ctypes _ctypes/_ctypes.c _ctypes/callbacks.c _ctypes/callproc.c _ctypes/stgdict.c _ctypes/cfield.c _ctypes/malloc_closure.c  -lffi' >> Modules/Setup.local
$ make -j4

Then in the pygobject build tree, run

$ (cd tests; TEST_NAMES=test_everything.TestCallbacks PYTHONPATH=..:. LD_LIBRARY_PATH=./.libs GI_TYPELIB_PATH=.  /tmp/python3.2-3.2.3/python ./runtests.py)

which reproduces the failure.

Conversely, if I change the ctypes line in debian/PVER-minimal.README.Debian.in from

  _ctypes               extension

to

  _ctypes               builtin

and build/install the packages, everything works.
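
To see which kind of _ctypes a given interpreter has, the standard sys.builtin_module_names tuple can be checked. This is a small diagnostic sketch (ctypes_build_kind is a made-up name); it only reports how the module is provided and does not by itself detect the bug.

```python
import sys


def ctypes_build_kind():
    """Report whether _ctypes is compiled into the interpreter or loaded separately."""
    if "_ctypes" in sys.builtin_module_names:
        return "builtin"
    try:
        import _ctypes
    except ImportError:
        return "missing"
    # Separately built modules normally expose the path of their shared object.
    return "extension: " + getattr(_ctypes, "__file__", "(unknown path)")


print(ctypes_build_kind())
```

Running this with both the packaged python3.2 and a local build is a quick way to confirm that the two interpreters really differ in how _ctypes is linked.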

At this point I'm convinced enough that this is a bug in python3.2, or in how Debian configures the ctypes module, so I'm closing the pygobject side and will continue this on the Debian/Ubuntu bugs:

https://launchpad.net/bugs/909292
http://bugs.debian.org/665359
Comment 26 Martin Pitt 2012-05-19 08:19:35 UTC
*** Bug 676283 has been marked as a duplicate of this bug. ***
Comment 27 Martin Pitt 2012-05-19 08:19:44 UTC
*** Bug 676325 has been marked as a duplicate of this bug. ***
Comment 28 Martin Pitt 2012-05-19 08:21:23 UTC
*** Bug 676264 has been marked as a duplicate of this bug. ***