Bug 701692 - g_callable_info_get_n_args assertion fail -> core dump
Status: RESOLVED DUPLICATE of bug 688694
Product: pygobject
Classification: Bindings
Component: introspection
Version: 3.8.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody's working on this now (help wanted and appreciated)
QA Contact: Python bindings maintainers
Depends on:
Blocks:
 
 
Reported: 2013-06-06 03:55 UTC by Kip
Modified: 2014-02-10 19:06 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Backtrace (132.75 KB, text/plain)
2014-02-08 21:14 UTC, Benjamin Berg

Description Kip 2013-06-06 03:55:39 UTC
I'm getting a rare, unpredictable error that is difficult to replicate. A secondary Python worker thread occasionally triggers the following:

        ** (Main.py:32524): CRITICAL **: g_callable_info_get_n_args:
        assertion `GI_IS_CALLABLE_INFO (info)' failed
        **
        ERROR:girepository/gicallableinfo.c:150:g_callable_info_is_method: code should not be reached
        Aborted (core dumped)

With something like that, I'm totally stumped on where to begin
debugging, but I'm guessing this could be a GObject introspection bug. 

One thing is clear: I do call the following before doing anything with threads:

        GObject.threads_init()
        Gdk.threads_init() 

At the time of the crash, the following worker thread was known to be executing:

<https://bazaar.launchpad.net/~avaneya/avaneya/trunk/view/head:/Extras/VLR/Launcher/Source/VerificationProgressPage.py#L156>
Comment 1 Kip 2013-06-06 03:56:59 UTC
If I'm not mistaken, I believe this is where the assertion is failing:

<https://github.com/magcius/gobject-introspection/blob/master/girepository/gicallableinfo.c#L288>
Comment 2 Kip 2013-06-10 03:56:55 UTC
Since I haven't been able to solve this issue yet, I'm going to refactor my code to use asynchronous I/O without a separate thread, because I've only noticed this issue when the worker thread is running. As a result, the link in the description will no longer be relevant. This is a link to the last revision using the Python threading model:

<https://bazaar.launchpad.net/~avaneya/avaneya/trunk/view/306/Extras/VLR/Launcher/Source/VerificationProgressPage.py#L156>
Comment 3 Simon Feltman 2013-06-11 01:58:14 UTC
Hi Kip,

This is most likely a PyGObject bug. The assertion fires when bad data is passed to the function, but it is hard to tell exactly what is going on.

I looked through the code and nothing stood out as problematic. Some of the following info would help in addressing this issue:

* What version of PyGObject is being used? A number of memory bugs have been fixed in 3.8, so it might be worthwhile to test there.
* Both C and Python stack traces would be useful.
* A simpler script which just shows the bug (something running in a loop if it is intermittent), or at least very clear instructions for how to install and launch the app to reproduce the problem, would also be required to get any traction on this. A simplified, self-contained script is preferred.
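
For an intermittent, thread-related failure, a reproducer of the kind requested above is often little more than a loop run from several threads at once. The following is a minimal sketch using only the standard library. `stress` and `workload` are hypothetical names introduced here, not part of PyGObject; the suspect call (for example, whatever the worker thread was doing) would go inside `workload`. Note that a C-level abort like the one in the description kills the whole process and is not caught as a Python exception.

```python
import threading


def stress(workload, n_threads=8, iterations=1000):
    """Call workload() repeatedly from several threads.

    Returns the list of Python exceptions raised by workload().
    A C-level abort terminates the process and is not reported here.
    """
    errors = []
    errors_lock = threading.Lock()
    start = threading.Barrier(n_threads)

    def runner():
        # Release all threads together to increase contention.
        start.wait()
        for _ in range(iterations):
            try:
                workload()
            except Exception as exc:
                with errors_lock:
                    errors.append(exc)

    threads = [threading.Thread(target=runner) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Running `stress(my_suspect_call)` in an outer loop until the process aborts is one way to turn a rare crash into something that can be attached to a bug report.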
Comment 4 Simon Feltman 2013-06-11 02:00:47 UTC
Moving to PyGObject.

Also knowing the system architecture would be helpful here.
Comment 5 Kip 2013-06-11 02:13:01 UTC
Hey Simon. Glad to be of assistance. Since I am refactoring the code that exhibited this problem to no longer use Python threads (using asynchronous I/O instead), I hopefully won't have this issue anymore. However, I still hope we manage to nail it in case anyone else is hitting it.

Like you, I can't really figure out what the issue is, but I suspect it's something lower level. I say that because, looking at the assertion in gicallableinfo.c, that code should probably never have been reached, even with broken high-level Python code. Since the Python interpreter isn't actually multithreaded and Python threads are built on top of that, I wonder if that might have had something to do with it.

My system architecture is amd64. The package version of python-gi is 3.8.0-2ubuntu1 (Xubuntu Raring, 13.04).

If I get a chance, I'll try to produce a minimal test case. Knowing what the minimal code needs to be would be much easier with a stack trace, though. I'd love to get you one, but I don't know how. With compiled code, it's more straightforward. With the Python code, it just bails with the above assertion failure. It's also harder to run it directly from within a Python interpreter session, because the Python code is bootstrapped from a bash script (via XDG autostart).

Let me know if I can help with anything else.
Comment 6 Simon Feltman 2013-06-17 14:18:33 UTC
Moving importance down for now due to a lack of test case and reproducibility.
Comment 7 Simon Feltman 2013-06-17 14:25:27 UTC
For generating a stack trace, you might have a look at https://live.gnome.org/GettingTraces

You should be able to point gdb at "python script.py" for example to get the C stack trace.
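
For the Python side of the stack trace, one option is the standard library's `faulthandler` module. The sketch below assumes Python 3.3 or later, which may not match the interpreter used in this report; `enable_crash_tracebacks` and `snapshot_all_threads` are hypothetical helper names. Since the crash in the description ends in "Aborted", it is presumably delivered as SIGABRT, which is one of the signals `faulthandler.enable` handles.

```python
import faulthandler
import tempfile


def enable_crash_tracebacks(log_file):
    """Dump the Python stack of every thread to log_file on a fatal
    signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).

    log_file must stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=log_file, all_threads=True)


def snapshot_all_threads():
    """Return the current Python stack of every thread as text."""
    with tempfile.TemporaryFile(mode="w+") as tmp:
        # faulthandler writes straight to the file descriptor.
        faulthandler.dump_traceback(file=tmp, all_threads=True)
        tmp.seek(0)
        return tmp.read()
```

Calling `enable_crash_tracebacks(sys.stderr)` at the top of the launcher script would not require changing how the application is started, which may help when the script is bootstrapped from another process.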
Comment 8 Kip 2013-06-17 23:04:34 UTC
Hey Simon. Since my code is refactored now to no longer use Python threads, I won't have an opportunity to try and replicate this core dump any time soon. However, if I get some time in the future, I will come back to this and try again with the last broken revision of that code before it was refactored. I will use GDB in tandem with the Python interpreter as you recommended and report back to you.
Comment 9 Simon Feltman 2013-10-04 21:21:44 UTC
Closing since this is not reproducible and the OP's code has been refactored since. Please re-open if you feel otherwise or can supply a script which reproduces the problem.
Comment 10 Benjamin Berg 2014-02-08 21:14:54 UTC
Created attachment 268527
Backtrace

The attached backtrace happens for me on debian testing. The python-gi version on the machine is 3.10.2-2.

There are some things to note about the backtrace/application:
 * The crashing thread is in a call to GLib.idle_add.
 * There are a lot of recursive signal emissions happening in different worker threads.

Simon noted that there is a bug where the GIL is not released when calling g_signal_emitv, though it seems unlikely that this would cause crashes (it could have a performance impact in some cases).

(I am not allowed to reopen the issue, so it would be nice if someone else does this.)
Comment 11 Kip 2014-02-08 22:57:28 UTC
Thanks a lot Benjamin. That was very insightful and I'm happy someone else managed to replicate this problem.
Comment 12 Benjamin Berg 2014-02-09 09:58:03 UTC
Hm, a script that does a lot of recursive signal emissions paired with idle_add/timeout_add (which is what is going on in my application) does not exhibit the problem. So it seems like there is more to it than just the recursive emission, or multiple threads calling the same function at the same time.
Comment 13 Benjamin Berg 2014-02-10 12:24:14 UTC
OK, I am pretty sure it is the refcounting issue that was fixed in bug 688694. This fix is only in 1.37.1 of libgirepository, while I still have 1.36.0 (I am applying the patch right now).

So, the bug is already fixed; thanks to everyone for looking into this :-)

I'll reopen the issue should I be able to reproduce with the applied patch; but I don't expect that will be the case.

*** This bug has been marked as a duplicate of bug 688694 ***
Comment 14 Benjamin Berg 2014-02-10 12:27:46 UTC
Oh, just in case anyone is wondering, the unref/ref in question happens inside _invoke_callable, where the GIL is no longer held. So during that small window, multiple threads can be refing/unrefing the same interface.
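
The hazard described here can be pictured with a toy model. This is only an illustrative sketch in pure Python, not the libgirepository code or its actual fix: `ToyInfo`, `hammer`, and `run` are hypothetical names, and a `threading.Lock` stands in for whatever synchronization the real fix uses. Without the lock, the read-modify-write on `refcount` could interleave between threads, so the count could drift and the object could be "freed" while another thread still uses it.

```python
import threading


class ToyInfo:
    """Toy stand-in for a reference-counted info object."""

    def __init__(self):
        self.refcount = 1  # the owner's reference
        self.freed = False
        self._lock = threading.Lock()

    def ref(self):
        with self._lock:
            # Taking a reference on a freed object is a use-after-free.
            assert not self.freed, "ref on freed object"
            self.refcount += 1

    def unref(self):
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True


def hammer(info, iterations):
    """Take and drop a temporary reference, as a call path might."""
    for _ in range(iterations):
        info.ref()
        info.unref()


def run(n_threads=4, iterations=10000):
    info = ToyInfo()
    threads = [
        threading.Thread(target=hammer, args=(info, iterations))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return info
```

With the lock in place, every temporary reference is balanced and only the owner's reference remains when all threads finish.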
Comment 15 Kip 2014-02-10 19:06:48 UTC
Good eye Benjamin. Thanks a lot for sharing with us.