GNOME Bugzilla – Bug 701692
g_callable_info_get_n_args assertion fail -> core dump
Last modified: 2014-02-10 19:06:48 UTC
I'm getting a very rare and unpredictable error which is difficult to replicate. I have a secondary Python worker thread which occasionally triggers the following:

    ** (Main.py:32524): CRITICAL **: g_callable_info_get_n_args: assertion `GI_IS_CALLABLE_INFO (info)' failed
    **
    ERROR:girepository/gicallableinfo.c:150:g_callable_info_is_method: code should not be reached
    Aborted (core dumped)

With something like that, I'm totally stumped on where to begin debugging, but I'm guessing this could be a GObject introspection bug. One thing is clear: I do call the following before doing anything with threads:

    GObject.threads_init()
    Gdk.threads_init()

At the time of the crash, the following worker thread was known to be executing: <https://bazaar.launchpad.net/~avaneya/avaneya/trunk/view/head:/Extras/VLR/Launcher/Source/VerificationProgressPage.py#L156>
If I'm not mistaken, I believe this is where the assertion is failing: <https://github.com/magcius/gobject-introspection/blob/master/girepository/gicallableinfo.c#L288>
Since I haven't been able to solve this issue yet, I'm going to have to refactor my code to use asynchronous I/O without a separate thread since I've only noticed this issue when the worker thread is running. Because of this, the link in comment #1 will no longer be relevant. This is a link to the last revision using the Python threading model. <https://bazaar.launchpad.net/~avaneya/avaneya/trunk/view/306/Extras/VLR/Launcher/Source/VerificationProgressPage.py#L156>
Hi Kip,

This is most likely a PyGObject bug. The assertion fires when bad data is passed to the function, but it is a bit hard to tell exactly what is going on. I looked through the code and nothing really stuck out as problematic. Some of the following info would be helpful in trying to address this issue:

* What version of PyGObject is being used? There have been a number of memory bugs fixed in 3.8, so it might be worthwhile to test there.
* Both C and Python stack traces would be useful.
* A simpler script which just shows the bug (something running in a loop or whatnot if it is intermittent), or at least very clear instructions for how to install/launch the app to reproduce the problem, would also be required to get any traction on this. A simplified, self-contained script is preferred.
Moving to PyGObject. Also knowing the system architecture would be helpful here.
Hey Simon. Glad to be of assistance.

Since I am refactoring the code that exhibited this problem to no longer use Python threads (it will use asynchronous I/O instead), I hopefully won't have this issue anymore. However, I still hope we manage to nail it in case anyone else is having it.

Like you, I can't really figure out what the issue is, but I suspect it's something lower level. I figured that because, looking at the assertion in gicallableinfo.c, that code should probably never have been reached, even with broken high-level Python code. Since the Python interpreter isn't actually multithreaded and Python threads are built on top of that, I wonder if that might have had something to do with it.

My system architecture is amd64. The package version of python-gi is 3.8.0-2ubuntu1 (Xubuntu Raring, 13.04).

If I get a chance, I'll try to produce a minimal test case, but working out what the minimal code needs to be would be much easier with a stack trace. I'd love to get you a stack trace, but I don't know how. With compiled code, it's more straightforward. With the Python code, it just bails with the above assertion failure. It's also more difficult to run it directly from within a Python interpreter session, simply because the Python code is bootstrapped from a bash script (via XDG autostart).

Let me know if I can help with anything else.
Moving importance down for now due to a lack of test case and reproducibility.
For generating a stack trace, you might have a look at https://live.gnome.org/GettingTraces. You should be able to point gdb at "python script.py", for example, to get the C stack trace.
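For the Python half of the trace, one option on Python 3.3 or later is the standard library's faulthandler module, which dumps the Python traceback of every thread when the process receives a fatal signal such as the SIGABRT raised by a failed assertion. This is a minimal sketch, not something suggested in the thread; the helper names enable_crash_tracebacks and dump_all_threads_now are made up for illustration and are not part of PyGObject.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump the Python traceback of every thread if the process dies on a
    fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL)."""
    if stream is None:
        stream = sys.stderr
    # The stream needs a real file descriptor and must stay open until
    # faulthandler is disabled again.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


def dump_all_threads_now(stream=None):
    """Write the current Python traceback of every thread, without crashing."""
    if stream is None:
        stream = sys.stderr
    faulthandler.dump_traceback(file=stream, all_threads=True)


# Typical use (hypothetical): call enable_crash_tracebacks() once at startup,
# before any worker thread is created.
```

Because the launcher in this report is started from a shell script, setting the environment variable PYTHONFAULTHANDLER=1 or passing -X faulthandler to the interpreter should have the same effect without editing the Python code.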
Hey Simon. Since my code is refactored now to no longer use Python threads, I won't have an opportunity to try and replicate this core dump any time soon. However, if I get some time in the future, I will come back to this and try again with the last broken revision of that code before it was refactored. I will use GDB in tandem with the Python interpreter as you recommended and report back to you.
Closing since this is not reproducible and the OP's code has been refactored since. Please re-open if you feel otherwise or can supply a script which reproduces the problem.
Created attachment 268527: Backtrace

The attached backtrace happens for me on Debian testing. The python-gi version on the machine is 3.10.2-2. There are some things to note about the backtrace/application:

* The thread is crashing in the call to GLib.idle_add.
* There are a lot of recursive signal emissions happening in different worker threads. Simon noted that there is a bug where the GIL is not released when calling g_signal_emitv, though it seems unlikely that this would cause such issues (it could have a performance impact in some cases).

(I am not allowed to reopen the issue, so it would be nice if someone else does this.)
Thanks a lot Benjamin. That was very insightful and I'm happy someone else managed to replicate this problem.
Hm, a script that does a lot of recursive signal emissions paired with idle_add/timeout_add (which is what is going on in my application) does not exhibit the problem. So it seems like there is more to it than just the recursive emission, or multiple threads calling the same function at the same time.
OK, I am pretty sure it is the refcounting issue that was fixed in bug 688694. This fix is only in 1.37.1 of libgirepository, while I still have 1.36.0 (I am applying the patch right now). So, the bug is already fixed; thanks to everyone for looking into this :-) I'll reopen the issue should I be able to reproduce it with the patch applied, but I don't expect that will be the case.

*** This bug has been marked as a duplicate of bug 688694 ***
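The version reasoning above can be sketched as a simple comparison. This is only an illustration: the function names are hypothetical, the version string has to be obtained separately (for example from the distribution's package manager), and a distribution may have backported the patch to an older version, which this check cannot detect.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.36.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# First libgirepository release named in this report as containing the fix.
FIXED_IN = parse_version("1.37.1")


def has_refcount_fix(installed_version):
    """Return True if the given version is at least the fixed release."""
    return parse_version(installed_version) >= FIXED_IN


if __name__ == "__main__":
    for version in ("1.36.0", "1.37.1"):
        print(version, has_refcount_fix(version))
```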
Oh, just in case anyone is wondering, the unref/ref in question happens inside _invoke_callable, where the GIL is no longer held. So during that small period we can get multiple threads refing/unrefing the same interface.
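To illustrate why concurrent ref/unref without a lock can end in the assertion above, here is a deterministic toy model. It assumes the counter is updated with a plain load followed by a store, which this report does not confirm; the Info class and all function names are hypothetical stand-ins, not the girepository API. Generators are used to force the unlucky interleaving instead of relying on real threads.

```python
class Info:
    """Toy stand-in for a reference-counted info object."""

    def __init__(self):
        self.ref_count = 1   # reference held by the long-lived owner
        self.freed = False


def ref(info):
    value = info.ref_count       # load
    yield                        # another thread may run here
    info.ref_count = value + 1   # store


def unref(info):
    value = info.ref_count       # load
    yield                        # another thread may run here
    info.ref_count = value - 1   # store
    if info.ref_count == 0:
        info.freed = True


def run_interleaved(*ops):
    """Advance every operation one step at a time, round-robin."""
    pending = list(ops)
    while pending:
        for op in list(pending):
            try:
                next(op)
            except StopIteration:
                pending.remove(op)


def simulate(racy):
    info = Info()
    if racy:
        # Both threads load the same count before either stores it,
        # so one increment is lost.
        run_interleaved(ref(info), ref(info))
    else:
        run_interleaved(ref(info))
        run_interleaved(ref(info))
    # Each thread later drops its own reference.
    run_interleaved(unref(info))
    run_interleaved(unref(info))
    return info
```

In the racy run the object is freed although the owner still holds its reference, so the next use of it would see invalid data, which is consistent with a type-check assertion failing. In the serial run the count returns to 1 and nothing is freed.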
Good eye Benjamin. Thanks a lot for sharing with us.