GNOME Bugzilla – Bug 661383
GnuCash segfault on Linux (Fedora 16 beta) because of shared library load / unload weirdness
Last modified: 2018-06-29 23:01:55 UTC
Created attachment 198719 [details]
link gnucash against gnutls explicitly

From https://bugzilla.redhat.com/show_bug.cgi?id=703249#c16:

Here's what's happening:

* gnutls is loaded and initialized when libgncmod-aqbanking.so is loaded
* gnutls initializes libgcrypt
* when initializing libgcrypt, gnutls passes in pointers to gnutls mutex callback functions
* during initialization, libgcrypt uses the gnutls mutex management functions to create a mutex
* this creates private data about the mutex inside gnutls
* gnutls is unloaded when libgncmod-aqbanking.so is unloaded
* However, LIBGCRYPT IS NOT UNLOADED, because gnucash is linked directly against it rather than loading it dynamically with libgncmod-aqbanking.so
* gnutls is loaded and initialized again later when libgncmod-aqbanking.so is loaded again
* gnutls initializes libgcrypt again
* but libgcrypt was never unloaded and so it thinks it's still initialized
* but the private data associated with the mutex that libgcrypt created and still has is no longer valid, because gnutls's private data was erased when it was unloaded
* bam, segfault when libgcrypt tries to use the mutex

Easiest fix: link gnucash directly against gnutls so it won't be loaded/unloaded. The attached patch does this.
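For readers who want to step through the sequence above, here is a toy model of it in Python. It is only a sketch: MutexProvider stands in for gnutls, CryptoLib stands in for libgcrypt, and every class, method and function name is made up for illustration. None of this is the real gnutls or libgcrypt API, and the real failure is a segfault, which is modelled here as an exception.

```python
class StaleMutexError(RuntimeError):
    """Stands in for the segfault: a handle whose backing data is gone."""


class MutexProvider:
    """Hypothetical stand-in for gnutls: owns the private data behind each mutex."""

    def __init__(self):
        self._private = {}       # handle -> mutex state, erased on unload
        self._next_handle = 1

    def create_mutex(self):
        handle = self._next_handle
        self._next_handle += 1
        self._private[handle] = {"locked": False}
        return handle

    def lock(self, handle):
        if handle not in self._private:
            raise StaleMutexError(f"mutex {handle} has no backing data")
        self._private[handle]["locked"] = True

    def unload(self):
        # Unloading the library erases its private data.
        self._private.clear()


class CryptoLib:
    """Hypothetical stand-in for libgcrypt: stays loaded for the whole process."""

    def __init__(self):
        self.initialized = False
        self.provider = None
        self.mutex = None

    def init(self, provider):
        if self.initialized:
            return               # "already initialized": keeps the old handle
        self.provider = provider
        self.mutex = provider.create_mutex()
        self.initialized = True

    def use(self):
        self.provider.lock(self.mutex)
        return "ok"


def run(keep_provider_loaded):
    """Replay the load / unload / reload sequence and report the outcome."""
    crypto = CryptoLib()             # resident for the whole run
    provider = MutexProvider()       # first load of the banking module
    crypto.init(provider)
    crypto.use()
    if not keep_provider_loaded:
        provider.unload()            # module unloaded, provider goes with it
        provider = MutexProvider()   # module loaded again later
    crypto.init(provider)            # no-op: crypto still thinks it is initialized
    try:
        return crypto.use()
    except StaleMutexError:
        return "crash"


if __name__ == "__main__":
    print(run(keep_provider_loaded=False))
    print(run(keep_provider_loaded=True))
```

With keep_provider_loaded=False the second use hits the stale handle. With keep_provider_loaded=True, which corresponds to never unloading gnutls, the handle stays valid.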
Created attachment 198797 [details] [review] gnc-module patch Here's what I'm intending to add in Fedora - this patch marks scanned modules to not be unloaded. It seems to match the fact that g_module_close() isn't called in gnc_module_unload.
Created attachment 199038 [details] [review] updated patch Here's an updated patch. On the 'real' open of the module, we need to upgrade the symbol visibility, otherwise assorted things fail. (glade files finding callbacks is the obvious one I noticed.)
Comment on attachment 199038 [details] [review] updated patch The current gnucash codebase never unloads modules. Hence, I agree that any problems caused by the assumption that unloading works need to be worked around.
Comment on attachment 199038 [details] [review] updated patch

Hmm, I'm not really sure about the call to dlopen. This function is not available when compiling via mingw for Windows, so it probably should be protected.

I'm surprised you need to call this function at all, though. The call to open the module is g_module_open(modinfo->module_filepath, 0) in line 501. According to http://developer.gnome.org/glib/2.30/glib-Dynamic-Loading-of-Modules.html#GModuleFlags, a flags value of 0 should load the symbols globally.

I suspect the problem may be that you are calling g_module_make_resident in the wrong place. You are calling it during the module info gathering step. At that point, the module is only opened temporarily to get sufficient information about it. It is opened there on purpose in a local scope and closed afterwards. Since you call g_module_make_resident there, perhaps the module is not really closed and remains open with its symbols in a local scope. Shouldn't g_module_make_resident rather be called at the spot where you are calling dlopen, namely right after the module is opened to be used?
In the meantime, I have upgraded my system to Fedora 16 and I'm "lucky" enough to run into the same segmentation fault. That put me in a good position to debug it :) I hadn't seen the reference to the Fedora Bugzilla report above. After reading through it, I realized my proposal in comment 4 would not work. Instead, I have slightly reworked Bill's patch to avoid the call to dlopen and still get the desired global scope for the symbols. See commit r21665. If no bad side effects are reported, I'll backport it to the 2.4 branch as well in a few days.
This was backported to 2.4.9. I believe this can be considered fixed now. If the issue still appears, feel free to reopen this report.
Reassign version to 2.4.x so that individual 2.4 versions can be retired.
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=661383. Please update any external references or bookmarks.