Bug 661383 - GnuCash segfault on Linux (Fedora 16 beta) because of shared library load / unload weirdness
Status: RESOLVED FIXED
Product: GnuCash
Classification: Other
Component: Build system
Version: 2.4.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Geert Janssens
Christian Stimming
Depends on:
Blocks:
 
 
Reported: 2011-10-10 15:32 UTC by Jonathan Kamens
Modified: 2018-06-29 23:01 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
link gnucash against gnutls explicitly (554 bytes, text/plain)
2011-10-10 15:32 UTC, Jonathan Kamens
gnc-module patch (571 bytes, patch)
2011-10-11 16:32 UTC, Bill Nottingham
Review status: none
updated patch (1.10 KB, patch)
2011-10-14 21:37 UTC, Bill Nottingham
Review status: needs-work

Description Jonathan Kamens 2011-10-10 15:32:37 UTC
Created attachment 198719 [details]
link gnucash against gnutls explicitly

From https://bugzilla.redhat.com/show_bug.cgi?id=703249#c16:

Here's what's happening:

* gnutls is loaded and initialized when libgncmod-aqbanking.so is loaded
* gnutls initializes libgcrypt
  * when initializing libgcrypt, gnutls passes in pointers to gnutls mutex
    callback functions
* during initialization, libgcrypt uses gnutls mutex management functions to
  create mutex
  * this creates private data about the mutex inside gnutls
* gnutls is unloaded when libgncmod-aqbanking.so is unloaded
* However, LIBGCRYPT IS NOT UNLOADED, because it is linked directly against
  gnucash rather than loaded dynamically with libgncmod-aqbanking.so
* gnutls is loaded and initialized again later when libgncmod-aqbanking.so is
  loaded again
* gnutls initializes libgcrypt again
* but libgcrypt was never unloaded and so it thinks it's still initialized
* but the private data associated with the mutex that libgcrypt created and
  still has is no longer valid, because gnutls's private data was erased when
  it was unloaded
* bam, segfault when libgcrypt tries to use the mutex
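
The sequence above can be modelled without any real shared libraries. The following is a minimal sketch, not code from gnutls, libgcrypt, or GnuCash: `provider_t` stands in for gnutls, `consumer_t` stands in for libgcrypt, and the "generation" counter is a hypothetical way to represent private data that is recreated on every load. Where the real program segfaults, the model simply reports that the cached mutex is no longer valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gnutls: its private mutex bookkeeping only
 * exists while the "library" is loaded, and is recreated on each load. */
typedef struct {
    bool loaded;
    int  generation;          /* bumped on every load */
} provider_t;

/* Hypothetical stand-in for libgcrypt: stays resident and remembers
 * which provider generation created its mutex. */
typedef struct {
    bool initialized;
    int  mutex_generation;
} consumer_t;

void provider_load(provider_t *p)   { p->loaded = true; p->generation++; }
void provider_unload(provider_t *p) { p->loaded = false; }

/* Initialization is skipped when the consumer thinks it is already set up. */
void consumer_init(consumer_t *c, const provider_t *p)
{
    if (c->initialized)
        return;
    c->initialized = true;
    c->mutex_generation = p->generation;
}

/* True only if the mutex is still backed by the provider data that made it. */
bool consumer_mutex_is_valid(const consumer_t *c, const provider_t *p)
{
    return c->initialized && p->loaded &&
           c->mutex_generation == p->generation;
}

/* Replays the load / unload / load cycle described in the list above.
 * With keep_provider_resident set, the unload and reload are skipped. */
bool replay_load_cycle(bool keep_provider_resident)
{
    provider_t gnutls = { false, 0 };
    consumer_t gcrypt = { false, 0 };

    provider_load(&gnutls);
    consumer_init(&gcrypt, &gnutls);
    assert(consumer_mutex_is_valid(&gcrypt, &gnutls));

    if (!keep_provider_resident) {
        provider_unload(&gnutls);   /* private mutex data is discarded */
        provider_load(&gnutls);     /* fresh private data, new generation */
    }

    consumer_init(&gcrypt, &gnutls); /* no-op: consumer is still initialized */
    return consumer_mutex_is_valid(&gcrypt, &gnutls);
}
```

In this model, `replay_load_cycle(false)` ends with an invalid mutex, while `replay_load_cycle(true)` (the provider is never unloaded) ends with a valid one.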

Easiest fix: link gnucash directly against gnutls so it won't be loaded/unloaded. The attached patch does this.
Comment 1 Bill Nottingham 2011-10-11 16:32:14 UTC
Created attachment 198797 [details] [review]
gnc-module patch

Here's what I'm intending to add in Fedora - this patch marks scanned modules to not be unloaded. It seems to match the fact that g_module_close() isn't called in gnc_module_unload.
Comment 2 Bill Nottingham 2011-10-14 21:37:15 UTC
Created attachment 199038 [details] [review]
updated patch

Here's an updated patch. On the 'real' open of the module, we need to upgrade the symbol visibility, otherwise assorted things fail. (glade files finding callbacks is the obvious one I noticed.)
Comment 3 Christian Stimming 2011-10-16 20:08:07 UTC
Comment on attachment 199038 [details] [review]
updated patch

The current gnucash codebase never unloads modules. Hence, I agree that any problems arising from the assumption that unloading works need to be worked around.
Comment 4 Geert Janssens 2011-11-09 15:10:33 UTC
Comment on attachment 199038 [details] [review]
updated patch

Hmm, I'm not really sure about the call to dlopen. This function is not available when compiling via mingw for Windows, so it probably should be protected.

I'm surprised you need to call this function though. The call to open the module is g_module_open(modinfo->module_filepath, 0) in line 501. This flags value suggests that the symbols are loaded globally, according to http://developer.gnome.org/glib/2.30/glib-Dynamic-Loading-of-Modules.html#GModuleFlags

I suspect that the problem may be you are calling g_module_make_resident in the wrong place. You are calling it during the module info gathering step. At that point, the module is only temporarily opened to just get sufficient information about it. It is called there on purpose in a local scope and closed afterwards. Since you call the g_module_make_resident function there, perhaps the module is not really closed then and remains open with symbols in a local scope.

Shouldn't g_module_make_resident rather be called at the spot where you are calling dlopen, namely right after the module is opened for actual use?
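
The suspicion in this comment can be illustrated with a toy loader. This is only a sketch of the hypothesis, not GLib's actual implementation: `toy_module_t` and the `toy_*` functions are hypothetical, and the rule that the symbol scope is fixed by whichever open actually loads the module is an assumption made to encode the behaviour suspected above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SCOPE_LOCAL, SCOPE_GLOBAL } scope_t;

/* Hypothetical toy module record; not GLib's GModule. */
typedef struct {
    int     ref_count;
    bool    loaded;
    bool    resident;
    scope_t scope;    /* assumed: set only by the open that loads the module */
} toy_module_t;

void toy_open(toy_module_t *m, scope_t requested)
{
    if (!m->loaded) {
        m->loaded = true;
        m->scope = requested;
    }
    m->ref_count++;
}

/* Pins the module so that closing it never unloads it. */
void toy_make_resident(toy_module_t *m) { m->resident = true; }

void toy_close(toy_module_t *m)
{
    assert(m->ref_count > 0);
    m->ref_count--;
    if (m->ref_count == 0 && !m->resident)
        m->loaded = false;
}

/* Runs the info-gathering open followed by the real open and returns the
 * symbol scope the module ends up with. pin_during_scan chooses whether
 * the module is made resident in the scan step or after the real open. */
scope_t toy_scan_then_use(bool pin_during_scan)
{
    toy_module_t m = { 0, false, false, SCOPE_LOCAL };

    toy_open(&m, SCOPE_LOCAL);      /* temporary open to read module info */
    if (pin_during_scan)
        toy_make_resident(&m);
    toy_close(&m);                  /* only unloads if not resident */

    toy_open(&m, SCOPE_GLOBAL);     /* real open for use */
    if (!pin_during_scan)
        toy_make_resident(&m);
    return m.scope;
}
```

Under these assumptions, pinning during the scan leaves the module loaded with local scope, while pinning after the real open yields global scope.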
Comment 5 Geert Janssens 2011-12-04 17:45:53 UTC
In the meantime, I have upgraded my system to Fedora 16 and I'm "lucky" enough to run into the same segmentation fault. That put me in a good position to debug it :)

I hadn't seen the reference to the Fedora Bugzilla report above. After reading through it, I realized my proposal in comment 4 would not work.

Instead, I have slightly reworked Bill's patch to avoid the call to dlopen and still get the desired global scope for the symbols.

See commit r21665.

If no bad side effects are reported, I'll backport it to the 2.4 branch as well in a few days.
Comment 6 Geert Janssens 2012-02-15 13:36:36 UTC
This was backported to 2.4.9. I believe this can be considered fixed now. If the issue still appears, feel free to reopen this report.
Comment 7 John Ralls 2017-09-24 22:48:42 UTC
Reassign version to 2.4.x so that individual 2.4 versions can be retired.
Comment 8 John Ralls 2018-06-29 23:01:55 UTC
GnuCash bug tracking has moved to a new Bugzilla host. This bug has been copied to https://bugs.gnucash.org/show_bug.cgi?id=661383. Please update any external references or bookmarks.