GNOME Bugzilla – Bug 646342
dlerror() not thread-safe in all libc, making gmodule-dl.c's fetch_dlerror fail sometimes
Last modified: 2018-05-24 13:00:30 UTC
Here are some backtraces illustrating the problem. The libc in these stack traces is eglibc 2.10. Line 108 of dlerror.c in eglibc 2.10 is indeed a strcmp. That dlerror.c can be found here:
http://www.eglibc.org/cgi-bin/viewcvs.cgi/branches/eglibc-2_10/libc/dlfcn/dlerror.c?rev=8421

Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)', which makes me conclude that result->errstring isn't thread-safe. That result variable comes from a __libc_getspecific (key) (which sounds to me like a key-value store).

Crash in GIO:

0 strcmp () at ../ports/sysdeps/arm/strcmp.S:79
1 0x3aaf1334 __dlerror () at dlerror.c:108
2 0x3bbb8f58 fetch_dlerror (replace_null=0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:81
3 0x3bbb9168 _g_module_symbol (symbol_name=0x3bbb9ec0 "g_module_unload", symbol=0x33d61c) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:148
4 g_module_symbol (symbol_name=0x3bbb9ec0 "g_module_unload", symbol=0x33d61c) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:633
5 0x3bbb99d0 g_module_open (file_name=0x32f608 <Address 0x32f608 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:490
6 0x3b7a49a0 g_io_module_load_module (gmodule=0x31cc00) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:185
7 0x3b88ec08 g_type_module_use (module=0x31cc00) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gobject/gtypemodule.c:256
8 0x3b7a4580 g_io_modules_scan_all_in_directory (dirname=0x3b82b48c "/usr/lib/gio/modules") at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:381
9 0x3b7a4850 _g_io_modules_ensure_loaded () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:572
10 0x3b7c362c get_default_vfs () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gvfs.c:186
11 0x3accb16c g_once_impl (once=0x3b85a4cc, func=0x3b7c3608 <get_default_vfs>, arg=0x0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1049
12 0x3b787128 g_file_new_for_path (path=0x43f007b8 <Address 0x43f007b8 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gfile.c:5899
13 0x438b1a68 tracker_sparql_backend_load_plugins () at tracker-backend.c:1430
14 tracker_sparql_backend_real_init () at tracker-backend.c:552
15 0x438ab8e4 tracker_sparql_connection_init (self=0x0, error=0x3aaf32e8) at tracker-connection.c:1208
16 0x438ad088 tracker_sparql_connection_get_internal (is_direct_only=0, cancellable=0x0, error=0x43abdd0c) at tracker-connection.c:486
17 0x438adbac _lambda0_ (self=0x13f4c0) at tracker-connection.c:611
18 __lambda0__gio_scheduler_job_func (self=0x13f4c0) at tracker-connection.c:686
19 0x3b7a51b0 io_job_thread () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gioscheduler.c:181
20 0x3acced28 g_thread_pool_thread_proxy () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthreadpool.c:319
21 0x3accc95c g_thread_create_proxy (data=0x30df40) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1897
22 0x3ad9894c start_thread () at pthread_create.c:302
23 0x3abe88e8 clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101
24 0x3abe88e8 clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101

A similar crash, but this time in libtracker-sparql:

0 strcmp () at ../ports/sysdeps/arm/strcmp.S:79
1 0x3aaf1334 __dlerror () at dlerror.c:108
2 0x3bbb8f58 fetch_dlerror (replace_null=0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:81
3 0x3bbb9168 _g_module_symbol (symbol_name=0x3bbb9efc "g_module_check_init", symbol=0x43afdbb4) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:148
4 g_module_symbol (symbol_name=0x3bbb9efc "g_module_check_init", symbol=0x43afdbb4) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:633
5 0x3bbb97dc g_module_open (file_name=0x3239e8 <Address 0x3239e8 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:485
6 0x438f1070 tracker_sparql_backend_load_plugins_from_path (self=0x304340, path=0x3239e8 <Address 0x3239e8 out of bounds>, required=0, error=0x43afdc70) at tracker-backend.c:1644
7 0x438f1b28 tracker_sparql_backend_load_plugins () at tracker-backend.c:1497
8 tracker_sparql_backend_real_init () at tracker-backend.c:552
9 0x438eb8e4 tracker_sparql_connection_init (self=0x0, error=0x3aaf32e8) at tracker-connection.c:1208
10 0x438ed088 tracker_sparql_connection_get_internal (is_direct_only=0, cancellable=0x0, error=0x43afdd0c) at tracker-connection.c:486
11 0x438edbac _lambda0_ (self=0x1296c0) at tracker-connection.c:611
12 __lambda0__gio_scheduler_job_func (self=0x1296c0) at tracker-connection.c:686
13 0x3b7a51b0 io_job_thread () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gioscheduler.c:181
14 0x3acced28 g_thread_pool_thread_proxy () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthreadpool.c:319
15 0x3accc95c g_thread_create_proxy (data=0x2fec18) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1897
16 0x3ad9894c start_thread () at pthread_create.c:302
17 0x3abe88e8 clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101
18 0x3abe88e8 clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101

The suggested fix is to add global locks in gmodule-dl.c, or of course to ask the libc maintainers to provide a thread-safe dlerror(). I think POSIX doesn't require thread safety of dlerror(), though. Some discussion that I found on this issue:
http://www.mail-archive.com/debian-glibc@lists.debian.org/msg43434.html
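A minimal sketch of the global-lock idea, not the actual gmodule-dl.c code. The names dl_lock and locked_dlsym are hypothetical. It assumes that every dlsym()/dlerror() caller in the process goes through the same lock, which cannot be guaranteed once other libraries call dlerror() themselves. The error text is copied while the lock is held, because the string returned by dlerror() may be overwritten by a later call.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical process-wide lock; not part of GModule. */
static pthread_mutex_t dl_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look up `name` in `handle` while holding dl_lock.
 * On failure, the dlerror() text is copied into `errbuf` before the lock
 * is released, so the caller never touches libc's own buffer.
 * Returns 0 on success, -1 on failure.
 */
static int locked_dlsym(void *handle, const char *name, void **symbol,
                        char *errbuf, size_t errbuf_len)
{
    const char *msg;
    int rc = 0;

    pthread_mutex_lock(&dl_lock);

    (void) dlerror();               /* clear any stale error state */
    *symbol = dlsym(handle, name);
    msg = dlerror();                /* non-NULL only if dlsym() failed */

    if (msg != NULL) {
        snprintf(errbuf, errbuf_len, "%s", msg);
        rc = -1;
    } else if (errbuf_len > 0) {
        errbuf[0] = '\0';
    }

    pthread_mutex_unlock(&dl_lock);
    return rc;
}
```

This only serializes callers that use locked_dlsym(); code elsewhere in the process that calls dlerror() directly can still race with it.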
Note that the problem is also observed in GIO. I fully agree that if dlerror() isn't thread-safe, it's probably impossible to properly fix fetching dlsym()'s error message while other libraries are also using dlerror() in other threads. But then we have to consider documenting in GIO that once you use GIO, you can no longer use dlerror() from another thread. I guess that passing a generic error message to the GModule API, instead of the one that dlerror() gives, could be a temporary solution for when the libc is known not to be thread-safe.
(In reply to comment #0)
> Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)' making me
> conclude that result->errstring isn't threadsafe. That result variable comes
> from a __libc_getspecific (key) (which sounds to me like a key-value store).

yes, but it's a thread-specific key-value store. maybe eglibc's dlerror() is just broken?
(In reply to comment #2)
> (In reply to comment #0)
> > Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)' making me
> > conclude that result->errstring isn't threadsafe. That result variable comes
> > from a __libc_getspecific (key) (which sounds to me like a key-value store).
>
> yes, but it's a thread-specific key-value store. maybe eglibc's dlerror() is
> just broken?

It sounds likely that eglibc is just broken and should be fixed. The author of the e-mail on debian-glibc mentions that POSIX doesn't require dlerror() to be thread-safe, though (not that it makes much sense for it not to be thread-safe, in my opinion).

Can gmodule be adapted to return some generic error message when a libc is detected (at build time, for example) that is known not to have a thread-safe dlerror()? Not nice, but nicer than race conditions :-\
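A rough sketch of the generic-message fallback, assuming a build-time macro decides whether dlerror() may be called at all. DLERROR_IS_THREAD_SAFE and module_error_text are hypothetical names, not existing GLib configuration or API, and the message strings are placeholders.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical switch that a configure check would define; defaults to
 * "not thread-safe" so the conservative path is taken. */
#ifndef DLERROR_IS_THREAD_SAFE
#define DLERROR_IS_THREAD_SAFE 0
#endif

/*
 * Return a description of the last dynamic-loading failure.
 * When dlerror() is not known to be thread-safe, it is never called and a
 * fixed, generic string is returned instead.
 */
static const char *module_error_text(void)
{
#if DLERROR_IS_THREAD_SAFE
    const char *msg = dlerror();
    return (msg != NULL) ? msg : "unknown dynamic loading error";
#else
    return "dynamic loading failed (no details: dlerror() is not "
           "thread-safe on this libc)";
#endif
}
```

With dlerror() out of the picture, the caller has to detect the failure itself from the return value of dlopen() or dlsym(); this helper only supplies the text.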
Not only does POSIX not require dlerror to be thread-safe; as-written, the specification DOES NOT ALLOW it to be thread-safe, i.e. glibc is non-conforming. The text for dlerror reads:

"The dlerror() function shall return a null-terminated character string (with no trailing <newline>) that describes the last error that occurred during dynamic linking processing. If no dynamic linking errors have occurred since the last invocation of dlerror(), dlerror() shall return NULL. Thus, invoking dlerror() a second time, immediately following a prior invocation, shall result in NULL being returned."

As-written, it's valid to call dlopen or dlsym in one thread and later query the error message with dlerror from another thread. This could happen in a GUI program where dlopen happens in the main thread and dlerror happens in a GUI thread displaying the error dialog box, or similar, or in a situation where a callback-based interface in a third-party library causes a callback to run in a new thread without the main program's knowledge.

While POSIX could "fix" this and specify the error status to be thread-local, the whole concept of using dlerror except for the purpose of printing error messages to the user is just bogus. dlopen and dlsym already return non-descriptive error information (success/failure). I'm aware that some users (including glib/gtk/gnome?) of dlsym are treating null as a potentially-successful return since a symbol could be defined to 0; this seems to be done for ideological purposes, not for any actual application requirements.

As it is a major portability bug affecting operation of multi-threaded programs on any system except glibc-based ones, it should simply be fixed by using the return value of dlsym, rather than dlerror, to determine whether lookup failed.
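The approach argued for in this comment can be sketched roughly as follows. The helper name lookup_by_return_value is hypothetical and this is not GModule code. It treats a NULL return from dlsym() as failure and never consults dlerror() to decide the outcome.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Look up `name` in `handle` and decide success purely from the value
 * dlsym() returns. Returns 0 on success, -1 on failure.
 *
 * Trade-off: a symbol whose value is legitimately 0 is reported as
 * missing, which is the case the dlerror()-based check tries to cover.
 */
static int lookup_by_return_value(void *handle, const char *name,
                                  void **symbol)
{
    *symbol = dlsym(handle, name);
    return (*symbol != NULL) ? 0 : -1;
}
```

dlerror() could still be called afterwards purely to build a message for the user, but the success/failure decision no longer depends on it.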
we are following the dlsym man page recommendation for how to check for errors. this is relevant: http://austingroupbugs.net/view.php?id=97
*** Bug 672665 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/399.