Bug 646342 - dlerror() not thread-safe in all libc, making gmodule-dl.c's fetch_dlerror fail sometimes
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: gmodule
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Duplicates: 672665
Depends on:
Blocks:
 
 
Reported: 2011-03-31 14:25 UTC by Philip Van Hoof
Modified: 2018-05-24 13:00 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Philip Van Hoof 2011-03-31 14:25:50 UTC
Here are some backtraces illustrating the problem.

The libc in these stacktraces is eglibc 2.10. Line 108 of dlerror.c in eglibc 2.10 is indeed a strcmp.

That dlerror.c can be found here: http://www.eglibc.org/cgi-bin/viewcvs.cgi/branches/eglibc-2_10/libc/dlfcn/dlerror.c?rev=8421

Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)' making me conclude that result->errstring isn't threadsafe. That result variable comes from a __libc_getspecific (key) (which sounds to me like a key-value store).


Crash in GIO:

0		strcmp () at ../ports/sysdeps/arm/strcmp.S:79
1	0x3aaf1334	__dlerror () at dlerror.c:108
2	0x3bbb8f58	fetch_dlerror (replace_null=0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:81
3	0x3bbb9168	_g_module_symbol (symbol_name=0x3bbb9ec0 "g_module_unload", symbol=0x33d61c) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:148
4		g_module_symbol (symbol_name=0x3bbb9ec0 "g_module_unload", symbol=0x33d61c) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:633
5	0x3bbb99d0	g_module_open (file_name=0x32f608 <Address 0x32f608 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:490
6	0x3b7a49a0	g_io_module_load_module (gmodule=0x31cc00) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:185
7	0x3b88ec08	g_type_module_use (module=0x31cc00) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gobject/gtypemodule.c:256
8	0x3b7a4580	g_io_modules_scan_all_in_directory (dirname=0x3b82b48c "/usr/lib/gio/modules") at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:381
9	0x3b7a4850	_g_io_modules_ensure_loaded () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/giomodule.c:572
10	0x3b7c362c	get_default_vfs () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gvfs.c:186
11	0x3accb16c	g_once_impl (once=0x3b85a4cc, func=0x3b7c3608 <get_default_vfs>, arg=0x0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1049
12	0x3b787128	g_file_new_for_path (path=0x43f007b8 <Address 0x43f007b8 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gfile.c:5899
13	0x438b1a68	tracker_sparql_backend_load_plugins () at tracker-backend.c:1430
14		tracker_sparql_backend_real_init () at tracker-backend.c:552
15	0x438ab8e4	tracker_sparql_connection_init (self=0x0, error=0x3aaf32e8) at tracker-connection.c:1208
16	0x438ad088	tracker_sparql_connection_get_internal (is_direct_only=0, cancellable=0x0, error=0x43abdd0c) at tracker-connection.c:486
17	0x438adbac	_lambda0_ (self=0x13f4c0) at tracker-connection.c:611
18		__lambda0__gio_scheduler_job_func (self=0x13f4c0) at tracker-connection.c:686
19	0x3b7a51b0	io_job_thread () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gioscheduler.c:181
20	0x3acced28	g_thread_pool_thread_proxy () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthreadpool.c:319
21	0x3accc95c	g_thread_create_proxy (data=0x30df40) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1897
22	0x3ad9894c	start_thread () at pthread_create.c:302
23	0x3abe88e8	clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101
24	0x3abe88e8	clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101

Crash nearby but this time in libtracker-sparql:

0		strcmp () at ../ports/sysdeps/arm/strcmp.S:79
1	0x3aaf1334	__dlerror () at dlerror.c:108
2	0x3bbb8f58	fetch_dlerror (replace_null=0) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:81
3	0x3bbb9168	_g_module_symbol (symbol_name=0x3bbb9efc "g_module_check_init", symbol=0x43afdbb4) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule-dl.c:148
4		g_module_symbol (symbol_name=0x3bbb9efc "g_module_check_init", symbol=0x43afdbb4) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:633
5	0x3bbb97dc	g_module_open (file_name=0x3239e8 <Address 0x3239e8 out of bounds>) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gmodule/gmodule.c:485
6	0x438f1070	tracker_sparql_backend_load_plugins_from_path (self=0x304340, path=0x3239e8 <Address 0x3239e8 out of bounds>, required=0, error=0x43afdc70) at tracker-backend.c:1644
7	0x438f1b28	tracker_sparql_backend_load_plugins () at tracker-backend.c:1497
8		tracker_sparql_backend_real_init () at tracker-backend.c:552
9	0x438eb8e4	tracker_sparql_connection_init (self=0x0, error=0x3aaf32e8) at tracker-connection.c:1208
10	0x438ed088	tracker_sparql_connection_get_internal (is_direct_only=0, cancellable=0x0, error=0x43afdd0c) at tracker-connection.c:486
11	0x438edbac	_lambda0_ (self=0x1296c0) at tracker-connection.c:611
12		__lambda0__gio_scheduler_job_func (self=0x1296c0) at tracker-connection.c:686
13	0x3b7a51b0	io_job_thread () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/gio/gioscheduler.c:181
14	0x3acced28	g_thread_pool_thread_proxy () at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthreadpool.c:319
15	0x3accc95c	g_thread_create_proxy (data=0x2fec18) at /home/bifh6/cs2009q3-armel/work/glib2.0-2.26.0/glib/gthread.c:1897
16	0x3ad9894c	start_thread () at pthread_create.c:302
17	0x3abe88e8	clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101
18	0x3abe88e8	clone () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:101

Suggested fix is adding global locks in gmodule-dl.c, or of course asking libc maintainers to provide a thread-safe dlerror(). I think POSIX doesn't require threadsafety of dlerror(), though.
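A minimal sketch of the "global lock" idea, assuming a POSIX system with pthreads. The names dl_lock and locked_dlsym are hypothetical and do not exist in gmodule-dl.c; the point is only that the dlsym() call and the dlerror() read happen under one mutex and that the message is copied before the lock is released:

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock (not part of gmodule-dl.c): serializes every
 * dlsym()/dlerror() pair made through locked_dlsym(). */
static pthread_mutex_t dl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Looks up `name` in `handle`. On failure returns NULL and, if err_out is
 * non-NULL, stores a heap-allocated copy of the message (caller frees it). */
static void *locked_dlsym(void *handle, const char *name, char **err_out)
{
    void *sym;
    const char *msg;

    if (err_out != NULL)
        *err_out = NULL;

    pthread_mutex_lock(&dl_lock);
    dlerror();                      /* discard any stale error state */
    sym = dlsym(handle, name);
    msg = dlerror();                /* non-NULL only if dlsym() failed */
    if (msg != NULL) {
        sym = NULL;
        if (err_out != NULL) {
            /* Copy while still holding the lock: the buffer returned by
             * dlerror() may be reused by a later call. */
            size_t len = strlen(msg) + 1;
            char *copy = malloc(len);
            if (copy != NULL)
                memcpy(copy, msg, len);
            *err_out = copy;
        }
    }
    pthread_mutex_unlock(&dl_lock);
    return sym;
}
```

Such a lock only protects callers that go through the wrapper; code in other libraries that calls dlerror() directly from another thread is not covered.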

Some discussion that I found on this issue:

http://www.mail-archive.com/debian-glibc@lists.debian.org/msg43434.html
Comment 1 Philip Van Hoof 2011-03-31 14:39:11 UTC
Note that the problem is also observed in GIO. I fully agree that if dlerror() isn't thread-safe that it's probably impossible to properly fix getting the dlsym()'s error message when other libraries will also be using dlerror() in other threads. But then we have to consider documenting in GIO that if you ever do use GIO, you cannot use dlerror() (from another thread) anymore.

I guess passing a generic error message instead of the one that dlerror() gives, to the GModule API, could be a temporary solution for when the libc is known not to be threadsafe.
Comment 2 Dan Winship 2011-03-31 15:01:38 UTC
(In reply to comment #0)
> Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)' making me
> conclude that result->errstring isn't threadsafe. That result variable comes
> from a __libc_getspecific (key) (which sounds to me like a key-value store).

yes, but it's a thread-specific key-value store. maybe eglibc's dlerror() is just broken?
Comment 3 Philip Van Hoof 2011-03-31 15:07:15 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > Line 108 is 'if (strcmp (result->errstring, "out of memory") != 0)' making me
> > conclude that result->errstring isn't threadsafe. That result variable comes
> > from a __libc_getspecific (key) (which sounds to me like a key-value store).
> 
> yes, but it's a thread-specific key-value store. maybe eglibc's dlerror() is
> just broken?

Sounds likely that eglibc is just broken and should be fixed. The author of the e-mail on debian-glibc mentions that POSIX doesn't require dlerror() to be thread-safe, though (not that it makes much sense when it's not thread-safe, in my opinion).

Can gmodule be adapted to return some generic error message in case a libc is detected (at build time, for example) that is known not to have a thread-safe dlerror()? Not nice, but nicer than race conditions :-\
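A rough sketch of what such a build-time fallback could look like. HAVE_THREADSAFE_DLERROR and module_error_message are hypothetical names, not existing glib configure checks or functions; the macro is assumed to be set by the build system when the libc is known to be safe:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical build-time switch: define to 1 when the target libc is
 * known to have a thread-safe dlerror(). Defaults to the cautious path. */
#ifndef HAVE_THREADSAFE_DLERROR
#define HAVE_THREADSAFE_DLERROR 0
#endif

/* Returns a message describing the most recent dynamic-loading failure.
 * Never returns NULL. */
static const char *module_error_message(void)
{
#if HAVE_THREADSAFE_DLERROR
    const char *msg = dlerror();
    return msg != NULL ? msg : "unknown dynamic loading error";
#else
    /* Do not touch dlerror() at all; report a fixed string instead. */
    return "dynamic loading failed (no details: dlerror() is not thread-safe on this libc)";
#endif
}
```

With the fallback active, failure itself would have to be detected from the return values of dlopen() and dlsym(), since dlerror() is never consulted.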
Comment 4 bugdal 2012-12-23 15:17:42 UTC
Not only does POSIX not require dlerror to be thread-safe; as-written, the specification DOES NOT ALLOW it to be thread-safe, i.e. glibc is non-conforming. The text for dlerror reads:

"The dlerror() function shall return a null-terminated character string (with no trailing <newline>) that describes the last error that occurred during dynamic linking processing. If no dynamic linking errors have occurred since the last invocation of dlerror(), dlerror() shall return NULL. Thus, invoking dlerror() a second time, immediately following a prior invocation, shall result in NULL being returned."

As-written, it's valid to call dlopen or dlsym in one thread and later query the error message with dlerror from another thread. This could happen in a GUI program where dlopen happens in the main thread and dlerror happens in a GUI thread displaying the error dialog box, or similar, or in a situation where a callback-based interface in a third-party library causes a callback to run in a new thread without the main program's knowledge.
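The "second invocation returns NULL" wording quoted above can be observed with a small check. This is only an illustrative sketch: the function name and the looked-up symbol name are made up, and it shows the single-threaded behaviour, not the cross-thread case discussed here:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if a failed dlsym() leaves an error that the first dlerror()
 * call reports and the second call no longer reports, 0 otherwise. */
static int dlerror_is_cleared_after_read(void)
{
    void *handle = dlopen(NULL, RTLD_NOW);   /* handle for the main program */
    const char *first;
    const char *second;
    int first_set, second_set;

    if (handle == NULL)
        return 0;

    dlerror();                               /* clear any earlier error */
    (void) dlsym(handle, "hypothetical_missing_symbol_646342");
    first = dlerror();                       /* expected: a message */
    first_set = (first != NULL);
    second = dlerror();                      /* expected: NULL again */
    second_set = (second != NULL);

    dlclose(handle);
    return first_set && !second_set;
}
```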

While POSIX could "fix" this and specify the error status to be thread-local, the whole concept of using dlerror except for the purpose of printing error messages to the user is just bogus. dlopen and dlsym already return non-descriptive error information (success/failure). I'm aware that some users (including glib/gtk/gnome?) of dlsym are treating null as a potentially-successful return since a symbol could be defined to 0; this seems to be done for ideological purposes, not for any actual application requirements. As it is a major portability bug affecting operation of multi-threaded programs on any system except glibc-based ones, it should simply be fixed by using the return value of dlsym, rather than dlerror, to determine whether lookup failed.
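A minimal sketch of the approach argued for here, deciding success purely from the return value of dlsym() and never calling dlerror(). The helper name lookup_symbol is hypothetical. The trade-off mentioned in the comment applies: a symbol whose value really is NULL is reported as missing:

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Looks up `name` in `handle` and stores the address in *sym_out.
 * Returns false if dlsym() returned NULL; dlerror() is never consulted,
 * so no shared error state is read or modified by this function. */
static bool lookup_symbol(void *handle, const char *name, void **sym_out)
{
    void *sym = dlsym(handle, name);

    if (sym == NULL)
        return false;   /* also hit by a symbol that is defined as 0 */

    *sym_out = sym;
    return true;
}
```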
Comment 5 Matthias Clasen 2013-01-01 15:10:25 UTC
we are following the dlsym man page recommendation for how to check for errors.
this is relevant: http://austingroupbugs.net/view.php?id=97
Comment 6 Matthias Clasen 2013-02-03 06:26:08 UTC
*** Bug 672665 has been marked as a duplicate of this bug. ***
Comment 7 GNOME Infrastructure Team 2018-05-24 13:00:30 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/399.