Bug 756430 – g_rw_lock_reader_lock() can return without locking, or error

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 756430 - g_rw_lock_reader_lock() can return without locking, or error


Summary:	g_rw_lock_reader_lock() can return without locking, or error


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	gthread
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal critical
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-10-12 12:31 UTC by Paul Davis
Modified:	2017-11-03 20:52 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gthread: Emit a critical if g_rw_lock_reader_lock() fails (1.57 KB, patch) 2017-11-03 15:26 UTC, Philip Withnall	none	Details \| Review
gthread: Emit a critical if g_rw_lock_reader_lock() fails (2.17 KB, patch) 2017-11-03 15:34 UTC, Philip Withnall	none	Details \| Review
gthread: Emit a critical if g_rw_lock_reader_lock() fails (2.20 KB, patch) 2017-11-03 18:17 UTC, Philip Withnall	committed	Details \| Review

Description Paul Davis 2015-10-12 12:31:28 UTC

g_rw_lock_reader_lock() returns void. On POSIX systems it just calls pthread_rwlock_rdlock() which can return several different error values, any of which indicate that the lock is not taken. These errors are not propagated or visible to the caller of the glib function.

Code that relies on the declared semantics of g_rw_lock_read_lock() can end up causing memory corruption because it does not in fact ensure that the protected data/code is locked as intended. 

OS X supports recursive RW locks, and so this does not occur on that particular variant of POSIX, but Linux does not. We discovered this by (incorrectly) having a thread which took a read/write lock and then attempted to take a read lock. Using pthreads directly on Linux, this should result in  EDEADLCK. Using glib's wrappers, g_rw_lock_reader_lock() returns as if there is no problem, and the lock is now held. 

This also results in undefined behaviour for the unlock, since the lock is not actually held.

Comment 1 Colin Walters 2015-10-12 13:41:33 UTC

There are earlier bugs about the differences between how glibc pthreads doesn't do error checking by default, but e.g. FreeBSD does etc.

I think in the end the best bet is to use platform debug tools for threading (e.g. helgrind), rather than GLib attempting to impose a platform-independent threading error checking mechanism.

Comment 2 Paul Davis 2015-10-13 16:53:44 UTC

This isn't about debugging.

The declared semantics of g_rw_lock_reader_lock() do not match its behaviour. Those semantics say that on return, the lock is held. This isn't true.

Comment 3 Matthias Clasen 2015-10-13 17:04:25 UTC

> Those semantics say that on return, the lock is held. This isn't true.

Its true unless an error occurred. I think your original complaint was more to the point:

> These errors are not propagated or visible to the caller of the glib function.

Comment 4 Paul Davis 2015-10-13 17:11:33 UTC

The operation invoked by g_rw_lock_reader_lock() can fail. But the function does not propagate or make visible the failure. 

Using helgrind can work once you have some idea that a problem is lock-related. But we were debugging random memory corruption, occuring in a single thread. Note that the documentation for the unlock function notes the risks of unlocking an unlocked mutex. We don't know for sure if the corruption was caused by the unlock itself, or something else, but it was fixed when we stopped causing (silent) failure in the lock call. 

To have this function present in the glib API is basically offering the illusion of its functionality with no suggestion or hint as to the problems it could cause. Is this acceptable?

Comment 5 Dan Winship 2015-10-13 17:42:28 UTC

It can't return *useful* errors. If it returns an error, then there's a bug in your program, and there's nothing you can programmatically do to recover from it.

But we could make the glib function warn/return-if-fail/abort if the pthread function returned an error.

Comment 6 Matthias Clasen 2015-10-13 17:53:39 UTC

Could we handle EAGAIN meaningfully ?

Comment 7 Dan Winship 2015-10-13 18:02:31 UTC

I think EAGAIN here is like EMFILE: it means that either you have a resource leak, or else you need to bump up your process limits. I don't think there's a meaningful way to actually handle it.

Comment 8 Paul Davis 2015-10-13 18:17:35 UTC

(In reply to Dan Winship from comment #5)
> It can't return *useful* errors. If it returns an error, then there's a bug
> in your program, and there's nothing you can programmatically do to recover
> from it.

The GThreads RW lock API is clearly modelled on POSIX, and POSIX clearly states that a rdlock can fail to be acquired. Any sane consideration of RW locks from a theoretical perspective would also indicate that an attempt to acquire the lock can fail. Why would glib use a void type for a function that may not be able to do what it claims?

Are you suggesting that it makes no sense for pthread_rwlock_rdlock() to return an error either, since there's nothing you can programmatically do to recover from it?

If it is really believed that there's no sensible code path after a failed rdlock attempt, then the glib version should abort, if for no reason that to prevent the program from entering "undefined behaviour" territory when it unlocks.

Comment 9 Paul Davis 2016-02-21 00:23:23 UTC

Is anything going to be done about this? Nobody has responded to my comparison of the POSIX RW Lock API and Glib behaviour. I cannot see how the current glib behaviour can be remotely acceptable: there is NOTHING wrong with a read lock failing to be acquired (it just means that the write lock is currently held).

Comment 10 Dan Winship 2016-02-21 14:06:18 UTC

(In reply to Paul Davis from comment #9)
> there is NOTHING wrong
> with a read lock failing to be acquired (it just means that the write lock
> is currently held).

The pthread_rwlock_rdlock() man page does seem to go out of its way to imply that for some reason, but it eventually clarifies that "If the read lock is not acquired, the calling thread shall block until it can acquire the lock." The only cases where it returns without acquiring the lock are in situations that you can't plausibly recover from (programmer error or resource exhaustion).

Comment 11 Matthias Clasen 2016-02-22 15:41:15 UTC

I think you simply want pthread_rwlock_tryrdlock ?

Comment 12 Philip Withnall 2017-11-03 15:26:31 UTC

Created attachment 362909 [details] [review]
gthread: Emit a critical if g_rw_lock_reader_lock() fails

It can only fail if there’s been a leak or programmer error, so this is
really unlikely to happen. At least make it obvious something has gone
wrong, though, rather than silently carrying on and returning as if the
reader lock has been acquired.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Comment 13 Philip Withnall 2017-11-03 15:27:45 UTC

As others have said, you’re only going to get an error return here if there’s a programmer error. We can’t change the API of g_rw_lock_reader_lock() to expose the failure, but we can emit a critical warning, which will at least help with debugging the programmer error (and is consistent with our reporting of other programmer errors).

Comment 14 Philip Withnall 2017-11-03 15:34:48 UTC

Created attachment 362913 [details] [review]
gthread: Emit a critical if g_rw_lock_reader_lock() fails

It can only fail if there’s been a leak or programmer error, so this is
really unlikely to happen. At least make it obvious something has gone
wrong, though, rather than silently carrying on and returning as if the
reader lock has been acquired.

Do the same for g_rw_lock_writer_lock().

It should be safe to use g_critical() for reporting the problems, since
GRWLock is not used in gmessages.c, and printing a critical seems better
than aborting, just in case we do hit the ‘maximum number of reader
locks’ error code.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Comment 15 Colin Walters 2017-11-03 17:29:49 UTC

Review of attachment 362913 [details] [review]:

::: glib/gthread-posix.c
@@ +542,3 @@
+
+  if (retval != 0)
+    g_critical ("Failed to get RW lock (error %d)", retval);

Let's use %p here so if the process dumps core a developer can see *which* lock?  And why not `g_strerror (errno)`?

Comment 16 Philip Withnall 2017-11-03 18:17:47 UTC

Created attachment 362925 [details] [review]
gthread: Emit a critical if g_rw_lock_reader_lock() fails

It can only fail if there’s been a leak or programmer error, so this is
really unlikely to happen. At least make it obvious something has gone
wrong, though, rather than silently carrying on and returning as if the
reader lock has been acquired.

Do the same for g_rw_lock_writer_lock().

It should be safe to use g_critical() for reporting the problems, since
GRWLock is not used in gmessages.c, and printing a critical seems better
than aborting, just in case we do hit the ‘maximum number of reader
locks’ error code.

Signed-off-by: Philip Withnall <withnall@endlessm.com>

Comment 17 Colin Walters 2017-11-03 18:28:10 UTC

Review of attachment 362925 [details] [review]:

LGTM.

Comment 18 Philip Withnall 2017-11-03 20:52:53 UTC

Attachment 362925 [details] pushed as fc817eb - gthread: Emit a critical if g_rw_lock_reader_lock() fails