GNOME Bugzilla – Bug 670144
unconditional use of CLOCK_MONOTONIC is broken
Last modified: 2014-02-21 21:42:45 UTC
glib/gmain.c: ... #ifdef CLOCK_MONOTONIC clock_gettime (CLOCK_MONOTONIC, &ts); #else ... leads in practice in certain cases to clock_gettime() returning -1 with EINVAL -- as the presence of the constant at compile time does not imply that the support is to be present at run time. This happens here on a system where _POSIX_MONOTONIC_CLOCK is NOT defined, which is apparently ignored (while must be consulted according to clock_gettime(3)) while it should imply a need for runtime checking. Worse, the resulting failure (return value of clock_gettime)) is not being checked for. As a result most of the applications using glib are "mysteriously" broken, as soon as timeouts are concerned. This bug was apparently introduced as an incorrect "fix" for bug #661421/#668738. A patch for runtime detection is suggested in #668738.
You've clearly already done your research, but for others: Introduced, with subtle thread bug: http://git.gnome.org/browse/glib/commit/?id=ab548d240a9c2862ea7f7685c68b6013a68bf0c8 Broken thread safety fix (introducing bug 661421): http://git.gnome.org/browse/glib/commit/?id=2955981569808304399d16847712f0ef97c6355b Removed: http://git.gnome.org/browse/glib/commit/?id=71cf70b39cd3859ac4cb4954c369bda731a61171 I think we should probably re-introduce some mix of the original logic plus using g_once_init for thread safety.
Looking at the code, I actually think I prefer reintroducing the original form but using atomic access to the static variables to avoid the threading problems. From what I remember at the time, the only issue was that the compiler (or processor) was reordering the writes (although maybe Dan can remind me differently). Using atomics would prevent that.
(In reply to comment #2) > From what I remember at the time, the only issue was that the > compiler (or processor) was reordering the writes (although maybe Dan can > remind me differently). I have no knowledge of it *actually* happening anywhere, but yes, that was the potential problem. > Looking at the code, I actually think I prefer reintroducing the original form minus the configure checks though? (qv bug 668738) Though maybe that should just be regarded as an OpenBSD bug that should be fixed on their end.
What system was impacted by this, and is it still broken? We're considering making use of CLOCK_MONOTONIC a hard requirement...
Our applications (linked against uClibc) work as of glib 2.37.0 without patching glib. Nevertheless: > We're considering making use of CLOCK_MONOTONIC a hard requirement... uClibc as of today's git: uClibc/libpthread/nptl/sysdeps/unix/sysv/linux/bits/posix_opt.h: #define _POSIX_MONOTONIC_CLOCK 0 Which means the applications (in this case glib) must check the availability at runtime. If glib will be relying on the presence of CLOCK_MONOTONIC, it is not guaranteed to work on uClibc-systems. That would be a pity^WPITA. Hope you find a compromise to let glib be compatible with uClibc and with other systems where _POSIX_MONOTONIC_CLOCK is 0. Regards, rl
I think we've decided that we want g_get_monotonic_time() to be reliable -- ie: true monotonic time, on all systems. Is there any other way that we could support this on uClibc? Maybe uClibc should expose this functionality... I don't understand why they don't -- it's a pretty trivial wrapper around the kernel.
(To make it clear, I can not speak on behalf of uClibc, in this context I am only involved as its heavy user) Which functionality do you refer to saying "this"? Apparently uClibc exposes the functionality necessary to check the availability of monotonic clock at run time. Different hardware platforms, different kernel versions and even differently compiled instances of the same kernel version can have different properties and it would be Not Good (TM) to expect that you have all the information about the future run time environment at compile time. > we want g_get_monotonic_time() to be reliable ... on all systems I do not look at the code right now and do not remember the details but it looks like you want to rely on something which actually is not available on "all systems"? Depending on what you do (and how) you may end up making glib unsuitable for a whole class of platforms just because some instances of those do not support a certain optional feature. This is the thing I would very much like to avoid. Regards, rl
Reading glibc source code, it seems that they make an unconditional dependency on CLOCK_REALTIME and CLOCK_MONOTONIC by virtue of the fact that these are always available in Linux. From sysdeps/unix/sysv/linux/clock_gettime.c: /* The REALTIME and MONOTONIC clock are definitely supported in the kernel. */ #define SYSDEP_GETTIME \ SYSDEP_GETTIME_CPUTIME; \ case CLOCK_REALTIME: \ case CLOCK_MONOTONIC: \ retval = SYSCALL_GETTIME (clock_id, tp); \ break (and indeed, __NR_clock_gettime is just a simple syscall). I really think that this is just a case of uClibc lagging on exposing an important feature which really is available on all systems...
Also note this bug[1] from the uclibc tracker, which claims: "The POSIX monotonic clock is implemented in uClibc and actually works. However, the _POSIX_MONOTONIC_CLOCK define reads 0, so some libraries which rely on it (e.g. Qt) don't use this feature, which can cause some problems." [1] https://bugs.busybox.net/show_bug.cgi?id=5468 So it seems that CLOCK_MONOTONIC really _is_ supported in uClibc?
The bug report you refer to (which does not mean per se that there is a bug) says "actually works" which is irrelevant, it is not the same as "always works". It looks like the submitter believed that _POSIX_MONOTONIC_CLOCK == 0 implies "not supported", see the comment there. Of course monotonic clock _is_ supported in uClibc, it just is not guaranteed to work in each given running instance - not because of uClibc shortcomings but depending on the actual environment (not in the sense of "environment variables").
But if uclibc supports CLOCK_MONOTONIC in theory and the kernel always provides it in practice, which environment would we find ourselves in where there was a problem?
> if [...] the kernel always provides it in practice, Here you make a very broad assumption. What makes you believe this to be true? Did anybody analyze/test all situations and environments (hardware and kernel variations) where uClibc and glib are being used or might be (possibly very productively) used? The questions here are rhetorical. Hope you see what I mean. Best regards, rl
To avoid misunderstanding, I do not mean that your assumption is wrong, only that quite a lot of background knowledge is necessary to either confirm or reject your statement. I respect both the choice of uClibc developers and your differing perception. Hope you have a sufficient ground for a well informed decision. rl
A little bit of digging through how the clock_gettime() call is implemented in the kernel brings us to the posix_clocks[] variable, and this: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/posix-timers.c#n276 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/posix-timers.c#n223 The kernel always registers CLOCK_MONOTONIC. According to git-blame, the line that adds CLOCK_MONOTONIC has been there since at least 2005, when the kernel was imported into git. I'd say it's pretty safe to depend on this...
I appreciate that you do the due research. One thing that might be easy to miss and I'd like to mention: there is more than one implementation of Linux ABI. People regularly run Linux binaries on FreeBSD and NetBSD, I actually depend on this too. I guess a check of these kernels is relevant as well. Regards, rl
I've checked Linux, OpenBSD, NetBSD, FreeBSD, Solaris, Illumos kernel and Hurd and all of them seem to have working CLOCK_MONOTONIC. Additionally, the monotonic clock is supported in Android and on Mac OS and Windows (although not via the clock_gettime() API in those cases).
Nice. Hope there will never appear a new somehow viable platform without CLOCK_MONOTONIC (Assuming that Minix either does have it or will have). Anything beyond Linux ABI is anyway out of scope for uClibc and this implies that uClibc is indeed overly cautious (if BSDs properly expose CLOCK_MONOTONIC in their linuxulators). Yet I would not call such a cautious approach, run time testing, a bug. IOW your intended change is quite certainly harmless but it still relies on certain assumptions which have to be true "always". I can not estimate the gains and I hope we both understand the presence of some risks (and some inconvenience as corresponding changes must follow in uClibc). The change would be harmless for _my_ current needs. Thanks for taking your time to discuss this Ryan. Regards, rl
And thanks for making sure I did the necessary work :) fwiw, we're trying a new approach with regards to portability issues such as this. It's outlined here: https://wiki.gnome.org/Projects/GLib/SupportedPlatforms
I assume that you will respect _POSIX_MONOTONIC_CLOCK == 0 and make the library check the presence of monotonic clock at runtime and fail explicitly, or fail at compilation if you definitely can not live with runtime checking. With other words, NOT repeating the former mistake of unconditional clock_gettime (CLOCK_MONOTONIC, &ts); which led to a hell of mysterious errors and troubleshooting. Despite all your research you never know when CLOCK_MONOTONIC may happen to be absent. It was absent for me 2 years ago despite your assertion that this shouldn't have happened since 2005. This has nothing to do with uClibc but with the standards. If a header says you must check the presence of a functionality at run time and a library does not, this constitutes a bug. (The header in uClibc to the contrary is not a bug but an implementation choice, with some reasons behind it. One can not challenge the choice just because all the underlying code _one_ checked _could_ work with a different header.) In other words: please respect standards, not the implementations. The former exist to make the latter interoperable. (You apparently rely on having examined a lot of implementations, this makes me uncertain that you are set to follow the principle above. I do not question your competence, it's impressive. Still you can _not_ know all the implications of all the possibly concerned systems.) Yours, rl
(In reply to comment #19) > With other words, NOT repeating the former mistake of unconditional > > clock_gettime (CLOCK_MONOTONIC, &ts); > > which led to a hell of mysterious errors and troubleshooting. Ya -- this is fair. Would you be happy if we had a g_assert() here to make sure it didn't fail? > This has nothing to do with uClibc but with the standards. If a header says you > must check the presence of a functionality at run time and a library does not, > this constitutes a bug. > > (The header in uClibc to the contrary is not a bug but an implementation > choice, with some reasons behind it. One can not challenge the choice just > because all the underlying code _one_ checked _could_ work with a different > header.) > > In other words: please respect standards, not the implementations. We've had a lot of practice with attempting to respect standards, but it hasn't worked well for us. We can't really ever know if anything is really working in all cases that the standard anticipated unless we test it -- and that involves implementations. We do try to test as many implementations as possible, however. I explicitly acknowledge that what I've done here may not work everywhere. "Working CLOCK_MONOTONIC" is now on the official list of things that GLib has a hard dependency on -- if a system has a clock_gettime(CLOCK_MONOTONIC) that either fails to compile or fails at runtime then we simply no longer support it. I'll reopen this bug for the addition of the assertion if the call fails at runtime to avoid the "hell of mysterious errors and troubleshooting" that you describe, but I don't want to do a fallback. See bug 724687, bug 673607 and bug 661421 for some more information and background on the difficulties that we've encountered by attempting to support too many different clock configurations here...
Created attachment 269925 [details] [review] gmain: abort if monotonic time is unsupported We now depend on CLOCK_MONOTONIC, but it's possible that people may attempt to run GLib on systems where it isn't supported at runtime. Check the return value of clock_gettime() and abort() if it fails in order to save these people from wasting time on debugging a tricky issue.
Thanks, I understand your reasons and highly appreciate that you care to avoid the unnecessary pains. Yes, an assertion would be quite reasonable and helpful. Regards, rl
Attachment 269925 [details] pushed as 03a43c2 - gmain: abort if monotonic time is unsupported