Bug 668054 - g_timeout_add implementation not very accurate
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: mainloop
Version: 2.30.x
OS: Other Linux
Importance: Normal enhancement
Assigned To: gtkdev
Reported: 2012-01-16 21:14 UTC by Phillip Susi
Modified: 2018-05-24 13:42 UTC


Attachments
test program (792 bytes, application/octet-stream), submitted 2012-01-16 21:14 UTC by Phillip Susi
patch (628 bytes, patch, rejected), submitted 2012-01-16 21:15 UTC by Phillip Susi
dump from git-multigrep g_timeout_add on some GNOME source repositories (160.74 KB, text/plain), submitted 2012-01-25 20:31 UTC by Colin Walters

Description Phillip Susi 2012-01-16 21:14:33 UTC
Created attachment 205407 [details]
test program

The glib timer implementation recomputes the next timeout by adding the interval to the current monotonic time each time a timeout fires.  Any delay in servicing the timer therefore accumulates over time, sometimes into rather large errors.  Computing the next timeout as previous + interval rather than current + interval eliminates this accumulated error.
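To make the difference concrete, here is a simplified sketch of the two scheduling policies (illustrative C, not the actual GLib source; the helper names are invented for the example):

/* Simplified illustration only; not the GLib implementation. */
#include <glib.h>

/* Current behaviour: the next expiration is anchored to the moment the
 * callback is serviced, so any lateness is carried into every later tick. */
static gint64
next_expiration_current (guint interval_ms)
{
  return g_get_monotonic_time () + (gint64) interval_ms * 1000;
}

/* Proposed behaviour: the next expiration is anchored to the previous
 * *scheduled* expiration, so lateness does not accumulate. */
static gint64
next_expiration_proposed (gint64 previous_expiration_us, guint interval_ms)
{
  return previous_expiration_us + (gint64) interval_ms * 1000;
}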

I wrote a simple test program to measure the error.  On a dual-core 3 GHz CPU system, I measured the error over 15 seconds under a few conditions, both before and after this patch, and got the following results:

Unloaded: 44ms absolute error (0.296% relative)
With stress -c 2 running in the background: 59.77ms (0.398% relative)
With a Linux kernel make -j 3 in the background: 380.3ms (2.535% relative)
With stress -c 2 and nicing the test: 3170.8ms (21.138% relative)

After this patch:

Unloaded: 1.0ms (0.0067%)
Stress: 0.385ms (0.0026%)
Kernel make: 294ms (1.963%)
Stress + nice: 5.04ms (0.0315%)
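
For reference, a drift measurement along these lines can be written with just the public GLib API; the sketch below is not the attached test program, and the 100ms interval is illustrative. It prints the accumulated error after 15 seconds:

#include <glib.h>
#include <stdio.h>

#define INTERVAL_MS 100
#define TICKS       150   /* 15 seconds at 100ms per tick */

static GMainLoop *loop;
static gint64 start_time;
static int ticks;

static gboolean
on_tick (gpointer data)
{
  gint64 elapsed, expected;

  if (++ticks < TICKS)
    return TRUE;                        /* keep the timeout firing */

  elapsed  = g_get_monotonic_time () - start_time;
  expected = (gint64) TICKS * INTERVAL_MS * 1000;
  printf ("absolute error: %.3fms (%.3f%% relative)\n",
          (elapsed - expected) / 1000.0,
          100.0 * (elapsed - expected) / expected);
  g_main_loop_quit (loop);
  return FALSE;                         /* remove the source */
}

int
main (void)
{
  loop = g_main_loop_new (NULL, FALSE);
  start_time = g_get_monotonic_time ();
  g_timeout_add (INTERVAL_MS, on_tick, NULL);
  g_main_loop_run (loop);
  g_main_loop_unref (loop);
  return 0;
}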
Comment 1 Phillip Susi 2012-01-16 21:15:10 UTC
Created attachment 205409 [details] [review]
patch
Comment 2 Allison Karlitskaya (desrt) 2012-01-22 05:23:15 UTC
hi Phillip,

This is the intended documented behaviour of the timeout source.  Note the documentation for g_timeout_add:


"""
   After each call to the timeout function, the time of the next timeout
   is recalculated based on the current time and the given interval (it
   does not try to 'catch up' time lost in delays).
"""

Since we have documented this behaviour, we can only assume that some programs have come to rely on it and therefore we cannot change it.
Comment 3 Phillip Susi 2012-01-22 20:28:27 UTC
Just because it is documented to behave badly does not mean it should continue to do so.  I cannot see how an application could possibly rely on the current behavior of random, unpredictable delays.  I find it far more likely that most programs using timers assume they work correctly and deliver events at the requested pace, not at some randomly slower pace.

The warning in the documentation is also somewhat unclear.  I had to read it three times before I realized it might actually mean that the overall rate would be randomly slower than requested, rather than that a whole event could be lost and not caught up.  I had to look at the source code to be sure, because I just could not believe a timer implementation could be that broken.

The whole reason you schedule a periodic timer instead of a one shot timer that just reschedules itself is so that these small delays do not accumulate.
Comment 4 Matthias Clasen 2012-01-23 14:17:48 UTC
If you want something that is precisely periodic, then g_timeout_add is just not the right thing to use.
Comment 5 Phillip Susi 2012-01-23 14:57:00 UTC
There is a difference between precision and accuracy.  You are correct that this is not the timer source to use if you need millisecond-or-better precision and/or low jitter.  Accuracy, on the other hand, is something it should have to a reasonable degree (20% error is absurd), and with this very simple change it can, at essentially no cost.
Comment 6 Colin Walters 2012-01-23 18:15:18 UTC
(In reply to comment #3)
> Just because it is documented to behave badly does not mean it should continue
> to do so.  I can not see how an application can possibly rely on the current
> behavior of random, unpredictable delays. 

The current implementation is predictable - current monotonic time + timeout.  But this discussion is too abstract - let's think of possible examples.

Imagine a game where you want a particular item to be available to click on every two seconds.  Here you *don't* want the next timeout to be shorter than two seconds if the game happens to be scheduled too late by the kernel.  That punishes the player by shortening the next time period.  Instead, you just want to accept that the previous run was too long, and schedule again.

It might make sense to add a new function, but I think the current behavior, since it's explicitly documented, should stay.
Comment 7 Phillip Susi 2012-01-23 18:48:06 UTC
The current implementation is unpredictable because the value of the current monotonic time at the point when the timeout is serviced is unpredictable.  With this change, you know what time you start the timer, and from that, you can predict the time that every subsequent event will fire.

In that case you would not use an auto repeat timer.  You would set a one shot timer for 2 seconds, and then make the item available.

You use an auto repeat timer when you want to do something say, 10 times a second.  After 10 seconds, you expect to have done that thing 100 times, not 99, 98, or 97 times, unless you simply ran out of cpu time.  With the current implementation, you can end up only doing the task 90 times, even though there was enough cpu time to have done all 100 with ease.

Conversely, you expect each event to be 100ms apart, give or take some margin of error.  Currently that error is between 0% and infinity, with typical values in the 1-20% range.  With this change, that error band shifts to between -99% ( one event is 99% late, so next happens fast ) and infinity, with typical values in the +/-20% range.  In other words, it centers the error around zero instead of being strictly greater than zero.

Going back to a game as an example, you might set an auto repeat timer to heal 1 hp every second.  After 100 seconds, you expect to have healed 100 hp, but with the way the timers currently work, you only end up with 99 or less.
Comment 8 Colin Walters 2012-01-23 20:54:19 UTC
(In reply to comment #7)
> The current implementation is unpredictable because the value of the current
> monotonic time at the point when the timeout is serviced is unpredictable. 

The time is always unpredictable based on kernel scheduling, regardless of your patch.  However an app can call g_source_get_time() or just g_get_monotonic_time() to find out what the current monotonic time is.

> In that case you would not use an auto repeat timer.  You would set a one shot
> timer for 2 seconds, and then make the item available.

*after* your change - yes.  Before, though, the app could have used the current API for it.
 
> You use an auto repeat timer when you want to do something say, 10 times a
> second.  After 10 seconds, you expect to have done that thing 100 times, not
> 99, 98, or 97 times, unless you simply ran out of cpu time.  With the current
> implementation, you can end up only doing the task 90 times, even though there
> was enough cpu time to have done all 100 with ease.

I think if we had a time machine, we'd change _timeout_add to have the "catch up" semantics you're suggesting, and people who want "full duration" could open code it by just queuing a new timeout.

But you're missing the fundamental point here that GLib is a widely deployed library where we try very hard to maintain API/ABI compatibility, and such a change *could* break apps.

We could:

1) Clarify the documentation even more on this
2) Add a new API
3) Add an example of using g_source_get_time() to make a "catch up" timer

But again, I don't think we can change the semantics of the existing documented API.
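
For option 3, a "catch up" timer can be built on top of the existing API by re-arming a one-shot timeout from the previous scheduled expiration.  A rough sketch (the CatchUpTimer type and helper names are invented for the example; it uses g_get_monotonic_time(), mentioned above, rather than g_source_get_time()):

#include <glib.h>

typedef struct {
  gint64      next_expiration;   /* monotonic time, in microseconds */
  guint       interval_ms;
  GSourceFunc user_callback;
  gpointer    user_data;
} CatchUpTimer;

static gboolean
catch_up_fire (gpointer data)
{
  CatchUpTimer *timer = data;
  gint64 now, delay_ms;

  if (!timer->user_callback (timer->user_data))
    {
      g_free (timer);
      return FALSE;
    }

  /* Anchor the next expiration to the previous *scheduled* one, then
   * re-arm a one-shot timeout for however long is actually left. */
  timer->next_expiration += (gint64) timer->interval_ms * 1000;
  now = g_get_monotonic_time ();
  delay_ms = MAX (0, (timer->next_expiration - now) / 1000);
  g_timeout_add ((guint) delay_ms, catch_up_fire, timer);
  return FALSE;   /* drop the old one-shot source */
}

static void
catch_up_timeout_add (guint interval_ms, GSourceFunc callback, gpointer user_data)
{
  CatchUpTimer *timer = g_new0 (CatchUpTimer, 1);

  timer->interval_ms = interval_ms;
  timer->user_callback = callback;
  timer->user_data = user_data;
  timer->next_expiration = g_get_monotonic_time () + (gint64) interval_ms * 1000;
  g_timeout_add (interval_ms, catch_up_fire, timer);
}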
Comment 9 Phillip Susi 2012-01-23 21:30:49 UTC
(In reply to comment #8)
> The time is always unpredictable based on kernel scheduling, regardless of your
> patch.  However an app can call g_source_get_time() or just
> g_get_monotonic_time() to find out what the current monotonic time is.

It is also unpredictable because of everything else you have running in your mainloop.  But the point was not that any single timeout is unpredictable; it was that in the current implementation, every little error accumulates and pushes back the *scheduled* time of *all* subsequent events (not acceptable), on top of whatever error happens when servicing any particular event (which can't be helped).

> *after* your change - yes.  Before though the app could have used the current
> API for it.

No, you are missing the point.  The app would not schedule an auto repeat timer to make an item available again after 2 seconds, because once it becomes available, it stays available until you use it again.  As such, it does not care which way the subsequent timeouts are scheduled because there are none.

To take another example, you want a machine gun to fire 4 rounds per second, so you would set a repeating timer to fire every 250ms.  You expect to fire at 250ms, 500ms, 750ms, etc.  If it ends up being 255, 550, 751, that is fine, but 300, 600, 900 gives you a distinct disadvantage ( 17% less rate of fire ).

> But you're missing the fundamental point here that GLib is a widely deployed
> library where we try very hard to adhere API/ABI compatibility, and such a
> change *could* break apps.

I really don't think that is likely.  The apps are already used to the timeouts being unpredictable anyhow; the only difference is that instead of *always* slowing down, they sometimes speed up a bit to catch up.

I take the text in the documentation to mean "warning: don't use this if you care about accuracy" not "you can rely on no event coming in less than interval time since the last".

> We could:
> 
> 1) Clarify the documentation even more on this
> 2) Add a new API
> 3) Add an example of using g_source_get_time() to make a "catch up" timer

I suppose adding a flag to choose the behavior that defaults to the old would work.
Comment 10 Colin Walters 2012-01-25 20:30:02 UTC
(In reply to comment #9)

> To take another example, you want a machine gun to fire 4 rounds per second, so
> you would set a repeating timer to fire every 250ms.  You expect to fire at
> 250ms, 500ms, 750ms, etc.  If it ends up being 255, 550, 751, that is fine, but
> 300, 600, 900 gives you a distinct disadvantage ( 17% less rate of fire ).

Right.  But in the previous example I was trying to make up, the game cares about the *duration* of the timer.  How about a game where there's a periodic countdown of 10 seconds, and each time 10 seconds pass, your score gets reduced.

In this game, your patch would start punishing the player because the *next* time period would be shortened to "catch up" if the game happened to be scheduled slowly.  This game only cares about having a minimum of 10 seconds elapsed per timeout.

> I really don't think that is likely.  The apps are already used to the timeouts
> being unpredictable anyhow, 

The problem is we're both talking theory - I tried to turn the conversation towards possible apps with some success, but really we need to look at actual real world code.

> I suppose adding a flag to choose the behavior that defaults to the old would
> work.

I'm not sure what to call a new API, but maybe.
Comment 11 Colin Walters 2012-01-25 20:31:21 UTC
Created attachment 206129 [details]
dump from git-multigrep g_timeout_add on some GNOME source repositories
Comment 12 Phillip Susi 2012-01-25 21:32:56 UTC
Actually, that is another good example of where this change will improve things.  If your score is supposed to decrement every 10 seconds, and the first decrement happens a little late at 11 seconds, then the player is rewarded with an extra second before the next point loss occurs.  If they are supposed to have lost 2 points for taking more than 20 seconds to complete the maze, then completing it in 21 seconds and not losing the second point is sort of cheating, and only because the game is broken, not because of anything they did.
Comment 13 Colin Walters 2012-01-26 17:03:37 UTC
(In reply to comment #12)
> Actually that is another good example of where this change will improve things.
>  If your score is supposed to decrement every 10 seconds, then if the first
> decrement happens a little late at 11 seconds, then the player is rewarded by
> being given an extra second before the next point loss occurs.  If they are
> supposed to have lost 2 points for taking more than 20 seconds to complete the
> maze, then completing it in 21 seconds and not having the second point lost is
> sort of cheating, only because the game is broken, not because of anything they
> did.

Well, if I were the author of this game, I'd say no - I don't want to punish the player if the process got swapped out, because they couldn't interact with the game anyway.

Ultimately though the semantics here need to be up to the app - I believe I've given you at least some (admittedly still theoretical but also plausible) examples of behavior that would change for the worse with your patch.

So that leaves us with either updating the documentation, or a new API.
Comment 14 Phillip Susi 2012-01-26 18:06:35 UTC
The process didn't get swapped out.  It may just be running on a slow video card, spending more than 50% of the CPU drawing graphics, which introduces a little latency before each timeout is serviced.  That isn't enough for the user to notice, except that it accumulates more and more with each tick, so that after some time the total delay is rather noticeable.

If the process were swapped out or otherwise totally frozen for several seconds, they still would only lose at most 1 second with this patch.

Or instead of graphics, maybe the game saves a game state file frequently and that adds a 0.25 second delay to each tick.  Each individual delay is small enough to not be noticed, but at the end of 60 seconds the player ends up with 1.5 seconds more than they are supposed to have.  A player with a fast solid state disk would not get that extra time.

You have given examples where the behavior will change, but in each case it is for the better, not the worse.
Comment 15 Allison Karlitskaya (desrt) 2012-01-26 19:03:53 UTC
note that glib's typical use case is not a game...
Comment 16 Phillip Susi 2012-01-30 15:32:09 UTC
True, but they make a good example.  Checking over the list of GNOME apps there that use timers and would not care at all about this patch:

udisks
upower
polkit
gnome-power-manager: none of the files in the listing appear to exist, removed?
gnome-screensaver

If anyone were to care, I would think it would be rhythmbox.  Perhaps a rhythmbox developer could evaluate whether it would be affected?
Comment 17 Allison Karlitskaya (desrt) 2012-02-17 08:22:11 UTC
Comment on attachment 205409 [details] [review]
patch

I think the only way we can do this is with a new API.  If we were to add such a new API, it may also be a good way to introduce microsecond-accurate timeouts.
Comment 18 Dan Winship 2012-02-17 12:45:48 UTC
(In reply to comment #17)
> (From update of attachment 205409 [details] [review])
> I think the only way we can do this is with a new API.  If we were to add such
> a new API, it may also be a good way to introduce microsecond-accurate
> timeouts.

A thought I had while thinking about the g_socket_condition_timed_wait() bug: have the new API take both a duration and a resolution, so you can specify the amount of accuracy you need, and then internally coalesce timeouts a la g_timeout_add_seconds(), but at multiple levels of resolution.
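
Purely as an illustration of that shape (none of these names exist in GLib; the signature and units are invented here), such an API might look like:

/* Hypothetical sketch only: interval_ms is the requested period and
 * resolution_ms is how much slack the main loop may use to coalesce
 * this wakeup with others, a la g_timeout_add_seconds(). */
guint
g_timeout_add_with_resolution (guint       interval_ms,
                               guint       resolution_ms,
                               GSourceFunc function,
                               gpointer    data);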
Comment 19 Phillip Susi 2012-02-17 14:42:43 UTC
Microsecond-level timeouts are very unlikely to be feasible in a glib main loop, since dispatching events often involves at least a millisecond or two of processing time each.

Specifying the acceptable jitter ( not accuracy, that's different ) might be nice for a new API, but goes a bit beyond the scope of simply fixing the current badly behaving timers.

Unless you can find a real world example that would be harmed by this change, I don't see why it needs a new API.
Comment 20 Allison Karlitskaya (desrt) 2012-02-17 14:46:55 UTC
dan: i think coalescing wakeups is generally _vastly_ overrated.  particularly for things like socket timeouts.

phillip: we can definitely do better than milliseconds for mainloop wakeups.  having only millisecond accuracy there caused me quite some troubles when i was working on frame-based rendering (at 60Hz).

also: the main benefit of using microseconds for timeouts would be that g_timeout_add() is one of the few APIs in GLib that doesn't use microseconds for specifying time and it would be nice if it did, for consistency reasons.
Comment 21 Phillip Susi 2012-02-17 14:56:42 UTC
You can try, but unless you set up an auxiliary main context running in a dedicated thread, it isn't going to work.  Any kind of main loop processing typical of a gtk app is easily going to delay the timer processing by a millisecond.

By the way, 60 Hz is 16.6ms, so why would you need a microsecond timer for that?
Comment 22 Dan Winship 2012-02-17 14:59:23 UTC
I didn't mean that coalescing timeouts was particularly relevant to g_socket_condition_timed_wait(), I just meant that working on that bug got me thinking about g_timeout_add(), which led to me thinking about how we might merge g_timeout_add() and g_timeout_add_seconds().
Comment 23 Allison Karlitskaya (desrt) 2012-02-19 13:03:56 UTC
Another thing that occurs to me: it's actually impossible for g_timeout_add() to be "accurate" in any meaningful sense.

The most usual/likely case I can think of for a periodic timeout that needs to be accurate is 60Hz.  It's not possible to specify that timeout value in either ms or µs.  You need fractional time for this to work properly or else, even with nanoseconds and no matter what we do to the algorithms, there will be drift over time...

Some sort of g_fractional_timeout_add() may be in order.

void
g_fractional_timeout_add (int         n,
                          int         d,
                          GSourceFunc callback,
                          gpointer    user_data);

or we could introduce a general-purpose simple fraction type and have an API using that.
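
To illustrate why a fraction helps (a sketch with an invented name, assuming n/d expresses the period in seconds, e.g. 1/60 for 60Hz): deriving each expiration from the original start time and the tick count keeps per-tick rounding from ever accumulating into drift.

/* Hypothetical sketch: n/d is the period in seconds (e.g. 1/60 for 60Hz).
 * Computing the k-th expiration from the start time means the rounding of
 * any single tick never accumulates into long-term drift. */
static gint64
fractional_expiration (gint64 start_us, gint64 k, int n, int d)
{
  return start_us + (k * (gint64) n * G_USEC_PER_SEC) / d;
}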
Comment 24 Phillip Susi 2012-02-19 15:15:15 UTC
With ms precision, the closest you can get to 60 Hz is 17ms, which is a 2% rounding error.  For some applications this may be unacceptable, but it hardly means that timers are not accurate in any meaningful sense.

With microsecond precision, the rounding error would only be 0.002%, which should be more than accurate enough for any application.

At any rate, 2% +/- 1% error is much better than the current behavior of 2% + up to 20% error.
Comment 25 Dan Winship 2012-02-19 23:11:53 UTC
From reading Owen's compositing timing investigations (http://blog.fishsoup.net/2011/11/08/applicationcompositor-synchronization/ and the linked posts), it's not clear that having an API for "call me as close to every 1/60s as you can manage" is actually necessary or useful. At any rate, you'd want it to be synced to the vblank in some well-defined way, so it would have to happen mostly at a higher level than glib anyway.
Comment 26 GNOME Infrastructure Team 2018-05-24 13:42:18 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/503.