GNOME Bugzilla – Bug 481569
Calling gobject.threads_init() causes a lot of wakeups
Last modified: 2010-02-07 23:01:39 UTC
I'm the author of an audio player that uses PyGTK, and I have been notified that my player wakes up very frequently:
After some tests using powertop, If found that the call to gobject.threads_init() is the culprit. Even without any threads, wakeups are there just because of this call.
So I'm not saying there's a bug in the way threads are managed in PyGTK, maybe these wakeups are needed, I just wanted to report that issue.
Yes, it is a known issue. Limitation in the Python/C API.
I even sent a patch to Python to give us an API to solve the problem, but they did not take it.
Can you please reference the thread where the patch was refused?
I think it was this thread:
Thanks. For the record this is the Sugar ticket about the issue:
Well, the patch wasn't "formally" refused, it just has been left ignored after some initial discussions.
The patch is at http://bugs.python.org/issue1564547
2 questions / suggestion:
* Is it possible to install another C signal handler that does whatever pygtk wants and then calls the other handler?
* Add an api for whether to add the timeout or not and do by default only when stdin is a tty and / or run w/ __debug__ true. Or is the ^C behavior needed for more than interactive shell like usage?
(In reply to comment #6)
> 2 questions / suggestion:
> * Is it possible to install another C signal handler that does whatever pygtk
> wants and then calls the other handler?
There's a chance it could work.
Personally, I have no time to work on this problem anymore; already too much time wasted for nothing :(
> * Add an api for whether to add the timeout or not and do by default only when
> stdin is a tty and / or run w/ __debug__ true. Or is the ^C behavior needed
> for more than interactive shell like usage?
The problem with this is that while the gtk main loop is running it isn't only Ctrl-C that is not caught; any other unix signal is also not caught until some python code is scheduled for execution. Applications relying on unix signals will not work correctly while the gtk main loop is running.
My only suggestion is to focus on what you can change, namely pygtk / pygobject here. It's often difficult to get changes into any code base and all you can do is work around the issues.
I also guess that I don't have faith in much of any signal behavior because it's set globally and any library can potentially change it. I think signals made sense when programs were small and single threaded... An api to turn this off would probably be appreciated by those who care about cpu usage when idle.
Yes, I guess you're right. An API to disable checking for signals (in threaded programs, or in win32 where even unthreaded programs have this problem) would be better than nothing.
Emesene is affected by this bug too
Gustavo Carneiro wrote:
> Well, the patch wasn't "formally" refused, it just
> has been left ignored after some initial discussions.
Wouldn't it be possible to bring some attention to it?
Ok, I've given it attention. By rejecting it. :-) My reading of the thread on python-dev is that the current patch as provided does not adequately resolve the problem.
Perhaps my rejection will generate further activity on it, resolving it? If anyone has any ways to resolve the outstanding issues on it, please add it to the Python tracker or start up a discussion on python-dev about it.
I don't have the experience to come up with a solution to this, but perhaps my update+rejection of the bug in the Python tracker will help to get it the attention from people who can resolve it.
Note that I've re-started the discussion on this issue at:
Guido is asking on the python-dev thread above for some explination of how pygtk uses threads and signals. If you can answer this, please do so there, or pass on information to me and I'll pass it on. In particular: http://mail.python.org/pipermail/python-dev/2007-December/075590.html
A new, simplified patch is now being proposed for Python 2.6:
We (python developers) would appreciate review from the pygtk developers on it. Especially appreciated would be patching pygtk to use it and verifying that everything works.
Python trunk (will be included in 2.6) has support for this now:
int PySignal_SetWakeupFd(int fd)
This utility function specifies a file descriptor to which a '\0' byte will
be written whenever a signal is received. It returns the previous such file
descriptor. The value -1 disables the feature; this is the initial state.
This is equivalent to signal.set_wakeup_fd in Python, but without any
error checking. fd should be a valid file descriptor. The function should
only be called from the main thread.
Created attachment 102043 [details] [review]
Initial patch, which creates an extra pipe and waits for read from it.
Testing on Linux with threading enabled against a trunk python build which appears to work fine.
Why this change?
- if (pyg_threads_enabled)
For Python 2.5 that will represent a regression; we start using timeout always instead of only if threads are enabled... Python 2.6 will take a while yet to be deployed, you know...
(In reply to comment #17)
> Why this change?
> - if (pyg_threads_enabled)
> For Python 2.5 that will represent a regression; we start using timeout always
> instead of only if threads are enabled... Python 2.6 will take a while yet to
> be deployed, you know...
It was just moved up a bit, read the patch again ;-)
Created attachment 102046 [details] [review]
patch v2: with changelog and improved comments
Great, then. Looking good.
Just don't forget to copy-paste the solution into PyGtk as well, or export PySignalWatchSource as a PyGObject API and use it from PyGtk.
2008-01-03 Johan Dahlin <email@example.com>
Reviewed by: Gustavo
* gobject/pygmainloop.c (pyg_signal_watch_prepare): Optinally
use PySignal_SetWakeupFd to avoid having to do a timeout to find
out if there are any pending signals from python.
those changes are creating abnormal cpu usage in some cases, on example is on http://twistedmatrix.com/trac/ticket/3096, rebuilding pygobject without using PySignal_SetWakeupFd workaround the issue
The patch looks broken. I don't see any attempt to read from the file descriptor, so it would forever be left in a readable state.
Aside: reading should be done before giving back control to python, to avoid a race condition.
The read should probably be in pyg_signal_watch_check, before the call to PyErr_CheckSignals(). (This apparently didn't get documented anywhere, but it was intended with the PySignal_SetWakeupFd API that the caller (ie pygtk) would take care of clearing the contents.)
Additionally, the pipe should be set non-blocking. At a minimum the write end needs to be, in case there's a burst of signals and it fills. Otherwise the signal handler's write() call may block, hanging the process.
I don't have a pygtk source tree handy, nevermind the twisted test case, so I'll leave it to someone else to implement and test it.
Created attachment 108006 [details] [review]
patch_v3: implementing the changes suggested by Adam
This patch is untested since I cannot easily reproduce the 100% issue in question.
It sets the write end of the file descriptor to non-blocking and read one byte from it before calling CheckSignals.
Can someone who sees this please test the patch and see if it solves it?
Sorry Johan, I tested your patch and it doesn't work. I'm testing with the test-case at http://twistedmatrix.com/trac/ticket/3096 (mentionned in one of the comments above).
After investigation with Johan, the read() call is performed on the wrong end of the pipe ;) It should be done on the read end (fds) instead of the write end (fds).
Modification to the patch is trivial and makes my use-case work as expected. Anyone else to give it a shot?
Created attachment 109405 [details] [review]
reads on the read end of the pipe which needs to be non-blocking
What's blocking the inclusion of this patch? It seems to apply against SVN trunk and to fix the bug.
Created attachment 110994 [details] [review]
proper support for the Python wakeup fd API
The previous patch didn't have #ifdefs in pyg_signal_watch_check() checking for the HAVE_PYSIGNAL_SETWAKEUPFD variable
The *write* end should be set non-blocking, not the read end. We only read 1 byte at a time and only in response to poll/select, so reading should never block.
It should be possible to test this without twisted. Install a python signal handler for some signal and have it do nothing but print a message. Fire up the gtk mainloop, then use some tool (top, kill, etc) to send that signal. It begin using 100% CPU without the patch and not affect the CPU with the correct patch. Either way it should print the message once.
If there was a way to count the number of unnecessary wakeups we could make this an automated test. Probably not necessary though.
Created attachment 111029 [details] [review]
patch v6: will it ever end?
Updated patch, following my suggestions.
Turns out the test case actually requires gobject.threads_init(). An example:
>>> import gobject
>>> ml = gobject.MainLoop()
>>> ml.run() # Ctrl-C the first time sends a signal
Traceback (most recent call last):
KeyboardInterrupt >>> ml.run() # If 100% CPU is used now, it's broken. If 0% is used, it's good.
(In reply to comment #31)
> The *write* end should be set non-blocking, not the read end. We only read 1
> byte at a time and only in response to poll/select, so reading should never
Both read and write ends should be non-blocking.
The write end because you want to handle signals and system calls should not block in signal handlers or else the program deadlocks there and then.
The read end because, when you finally get a chance to look at the pipe, multiple signal instances may have happened. You don't know how many, you just know you need to completely empty the pipe (else in the next poll iteration you'll be called again to handle a very old signal). This means reading all bytes available. For this reason the read end also needs to be non-blocking.
Currently we read only 1 byte at a time, so we will get called again if 2 signals came in, but this is harmless. Python itself has a record of what signals came and it will process them all in the first time around. The second time around will be a no-op.
We could read more than 1 byte, to avoid extra passes, and yes that would require read to be non-blocking. However, it's purely an optimization, and one I don't want to see until we've verified that the basic version works correctly.
OK, I guess that works too. But it doesn't hurt to make the read end non-blocking as well, just to be on the safe side...
When making only the write end non-blocking, pygobject does not even pass the test suite. Only making the read end makes threaded applications work correctly.
Furthermore, the patch is broken for the non-threaded case. In this case pyg_signal_watch_prepare returns without setting up the pipe, and pyg_signal_watch_check tries reading fd, which is stdin.
Created attachment 112315 [details] [review]
Do not use the wakeup fd if it not initialised
Here is a version of the patch that seems to actually work with both threaded and non-threaded applications.
(In reply to comment #37)
> Created an attachment (id=112315) 
> Do not use the wakeup fd if it not initialised
> Here is a version of the patch that seems to actually work with both threaded
> and non-threaded applications.
Can you provide a patch against trunk?
AFAICT the patch applies cleanly to trunk.
The patch is broken, but I never replied because I'm unable to run the test suite. Does anybody know how to run the test suite against python trunk (2.6)?
If the test suite doesn't work, I recommend rolling back the original fix to this bug until it can be done properly.
Created attachment 117767 [details] [review]
Actually read the byte and make the read end of the pipe non-blocking
Was still an issue with 2.15.1 as parts of the patch (read() and fctnl()) weren't applied; see LP #250876 https://bugs.launchpad.net/bugs/250876 which has a testcase at http://launchpadlibrarian.net/12725702/testcase.py
I'm attaching a patch merged against SVN trunk
Adam: What do you think of the latest patch in attachment 117767 [details] [review] ?
It makes no sense. If our handler is spuriously woken up then a blocking read should result in a hang, not a busy loop. Conversely, making the read non-blocking should cause a busy loop (if the spurious wakeups continue).
So what's actually going wrong? How the heck could a blocking read cause a busy loop? Somebody throw gdb at it already!
Nevermind that the *write* end should be non-blocking! Although it should be so obscure as to possibly never happen, a burst of signals could fill up the pipe, and then block in the signal handler when trying to over-fill it. Why does everybody insist on removing this to make the read end non-blocking, even though I've explained it several times?
The one fix that has come out of these patches is to not read from pipe if it's still fd 0 (which for us means not set).
Maybe there's something wrong with how we're using GSourceFuncs' check and dispatch. That API is virtually undocumented and I can't make sense of it anyway. (Update: After discussing on IRC and reading the source.. I still can't make sense of the API.)
Has any of this been tested on windows? I saw mention that the poll API behaves differently there, and might not even accept a pipe..
Okay, after more discussion on IRC I'm finally making sense of the API. There's two substantial oversights in our usage of it: First, prepare is called *repeatedly*, once for every iteration. Second, check is called *unconditionally*, whether or not our fd became readable.
So the glib main loop looks like this:
We need to make our prepare idempotent, allocating the pipe only once, calling PySignal_SetWakeupFd only once, and calling g_source_add_poll only once.
Then check should look in real_source->fd.revents to determine if our pipe is readable, before reading from it.
We should also add an explicit flag for when this code has been enabled, ie pyglib_threads_enabled() returned true in prepare. Checking if the read fd is 0 is too hacky.
I'm glad to see you-all working on this. OLPC can use the patch whenever you're happy with it. They can't start switching to a timer-queue-based CPU power suspend architecture until their Python programs stop polling every second.
This bug needs a little more focus. Since I can't build & test pygtk, is there somebody on here who can build & test pygtk, as well as being able to reproduce the problems, and who'd like to pair up with me on IRC to fix it?
I can help, I'm sure I can manage to build pygtk from the fedora RPMs at least. Can you point me to the "latest" patch I should be testing?
I'm cscott on #olpc-devel on oftc.net.
Created attachment 124052 [details] [review]
I'm in agreement that the current implementation is incorrect (doesn't implement GSourceFuncs semantics properly) and that the read end should be non-blocking. Here's the best GSourceFuncs documentation that I'm aware of:
Here's a patch which incorporates the feedback above, except that it actually adds the GPollFD to the source before the main loop iteration. I'm not confident that it's safe or sensible to modify the set of pollable file descriptors in prepare. This is how I've seen it done in other software (and implemented it in my own projects).
It also adds some cleanup code to the deallocation routines.
Help testing this is appreciated. I'm going to add it to OLPC's development builds to assist.
Created attachment 124068 [details] [review]
Improved version of the patch: the cleanup stuff was already there (I missed the finalize GSourceFunc), and make revents checking more accurate.
Created attachment 124069 [details] [review]
equivalent patch for PyGTK
With both of these, Sugar finally has zero wakeups when idle -- strace shows poll() called with a timeout of -1 :)
Sorry, meant to look at this sooner.
You said the *read* end is non-blocking? That's not right. If things are functioning correctly only the write end need be non-blocking (and only for an obscure circumstance that won't show up in testing.)
See comment #36.
comment #36 was do to grossly misusing the GSourceFuncs API. If we do things right it is NOT necessary. See comment #44.
Created attachment 125878 [details] [review]
updated pygobject fix
Sorry, I had a typo in my patch description. When I said "I'm in agreement that the read end should be non-blocking," I actually meant "blocking" - I was agreeing with you.
My previous patch did not make either end non-blocking. Here's an updated one which makes the write end non-blocking, to avoid possible signal-burst issues that you describe in comment #43. How does this look?
Johan, OK for me to commit this to pygobject (trunk and pypygobject-2-16) and pygtk (trunk and pygtk-2-14)? I will include appropriate changelog entries.
Don't ask me, I'm not maintaining these modules any longer.
Gian/Gustavo, is it OK for me to commit this?
This is the very first time I look at this issue, so bear with me.
I looked at the code with the proposed patch applied, and I only found one corner-case problem. If the user ever decides to run nested main loops (GLib allows that), the inner main loop will call PySignal_SetWakeupFd, overriding the outer main loop. When the inner main loop ends, the outer main loop will no longer be unable to wakeup with signals.
To solve this, perhaps the pipe file descriptors should just be static global variables, or the signal watch source be made a singleton.
Daniel, do you have time to look at the issue pointer out by Gustavo, we could push this into 2.17/2.18 for everybody happiness :)
Apologies, I'm busy with a new job and a new language so I don't think I'll get to this any time soon. I'm happy to glance at patches that anyone else posts though.
I implemented the tweak suggested by Gustavo in Daniel's initial patch. You can find it there: http://github.com/philn/pygobject/tree/wakeupfd
I just made the pipe file descriptors a static global variable and adapted the rest of the code accordingly, hoping i didn't messed up things. The twisted test-case runs fine, as well as the interactive test suggested in one of the comments above.
Please review :)
Hil Philippe, thanks a lot for reviving this bug.
I looked at the patch but I can't see how it modifies the behaviour. It just changes the storage of some data, but does not attack the problem that things will go wrong when there are multiple users of it.
Hi Daniel, in comment 60 Gustavo is suggesting to make the pipe file descriptors global, which, I think, is what I did. But sure I probably misunderstood the solution proposed ;)
making the data global is just the first step needed in order to prevent the two main loops trampling on each other.. The next step is to add supporting logic to prevent such trampling, now that both main loops can access the data.
Ok I think I understand a little more now, SetWakeupFd() should be called once only, right? In the case of nested main loops, the second mainloop is currently calling it, overriding the previously sets pipe file descriptors.
So adding a check of the pipes file descriptors in pyg_signal_watch_new() would do the trick?
that sounds more or less correct..show us the code :)
Check out my github branch ;)
Ping Daniel! Did you look at my github branch?
Sorry for the delay. That looks better, but still it doesn't cover all cases: it looks like after the inner main loop terminates, pyg_signal_watch_finalize() would close the (shared) file descriptors and cause breakage in the outer main loop.
Ok so I need to destroy and unref the signal_source only in the case where we are dealloc'ing the last main loop. How can I know the main loops number? excepted by keeping a static variable to count them? ;)
Personally i would make it a singleton and never close the descriptors -- hopefully that would be acceptable. If not, it could be a reference-counted singleton.
The pipe fds array is already a static, isn't it enough?
So I removed the source finalize callback, so that the file descriptors are not closed:
I'm not sure if Philippes patch covers all corner cases but it definitely fixes most of them (and, what is most important for me and probably most of people, it still fixes the main issue). Some distributions are shipping their packages with the 1st version of patch for a long time already and it's almost 2 years from initial report on this bug so it would be great if it would be finally fixed in mainstream. Why there's so low interest from pygobject maintainers?
perhaps Philippe could attach the most recent version of his patch
Created attachment 141219 [details] [review]
Is this good to merge now?
Does someone have a pygtk patch too?
This patch fixes moovida taking close to 100% CPU when doing nothing substantial. After applying the patch to my F-11 rpm of pygobject2 2.16.1, moovida drops to below 1% CPU in standby.
Is there any reason to hold off merging this patch ? Who should be prodded to get this accepted and ensure that python programs don't end up consuming lots of CPU needlessly ?
Comment on attachment 141219 [details] [review]
Philippe please commit, we will issue an unstable release soon including major changes and we can get a lot of testing.
(In reply to comment #80)
> (From update of attachment 141219 [details] [review])
> Philippe please commit, we will issue an unstable release soon including major
> changes and we can get a lot of testing.
Thanks for accepting the patch! Can someone else commit this? I don't have a GNOME account.
Philippe doesn't have a git.gnome.org account, I pushed the patch on his behalf.
Author: Philippe Normad <firstname.lastname@example.org>
Date: Wed Oct 21 18:01:16 2009 +0200
pygmainloop: fix use of PySignal_WakeUpFD API for nested loops
Fixes bug #481569
Created attachment 145961 [details] [review]
Formatted version of previous version. No code change, ready to push ;)
great, thanks everyone! did you commit to both pygobject and pygtk?
This was only against pygobject, unless I completely missed something.
the same thing needs to go into PyGTK
When this patch for pygobject is backported to version 2.20, moovida becomes unresponsive (input events are very slow to be processed). Is there anything else which is needed to make this work fine (maybe the PyGTK fix mentioned in comment #86)?
i'm running Moovida under python2.6 and OpenSuse 11.1 (X86_64).
Probably a stupid question, but how can I apply this patch. I installed the debugsource and applied the patch to pygmainloop.c
I don't know exactly how to compile, anybody can help?
(Getting errors, i suspect because of missing configure)
Or maybe there is an RPM containing this fix?
Would greatly appreciate this.