GNOME Bugzilla – Bug 652072
gmain: make use of signalfd()
Last modified: 2011-11-15 14:23:56 UTC
See attached.
Created attachment 189421 [details] [review] gmain: Move child watch waitpid() into helper thread This further lines up the SIGCHLD handling with the other UNIX code. We keep track of all the child watch sources, and call waitpid() in the helper thread, rather than calling it repeatedly in each context.
Created attachment 189422 [details] [review] gmain: Only wake up affected contexts on receipt of UNIX signal Now that we maintain lists of both UNIX signal watches and SIGCHLD handlers, simply iterate over them and wake up their context, rather than all contexts. If your app has 1000 threads but is only watching a subprocess exit from the main thread, this saves 999 wakeups.
Both patches make sense to me. Do we have enough test coverage of child watch sources to actually verify the reduced wakeup effect ?
(In reply to comment #3) > Both patches make sense to me. Do we have enough test coverage of child watch > sources to actually verify the reduced wakeup effect ? Well...the wakeup would be harmless before; we would just interrupt the poll(), which for child watch sources would call waitpid() again. I could try to write a test case for this which would have a custom source which would assert if it was called before the child in another thread exited...but on the other hand I'm not sure I want to state permanently that main loops can't be "spuriously" woken up. Anyways these patches are just some preparatory cleanup for signalfd() basically, the wakeups I just noticed after the fact.
Created attachment 189485 [details] [review] gmain: Move child watch waitpid() into helper thread Misc fixes
Created attachment 189486 [details] [review] gmain: Only wake up affected contexts on receipt of UNIX signal Misc fixes
Created attachment 189487 [details] [review] gmain: Only run through signal delivery once per read() This is what I intended to do before.
Comment on attachment 189485 [details] [review] gmain: Move child watch waitpid() into helper thread This turns out to be racy =( The problem is that we don't have a lock on the mainloop, so it can call prepare() on the source, see the child hasn't exited and drop into poll(), and in between those two we notice the child exited (and we would wake up the context, but since it's not poll_waiting we do nothing).
Created attachment 189637 [details] [review] gmain: Move child watch waitpid() into helper thread Fix a race condition
Created attachment 189638 [details] [review] gmain: Only wake up affected contexts on receipt of UNIX signal Rebase on previous patch
Created attachment 189639 [details] [review] gmain: Only run through signal delivery once per read() This is what I intended to do before.
Created attachment 189642 [details] [review] spawn-multithreaded: Also look for lt-test-echo I hate libtool =(
Created attachment 189643 [details] [review] spawn-singlethread.c: New test Simpler than threads.
Created attachment 189644 [details] [review] gspawn: Handle EINTR in a few more cases I was debugging gthread/tests/spawn-multithreaded.c, and while I don't think I actually hit EINTR in any of these cases, it'd be good to fix them anyways.
Created attachment 189645 [details] [review] spawn-multithreaded: Clean up IO channel code I modeled the new bits after how gunixmount.c handles GIOChannel; it's apparently easier not to look at the condition.
Created attachment 189646 [details] [review] gspawn: Reset signal handlers for synchronous spawning We should by default reset signal handlers; particularly with the ability to install them via g_unix_signal_source_new(), we don't want to call a user callback inside a fork()ed helper process.
So the current state is that these patches work fine on RHEL6 + git glib, but fail on Fedora 15. Most likely a kernel or glibc regression =( I think my code is correct here. The Linux man page for waitpid() clearly says that it's allowed nowadays to wait on child processes of other threads, and I think my locking is correct etc. To do a torture test: while true; do ./gthread/tests/spawn-multithreaded; done This has been running for ~10 minutes fine on RHEL6, but fails very quickly on Fedora 15, and even worse - gdb hangs attempting to attach to the child process =( =(
Created attachment 189738 [details] [review] gmain: Use sigset_t for keeping track of installed signals Minor code cleanup.
Created attachment 189739 [details] [review] glib/tests/unix.c: Also test SIGTERM We expect this to not kill the process, so it'd be a good one to test.
Created attachment 189740 [details] [review] gmain: Initial signalfd usage This is just using signalfd() as a replacement for the pipe in the helper thread. Unfortunately, signalfd has a problem for this. Our previous use of sigaction() ensures that we get the signal, even in the presence of other threads that may have it unblocked. However if we don't use sigaction(), we are open to the possibility that the thread reading from the signalfd() doesn't get it =( Needs more investigation.
Review of attachment 189644 [details] [review]: ::: glib/gspawn.c @@ +1010,3 @@ + goto retry; + + return ret; Can that actually happen ? I can see how close could be slow and thus get interrupted, but open ?
Review of attachment 189738 [details] [review]: man sigaddset says: Objects of type sigset_t must be initialized by a call to either sigemptyset() or sigfillset() before being passed to the functions sigaddset(), sigdelset() and sigismember() or the additional glibc functions described below (sigisemptyset(), sigandset(), and sig‐ orset()). The results are undefined if this is not done. I guess we should call sigemptyset(unix_signal_mask) somewhere ?
Review of attachment 189739 [details] [review]: Good idea
Review of attachment 189740 [details] [review]: ::: glib/gmain.c @@ +4447,3 @@ + * can be called from any thread. + */ + pthread_sigmask (SIG_BLOCK, &unix_signal_mask, NULL); Hmm, does this only work if we use -pthread in CFLAGS/LDFLAGS ? I don't think we do for libglib
Attachment 189639 [details] pushed as 211d7ad - gmain: Only run through signal delivery once per read() Attachment 189642 [details] pushed as 01ee944 - spawn-multithreaded: Also look for lt-test-echo Attachment 189643 [details] pushed as 922f6aa - spawn-singlethread.c: New test Attachment 189644 [details] pushed as 7e1886b - gspawn: Handle EINTR in a few more cases Attachment 189645 [details] pushed as c2364ce - spawn-multithreaded: Clean up IO channel code Attachment 189646 [details] pushed as 1874ad9 - gspawn: Reset signal handlers for synchronous spawning Attachment 189738 [details] pushed as ed827de - gmain: Use sigset_t for keeping track of installed signals Attachment 189739 [details] pushed as b9c67b4 - glib/tests/unix.c: Also test SIGTERM
Created attachment 189951 [details] [review] gmain: Move child watch waitpid() into helper thread rebased
Created attachment 189952 [details] [review] gmain: Only wake up affected contexts on receipt of UNIX signal rebased
Created attachment 189953 [details] [review] gmain: Initial signalfd usage rebased
So I pushed my cleanup patches, leaving the waitpid()/signalfd() changes for further thought and work. I think what I really want is a flag for signalfd() like SFD_ALWAYS which ensured that I would read the signal, even if another unrelated bit in the process (say OpenJDK) had installed a handler. Then we could set up a signalfd for SIGCHLD say, and watch for changes in our children.
*** Bug 455114 has been marked as a duplicate of this bug. ***
eventfd move is done, so remove it from the summary
Mathias asked me to comment here with what i knew about this, which isn't much. Basically, signal handling can only be done by a single piece of code per program, you cannot have multiple parts sharing it. While you can use signalfd and WNOWAIT to peek into the data of a SIGCHLD doing this is necessarily racy, since multiple SIGCHLD signals will be merged on reception in the queue, but waitid() won't. I think it is practically impossible to write code against current interfaces that would allow you to multiplex signal dispatching into multiple independent code blocks. signalfd() to my knowledge will only pass signals to you if you have blocked them otherwise with sigprocmask.
(In reply to comment #32) > I think it is practically impossible to write code against current interfaces > that would allow you to multiplex signal dispatching into multiple independent > code blocks. signalfd() to my knowledge will only pass signals to you if you > have blocked them otherwise with sigprocmask. Yes, that was my conclusion as well; we would need new kernel interfaces to improve things here. My strawman interface proposal was in https://bugzilla.gnome.org/show_bug.cgi?id=652072#c29
I wonder if distributing signals among multiple users in the same process is actually useful for anything but SIGCHLD. If that is the case then it might be possible to introduce childfd() or so, which queues all childs dying independently of the (borked) signal queue. childfd() could then specifically designed to allow multiple consumers of it in the same process.
Colin seems to have concluded that signalfd is almost useless for our purposes (and he's probably right), so I'll close this.