After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 652072 - gmain: make use of signalfd()
gmain: make use of signalfd()
Status: RESOLVED WONTFIX
Product: glib
Classification: Platform
Component: mainloop
unspecified
Other All
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 455114 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-06-07 17:34 UTC by Colin Walters
Modified: 2011-11-15 14:23 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gmain: Move child watch waitpid() into helper thread (5.08 KB, patch)
2011-06-07 17:34 UTC, Colin Walters
none Details | Review
gmain: Only wake up affected contexts on receipt of UNIX signal (2.57 KB, patch)
2011-06-07 17:34 UTC, Colin Walters
none Details | Review
gmain: Move child watch waitpid() into helper thread (5.71 KB, patch)
2011-06-08 17:26 UTC, Colin Walters
needs-work Details | Review
gmain: Only wake up affected contexts on receipt of UNIX signal (3.39 KB, patch)
2011-06-08 17:46 UTC, Colin Walters
none Details | Review
gmain: Only run through signal delivery once per read() (1.15 KB, patch)
2011-06-08 17:46 UTC, Colin Walters
none Details | Review
gmain: Move child watch waitpid() into helper thread (6.94 KB, patch)
2011-06-10 15:16 UTC, Colin Walters
none Details | Review
gmain: Only wake up affected contexts on receipt of UNIX signal (3.41 KB, patch)
2011-06-10 15:16 UTC, Colin Walters
none Details | Review
gmain: Only run through signal delivery once per read() (1.15 KB, patch)
2011-06-10 15:16 UTC, Colin Walters
committed Details | Review
spawn-multithreaded: Also look for lt-test-echo (1.01 KB, patch)
2011-06-10 15:17 UTC, Colin Walters
committed Details | Review
spawn-singlethread.c: New test (5.80 KB, patch)
2011-06-10 15:17 UTC, Colin Walters
committed Details | Review
gspawn: Handle EINTR in a few more cases (2.07 KB, patch)
2011-06-10 15:17 UTC, Colin Walters
committed Details | Review
spawn-multithreaded: Clean up IO channel code (2.33 KB, patch)
2011-06-10 15:17 UTC, Colin Walters
committed Details | Review
gspawn: Reset signal handlers for synchronous spawning (1.15 KB, patch)
2011-06-10 15:17 UTC, Colin Walters
committed Details | Review
gmain: Use sigset_t for keeping track of installed signals (2.04 KB, patch)
2011-06-11 21:28 UTC, Colin Walters
committed Details | Review
glib/tests/unix.c: Also test SIGTERM (3.15 KB, patch)
2011-06-11 21:28 UTC, Colin Walters
committed Details | Review
gmain: Initial signalfd usage (8.89 KB, patch)
2011-06-11 21:29 UTC, Colin Walters
none Details | Review
gmain: Move child watch waitpid() into helper thread (6.77 KB, patch)
2011-06-14 23:35 UTC, Colin Walters
none Details | Review
gmain: Only wake up affected contexts on receipt of UNIX signal (3.27 KB, patch)
2011-06-14 23:35 UTC, Colin Walters
none Details | Review
gmain: Initial signalfd usage (8.84 KB, patch)
2011-06-14 23:35 UTC, Colin Walters
none Details | Review

Description Colin Walters 2011-06-07 17:34:41 UTC
See attached.
Comment 1 Colin Walters 2011-06-07 17:34:44 UTC
Created attachment 189421 [details] [review]
gmain: Move child watch waitpid() into helper thread

This further lines up the SIGCHLD handling with the other UNIX
code.  We keep track of all the child watch sources, and
call waitpid() in the helper thread, rather than calling it
repeatedly in each context.
Comment 2 Colin Walters 2011-06-07 17:34:48 UTC
Created attachment 189422 [details] [review]
gmain: Only wake up affected contexts on receipt of UNIX signal

Now that we maintain lists of both UNIX signal watches and SIGCHLD
handlers, simply iterate over them and wake up their context,
rather than all contexts.

If your app has 1000 threads but is only watching a subprocess exit
from the main thread, this saves 999 wakeups.
Comment 3 Matthias Clasen 2011-06-08 04:21:36 UTC
Both patches make sense to me. Do we have enough test coverage of child watch sources to actually verify the reduced wakeup effect ?
Comment 4 Colin Walters 2011-06-08 14:00:45 UTC
(In reply to comment #3)
> Both patches make sense to me. Do we have enough test coverage of child watch
> sources to actually verify the reduced wakeup effect ?

Well...the wakeup would be harmless before; we would just interrupt the poll(), which for child watch sources would call waitpid() again.

I could try to write a test case for this which would have a custom source which would assert if it was called before the child in another thread exited...but on the other hand I'm not sure I want to state permanently that main loops can't be "spuriously" woken up.

Anyways these patches are just some preparatory cleanup for signalfd() basically, the wakeups I just noticed after the fact.
Comment 5 Colin Walters 2011-06-08 17:26:07 UTC
Created attachment 189485 [details] [review]
gmain: Move child watch waitpid() into helper thread

Misc fixes
Comment 6 Colin Walters 2011-06-08 17:46:45 UTC
Created attachment 189486 [details] [review]
gmain: Only wake up affected contexts on receipt of UNIX signal

Misc fixes
Comment 7 Colin Walters 2011-06-08 17:46:51 UTC
Created attachment 189487 [details] [review]
gmain: Only run through signal delivery once per read()

This is what I intended to do before.
Comment 8 Colin Walters 2011-06-08 17:49:09 UTC
Comment on attachment 189485 [details] [review]
gmain: Move child watch waitpid() into helper thread

This turns out to be racy =(  The problem is that we don't have a lock on the mainloop, so it can call prepare() on the source, see the child hasn't exited and drop into poll(), and in between those two we notice the child exited (and we would wake up the context, but since it's not poll_waiting we do nothing).
Comment 9 Colin Walters 2011-06-10 15:16:00 UTC
Created attachment 189637 [details] [review]
gmain: Move child watch waitpid() into helper thread

Fix a race condition
Comment 10 Colin Walters 2011-06-10 15:16:15 UTC
Created attachment 189638 [details] [review]
gmain: Only wake up affected contexts on receipt of UNIX signal

Rebase on previous patch
Comment 11 Colin Walters 2011-06-10 15:16:23 UTC
Created attachment 189639 [details] [review]
gmain: Only run through signal delivery once per read()

This is what I intended to do before.
Comment 12 Colin Walters 2011-06-10 15:17:10 UTC
Created attachment 189642 [details] [review]
spawn-multithreaded: Also look for lt-test-echo

I hate libtool =(
Comment 13 Colin Walters 2011-06-10 15:17:23 UTC
Created attachment 189643 [details] [review]
spawn-singlethread.c: New test

Simpler than threads.
Comment 14 Colin Walters 2011-06-10 15:17:28 UTC
Created attachment 189644 [details] [review]
gspawn: Handle EINTR in a few more cases

I was debugging gthread/tests/spawn-multithreaded.c, and while I
don't think I actually hit EINTR in any of these cases, it'd be
good to fix them anyways.
Comment 15 Colin Walters 2011-06-10 15:17:33 UTC
Created attachment 189645 [details] [review]
spawn-multithreaded: Clean up IO channel code

I modeled the new bits after how gunixmount.c handles GIOChannel;
it's apparently easier not to look at the condition.
Comment 16 Colin Walters 2011-06-10 15:17:37 UTC
Created attachment 189646 [details] [review]
gspawn: Reset signal handlers for synchronous spawning

We should by default reset signal handlers; particularly with the
ability to install them via g_unix_signal_source_new(), we don't
want to call a user callback inside a fork()ed helper process.
Comment 17 Colin Walters 2011-06-10 15:20:32 UTC
So the current state is that these patches work fine on RHEL6 + git glib, but fail on Fedora 15.

Most likely a kernel or glibc regression =(  I think my code is correct here.  The Linux man page for waitpid() clearly says that it's allowed nowadays to wait on child processes of other threads, and I think my locking is correct etc.

To do a torture test:

while true; do ./gthread/tests/spawn-multithreaded; done

This has been running for ~10 minutes fine on RHEL6, but fails very quickly on Fedora 15, and even worse - gdb hangs attempting to attach to the child process =( =(
Comment 18 Colin Walters 2011-06-11 21:28:53 UTC
Created attachment 189738 [details] [review]
gmain: Use sigset_t for keeping track of installed signals

Minor code cleanup.
Comment 19 Colin Walters 2011-06-11 21:28:58 UTC
Created attachment 189739 [details] [review]
glib/tests/unix.c: Also test SIGTERM

We expect this to not kill the process, so it'd be a good one to test.
Comment 20 Colin Walters 2011-06-11 21:29:01 UTC
Created attachment 189740 [details] [review]
gmain: Initial signalfd usage

This is just using signalfd() as a replacement for the pipe in the
helper thread.

Unfortunately, signalfd has a problem for this.  Our previous use of
sigaction() ensures that we get the signal, even in the presence of
other threads that may have it unblocked.

However if we don't use sigaction(), we are open to the possibility
that the thread reading from the signalfd() doesn't get it =(

Needs more investigation.
Comment 21 Matthias Clasen 2011-06-13 02:29:40 UTC
Review of attachment 189644 [details] [review]:

::: glib/gspawn.c
@@ +1010,3 @@
+    goto retry;
+
+  return ret;

Can that actually happen ? I can see how close could be slow and thus get interrupted, but open ?
Comment 22 Matthias Clasen 2011-06-13 02:38:14 UTC
Review of attachment 189738 [details] [review]:

man sigaddset says:

  Objects of type sigset_t must  be  initialized  by  a  call  to  either
       sigemptyset()  or  sigfillset()  before  being  passed to the functions
       sigaddset(), sigdelset() and  sigismember()  or  the  additional  glibc
       functions  described  below  (sigisemptyset(),  sigandset(),  and  sig‐
       orset()).  The results are undefined if this is not done.

I guess we should call sigemptyset(unix_signal_mask) somewhere ?
Comment 23 Matthias Clasen 2011-06-13 02:39:01 UTC
Review of attachment 189739 [details] [review]:

Good idea
Comment 24 Matthias Clasen 2011-06-13 02:52:15 UTC
Review of attachment 189740 [details] [review]:

::: glib/gmain.c
@@ +4447,3 @@
+   * can be called from any thread.
+   */
+  pthread_sigmask (SIG_BLOCK, &unix_signal_mask, NULL);

Hmm, does this only work if we use -pthread in CFLAGS/LDFLAGS ?
I don't think we do for libglib
Comment 25 Colin Walters 2011-06-14 23:25:50 UTC
Attachment 189639 [details] pushed as 211d7ad - gmain: Only run through signal delivery once per read()
Attachment 189642 [details] pushed as 01ee944 - spawn-multithreaded: Also look for lt-test-echo
Attachment 189643 [details] pushed as 922f6aa - spawn-singlethread.c: New test
Attachment 189644 [details] pushed as 7e1886b - gspawn: Handle EINTR in a few more cases
Attachment 189645 [details] pushed as c2364ce - spawn-multithreaded: Clean up IO channel code
Attachment 189646 [details] pushed as 1874ad9 - gspawn: Reset signal handlers for synchronous spawning
Attachment 189738 [details] pushed as ed827de - gmain: Use sigset_t for keeping track of installed signals
Attachment 189739 [details] pushed as b9c67b4 - glib/tests/unix.c: Also test SIGTERM
Comment 26 Colin Walters 2011-06-14 23:35:36 UTC
Created attachment 189951 [details] [review]
gmain: Move child watch waitpid() into helper thread

rebased
Comment 27 Colin Walters 2011-06-14 23:35:45 UTC
Created attachment 189952 [details] [review]
gmain: Only wake up affected contexts on receipt of UNIX signal

rebased
Comment 28 Colin Walters 2011-06-14 23:35:55 UTC
Created attachment 189953 [details] [review]
gmain: Initial signalfd usage

rebased
Comment 29 Colin Walters 2011-06-14 23:42:12 UTC
So I pushed my cleanup patches, leaving the waitpid()/signalfd() changes for further thought and work.

I think what I really want is a flag for signalfd() like SFD_ALWAYS which ensured that I would read the signal, even if another unrelated bit in the process (say OpenJDK) had installed a handler.  Then we could set up a signalfd for SIGCHLD say, and watch for changes in our children.
Comment 30 Marc-Andre Lureau 2011-06-16 14:50:53 UTC
*** Bug 455114 has been marked as a duplicate of this bug. ***
Comment 31 Matthias Clasen 2011-06-22 13:20:57 UTC
eventfd move is done, so remove it from the summary
Comment 32 Lennart Poettering 2011-06-22 13:45:43 UTC
Mathias asked me to comment here with what i knew about this, which isn't much. Basically, signal handling can only be done by a single piece of code per program, you cannot have multiple parts sharing it. While you can use signalfd and WNOWAIT to peek into the data of a SIGCHLD doing this is necessarily racy, since multiple SIGCHLD signals will be merged on reception in the queue, but waitid() won't. 

I think it is practically impossible to write code against current interfaces that would allow you to multiplex signal dispatching into multiple independent code blocks. signalfd() to my knowledge will only pass signals to you if you have blocked them otherwise with sigprocmask.
Comment 33 Colin Walters 2011-06-22 14:04:31 UTC
(In reply to comment #32)

> I think it is practically impossible to write code against current interfaces
> that would allow you to multiplex signal dispatching into multiple independent
> code blocks. signalfd() to my knowledge will only pass signals to you if you
> have blocked them otherwise with sigprocmask.

Yes, that was my conclusion as well; we would need new kernel interfaces to improve things here.

My strawman interface proposal was in https://bugzilla.gnome.org/show_bug.cgi?id=652072#c29
Comment 34 Lennart Poettering 2011-06-22 14:07:28 UTC
I wonder if distributing signals among multiple users in the same process is actually useful for anything but SIGCHLD. If that is the case then it might be possible to introduce childfd() or so, which queues all childs dying independently of the (borked) signal queue. childfd() could then specifically designed to allow multiple consumers of it in the same process.
Comment 35 Allison Karlitskaya (desrt) 2011-11-15 14:23:56 UTC
Colin seems to have concluded that signalfd is almost useless for our purposes (and he's probably right), so I'll close this.