GNOME Bugzilla – Bug 136871
New g_child_watch implementation.
Last modified: 2011-06-04 03:57:04 UTC
I know, this isn't the time to propose such changes, but after noticing that the current g_child_watch implementation does not work for multithreaded programs for at least linux < 2.6 I decided to reimplement it to make it work there too. This is done by calling waitpid in the signal handler (This is safe according to the docs, it is in fact recommended for SIGCHLD signal handlers). However locking mutexes is not allowed, so how to get the pids to check for (we shouldn't call waitpid (-1) and interfere with the application) into the signal handler. Here I found quite a nice solution, which, I think, is legally working. I use a list of pids, which is read by the signal handler without a lock. This list can only grow and never shrink or otherwise lose elements. Also proper memory barriers are used. Thus the signal handler always interates a valid list of pids. The pids can occur twice or they can be stale, but both is not a problem for calling waitpid. An added advantage is, that the extra helper thread isn't needed anymore (This could also have been done with the current solution) Also sigaction(2) is used instead of signal(2), as the latter isn't MT-safe. However we use signal at other places also. sigaction should be pretty portable by now. We might want to use it at the other places also.
Created attachment 25511 [details] [review] new g_child_watch implementation
I don't see how this sort of implementation can possible be 100% reliable - we can't use a blocking write to the pipe, and once the pipe is full there is no place for the status to be stored. (Even using a helper thread to remove data from the pipe doesn't help because the interrupted thread might be holding a lock on malloc data structures) Of course, it would be *hard* to trigger this problem ... you would need thousands of children exiting at one time, but it does seem fundemental.
Another problem is that you can have races where signal handlers in multiple threads wait for the same PID, though there you mostly have harmless ECHILD as long as PIDs take a while to get reused. Thinking about the memory allocation problem, maybe it's possible to make child_watch_pids actually have structures in it, not just pids - so you have: PID FLAG STATUS With transitions g_child_watch_add() FLAG = 0 <memory barrier> PID = PID signal handler if (PID != -1 && FLAG != 0) { if (waitpid (PID) != 0/-1) { STATUS = status; <memory barrier> FLAG = 1 } } check if (FLAG) { status = STATUS pid = PID <memory barrier> PID = -1 notify (pid, status); } Though gettting all the memory barriers portably right is hard...
Probably not a 2.6 issue at this point. Moving to 2.8, for further consideration
The childwatch implementation has been refactored in the meantime, and will perhaps soon be supplanted by a signalfd-based implementation.