GNOME Bugzilla – Bug 311240
[autospawn] Race conditions in unix socket creation/deletion.
Last modified: 2012-02-28 20:32:42 UTC
I use xmms with esd, and fairly regularly xmms complains that it can't open the audio device after it has played one song and is about to play the next. I captured a strace of xmms/esd during such a failure. It seems that xmms is closing the connection to esd and relaunching it. Here is a selection of lines from the esd strace.

execve("/usr/bin/esd", ["esd", "-terminate", "-nobeeps", "-as", "2", "-spawnfd", "11"], [/* 39 vars */]) = 0
open("/usr/lib/libesd.so.0", O_RDONLY) = 3
mkdir("/tmp/.esd", 01777) = -1 EEXIST (File exists)
lstat64("/tmp/.esd", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
connect(3, {sa_family=AF_UNIX, path="/tmp/.esd/socket"}, 18) = -1 ECONNREFUSED (Connection refused)
unlink("/tmp/.esd/socket" <unfinished ...>
<... unlink resumed> ) = -1 ENOENT (No such file or directory)
bind(14, {sa_family=AF_UNIX, path="/tmp/.esd/socket"}, 18) = -1 ENOENT (No such file or directory)
unlink("/tmp/.esd/socket") = -1 ENOENT (No such file or directory)
rmdir("/tmp/.esd") = -1 ENOENT (No such file or directory)
exit_group(1) = ?

So, when it does the connect, the socket exists but is not listening. When it does the unlink, it gets rescheduled and the socket is gone when it gets back. Getting ENOENT from bind means that the directory is not there. This is confirmed by the rmdir. I assume that the previous instance of esd is removing the socket directory at an inconvenient moment.

It seems to me that if one process is doing mkdir and another is doing rmdir, the first cannot assume that the directory exists after the mkdir has succeeded. Similarly, if one process is doing bind and another is doing unlink, the first cannot assume that the socket exists after the bind has succeeded.

The mkdir/rmdir problem could be handled by doing mkdir/bind in a loop while mkdir succeeds and bind returns ENOENT; however, this approach doesn't work with the bind/unlink problem.
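To make the mkdir/bind loop concrete, here is a minimal sketch in Python rather than esd's own C. The function name bind_with_retry, the max_attempts limit and the listen backlog are all hypothetical, and an existing directory (EEXIST) is treated the same as a successful mkdir, since that is the case seen in the strace.

```python
import errno
import os
import socket


def bind_with_retry(sock_dir, sock_name="socket", max_attempts=5):
    """Create sock_dir if needed and bind a listening UNIX socket inside it.

    Retries when bind() fails with ENOENT, which means the directory was
    removed by another process between our mkdir() and our bind().
    """
    path = os.path.join(sock_dir, sock_name)
    for _ in range(max_attempts):
        try:
            os.mkdir(sock_dir, 0o1777)
        except FileExistsError:
            pass  # directory already there; it may still vanish before bind()
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            server.bind(path)
        except FileNotFoundError:
            server.close()
            continue  # directory disappeared underneath us: start over
        except OSError:
            server.close()
            raise  # any other failure (e.g. EADDRINUSE) is not this race
        server.listen(5)
        return server
    raise OSError(errno.ENOENT, "socket directory kept disappearing", path)
```

This only covers the window between mkdir and bind. It does nothing about a late unlink from the terminating process, which is the bind/unlink race.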
If the terminating esd gets rescheduled just before the unlink call, the unlink could occur after the next esd has started playing. This gets ugly quickly.

Another approach is to use a lock file to guard the critical sections. This requires that the unlink of the lock file happen while the lock is held (to avoid the problem above), so after acquiring a lock there needs to be a test to make sure that the same file is still in place. It also needs care in the creation of the lock file so that things work if the lock file already exists. In that case, the lock file must be owned by the user or root to stop some other user renaming it under our feet. The lock file's contents also need to be left intact since it could be a hard link to one of the user's files.

Those are my thoughts. I'm sure there are other approaches too.
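A rough sketch of the lock-file idea, again in Python rather than esd's C. The use of flock() is an assumption, since no particular locking primitive is named above, and the helper names acquire_lock and release_lock are hypothetical. The file is opened without truncation so existing contents are untouched, ownership is checked, and the inode behind the descriptor is compared with the one currently at the path before the lock counts as held.

```python
import fcntl
import os


def acquire_lock(path):
    """Open and lock `path`; return a descriptor that holds the lock."""
    while True:
        # O_CREAT without O_TRUNC, opened read-only: if the file already
        # exists, its contents are left exactly as they were.
        fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until we hold the lock
            held = os.fstat(fd)
            # Only trust a lock file owned by us or by root.
            if held.st_uid not in (os.getuid(), 0):
                raise PermissionError(f"{path} is owned by uid {held.st_uid}")
            try:
                current = os.lstat(path)
            except FileNotFoundError:
                current = None
            # The previous holder may have unlinked the file while we were
            # waiting; the lock only counts if the path still names our inode.
            if current is not None and (
                (current.st_dev, current.st_ino) == (held.st_dev, held.st_ino)
            ):
                return fd
        except BaseException:
            os.close(fd)
            raise
        os.close(fd)  # stale inode: drop it and try again


def release_lock(fd, path):
    """Unlink the lock file while the lock is still held, then release it."""
    os.unlink(path)
    os.close(fd)  # closing the descriptor drops the flock
```

In this sketch the socket directory and socket would be created and removed only between acquire_lock() and release_lock(). A process that was blocked on the old inode wakes up, sees that the path no longer matches, and retries against a fresh file.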
You could just run esound as a daemon instead of having applications spawn one. The latter use pattern is strongly discouraged.
I'm no longer willing to support autospawning, since it's fundamentally broken. I'll leave this bug open as "enhancement" and tag it "autospawn" in case someone decides to fix the problem.
"esound" has not seen code changes for more than three years according to http://git.gnome.org/browse/esound/log/ , and it will not see further active development according to its developers. Closing this report as WONTFIX. Please feel free to reopen this bug report in the future if anyone takes responsibility for active development again.