GNOME Bugzilla – Bug 797258
shmsrc: client main socket should be non-blocking
Last modified: 2018-11-03 14:35:39 UTC
While testing bug 797203, I found that the client socket is opened without O_NONBLOCK. This might cause a race condition. See the blocking in the send() system call below:
Trace 238696:
Thread 5 (Thread 0x7feb37fff700 (LWP 22116))
Thread 4 (Thread 0x7feb3c956700 (LWP 22115))
Thread 3 (Thread 0x7feb3d157700 (LWP 22114))
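The fix proposed in this report is to put the client socket into non-blocking mode. A minimal sketch of doing that on an already-connected socket with fcntl(), assuming a POSIX system; set_nonblocking() is a hypothetical helper, not the code from the attached patch. On Linux, passing SOCK_NONBLOCK to socket() achieves the same at creation time.

```c
#include <errno.h>      /* EAGAIN/EWOULDBLOCK, for callers checking results */
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch an already-open socket to non-blocking mode.
 * Returns 0 on success, -1 with errno set by fcntl() on failure. */
static int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags < 0)
    return -1;
  if (flags & O_NONBLOCK)
    return 0;                   /* already non-blocking: nothing to do */

  /* Keep the existing status flags and add O_NONBLOCK on top. */
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

For example, after socketpair(AF_UNIX, SOCK_STREAM, 0, sv), calling set_nonblocking(sv[0]) makes a recv() on the still-empty sv[0] fail immediately with EAGAIN instead of waiting for data.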
Created attachment 373872: use non-blocking socket
It seems "test_shm_alloc" might freeze after this patch. Investigating.
I just tried the patch on both 1.14 and master with gst-uninstalled and saw no problem. Let's wait for Josep, since he was having some issues.
This fixes the deadlocks with the unit test and seems correct to me.
I'm a bit uncomfortable with this, as it means we'd need to add more code to resend the command if original send fails. Why does it block? Is the receiver not receiving?
(In reply to Olivier Crête from comment #5)
> I'm a bit uncomfortable with this, as it means we'd need to add more code to
> resend the command if original send fails. Why does it block? Is the
> receiver not receiving?

To be honest, I don't know why it blocked. But note that both pipelines are blocked inside the "send" call: both the receiver and the sender are trying to use "send" on the same socket.
Also, the "recv" call was already using MSG_DONTWAIT, which means it was already non-blocking, so we are not changing anything there. This change only affects the "send" call made by the receiver (shmsrc), because the sender's (shmsink) socket was already set with O_NONBLOCK.
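On the receive side, a per-call MSG_DONTWAIT recv() is usually paired with a readiness wait, so the thread sleeps in select() rather than in recv(). A minimal sketch of that pattern, assuming select() as the wait; wait_and_recv() and its timeout handling are hypothetical and not the shmsrc code.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable, then
 * read with MSG_DONTWAIT so that recv() itself can never block.
 * Returns bytes read, 0 on EOF, or -1 with errno set (ETIMEDOUT on timeout).
 * The timeout restarts if select() is interrupted by a signal. */
static ssize_t wait_and_recv(int fd, void *buf, size_t len, int timeout_ms)
{
  for (;;) {
    fd_set rfds;
    struct timeval tv;
    int ret;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0) {
      if (errno == EINTR)
        continue;               /* interrupted by a signal: wait again */
      return -1;
    }
    if (ret == 0) {
      errno = ETIMEDOUT;        /* nothing arrived in time */
      return -1;
    }

    n = recv(fd, buf, len, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      continue;                 /* readiness was spurious: back to select() */
    return n;
  }
}
```

The blocking happens only in select(), where a timeout (or an extra wakeup descriptor) can bound it; the actual read never waits.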
And now I see that the "send" call was already using MSG_DONTWAIT too... I'm very confused right now. So it was already non-blocking before, and we were hoping to send everything on the first try. MSG_DONTWAIT is supposed to have the same effect as O_NONBLOCK, but only for that one call. Maybe there's an issue in glibc?
From the man page:
"""
... but differs in that MSG_DONTWAIT is a per-call option, whereas O_NONBLOCK is a setting on the open file description (see open(2)), which will affect all threads in the calling process and as well as other processes that hold file descriptors referring to the same open file description.
"""
O_NONBLOCK affects other processes and MSG_DONTWAIT apparently doesn't; maybe that's the issue? In any case, it seems we can also remove MSG_DONTWAIT, since O_NONBLOCK already does the same (and more). And since we were already ignoring the non-blocking (EAGAIN) return codes, we can keep doing that, or retry 2 or 3 times if we get EAGAIN?
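The per-call versus per-description difference quoted above can be checked directly. This sketch (hypothetical helpers, POSIX/Linux assumed) fills a socket with MSG_DONTWAIT sends until EAGAIN while the descriptor itself stays in blocking mode; O_NONBLOCK, in contrast, set through one descriptor is visible through a dup() of it, because both refer to the same open file description.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send with per-call MSG_DONTWAIT until the socket
 * buffer is full. Returns the bytes queued before send() reported
 * EAGAIN/EWOULDBLOCK, or -1 on any other error. */
static long fill_with_dontwait(int fd)
{
  char chunk[4096];
  long total = 0;

  memset(chunk, 'x', sizeof chunk);
  for (;;) {
    ssize_t n = send(fd, chunk, sizeof chunk, MSG_DONTWAIT);
    if (n < 0) {
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        return total;           /* buffer full, yet this call did not block */
      if (errno == EINTR)
        continue;
      return -1;
    }
    total += n;
  }
}

/* Returns 1 if O_NONBLOCK is set on the open file description behind fd. */
static int is_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  return flags >= 0 && (flags & O_NONBLOCK) != 0;
}
```

Neither flag reaches across the connection, though: the peer end of a socketpair, or the socket returned by accept() on the other side, is a separate open file description, so O_NONBLOCK set by one process does not change whether the other process's send() blocks.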
Created attachment 373889: use non-blocking socket (async)
Remove MSG_DONTWAIT and take care of retries.
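The "retry 2 or 3 times if we get EAGAIN" idea could look roughly like the sketch below. send_with_retries(), its max_attempts/wait_ms parameters, and the short poll() wait between attempts are all assumptions for illustration; this is not the code from attachment 373889.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes on an O_NONBLOCK socket, giving up
 * after max_attempts consecutive EAGAIN results. Between attempts it waits
 * up to wait_ms for the socket to become writable.
 * Returns 0 when everything was sent, -1 with errno set otherwise. */
static int send_with_retries(int fd, const void *data, size_t len,
    int max_attempts, int wait_ms)
{
  const char *p = data;
  int attempts = 0;
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags < 0)
    return -1;
  if (!(flags & O_NONBLOCK)) {
    errno = EINVAL;             /* a blocking fd could hang in send(): refuse */
    return -1;
  }

  while (len > 0) {
    ssize_t n = send(fd, p, len, 0);
    if (n >= 0) {
      p += n;
      len -= (size_t) n;
      attempts = 0;             /* progress was made: reset the retry budget */
      continue;
    }
    if (errno == EINTR)
      continue;
    if (errno != EAGAIN && errno != EWOULDBLOCK)
      return -1;
    if (++attempts >= max_attempts)
      return -1;                /* errno is still EAGAIN: caller decides */

    {
      struct pollfd pfd = { .fd = fd, .events = POLLOUT };
      if (poll(&pfd, 1, wait_ms) < 0 && errno != EINTR)
        return -1;
    }
  }
  return 0;
}
```

If the peer never drains the socket, the helper still gives up with EAGAIN after a bounded wait, so the caller must decide what to do with the unsent command; that is the extra resend logic the earlier comment was uncomfortable with.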
I used MSG_DONTWAIT because the receiver has code to retry (it blocks on a select() instead), so it will retry if nothing is received. But the sender doesn't have this. If we really need to make it non-blocking, then we need to add some code to retry later and make the whole thing asynchronous, which I'm afraid is a big refactoring. This is why I'd rather we find out why it blocks. It's probably because the receiver doesn't have a receiving thread at this point.

Does it always block in sp_writer_send_buf() called from gst_shm_sink_render() for you? If that's the case, I guess we can add a poll() in there, with a sp_writer_set_flushing()-type call that can unblock the send.
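The poll()-plus-flushing suggestion could be sketched as follows: the writer waits in poll() on both the socket and a wakeup pipe, and a flushing call writes to the pipe to abort the wait. Writer, writer_init(), writer_send() and writer_set_flushing() are hypothetical names modelled on the suggested sp_writer_set_flushing(); they are not the shmpipe API, and the self-pipe is one common way to get the wakeup, assumed here. Sends use MSG_DONTWAIT, as the thread says the send path already does.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writer state; not the real shmpipe writer structure. */
typedef struct {
  int sock;                     /* connected socket */
  int wake[2];                  /* self-pipe: writing wake[1] aborts a wait */
} Writer;

static int writer_init(Writer *w, int sock)
{
  w->sock = sock;
  return pipe(w->wake);
}

/* Analogue of the suggested sp_writer_set_flushing(): wakes any thread
 * waiting in writer_send() so that it gives up on the current buffer. */
static void writer_set_flushing(Writer *w)
{
  char c = 1;
  ssize_t r;

  do {
    r = write(w->wake[1], &c, 1);
  } while (r < 0 && errno == EINTR);
}

/* Sends all of data, waiting in poll() instead of in send().
 * Returns 0 on success, or -1 with errno set; errno == ECANCELED means
 * writer_set_flushing() was called while waiting. */
static int writer_send(Writer *w, const void *data, size_t len)
{
  const char *p = data;

  while (len > 0) {
    ssize_t n = send(w->sock, p, len, MSG_DONTWAIT);
    if (n >= 0) {
      p += n;
      len -= (size_t) n;
      continue;
    }
    if (errno == EINTR)
      continue;
    if (errno != EAGAIN && errno != EWOULDBLOCK)
      return -1;

    struct pollfd fds[2] = {
      { .fd = w->sock, .events = POLLOUT },
      { .fd = w->wake[0], .events = POLLIN },
    };
    if (poll(fds, 2, -1) < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (fds[1].revents & POLLIN) {
      errno = ECANCELED;        /* flushing requested: abandon this buffer */
      return -1;
    }
  }
  return 0;
}
```

Because the wakeup byte is never drained here, the writer stays in the flushing state until some reset (not shown) reads it back out of wake[0]; a real implementation would need that reset, plus closing the pipe when the writer is destroyed.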
(In reply to Olivier Crête from comment #11)
> I used MSG_DONTWAIT because the receiver has code to retry (it blocks on a
> select instead), so it will retry if nothing is received. But the sender
> doesn't have this, if we really need to make it non blocking, then we need
> to add some code to retry later and make the whole thing asynchronous, which
> I'm afraid a big refactoring. This is why I'd rather we find why it blocks..
> It's probably because the receiver doesn't have a receiving thread at this
> point.

I see. Forget about my latest patch; it's just a crappy hack.

> Does it always block in sp_writer_send_buf() called from
> gst_shm_sink_render() for you ? If that's the case, I guess we can add a
> poll() in there, with a sp_writer_set_flushing() type call that can unblock
> the send.

Yes, it always blocks there. I'll see if I can find out more.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/797.