After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 797258 - shmsrc: client main socket should be non-blocking
shmsrc: client main socket should be non-blocking
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
1.14.x
Other Linux
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2018-10-08 20:10 UTC by Aleix Conchillo Flaqué
Modified: 2018-11-03 14:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
use non-blocking socket (792 bytes, patch)
2018-10-08 20:13 UTC, Aleix Conchillo Flaqué
none Details | Review
use non-blocking socket (async) (2.12 KB, patch)
2018-10-10 20:11 UTC, Aleix Conchillo Flaqué
none Details | Review

Description Aleix Conchillo Flaqué 2018-10-08 20:10:29 UTC
While testing bug 797203, I have found that client socket is opened without O_NONBLOCK. This might cause some race condition. See the blocking in the send system call below:


Thread 5 (Thread 0x7feb37fff700 (LWP 22116))

  • #0 __libc_send
    at ../sysdeps/unix/sysv/linux/x86_64/send.c line 26
  • #1 send_command
    at shmpipe.c line 481
  • #2 sp_client_recv_finish
    at shmpipe.c line 767
  • #3 free_buffer
    at gstshmsrc.c line 313
  • #4 default_free
    at gstallocator.c line 527
  • #5 _gst_memory_free
    at gstmemory.c line 97
  • #6 gst_memory_unref
    at ../gst/gstmemory.h line 345
  • #7 _gst_buffer_free
    at gstbuffer.c line 749
  • #8 gst_base_sink_chain_unlocked
    at gstbasesink.c line 3655
  • #9 gst_base_sink_chain_main
    at gstbasesink.c line 3672
  • #10 gst_base_sink_chain
    at gstbasesink.c line 3701
  • #11 gst_pad_chain_data_unchecked
    at gstpad.c line 4322
  • #12 gst_pad_push_data
    at gstpad.c line 4578
  • #13 gst_pad_push
    at gstpad.c line 4697
  • #14 gst_base_src_loop
    at gstbasesrc.c line 2957
  • #15 gst_task_func
    at gsttask.c line 332
  • #16 g_thread_pool_thread_proxy
    at gthreadpool.c line 307
  • #17 g_thread_proxy
    at gthread.c line 784
  • #18 start_thread
    at pthread_create.c line 333
  • #19 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Thread 4 (Thread 0x7feb3c956700 (LWP 22115))

  • #0 syscall
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S line 38
  • #1 g_cond_wait_until
    at gthread-posix.c line 1449
  • #2 g_async_queue_pop_intern_unlocked
    at gasyncqueue.c line 422
  • #3 g_async_queue_timeout_pop
    at gasyncqueue.c line 545
  • #4 g_thread_pool_wait_for_new_pool
    at gthreadpool.c line 167
  • #5 g_thread_pool_thread_proxy
    at gthreadpool.c line 364
  • #6 g_thread_proxy
    at gthread.c line 784
  • #7 start_thread
    at pthread_create.c line 333
  • #8 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Thread 3 (Thread 0x7feb3d157700 (LWP 22114))

  • #0 __libc_send
    at ../sysdeps/unix/sysv/linux/x86_64/send.c line 26
  • #1 send_command
    at shmpipe.c line 481
  • #2 sp_writer_send_buf
    at shmpipe.c line 620
  • #3 gst_shm_sink_render
    at gstshmsink.c line 785
  • #4 gst_base_sink_chain_unlocked
    at gstbasesink.c line 3546
  • #5 gst_base_sink_chain_main
    at gstbasesink.c line 3672
  • #6 gst_base_sink_chain
    at gstbasesink.c line 3701
  • #7 gst_pad_chain_data_unchecked
    at gstpad.c line 4322
  • #8 gst_pad_push_data
    at gstpad.c line 4578
  • #9 gst_pad_push
    at gstpad.c line 4697
  • #10 gst_base_src_loop
    at gstbasesrc.c line 2957
  • #11 gst_task_func
    at gsttask.c line 332
  • #12 g_thread_pool_thread_proxy
    at gthreadpool.c line 307
  • #13 g_thread_proxy
    at gthread.c line 784
  • #14 start_thread
    at pthread_create.c line 333
  • #15 clone
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S line 109

Comment 1 Aleix Conchillo Flaqué 2018-10-08 20:13:32 UTC
Created attachment 373872 [details] [review]
use non-blocking socket
Comment 2 Aleix Conchillo Flaqué 2018-10-08 23:02:15 UTC
It seems "test_shm_alloc" might freeze after this patch. Investigating.
Comment 3 Aleix Conchillo Flaqué 2018-10-08 23:51:49 UTC
Just tried the patch with both 1.14 and master with gst-uninstalled and no problem. Let's wait for Josep since he was having some issues.
Comment 4 Josep Torra Valles 2018-10-09 16:20:39 UTC
This fixes the deadlocks with the unit test and seems correct to me.
Comment 5 Olivier Crête 2018-10-10 16:10:24 UTC
I'm a bit uncomfortable with this, as it means we'd need to add more code to resend the command if original send fails. Why does it block? Is the receiver not receiving?
Comment 6 Aleix Conchillo Flaqué 2018-10-10 19:32:49 UTC
(In reply to Olivier Crête from comment #5)
> I'm a bit uncomfortable with this, as it means we'd need to add more code to
> resend the command if original send fails. Why does it block? Is the
> receiver not receiving?

To be honest, I don't know why it blocked. But, see that both pipelines are block inside the "send" command. Both the receiver and sender are trying to use "send" on the same socket.
Comment 7 Aleix Conchillo Flaqué 2018-10-10 19:36:51 UTC
Also, the "recv" command was already set with MSG_DONTWAIT which means it was already async, so we are not changing anything there.

So, this change only affects to the "send" command for the receiver (shmsrc), because the sender (shmsink) was already set with O_NONBLOCK.
Comment 8 Aleix Conchillo Flaqué 2018-10-10 19:44:24 UTC
And now that I'm seeing, the "send" command was already using MSG_DONTWAIT... I'm very confused right now.

So, before it was already async and we were hoping to send everything at the first time. MSG_DONTWAIT is supposed to have the same effect as O_NONBLOCK for only that call.

May be there's an issue in glibc?
Comment 9 Aleix Conchillo Flaqué 2018-10-10 19:50:16 UTC
From the man page:

"""
... but differs in that MSG_DONTWAIT is a per-call option, whereas O_NONBLOCK is a setting on the open file description (see open(2)), which will affect all threads in the calling process and as well as other processes that hold file descriptors referring to the same open file description.
"""

O_NONBLOCK affects other processes, MSG_DONTWAIT it seems it doesn't, may be that's the issue?

In any case, it seems we can also remove MSG_DONTWAIT since O_NONBLOCK already does the same (and more). And since we were already ignoring the async return codes, we can keep doing that or try 2 or 3 times if we get EAGAIN?
Comment 10 Aleix Conchillo Flaqué 2018-10-10 20:11:51 UTC
Created attachment 373889 [details] [review]
use non-blocking socket (async)

Remove MSG_DONTWAIT and take care of retries.
Comment 11 Olivier Crête 2018-10-10 20:12:27 UTC
I used MSG_DONTWAIT because the receiver has code to retry (it blocks on a select instead), so it will retry if nothing is received. But the sender doesn't have this, if we really need to make it non blocking, then we need to add some code to retry later and make the whole thing asynchronous, which I'm afraid a big refactoring. This is why I'd rather we find why it blocks.. It's probably because the receiver doesn't have a receiving thread at this point.

Does it always block in sp_writer_send_buf() called from gst_shm_sink_render() for you ? If that's the case, I guess we can add a poll() in there, with a sp_writer_set_flushing() type call that can unblock the send.
Comment 12 Aleix Conchillo Flaqué 2018-10-10 20:18:42 UTC
(In reply to Olivier Crête from comment #11)
> I used MSG_DONTWAIT because the receiver has code to retry (it blocks on a
> select instead), so it will retry if nothing is received. But the sender
> doesn't have this, if we really need to make it non blocking, then we need
> to add some code to retry later and make the whole thing asynchronous, which
> I'm afraid a big refactoring. This is why I'd rather we find why it blocks..
> It's probably because the receiver doesn't have a receiving thread at this
> point.
> 

I see. Forget about my lastest patch, it's just a crappy hack.

> Does it always block in sp_writer_send_buf() called from
> gst_shm_sink_render() for you ? If that's the case, I guess we can add a
> poll() in there, with a sp_writer_set_flushing() type call that can unblock
> the send.

Yes, it always blocks there. I'll see if I can find out more.
Comment 13 GStreamer system administrator 2018-11-03 14:35:39 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/issues/797.