GNOME Bugzilla – Bug 775495
Deadlock when using shmsrc+decodebin+videomixer
Last modified: 2018-11-03 11:52:07 UTC
Created attachment 341182 [details]
source code

I have a long-running parent process that consumes mkv from multiple temporary processes using shmsrc and then combines them via videomixer. If there is a better way to do this, I would love to hear it. I am also not sure which element is the culprit for the bug; if you swap or remove any of them, the issue doesn't happen.

Sometimes when the parent process is overloaded it will deadlock. I am able to reproduce the deadlock by creating an artificial pause in the handoff of an identity. Once the fpsdisplaysink stops printing, I can attach via gdb and see where everything is stuck. I have attached the source code and the bt.

The shmsink for the example is:

gst-launch-1.0 videotestsrc ! matroskamux ! shmsink wait-for-connection=true socket-path=/tmp/foobar
Created attachment 341183 [details] bt across all threads
(In reply to Sean-Der from comment #1)
> Created attachment 341183 [details]
> bt across all threads

I assume this is the bt on the videomixer side? It seems like the pipe to talk to the sender is full, as if the sender was deadlocked.
Created attachment 341193 [details] bt across all shmsink process threads
Hi Olivier! Thanks for the quick response. Yes, that is the bt on the videomixer side; I attached a bt from the other process. Is there any more debugging I can/should do? I would love to solve this myself, but I barely understand the problem. Thanks
Created attachment 341209 [details]
bt for when GstShmSrc has a free on a different thread

It looks like both processes are doing a blocking send at the same time. The shmsink is doing a send of a buffer, and the shmsrc is notifying for a free. I tried putting the shmsrc on its own thread (so that it could do a recv and not be blocked by the hung free), but now it is hung on a g_mutex_lock in the attached bt.
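The "both sides blocked in send" situation described above can be reproduced in miniature without GStreamer. The sketch below is only an illustration: it assumes the control channel between shmsink and shmsrc behaves like a Unix stream socket with a bounded kernel buffer, and the names `fill_until_blocked`, `sink_end` and `src_end` are made up for this example. Non-blocking sockets are used so the script reports the point where a blocking send() would hang instead of actually hanging.

```python
import socket


def fill_until_blocked(sock, chunk=b"x" * 4096):
    """Send on a non-blocking socket until the kernel buffer is full.

    Returns the number of bytes accepted before send() would have blocked.
    """
    sock.setblocking(False)
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            # A blocking socket would hang here until the peer reads.
            return sent


# Two connected endpoints standing in for the two processes (hypothetical names).
sink_end, src_end = socket.socketpair()

# Neither side reads; each one only keeps writing its own messages.
sink_backlog = fill_until_blocked(sink_end)  # stands in for "new buffer" messages
src_backlog = fill_until_blocked(src_end)    # stands in for "buffer freed" messages

print(f"sink queued {sink_backlog} bytes, src queued {src_backlog} bytes")
print("with blocking sockets, both sides would now be stuck in send()")
```

Once one side starts reading again, the other side's send() can make progress, which is why a thread that can always service recv matters here.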
Olivier,

(This is from me reading the code for a day, so everything below is probably wrong.)

I am pretty sure that is the issue, and it is just exacerbated by me using matroska (the hang usually happens because of the many small frees for EBML). If I use raw+gdppay and the appropriate queues, I can't get a hang.

Right now there is a lock around the pipe, so a process can only send or recv one at a time. What do you think about having one mutex for reading and another for writing?

Also, what purpose does notifying the shmsink about an unref serve? Is there anything I could put behind a property that might disable features, but fix my case?

I would love to try and fix this myself (good chance to learn the code, and I don't have to take someone else's time), but I have no idea what the idiomatic solution would be.
(In reply to Sean-Der from comment #6)
> Right now there is a lock around the pipe so a process can only send/recv
> one at a time, what do you think about have a mutex for reading and then one
> for writing?

No, the reading and writing should be instantaneous unless there is something wrong.

> Also, what purpose does notifying the shmsink about an unref serve? Is there
> anything I could put behind a property that might disable features, but fix
> my case?

So that the shmsink can re-use the memory. The memory can't be re-used on the sender side before the receiver side is done with it. Maybe you just need to increase the size of the shared memory area by increasing the "shm-size" property on shmsink.
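To make the role of the unref notification concrete, here is a toy model of a fixed-size shared area. It is not the real shmpipe allocator; `SharedArea`, `send` and `ack` are hypothetical names chosen for this sketch. It only shows that space handed to the receiver stays unavailable to the sender until the receiver reports that it is done with it.

```python
class SharedArea:
    """Toy model of a fixed-size shared memory area (not the real allocator)."""

    def __init__(self, size):
        self.size = size
        self.in_flight = {}  # buffer id -> bytes still owned by the receiver
        self.next_id = 0

    def used(self):
        return sum(self.in_flight.values())

    def send(self, nbytes):
        """Reserve space for a buffer; return its id, or None if the area is full."""
        if self.used() + nbytes > self.size:
            return None
        buf_id = self.next_id
        self.next_id += 1
        self.in_flight[buf_id] = nbytes
        return buf_id

    def ack(self, buf_id):
        """Receiver reports it is done with the buffer, so the space can be reused."""
        del self.in_flight[buf_id]


area = SharedArea(size=100)
first = area.send(60)
second = area.send(60)  # no room until the receiver releases `first`
area.ack(first)
third = area.send(60)   # fits now
print(first, second, third)  # prints: 0 None 1
```

In this model a larger area only delays the point where the sender has to wait for an acknowledgement; it does not remove the need for the acknowledgement traffic itself.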
> Maybe you just need to increase the size of the shared memory area by
> increasing the "shm-size" property on shmsink.

The following sink still deadlocks after around 200 rendered frames (each run varies by +/- 100) using the source code I attached:

gst-launch-1.0 videotestsrc ! matroskamux ! queue ! shmsink shm-size=524288000 socket-path=/tmp/foobar

I will attach the bt of the src and sink as well.
Created attachment 341372 [details] sink bt
Created attachment 341373 [details] src bt
Hey Olivier! Any chance to look at the newest backtraces? I definitely don't know as much as you, but it looks like both sides are waiting on the send (Thread 4 in the sink bt, Thread 3 in the src bt).

As a really minimal patch, I was thinking of putting a semaphore around sending: if the other side is already sending, release the lock on the object (allowing the recv to happen) and just sleep the thread for some arbitrary number of microseconds.

If you have any other explanation or ideas for the backtrace I am seeing, I would love to know, but increasing the shm-size made no difference for me (~30 runs). Thanks
It doesn't reproduce on my computer with my test, but since it's a race, I'm not so surprised. One possibility is that the receiver thread releases so many threads at the same time that it fills the pipe. I think to fix this, we need to release the mutex when blocking on the pipe, which probably means making it non-blocking and blocking in a poll().
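A minimal sketch of that idea, written in Python rather than against the actual shmpipe code: the socket is non-blocking, the mutex is held only around the send() call itself, and the wait for writability happens in poll() with the mutex released. `send_all`, `drain`, `lock` and the timeout value are assumptions made for this illustration, not existing GStreamer API.

```python
import select
import socket
import threading


def send_all(sock, lock, data, timeout_ms=1000):
    """Write `data` to a non-blocking socket without holding `lock` while waiting.

    Returns True once everything is written, False if the socket stays
    unwritable for `timeout_ms`.
    """
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    view = memoryview(data)
    while view:
        with lock:
            try:
                view = view[sock.send(view):]
                continue
            except BlockingIOError:
                pass  # buffer full; leave the `with` block so the lock is released
        # The lock is free here, so a reader thread can take it and drain data.
        if not poller.poll(timeout_ms):
            return False
    return True


def drain(sock, lock, total, result):
    """Reader that needs the same lock, and also waits for data outside of it."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    got = 0
    while got < total:
        poller.poll()  # wait without the lock
        with lock:
            got += len(sock.recv(65536))
    result.append(got)


a, b = socket.socketpair()
a.setblocking(False)
lock = threading.Lock()
payload = b"x" * (1024 * 1024)  # larger than a typical socket buffer
received = []

reader = threading.Thread(target=drain, args=(b, lock, len(payload), received))
reader.start()
ok = send_all(a, lock, payload)
reader.join()
print("sent everything:", ok)
```

One caveat with this shape: because the lock is dropped in the middle of a message, a real implementation would also have to make sure partial writes from different senders cannot interleave, for example by queueing pending bytes per connection.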
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/314.