Bug 775495 - Deadlock when using shmsrc+decodebin+videomixer
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: git master
OS: Other Linux
Priority: Normal
Severity: normal
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
Reported: 2016-12-01 22:40 UTC by Sean-Der
Modified: 2018-11-03 11:52 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
  source code (746 bytes, text/plain), 2016-12-01 22:40 UTC, Sean-Der
  bt across all threads (5.38 KB, text/plain), 2016-12-01 22:50 UTC, Sean-Der
  bt across all shmsink process threads (5.89 KB, text/plain), 2016-12-02 02:29 UTC, Sean-Der
  bt for when GstShmSrc has a free on a different thread (5.79 KB, text/plain), 2016-12-02 04:29 UTC, Sean-Der
  sink bt (8.13 KB, text/plain), 2016-12-05 03:06 UTC, Sean-Der
  src bt (4.27 KB, text/plain), 2016-12-05 03:06 UTC, Sean-Der

Description Sean-Der 2016-12-01 22:40:17 UTC
Created attachment 341182 [details]
source code

I have a long-running parent process that consumes mkv from multiple temporary processes using shmsrc and then combines them via videomixer. If there is a better way to do this I would love to hear it as well. I am also not sure which element is the culprit for the bug; if you swap or remove any of them the issue doesn't happen.

Sometimes when the parent process is overloaded it will deadlock. I am able to reproduce the deadlock by creating an artificial pause in the handoff of an identity. Once the fpsdisplaysink stops printing I can attach via gdb and see where everything is stuck.

I have attached the source code and the bt.

The shmsink pipeline for the example is:
    gst-launch-1.0 videotestsrc ! matroskamux ! shmsink wait-for-connection=true socket-path=/tmp/foobar
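(For reference, a receiver along the lines described above can be sketched as follows. This is not the attached source code; the element choices, the 100 ms pause, and the socket path are assumptions used only to illustrate the setup.)

    /* Sketch of a receiver matching the description above: shmsrc ! decodebin !
     * videomixer ! fpsdisplaysink, with an artificial pause in an identity
     * handoff to provoke the hang.  Element choices, the 100 ms pause and the
     * socket path are assumptions, not the reporter's attachment. */
    #include <gst/gst.h>

    static void
    on_handoff (GstElement *identity, GstBuffer *buf, gpointer user_data)
    {
      /* Artificial stall simulating an overloaded parent process. */
      g_usleep (100 * 1000);
    }

    int
    main (int argc, char **argv)
    {
      GError *err = NULL;
      GstElement *pipeline, *id;

      gst_init (&argc, &argv);

      pipeline = gst_parse_launch (
          "shmsrc socket-path=/tmp/foobar ! decodebin ! videoconvert ! "
          "identity name=id ! videomixer ! fpsdisplaysink", &err);
      if (pipeline == NULL)
        g_error ("parse error: %s", err->message);

      id = gst_bin_get_by_name (GST_BIN (pipeline), "id");
      g_object_set (id, "signal-handoffs", TRUE, NULL);
      g_signal_connect (id, "handoff", G_CALLBACK (on_handoff), NULL);

      gst_element_set_state (pipeline, GST_STATE_PLAYING);
      g_main_loop_run (g_main_loop_new (NULL, FALSE));
      return 0;
    }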
Comment 1 Sean-Der 2016-12-01 22:50:05 UTC
Created attachment 341183 [details]
bt across all threads
Comment 2 Olivier Crête 2016-12-01 23:55:10 UTC
(In reply to Sean-Der from comment #1)
> Created attachment 341183 [details]
> bt across all threads

I assume this is the bt on the videomixer side? It seems like the pipe used to talk to the sender is full, as if the sender were deadlocked.
Comment 3 Sean-Der 2016-12-02 02:29:49 UTC
Created attachment 341193 [details]
bt across all shmsink process threads
Comment 4 Sean-Der 2016-12-02 02:42:17 UTC
Hi Olivier! Thanks for the quick response.

Yes, that is the bt on the videomixer side; I attached a bt from the other process. Is there any more debugging I can/should do? I would love to solve this myself, but I barely understand the problem.

thanks
Comment 5 Sean-Der 2016-12-02 04:29:15 UTC
Created attachment 341209 [details]
bt for when GstShmSrc has a free on a different thread

It looks like both processes are doing a blocking send at the same time. The shmsink is doing a send of a buffer, and the shmsrc is notifying for a free.

I tried putting the shmsrc on its own thread (so that it could do a recv and not be blocked by the hung free), but now it is hung on a g_mutex_lock in the attached bt.
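(Stripped of the GStreamer specifics, the pattern being described is two processes each stuck in a blocking write on a full pipe to the other side while neither is reading. A generic illustration of that shape, not the shm plugin code:)

    /* Generic illustration of the cross-blocking described above, not the shm
     * plugin code: each process keeps doing blocking writes on its pipe to the
     * other side while neither reads, so once both pipes fill up both writes
     * block forever. */
    #include <string.h>
    #include <unistd.h>

    int
    main (void)
    {
      int to_child[2], to_parent[2];
      char chunk[65536];              /* comparable to the default pipe buffer */

      memset (chunk, 0, sizeof chunk);
      pipe (to_child);
      pipe (to_parent);

      if (fork () == 0) {
        /* "shmsrc" side: keeps writing free notifications. */
        for (;;)
          write (to_parent[1], chunk, sizeof chunk);  /* blocks once full */
      }

      /* "shmsink" side: keeps writing buffers. */
      for (;;)
        write (to_child[1], chunk, sizeof chunk);     /* blocks once full */
    }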
Comment 6 Sean-Der 2016-12-03 04:07:16 UTC
Olivier, 
(This is from me reading the code for a day; everything below is probably wrong.)

So I am pretty sure that is the issue, and it is just exacerbated by my using Matroska (I usually hang because of all the little frees for EBML). If I use raw+gdppay and the appropriate queues I can't get a hang.

Right now there is a lock around the pipe so a process can only send/recv one at a time; what do you think about having a mutex for reading and one for writing?

Also, what purpose does notifying the shmsink about an unref serve? Is there anything I could put behind a property that might disable features, but fix my case?

I would love to try and fix this myself (a good chance to learn the code, and I don't have to take someone else's time) but I have no idea what the idiomatic solution would be.
Comment 7 Olivier Crête 2016-12-05 02:08:46 UTC
(In reply to Sean-Der from comment #6)
> Right now there is a lock around the pipe so a process can only send/recv
> one at a time; what do you think about having a mutex for reading and one
> for writing?

No, the reading and writing should be instantaneous unless there is something wrong.

> Also, what purpose does notifying the shmsink about an unref serve? Is there
> anything I could put behind a property that might disable features, but fix
> my case?

So that the shmsink can re-use the memory; the memory can't be re-used on the sender side before the receiver side is done with it.


Maybe you just need to increase the size of the shared memory area by increasing the "shm-size" property on shmsink.
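(For anyone constructing the sender in code rather than with gst-launch-1.0, a minimal sketch of that suggestion might look like this; the element name and the shm-size value are assumptions:)

    /* Sketch of the sender from the description with a larger shared-memory
     * area, as suggested above.  The element name "sink" and the size value
     * are assumptions. */
    #include <gst/gst.h>

    int
    main (int argc, char **argv)
    {
      GstElement *pipeline, *sink;

      gst_init (&argc, &argv);

      pipeline = gst_parse_launch (
          "videotestsrc ! matroskamux ! shmsink name=sink "
          "wait-for-connection=true socket-path=/tmp/foobar", NULL);
      sink = gst_bin_get_by_name (GST_BIN (pipeline), "sink");

      /* Give shmsink more room so it does not have to wait for the receiver
       * to release buffers as often. */
      g_object_set (sink, "shm-size", 524288000, NULL);

      gst_element_set_state (pipeline, GST_STATE_PLAYING);
      g_main_loop_run (g_main_loop_new (NULL, FALSE));
      return 0;
    }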
Comment 8 Sean-Der 2016-12-05 03:05:11 UTC
> Maybe you just need to increase the size of the shared memory area by 
> increasing the "shm-size" property on shmsink.

The following sink still deadlocks at around ~200 frames rendered (each run varies +/- 100) using the source code I attached:
  gst-launch-1.0 videotestsrc ! matroskamux ! queue ! shmsink shm-size=524288000 socket-path=/tmp/foobar

I will attach the bt of the src/sink as well.
Comment 9 Sean-Der 2016-12-05 03:06:03 UTC
Created attachment 341372 [details]
sink bt
Comment 10 Sean-Der 2016-12-05 03:06:51 UTC
Created attachment 341373 [details]
src bt
Comment 11 Sean-Der 2016-12-09 21:10:37 UTC
Hey Olivier!

Any chance you could look at the newest backtraces? I definitely don't know as much as you, but it looks like both sides are waiting on the send (Thread 4 in the sink bt, Thread 3 in the src bt).

As a really minimal patch I was thinking of putting a semaphore around sending: if the other side is already sending, release the lock on the object (allowing the recv to happen) and just sleep the thread for some arbitrary number of microseconds.

If you have any other explanations/ideas for the backtraces I am seeing I would love to know, but increasing the shm-size didn't make any difference for me (~30 runs).

thanks
Comment 12 Olivier Crête 2016-12-09 21:45:12 UTC
It doesn't reproduce on my computer with my test, but since it's a race, I'm not so surprised. One possibility is that the receiver thread releases so many buffers at the same time that it fills the pipe. I think to fix this, we need to release the mutex when blocking on the pipe, which probably means making it non-blocking and blocking in a poll()...
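(A rough sketch of that direction, assuming the pipe fd has been made non-blocking and a helper drops the lock while waiting in poll(); the names and structure are not the actual shmpipe code:)

    /* Sketch of the suggested fix: with the pipe fd set to O_NONBLOCK, a send
     * that would block releases the lock and waits in poll() until the fd is
     * writable again, so the other direction can be drained in the meantime.
     * Names and structure are assumptions, not the real shmpipe.c code. */
    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>
    #include <glib.h>

    static gboolean
    send_releasing_lock_when_full (int fd, const char *buf, gsize len,
        GMutex *lock)
    {
      gsize done = 0;

      while (done < len) {
        gssize n = write (fd, buf + done, len - done);

        if (n > 0) {
          done += n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
          struct pollfd pfd = { .fd = fd, .events = POLLOUT };

          /* The pipe is full: drop the lock so the receiving thread can make
           * progress, then wait until the fd is writable again. */
          g_mutex_unlock (lock);
          poll (&pfd, 1, -1);
          g_mutex_lock (lock);
        } else if (n < 0 && errno != EINTR) {
          return FALSE;   /* real write error */
        }
      }
      return TRUE;
    }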
Comment 13 GStreamer system administrator 2018-11-03 11:52:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/314.