After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 741854 - omxvideodec: potential deadlock in acquire_buffer() when starting decoding
omxvideodec: potential deadlock in acquire_buffer() when starting decoding
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-omx
git master
Other Linux
: Normal normal
: 1.9.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-12-22 12:53 UTC by George Kiagiadakis
Modified: 2016-07-06 09:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
bugfix (3.47 KB, patch)
2014-12-22 12:53 UTC, George Kiagiadakis
none Details | Review
bugfix (3.43 KB, patch)
2014-12-22 13:00 UTC, George Kiagiadakis
committed Details | Review

Description George Kiagiadakis 2014-12-22 12:53:52 UTC
Created attachment 293176 [details] [review]
bugfix

Hello, 

There is a race condition in gst_omx_port_acquire_buffer() when decoding starts. This bug has been observed during overnight continuous video loop runs on a raspberry pi, where each loop iteration starts a new playbin and sets it to play a short h264 video file. It doesn't take longer than 24 hours for the deadlock to occur. 

Patch attached. My commit message (below for convenience) has the full details.

------
gstomx: call handle_messages() only once in acquire_buffer() to avoid potential deadlock

There is one rare case where calling handle_messages() more than once can cause a deadlock in the video decoder element:

- sink pad thread starts the src pad task (gst_omx_video_dec_loop())
- _video_dec_loop() calls gst_omx_port_acquire_buffer() on dec_out_port
- blocks in gst_omx_component_wait_message() releasing comp->lock and comp->messages_lock (initially, there are no buffers configured on that port, so it waits for OMX_EventPortSettingsChanged)
- the sink pad thread pushes a buffer to the decoder with gst_omx_port_release_buffer()
- _release_buffer() grabs comp->lock and sends the buffer to OMX, which consumes it immediately
- EmptyBufferDone gets called at this point, which signals _wait_message() to unblock
- the message from EmptyBufferDone is processed in gst_omx_component_handle_messages() called from gst_omx_port_release_buffer()
- gst_omx_port_release_buffer releases comp->lock
- the src pad thread now gets to run, grabbing comp->lock while it exits from _wait_message()
- _acquire_buffer() calls the _handle_messages() on the next line after _wait_message(), which does nothing (no pending messages)
- then it goes to "retry:" and calls _handle_messages() again, which also does nothing (still no pending messages)
- scheduler switches to a videocore thread that calls EventHandler, informing us about the OMX_EventPortSettingsChanged event that just arrived
- EventHandler graps comp->messages_lock, but not comp->lock, so it can run in parallel at this point just fine.
- scheduler switches back to the src pad thread (which is in the middle of _acquire_buffer())
- the next _handle_messages() which is right before if (g_queue_is_empty (&port->pending_buffers)) processes the OMX_EventPortSettingsChanged
- the buffer queue is still empty, so that thread blocks again in _wait_message()
- the sink pad thread tries to acquire the next input port buffer
- _acquire_buffer() also blocks this thread in: if (comp->pending_reconfigure_outports) { ... _wait_message() ... }
- DEADLOCK. gstreamer is waiting for omx to do something, omx waits for gstreamer to do something.

By removing those extra _handle_messages() calls, we can ensure that all the checks of _acquire_buffer() will re-run. In the above case, after the scheduler switches back to the middle of _acquire_buffer(), the code will enter _wait_message(), which will see that there are pending messages and will return immediately, going back to "retry:" and re-doing all the checks properly. 
------
Comment 1 George Kiagiadakis 2014-12-22 13:00:18 UTC
Created attachment 293177 [details] [review]
bugfix

Whoops, I just noticed that my previous patch does not apply cleanly to master due to another local commit. Here is a version that applies to master.