GNOME Bugzilla – Bug 741854
omxvideodec: potential deadlock in acquire_buffer() when starting decoding
Last modified: 2016-07-06 09:14:37 UTC
Created attachment 293176 [details] [review] bugfix Hello, There is a race condition in gst_omx_port_acquire_buffer() when decoding starts. This bug has been observed during overnight continuous video loop runs on a raspberry pi, where each loop iteration starts a new playbin and sets it to play a short h264 video file. It doesn't take longer than 24 hours for the deadlock to occur. Patch attached. My commit message (below for convenience) has the full details. ------ gstomx: call handle_messages() only once in acquire_buffer() to avoid potential deadlock There is one rare case where calling handle_messages() more than once can cause a deadlock in the video decoder element: - sink pad thread starts the src pad task (gst_omx_video_dec_loop()) - _video_dec_loop() calls gst_omx_port_acquire_buffer() on dec_out_port - blocks in gst_omx_component_wait_message() releasing comp->lock and comp->messages_lock (initially, there are no buffers configured on that port, so it waits for OMX_EventPortSettingsChanged) - the sink pad thread pushes a buffer to the decoder with gst_omx_port_release_buffer() - _release_buffer() grabs comp->lock and sends the buffer to OMX, which consumes it immediately - EmptyBufferDone gets called at this point, which signals _wait_message() to unblock - the message from EmptyBufferDone is processed in gst_omx_component_handle_messages() called from gst_omx_port_release_buffer() - gst_omx_port_release_buffer releases comp->lock - the src pad thread now gets to run, grabbing comp->lock while it exits from _wait_message() - _acquire_buffer() calls the _handle_messages() on the next line after _wait_message(), which does nothing (no pending messages) - then it goes to "retry:" and calls _handle_messages() again, which also does nothing (still no pending messages) - scheduler switches to a videocore thread that calls EventHandler, informing us about the OMX_EventPortSettingsChanged event that just arrived - EventHandler graps comp->messages_lock, but not comp->lock, so it can run in parallel at this point just fine. - scheduler switches back to the src pad thread (which is in the middle of _acquire_buffer()) - the next _handle_messages() which is right before if (g_queue_is_empty (&port->pending_buffers)) processes the OMX_EventPortSettingsChanged - the buffer queue is still empty, so that thread blocks again in _wait_message() - the sink pad thread tries to acquire the next input port buffer - _acquire_buffer() also blocks this thread in: if (comp->pending_reconfigure_outports) { ... _wait_message() ... } - DEADLOCK. gstreamer is waiting for omx to do something, omx waits for gstreamer to do something. By removing those extra _handle_messages() calls, we can ensure that all the checks of _acquire_buffer() will re-run. In the above case, after the scheduler switches back to the middle of _acquire_buffer(), the code will enter _wait_message(), which will see that there are pending messages and will return immediately, going back to "retry:" and re-doing all the checks properly. ------
Created attachment 293177 [details] [review] bugfix Whoops, I just noticed that my previous patch does not apply cleanly to master due to another local commit. Here is a version that applies to master.