GNOME Bugzilla – Bug 787290
omxvideodec: Stuck after several seeks on Raspberry Pi3
Last modified: 2018-11-03 13:01:38 UTC
On Raspberry Pi3, a QuickTime movie (h264, AAC) can be played by omxh264dec (glimagesink, alsasink). However, once PAUSED, then after several flushing seek (without PLAYING), the pipeline stuck within gst_element_seek(). This is well reproducible. -------------------------------------- (gdb) info th Id Target Id Frame 21 Thread 0x68eff450 (LWP 1348) "audiosink-ringb" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 20 Thread 0x698ff450 (LWP 1347) "omxh264dec-omxh" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 19 Thread 0x6a21b450 (LWP 1346) "ILCS_HOST" 0x76a7ca40 in do_futex_wait ( isem=isem@entry=0x6cf16c7c) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 18 Thread 0x6aa1b450 (LWP 1345) "multiqueue0:src" 0x76a7a7a4 in __pthread_cond_wait (cond=cond@entry=0x6cf16ae8, mutex=mutex@entry=0x6cf16acc) at pthread_cond_wait.c:187 17 Thread 0x6b2ea450 (LWP 1344) "multiqueue0:src" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 16 Thread 0x6bbdd450 (LWP 1343) "multiqueue0:src" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 15 Thread 0x6c3dd450 (LWP 1342) "qtdemux0:sink" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 14 Thread 0x6cdff450 (LWP 1341) "aqueue:src" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 13 Thread 0x6d7ff450 (LWP 1340) "typefind:sink" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 * 12 Thread 0x6dfff450 (LWP 1339) "vqueue:src" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:37 11 Thread 0x6e9ff450 (LWP 1338) "gstglcontext" 0x7670fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 10 Thread 0x6f376450 (LWP 1337) "gldisplay-event" 0x7670fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 9 Thread 0x6fc77450 (LWP 1336) "Valencia" 0x76a7a7a4 in __pthread_cond_wait (cond=0x21f780, mutex=0x21f7b4) at pthread_cond_wait.c:187 8 Thread 0x731e3450 (LWP 1335) "Valencia" 0x76712964 in select () at ../sysdeps/unix/syscall-template.S:81 7 Thread 0x739e3450 (LWP 1334) "Valencia" 0x76a7a7a4 in __pthread_cond_wait (cond=0x1ada90, mutex=0x1adac4) at pthread_cond_wait.c:187 6 Thread 0x743ff450 (LWP 1333) "Valencia" 0x7670fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x74bff450 (LWP 1332) "HCEC Notify" 0x76a7ca40 in do_futex_wait ( isem=isem@entry=0x769335ec <cecservice_notify_available_event+24>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 4 Thread 0x753ff450 (LWP 1331) "HTV Notify" 0x76a7ca40 in do_futex_wait ( isem=isem@entry=0x76932864 <tvservice_notify_available_event+24>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 3 Thread 0x75bff450 (LWP 1330) "HDispmanx Notif" 0x76a7ca40 in do_futex_wait (isem=isem@entry=0x769336e8 <dispmanx_notify_available_event+24>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48 2 Thread 0x7656f450 (LWP 1329) "VCHIQ completio" 0x76711f2c in ioctl () at ../sysdeps/unix/syscall-template.S:81 1 Thread 0x76ff1000 (LWP 1328) "Valencia" 0x76a7d780 in __lll_lock_wait ( futex=futex@entry=0x22f2b8, private=<optimized out>) at ../ports/sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c:46 -------------------------------------- Below is a backtrace of main thread (issuing gst_element_seek()) It is just after GST_EVENT_FLUSH_START is forwarded on video stream at multiqueue. --------------------------------------
+ Trace 237927
It seems to me that omxvideodec and its ancestors (GstVideoDecoder and GstElement) does not handle GST_EVENT_FLUSH_START. Can this problem be fixed if omxvideodec handles GST_EVENT_FLUSH_START ?
Please, have a look at flush() virtual function, which is triggered by the flushing sequence. Though, this might not be sufficient for threaded decoders (see v4l2videodec as an example).
1. I have tried a Pad Probe on src of GstOMXVideoDec. It forwards GST_EVENT_FLUSH_START, then call klass->flush(). Unfortunately, this makes GstOMXVideoDec malfunction. 2. I have tried to modify gstvideodecoder.c (just for convenience). In gst_video_decoder_sink_event_default(), I added a handler for GST_EVENT_FLUSH_START. It does same as GST_EVENT_FLUSH_STOP. ------------------------------------------------------- case GST_EVENT_FLUSH_START: { ret = gst_video_decoder_push_event (decoder, event); GList *l; GST_VIDEO_DECODER_STREAM_LOCK (decoder); for (l = priv->frames; l; l = l->next) { GstVideoCodecFrame *frame = l->data; frame->events = _flush_events (decoder->srcpad, frame->events); } priv->current_frame_events = _flush_events (decoder->srcpad, decoder->priv->current_frame_events); gst_video_decoder_flush (decoder, TRUE ); GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); return TRUE; } ------------------------------------------------------- This prevents infinite stuck in gst_element_seek(). Unfortunately, this makes some memory corruption (finally crash). 3. I changed argument of gst_video_decoder_flush() to not "hard". ------------------------------------------------------- case GST_EVENT_FLUSH_START: { ret = gst_video_decoder_push_event (decoder, event); GList *l; GST_VIDEO_DECODER_STREAM_LOCK (decoder); for (l = priv->frames; l; l = l->next) { GstVideoCodecFrame *frame = l->data; frame->events = _flush_events (decoder->srcpad, frame->events); } priv->current_frame_events = _flush_events (decoder->srcpad, decoder->priv->current_frame_events); gst_video_decoder_flush (decoder, FALSE /*TRUE*/ ); GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); return TRUE; } ------------------------------------------------------- This works ALMOST fine. After many successive seek() operations, the pipeline stucks for 5 seconds (NOT infinite!), then log message is shown. ------------------------------------------------------- 0:01:00.216696122 29704 0x6c602030 ERROR omx gstomx.c:1978:gst_omx_port_wait_buffers_released_unlocked:<omxh264dec-omxh264dec2> Timeout waiting for egl_render port 221 to release all buffers 0:01:00.217750909 29704 0x20fac0 ERROR omx gstomx.c:836:gst_omx_component_set_state:<omxh264dec-omxh264dec2> Last operation returned an error. Setting last_error manually. 0:01:00.217836117 29704 0x20fac0 ERROR omx gstomx.c:845:gst_omx_component_set_state:<omxh264dec-omxh264dec2> Error setting egl_render state from 4 to 3: Insufficient resources (0x80001000) 0:01:00.217935075 29704 0x20fac0 ERROR omx gstomx.c:872:gst_omx_component_get_state:<omxh264dec-omxh264dec2> Component egl_render in error state: Insufficient resources (0x80001000) 0:01:00.218269448 29704 0x20fac0 ERROR omx gstomx.c:1490:gst_omx_port_set_flushing:<omxh264dec-omxh264dec2> Component egl_render is in error state: Insufficient resources (0x80001000) 0:01:00.218475437 29704 0x20fac0 ERROR omx gstomx.c:1490:gst_omx_port_set_flushing:<omxh264dec-omxh264dec2> Component egl_render is in error state: Insufficient resources (0x80001000) 0:01:00.219939493 29704 0x20fac0 ERROR omx gstomx.c:872:gst_omx_component_get_state:<omxh264dec-omxh264dec2> Component egl_render in error state: Insufficient resources (0x80001000) 0:01:00.221063759 29704 0x20fac0 ERROR omx gstomx.c:845:gst_omx_component_set_state:<omxh264dec-omxh264dec2> Error setting egl_render state from 4 to 3: Insufficient resources (0x80001000) 0:01:00.221159123 29704 0x20fac0 ERROR omx gstomx.c:872:gst_omx_component_get_state:<omxh264dec-omxh264dec2> Component egl_render in error state: Insufficient resources (0x80001000) 0:01:00.221491309 29704 0x20fac0 ERROR omx gstomx.c:1490:gst_omx_port_set_flushing:<omxh264dec-omxh264dec2> Component egl_render is in error state: Insufficient resources (0x80001000) 0:01:00.221705266 29704 0x20fac0 ERROR omx gstomx.c:1490:gst_omx_port_set_flushing:<omxh264dec-omxh264dec2> Component egl_render is in error state: Insufficient resources (0x80001000) 0:01:00.225268271 29704 0x6c602030 ERROR omx gstomx.c:1263:gst_omx_port_acquire_buffer:<omxh264dec-omxh264dec2> Component egl_render is in error state: Insufficient resources ------------------------------------------------------- I allocated 512MB for VRAM. bcmstat.sh shows free VRAM around 80%. I think there are enough free VRAM.
I ended up with following modifications. I hope original authors and other experts verify this, and consider improvements. "gstglimagesink.c" In gst_glimage_sink_event(), add handler for GST_EVENT_FLUSH_START . (almost same as GST_STATE_CHANGE_PAUSED_TO_READY in gst_glimage_sink_change_state()) I think this prevents deadlock in gst_omx_video_dec_loop() by releasing output port. (also set "enable-last-sample" to FALSE) ------------------------------------- case GST_EVENT_FLUSH_START: { GstBuffer *buf[2]; GST_GLIMAGE_SINK_LOCK (gl_sink); /* mark the redisplay_texture as unavailable (=0) * to avoid drawing */ gl_sink->redisplay_texture = 0; buf[0] = gl_sink->stored_buffer[0]; buf[1] = gl_sink->stored_buffer[1]; gl_sink->stored_buffer[0] = gl_sink->stored_buffer[1] = NULL; gl_sink->stored_sync_meta = gl_sink->next_sync_meta = NULL; if (gl_sink->stored_sync) gst_buffer_unref (gl_sink->stored_sync); gl_sink->stored_sync = NULL; // GST_GLIMAGE_SINK_UNLOCK (gl_sink); gst_buffer_replace (buf, NULL); gst_buffer_replace (buf + 1, NULL); gst_object_replace ((GstObject **) & gl_sink->convert_views, NULL); gst_buffer_replace (&gl_sink->input_buffer, NULL); gst_buffer_replace (&gl_sink->input_buffer2, NULL); gst_buffer_replace (&gl_sink->next_buffer, NULL); gst_buffer_replace (&gl_sink->next_buffer2, NULL); gst_buffer_replace (&gl_sink->next_sync, NULL); GST_GLIMAGE_SINK_UNLOCK (gl_sink); } break; ------------------------------------- "gstomxvideodec.c" Add sink_event() handler. I think this prevents deadlock in gst_omx_video_dec_handle_frame() by releasing input port. Suppress port reconfiguration error while flushing. (occasionally port reconfiguration event occur, and fails with OMX_ErrorIncorrectStateOperation) ------------------------------------- gst_omx_video_dec_class_init() { video_decoder_class->sink_event = GST_DEBUG_FUNCPTR (gst_omx_video_dec_sink_event); } gst_omx_video_dec_loop() { reconfigure_error: { if (!GST_PAD_IS_FLUSHING (GST_VIDEO_DECODER_SINK_PAD (self))) { GST_ELEMENT_ERROR (self, LIBRARY, SETTINGS, (NULL), ("Unable to reconfigure output port")); } } } static gboolean gst_omx_video_dec_sink_event (GstVideoDecoder *decoder, GstEvent *event) { gboolean ret; ret = GST_VIDEO_DECODER_CLASS(gst_omx_video_dec_parent_class)->sink_event (decoder, event); switch (GST_EVENT_TYPE(event)) { case GST_EVENT_FLUSH_START: { GstVideoDecoderClass *klass; klass = GST_VIDEO_DECODER_GET_CLASS (decoder); { GST_VIDEO_DECODER_STREAM_LOCK (decoder); klass->flush(decoder); GST_VIDEO_DECODER_STREAM_UNLOCK (decoder); } } break; default: break; } return ret; } -------------------------------------
Thanks for investigating this, but please attach your proposed/attempted modifications in form of patches (in "git format-patch" format). Usually elements should not do very much in FLUSH_START other than forward the event, as FLUSH_START is an out-of-band event that will pass through the pipeline while streaming is still going on in the streaming thread(s). Elements typically do clean-up/resets on FLUSH_STOP, which is sent once streaming has stopped.
I will try to study about patch. According to "GStreamer design documents : Events" https://gstreamer.freedesktop.org/documentation/design/events.html ----------------------------------------- Flushing happens in two stages. a source element sends the FLUSH_START event to the downstream peer element. The downstream element starts rejecting buffers from the upstream elements. It sends the flush event further downstream and discards any buffers it is holding as well as return from the chain function as soon as possible. This makes sure that all upstream elements get unblocked. This event is not synchronized with the STREAM_LOCK and can be done in the application thread. ----------------------------------------- So I think it is good that gstglimagesink releases buffers on GST_EVENT_FLUSH_START.
Created attachment 359428 [details] [review] Patch proposal
Review of attachment 359428 [details] [review]: ::: omx/gstomxvideodec.c @@ +2453,3 @@ + klass = GST_VIDEO_DECODER_GET_CLASS (decoder); + { + GST_VIDEO_DECODER_STREAM_LOCK (decoder); This is not correct, the FLUSH_START event need to be propagated before you can take the stream lock. That's why the base class will call flush on FLUSH_STOP event, which hold the stream-lock.
My impression is that this issue is caused by https://bugzilla.gnome.org/show_bug.cgi?id=786175 (GL Element missing drain support). This affect renegotation for all sort of zero-copy method. I have tested it for DMABuf, and I can't see how this would be different / better for OMX EGLImage mode.
Note that in your case, it's not just the drain, glimagesink (and all other gl base sink), should figure-out a way to copy the redraw texture in order to release the EGLImage without affecting redraws.
Thanks for comments. -------------------------------- This is not correct, the FLUSH_START event need to be propagated before you can take the stream lock. That's why the base class will call flush on FLUSH_STOP event, which hold the stream-lock. -------------------------------- In my modification, GST_EVENT_FLUSH_START event is propagated before LOCK and FLUSH. Similar pattern is seen in gst_rtp_rtx_send_sink_event(). --------------------------------- case GST_EVENT_FLUSH_START: gst_pad_push_event (rtx->srcpad, event); gst_rtp_rtx_send_set_flushing (rtx, TRUE); gst_pad_pause_task (rtx->srcpad); return TRUE; --------------------------------- LOCK is used in gst_rtp_rtx_send_set_flushing()
Also, gst_multi_queue_sink_event() --------------------------------- case GST_EVENT_FLUSH_START: GST_DEBUG_OBJECT (mq, "SingleQueue %d : received flush start event", sq->id); res = gst_pad_push_event (sq->srcpad, event); gst_single_queue_flush (mq, sq, TRUE, FALSE); goto done; --------------------------------- LOCK is used in gst_single_queue_flush() So I think behavior of glimagesink should be reconsidered.
(In reply to Yuji Kuwabara from comment #10) > Thanks for comments. > > -------------------------------- > This is not correct, the FLUSH_START event need to be propagated before you > can take the stream lock. That's why the base class will call flush on > FLUSH_STOP event, which hold the stream-lock. > -------------------------------- > > In my modification, GST_EVENT_FLUSH_START event is propagated before LOCK > and FLUSH. > Similar pattern is seen in gst_rtp_rtx_send_sink_event(). > --------------------------------- > case GST_EVENT_FLUSH_START: > gst_pad_push_event (rtx->srcpad, event); > gst_rtp_rtx_send_set_flushing (rtx, TRUE); > gst_pad_pause_task (rtx->srcpad); > return TRUE; > --------------------------------- > LOCK is used in gst_rtp_rtx_send_set_flushing() This is not similar pattern since it does not take the stream lock. It only triggers to stop.
(In reply to Yuji Kuwabara from comment #11) > Also, > > gst_multi_queue_sink_event() > --------------------------------- > case GST_EVENT_FLUSH_START: > GST_DEBUG_OBJECT (mq, "SingleQueue %d : received flush start event", > sq->id); > > res = gst_pad_push_event (sq->srcpad, event); > > gst_single_queue_flush (mq, sq, TRUE, FALSE); > goto done; > --------------------------------- > LOCK is used in gst_single_queue_flush() > > > So I think behavior of glimagesink should be reconsidered. A fast mutex is fine, the recursive stream-lock is not.
Thanks to point out. At this time, I cannot achieve a fix in the form of formal and everyone's satisfaction. I will go with this as private local modification. I hope someone manage this problem in future.
Can you share gst logs GST_DEBUG=*omx*:6,*videodec*:6 ? Are you using latest gst-omx from git ?
Source from https://gstreamer.freedesktop.org/src/gst-omx/gst-omx-1.12.2.tar.xz If it is good, I will post a log within few days.
(In reply to Yuji Kuwabara from comment #16) > Source from > https://gstreamer.freedesktop.org/src/gst-omx/gst-omx-1.12.2.tar.xz Some improvements have been made since then, so to check if it is already fixed please try: git clone git://anongit.freedesktop.org/gstreamer/gst-omx and change gst-omx/configure.ac line 51, GST_REQ=1.13.0.1 -> GST_REQ=1.12.0.1 > I will post a log within few days. Thx, and also the pipeline you use. Try to make it simple (without audio): GST_OMX_CONFIG_DIR=path_to_gstomx.conf gst-play-1.0 h264_video.mp4 --videosink=glimagesink --flags=video-audio-text Use left and right arrow to seek, space to pause.
Thanks for detailed instruction. I could installed omx-1.13 from git clone as of 2017/9/15. Other Gst components are 1.12.2 without any modification. I used option for gst-play-1.0 as --flags=0x01 (text representation was not accepted). Unfortunately, no improvement is seen for this problem. For this movie file: http://mirror.cessen.com/blender.org/peach/trailer/trailer_1080p.mov Infinite deadlock was reproduced. test_trailer.log Threads info and some back traces at this point. test_trailer.txt For this movie file: http://www.sample-videos.com/video/mp4/480/big_buck_bunny_480p_5mb.mp4 5 seconds deadlock was reproduced. test_Sample2.log Reproducing is not easy. Repeat seek back and forth quickly.
Created attachment 359880 [details] Log
Created attachment 359881 [details] Backtrace at deadlock
Created attachment 359882 [details] Log 5sec deadlock
Hi Yuji, thx a lot for the 2 videos links, the repro steps and the logs. I could not see anything obvious from the logs unfortunately. Also I could not reproduce it on Desktop using Tizonia. Guillaume that would be great if you could to try on zynqultrascaleplus ? Just use the videos in comment 18 and gst-play --flags=0x01 --videosink=glimagesink and do some seeking. Thx!
Thanks and I'm sorry to trouble you. It may be a problem specific to RaspberryPi platform. I'm planning to move to other hardware. I will check later.
(In reply to Julien Isorce from comment #22) > Guillaume that would be great if you could to try on zynqultrascaleplus ? > Just use the videos in comment 18 and gst-play --flags=0x01 > --videosink=glimagesink and do some seeking. Thx! From a quick testing it does hang on zynqultrascaleplus as well. Unfortunately I don't really have time to debug it now so the root cause may be different.
Thx Guillaume. Seems like there is a real problem. Marking as New so.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-omx/issues/17.