GNOME Bugzilla – Bug 759043
gst-omx: Timeout waiting for egl_render port 221 buffer release triggers subsequent "Insufficient resources" errors
Last modified: 2017-12-14 09:25:26 UTC
Created attachment 316786 [details] Our pipeline We're working on a Raspberry Pi 2. We have an OMXH264Dec element configured to output EGL buffers (see attached pipeline), which means that internally there are two OpenMAX components playing together, video_decode and egl_render[1]. They're not tunneled, according to the values shown by gdb. During the execution of one of our tests (VideoChangeRate from YouTube 2015, on WebKitForWayland), a call to gst_omx_video_dec_set_format()[2] is performed. That call performs several flushes on the OpenMAX component ports and, at some point [3], it disables the egl_render output port (221) and waits for the buffers currently held by the port to be released. Examining the internal GstOMX data structure[5] in gdb, there's still one buffer owned by the port (pending). The OpenMAX IL spec [4] says that the driver should call EmptyBufferDone or FillBufferDone for each buffer it has to return to us, then we free the buffer and the port disabling finishes. The problem is that we're not receiving the Empty/FillBufferDone calls, so the call to gst_omx_port_wait_buffers_released() on the out_port (ie: the egl_render 221 port) times out (even if I set to 10 seconds). This causes an early exit of the set_format function, prevents the deallocation of the buffers in the video_decode OMX component and the re-enabling of the OMX ports. This leads to subsequent "Insufficient resources" errors in the involved components and leaves the pipeline completely unusable (see attached log fragment). No matter how deep I investgate, I don't understand why this is happening. However, I've observed that just by preventing the early return after the waiting for buffers failure, the issue is fixed. [1] http://home.nouwen.name/RaspberryPi/documentation/ilcomponents/index.html [2] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomxvideodec.c#n1819 [3] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomxvideodec.c#n1902 [4] https://www.khronos.org/registry/omxil/specs/OpenMAX_IL_1_1_2_Specification.pdf (diagram on top of page 131) [5] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomx.h#n207
Created attachment 316787 [details] Log extract with some "###" comments on selected lines
Created attachment 316792 [details] [review] Proposed patch This patch fixes the problem for me. It's the best I can contribute without knowing the exact cause of the problem. Suggestions are welcome.
Hi Enrique, FWIW, I remember omx won't trigger Empty/FillBufferDone until input pad has been feed with enough data. (also just a remark, video_decode and egl_render should be tunneled, something is wrong if you can see the opposite) Also worth to see if one of the last patch here https://git.collabora.com/cgit/raspberry-pi/gst-omx.git/log/?h=rpi-1.0.0.1 applies to you. (Also worth to mention, if you are still using a built-in videosink, don't forget to handle the drain query, see https://git.collabora.com/cgit/raspberry-pi/WebKit.git/tree/Source/WebCore/platform/graphics/gstreamer/VideoSinkGStreamer.cpp?id=052e962a770e7c2ee13ae64c858c67d2cc15c0cb#n629)
Changing the timeout to something higher might make sense though. The buffers might be used downstream still, and will only be released once downstream is done with it. Which might take a long time, especially if the pipeline is currently in PAUSED.
Review of attachment 316792 [details] [review]: ::: omx/gstomxvideodec.c @@ +1910,2 @@ return FALSE; +#endif Can you do something like: timeout = use_eglimage ? N : 1; It is expected to take much time when using eglimages. Maybe N=2 or 3 will be ok. If you need N > 4 then I suspect that something else is wrong.
Forget about my last comment (#5), I re-read what you said: "times out (even if I set to 10 seconds)". Also it rings some bells and I encoutered a similar issue recently for which I have a patch pending for some times. Let me attach it.
Created attachment 356020 [details] [review] omx{audio,video}{dec,enc}: sequentially disable ports because buffers are not shared Please try attached patch which makes change the disable ports from parallel to sequential. The former is only required for buffer sharing. For a decoder the buffer cannot be shared between input and output port I made this patch to be more compliante regarding the specs. And because now in 1.2.0 spec the second disable will even return an error directly (except for buffer sharing). So it is required to be done sequentially in IL 1.2.0. But I think in your case the parallel disable is confusing the underlying omx platform when it excepts a sequential disable. This could explain the bug you encountered. Any chance you can try ?
Created attachment 356844 [details] Test case results Hi, Julien, thanks a lot for the effort of caring about this bug and for taking the time to write your patch. The patch didn't apply cleanly, so I had to reapply the changes by hand. Then I tested again this issue with the patch I originally submitted, without it, and without it plus Julien's patch. It seems that the issues persist with Julien's patch. See the specific errors in the attached report. I don't remember about the exact situation that triggered the problem originally, but this time I've done several seek operations (without problems) and then just let the video finish by itself (problematic). I'm using GStreamer 1.10.4 (including gst-omx 1.10.4).
Thx for checking. Could you explain why in the attached "Test case results" you said to seek near the end of the video, i.e. 4.08 (the video has a duration of 4:14) but then in log the error happens around 0.36 ? Also note that the lines for which your patch applies, should not be triggered on EOS because it is inside set_format and guarded by 'is_format_changed'. Could you attach full log so that we can see why set_format is called at the end ? Is it possible to run the test you mention in comment 0 ? (VideoChangeRate from https://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2016.html)
Created attachment 358136 [details] New test case results First of all, the original error reported by the bug isn't reproducible anymore on the VideoChangeRate test from YouTube 2015[1] nor YouTube 2016[2] MSE tests using gst-omx 1.10.4. Something might have changed in our WebKit implementation or in gst-omx since the bug was reported. However, the patch I originally proposed is still relevant. Without it, the call to gst_omx_video_dec_set_format() still fails in the middle of a video (few times) or at the end of it (more frequently). > Could you explain why in the attached "Test case results" you said to seek near > the end of the video, i.e. 4.08 (the video has a duration of 4:14) but then in > log the error happens around 0.36 ? 0.36 is the time the application has been running. I seeked to 4.08 as soon as I started the test, so didn't wait 4 minutes for the player to reach that position by itself. I'm attaching a new log, where I seeked to 4:07 (movie time) at ~27 GStreamer run time. Selected log lines follow: 0:00:27.120713323 569 0x71102380 DEBUG videodecoder gstvideodecoder.c:1396:gst_video_decoder_sink_event:<omxh264dec-omxh264dec0> received event 2563, flush-start 0:00:27.272474729 569 0x1872430 DEBUG videodecoder gstvideodecoder.c:849:gst_video_decoder_push_event:<omxh264dec-omxh264dec0> segment time segment start=0:04:07.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x01, time=0:04:07.000000000, base=0:00:00.000000000, position 0:04:07.000000000, duration 99:99:99.999999999 Then, at some point there's a change in the caps (only framerate) which causes an error: 0:00:28.375073270 569 0x174c090 DEBUG videodecoder gstvideodecoder.c:703:gst_video_decoder_setcaps:<omxh264dec-omxh264dec0> Checking if caps changed old video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, level=(string)4, profile=(string)high, width=(int)1920, height=(int)1012, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)50000/1677, parsed=(boolean)true new video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, level=(string)4, profile=(string)high, width=(int)1920, height=(int)1012, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)89/3, parsed=(boolean)true 0:00:32.633606862 569 0x174c090 ERROR omx gstomx.c:1978:gst_omx_port_wait_buffers_released_unlocked:<omxh264dec-omxh264dec0> Timeout waiting for egl_render port 221 to release all buffers 0:00:32.633974987 569 0x174c090 WARN videodecoder gstvideodecoder.c:745:gst_video_decoder_setcaps:<omxh264dec-omxh264dec0> Subclass refused caps And the errors reappear since then, putting the pipeline in an error state and probably causing an EOS: 0:00:32.666314102 569 0x174c090 DEBUG videodecoder gstvideodecoder.c:1396:gst_video_decoder_sink_event:<omxh264dec-omxh264dec0> received event 28174, eos Suprisingly, a new segment is sent to the pipeline after that: 0:00:33.463679154 569 0x6ce04ef0 DEBUG videodecoder gstvideodecoder.c:849:gst_video_decoder_push_event:<omxh264dec-omxh264dec1> segment time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999 But the pipeline is already in an error state. [1] http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2015.html?enablewebm=false&command=run&tests=37 [2] http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2016.html?enablewebm=false&command=run&tests=37 And nothing of this happens with the patch I originally proposed.
Created attachment 365430 [details] [review] gstomxvideodec: fix framerate overflow In the attached logs we can see that the framerate is set to 50000/1677 which leads to buffer overflow when computing the framerate in a q16 format because fps_n is a gint. See commit message.
Review of attachment 365430 [details] [review]: ::: omx/gstomxvideo.c @@ +208,3 @@ + return 0; + + return (((OMX_U32) info->fps_n) << 16) / (OMX_U32) (info->fps_d); Wouln't it be even safe to expand to 64bit and decide the 64bit value to later truncate it to 32bit ?
Julien, is that patch a replacement or just an extra fix ?
(In reply to Nicolas Dufresne (stormer) from comment #13) > Julien, is that patch a replacement or just an extra fix ? I believe the exact fix for the timeout mentioned in the bug title is https://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=68518acb53d4808cf8352ed4fd1cb8b331553559 . Indeed this should allow to not timeout during the dynamic format change when using appsink like in webkit. But need confirmation from Enrique. Along the way we found that the dynamic format change should not happen. Indeed the resolution does not change, only the framerate changes but very slightly, i.e. from 50000/1677=29.81 to 89/3=29.66 so a "percentage change" of less than 1% (i.e. 100*(29.81-29.66)/29.66 = 0.50 < 1 ). So the change of the framerate should be ignored really. Even worst the framerate would overflow in the first place so let's fix that first, see attached patch. But I am going to submit a second patch to make the comparison of the framerate more permissive, i.e. do not consider the framerate to have changed if the "percentage change" is less than 1%. This future patch will just avoid an unnecessary renegotiation.
Created attachment 365448 [details] [review] gstomxvideodec: fix framerate overflow
Review of attachment 365448 [details] [review]: Good.
Created attachment 365453 [details] [review] omxvideodec: ignore very little variations of the framerate
Created attachment 365477 [details] [review] omxvideodec: ignore very little variations of the framerate
Comment on attachment 365448 [details] [review] gstomxvideodec: fix framerate overflow commit 0d2d695eff56655e4f7cb50c59fe3a6456f17e11 Author: Julien Isorce <jisorce@oblong.com> Date: Tue Dec 12 14:45:30 2017 +0000 gstomxvideodec: fix framerate overflow Some live streams can set the framerate to 50000/1677 (=29.81). GstVideoInfo.fps_n << 16 is wrong if the fps_n is 50000 (i.e. greater than 32767). https://bugzilla.gnome.org/show_bug.cgi?id=759043
Comment on attachment 365477 [details] [review] omxvideodec: ignore very little variations of the framerate commit 681087819372e57157a13fbf3ac813a3025a2a29 Author: Julien Isorce <jisorce@oblong.com> Date: Tue Dec 12 17:30:27 2017 +0000 omxvideodec: ignore very little variations of the framerate If less than 1%. The dynamic format change should not happen when the resolution does not change and when only the framerate changes but very slightly, i.e. from 50000/1677=29.81 to 89/3=29.66 so a "percentage change" of less than 1% (i.e. 100*(29.81-29.66)/29.66 = 0.50 < 1 ). In that case just ignore it to avoid unnecessary renegotiation. https://bugzilla.gnome.org/show_bug.cgi?id=759043