After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 759043 - gst-omx: Timeout waiting for egl_render port 221 buffer release triggers subsequent "Insufficient resources" errors
gst-omx: Timeout waiting for egl_render port 221 buffer release triggers subs...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-omx
git master
Other Linux
: Normal normal
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2015-12-04 19:17 UTC by Enrique Ocaña González
Modified: 2017-12-14 09:25 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Our pipeline (36.70 KB, application/pdf)
2015-12-04 19:17 UTC, Enrique Ocaña González
  Details
Log extract with some "###" comments on selected lines (49.85 KB, text/plain)
2015-12-04 19:19 UTC, Enrique Ocaña González
  Details
Proposed patch (1.13 KB, patch)
2015-12-04 20:09 UTC, Enrique Ocaña González
rejected Details | Review
omx{audio,video}{dec,enc}: sequentially disable ports because buffers are not shared (10.11 KB, patch)
2017-07-20 09:21 UTC, Julien Isorce
none Details | Review
Test case results (27.49 KB, text/plain)
2017-08-03 12:39 UTC, Enrique Ocaña González
  Details
New test case results (3.06 MB, text/plain)
2017-08-22 11:37 UTC, Enrique Ocaña González
  Details
gstomxvideodec: fix framerate overflow (4.38 KB, patch)
2017-12-12 14:51 UTC, Julien Isorce
none Details | Review
gstomxvideodec: fix framerate overflow (4.35 KB, patch)
2017-12-12 17:16 UTC, Julien Isorce
committed Details | Review
omxvideodec: ignore very little variations of the framerate (2.61 KB, patch)
2017-12-12 17:37 UTC, Julien Isorce
none Details | Review
omxvideodec: ignore very little variations of the framerate (2.81 KB, patch)
2017-12-13 10:06 UTC, Julien Isorce
committed Details | Review

Description Enrique Ocaña González 2015-12-04 19:17:55 UTC
Created attachment 316786 [details]
Our pipeline

We're working on a Raspberry Pi 2. We have an OMXH264Dec element configured to output EGL buffers (see attached pipeline), which means that internally there are two OpenMAX components playing together, video_decode and egl_render[1]. They're not tunneled, according to the values shown by gdb.

During the execution of one of our tests (VideoChangeRate from YouTube 2015, on WebKitForWayland), a call to gst_omx_video_dec_set_format()[2] is performed. That call performs several flushes on the OpenMAX component ports and, at some point [3], it disables the egl_render output port (221) and waits for the buffers currently held by the port to be released. Examining the internal GstOMX data structure[5] in gdb, there's still one buffer owned by the port (pending). The OpenMAX IL spec [4] says that the driver should call EmptyBufferDone or FillBufferDone for each buffer it has to return to us, then we free the buffer and the port disabling finishes.

The problem is that we're not receiving the Empty/FillBufferDone calls, so the call to gst_omx_port_wait_buffers_released() on the out_port (ie: the egl_render 221 port) times out (even if I set to 10 seconds). This causes an early exit of the set_format function, prevents the deallocation of the buffers in the video_decode OMX component and the re-enabling of the OMX ports. This leads to subsequent "Insufficient resources" errors in the involved components and leaves the pipeline completely unusable (see attached log fragment).

No matter how deep I investgate, I don't understand why this is happening. However, I've observed that just by preventing the early return after the waiting for buffers failure, the issue is fixed.


[1] http://home.nouwen.name/RaspberryPi/documentation/ilcomponents/index.html
[2] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomxvideodec.c#n1819
[3] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomxvideodec.c#n1902
[4] https://www.khronos.org/registry/omxil/specs/OpenMAX_IL_1_1_2_Specification.pdf (diagram on top of page 131)
[5] http://cgit.freedesktop.org/gstreamer/gst-omx/tree/omx/gstomx.h#n207
Comment 1 Enrique Ocaña González 2015-12-04 19:19:41 UTC
Created attachment 316787 [details]
Log  extract with some "###" comments on selected lines
Comment 2 Enrique Ocaña González 2015-12-04 20:09:46 UTC
Created attachment 316792 [details] [review]
Proposed patch

This patch fixes the problem for me. It's the best I can contribute without knowing the exact cause of the problem.

Suggestions are welcome.
Comment 3 Julien Isorce 2015-12-04 20:35:55 UTC
Hi Enrique, FWIW, I remember omx won't trigger Empty/FillBufferDone until input pad has been feed with enough data.
(also just a remark, video_decode and egl_render should be tunneled, something is wrong if you can see the opposite)

Also worth to see if one of the last patch here https://git.collabora.com/cgit/raspberry-pi/gst-omx.git/log/?h=rpi-1.0.0.1 applies to you.

(Also worth to mention, if you are still using a built-in videosink, don't forget to handle the drain query, see https://git.collabora.com/cgit/raspberry-pi/WebKit.git/tree/Source/WebCore/platform/graphics/gstreamer/VideoSinkGStreamer.cpp?id=052e962a770e7c2ee13ae64c858c67d2cc15c0cb#n629)
Comment 4 Sebastian Dröge (slomo) 2015-12-05 08:33:32 UTC
Changing the timeout to something higher might make sense though. The buffers might be used downstream still, and will only be released once downstream is done with it. Which might take a long time, especially if the pipeline is currently in PAUSED.
Comment 5 Julien Isorce 2017-07-10 19:40:06 UTC
Review of attachment 316792 [details] [review]:

::: omx/gstomxvideodec.c
@@ +1910,2 @@
         return FALSE;
+#endif

Can you do something like: timeout = use_eglimage ? N : 1;
It is expected to take much time when using eglimages. Maybe N=2 or 3 will be ok.
If you need N > 4 then I suspect that something else is wrong.
Comment 6 Julien Isorce 2017-07-20 09:09:02 UTC
Forget about my last comment (#5), I re-read what you said: "times out (even if I set to 10 seconds)". Also it rings some bells and I encoutered a similar issue recently for which I have a patch pending for some times. Let me attach it.
Comment 7 Julien Isorce 2017-07-20 09:21:31 UTC
Created attachment 356020 [details] [review]
omx{audio,video}{dec,enc}: sequentially disable ports because buffers are not shared

Please try attached patch which makes change the disable ports from parallel to sequential. The former is only required for buffer sharing. For a decoder the buffer cannot be shared between input and output port  I made this patch to be more compliante regarding the specs. And because now in 1.2.0 spec the second disable will even return an error directly (except for buffer sharing). So it is required to be done sequentially in IL 1.2.0.
But I think in your case the parallel disable is confusing the underlying omx platform when it excepts a sequential disable. This could explain the bug you encountered. Any chance you can try ?
Comment 8 Enrique Ocaña González 2017-08-03 12:39:56 UTC
Created attachment 356844 [details]
Test case results

Hi, Julien, thanks a lot for the effort of caring about this bug and for taking the time to write your patch.

The patch didn't apply cleanly, so I had to reapply the changes by hand. Then I tested again this issue with the patch I originally submitted, without it, and without it plus Julien's patch. It seems that the issues persist with Julien's patch. See the specific errors in the attached report.

I don't remember about the exact situation that triggered the problem originally, but this time I've done several seek operations (without problems) and then just let the video finish by itself (problematic).

I'm using GStreamer 1.10.4 (including gst-omx 1.10.4).
Comment 9 Julien Isorce 2017-08-16 08:35:34 UTC
Thx for checking. 

Could you explain why in the attached "Test case results" you said to seek near the end of the video, i.e. 4.08 (the video has a duration of 4:14) but then in log the error happens around 0.36 ?

Also note that the lines for which your patch applies, should not be triggered on EOS because it is inside set_format and guarded by 'is_format_changed'. Could you attach full log so that we can see why set_format is called at the end ?

Is it possible to run the test you mention in comment 0 ? (VideoChangeRate from https://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2016.html)
Comment 10 Enrique Ocaña González 2017-08-22 11:37:40 UTC
Created attachment 358136 [details]
New test case results

First of all, the original error reported by the bug isn't reproducible anymore on the VideoChangeRate test from YouTube 2015[1] nor YouTube 2016[2] MSE tests using gst-omx 1.10.4. Something might have changed in our WebKit implementation or in gst-omx since the bug was reported.

However, the patch I originally proposed is still relevant. Without it, the call to gst_omx_video_dec_set_format() still fails in the middle of a video (few times) or at the end of it (more frequently).

> Could you explain why in the attached "Test case results" you said to seek near > the end of the video, i.e. 4.08 (the video has a duration of 4:14) but then in > log the error happens around 0.36 ?

0.36 is the time the application has been running. I seeked to 4.08 as soon as I started the test, so didn't wait 4 minutes for the player to reach that position by itself.

I'm attaching a new log, where I seeked to 4:07 (movie time) at ~27 GStreamer run time. Selected log lines follow:

0:00:27.120713323   569 0x71102380 DEBUG           videodecoder gstvideodecoder.c:1396:gst_video_decoder_sink_event:<omxh264dec-omxh264dec0> received event 2563, flush-start
0:00:27.272474729   569  0x1872430 DEBUG           videodecoder gstvideodecoder.c:849:gst_video_decoder_push_event:<omxh264dec-omxh264dec0> segment time segment start=0:04:07.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x01, time=0:04:07.000000000, base=0:00:00.000000000, position 0:04:07.000000000, duration 99:99:99.999999999

Then, at some point there's a change in the caps (only framerate) which causes an error:

0:00:28.375073270   569  0x174c090 DEBUG           videodecoder gstvideodecoder.c:703:gst_video_decoder_setcaps:<omxh264dec-omxh264dec0> Checking if caps changed old video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, level=(string)4, profile=(string)high, width=(int)1920, height=(int)1012, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)50000/1677, parsed=(boolean)true new video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, level=(string)4, profile=(string)high, width=(int)1920, height=(int)1012, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)89/3, parsed=(boolean)true
0:00:32.633606862   569  0x174c090 ERROR                    omx gstomx.c:1978:gst_omx_port_wait_buffers_released_unlocked:<omxh264dec-omxh264dec0> Timeout waiting for egl_render port 221 to release all buffers
0:00:32.633974987   569  0x174c090 WARN            videodecoder gstvideodecoder.c:745:gst_video_decoder_setcaps:<omxh264dec-omxh264dec0> Subclass refused caps

And the errors reappear since then, putting the pipeline in an error state and probably causing an EOS:

0:00:32.666314102   569  0x174c090 DEBUG           videodecoder gstvideodecoder.c:1396:gst_video_decoder_sink_event:<omxh264dec-omxh264dec0> received event 28174, eos

Suprisingly, a new segment is sent to the pipeline after that:

0:00:33.463679154   569 0x6ce04ef0 DEBUG           videodecoder gstvideodecoder.c:849:gst_video_decoder_push_event:<omxh264dec-omxh264dec1> segment time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999

But the pipeline is already in an error state.

[1] http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2015.html?enablewebm=false&command=run&tests=37
[2] http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2016.html?enablewebm=false&command=run&tests=37

And nothing of this happens with the patch I originally proposed.
Comment 11 Julien Isorce 2017-12-12 14:51:52 UTC
Created attachment 365430 [details] [review]
gstomxvideodec: fix framerate overflow

In the attached logs we can see that the framerate is set to 50000/1677 which leads to buffer overflow when computing the framerate in a q16 format because fps_n is a gint. See commit message.
Comment 12 Nicolas Dufresne (ndufresne) 2017-12-12 15:34:37 UTC
Review of attachment 365430 [details] [review]:

::: omx/gstomxvideo.c
@@ +208,3 @@
+    return 0;
+
+  return (((OMX_U32) info->fps_n) << 16) / (OMX_U32) (info->fps_d);

Wouln't it be even safe to expand to 64bit and decide the 64bit value to later truncate it to 32bit ?
Comment 13 Nicolas Dufresne (ndufresne) 2017-12-12 15:36:10 UTC
Julien, is that patch a replacement or just an extra fix ?
Comment 14 Julien Isorce 2017-12-12 16:09:43 UTC
(In reply to Nicolas Dufresne (stormer) from comment #13)
> Julien, is that patch a replacement or just an extra fix ?

I believe the exact fix for the timeout mentioned in the bug title is https://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=68518acb53d4808cf8352ed4fd1cb8b331553559 . Indeed this should allow to not timeout during the dynamic format change when using appsink like in webkit. But need confirmation from Enrique.

Along the way we found that the dynamic format change should not happen. Indeed the resolution does not change, only the framerate changes but very slightly, i.e. from 50000/1677=29.81 to 89/3=29.66 so a "percentage change" of less than 1% (i.e. 100*(29.81-29.66)/29.66 = 0.50 < 1 ). So the change of the framerate should be ignored really.

Even worst the framerate would overflow in the first place so let's fix that first, see attached patch. But I am going to submit a second patch to make the comparison of the framerate more permissive, i.e. do not consider the framerate to have changed if the "percentage change" is less than 1%. This future patch will just avoid an unnecessary renegotiation.
Comment 15 Julien Isorce 2017-12-12 17:16:01 UTC
Created attachment 365448 [details] [review]
gstomxvideodec: fix framerate overflow
Comment 16 Nicolas Dufresne (ndufresne) 2017-12-12 17:36:07 UTC
Review of attachment 365448 [details] [review]:

Good.
Comment 17 Julien Isorce 2017-12-12 17:37:54 UTC
Created attachment 365453 [details] [review]
omxvideodec: ignore very little variations of the framerate
Comment 18 Julien Isorce 2017-12-13 10:06:02 UTC
Created attachment 365477 [details] [review]
omxvideodec: ignore very little variations of the framerate
Comment 19 Julien Isorce 2017-12-13 10:13:03 UTC
Comment on attachment 365448 [details] [review]
gstomxvideodec: fix framerate overflow

commit 0d2d695eff56655e4f7cb50c59fe3a6456f17e11
Author: Julien Isorce <jisorce@oblong.com>
Date:   Tue Dec 12 14:45:30 2017 +0000

    gstomxvideodec: fix framerate overflow
    
    Some live streams can set the framerate to 50000/1677 (=29.81).
    
    GstVideoInfo.fps_n << 16 is wrong if the fps_n is 50000
    (i.e. greater than 32767).
    
    https://bugzilla.gnome.org/show_bug.cgi?id=759043
Comment 20 Julien Isorce 2017-12-14 09:22:26 UTC
Comment on attachment 365477 [details] [review]
omxvideodec: ignore very little variations of the framerate

commit 681087819372e57157a13fbf3ac813a3025a2a29
Author: Julien Isorce <jisorce@oblong.com>
Date:   Tue Dec 12 17:30:27 2017 +0000

    omxvideodec: ignore very little variations of the framerate
    
    If less than 1%.
    
    The dynamic format change should not happen when the
    resolution does not change and when only the framerate
    changes but very slightly, i.e. from 50000/1677=29.81
    to 89/3=29.66 so a "percentage change" of less than 1%
    (i.e. 100*(29.81-29.66)/29.66 = 0.50 < 1 ). In that case
    just ignore it to avoid unnecessary renegotiation.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=759043