After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 787311 - vtenc: leaks frames when shut down before EOS
vtenc: leaks frames when shut down before EOS
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
git master
Other Linux
: Normal critical
: 1.13.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-09-05 13:45 UTC by Nicola
Modified: 2017-10-30 06:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
add logs (3.59 KB, patch)
2017-09-05 13:45 UTC, Nicola
none Details | Review
add logs (3.66 KB, patch)
2017-09-05 16:40 UTC, Nicola
rejected Details | Review
fix memleak (3.63 KB, patch)
2017-09-05 16:41 UTC, Nicola
none Details | Review
flush encoder in in transition PAUSED->READY (843 bytes, patch)
2017-09-11 21:28 UTC, Nicola
none Details | Review
fix memleak (4.26 KB, patch)
2017-09-11 21:33 UTC, Nicola
none Details | Review
drop frames on flush (2.27 KB, patch)
2017-09-11 21:47 UTC, Nicola
none Details | Review
flush encoder in transition PAUSED->READY (880 bytes, patch)
2017-09-27 09:05 UTC, Nicola
none Details | Review
fix memleak (4.31 KB, patch)
2017-09-27 09:40 UTC, Nicola
committed Details | Review
video decoder patch (987 bytes, patch)
2017-09-27 14:10 UTC, Nicola
none Details | Review
audio encoder patch (976 bytes, patch)
2017-09-27 14:33 UTC, Nicola
none Details | Review
audio decoder patch (945 bytes, patch)
2017-09-27 14:43 UTC, Nicola
none Details | Review
remove lock from gst_video_encoder_flush (1.02 KB, patch)
2017-09-28 11:18 UTC, Nicola
committed Details | Review
flush encoder in transition PAUSED->READY (985 bytes, patch)
2017-09-28 11:19 UTC, Nicola
none Details | Review
flush encoder on stop (771 bytes, patch)
2017-10-23 12:03 UTC, Nicola
committed Details | Review
flush decoder on stop (746 bytes, patch)
2017-10-23 12:04 UTC, Nicola
none Details | Review
flush decoder on stop (775 bytes, patch)
2017-10-23 16:31 UTC, Nicola
committed Details | Review

Description Nicola 2017-09-05 13:45:08 UTC
Created attachment 359186 [details] [review]
add logs

Starting stopping a pipeline using vtenc in my application I see a memory increase at each cycle. If I replace vtenc_h264 with openh264enc no memory is leaked. Analyzing the leak using istruments shows that one or more CMSampleBuffer are leaked.

I added some logs to vtenc to understand what happens, 

1) vtenc clean cur_outframes only on finish that based on my understanding should be triggerd by an EOS, so if you set the pipeline to NULL without sending an EOS as I do in my application, finish is not called
2) without calling finish gst_vtenc_stop call g_async_queue_unref (self->cur_outframes); but the queue size is > 0

I'll try to send a patch later.

With the attached patch you can see logs such as this:

0:00:14.943205000   982  0x3634138 TRACE                  vtenc vtenc.c:1240:gst_vtenc_encode_frame:<stream_vencoder> encoding frame 294
0:00:14.981574000   982  0x36765c0 DEBUG                  vtenc vtenc.c:518:gst_vtenc_stop:<stream_vencoder> stopping
0:00:14.981618000   982  0x36765c0 DEBUG                  vtenc vtenc.c:521:gst_vtenc_stop:<stream_vencoder> destroy session
0:00:14.981643000   982  0x36765c0 DEBUG                  vtenc vtenc.c:891:gst_vtenc_destroy_session:<stream_vencoder> destroy session
0:00:14.985491000   982  0x3677340 TRACE                  vtenc vtenc.c:1322:gst_vtenc_enqueue_buffer:<stream_vencoder> encoded frame 294
0:00:15.001487000   982  0x36765c0 DEBUG                  vtenc vtenc.c:539:gst_vtenc_stop:<stream_vencoder> unref async queue, size 1
Comment 1 Nicola 2017-09-05 16:40:46 UTC
Created attachment 359212 [details] [review]
add logs
Comment 2 Nicola 2017-09-05 16:41:33 UTC
Created attachment 359213 [details] [review]
fix memleak

please review, this patch seems to fix the memory leak
Comment 3 Sebastian Dröge (slomo) 2017-09-11 09:33:15 UTC
There's also ::drain() and ::reset(), which might be relevant here.

::stop() seems a bit late, and there you would like to drop all buffers. In ::drain() and ::finish() you would like to forward all remaining buffers downstream.
Comment 4 Nicola 2017-09-11 09:49:35 UTC
well if you set the pipeline to NULL whithout sending an EOS I think is expected to drop all remaining buffers, I think other elements use this same logic, but I could be wrong

::finish is already handled, 

actually:

1) if you send an EOS gst_vtenc_finish_encoding is called and the remaining buffers are forwarded downstream
2) if you set the pipeline to NULL (in my case I streaming and I can set the pipeline to NULL for example for a network error) gst_vtenc_stop is called and any remaining buffer is leaked
3) with this patch if you set the pipeline to NULL any remaining buffer is dropped and no leak happen

based on the docs here:

https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-GstVideoEncoder.html

- reset is deprecated
- flush will drop the buffers anyway
- finish is handled but is relevant only if you send an EOS

please let me know if you prefer an alternative approach
Comment 5 Sebastian Dröge (slomo) 2017-09-11 10:25:34 UTC
I think in PAUSED->READY the base class should also call ::flush() before it calls ::stop(), and then you do the clean-up like that in ::flush() (as the same behaviour also has to happen when receiving flush events).
Comment 6 Nicola 2017-09-11 12:21:04 UTC
Hi, thanks for the suggestion, I added ::flush to vtenc but this method seems not called from the base class in PAUSED->READY, for example here are the logs for this pipeline:

videotestsrc ! video/x-raw,width=1920,height=1080 ! vtenc_h264 ! fakesink

when I press CTRL+C

0:00:01.059625000  1034 0x7faa23002190 TRACE                  vtenc vtenc.c:1264:gst_vtenc_encode_frame:<vtenc_h264-0> encoding frame 30
^Chandling interrupt.
Interrotto: arresto della pipeline ...
Execution ended after 0:00:00.822902000
Impostazione della pipeline a PAUSED ...
Impostazione della pipeline a READY ...
0:00:01.098942000  1034 0x7faa21f00e90 TRACE                  vtenc vtenc.c:1346:gst_vtenc_enqueue_buffer:<vtenc_h264-0> encoded frame 17
0:00:01.099227000  1034 0x7faa21c84920 DEBUG                  vtenc vtenc.c:557:gst_vtenc_stop:<vtenc_h264-0> stopping
0:00:01.099247000  1034 0x7faa21c84920 DEBUG                  vtenc vtenc.c:560:gst_vtenc_stop:<vtenc_h264-0> destroy session

....

0:00:01.609942000  1034 0x7faa21c84920 DEBUG                  vtenc vtenc.c:578:gst_vtenc_stop:<vtenc_h264-0> unref async queue, size 12

and so 12 frames are leaked.

Looking at gstvideoencoder.c ::flush seems called only for GST_EVENT_FLUSH_STOP event, so the cleanup on stop seems still needed

am I missing something? Thanks
Comment 7 Sebastian Dröge (slomo) 2017-09-11 13:07:43 UTC
Yes I meant that it should be added to PAUSED->READY in the base class (maybe also for the decoder), right before ::stop() is called.
Comment 8 Nicola 2017-09-11 13:16:36 UTC
Ok, do you have suggestion to improve the patch or do you think that is good as is?

If you want I can send another patch that add to vtenc the flush method too
Comment 9 Sebastian Dröge (slomo) 2017-09-11 17:28:12 UTC
Your patch is not changing GstVideoEncoder. I think the right way of solving it here would be to do that, and then here in vtenc implement the flush vfunc.
Comment 10 Nicola 2017-09-11 21:28:36 UTC
Created attachment 359562 [details] [review]
flush encoder in  in transition PAUSED->READY
Comment 11 Nicola 2017-09-11 21:33:23 UTC
Created attachment 359563 [details] [review]
fix memleak

should be ok now, please review
Comment 12 Nicola 2017-09-11 21:47:04 UTC
Created attachment 359564 [details] [review]
drop frames on flush
Comment 13 Sebastian Dröge (slomo) 2017-09-26 08:21:46 UTC
Review of attachment 359562 [details] [review]:

::: gst-libs/gst/video/gstvideoencoder.c
@@ +1488,3 @@
       gboolean stopped = TRUE;
 
+      gst_video_encoder_flush (encoder);

I think this should happen *before* chaining up to the parent class' change_state
Comment 14 Sebastian Dröge (slomo) 2017-09-26 08:23:19 UTC
Review of attachment 359563 [details] [review]:

::: sys/applemedia/vtenc.c
@@ +532,3 @@
+    ret =
+        gst_video_encoder_finish_frame (GST_VIDEO_ENCODER_CAST (self),
+        outframe);

In flush() we should just drop frames and not finish them, in finish() we should do exactly the above
Comment 15 Sebastian Dröge (slomo) 2017-09-26 08:23:44 UTC
Review of attachment 359564 [details] [review]:

Ah you do that here. Please merge this and the other patch :)
Comment 16 Nicola 2017-09-27 09:05:16 UTC
Created attachment 360516 [details] [review]
flush encoder in transition PAUSED->READY
Comment 17 Nicola 2017-09-27 09:40:48 UTC
Created attachment 360517 [details] [review]
fix memleak

please review
Comment 18 Sebastian Dröge (slomo) 2017-09-27 10:10:46 UTC
Comment on attachment 360516 [details] [review]
flush encoder in transition PAUSED->READY

Should probably do the same for audio encoder/decoder and video decoder. Video decoder currently calls reset() *after* stop(), but reset() does not call flush(), and I think flush() should be called in the same place as in the encoder.

Can you provide patches for the 3 other base classes too so we can keep consistency?
Comment 19 Sebastian Dröge (slomo) 2017-09-27 10:16:52 UTC
Review of attachment 360517 [details] [review]:

Looks good
Comment 20 Nicola 2017-09-27 10:18:23 UTC
(In reply to Sebastian Dröge (slomo) from comment #18)
> Comment on attachment 360516 [details] [review] [review]
> flush encoder in transition PAUSED->READY
> 
> Should probably do the same for audio encoder/decoder and video decoder.
> Video decoder currently calls reset() *after* stop(), but reset() does not
> call flush(), and I think flush() should be called in the same place as in
> the encoder.
> 
> Can you provide patches for the 3 other base classes too so we can keep
> consistency?

OK, I'll send the requested patch this evening or tommorrow
Comment 21 Nicola 2017-09-27 14:10:44 UTC
Created attachment 360536 [details] [review]
video decoder patch

tested with jpegdec that reimplement flush, the method is called in transition PAUSED->READY
Comment 22 Nicola 2017-09-27 14:33:29 UTC
Created attachment 360540 [details] [review]
audio encoder patch

tested with vorbisenc
Comment 23 Nicola 2017-09-27 14:43:25 UTC
Created attachment 360547 [details] [review]
audio decoder patch
Comment 24 Sebastian Dröge (slomo) 2017-09-28 09:25:50 UTC
Comment on attachment 360536 [details] [review]
video decoder patch

Place is ok, but it seems weird to take the stream lock here. Is that generally done? Usually the point of flush is that you don't take the stream lock so that it can immediately unlock the streaming thread (without waiting for the streaming thread to do that by itself).
Comment 25 Sebastian Dröge (slomo) 2017-09-28 09:27:18 UTC
Same also for the other patches. In videoencoder it doesn't take the lock
Comment 26 Nicola 2017-09-28 09:48:16 UTC
(In reply to Sebastian Dröge (slomo) from comment #25)
> Same also for the other patches. In videoencoder it doesn't take the lock

Hi Sebastian,

in videoencoder the lock is taken inside gst_video_encoder_flush:

https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideoencoder.c#n405

so the behaviour should be the same.

I think videoencoder is buggy here:

https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideoencoder.c#n1082

the lock is taken two times, this works since it is a recursive lock but for encoder such as vtenc that could take the lock from another thread on flush this will deadlock (not tested but it will deadlock for sure). 

I suggest to remove the lock from gst_video_encoder_flush and explictly use the lock before calling flush, this is needed since flush can be chained with other method and the whole operations should be atomic, what do you think about?
Comment 27 Sebastian Dröge (slomo) 2017-09-28 10:12:59 UTC
I agree but nonetheless I'm confused that flush is taking the lock at all, in all the base classes.
Comment 28 Nicola 2017-09-28 10:22:39 UTC
OK, so please let me know how do you think I have to proceed to close this bug:

1) remove the lock on flush in all the base classes (need to be tested carefully)
2) remove the lock from inside gst_video_encoder_flush and modify my patch for the videoencoder to lock before calling flush as the other patchs do
Comment 29 Sebastian Dröge (slomo) 2017-09-28 10:34:17 UTC
I think 1) but I would have to carefully check all the code. Why is the mutex taken there at all right now? I don't have the time to look into this in detail right now.

2) would be the minimum.
Comment 30 Nicola 2017-09-28 11:18:26 UTC
Created attachment 360593 [details] [review]
remove lock from gst_video_encoder_flush
Comment 31 Nicola 2017-09-28 11:19:20 UTC
Created attachment 360594 [details] [review]
flush encoder in transition PAUSED->READY
Comment 32 Nicola 2017-10-03 10:14:31 UTC
please note that the decoder patch fix a memory leak in vtdec too.

The test case is the same if you stop the pipeline before eos flush method in vtdec is not called without this patch and gst_vtdec_push_frames_if_needed is not called on stop

https://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/sys/applemedia/vtdec.c#n801
Comment 33 Nicola 2017-10-15 18:26:38 UTC
is there anything other that I can do here?
Comment 34 Sebastian Dröge (slomo) 2017-10-16 07:45:33 UTC
Not really, I or someone else just has to find some time to go through all it in detail and do further testing to see if it breaks something else.
Comment 35 Matthew Waters (ystreet00) 2017-10-19 14:56:54 UTC
base:

commit 877664a414a466cfcc71c79d28c470722408c9a7 (HEAD -> master)
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Thu Sep 28 13:17:05 2017 +0200

    videoencoder: flush encoder in transition PAUSED->READY
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

commit 6627dd3ae3797eb551ed2ac9980d6a19b060ba7c
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Thu Sep 28 13:12:58 2017 +0200

    videoencoder: remove the lock from gst_video_encoder_flush
    
    The lock is already taken before calling the flush method and can lead to
    deadlock for some encoders that need to take the same lock from another
    thread while flushing
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

commit 6e9edc3031935ce8d6640d9774a7c093d20d4366
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Wed Sep 27 16:08:10 2017 +0200

    videodecoder: flush decoder in transition PAUSED->READY
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

commit 2dcdd13512e788797d03f9990994eecdd339ca0e
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Wed Sep 27 16:32:13 2017 +0200

    audioencoder: flush encoder in transition PAUSED->READY
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

commit e7cf4c058ddd466a797623b5ddf162c263d07059
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Wed Sep 27 16:41:51 2017 +0200

    audiodecoder: flush decoder in transition PAUSED->READY
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

bad:

commit c808b4dd115498e3b9f62b0cec32a55a8cf0f84d
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Wed Sep 27 11:37:26 2017 +0200

    vtenc: fix memory leak
    
    finish encoding and clean buffers queue on flush. This avoid a memory leak
    if the element shuts down before EOS
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311
Comment 36 Nicolas Dufresne (ndufresne) 2017-10-19 21:50:18 UTC
(In reply to Matthew Waters (ystreet00) from comment #35)
> commit e7cf4c058ddd466a797623b5ddf162c263d07059
> Author: Nicola Murino <nicola.murino@gmail.com>
> Date:   Wed Sep 27 16:41:51 2017 +0200
> 
>     audiodecoder: flush decoder in transition PAUSED->READY
>     
>     https://bugzilla.gnome.org/show_bug.cgi?id=787311

That patch is problematic in GstV4l2VideoDecoder. First, we where already flushing, so it's a duplicated. But then, the actual flush implementation brings back the decoder to it's active state, to be ready to receive future buffers. That's is a terrible idea, since it is a lot of IOCTL to requeue all the flushed buffers into capture queue. It's makes tracing of the driver interaction quite horrible and the quick STREAMOFF/STREAMON sequence caused by the fact that we'll shut it down again in _stop() is known to trigger race condition in some buggy firmware.
Comment 37 Nicolas Dufresne (ndufresne) 2017-10-19 21:52:46 UTC
(In reply to Nicola from comment #28)
> OK, so please let me know how do you think I have to proceed to close this
> bug:
> 
> 1) remove the lock on flush in all the base classes (need to be tested
> carefully)
> 2) remove the lock from inside gst_video_encoder_flush and modify my patch
> for the videoencoder to lock before calling flush as the other patchs do

The base class stream lock is recursive, so in theory, this should have no effect.
Comment 38 Nicolas Dufresne (ndufresne) 2017-10-19 21:55:03 UTC
Maybe what you wanted is an unlock() virtual method ?
Comment 39 Nicola 2017-10-19 22:32:55 UTC
(In reply to Nicolas Dufresne (stormer) from comment #37)
> (In reply to Nicola from comment #28)
> > OK, so please let me know how do you think I have to proceed to close this
> > bug:
> > 
> > 1) remove the lock on flush in all the base classes (need to be tested
> > carefully)
> > 2) remove the lock from inside gst_video_encoder_flush and modify my patch
> > for the videoencoder to lock before calling flush as the other patchs do
> 
> The base class stream lock is recursive, so in theory, this should have no
> effect.

if the lock is acquired more than one time vtenc will deadlock, please see the comment here:

https://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/sys/applemedia/vtenc.c#n513
Comment 40 Nicolas Dufresne (ndufresne) 2017-10-20 07:36:44 UTC
I see, that's because the subclass can't unlock it. This lock is pretty wrong, it should go away in the next base class generation.
Comment 41 Edward Hervey 2017-10-21 12:36:06 UTC
109 failures in gst-validate

https://ci.gstreamer.net/job/GStreamer-master-validate/5503/testReport/

It also makes other decoders (like gst-ffmpeg) fail

Can we revert this ?
Comment 42 Matthew Waters (ystreet00) 2017-10-21 14:00:42 UTC
commit a240880664d13b10fe46eee3f55cb7facf97090f
Author: Matthew Waters <matthew@centricular.com>
Date:   Sun Oct 22 01:00:10 2017 +1100

    Revert "videoencoder: flush encoder in transition PAUSED->READY"
    
    This reverts commit 877664a414a466cfcc71c79d28c470722408c9a7.

commit a81b5a95ba81c8ae0b4093045f84ca750f1eb4d2
Author: Matthew Waters <matthew@centricular.com>
Date:   Sun Oct 22 01:00:08 2017 +1100

    Revert "videodecoder: flush decoder in transition PAUSED->READY"
    
    This reverts commit 6e9edc3031935ce8d6640d9774a7c093d20d4366.

commit 06aba17d19c35470cd652fb880b0527a96dea368
Author: Matthew Waters <matthew@centricular.com>
Date:   Sun Oct 22 01:00:06 2017 +1100

    Revert "audioencoder: flush encoder in transition PAUSED->READY"
    
    This reverts commit 2dcdd13512e788797d03f9990994eecdd339ca0e.

commit b8369ba20d36b52d6f738970dc86b5265fb10ba3
Author: Matthew Waters <matthew@centricular.com>
Date:   Sun Oct 22 01:00:03 2017 +1100

    Revert "audiodecoder: flush decoder in transition PAUSED->READY"
    
    This reverts commit e7cf4c058ddd466a797623b5ddf162c263d07059.
Comment 43 Nicola 2017-10-21 20:58:23 UTC
I can reproduce deadlocks (using vtdec) and segfaults (using avdec_h264) in my app too. 

The encoding patch seems to cause no issues to my app,

anyway, can we flush on stop then? I mean flush only vtenc on stop.

A similar patch is needed for vtdec too that actually leak buffers is stopped before eos
Comment 44 Nicola 2017-10-23 12:03:35 UTC
Created attachment 362095 [details] [review]
flush encoder on stop

fix memory leak flusing on stop
Comment 45 Nicola 2017-10-23 12:04:07 UTC
Created attachment 362096 [details] [review]
flush decoder on stop

fix memory leak flusing decoder on stop
Comment 46 Nicola 2017-10-23 16:31:22 UTC
Created attachment 362113 [details] [review]
flush decoder on stop

the actual flush implementation does not wait for the async frames so we call directly gst_vtdec_push_frames_if_needed passing

1) drain = TRUE to wait fo all async frames
2) flush = TRUE, to discard all the frames in the async queue
Comment 47 Matthew Waters (ystreet00) 2017-10-30 06:20:24 UTC
commit 6165e07b62bcaffc946d897bbddc724008bcf3cf
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Mon Oct 23 10:42:51 2017 +0200

    vtdec: flush decoder on stop
    
    fix a memory leak if the decoder shut down before EOS
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311

commit 889e73567549fc7ecd9d999224576735546f1672
Author: Nicola Murino <nicola.murino@gmail.com>
Date:   Mon Oct 23 10:40:43 2017 +0200

    vtenc: flush encoder on stop
    
    https://bugzilla.gnome.org/show_bug.cgi?id=787311