GNOME Bugzilla – Bug 787311
vtenc: leaks frames when shut down before EOS
Last modified: 2017-10-30 06:20:51 UTC
Created attachment 359186 [details] [review] add logs Starting stopping a pipeline using vtenc in my application I see a memory increase at each cycle. If I replace vtenc_h264 with openh264enc no memory is leaked. Analyzing the leak using istruments shows that one or more CMSampleBuffer are leaked. I added some logs to vtenc to understand what happens, 1) vtenc clean cur_outframes only on finish that based on my understanding should be triggerd by an EOS, so if you set the pipeline to NULL without sending an EOS as I do in my application, finish is not called 2) without calling finish gst_vtenc_stop call g_async_queue_unref (self->cur_outframes); but the queue size is > 0 I'll try to send a patch later. With the attached patch you can see logs such as this: 0:00:14.943205000 982 0x3634138 TRACE vtenc vtenc.c:1240:gst_vtenc_encode_frame:<stream_vencoder> encoding frame 294 0:00:14.981574000 982 0x36765c0 DEBUG vtenc vtenc.c:518:gst_vtenc_stop:<stream_vencoder> stopping 0:00:14.981618000 982 0x36765c0 DEBUG vtenc vtenc.c:521:gst_vtenc_stop:<stream_vencoder> destroy session 0:00:14.981643000 982 0x36765c0 DEBUG vtenc vtenc.c:891:gst_vtenc_destroy_session:<stream_vencoder> destroy session 0:00:14.985491000 982 0x3677340 TRACE vtenc vtenc.c:1322:gst_vtenc_enqueue_buffer:<stream_vencoder> encoded frame 294 0:00:15.001487000 982 0x36765c0 DEBUG vtenc vtenc.c:539:gst_vtenc_stop:<stream_vencoder> unref async queue, size 1
Created attachment 359212 [details] [review] add logs
Created attachment 359213 [details] [review] fix memleak please review, this patch seems to fix the memory leak
There's also ::drain() and ::reset(), which might be relevant here. ::stop() seems a bit late, and there you would like to drop all buffers. In ::drain() and ::finish() you would like to forward all remaining buffers downstream.
well if you set the pipeline to NULL whithout sending an EOS I think is expected to drop all remaining buffers, I think other elements use this same logic, but I could be wrong ::finish is already handled, actually: 1) if you send an EOS gst_vtenc_finish_encoding is called and the remaining buffers are forwarded downstream 2) if you set the pipeline to NULL (in my case I streaming and I can set the pipeline to NULL for example for a network error) gst_vtenc_stop is called and any remaining buffer is leaked 3) with this patch if you set the pipeline to NULL any remaining buffer is dropped and no leak happen based on the docs here: https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-GstVideoEncoder.html - reset is deprecated - flush will drop the buffers anyway - finish is handled but is relevant only if you send an EOS please let me know if you prefer an alternative approach
I think in PAUSED->READY the base class should also call ::flush() before it calls ::stop(), and then you do the clean-up like that in ::flush() (as the same behaviour also has to happen when receiving flush events).
Hi, thanks for the suggestion, I added ::flush to vtenc but this method seems not called from the base class in PAUSED->READY, for example here are the logs for this pipeline: videotestsrc ! video/x-raw,width=1920,height=1080 ! vtenc_h264 ! fakesink when I press CTRL+C 0:00:01.059625000 1034 0x7faa23002190 TRACE vtenc vtenc.c:1264:gst_vtenc_encode_frame:<vtenc_h264-0> encoding frame 30 ^Chandling interrupt. Interrotto: arresto della pipeline ... Execution ended after 0:00:00.822902000 Impostazione della pipeline a PAUSED ... Impostazione della pipeline a READY ... 0:00:01.098942000 1034 0x7faa21f00e90 TRACE vtenc vtenc.c:1346:gst_vtenc_enqueue_buffer:<vtenc_h264-0> encoded frame 17 0:00:01.099227000 1034 0x7faa21c84920 DEBUG vtenc vtenc.c:557:gst_vtenc_stop:<vtenc_h264-0> stopping 0:00:01.099247000 1034 0x7faa21c84920 DEBUG vtenc vtenc.c:560:gst_vtenc_stop:<vtenc_h264-0> destroy session .... 0:00:01.609942000 1034 0x7faa21c84920 DEBUG vtenc vtenc.c:578:gst_vtenc_stop:<vtenc_h264-0> unref async queue, size 12 and so 12 frames are leaked. Looking at gstvideoencoder.c ::flush seems called only for GST_EVENT_FLUSH_STOP event, so the cleanup on stop seems still needed am I missing something? Thanks
Yes I meant that it should be added to PAUSED->READY in the base class (maybe also for the decoder), right before ::stop() is called.
Ok, do you have suggestion to improve the patch or do you think that is good as is? If you want I can send another patch that add to vtenc the flush method too
Your patch is not changing GstVideoEncoder. I think the right way of solving it here would be to do that, and then here in vtenc implement the flush vfunc.
Created attachment 359562 [details] [review] flush encoder in in transition PAUSED->READY
Created attachment 359563 [details] [review] fix memleak should be ok now, please review
Created attachment 359564 [details] [review] drop frames on flush
Review of attachment 359562 [details] [review]: ::: gst-libs/gst/video/gstvideoencoder.c @@ +1488,3 @@ gboolean stopped = TRUE; + gst_video_encoder_flush (encoder); I think this should happen *before* chaining up to the parent class' change_state
Review of attachment 359563 [details] [review]: ::: sys/applemedia/vtenc.c @@ +532,3 @@ + ret = + gst_video_encoder_finish_frame (GST_VIDEO_ENCODER_CAST (self), + outframe); In flush() we should just drop frames and not finish them, in finish() we should do exactly the above
Review of attachment 359564 [details] [review]: Ah you do that here. Please merge this and the other patch :)
Created attachment 360516 [details] [review] flush encoder in transition PAUSED->READY
Created attachment 360517 [details] [review] fix memleak please review
Comment on attachment 360516 [details] [review] flush encoder in transition PAUSED->READY Should probably do the same for audio encoder/decoder and video decoder. Video decoder currently calls reset() *after* stop(), but reset() does not call flush(), and I think flush() should be called in the same place as in the encoder. Can you provide patches for the 3 other base classes too so we can keep consistency?
Review of attachment 360517 [details] [review]: Looks good
(In reply to Sebastian Dröge (slomo) from comment #18) > Comment on attachment 360516 [details] [review] [review] > flush encoder in transition PAUSED->READY > > Should probably do the same for audio encoder/decoder and video decoder. > Video decoder currently calls reset() *after* stop(), but reset() does not > call flush(), and I think flush() should be called in the same place as in > the encoder. > > Can you provide patches for the 3 other base classes too so we can keep > consistency? OK, I'll send the requested patch this evening or tommorrow
Created attachment 360536 [details] [review] video decoder patch tested with jpegdec that reimplement flush, the method is called in transition PAUSED->READY
Created attachment 360540 [details] [review] audio encoder patch tested with vorbisenc
Created attachment 360547 [details] [review] audio decoder patch
Comment on attachment 360536 [details] [review] video decoder patch Place is ok, but it seems weird to take the stream lock here. Is that generally done? Usually the point of flush is that you don't take the stream lock so that it can immediately unlock the streaming thread (without waiting for the streaming thread to do that by itself).
Same also for the other patches. In videoencoder it doesn't take the lock
(In reply to Sebastian Dröge (slomo) from comment #25) > Same also for the other patches. In videoencoder it doesn't take the lock Hi Sebastian, in videoencoder the lock is taken inside gst_video_encoder_flush: https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideoencoder.c#n405 so the behaviour should be the same. I think videoencoder is buggy here: https://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/video/gstvideoencoder.c#n1082 the lock is taken two times, this works since it is a recursive lock but for encoder such as vtenc that could take the lock from another thread on flush this will deadlock (not tested but it will deadlock for sure). I suggest to remove the lock from gst_video_encoder_flush and explictly use the lock before calling flush, this is needed since flush can be chained with other method and the whole operations should be atomic, what do you think about?
I agree but nonetheless I'm confused that flush is taking the lock at all, in all the base classes.
OK, so please let me know how do you think I have to proceed to close this bug: 1) remove the lock on flush in all the base classes (need to be tested carefully) 2) remove the lock from inside gst_video_encoder_flush and modify my patch for the videoencoder to lock before calling flush as the other patchs do
I think 1) but I would have to carefully check all the code. Why is the mutex taken there at all right now? I don't have the time to look into this in detail right now. 2) would be the minimum.
Created attachment 360593 [details] [review] remove lock from gst_video_encoder_flush
Created attachment 360594 [details] [review] flush encoder in transition PAUSED->READY
please note that the decoder patch fix a memory leak in vtdec too. The test case is the same if you stop the pipeline before eos flush method in vtdec is not called without this patch and gst_vtdec_push_frames_if_needed is not called on stop https://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/sys/applemedia/vtdec.c#n801
is there anything other that I can do here?
Not really, I or someone else just has to find some time to go through all it in detail and do further testing to see if it breaks something else.
base: commit 877664a414a466cfcc71c79d28c470722408c9a7 (HEAD -> master) Author: Nicola Murino <nicola.murino@gmail.com> Date: Thu Sep 28 13:17:05 2017 +0200 videoencoder: flush encoder in transition PAUSED->READY https://bugzilla.gnome.org/show_bug.cgi?id=787311 commit 6627dd3ae3797eb551ed2ac9980d6a19b060ba7c Author: Nicola Murino <nicola.murino@gmail.com> Date: Thu Sep 28 13:12:58 2017 +0200 videoencoder: remove the lock from gst_video_encoder_flush The lock is already taken before calling the flush method and can lead to deadlock for some encoders that need to take the same lock from another thread while flushing https://bugzilla.gnome.org/show_bug.cgi?id=787311 commit 6e9edc3031935ce8d6640d9774a7c093d20d4366 Author: Nicola Murino <nicola.murino@gmail.com> Date: Wed Sep 27 16:08:10 2017 +0200 videodecoder: flush decoder in transition PAUSED->READY https://bugzilla.gnome.org/show_bug.cgi?id=787311 commit 2dcdd13512e788797d03f9990994eecdd339ca0e Author: Nicola Murino <nicola.murino@gmail.com> Date: Wed Sep 27 16:32:13 2017 +0200 audioencoder: flush encoder in transition PAUSED->READY https://bugzilla.gnome.org/show_bug.cgi?id=787311 commit e7cf4c058ddd466a797623b5ddf162c263d07059 Author: Nicola Murino <nicola.murino@gmail.com> Date: Wed Sep 27 16:41:51 2017 +0200 audiodecoder: flush decoder in transition PAUSED->READY https://bugzilla.gnome.org/show_bug.cgi?id=787311 bad: commit c808b4dd115498e3b9f62b0cec32a55a8cf0f84d Author: Nicola Murino <nicola.murino@gmail.com> Date: Wed Sep 27 11:37:26 2017 +0200 vtenc: fix memory leak finish encoding and clean buffers queue on flush. This avoid a memory leak if the element shuts down before EOS https://bugzilla.gnome.org/show_bug.cgi?id=787311
(In reply to Matthew Waters (ystreet00) from comment #35) > commit e7cf4c058ddd466a797623b5ddf162c263d07059 > Author: Nicola Murino <nicola.murino@gmail.com> > Date: Wed Sep 27 16:41:51 2017 +0200 > > audiodecoder: flush decoder in transition PAUSED->READY > > https://bugzilla.gnome.org/show_bug.cgi?id=787311 That patch is problematic in GstV4l2VideoDecoder. First, we where already flushing, so it's a duplicated. But then, the actual flush implementation brings back the decoder to it's active state, to be ready to receive future buffers. That's is a terrible idea, since it is a lot of IOCTL to requeue all the flushed buffers into capture queue. It's makes tracing of the driver interaction quite horrible and the quick STREAMOFF/STREAMON sequence caused by the fact that we'll shut it down again in _stop() is known to trigger race condition in some buggy firmware.
(In reply to Nicola from comment #28) > OK, so please let me know how do you think I have to proceed to close this > bug: > > 1) remove the lock on flush in all the base classes (need to be tested > carefully) > 2) remove the lock from inside gst_video_encoder_flush and modify my patch > for the videoencoder to lock before calling flush as the other patchs do The base class stream lock is recursive, so in theory, this should have no effect.
Maybe what you wanted is an unlock() virtual method ?
(In reply to Nicolas Dufresne (stormer) from comment #37) > (In reply to Nicola from comment #28) > > OK, so please let me know how do you think I have to proceed to close this > > bug: > > > > 1) remove the lock on flush in all the base classes (need to be tested > > carefully) > > 2) remove the lock from inside gst_video_encoder_flush and modify my patch > > for the videoencoder to lock before calling flush as the other patchs do > > The base class stream lock is recursive, so in theory, this should have no > effect. if the lock is acquired more than one time vtenc will deadlock, please see the comment here: https://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/sys/applemedia/vtenc.c#n513
I see, that's because the subclass can't unlock it. This lock is pretty wrong, it should go away in the next base class generation.
109 failures in gst-validate https://ci.gstreamer.net/job/GStreamer-master-validate/5503/testReport/ It also makes other decoders (like gst-ffmpeg) fail Can we revert this ?
commit a240880664d13b10fe46eee3f55cb7facf97090f Author: Matthew Waters <matthew@centricular.com> Date: Sun Oct 22 01:00:10 2017 +1100 Revert "videoencoder: flush encoder in transition PAUSED->READY" This reverts commit 877664a414a466cfcc71c79d28c470722408c9a7. commit a81b5a95ba81c8ae0b4093045f84ca750f1eb4d2 Author: Matthew Waters <matthew@centricular.com> Date: Sun Oct 22 01:00:08 2017 +1100 Revert "videodecoder: flush decoder in transition PAUSED->READY" This reverts commit 6e9edc3031935ce8d6640d9774a7c093d20d4366. commit 06aba17d19c35470cd652fb880b0527a96dea368 Author: Matthew Waters <matthew@centricular.com> Date: Sun Oct 22 01:00:06 2017 +1100 Revert "audioencoder: flush encoder in transition PAUSED->READY" This reverts commit 2dcdd13512e788797d03f9990994eecdd339ca0e. commit b8369ba20d36b52d6f738970dc86b5265fb10ba3 Author: Matthew Waters <matthew@centricular.com> Date: Sun Oct 22 01:00:03 2017 +1100 Revert "audiodecoder: flush decoder in transition PAUSED->READY" This reverts commit e7cf4c058ddd466a797623b5ddf162c263d07059.
I can reproduce deadlocks (using vtdec) and segfaults (using avdec_h264) in my app too. The encoding patch seems to cause no issues to my app, anyway, can we flush on stop then? I mean flush only vtenc on stop. A similar patch is needed for vtdec too that actually leak buffers is stopped before eos
Created attachment 362095 [details] [review] flush encoder on stop fix memory leak flusing on stop
Created attachment 362096 [details] [review] flush decoder on stop fix memory leak flusing decoder on stop
Created attachment 362113 [details] [review] flush decoder on stop the actual flush implementation does not wait for the async frames so we call directly gst_vtdec_push_frames_if_needed passing 1) drain = TRUE to wait fo all async frames 2) flush = TRUE, to discard all the frames in the async queue
commit 6165e07b62bcaffc946d897bbddc724008bcf3cf Author: Nicola Murino <nicola.murino@gmail.com> Date: Mon Oct 23 10:42:51 2017 +0200 vtdec: flush decoder on stop fix a memory leak if the decoder shut down before EOS https://bugzilla.gnome.org/show_bug.cgi?id=787311 commit 889e73567549fc7ecd9d999224576735546f1672 Author: Nicola Murino <nicola.murino@gmail.com> Date: Mon Oct 23 10:40:43 2017 +0200 vtenc: flush encoder on stop https://bugzilla.gnome.org/show_bug.cgi?id=787311