GNOME Bugzilla – Bug 696770
avdec_h264: Memory leak when processing h.264 field pictures
Last modified: 2013-12-01 13:41:21 UTC
Our application demuxes, parses and decodes a live h.264 transport stream. The input consists, mostly, of "field picutres". These seem to decode properly. The decoder produces one output frame for each pair of input buffers, as expected. However, some memory is not free'd. Three related items are not free'd. 1. The input buffer created by the parser. 2. A GstVideoCodecFrame allocated by GstVideoDecoder. It has a reference to the buffer. 3. A linked list element used to keep a list of active frames in GstVideoDecoder. The list is in _GstVideoDecoderPrivate.frames. GstVideoDecoder will release the memory in question if gst_video_decoder_drop_frame or gst_video_decoder_finish_frame is called. GstFFMpegVidDec (gst-libav\ext\libav\gstavviddec.c) calls gst_video_decoder_finish_frame for each output frame, but doesn't call anything for the other field picture. I tried calling gst_video_decoder_finish_frame as well as some other ways of releasing the memory but couldn't find a reliable way to release the memory without releasing memory that the decoder was still using. As an experiment, I tried releasing old frames when the GstVideoDecoder frame list grew longer than 100 frames. This fixed the memory leak. I don't think that this a solution, but it does help validate the above analysis.
Could you make a sample stream available by any chance, or point us to one? (Or does it happen with all interlaced H.264 streams?) Does it happen with fakesink as well?
Here's a link to a file: https://www.box.com/s/mfiu8vx10bwfno0em9wm Please note that this file contains a transport stream with two programs, one in h.264, the other is MPEG2. Please use program 1. I don't think the problem happens with all interlaced streams. But I don't know for sure. We don't have any problem with MPEG2 interlaced streams, but they have interlaced "frame pictures", so two fields per picture, one picture in, one out. I would expect the same sort of problem if we had MPEG2 interlaced "field pictures". These would have the "two in, one out", problem. So, I think the problem is related to "field pictures" rather than interlace or h.264 vs MPEG2. You want me to put a fakesink in place of the decoder? Thanks, Michael
I have noticed the same issue with field-encoded mpeg2 streams sometimes before but didn't get time to investigate. It seems that the following field-encoded sample is causing memory leak. Clearly noticeable with top command. http://samples.mplayerhq.hu/MPEG-VOB/interlaced/uruseiyatsura.vob Now it seems that there are some timestamp issue also !
Thanks for the samples.
For the field encoded pictures (interlaced stream with individual field encoded frames), the get_buffer() method of libav will get invoked only for one field (either top or bottom). Which means the decoder is using the same buffer for rendering both top and bottom field. If I understood correctly we are supposed to call _finish_frame() or drop_frame() for all the frames that reaches in handle_frame().Here we are not calling any of them for alternate fields. To fix this mem leak: 1) either we need to drop the alternate fields 2) or set the DECODE_ONLY flag to alternate fields and invoke _finish_frame. I will attach a quick fix.
Created attachment 245998 [details] [review] Fix the memory leak associated with field picture decoding
Created attachment 246005 [details] [review] Fix some comments
Review of attachment 245998 [details] [review]: ::: ext/libav/gstavviddec.c @@ +1400,3 @@ + ffmpeg is allocating the output buffer only for complete frames and not + for the individual fields. */ + if (frame && !frame->output_buffer) { I don't think this is correct. libav is not necessarily allocating an output buffer for the current input frame immediately Also does this mean that every GstVideoCodecFrame was a field here, not a frame? In that case it would be something that the parser should've fixed, GstVideoCodecFrame is supposed to be one complete frame with everything related to it.
I'm attaching my temporary fix. It's a diff from 1.0.7. However, I edited the diff to remove unrelated changes. The basic approach is to detect when the decoder processes a frame without allocating an output buffer. I set a flag when the buffer is allocated and check it after the decode call. Before releasing a frame, I, change the reference count. I don't think my fix ia really correct, but it does handle the normal case. Hopefully, it helps describe the problem. It would be nice to combine the two field pictures into a single frame at an earlier stage, but the h.264 deccoder would need to handle it as well.
Created attachment 246020 [details] [review] Edited patches for gstavvidec.c to release frames from field pictures.
libav doesn't really care about the framing, so having both fields in a single buffer should be fine.
(In reply to comment #8) > Review of attachment 245998 [details] [review]: > > ::: ext/libav/gstavviddec.c > @@ +1400,3 @@ > + ffmpeg is allocating the output buffer only for complete frames and not > + for the individual fields. */ > + if (frame && !frame->output_buffer) { > > I don't think this is correct. libav is not necessarily allocating an output > buffer for the current input frame immediately Does the following code path in get_buffer() making sure the latest frame buffer allocation? picture->reordered_opaque = context->reordered_opaque; frame = gst_video_decoder_get_frame (GST_VIDEO_DECODER (ffmpegdec), picture->reordered_opaque); > > Also does this mean that every GstVideoCodecFrame was a field here, not a > frame? In that case it would be something that the parser should've fixed, > GstVideoCodecFrame is supposed to be one complete frame with everything related > to it. Yes, for field-encoded stream, every GstVideoCodecFrame is a field. The issue is exactly what I mentioned before. I don't think it is the duty of parser. Because if I understood correctly some decoders can send fields as frames to renderer directly. For eg, vaapi has a feature for this. vaPutsurface() api has flags to specify whether the surface has top fields or bottom fields. But AFAIK this is not yet supported in driver. So in current gstreamer-vaapi we are using the same surface to render both top and bottom fields. I have a fix for this to gstreamer-vaapi also: https://bugzilla.gnome.org/show_bug.cgi?id=701257 We need similar fixes to mpeg2dec also.
(In reply to comment #12) > (In reply to comment #8) > > > Yes, for field-encoded stream, every GstVideoCodecFrame is a field. The issue > is exactly what I mentioned before. I don't think it is the duty of parser. > Because if I understood correctly some decoders can send fields as frames to > renderer directly. For eg, vaapi has a feature for this. vaPutsurface() api > has flags to specify whether the surface has top fields or bottom fields. But > AFAIK this is not yet supported in driver. So in current gstreamer-vaapi we are > using the same surface to render both top and bottom fields. Using same surface but rendering each field separately using the flags in vaPutSurface. > > I have a fix for this to gstreamer-vaapi also: > https://bugzilla.gnome.org/show_bug.cgi?id=701257 > > We need similar fixes to mpeg2dec also.
(In reply to comment #12) > (In reply to comment #8) > > Review of attachment 245998 [details] [review] [details]: > > > > ::: ext/libav/gstavviddec.c > > @@ +1400,3 @@ > > + ffmpeg is allocating the output buffer only for complete frames and not > > + for the individual fields. */ > > + if (frame && !frame->output_buffer) { > > > > I don't think this is correct. libav is not necessarily allocating an output > > buffer for the current input frame immediately > > Does the following code path in get_buffer() making sure the latest frame > buffer allocation? > > picture->reordered_opaque = context->reordered_opaque; > frame = > gst_video_decoder_get_frame (GST_VIDEO_DECODER (ffmpegdec), > picture->reordered_opaque); No, it only makes sure that some frames gets a buffer allocation. Might be the latest, might be any other frame. > > Also does this mean that every GstVideoCodecFrame was a field here, not a > > frame? In that case it would be something that the parser should've fixed, > > GstVideoCodecFrame is supposed to be one complete frame with everything related > > to it. > > Yes, for field-encoded stream, every GstVideoCodecFrame is a field. The issue > is exactly what I mentioned before. I don't think it is the duty of parser. > Because if I understood correctly some decoders can send fields as frames to > renderer directly. For eg, vaapi has a feature for this. vaPutsurface() api > has flags to specify whether the surface has top fields or bottom fields. But > AFAIK this is not yet supported in driver. So in current gstreamer-vaapi we are > using the same surface to render both top and bottom fields. > > I have a fix for this to gstreamer-vaapi also: > https://bugzilla.gnome.org/show_bug.cgi?id=701257 > > We need similar fixes to mpeg2dec also. Then this somehow needs to be handled in the GstVideoDecoder::parse() function to accumulate all fields before claiming something to be a frame. The detection of this could be based on buffer flags, that are put on there by the parser (http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-gstvideo.html#GstVideoBufferFlags ONEFIELD | TFF and ONEFIELD | RFF)
But we have to decode the fields individually. Which means considering each of them as frames. Remember, this is only in case of filed-enocoded interlaced content and not for interlaced frames.
(In reply to comment #15) > But we have to decode the fields individually. Which means considering each of > them as frames. Remember, this is only in case of filed-enocoded interlaced > content and not for interlaced frames. Yes, but libav does not care about that. For other decoders that do care see bug #660770
(In reply to comment #15) > But we have to decode the fields individually. Which means considering each of > them as frames. Remember, this is only in case of filed-enocoded interlaced > content and not for interlaced frames. Yes, but libav does not care if you pass two fields at once or just one. For other decoders that do care see bug #660770
(In reply to comment #17) > (In reply to comment #15) > > But we have to decode the fields individually. Which means considering each of > > them as frames. Remember, this is only in case of filed-enocoded interlaced > > content and not for interlaced frames. > > Yes, but libav does not care if you pass two fields at once or just one. For > other decoders that do care see bug #660770 aha, okay. So you mean, the temporary fix would be to send only complete frame to handle_frame() from base_decoder. right? But how can we make sure that the frame is complete or not from basedecoder if there is no parser in the pipeline? Only the individual decoders which are doing parsing themselves can identify the frame boundaries. Am i missing something?
There should be a parser in the pipeline. We require this for almost all decoders now.
(In reply to comment #19) > There should be a parser in the pipeline. We require this for almost all > decoders now. Okay, So the assumption is that the decoder is always receiving data as complete frames(packetized) if there is a parser in the pipeline. right?
Yes
(In reply to comment #21) > Yes Okay ... so the temporary fix for this mem leak at the moment (with out fixing https://bugzilla.gnome.org/show_bug.cgi?id=660770) is to, --send only complete frame from videoparser elements. right?
I prefer to raise the severity of this bug (may be this one also: 660770) since it is a huge memory leak and I have many samples like this.
Yeah, I think the parsers were supposed to do that anyway. They send everything up to and including a frame... just didn't consider separate fields.
(In reply to comment #24) > Yeah, I think the parsers were supposed to do that anyway. Irrespective of the solution of https://bugzilla.gnome.org/show_bug.cgi?id=660770 ?? They send everything > up to and including a frame... just didn't consider separate fields.
There should be some kind of new "stream-format" for that then IMHO. The only alternative would be that we *require* all decoders that handle a codec with separate-field encoding to combine the fields in GstVideoDecoder::parse() unless they handle partial frames somehow and #660770 is fixed.
For me #660770 is more about slices than fields (I don't think raw video slices is something we need to support anytime soon).
How would you handle fields differently?
(In reply to comment #26) > There should be some kind of new "stream-format" for that then IMHO. > > > The only alternative would be that we *require* all decoders that handle a > codec with separate-field encoding to combine the fields in > GstVideoDecoder::parse() unless they handle partial frames somehow and #660770 > is fixed. The alternative solution might not be good for decoders like gst-libav if they wants to handle the fields separately since it is not overriding the GstVideoDecoder::parse() routine at the moment. And we cannot add it to basedecoder since the decision of handling them as separate fields or not is taken by individual decoders. So the solution would be to combine them from parser. But somehow i don't feel it as a good solution. :)
gst-libav could just override the ::parse vfunc and implement that based on the buffer flags though. I agree that doing it in the parser does not feel right, however currently everything is expecting to get a complete *frame* out of the parser :(
The second patch "Fix some comments" is independent of the issue here. :)
I don't think libav is very happy if you feed it separate fields in frame based multi threaded mode.
(the various comments have gone completely astray and are confusing to say the least) Is this whole bug about people using the libav decoders without a parser ? If so this bug will be closed.
No, it's also about h264parse and others providing one field per buffer instead of one frame per buffer. But that's a h264parse bug
I'm using a parser and have the problem. It's mismatch between the Parser and the Decoder. IMHO gstavviddec's buffer lifetime handling could be improved. It also leaks when skipping frames.
What happen if the periodic pps/sps code insert stuff in the middle of two fields ? Does the spec allow this ?
This seems to belong more to bug #704214, but from vague (quite possibly incomplete) recollection, when it comes to sps/pps, spec is mainly concerned about: * these starting a new AU, but that is possibly already so for the other field coded picture * these having to be present when referred to, but as the inserted ones are repeats (with the same data) of previous ones, not necessarily a problem
Not sure what spec says about it, but I've seen streams with PPS in the second access unit.
Testing shows frame list is being cleaned up now due to fixes for bug #693772, so I would say this is fixed as well. h264(parse) framing might be improved, but being tracked in bug #704214.