GNOME Bugzilla – Bug 646327
h264parse: Drop data before receiving the first PPS/SPS
Last modified: 2016-05-28 19:06:09 UTC
Without this property, muxers that rely on codec-data being in the caps to write their initial header will be broken on byte-stream input going through h264parse to AVC.
Created attachment 184774 [details] [review]
patch to add property
Wouldn't it be better to keep the now dropped buffers around and push them before everything else when the codec-data is known... as discussed on IRC yesterday this would be best implemented in baseparse.
Why a property? It sounds like this should be default behaviour, and it should do the right thing based on downstream caps (ie. stream-format=xyz etc.)
One could keep those buffers and then send them later (even as baseparse functionality), but they are probably only useful if they happen to start with an IDR frame, in which case there is a good chance codec-data is around from the start as well (depending on alignment). That is not to say some GST_BASE_PARSE_FLOW_DELAY (and corresponding mechanics) wouldn't be useful.

So, assuming the typical case with codec-data and an IDR around from the start, downstream (muxer) caps should already arrange proper behaviour: arranging for alignment=au implies that SPS/PPS are aggregated into the first unit, so the full codec-data is in the caps by the first output buffer. In the remaining, more exotic cases, one "should know more what one is doing", though it might not hurt to have this enabled by default.

However, having it as a property is useful so h264parse can at least still be used for debugging purposes, to see what is happening with the NAL units (rather than hard-coding the discarding of some).
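For the debugging angle mentioned above (seeing what is happening with the NAL units), something like the following standalone helper can be used. This is an illustrative sketch, not part of h264parse: it scans an Annex B byte-stream for `00 00 01` start codes and collects each NAL unit type (the low 5 bits of the NAL header byte), e.g. to check whether SPS (7) and PPS (8) actually precede the first slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (not actual h264parse code): collect the NAL unit
 * types found after each 00 00 01 start code in an Annex B byte-stream.
 * Returns the number of NAL units found (at most max_out are stored). */
static size_t
collect_nal_types (const uint8_t *data, size_t len,
                   uint8_t *out, size_t max_out)
{
  size_t n = 0;

  for (size_t i = 0; i + 3 < len; i++) {
    if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
      uint8_t nal_type = data[i + 3] & 0x1F;  /* low 5 bits of NAL header */
      if (n < max_out)
        out[n] = nal_type;
      n++;
      i += 3;  /* skip past the start code */
    }
  }
  return n;
}
```

Note this only handles the 3-byte start code form and ignores emulation prevention bytes, which is fine for a quick look at NAL ordering but not for a full parser.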
So I know very little about H264. The reason I didn't do it the queuing way is that I wasn't sure the buffers before we get NAL units with SPS/PPS necessarily have the same SPS/PPS as the buffers after, so it seemed safer to drop them... But as I said, my H264 knowledge is low/non-existent. The reason for a property is that Mark said on IRC the intention was to send out buffers ASAP, so I made a property that drops buffers until SPS/PPS is received and codec-data can be constructed, with the default being not to drop anything.
Does it make sense to send out buffers asap even if you don't have the codec_data yet? What can be done with them downstream?
One use case I can think of is if you are transmuxing to MPEG-TS and hit a glitch that causes the parser (or anything, really) to reset. If after the reset you start sending buffers immediately instead of waiting for a sync point (headers plus IDR; they usually arrive simultaneously), then a downstream listener to the MPEG-TS stream may be able to glitch less. A little far-fetched, I suppose. My general rule is to not coddle decoders, since they absolutely *must* handle garbage in the stream. Better to test and fix rather than work around. IMO, by default, drop all buffers before you get headers and an IDR. If you have an application that can do something useful with pre-header buffers, maybe then add a property.
It looks like the original issue with the codec_data in the caps is fixed now. Not sure about the skip-stuff-until-we-have-PPS/SPS-and/or-IDR
How should we proceed here? I agree with what David said in comment 7. There's also bug #699302 which has a patch to just skip all NALs until SPS and PPS are received.
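The "skip all NALs until SPS and PPS are received" policy from bug #699302 boils down to something like the following sketch. This is hypothetical, simplified code for illustration (the names and structure are not from gsth264parse.c); in the real element the SPS/PPS NALs themselves are additionally stored so codec-data/caps can be built from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative NAL unit types from the H.264 spec. */
#define NAL_SLICE_IDR 5
#define NAL_SPS       7
#define NAL_PPS       8

typedef struct {
  bool have_sps;
  bool have_pps;
} ParseState;

/* Returns true if the NAL should be pushed downstream, false if dropped.
 * Everything is dropped until both SPS and PPS have been seen, so by the
 * time the first buffer goes out, complete caps can be set. */
static bool
process_nal (ParseState *st, uint8_t nal_type)
{
  if (nal_type == NAL_SPS)
    st->have_sps = true;
  else if (nal_type == NAL_PPS)
    st->have_pps = true;

  if (!st->have_sps || !st->have_pps)
    return false;  /* no complete caps yet: drop */

  return true;
}
```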
I think that's what we should do. It's the behaviour that's most compatible with the variety of different use cases. For specialist use cases people can disable it via the property then, if we add one (or we just wait until someone requests one). That's what we are doing for mpegvideoparse too now.
Now we just need to decide which patch looks better :) I would prefer to not add this property until someone actually has a valid use case for that.
By default I wouldn't drop buffers at all. H264 streams using intra refresh instead of regular IDRs have only one IDR, at the beginning. I have also seen some HLS streams where, contrary to what the spec recommends, fragments start with an IDR instead of SPS/PPS+IDR and the SPS/PPS is sent later.
How are these frames before SPS/PPS usable then?
There are two parts to this:
- not outputting any data before we are able to output proper (complete) caps
- whether to drop all data before then, send it delayed, or send it immediately

What we're trying to do here is get some really basic stuff right that doesn't work properly at all at the moment, like decoding and transmuxing from MPEG-TS/PS or RTP and such. For decoding on the desktop with libav it's less problematic if our caps are incomplete at first, but that doesn't work well in many other scenarios and is also not really right.

Andoni: am I correct to assume that even for intra-refresh streams you still get regular SPS/PPS? If so, how regular? If I'm not mistaken we'd just start outputting things a little later, which doesn't seem unreasonable, esp. with intra-refresh. I think it's reasonable to e.g. wait a full refresh cycle before starting to output images - so if we can detect that (can we?) then we could just cache the data and send it delayed.
(In reply to comment #13)
> How are these frames before SPS/PPS usable then?

Usually SPS/PPS won't change within the same stream, so the first GOP without SPS/PPS can be decoded once you receive them.
(In reply to comment #14)
> Andoni: am I correct to assume that even for intra-refresh streams you still
> get regular SPS/PPS? If so, how regular? If I'm not mistaken we'd just start
> outputting things a little later, which doesn't seem unreasonable, esp. with
> intra-refresh. I think it's reasonable to e.g. wait a full refresh cycle before
> starting to output images - so if we can detect that (can we?) then we could
> just cache the data and send it delayed.

Correct, SPS/PPS will be inserted in-band at the same frequency. The frequency can vary a lot, but it's usually on the order of one every 10 frames for live streams. Delaying the output of data could for instance fix the case in which you get a complete GOP and the SPS/PPS only later, instead of having to drop it. But dropping data should be an option in all cases.
Ok, so we're talking more about finetuning here. We can make dropping optional, but it should be the default IMHO. Maybe waiting/dropping should be handled separately.
*** Bug 699302 has been marked as a duplicate of this bug. ***
I prefer the patch from bug #699302 and think we should apply it, but maybe move the if have_foo checks to the beginning of the if() and niceify the GST_WARNING debug string (should it be a warning? It's expected really..) Reasons: (a) KISS, (b) it's what we do in mpegvideoparse. If there's a use case / need to not drop this data, that can be added later. But if you have byte-stream as input, dropping some data is par for the course imho.
Pushed the patch from bug #699302 as it's much simpler and also the same as what is done in other parsers.

commit ef0e051e7d20bcc55a523115510915f5ae7a4b65
Author: Ilya Smelykh <ilya.smelykh@gmail.com>
Date:   Fri Jun 7 12:10:08 2013 +0200

    h264parse: Wait until SPS/PPS before outputting any data

    https://bugzilla.gnome.org/show_bug.cgi?id=646327

Let's keep this open to fine-tune, or just close?
<slomo> __tim: would you agree that this bug can be closed and if anybody wants a property to select this and has a real use case for it can open a new bug?
<__tim> yes, please !
*** Bug 704156 has been marked as a duplicate of this bug. ***
Matej, is this still happening for you with latest git master?
Seems to be fixed for me, except this minor thing:

22:32:11 WARN gst.h264parse gsth264parse.c:898 - - - - - - - - no SPS/PPS yet, nal Type: 9, Size: 2 will be dropped

I'm not sure the parser should complain about dropping an AU delimiter.
Actually, this might not be such a good idea after all. I have a stream that has slices referencing other slices from before the SPS. With the parser dropping data before the first SPS, the stream starts to play, but after a few seconds there are glitches. I think it would be better to buffer the data until we have SPS/PPS rather than throw it away.
Sample video: https://s3.amazonaws.com/MatejK/Samples/1.001.ts

I'm not sure if it does this with the ffmpeg decoder, but with a hardware decoder (Apple devices) it decodes fine with the frames before the first SPS present; if that data is dropped, there are artifacts at 02s. Sometimes the decoder won't even recover and will just hang.
Matej, Looks like there is a permission restriction, I have a "Permission Denied" xml response.
Sorry, should be fixed.
Created attachment 254498 [details] [review]
Patch to queue data before SPS/PPS instead of dropping it

Rather experimental, just for testing.
Also, perhaps there should be a cap in baseparse so that really broken streams won't OOM.
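The queue-instead-of-drop idea, with a cap so a broken stream that never sends SPS/PPS can't queue unboundedly, could look roughly like this. This is a hypothetical policy sketch, not the attached patch: the names, the `Action` enum, and the `MAX_QUEUED` limit are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUED 64  /* illustrative limit, not a real baseparse value */

typedef enum {
  ACTION_QUEUE,           /* hold the NAL until caps are complete */
  ACTION_FLUSH_AND_PUSH,  /* push everything queued, then this NAL */
  ACTION_ERROR            /* give up: stream never delivered SPS/PPS */
} Action;

typedef struct {
  bool have_sps, have_pps;
  size_t n_queued;
} QueueState;

static Action
handle_nal (QueueState *st, uint8_t nal_type)
{
  if (nal_type == 7)  /* SPS */
    st->have_sps = true;
  if (nal_type == 8)  /* PPS */
    st->have_pps = true;

  if (st->have_sps && st->have_pps) {
    /* Caps are complete now. The queued GOP can usually be decoded
     * because SPS/PPS rarely change within the same stream. */
    return ACTION_FLUSH_AND_PUSH;
  }

  if (st->n_queued >= MAX_QUEUED)
    return ACTION_ERROR;  /* avoid unbounded memory use */

  st->n_queued++;
  return ACTION_QUEUE;
}
```

The trade-off discussed below still applies: queued pre-SPS frames may reference data the decoder never saw, so flushing them is only a best-effort improvement over dropping.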
I think it would be useful to have a property on h264parse controlling whether it should drop or buffer.
Upon more testing, this file is probably not the best to demonstrate the problem. Hardware decoders still seem to have problems with it. So while I still think there could be a case where keeping the frames before the SPS helps, I'm not sure this is it.
One of the reasons might be that there seem to be field-encoded frames, which the H264 parser doesn't currently handle very well (https://bugzilla.gnome.org/show_bug.cgi?id=704214).
It might make sense to allow that, I guess, but then most decoders are going to drop anything before the first keyframe anyway. And as Ilya said, there should also be a limit on the number of frames we queue before erroring out if no SPS/PPS was received. I'm not sure whether the newly introduced problems are going to be bigger than the problems that are solved by this.
In my experience most software decoders that I have tried will happily start decoding before they get an IDR frame, even if they output artifacts. However, I wasn't able to decode a stream without at least a single IDR on iOS devices, not even with a proper recovery point. The interesting thing is that it is enough to have just one IDR at the beginning of the stream and it works (even seeking, etc.).
I think we should close this again. Please open a new bug for any specific issues that still exist. For streams where SPS/PPS is in-band and not signalled out of band, we can't really just assume that the SPS/PPS will be valid for the data before it (it might, it might not). If you have a case where a slice refers to data before the SPS/PPS then there's clearly no IDR there; maybe that means we have to choose our sync point more strictly. In any case, I don't see us doing anything here after 2.5 years without any more progress.