After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 700537 - oggdemux: Drops frames because it needs a keyframe after a seek
oggdemux: Drops frames because it needs a keyframe after a seek
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
1.0.6
Other Linux
: Normal normal
: 1.1.3
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks: 701391
 
 
Reported: 2013-05-17 15:38 UTC by Mathieu Duponchelle
Modified: 2013-07-15 08:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch to fix the issue (723 bytes, patch)
2013-06-24 22:36 UTC, Mathieu Duponchelle
rejected Details | Review
Proposed patch to fix the issue (2.84 KB, patch)
2013-07-14 00:05 UTC, Mathieu Duponchelle
committed Details | Review
Reversal of the faulty commit. (2.73 KB, patch)
2013-07-14 00:06 UTC, Mathieu Duponchelle
committed Details | Review

Description Mathieu Duponchelle 2013-05-17 15:38:15 UTC
I attached a file but it should do the same with every ogv.

Steps to reproduce :
launch the file with totem GST_DEBUG=3
play some seconds
seek at 1 second

Observed behaviour:
the image freezes, and the terminal outputs a lot of:
0:00:01.524087036  2502 0x7f9110002720 WARN               theoradec gsttheoradec.c:732:theora_handle_data_packet:<theoradec3> dropping frame because we need a keyframe

I investigated a little, and it appears that oggdemux after a seek gets to the nearest keyframe, and pushes its first buffer on a different pad than the next buffers, which happens to be an unlinked pad. That's a shame because we precisely need that first buffer.

I've gotten in the intricacies of why this pad is chosen, but it gave me a headache. 
Basically it seems that it chooses the pad based on a serial number, which is obtained in relation to the ogg_sync_state.

This bug is really critical for correct video editing with GES and ogv files.
Comment 1 Mathieu Duponchelle 2013-05-17 15:42:26 UTC
Couldn't upload the file (1676 kB > 1600 aha)
here is a link : http://antopics.fr/metro2.ogv
Comment 2 Sebastian Dröge (slomo) 2013-05-20 12:01:06 UTC
This seems to happen on any file, and together with this warning there also comes this g_critical() each time:

(gst-launch-1.0:29325): GStreamer-CRITICAL **: gst_segment_to_stream_time: assertion `segment->format == format' failed
Comment 3 Sebastian Dröge (slomo) 2013-05-20 12:02:33 UTC
And the program is not a pad that is not linked (you see this probably because the audio pad is not linked), but instead for some reason there seems to be no segment event after the seek and the first frame is not a keyframe.
Comment 4 Mathieu Duponchelle 2013-05-20 12:05:09 UTC
This comes from the base video decoder, and is not related to ogg / theora IMHO. I had reported a bug for this here : https://bugzilla.gnome.org/show_bug.cgi?id=699972 I have a quick stupid fix that removes it here, but the keyframe problem is still there.
Comment 5 Mathieu Duponchelle 2013-05-20 12:06:24 UTC
(In reply to comment #3)
> And the program is not a pad that is not linked (you see this probably because
> the audio pad is not linked), but instead for some reason there seems to be no
> segment event after the seek and the first frame is not a keyframe.

I have investigated the bug quite deeply and I can indeed be affirmative that the buffer containing the keyframe gets pushed on an unlinked pad.
Comment 6 Mathieu Duponchelle 2013-05-20 12:07:15 UTC
* At least in my rendering test suite in GES.
Comment 7 Sebastian Dröge (slomo) 2013-05-20 12:13:08 UTC
How do you know and on which pad? You can reproduce this without GES quite easily with this pipeline, get rid of all the additional layers that just introduce bugs ;)

gst-launch-1.0 filesrc location=/media/data/Samples/big-buck-bunny/big_buck_bunny_720p_stereo.ogg ! oggdemux name=demux ! queue max-size-bytes=10000000 ! theoradec ! navseek ! xvimagesink    demux. ! queue max-size-bytes=10000000 ! fakesink
Comment 8 Sebastian Dröge (slomo) 2013-06-18 11:14:25 UTC
Bug #702502 has a potential patch for the segment_clip warning.
Comment 9 Mathieu Duponchelle 2013-06-24 22:34:30 UTC
The segment clip warning is indeed gone, but it was unrelated, I had patched video decoder before observing that behaviour.

Upon further inspection, I discovered the sources of the bad behaviour :

First, that patch :

b41cd0428956f3ade9b428149e38be8e788556fe in -base makes a pad be skipped when looking for the earliest position (keyframe) at which to seek.

From what I can see, skeleton and cmml streams are ignored by the function now, so I'm not sure this patch is still necessary, my opinion would be to revert it.

Second, when I revert that patch here, keytarget gets the good value, but packets still get dropped for a while after the seek.

I think that the patch I'll attach next is correct, I'd like your opinion though.
Comment 10 Mathieu Duponchelle 2013-06-24 22:36:58 UTC
Created attachment 247689 [details] [review]
Proposed patch to fix the issue
Comment 11 Sebastian Dröge (slomo) 2013-06-30 16:30:23 UTC
I agree with the first part you mention, it seems that this commit can be reverted nowadays... for exactly that reasoning. And I don't think it was correct at all ;)

For the second part, your new patch, I think that will cause A/V sync issues. So when doing a seek we will jump to the earliest position that contains the seek position for all streams. This means that all but the earliest stream will have non-keyframe packets for some time, that should be dropped. Which happens because current_granule==-1 after a seek until the packet actually contains a new granule position (which will be the case for a keyframe).

Is there any problem with the packets that are dropped after a seek without your patch? Is it enough to just revert b41cd0428956f3ade9b428149e38be8e788556fe and then everything is fine for this bug here?
Comment 12 Sebastian Dröge (slomo) 2013-07-04 09:16:17 UTC
Mathieu?
Comment 13 Mathieu Duponchelle 2013-07-04 19:49:48 UTC
(In reply to comment #12)
> Mathieu?

Sorry :)

(In reply to comment #11)
> I agree with the first part you mention, it seems that this commit can be
> reverted nowadays... for exactly that reasoning. And I don't think it was
> correct at all ;)
> 

cool.

> For the second part, your new patch, I think that will cause A/V sync issues.
> So when doing a seek we will jump to the earliest position that contains the
> seek position for all streams. This means that all but the earliest stream will
> have non-keyframe packets for some time, that should be dropped. Which happens
> because current_granule==-1 after a seek until the packet actually contains a
> new granule position (which will be the case for a keyframe).
> 
> Is there any problem with the packets that are dropped after a seek without
> your patch? Is it enough to just revert
> b41cd0428956f3ade9b428149e38be8e788556fe and then everything is fine for this
> bug here?

Unfortunately not. Without that patch, we still get the issue, also I don't think that would cause stream sync issues afaiu, I can not prove it but from testing I can say I didn't experience unsyncing issues, manual testing is obviously not enough but it's a good hint nevertheless.
Comment 14 Sebastian Dröge (slomo) 2013-07-05 07:42:07 UTC
(In reply to comment #13)

> > For the second part, your new patch, I think that will cause A/V sync issues.
> > So when doing a seek we will jump to the earliest position that contains the
> > seek position for all streams. This means that all but the earliest stream will
> > have non-keyframe packets for some time, that should be dropped. Which happens
> > because current_granule==-1 after a seek until the packet actually contains a
> > new granule position (which will be the case for a keyframe).
> > 
> > Is there any problem with the packets that are dropped after a seek without
> > your patch? Is it enough to just revert
> > b41cd0428956f3ade9b428149e38be8e788556fe and then everything is fine for this
> > bug here?
> 
> Unfortunately not. Without that patch, we still get the issue, also I don't
> think that would cause stream sync issues afaiu, I can not prove it but from
> testing I can say I didn't experience unsyncing issues, manual testing is
> obviously not enough but it's a good hint nevertheless.

Can you try to understand what exactly is the remaining problem without your patch after reverting b41cd0428956f3ade9b428149e38be8e788556fe?


The problem I see with your patch is that it makes oggdemux assume that it currently is at exactly that keyframe granule, and then oggdemux does interpolations of the granulepos based on that. However oggdemux will be *before* this keyframe granule for all stream except for the one that has the earliest keyframe we seeked to.
So for all these streams we would output invalid timestamps until the next keyframe is read.

(Correct me if I'm wrong, you probably know that code better than me now :) )
Comment 15 Mathieu Duponchelle 2013-07-14 00:04:37 UTC
OK so after an in-depth investigation, I've come up with both a headache and the real fix, sorry for proposing this patch but it seemed good to me in the first place, only problem was that it would have considered a little too many packets, defeating the purpose of the check for a valid granuletime in _chain_peer. It couldn't cause stream synchronisations though.

Now the real patch is slightly more complicated, but it fixes the root of the problem.

For now, the following scenario could and would often be taking place:

    a) Find the earliest time for stream X
    b) bisect and find a page which granuletime is indeed < target, but
       contains another stream.
    c) decide to seek at the wrong offset, sometimes inferior to
       the real one, in which case the error was undected or
    d) the offset was superior, and thus the actual target keyframe was
       not processed, and packets were skipped waiting
       for a granulepos.

The attached patch fixes that by introducing two arguments to do_binary_search:

A boolean to specify if we want the bisection to consider only one stream, and the actual (unique) serial number of the stream, ignored if only_serial_no is FALSE.

The second attached patch reverts the previously mentioned commit, and removes a variable that isn't needed anymore.
Comment 16 Mathieu Duponchelle 2013-07-14 00:05:29 UTC
Created attachment 249083 [details] [review]
Proposed patch to fix the issue
Comment 17 Mathieu Duponchelle 2013-07-14 00:06:12 UTC
Created attachment 249085 [details] [review]
Reversal of the faulty commit.
Comment 18 Mathieu Duponchelle 2013-07-14 00:09:18 UTC
>     b) bisect and find a page which granuletime is indeed < target, but
>        contains another stream.

more accurate :
b) bisect and find the good page, but with the offset for another stream here:
gst_ogg_demux_get_next_page (ogg, &og, end - ogg->offset, &result);
Comment 19 Sebastian Dröge (slomo) 2013-07-15 08:34:54 UTC
commit 905fe0f4ca3fe6760fb6b4a77bcc5c982605f120
Author: Mathieu Duponchelle <mathieu.duponchelle@epitech.eu>
Date:   Sun Jul 14 01:42:52 2013 +0200

    oggdemux: Make bisecting fully accurate
    
    When bisecting after an earliest time has been found, we need
    to only consider the stream for which the earliest time was found.
    
    Before, the following scenario could be and was encountered:
    
    a) Find the earliest time for stream X
    b) bisect and find a page which granuletime is indeed < target, but
       contains another stream.
    c) decide to seek at the wrong offset, sometimes inferior to
       the real one, in which case the error was undected or
    d) the offset was superior, and thus the actual target keyframe was
       not processed, and packets were skipped waiting
       for a granulepos.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=700537