Bug 656412 – vorbisdec: discard starting samples according to granpos

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 656412 - vorbisdec: discard starting samples according to granpos


Summary:	vorbisdec: discard starting samples according to granpos


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-base
Version:	unspecified
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-08-12 15:56 UTC by Vincent Penquerc'h
Modified:	2018-11-03 11:19 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
vorbisdec: discard starting samples according to granpos (1.78 KB, patch) 2011-08-12 15:56 UTC, Vincent Penquerc'h	none	Details \| Review
oggdemux: ensure we do not generate a negative timestamp (1.88 KB, patch) 2011-08-12 15:56 UTC, Vincent Penquerc'h	none	Details \| Review
vorbisdec: discard starting samples according to granpos (1.79 KB, patch) 2011-08-12 16:25 UTC, Vincent Penquerc'h	none	Details \| Review
audio: round better when calculating what to clip (1.86 KB, patch) 2011-08-13 10:34 UTC, Vincent Penquerc'h	none	Details \| Review
oggdemux: handle start cropping via granulepos using a segment offset (2.72 KB, patch) 2011-08-13 10:34 UTC, Vincent Penquerc'h	rejected	Details \| Review

Description Vincent Penquerc'h 2011-08-12 15:56:26 UTC

Vorbis allows sample granularity cutting by setting the first
granulepos to move the unwanted samples before timestamp 0.

Comment 1 Vincent Penquerc'h 2011-08-12 15:56:31 UTC

Created attachment 193707 [details] [review]
vorbisdec: discard starting samples according to granpos

Comment 2 Vincent Penquerc'h 2011-08-12 15:56:54 UTC

Created attachment 193708 [details] [review]
oggdemux: ensure we do not generate a negative timestamp

This might happen if the first packet generates more samples
than accounted for by the first granulepos. This is valid for
Vorbis, as this indicates starting samples are to be discarded.

Comment 3 Vincent Penquerc'h 2011-08-12 16:01:49 UTC

From the Vorbis spec, http://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-126000A:

The granule (PCM) position of the first page need not indicate that the stream started at position zero. Although the granule position belongs to the last completed packet on the page and a valid granule position must be positive, by inference it may indicate that the PCM position of the beginning of audio is positive or negative.

    * A positive starting value simply indicates that this stream begins at some positive time offset, potentially within a larger program. This is a common case when connecting to the middle of broadcast stream.
    * A negative value indicates that output samples preceeding time zero should be discarded during decoding; this technique is used to allow sample-granularity editing of the stream start time of already-encoded Vorbis streams. The number of samples to be discarded must not exceed the overlap-add span of the first two audio packets.

Comment 4 Vincent Penquerc'h 2011-08-12 16:25:24 UTC

Created attachment 193711 [details] [review]
vorbisdec: discard starting samples according to granpos

Guard against unknown duration - just in case

Comment 5 Mark Nauwelaerts 2011-08-12 17:14:03 UTC

I am a bit concerned here that such custom hacks will not fare well when moving to base audio decoder.

Also, if possible, clipping should be expressed by segments, so to take care of both of these it may have to be oggdemux that will have to budge and arrange for proper timestamping and segment wizardry to arrange the intended effect.

Comment 6 Tim-Philipp Müller 2011-08-12 17:23:40 UTC

The problem is that GstClockTime is unsigned I guess, so it's impossible to express negative timestamps. Have to hack around that somehow, no? (Either by allowing for small negative ts, but then either ignoring or abusing the case of -1=none, or by doing this at the granulepos level)

Comment 7 Mark Nauwelaerts 2011-08-12 17:36:09 UTC

If oggdemux finds that timestamps would have to start at -D, then it could have them start at 0 (*) and send a segment (start = delta, stop = old_stop + delta ?, 0) which should then have the proper clipping happening downstream while running time (for a/v sync) and position reporting should also come out ok.

(*) evidently by timestamp = timestamp_as_now + delta

Comment 8 Vincent Penquerc'h 2011-08-12 18:11:47 UTC

Why would position reporting work ? I see how the first samples would be cut off, but the timestamp of the first sample that goes through would be offset by that same amount of discarded samples, no ?

It's a Vorbis thing BTW, not an Ogg thing. I guess other stream types would be invalid if they exhibit such a timing oddity, unless they special case it as well.

Comment 9 Mark Nauwelaerts 2011-08-12 18:36:00 UTC

Position reporting works because that's the meaning of the position field in a segment; the start timestamp is set to have the position as given by the position field.  So, in this case, timestamp delta (the first sample to go through) would have position 0 (see basesink's position reporting) (and also running time 0).

Btw, also wondering if there is already some segment/timestamp manipulation in place in oggdemux to handle (accidentally or intentionally) a granulepos that is (large) positive (if not, one would end with a large delay at start if things are otherwise still assuming 0 based).

Comment 10 Vincent Penquerc'h 2011-08-12 19:41:08 UTC

I see.
I'll try that segment idea soon. I'm not too sure how it'll work out seeing that Vorbis has this in the spec but other formats probably don't.

oggdemux does have the ability to detect a non zero start granulepos for streams. This *should* work, but then oggdemux is a tough beast to tame.

Comment 11 Vincent Penquerc'h 2011-08-13 10:34:34 UTC

Created attachment 193751 [details] [review]
audio: round better when calculating what to clip

This makes the round trip between samples -> time -> samples
for start crop offst exact between oggdemux and vorbisdec,
for one example.

Comment 12 Vincent Penquerc'h 2011-08-13 10:34:43 UTC

Created attachment 193752 [details] [review]
oggdemux: handle start cropping via granulepos using a segment offset

Vorbis can crop samples at the start of a stream by setting the granpos
of the first page to bring those samples into negative timestamps.
Update segment start to that crop offset so that segment clipping will
discard those samples.

Comment 13 Vincent Penquerc'h 2011-08-13 10:37:54 UTC

This is a version using segments as suggested.
The clipping code was clipping 103 samples instead of 104 for my test case, fixed by the change to audio.c, also included as a separate patch.
It seems to work, but I always have a bad feeling when dealing with segments, so...
The segment change is made for all non sparse streams.

Comment 14 Sebastian Dröge (slomo) 2011-11-24 09:37:42 UTC

Any news on this?

Comment 15 Vincent Penquerc'h 2011-11-24 15:09:03 UTC

I'd like to get an OK from mnauw, since it was his suggestion as to how to fix this best, just to make sure I got his idea right.

Comment 16 Mark Nauwelaerts 2011-11-25 13:34:42 UTC

There are few concerns regarding latest patch (in increasing order):
* segment manipulation is sent on all streams, while it is meant particularly for only 1 (usually) (which is not to say one should not keep other streams in mind regarding running_time, a/v sync etc)
* the segment is created rather 'casually', especially the position part which seems not to keep in mind subtleties involved in oggdemux (e.g. chain stuff, compare for instance to new segment in handle_page involving chain->begin_time)
* afaics, manipulation/tricks are pretty local, i.e. confined to 1 (first?) packet.  That is, it seems all outgoing data has same time as before, except for the first buffer, which now has 0 (in stead of otherwise a 'negative one'), and a custom segment is sent to chop of the first part of that.  However, if one sticks to details, there will be a discontinuity with subsequent outgoing data (which appears not shifted in any way).

However, if all of the above is "good enough" for the intended here ...

Comment 17 Vincent Penquerc'h 2011-11-28 16:42:44 UTC

The first two points are right, but I think the third point is wrong, though given segments (especially in oggdemux) give me a hard time, I can't say I'm certain. I'd much rather do this in vorbisdec to be honest (and I can check if that's feasible in the new baseaudiodecoder vorbisdec, since it's been ported now AFAIK). Doing this in vorbisdec would avoid having to have yet another layer of stuff piled onto the segments usage here, which is already scary :/

Comment 18 Mark Nauwelaerts 2011-11-28 22:59:10 UTC

FWIW, fairly certain about 3rd point, since only out_timestamp is essentially affected (in addition to some segment stuff), and out_timestamp only affects the current packet, and since that one's timestamp is forcibly clamped/shifted to 0, it would create a ts discont with the subsequent packet [*].  segment data can only "affect by shift" and not modify a discont somewhere in between.

That is, the above is all assuming no trick stuff happens in a subsequent decoder, and there should be no(t much) there.  At least, performing some ts shifting is beyond/besides GstAudioDecoder's purview (or any decoder for that matter) though it seems the semantics here should not require shifting anyway.

So, trying to do this without segment stuff brings us back to Comment #6, with some ts tricks/hacks, which can probably be thought of and might be special-case handled in a vorbisdec pre_push method or so.


[*] well, probably not a ts discont considering ts and duration set on buffers with encoded data, but when decoded into samples the subsequent "real durations" lead to discont

Comment 19 Vincent Penquerc'h 2011-12-02 15:07:52 UTC

> So, trying to do this without segment stuff brings us back to Comment #6, with
> some ts tricks/hacks, which can probably be thought of and might be
> special-case handled in a vorbisdec pre_push method or so.

Why ?
As an idealized example, say you have a Vorbis stream, where the third page (which is the first data page, the first two being headers) consists of a single Vorbis packet. Let this stream be 48000 Hz, and contain 4800 samples. This packet will then be 0.1 seconds long, and that first page has a granpos 4800 if all samples have to be played. This will translate to a timestamp of 0.1 seconds.

Now, if there's a sample perfect cut to remove the first 0.05 seconds (half that first packet), then the granpos of that first page will be set to 2400. This will transate to a timestamp of 0.05, such that the notional ts of the first half of the sampes will be negative.

If the vorbis decoder does not output these sampes, but only the 2400 from the second half of the packet, everything wil match, since those "lost" sampes only have a notional ts. No negative ts to represent. And since the Vorbis stream is that way specificaly to cut off those first samples, it does make sense to me to not output them.

AFAICT, this is a good way to solve this, and certainly less scary than to deal with segments.

Same for Opus by the way - some samples at start are here only to prime the predictor, and should not be output.

Comment 20 Vincent Penquerc'h 2011-12-02 15:13:24 UTC

Oh, yes, oggmux does generate some of those notional ones.
Well, they ought not *leave* oggmux anyway, so should not be much troube.

Comment 21 Mark Nauwelaerts 2011-12-02 15:51:03 UTC

(In reply to comment #19)
> > So, trying to do this without segment stuff brings us back to Comment #6, with
> > some ts tricks/hacks, which can probably be thought of and might be
> > special-case handled in a vorbisdec pre_push method or so.
> 
> Why ?
> As an idealized example, say you have a Vorbis stream, where the third page
> (which is the first data page, the first two being headers) consists of a
> single Vorbis packet. Let this stream be 48000 Hz, and contain 4800 samples.
> This packet will then be 0.1 seconds long, and that first page has a granpos
> 4800 if all samples have to be played. This will translate to a timestamp of
> 0.1 seconds.

Seems to be some confusion going here w.r.t "timestamp" and "notional timestamp" whereas there is only 1 real ts and that's the one put on a buffer.
And in this case, this packet/buffer has ts 0.

> 
> Now, if there's a sample perfect cut to remove the first 0.05 seconds (half
> that first packet), then the granpos of that first page will be set to 2400.
> This will transate to a timestamp of 0.05, such that the notional ts of the
> first half of the sampes will be negative.

So, the only real one would have to be -0.05, and I consider clamping this one to 0 a (minor) ts trick/hack (to start with).  Also, how does vorbisdec detect it should drop these samples (preferably without introducing other problems or having it work only with oggdemux) ?

Comment 22 Mark Nauwelaerts 2011-12-03 11:21:24 UTC

Btw, there seems to be (quite) a difference with the Opus case.

In the Opus case, the samples are really meant/used as "setup data" (to prime the predictor) and afaik the decoder is (also) made well aware of this as the info is contained in the header metadata packets (and perhaps so also used by oggdemux to arrange for correspondingly proper ts).

In this case, however (to quote Comment #3), "this technique is used to allow
sample-granularity editing of the stream start time of already-encoded Vorbis
streams" which makes it very much like a simple version of qtdemux edit lists.  In particular, it is a container directed/run setup and afaik the decoder has no "intrinsic" information about it (as in: header metadata).

So, in that sense it only leaves ts tricks/hacks, like negative ts (or whatever, though doubtful if any other whatever will handle it in any robust way), if one tries to avoid the (qtdemux) segment approach.

Comment 23 Tim-Philipp Müller 2012-10-16 20:42:35 UTC

Out of curiosity: is this actually used in practice ?

Comment 24 Vincent Penquerc'h 2012-10-16 20:45:52 UTC

It is generated by the vorbiscut tool, I think. I don't really know if it's widely used, but on the other hand, the eternal problem is that if players don't grok it noone will use it, and if noone will use it players won't grok it :)

Comment 25 George Chriss 2013-10-15 02:37:49 UTC

CC'ing Bug #709965:
 vorbisdec: Does not put timestamps on first buffer sometimes

Comment 26 Sebastian Dröge (slomo) 2013-10-15 07:35:38 UTC

Vincent, can you check if this is indeed the same as the cause for bug #709965?

Also how should we move forward with this bug here? :) According to the comments above, the oggdemux patch is obsolete now.

Comment 27 Sebastian Dröge (slomo) 2013-10-15 07:38:02 UTC

*** Bug 709965 has been marked as a duplicate of this bug. ***

Comment 28 Sebastian Dröge (slomo) 2013-10-15 07:38:38 UTC

Yes it is the same problem, the other file is cut too and behaves exactly like described here.

Comment 29 GStreamer system administrator 2018-11-03 11:19:19 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/51.