Bug 677535 – isomp4/qtdemux: Add support for MPEG DASH tfdt box

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 677535 - isomp4/qtdemux: Add support for MPEG DASH tfdt box


Summary:	isomp4/qtdemux: Add support for MPEG DASH tfdt box


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	0.10.31
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	0.10.31
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2012-06-06 08:03 UTC by David Corvoysier
Modified:	2012-08-28 14:35 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
qtdemux DASH tfdt support (5.31 KB, application/octet-stream) 2012-06-06 08:03 UTC, David Corvoysier		Details
Properly formatted patch (with also a fix on pending segment start time) (8.01 KB, patch) 2012-06-22 14:45 UTC, David Corvoysier	none	Details \| Review
Updated-patch-without-irrelevant-modifications (7.17 KB, patch) 2012-06-25 08:23 UTC, David Corvoysier	none	Details \| Review
isomp4: Add DASH tfdt box support (5.77 KB, patch) 2012-08-14 09:58 UTC, David Corvoysier	committed	Details \| Review

Description David Corvoysier 2012-06-06 08:03:59 UTC

Created attachment 215710 [details]
qtdemux DASH tfdt support

MPEG DASH has defined a set of new boxes to specify duration, indexes and offsets of ISOBMFF fragments.

The Track Fragment Base Media Decode Time (tfdt) Box can in particular be included inside a traf box to specify the absolute decode time, measured on the media timeline, of the first sample in decode order in the track fragment. 

This information can be used by the qtdemux to find out the current position of an MP4 fragment in the timeline and assign the correct timestamp to the output buffers (the current code always assumes that the offset is zero).

This patch adds code for parsing and dumping tfdt boxes, and modifies qtdemux fragmented MP4 to add the tfdt decode time to the output buffers timestamps.

Comment 1 David Corvoysier 2012-06-06 08:11:49 UTC

The new MP4 boxes are described in a amendment of ISO/IEC 14496-12:2008 that
can be found there:

www.3gpp.org/ftp/Inbox/LSs_from_external_bodies/ISO_IEC_JTC1_SG29_WG11/29n12310.zip

Comment 2 David Corvoysier 2012-06-22 14:45:06 UTC

Created attachment 217030 [details] [review]
Properly formatted patch (with also a fix on pending segment start time)

Here is a properly formatted patch addressing the same issue.
I have also added a fix to the start time of the segment sent when the fragment starts playing.

Comment 3 David Corvoysier 2012-06-22 14:50:11 UTC

Sorry, two other modifications have crept in to allow event and queries to be forwarded upstream if the qtdemux cannot react/answer.

Comment 4 David Corvoysier 2012-06-25 08:23:40 UTC

Created attachment 217170 [details] [review]
Updated-patch-without-irrelevant-modifications

Here is the updated patch (without the two irrelevant changes)

Comment 5 Mark Nauwelaerts 2012-07-04 12:50:39 UTC

(In reply to comment #4)
> Created an attachment (id=217170) [details] [review]
> Updated-patch-without-irrelevant-modifications
> 
> Here is the updated patch (without the two irrelevant changes)

Looks like the decode time is only used at the very start of the stream (which could make sense since it should be sum of the previous durations according to spec otherwise), though one would expect the initial one to be 0 anyway unless the file in question is a partial one of some sort.

However, there is also a // style comment, and more importantly, the lower level parsing code should not be fiddling with segments, which are quite intricate in qtdemux (in view of possible edit lists etc).  And this segment is one only used in push mode, and is a global one that is being adjusted on stream specific values.  Also, adjusting the segment that way basically mutes the 'decode time' effect altogether; generally it makes no real difference to send buffers with ts starting at 0 (and a segment as well) or sending buffers with ts starting at decode_time (and a segment as well).

All in all, makes it feel this patch has some very specific (push-based) setting/context/use-case in mind above and beyond merely support for tfdt in qtdemux.  If so, then e.g. adjusting the push-based segment elsewhere (e.g. in chain) based on the first buffer's timestamp or so might be a better approach.

Comment 6 David Corvoysier 2012-07-05 10:00:14 UTC

(In reply to comment #5)
> 
> However, there is also a // style comment, and more importantly, the lower
> level parsing code should not be fiddling with segments, which are quite
> intricate in qtdemux (in view of possible edit lists etc).  And this segment is
> one only used in push mode, and is a global one that is being adjusted on
> stream specific values.  Also, adjusting the segment that way basically mutes
> the 'decode time' effect altogether; generally it makes no real difference to
> send buffers with ts starting at 0 (and a segment as well) or sending buffers
> with ts starting at decode_time (and a segment as well).

The current trend (see HLSdemux) for dealing with fragmented (adaptive) content is to create a new decoding chain every time a new bitrate is selected. For DASH, this means that every time a new representation is selected, a new isomp4 demux is instantiated, with a stream composed of a moov box provided by the DASH initialization segment, and followed by several consecutive moof boxes starting at a specific offset (typically _not_ zero).

Without parsing the tfdt box, the qtdemux will indeed start ts and segment as zero, and the playback will not be blocked. However, and it is maybe a bug, the downstream chain will lose the knowledge of the current playback position.     

> 
> All in all, makes it feel this patch has some very specific (push-based)
> setting/context/use-case in mind above and beyond merely support for tfdt in
> qtdemux.  If so, then e.g. adjusting the push-based segment elsewhere (e.g. in
> chain) based on the first buffer's timestamp or so might be a better approach.

Would you mean fixing the segment right after the transition to QTDEMUX_STATE_MOVIE and before actually sending it ?

Comment 7 Mark Nauwelaerts 2012-07-23 14:17:32 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > 
> > However, there is also a // style comment, and more importantly, the lower
> > level parsing code should not be fiddling with segments, which are quite
> > intricate in qtdemux (in view of possible edit lists etc).  And this segment is
> > one only used in push mode, and is a global one that is being adjusted on
> > stream specific values.  Also, adjusting the segment that way basically mutes
> > the 'decode time' effect altogether; generally it makes no real difference to
> > send buffers with ts starting at 0 (and a segment as well) or sending buffers
> > with ts starting at decode_time (and a segment as well).
> 
> The current trend (see HLSdemux) for dealing with fragmented (adaptive) content
> is to create a new decoding chain every time a new bitrate is selected. For
> DASH, this means that every time a new representation is selected, a new isomp4
> demux is instantiated, with a stream composed of a moov box provided by the
> DASH initialization segment, and followed by several consecutive moof boxes
> starting at a specific offset (typically _not_ zero).
> 
> Without parsing the tfdt box, the qtdemux will indeed start ts and segment as
> zero, and the playback will not be blocked. However, and it is maybe a bug, the
> downstream chain will lose the knowledge of the current playback position.     

Note that position tracking is done by the 'time/position' member in a segment (event), so rather than a segment (S, NONE, S) and buffer times starting at S, it is also possible (and maybe easier/cleaner) to use (0, NONE, S) and buffer times starting at 0.

> 
> > 
> > All in all, makes it feel this patch has some very specific (push-based)
> > setting/context/use-case in mind above and beyond merely support for tfdt in
> > qtdemux.  If so, then e.g. adjusting the push-based segment elsewhere (e.g. in
> > chain) based on the first buffer's timestamp or so might be a better approach.
> 
> Would you mean fixing the segment right after the transition to
> QTDEMUX_STATE_MOVIE and before actually sending it ?

(indeed) Somewhere around when _chain has the buffer timestamp at hand (and if it is the one of the first buffer needing a newsegment before sending the buffer).

However, in combination with above remark it should suffice to have atom parsing dump the tfdt decode/position in some state variable, which is then used as the position parameter for the (initial) newsegment event (created in the _STATE_HEADER handling).

Comment 8 David Corvoysier 2012-08-14 09:58:20 UTC

Created attachment 221121 [details] [review]
isomp4: Add DASH tfdt box support

Comment 9 David Corvoysier 2012-08-14 09:59:05 UTC

> Note that position tracking is done by the 'time/position' member in a segment
> (event), so rather than a segment (S, NONE, S) and buffer times starting at S,
> it is also possible (and maybe easier/cleaner) to use (0, NONE, S) and buffer
> times starting at 0.

...

> However, in combination with above remark it should suffice to have atom
> parsing dump the tfdt decode/position in some state variable, which is then
> used as the position parameter for the (initial) newsegment event (created in
> the _STATE_HEADER handling).

Thanks for pointing that out: this is makes indeed more sense adjusting only playback position (and it is alos simpler). I have still however a few questions about how it could be implemented.

First, where would it make more sense to store the parsed decode/position ? In the qtdemux object as a new member ?

Second, the new segment event is currently created when the moov has been parsed, but the tfdt is retrieved from the moof, leaving two options:
A - Move event creation code so that it executes after the moof has been parsed
B - Leave event creation code as is and Adjust time/position when the moov has been parsed.
Which option would you suggest ?

I have attached a temporary updated patch where I implement option B, but still leaving the event adjustment in the low-lvel parsing code (which I agree is bad design).

Comment 10 Mark Nauwelaerts 2012-08-28 14:34:54 UTC

Not particularly overjoyed with having the segment stuff in the lower levels, but I guess it will do rather than messing around with such details ...

commit 3dd40e3127ef0fabe062c5dfbdb7012f4967a82b
Author: David Corvoysier <david.corvoysier@orange.com>
Date:   Tue Jul 3 17:50:24 2012 +0200

    isomp4: add DASH tfdt box support
    
    MPEG DASH has defined a set of new boxes to specify duration, indexes and
    offsets of ISOBMFF fragments.
    
    The Track Fragment Base Media Decode Time (tfdt) Box can in particular be
    included inside a traf box to specify the absolute decode time, measured on the
    media timeline, of the first sample in decode order in the track fragment.
    
    This information can be used by the isomp4 demux to find out the current position of
    an MP4 fragment in the timeline.
    
    This patch adds code to isomp4 to:
    - parse the tfdt box
    - adjust the time/position member of the new segment sent when playback starts
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=677535