Bug 777984 - isomp4: initial ftyp/moov streamheader missing
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: git master
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on: 668091
Blocks:
Reported: 2017-01-31 13:20 UTC by Andreas Frisch
Modified: 2018-11-03 15:16 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
WIP isomp4: experiment on comitting streamheaders correctly (3.84 KB, patch)
2017-02-09 09:49 UTC, Andreas Frisch

Description Andreas Frisch 2017-01-31 13:20:12 UTC
i've tried to join, at random points, a stream generated with mp4dashmux from
https://bugzilla.gnome.org/show_bug.cgi?id=668091 in order to play it in the browser using MSE. unfortunately this failed; i was using a tcpserversink element and tried the various sync-methods. looking at the captured streams i discovered that the global headers are missing.

a stream should start with:
0000 0000: 00 00 00 1C 66 74 79 70  69 73 6F 35 00 00 00 01  ....ftyp iso5....  
0000 0010: 69 73 6F 35 69 73 6F 32  64 61 73 68 00 00 03 45  iso5iso2 dash...E  
0000 0020: 6D 6F 6F 76 00 00 00 6C  6D 76 68 64 00 00 00 00  moov...l mvhd....  

the randomly entered streams only start with:
0000 0000: 00 00 00 14 73 74 79 70  6D 73 64 68 00 00 00 00  ....styp msdh....  
0000 0010: 6D 73 64 68 00 00 00 60  6D 6F 6F 66 00 00 00 10  msdh...` moof....  
0000 0020: 6D 66 68 64 00 00 00 00  00 00 00 6B 00 00 00 48  mfhd.... ...k...H  
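
For reference, the box layout in the two dumps can be read off mechanically. The following is a minimal Python sketch, not part of GStreamer, that walks the 32-bit size / 4-character type headers of the bytes quoted above. The helper name top_level_boxes is made up for illustration, and the sketch assumes plain 32-bit box sizes (it stops at size 0 or 1).

```python
import struct

def top_level_boxes(data):
    """Yield (type, size) for every top-level box whose 8-byte header is present.

    Box bodies may be truncated, as they are in the hexdumps above.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            # size 0 (box runs to end of file) and 1 (64-bit size) are not handled here
            break
        yield box_type.decode("ascii", "replace"), size
        offset += size

# first 48 bytes of a stream captured from the beginning
expected_start = bytes.fromhex(
    "0000001C 66747970 69736F35 00000001 "
    "69736F35 69736F32 64617368 00000345 "
    "6D6F6F76 0000006C 6D766864 00000000"
)
# first 48 bytes of a randomly entered stream
random_entry = bytes.fromhex(
    "00000014 73747970 6D736468 00000000 "
    "6D736468 00000060 6D6F6F66 00000010 "
    "6D666864 00000000 0000006B 00000048"
)

print(list(top_level_boxes(expected_start)))
print(list(top_level_boxes(random_entry)))
```

The first dump begins with ftyp followed by moov, the second with styp followed by moof, so a client that joins late never sees the initialization boxes.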


qtmux does implement some streamheader handling, but it seems to be missing something for my use case. i'll investigate.
Comment 1 Andreas Frisch 2017-01-31 13:26:08 UTC
with GST_DEBUG=*:5 on my test pipeline
gst-launch-1.0 videotestsrc pattern=ball is-live=1 ! timeoverlay ! video/x-raw,framerate=30/1,width=1280,height=720 ! videoconvert ! x264enc speed-preset=superfast tune=zerolatency key-int-max=1 b-adapt=0 option-string=scenecut=0 ! video/x-h264,profile=main ! mp4dashmux streamable=true faststart=true fragment-duration=500 ! tcpserversink host=127.0.0.1 port=9001 sync-method=1

the only occurrences of "streamheader" in the debug output are:
0:00:04.225773980 27975 0x55b2d08d6f20 DEBUG        multihandlesink gstmultihandlesink.c:1042:gst_multi_handle_sink_client_queue_buffer:<tcpserversink0> [socket 0x55b2d08f9bb0] no previous caps for this client, send streamheader
0:00:04.225785091 27975 0x55b2d08d6ca0 DEBUG               basesink gstbasesink.c:2005:gst_base_sink_get_sync_times:<tcpserversink0> got times start: 99:99:99.999999999, stop: 99:99:99.999999999, do_sync 0
0:00:04.225799713 27975 0x55b2d08d6f20 DEBUG        multihandlesink gstmultihandlesink.c:1103:gst_multi_handle_sink_client_queue_buffer:<tcpserversink0> [socket 0x55b2d08f9bb0] no new streamheader, so nothing to send

so apparently qtmux doesn't set them at all.
Comment 2 Andreas Frisch 2017-02-09 09:49:18 UTC
Created attachment 345297 [details] [review]
WIP isomp4: experiment on comitting streamheaders correctly

this is a first attempt at writing all necessary stream headers.
now the ftyp atom will also be prepended for newly connected tcp clients.
the hexdump already looks a lot better this way.
the resulting files are playable with vlc, but the browser doesn't start rendering yet (with MSE), so apparently something else is still missing.
it may have to do with the codec data of the h264 stream, since that may only get written once in the first fragment (just a quick guess from looking at the hexdump).
Comment 3 Andoni Morales 2017-02-09 10:52:15 UTC
I think that should be handled in this patch https://bugzilla.gnome.org/show_bug.cgi?id=706509
Comment 4 Andreas Frisch 2017-02-09 11:18:26 UTC
unfortunately not, i've already applied that patch (and it did actually fix playback from the very beginning of the stream, but not from random entry points)
Comment 5 Andoni Morales 2017-02-09 12:02:03 UTC
You can't stream the output of mp4dashmux to a client through tcpserversink: that won't be DASH, just a fragmented MP4 stream. For that you need dashsink, which takes several streams and creates DASH content (the MPD and the files/segments for each stream) that a web client can consume as DASH.
So the correct way would be to use dashsink to produce the DASH content, write it to the filesystem, and use a regular HTTP server to serve the content exposed in the MPD to the client.
Comment 6 Andreas Frisch 2017-02-09 12:07:45 UTC
i want to achieve something similar to what dashsink does, but i don't want separate fragment files or the manifest, because this is not supposed to serve as VOD. instead i want to use it as a direct transport of a low latency live stream to one or a few web browser mse clients via websocket.
what else is dashsink doing to allow randomly entering at any fragment? does it prepend an initialization fragment with the necessary codec data and so on, or is every fragment playable by itself?
Comment 7 Andoni Morales 2017-02-09 14:09:50 UTC
You can stream streamable formats like matroska, flv or mpeg-ts with tcpserversink because:
  1) the format provides inband information about the frame boundaries
  2) tcpserversink starts new connections by sending the streamheaders
  3) tcpserversink syncs on a keyframe for seeking.
And that's all you need on the client side to start playback with these formats.

With mp4, there is no inband information about frame boundaries; it's only available in the container headers, in the stream index. With fragmented mp4 the whole header is split into parts: the stream descriptions at the beginning, then for each segment a moof atom with the stream index for that range. So for an MP4 segment to be self-contained you need the streamheaders containing the moov atom (stream info and codec data) to initialize both the demuxer and the decoder, and for each segment the moof atom with the index for that particular segment.
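
To make the layout described above concrete, here is a small Python sketch that splits a fragmented-MP4 byte string into the initialization part (everything before the first fragment, i.e. ftyp and moov) and the individual fragments. It is only an illustration under simplifying assumptions: every fragment is assumed to start with a styp box, as in the hexdump in the description, box sizes are plain 32-bit values, and the boxes built here are synthetic placeholders rather than valid MP4 content. The names box and split_stream are made up.

```python
import struct

def box(box_type, payload=b""):
    """Build a minimal box: 32-bit big-endian size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def split_stream(data):
    """Return (init_segment, fragments) for a complete fragmented-MP4 byte string."""
    init, fragments, offset = b"", [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            break  # truncated or unsupported box size: stop parsing
        chunk = data[offset:offset + size]
        if box_type == b"styp":
            fragments.append(chunk)      # a styp box opens a new fragment
        elif fragments:
            fragments[-1] += chunk       # moof, mdat, ... belong to the open fragment
        else:
            init += chunk                # boxes before the first fragment (ftyp, moov)
        offset += size
    return init, fragments

# synthetic stream: init boxes followed by two fragments
init = box(b"ftyp", b"iso5") + box(b"moov")
frag1 = box(b"styp", b"msdh") + box(b"moof") + box(b"mdat", b"frame-1")
frag2 = box(b"styp", b"msdh") + box(b"moof") + box(b"mdat", b"frame-2")

init_out, fragments = split_stream(init + frag1 + frag2)
# what a client entering at the second fragment would need to receive
late_joiner = init_out + fragments[-1]
print(len(init_out), len(fragments), len(late_joiner))
```

In these terms, a randomly entered stream as captured in the description corresponds to fragments[-1] on its own, without init_out in front of it.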

In your client implementation you will probably have to do the demuxing yourself, parsing the moof for each segment to split the input stream into frames to feed into the decoder using the MSE API.
Comment 8 Sebastian Dröge (slomo) 2017-02-09 14:28:47 UTC
With fragmented MP4 you can make it work with tcpserversink if:
a) only the fragment start is marked as keyframe (start of moof)
b) the streamheaders contain the general headers (moov)
c) tcpserversink always sends streamheaders to a new client and then starts from the last (or next) keyframe (i.e. fragment, start of moof)

AFAIU that's what Andreas is trying to do here. That's not too different from how DASH works, there you also have the moov separately (or at the very beginning of the first fragment) and then each segment is moof+mdat
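
The three conditions above can be modelled in a few lines of Python. This is a toy model of the behaviour being described, not tcpserversink's actual code: the Buffer class, its keyframe flag and the function name are hypothetical, and the byte strings are placeholders for real boxes.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    data: bytes
    keyframe: bool  # (a) True only for the buffer that starts a fragment (moof)

def data_for_new_client(streamheaders, history):
    """Bytes a newly connected client receives in this model.

    (b) the streamheaders carry the general headers (ftyp, moov)
    (c) they are sent first, followed by everything from the last keyframe buffer
    """
    headers = b"".join(streamheaders)
    keyframe_indexes = [i for i, buf in enumerate(history) if buf.keyframe]
    if not keyframe_indexes:
        return headers  # no fragment start seen yet, nothing to sync on
    start = keyframe_indexes[-1]
    return headers + b"".join(buf.data for buf in history[start:])

streamheaders = [b"<ftyp>", b"<moov>"]
history = [
    Buffer(b"<moof 1>", True), Buffer(b"<mdat 1>", False),
    Buffer(b"<moof 2>", True), Buffer(b"<mdat 2>", False),
]
print(data_for_new_client(streamheaders, history))
```

If the mdat buffers were also flagged as keyframes, the client in this model could be started in the middle of a fragment, which is why condition (a) matters.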
Comment 9 Andreas Frisch 2017-02-09 14:35:43 UTC
right, that's what i want to do :) 
if tcpserversink knows the correct streamheaders to prepend at each client's stream entry point, no additional parsing/demuxing on the client side is necessary, because the result won't look any different from a dash stream that starts from the beginning (and those already play correctly)

so now i just need to get these streamheaders right i guess
Comment 10 Andoni Morales 2017-02-09 15:03:32 UTC
If I am correct, DASH implementations in the client like [1] implement the demuxing themselves; you can't feed the MSE API with streams, just with encoded frames. So fragmented MP4 playback as-is in a browser is probably not implemented, which is why I was suggesting demuxing and pushing frames to the MSE API.

From what Andreas showed in the first message, tcpserversink is already doing the correct thing, his message was:

the randomly entered streams only start with:
0000 0000: 00 00 00 14 73 74 79 70  6D 73 64 68 00 00 00 00  ....styp msdh....

This means that for seeks it starts pushing segments from the header. If you combine that with my patch that inserts the missing atoms in the muxer streamheaders, you should have everything you need to do playback on the client side.

[1] https://github.com/Dash-Industry-Forum/dash.js/wiki
Comment 11 Andreas Frisch 2017-02-09 15:14:15 UTC
Andoni, on the client side i'm actually using a really simple javascript that does nothing but request the payload from a websocket, call addSourceBuffer('video/mp4; codecs="avc3.4d401e, mp4a.40.2"'); and then call sourceBuffer.appendBuffer for each chunk of data received, without touching the payload at all.
however, this requires the stream to begin with proper headers, the way a pipeline like gst-launch videotestsrc pattern=ball is-live=1 ! timeoverlay ! video/x-raw,framerate=30/1,width=1280,height=720 ! videoconvert ! x264enc speed-preset=superfast tune=zerolatency key-int-max=1 b-adapt=0 option-string=scenecut=0 ! video/x-h264,profile=main ! mp4dashmux streamable=true faststart=true fragment-duration=500 ! filesink
(on current master) generates it. so the browser takes an mp4 stream and demuxes it itself, but it's very picky about the format. chrome seems to be a bit pickier than firefox
Comment 12 Andoni Morales 2017-02-10 08:52:21 UTC
Understood now. From your first comment I thought that mp4dashmux was already syncing correctly on moof atoms. As Sebastian already commented, you probably need to mark the beginning of fragments as key units, or add a new sync method in tcpserversink to sync on GstForceKeyUnit, since these in-band events are forwarded from the muxer downstream.

thiago ported most of the commits in that branch: https://cgit.freedesktop.org/~thiagoss/gst-plugins-good/log/?h=dashsink
Looking at it I see there is another patch needed to insert tfdt in moof atoms, make sure you have it too.

Most of the work might have already been merged upstream... who knows, all my work is more than 5 years old; at that time nobody cared and now I don't care anymore :)
Comment 13 Andreas Frisch 2017-02-10 13:32:51 UTC
i've rebased thiago's whole branch onto current git master, which includes e.g. that tfdt-into-moof patch and a few other things.
i've observed that with my test pipeline of 

gst-launch-1.0 videotestsrc pattern=ball is-live=1 ! timeoverlay ! video/x-raw,framerate=30/1,width=1280,height=720 ! videoconvert ! x264enc speed-preset=superfast tune=zerolatency key-int-max=1 b-adapt=0 option-string=scenecut=0 ! video/x-h264,profile=main ! mp4dashmux streamable=true faststart=true fragment-method=event ! fakesink silent=false -v

there are no buffers going into the sink. with method=time, dataflow is constant
shouldn't the keyframes issued by the encoder every second trigger new fragments?
Comment 14 Andoni Morales 2017-02-10 13:41:10 UTC
You are using fragment-method=event without a downstream element properly issuing GstForceKeyUnit events. This fragmentation method is used when you have a sink like dashsink driving the pipeline to generate new fragments for all upstream elements at the same timestamp. In your case you will have to switch to time-based fragmentation, each second for example. You will also need to configure your encoder to create keyframes at a regular interval, which is what the GstForceKeyUnit event does; since nothing issues that event in your pipeline, make sure that your encoder forces a keyframe at regular intervals, ideally matching your fragment duration.
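
Matching the encoder's keyframe interval to the fragment duration is simple arithmetic. A small Python sketch, assuming that key-int-max is counted in frames and fragment-duration in milliseconds (as the values used in the pipelines in this report suggest); the function name is hypothetical.

```python
from fractions import Fraction

def keyframe_interval_for_fragment(fps_num, fps_den, fragment_duration_ms):
    """Number of frames per fragment, usable as the encoder's keyframe interval.

    Raises ValueError when the fragment duration is not a whole number of
    frames, because keyframes and fragment boundaries would then drift apart.
    """
    frames = Fraction(fps_num, fps_den) * Fraction(fragment_duration_ms, 1000)
    if frames.denominator != 1:
        raise ValueError(
            f"{fragment_duration_ms} ms is not a whole number of frames "
            f"at {fps_num}/{fps_den} fps"
        )
    return int(frames)

# framerate=30/1 as in the test pipeline
print(keyframe_interval_for_fragment(30, 1, 500))
print(keyframe_interval_for_fragment(30, 1, 1000))
```

With framerate=30/1 this gives 15 frames for a 500 ms fragment and 30 frames for a one-second fragment.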
Comment 15 Andoni Morales 2017-02-10 13:44:25 UTC
A good improvement to mp4dashmux would be to add a new fragment-method=keyunit that issues fragments for each input keyframe.
Comment 16 Andreas Frisch 2017-02-10 14:06:20 UTC
yes andoni, that sounds like a good idea. i've been wondering how to properly sync keyframe generation by the encoder and fragment splitting.
but currently the point is that the streams issued by tcpserversink apparently aren't self-contained enough for my mse browser client to play them independently, because some headers are still missing.
Comment 17 Sebastian Dröge (slomo) 2017-02-10 14:19:55 UTC
Time-based fragment generation at the next possible keyframe seems more useful. Also, this should probably just be merged somehow with splitmuxsink, which already handles all these complications.
Comment 18 Andoni Morales 2017-02-10 14:44:35 UTC
(In reply to Sebastian Dröge (slomo) from comment #17)
> Time-based fragment generation at the next possible keyframe seems more
> useful. Also this should probably be just somehow be merged with
> splitmuxsink, which already does all these complications.

That's exactly what this commit in the same branch was doing :) https://cgit.freedesktop.org/~thiagoss/gst-plugins-good/commit/?h=dashsink&id=2daec4378f14eafaf8ed360f78719ccb3a5ae1dd
Comment 19 Andreas Frisch 2017-02-13 14:12:01 UTC
i've noticed that when checking the directly saved file with mediainfo

General
Complete name                            : smpte-min-max-keyint-30-fragment-duration-1000-direct.mp4
Format                                   : iso5
Codec ID                                 : iso5 (iso5/iso2/dash)
File size                                : 160 KiB
Duration                                 : 33 s 333 ms
Overall bit rate                         : 39.4 kb/s
Encoded date                             : UTC 2017-02-13 13:12:20
Tagged date                              : UTC 2017-02-13 13:12:20

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3.1
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 1 frame
Format settings, GOP                     : M=1, N=30
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 33 s 333 ms
Bit rate                                 : 37.2 kb/s
Nominal bit rate                         : 2 048 kb/s
Width                                    : 1 280 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.001
Stream size                              : 151 KiB (94%)
Writing library                          : x264 core 148 r2708 86b7198
Encoding settings                        : cabac=1 / ref=1 / deblock=1:0:0 / analyse=0x1:0x1 / me=dia / subme=1 / psy=1 / psy_rd=1,00:0,00 / mixed_ref=0 / me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=0 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=4 / lookahead_threads=4 / sliced_threads=1 / slices=4 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=0 / weightp=1 / keyint=30 / keyint_min=16 / scenecut=0 / intra_refresh=0 / rc_lookahead=0 / rc=cbr / mbtree=0 / bitrate=2048 / ratetol=1,0 / qcomp=0,60 / qpmin=0 / qpmax=69 / qpstep=4 / vbv_maxrate=2048 / vbv_bufsize=1228 / nal_hrd=none / filler=0 / ip_ratio=1,40 / aq=1:1,00
Language                                 : English
Encoded date                             : UTC 2017-02-13 13:12:20
Tagged date                              : UTC 2017-02-13 13:12:20
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709

against the stream from tcpserversink
General
Complete name                            : min-max-keyint-30-fragment-duration-1000-entry-0.mp4
Format                                   : iso5
Codec ID                                 : iso5 (iso5/iso2/dash)
File size                                : 32.9 KiB
Duration                                 : 7 s 0 ms
Overall bit rate                         : 38.5 kb/s
Encoded date                             : UTC 2017-02-13 13:13:31
Tagged date                              : UTC 2017-02-13 13:13:31

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Main@L3.1
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 1 frame
Format settings, GOP                     : M=1, N=30
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 7 s 0 ms
Bit rate                                 : 35.5 kb/s
Width                                    : 1 280 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 16:9
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.001
Stream size                              : 30.4 KiB (92%)
Language                                 : English
Encoded date                             : UTC 2017-02-13 13:13:31
Tagged date                              : UTC 2017-02-13 13:13:31
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709


that the Writing library and the Encoding settings are missing in the latter case. that's what i had meant above when looking at the hexdump. apparently x264enc writes that only once at the beginning of the stream.
could it be that the browser wants to know about that, or some other initialization sequence from the h264 stream before starting to render?
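
One thing that can be cross-checked from the mediainfo output above is the codecs string passed to addSourceBuffer in comment 11. Assuming the usual RFC 6381 layout for AVC (sample entry name, then profile_idc, constraint flags and level_idc as two hex digits each), the following Python sketch decodes that string and builds the one that would correspond to "Codec ID: avc1" and "Main@L3.1". The helper names are made up, and the constraint flags value 0x40 is simply copied from the string in comment 11 as an assumption, since mediainfo does not report it.

```python
def parse_avc_codecs_string(value):
    """Split e.g. 'avc3.4d401e' into (sample_entry, profile_idc, constraint_flags, level_idc)."""
    sample_entry, _, hex_part = value.partition(".")
    raw = bytes.fromhex(hex_part)
    if len(raw) != 3:
        raise ValueError(f"expected three hex bytes after the dot in {value!r}")
    return sample_entry, raw[0], raw[1], raw[2]

def avc_codecs_string(sample_entry, profile_idc, constraint_flags, level_idc):
    """Build the codecs string from its four components."""
    return f"{sample_entry}.{profile_idc:02x}{constraint_flags:02x}{level_idc:02x}"

# string used by the javascript client in comment 11
requested = parse_avc_codecs_string("avc3.4d401e")
print(requested)

# Main profile is profile_idc 77, level 3.1 is level_idc 31;
# 0x40 is an assumed constraint flags byte (see above)
from_mediainfo = avc_codecs_string("avc1", 77, 0x40, 31)
print(from_mediainfo)
```

The string in comment 11 decodes to sample entry avc3, Main profile, level 3.0, while the files analysed above are reported as avc1, Main, level 3.1. Whether that difference matters to a particular browser is not established in this report; it is only something that may be worth ruling out.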
Comment 20 GStreamer system administrator 2018-11-03 15:16:15 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/issues/345.