GNOME Bugzilla – Bug 698853
HTML5 kind attribute (subtitles, captions, metadata, descriptions, chapters)
Last modified: 2018-11-03 12:18:17 UTC
In HTML5, we need to mark text streams as subtitles, captions, metadata, descriptions or chapters, but there doesn't seem to be a way to do this in GStreamer. For audio and video tracks, there are also kinds: alternative, captions (legacy), description, main, main-desc, sign, subtitles (legacy), translation, and commentary. For metadata, the caps will probably be different, but for everything else the caps would be the same. The simplest solution seems to be to add a GST_TAG_KIND, which can contain any of the kinds defined in HTML5. See: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#text-track-kind http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#dom-audiotrack-kind
Created attachment 250142
Add GST_TAG_KIND

Step 1 (this patch): Add GST_TAG_KIND to gstreamer.
Step 2: Use it in gst-plugins-*.
I'm not too keen on the name, maybe TRACK_KIND or KIND_OF_TRACK - all not pretty. If the list is fairly limited and not freeform, we could just make it stream-start flags (overloaded for audio/video/text), but there's probably some mpeg spec out there which knows at least 64 different types of tracks. Guess the tag is more extensible.
(In reply to comment #2)
> I'm not too keen on the name, maybe TRACK_KIND or KIND_OF_TRACK - all not
> pretty.

TRACK_KIND is reasonable. Maybe HTML_TRACK_KIND?

> If the list is fairly limited and not freeform, we could just make it
> stream-start flags (overloaded for audio/video/text),

I'll look into that. It would be nice if it was enum-y.

> there's probably some
> mpeg spec out there which knows at least 64 different types of tracks. Guess
> the tag is more extensible.

I'd say that should be a separate tag, sort of like how there's multiple LANGUAGE_* tags depending on what exactly you want.
(In reply to comment #2) > If the list is fairly limited and not freeform, we could just make it > stream-start flags (overloaded for audio/video/text), I tried some Google searches, but I'm not seeing anything about stream-start flags. What are they?
http://cgit.freedesktop.org/gstreamer/gstreamer/tree/gst/gstevent.h#n371
We could also add that to stream-start as a GQuark or string or an array of these, nothing preventing that. There's some overlap with the flags though. Can a stream have multiple "kinds"? But I also think a tag is better for this, feels more natural.
(In reply to comment #6) > Can a stream have multiple "kinds"? No, just one. > But I also think a tag is better for this, feels more natural. I looked at GstStreamFlags and it seems to fit in one sense, that the stream's "kind" really shouldn't change. I think tags fit better though, since you can't have more than one kind. I was wondering: Would it make sense to make it hold an enum? The only thing I would worry about is that the standard isn't finished, so we may need to add to the enum (or worse, remove values from it?).
Created attachment 250211
Name it GST_TAG_TRACK_KIND and register the new tag

Updated the patch to rename it GST_TAG_TRACK_KIND and properly register the tag (I didn't notice this last time). I made it a string for now.
Tim, what do you think? Somehow it fits in stream-start, but as a tag it feels more correct... can't say why though :)
I can live with either. I suppose it also depends a bit on when it's needed. But we can always put tags into the stream-start event as well if needed ;)
Brendan, what exactly are you doing with the track kind? What's your use case?
Review of attachment 250211:

::: gst/gsttaglist.c
@@ +389,3 @@
       _("How the image should be rotated or flipped before display"), NULL);
+  gst_tag_register_static (GST_TAG_TRACK_KIND, GST_TAG_FLAG_META,
+      G_TYPE_STRING, _("track kind"), _("The HTML5 track kind"), NULL);

I think if we name it *HTML5* track kind it should be a tag and inside libgsttag instead. If this should be a more generic way of specifying what a track semantically is, it should be in libgstreamer directly. Also, as this has audio/video specific notions, maybe it should be in gst-plugins-base anyway?
I'm not sure if it should be html5-specific. We want almost-identical signalling in other cases too (dvd/blu-ray/mpeg-ts/etc.)
(In reply to comment #11) > Brendan, what exactly are you doing with the track kind? What's your use case? I need it so WebKit can determine the HTML5 track kind, so I would basically read the tag, make sure it's a valid value, and then set it as an attribute on the TextTrack object. The most important part is that it's available to JavaScript, which means it's hard to predict how the attribute will be used. WebKit will use it internally for some things though (there's a "CC" button for turning captions on, metadata tracks don't need to be parsed, "descriptions" should be read by a text-to-speech engine, "chapters" need to be displayed differently). Also, different "kinds" will likely be presented differently in user interfaces.
(In reply to comment #13) > I'm not sure if it should be html5-specific. We want almost-identical > signalling in other cases too (dvd/blu-ray/mpeg-ts/etc.) I think that may be a separate issue, unless it's possible to determine the HTML5 kind from those other signals. Some of the kinds are very similar though, like "captions", "subtitles", "descriptions", "chapters". They're all text, but the user agent needs to be able to tell them apart somehow.
Another option might be to have a more complete set of "kinds" that covers everything we could possibly want, with a gst_tag_list_get_html5_track_kind() function that converts to one of the valid HTML5 track kinds (like the functions in gsttaglanguagecodes).
I think I prefer the last a bit (conversion from tag to HTML5 track kind), it doesn't feel right to add multiple different tags just for different use cases that do basically the same. However I would like to see this in gst-plugins-base then because it's media specific.
(In reply to comment #17) > However I would like to see this in gst-plugins-base then because it's media > specific. The "kind" isn't specific to any media type. Any audio, video or text stream should have one.
Yes, but their values can be media type specific, e.g. subtitle.
I'm willing to help write a patch, but I'm not sure what the tag should be named or what the possible values should be if this is going to encapsulate more than just HTML5. :\
IMHO we can have this as a generic "track-kind" tag with the HTML5 values as a start, and others added later if needed. It just all feels not very well defined.
I tried adding GST_TAG_TRACK_KIND to gst-plugins-base/gst-libs/gst/tag/tag.h, but if I try to #include that in gstssaparse.c, it can't find it.

gstssaparse.c:29:25: fatal error: gst/tag/tag.h: No such file or directory
 #include <gst/tag/tag.h>

Also, I have two questions: Should the kinds be an enum or a string, and should there be a gst_tag_get_html_track_kind() function? At the moment, all it could do is validate that the kind is one of the listed ones. I'm not sure if we could make it return an answer based on the type of track it's on ("subtitles" isn't valid on an audio track, but how do we know we're looking at an audio track?).
(In reply to comment #22)
> I tried adding GST_TAG_TRACK_KIND to gst-plugins-base/gst-libs/gst/tag/tag.h,
> but if I try to #include that in gstssaparse.c, it can't find it.
>
> gstssaparse.c:29:25: fatal error: gst/tag/tag.h: No such file or directory
> #include <gst/tag/tag.h>

You need to add $(GST_PLUGINS_BASE_CFLAGS) to the CFLAGS, and $(GST_PLUGINS_BASE_LIBS) -lgsttag-@GST_API_VERSION@ to the LIBS.

> Also, I have two questions: Should the kinds be an enum or a string, and should
> there be a gst_tag_get_html_track_kind() function? At the moment, all it could
> do is validate that the kind is one of the listed ones. I'm not sure if we
> could make it return an answer based on the type of track it's on ("subtitles"
> isn't valid on an audio track, but how do we know we're looking at an audio
> track?).

I'd make it an enum
> > Also, I have two questions: Should the kinds be an enum or a string > > I'd make it an enum I think we have historically not used enums for tags (for reasons I don't remember exactly)?
(De-)serialization maybe?
Related to this: > should there be a gst_tag_get_html_track_kind() function? It was suggested that there are other kinds, in Bluray for example, but I don't know what these kinds are, or if they should be defined here. If we want this to handle more than just the HTML5 kinds, it would probably need to be done by someone who understands the various formats better than I do.
Ogg has a pretty good list of "roles", which is a superset of this: https://wiki.xiph.org/SkeletonHeaders#Role
Created attachment 255144
Add GST_TAG_TRACK_KIND

Here's a patch that just adds the track kind tag. I have conflicting feelings about the Ogg kinds. I'm guessing we would want a separate tag for that anyway, since they can be any string (although it's recommended to use something from the list). I'd like to add support for kinds in Ogg, but it's not clear to me how to do that in gstoggstream.c. I'll look at it more tomorrow.
I'm still not really keen on introducing enum types to tag lists. Please hold off with committing this for now (just in case anyone was going to do that).
Created attachment 255262
Kind tag as a string, and add Ogg Skeleton support as an example

Here's a version that uses strings, and I added the code to parse the "Role" header from Ogg Skeleton, so we'll have something that actually uses this. I still feel like putting the kind tag in plugins-base is weird, since it's a similarly generic tag to "language" and "title".

While I was writing this, I found it tempting to create a GST_TAG_ROLE instead, and have a special function mapping that to HTML5 roles, but I'm not sure if the various formats agree on the roles that Ogg Skeleton uses (especially since Ogg Skeleton roles can be anything, they're just recommended to be one in the list).

The patch could probably be more efficient if it used a hash table for the role->kind mapping, but I'm not sure what the GStreamer way of doing that is.
Oh, and to test this you'll need a file with an Ogg Skeleton stream. I couldn't easily find any examples online, but ffmpeg2theora adds it, and you can use this file I transcoded with it: https://dl.dropboxusercontent.com/u/61100892/skeleton.ogv
More complication: Sending chapters in-stream doesn't make any sense, because of GstToc: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/gstreamer-GstToc.html It looks like metadata should also be sent this way: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad-libs/html/gst-plugins-bad-libs-Base-MPEG-TS-sections.html
Actually, that patch doesn't seem to work for some reason. If I do `gst_tag_list_get_string(tags, GST_TAG_TRACK_KIND, &tagValue)` in WebKit, I don't get anything, even with the skeleton.ogv file. I can see in the debug output that it's creating the tag, but WebKit doesn't see it.
The problem with my ogg skeleton patch is that it's creating tags for the skeleton stream, but they need to go with the stream they're for.
Yes, there needs to be something in oggdemux to move tags from the skeleton stream to the corresponding pads
Review of attachment 255262:

Please split this into multiple patches. In general looks good I guess, just needs someone to sit down and properly define and scope this tag :/

::: ext/ogg/gstoggdemux.c
@@ +4761,3 @@
 #endif

+  gst_tag_register_musicbrainz_tags ();

Maybe this should be deprecated and get a better name :) gst_tag_init() :)

::: gst/subparse/Makefile.am
@@ +19,2 @@
 libgstsubparse_la_LDFLAGS = $(GST_PLUGIN_LDFLAGS)
+libgstsubparse_la_LIBADD = $(GST_LIBS) $(GST_BASE_LIBS) $(GST_PLUGINS_BASE_LIBS) -lgsttag-@GST_API_VERSION@

Why do you link subparse to libgsttag although no changes are in subparse?
Created attachment 256640
gst-plugins-base; rename gst_tag_register_musicbrainz_tags to gst_tag_init
Created attachment 256641
gst-plugins-good; rename gst_tag_register_musicbrainz_tags to gst_tag_init
Created attachment 256642
gst-plugins-bad; rename gst_tag_register_musicbrainz_tags to gst_tag_init

And gst-plugins-ugly and gst-libav don't seem to use this function.
Sorry for the bikeshedding, but I'd prefer to get rid of the function entirely (i.e. deprecate it). From tags.c:

/* FIXME 0.11: rename this to gst_tag_init() or gst_tag_register_tags() or
 * even better: make tags auto-register themselves, either by defining them
 * to a wrapper func that does the initing, or by adding tag factories so
 * that the core can load+register tags automatically when needed. */
Created attachment 256646
Add GST_TAG_TRACK_KIND

Here's this part alone. The more I think about this, the more I think that this is the best solution. It would be nice if we could restrict the "kind" to valid ones, but it's hard to say if the HTML5 WG will add more in the future.

It would be nice if we could automatically generate them from other tags. For example, create GST_TAG_MATROSKA_ROLE, and have gst_tag_get_track_kind() use that if it exists, but then that function would be pretty complicated and would need to know about every format ever, which breaks the modularity of GStreamer.

We could add a get_track_kind (type, taglist), where "type" is audio, video or text, and it could do the check to see if the kind is valid, but I don't think that's really any simpler for implementers to use.
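To make the per-type validity check discussed above concrete, here is a minimal sketch in plain C. It is not part of any attached patch: `TrackType` and `track_kind_is_valid()` are hypothetical names, the function takes a plain string instead of a GstTagList so that it has no GStreamer dependency, and the kind lists reflect my reading of the HTML5 drafts linked in this bug, so they should be checked against the current spec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for illustration; GStreamer has no type with this name. */
typedef enum {
  TRACK_TYPE_AUDIO,
  TRACK_TYPE_VIDEO,
  TRACK_TYPE_TEXT
} TrackType;

/* NULL-terminated kind lists. These are assumptions based on the HTML5
 * drafts referenced in this bug and may not match the final spec. */
static const char *const audio_kinds[] = {
  "alternative", "descriptions", "main", "main-desc", "translation",
  "commentary", NULL
};
static const char *const video_kinds[] = {
  "alternative", "captions", "main", "sign", "subtitles", "commentary", NULL
};
static const char *const text_kinds[] = {
  "subtitles", "captions", "descriptions", "chapters", "metadata", NULL
};

/* Returns 1 if `kind` is in the list for `type`, 0 otherwise. */
static int
track_kind_is_valid (TrackType type, const char *kind)
{
  const char *const *list;
  size_t i;

  assert (kind != NULL);

  switch (type) {
    case TRACK_TYPE_AUDIO:
      list = audio_kinds;
      break;
    case TRACK_TYPE_VIDEO:
      list = video_kinds;
      break;
    case TRACK_TYPE_TEXT:
      list = text_kinds;
      break;
    default:
      return 0;
  }

  for (i = 0; list[i] != NULL; i++) {
    if (strcmp (list[i], kind) == 0)
      return 1;
  }
  return 0;
}
```

With this, "subtitles" is accepted for a text or video track and rejected for an audio track. The caller still has to know which type of track the tag list belongs to, which is the open question raised earlier in this bug.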
(In reply to comment #40)
> Sorry for the bikeshedding, but I'd prefer to get rid of the function entirely
> (i.e. deprecate it). From tags.c:
>
> /* FIXME 0.11: rename this to gst_tag_init() or gst_tag_register_tags() or
> * even better: make tags auto-register themselves, either by defining them
> * to a wrapper func that does the initing, or by adding tag factories so
> * that the core can load+register tags automatically when needed. */

Wrappers would be pretty easy:

#define GST_TAG_TRACK_KIND (gst_tag_init (), "track-kind")

But it seems kind of wasteful to do this every time we use a tag. I don't really know how to do tag factories.
(In reply to comment #42) > #define GST_TAG_TRACK_KIND (gst_tag_init (), "track-kind") Actually, this wrapper causes the build to hang for some reason.
You could make it gst_tag_get_tag ("track-kind") which then uses a g_once_init_enter()/leave() or GOnce guarded function that registers all tags from the tag library, and then returns the string.
(In reply to comment #44)
> You could make it gst_tag_get_tag ("track-kind") which then uses a
> g_once_init_enter()/leave() or GOnce guarded function that registers all tags
> from the tag library, and then returns the string.

Like this?

/* for each tag (no trailing semicolon, so the macro can be used as an argument) */
#define GST_TAG_TRACK_KIND gst_tag_get_tag ("track-kind")

const gchar *
gst_tag_get_tag (const gchar * tag_name)
{
  static GOnce once = G_ONCE_INIT;

  /* registers all tags from the tag library exactly once */
  g_once (&once, gst_tag_register_tags_internal, NULL);
  return tag_name;
}
Yes
Perhaps it's best to ignore this init stuff for now and focus on the actual API, but we would also need to consider bindings and what gobject-introspection can handle.
The patches are separate, so maybe it's best to just deal with the GST_TAG_TRACK_KIND one and figure out tag auto-registration separately. Is there anything I can do to help with the track kind patch? I'm not really sure what information would be useful.
Some kind of specification I guess... how this maps to different container formats, to the W3C standards, etc. Then we have something that can be decided
The best I have so far is this: http://www.w3.org/community/inbandtracks/wiki/Main_Page

And for WebM it looks like the kind can be determined from the track's CodecID:

> Per the convention (see the Matroska Codec Specifications) used for
> flavors of a particular video or audio codec, the CodecID for a WebVTT
> track is “D_WEBVTT/kind”, where kind is one of SUBTITLES, CAPTIONS,
> DESCRIPTIONS, or METADATA.

http://www.webmproject.org/docs/container/
Created attachment 269081
Populate GST_TAG_TRACK_KIND for WebVTT embedded in Matroska files

Here's an example of a patch following the WebM spec: http://www.webmproject.org/docs/container/#storing-webvtt-data-in-a-webm-track

> Per the convention (see the Matroska Codec Specifications) used for flavors
> of a particular video or audio codec, the CodecID for a WebVTT track is
> “D_WEBVTT/kind”, where kind is one of SUBTITLES, CAPTIONS, DESCRIPTIONS, or
> METADATA.
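As an illustration of the CodecID convention quoted above, the lookup could be sketched in plain C roughly as follows. This is not the code from the attached patch: `webvtt_codec_id_to_kind()` is a hypothetical helper, and lowercasing the suffix to obtain the HTML5 kind is my assumption about the intended mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the text track kind for a CodecID of the
 * form "D_WEBVTT/<KIND>", or NULL if the CodecID is anything else. */
static const char *
webvtt_codec_id_to_kind (const char *codec_id)
{
  static const char prefix[] = "D_WEBVTT/";
  /* The four suffixes listed in the WebM container docs; the lowercase
   * kind strings on the right are an assumption. */
  static const struct
  {
    const char *suffix;
    const char *kind;
  } map[] = {
    { "SUBTITLES", "subtitles" },
    { "CAPTIONS", "captions" },
    { "DESCRIPTIONS", "descriptions" },
    { "METADATA", "metadata" },
  };
  size_t i;

  assert (codec_id != NULL);

  /* Not a WebVTT track at all. */
  if (strncmp (codec_id, prefix, sizeof (prefix) - 1) != 0)
    return NULL;
  codec_id += sizeof (prefix) - 1;

  for (i = 0; i < sizeof (map) / sizeof (map[0]); i++) {
    if (strcmp (codec_id, map[i].suffix) == 0)
      return map[i].kind;
  }
  return NULL;                  /* unknown suffix: leave the tag unset */
}
```

Returning NULL for anything unrecognised means the demuxer would simply not set the tag, rather than guess a kind.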
For Ogg, the HTML spec says specifically that we should use the Role header: > For Ogg files, the Role header field of the track gives the relevant metadata. http://www.w3.org/html/wg/drafts/html/CR/embedded-content-0.html#dom-videotrack-kind I'm trying to figure out what it should be set to when it's not obvious (What does "text/karaoke" map to, for example?), but most of them are straightforward ("video/sign" maps to "sign", "text/caption" maps to "captions").
Created attachment 269616
Add kind for Role header for Ogg files

Here's a patch that adds this for the obvious roles ("text/caption" -> "captions", etc.) and doesn't set it for the others.
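For reference, a role->kind mapping of this sort can be sketched as a small static table in plain C. This is not the code from the attached patch: `ogg_role_to_kind()` is a hypothetical name, and only the "text/caption" and "video/sign" entries are mappings stated in this bug; the other two entries are my assumption of what "obvious" covers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps an Ogg Skeleton Role value to a track kind,
 * or returns NULL when there is no obvious mapping. */
static const char *
ogg_role_to_kind (const char *role)
{
  static const struct
  {
    const char *role;
    const char *kind;
  } map[] = {
    { "text/caption", "captions" },     /* mapping named in this bug */
    { "video/sign", "sign" },           /* mapping named in this bug */
    { "text/subtitle", "subtitles" },   /* assumed */
    { "video/main", "main" },           /* assumed */
  };
  size_t i;

  assert (role != NULL);

  for (i = 0; i < sizeof (map) / sizeof (map[0]); i++) {
    if (strcmp (role, map[i].role) == 0)
      return map[i].kind;
  }
  return NULL;                  /* e.g. "text/karaoke": no kind is set */
}
```

A linear scan over a handful of entries is probably fine here; a hash table would only start to matter if the list of roles grew much larger.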
Created attachment 269617
Add GST_TAG_TRACK_KIND

And this removes "description" from the list of valid kinds (both text and audio now use "descriptions"), and uses nicer URLs for the W3C references.
Comment on attachment 269616
Add kind for Role header for Ogg files

Actually the Ogg one doesn't work yet.
Created attachment 269618
Read Ogg Skeleton Role header for GST_TAG_TRACK_KIND

This patch works. I'm not sure why it matters what tag_list_from_vorbiscomment_packet() does, but it does.
Created attachment 269626
Example Ogg Skeleton file

Here's a file you can test with that has an Ogg Skeleton stream. It has three streams with roles: video/main, text/subtitle, and text/subtitle. For some reason the subtitles don't make it to the end of my pipeline for the text ones, but some logging shows that they're definitely pushed in gst_ogg_demux_activate_chain. The video one works perfectly.
The Kate issue seems to be because katedec overwrites all of the tags from oggdemux: https://bugzilla.gnome.org/show_bug.cgi?id=724699 Personally, I think the way sticky tag events work is broken: If you have multiple tag events, they should be merged, not replaced.
> Personally, I think the way sticky tag events work is broken: If you have > multiple tag events, they should be merged, not replaced. There are some known issues with this in practice, yes (esp. parsers), but it's up to elements to merge their own modifications into upstream tags as they see fit. If there are problems like that, they need to be fixed. I don't think there's a problem conceptually, and blindly merging tags is not the answer either.
(In reply to comment #59) > > Personally, I think the way sticky tag events work is broken: If you have > > multiple tag events, they should be merged, not replaced. > > There are some known issues with this in practice, yes (esp. parsers), but it's > up to elements to merge their own modifications into upstream tags as they see > fit. If there are problems like that, they need to be fixed. I don't think > there's a problem conceptually, and blindly merging tags is not the answer > either. That makes sense. I created a patch to fix Kate's tag handling, and with that, my "track-kind" patch also works perfectly in WebKit: https://bugzilla.gnome.org/show_bug.cgi?id=724699
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/38.