GNOME Bugzilla – Bug 564749
tagreadbin + GstTagReader interface
Last modified: 2018-05-01 09:22:28 UTC
Fast Tag Reading ================ Components involved: * TagReader: interface that is used for setting the desired operation for an element that implements it. Very simple API (3 functions). Location: plugins-base/gst-libs/gst/interfaces/tagreader.[ch] * TagReader elements: Elements that implement tagreading interface and are able to skip data / metadata according to selected mode. Location: See "elements implementing tagreader interface" section below * TagReadBin: Bin element heavily based on Decodebin2. Autoplugs elements that implement tagreader interface and uses fakesinks as sink elements. Application needs to plug a source element to this bin and put the pipeline in PAUSED mode. Tagreading is finished when pipeline is prerolled. Location: plugins-base/gst/tagread/tagreadbin.[ch] * PlayBin2, Decodebin2, UriDecodeBin: provides "skip_metadata" property that can be used to speed up the media playback startup time. It sets all autoplugged elements that implement tagreader interface to GST_TAGREAD_MODE_DATA mode, a.k.a. tell them to skip all metadata parsing if possible. Location: plugins-base/gst/playback/... * Tagread test application: A tool for measuring tag reading speed. Basically it supports both tagreadbin and decodebin elements in tag reading (for comparison purposes), but decodebin-based reading mechanism has proven to be unstable - it gets stuck easily. Location: plugins-base/test/benchmarks/tagread.c Principle of operation ====================== * Tagreader interface defines three tag reading modes: GST_TAGREAD_MODE_OFF - Normal functionality, parse both metadata and data GST_TAGREAD_MODE_TAGS - Report only metadata and quit as soon as possible GST_TAGREAD_MODE_DATA - Skip metadata to increase playback startup speed * Tagreader interface methods: * gst_tagreader_set_mode() is a vmethod that needs to be implemented by the tag reader element. TagReadBin uses it to set the tag reading mode. * gst_tagreader_done() can be used by tagreader elements to inform that they have finished reading the tags. This only needs to be used in GST_TAGREAD_MODE_TAGS mode. * gst_is_tagreading_done_message() is an utility function to check if GstMessage is a "tag-reading-done" message sent by some of the tagreader elements. * Tagreadbin autoplugs elements in same way as decodebin2. The only difference in this operation is element filtering; only elements that implement tagreader interface are selected. * When a demux element is found, queue2 elements are linked to its each srcpad to prevent preroll deadlock. Queue size is set to 1 buffers. * When autoplugging doesn't find any more tagreader elements to be plugged, fakesink is added in the end of the pipeline. * Tagreadbin keeps track of how many tagreaders have been added to the pipeline (n_parsers), because it wants all tagreaders to send a "done" message. * Tagreadbin counts how many parsers are finished (n_done) * Tagreadbin counts how many fakesinks are prerolled (n_asyncs_done) * Tagreadbin works asynchronously. When it is put into PAUSED mode, it returns ASYNC_START. State change completes when certain condition is met, and ASYNC_STOP messageis then sent. This finishes prerolling and whole pipeline finally goes into PAUSED state. * Tag reading process is considered finished, when ALL of the following conditions are met: * All tagreader elements MUST have finished their operation (n_done == n_parsers) * At least one fakesink has prerolled (n_asyncs_done > 0). This rule prevents tag reading process not to finish too early, a.k.a. when the pipeline is not yet totally built. Elements implementing TagReader interface ========================================= * Aviparse - Implements only GST_TAGREAD_MODE_TAGS mode. When tagreading mode is set, aviparse refuses to work in PULL mode and thus skips the time-consuming index parsing. * Wavparse - Implements only GST_TAGREAD_MODE_TAGS mode. In tagreader mode it sends "done" message immediately after wav header has been parsed. * Mpegaudioparse - Sends done message immediately after stream information has been detected and sent as a tag. * TagDemux - base class that are used by two parser elements: * id3demux * apedemux * Mpegstream - Because of the nature of MPEG system stream, parser needs to go through certain amount of data until it can be sure that it has found all streams from the container. Therefore mpegstream need to demux multiple buffers and push audio/video frames to srcpads. Ongoing discussion & problems/issues ==================================== * Wim Taymans suggested that element could use GST_FLOW_UNEXPECTED return value to inform that it has finished tag reading. However, this may not work properly when there are multiple tagreaders in a row, because tag finding order cannot be guaranteed. * One version of tagreadbin didn't use queues after demux element. This configuration seemed to work fine with AVI, WAV and ID3 tags, since their parsers are able to extract the tags before pushing any data forward. Therefore tag readers sent "done" message before the pipeline was even prerolled. However, mpegstream doesn't work this way, and it caused a preroll-deadlock situation; demuxer wasn't able to demux more data, since audio/video stream was already prerolled and there wasn't room to store extra buffers. Adding the leaky queues after demux helped, but it also slowed down the process about 5 seconds when parsing 200 AVI clips. * It is not good idea to upgrade any decoders to implement tag reader interface, since it would cause a lot of slowdown because of decoding process. * The GST_TAGREAD_MODE_DATA was designed to increase playback startup speed. For example ID3 tag may contain album cover art, that is normally parsed into GstBuffer and sent to application. If app is not interested in the tags at this point (it may have a metadata database already) so it can tell Playbin2 to skip the metadata parsing entirely and just start playing it as quickly as possible. As you have seen above, not many components currently implement this.
Created attachment 124802 [details] [review] TagReader interface implementation
Created attachment 124803 [details] [review] Patch that adds tagreader interface implementation into TagDemux class
Created attachment 124804 [details] [review] TagReadBin element implementation
Created attachment 124805 [details] [review] Patch that adds tagreader interface to AVI demuxer
Created attachment 124806 [details] [review] Patch that adds tagreader interface to WAV parser
Created attachment 124807 [details] [review] Patch that adds tagreader interface to mpeg audio parser
Sounds good in general (without thinking about it too much) but I'm a bit worried about the code duplication between decodebin2 and tagreadbin. Maybe this can be generalized somehow? :)
I'll have a look at this and see if it can be integrated with decodebin2.
These patches^Wbits-of-code don't compile/work properly. Tim will be coming up soon with working/usable patches.
The patches worked when they were made :) One more thought about the mode of operation. The demuxers most probably know about * width, height and framerate for video * depth, samplingarte for audio from the headers. So they should probably add them to the caps in fast-tag reading mode.
Created attachment 134859 [details] [review] GstTagReader interface, tagreadbin, and gst-tagread test application First stab at fast tag reading implementation based on the make-it-work-first-optimise-later school of thought. It mostly works great already (excl. ogg files maybe), but will only really shine once elements are fixed up a bit more. I think the main priority is to settle on the API and design now and do the micro-tinkering later. Rationale and design goals: - people often only want the tags and metadata of a file; as fast as possible and with a convenient API; they don't have to get the information from different places (tags, caps, queries etc.) - people don't want to deal with dynamic pads, "prerolling", queries, caps, taglist merging etc. just to get some tags. - people want global info and per-stream info and stream types (audio/video/subtitle) - implementation detail: we don't want to plug any decoders to get to this info, for two reasons: (a) speed / overhead, (b) to make it possible to extract tags on embedded/mobile devices without starting up or interfering with or be blocked by hardware decoders - the design should allow for optional thumbnailing (currently not implemented) ========================================== Main components of this patch (I don't think single patches make this easier to review, so I made a big one), in the order they appear in the patch: - docs - GstTagReader interface, in libgsttag-0.10: basically a dummy interface elements can implement so their tag-reading behaviour can be fine-tuned (e.g. demuxers can skip index processing). One *could* do without this interface, but the alternatives are not very nice from an API/discoverability perspective, or involves unnecessary overhead like double plugin feature registration or so. - make decodebin2 and uridecodebin implement the tag reader interface and propagate the tag reading mode set to any tag reader elements they auto-plug into the pipeline - tagreadbin: this is a stand-alone pipeline element similar to playbin, only that it reads tags given a URI. You set the "uri" property, set it to PLAYING state and will get a TAG message on the bus for the global tags, and then one tag messages for each stream, followed by an EOS message. In terms of implementation, tagreadbin will internally plug a uridecodebin and do special filtering of what elements get plugged via the "autoplug-continue" and "autoplug-factories" signals. It will stop autoplugging when it gets a pad with width/height or rate/channels caps and a duration query succeeds, or if there are no other elements to plug. The exposed uridecodebin pads are linked to an internal tagsink element with N inputs (request pads), which will collect the tags via the stream events, and merge and filter the various tags, and decide when it has enough information from all streams to stop. It will also extract things like width/height rate/channels from the caps and do duration queries on all streams etc. - last, the gst-tagread example, which uses tagreadbin to extract tags and prints them as they come in. It takes filenames and directories and will recurse into directories. The output will look something like this: URI: file:///home/tpm/samples/m101758-push-mode-mjpeg.avi CONTAINER TAGS: duration: 20817981859 container format: AVI STREAM #1 TAGS: audio codec: Uncompressed 24-bit PCM audio channels: 2 rate: 44100 duration: 20817981859 STREAM #2 TAGS: video codec: Motion JPEG width: 320 height: 240 framerate: 25/1 duration: 20800000000 URI: file:///home/tpm/samples/dirac-sage-640x360.mov CONTAINER TAGS: duration: 14888333333 container format: Quicktime STREAM #1 TAGS: video codec: Dirac description: audiotest wave width: 640 height: 360 framerate: 24/1 duration: 14888333333 STREAM #2 TAGS: audio codec: MPEG-4 AAC audio description: audiotest wave channels: 2 rate: 48000 duration: 14888333333 URI: file:///home/tpm/samples/577468-id3-synchronisation-working.mp3 CONTAINER TAGS: duration: 10224000000 container format: ID3 tag STREAM #1 TAGS: audio codec: MPEG 3 Audio, Layer 3 (MP3) has crc: FALSE channel mode: mono image: buffer of 9928 bytes, type: image/gif, image-type=(GstTagImageType)GST_TAG_IMAGE_TYPE_UNDEFINED album: Album genre: Pop title: ARTIST artist: 藝人 channels: 1 rate: 8000 duration: 10224000000 (Yes, the info from the ID3 tag should be in the global tags section, which it will be after a fix to id3demux.)
Could you attach the output file(s) of git format-patch? It's easier to handle than a plain diff (for example, you don't have to add the new files manually).
I'd like to get this into the upcoming 0.10.24 release, for which we freeze on July 6th according to the release schedule. Also, I'd like to commit this to -base as per the patch, with a 'still unstable' caveat like we use for playbin2, the rationale being that: - it relies closely on decodebin2/uridecodebin behaviour/API which isn't 100% stable yet - it covers a completely new use case and as such belongs into -base - the functionality provided is equal in importance to that of playbin/decodebin and as such it belongs into -base in its own right - it's well-documented and the API is unlikely to change There's one more thing that should be done before this is committed though, and that is to make sure it works adequately for ogg/theora/vorbis streams (either by making the parsers implement the tagread interface and read the codec headers, or provide a special tagread element that just parses the codec headers and comment headers).
Created attachment 137622 [details] [review] libgsttag: add new GstTagReader interface to allow fine-tuning of tag-reading behaviour
Created attachment 137623 [details] [review] playback: make uridecodebin and decodebin2 implement the tagreader interface
Created attachment 137624 [details] [review] tagreading: add tagreadbin element for fast and easy metadata extraction
Created attachment 137625 [details] [review] tagreading: add gst-tagread test app to examples Update to previous patch (and split up into 4 separate patches). No big changes, just a leak fix and tagreading mode propagation in uridecodebin.
The patch for GstTagDemux is not technically correct as far as I can tell (even though it will work just fine in practice), since the subclass may only figure out / determine the exact tag size while parsing the tag. The patch assumes that size doesn't change after the initial determination. This means this functionality likely needs to go into the subclasses (apedemux, id3demux). The avidemux tagreader implementation should be a bit more sophisticated: we should really activate in pull mode and then only do minimal index parsing to figure out the correct duration.
Created attachment 137734 [details] [review] Patch that adds tagreader interface to AVI demuxer As noted above, the newer GstTagReader interfaces obsoletes (all) earlier provided implementations. This provides an updated implementation for avidemux. As suggested above, it activates in pull mode, and performs minimal index parsing or skips index reading altogether if header durations look sane. In either case, (some) data is still pushed downstream as so expected by tagreadbin.
Please provide git format-patch generated patches. This git diff stuff is not supposed to be passed around and it's a pain to apply correctly.
Created attachment 137919 [details] [review] Patch that adds tagreader interface to AVI demuxer Updates previous patch to current git master (and requested format). Comments/notes apply as given before.
Created attachment 140425 [details] [review] Patch that adds tagreader interface to AVI demuxer Previous patch contained a few changes that were not strictly tagreading related. They have now been incorporated into current git avidemux, and tagread interface patch has been updated to current git.
Quick update: a few comments and what's left to do: - First off: tagreadbin vs. gst-discoverer: both do sort of similar things, and even use a similar approach to how they do it, but they are targetted at different use cases: gst-discoverer tries to get the full picture, with full reliability, for transmuxing/coding purposes; it will usually plug decoders. Tagreadbin on the other hand tries to get just the metadata as fast as possible. It aims to work without decoders at all, so that metadata can be extracted whether the appropriate codecs are installed or not. Both approaches could be combined, but I don't think this is desirable or useful at this point. We should proceed with both separately, optimise them for their respective use cases and then look at combining them at a later point (or 0.11). There's quite a bit of potential for, err, synergy though, see the interface discussion below. - left to do: oggdemux needs to extract tags from the stream headers (or vorbisparse/theorparse etc. need to implement the interface, but I think oggdemux should just extract the tags - that makes it behave more like a normal container, and it shouldn't be hard to do since the stream stuff is already in place now after the recent changes in oggdemux). Fairly minor task. - I'm not 100% happy with the GstTagReader interface yet. As a reminder, it serves two purposes: a) to configure elements into tag reading mode or tag skipping mode, so they can avoid expensive operations that aren't needed (like index parsing, for example). decodebin2 and uridecodebin implement this interface and will proxy the settings to children they plug. tagreadbin is a top-level pipeline that internally uses a uridecodebin that is configured in tag reading mode. This aspect is less important now that the major demuxers (avidemux, qtdemux) have been optimised for fast start-up, but see below for how this can still be useful. b) to selectively plug parsers or other elements that are deemed suitable for tagreading purposes, but that usually wouldn't be plugged for playback, for whatever reason (not needed, messes up sync, not trusted/tested enough yet). The registry stores the interfaces an element implements, and tagreadbin will (via decodebin2) prioritise a tagreader element over any decoder/other element during the autoplug-select process. One of the problems we have in GStreamer in general when reading metadata is that the application has no idea how reliable the various bits of information are, nor has it got a way to tell the plugins what it cares about or not. This means it's hard for an application to get a good/reliable duration reading or bitrate estimation for VBR files, for example, if it cares about such things. This is also a problem that gst-discoverer and its users face, if I'm not mistaken. The tagreader interface doesn't solve that (yet), but I think it could go in that general direction, ie. a way for an app to configure plugins what they care about and what not, a bit like an ACCURATE flag for duration estimation or bitrate estimation, for example (there's also the DONTCARE case of course). In any case, I'm currently leaning towards separating the two above-mentioned functions of the tagreader interface, so that it could be a more general GstMetaDataAndOrTagExtractionModeConfigurationInterface kind of thing (I'm sure we can think of something more snappy). Elements (e.g. parsers) could then implement this if they want to, without that affecting their autoplug-or-not status in tagreadbin. I believe this might also be useful for gst-discoverer. The autoplug aspect could then be handled either by an entry in the elementfactory klass (Tagreader or somesuch) or a separate interface. Should we go that way (more generic interface + separate shibboleth for tagreadbin), we don't need to decide on all the API now, just the name of the interface and how to set the element into tag-reading/skipping mode, although even that we don't absolutely *need*, we could just go ahead with just the autoplug klass marker and not do the interface for now. Having said all this, the current implementation seems fine too, and the estimation/reliability problems that I mentioned above could be handled entirely client side e.g. by additional query API like a _query_duration_full() that would also return the reliability or somesuch, then the caller can decide to continue parsing until it gets a better estimate based on that. Thoughts? Favoured approaches?
Having the GstTagReaderIface as a generic GstMetaDataAndOrTagExtractionModeConfigurationInterface or GstFormatParserModeIface would be nice to have. Another usecase where turning off the tagreading and seeking would be to one shot playback (used for e.g. ringtones on phone). Should we track that as a separate bug?
I filed a separate ticket to discuss the GstElement:hint idea under Bug #634687. Comments very welcome.
In Tracker we are using Tagreadbin to extract the metadata from the files and it is working fine so far. Any chance to get this into upstream gstreamer soonish? it would make our life easier when shipping in different distributions.
So, as suggested in Comment 23, [*] has a version of tagreading/tagreadbin without using GstTagReader interface, and that considers an element (regardless of rank) for tagreader if the klass description mentions Tagreader. Not having a GstTagReader interface in place should have little to no practical impact at present (when it comes to optimization), since AFAICS there are no elements implementing it so far. Also, bug #634687 holds a possible alternative implementation for advising elements how they could optimize behaviour (and that one, or another one, could be added at later time). [*] http://git.collabora.co.uk/?p=user/manauw/gst-plugins-base.git;a=shortlog;h=refs/heads/tagreading
(In reply to comment #27) > So, as suggested in Comment 23, [*] has a version of tagreading/tagreadbin > without using GstTagReader interface, and that considers an element (regardless > of rank) for tagreader if the klass description mentions Tagreader. > > Not having a GstTagReader interface in place should have little to no practical > impact at present (when it comes to optimization), since AFAICS there are no > elements implementing it so far. Also, bug #634687 holds a possible > alternative implementation for advising elements how they could optimize > behaviour (and that one, or another one, could be added at later time). The implementations are so far obviously only attached to this report as patches as they depend on the interface. But I am fine with going for the alternative.
This bug got linked from an LWN article about the recent GStreamer Conf. Is it still current? How will it integrate with the (very API of) GstDiscoverer?
I don't know why it got linked from the LWN article, I'm pretty sure I didn't mention it in my talk, and the context was also not quite right (it was developed for Nokia ages ago, and not for any MeeGo IVI stuff, though of course I'm happy if they find it useful). It's still current in the sense that tag-reading is still not as good as it can be, and tagreadbin's approach is better and faster for most cases than what discoverer does currently. There are other open issues which neither tagreadbin nor discoverer solve. However, for the time being discoverer is the only game in town and I don't expect this to change anytime soon, there are more interesting things to do. It shouldn't be too hard to teach discoverer the approach tagreadbin uses (e.g. don't plug decoders unless absolutely needed), all the groundwork for that has been done (oggdemux, plugging parsers, etc.). Tagreadbin is also much smarter about tag aggregation (stream tags vs. global/demuxer tags), but that's something that should be much easier in 1.0 now that we make those explicit. The main issue on the metadata front is reliability of the duration (and/or bitrate) estimates, IMHO, so a metadata reader can know whether to stop or whether to wait a bit longer for a better estimte, in cases where the duration can't be determined in a reliable way from the headers and/or index. So anyway, nothing to see here, move along :)
This bug has an assigned developer but has not received activity in almost a year. Is the assigned person still working on this ?
I would like to keep this open until the actual underlying problems are solved. (GstDiscoverer solves none of them.)
5 years with no activity. Ping again ?
Well, I think we still need a better tag reading story, but these patches are not the solution, so let's close this.