GNOME Bugzilla – Bug 702724
first-class miniobjects/API for mpeg-ts related SI (Service Information)
Last modified: 2013-07-03 07:19:18 UTC
Several issues arise right now in regards to handling mpeg-ts/DVB SI. 1) performance In order for the various SI to be usable by applications, we currently parse and store them in GstStructures using generic GValue (uint, string, arrays, lists, ...). This represents over 95% of the cost of parsing those SI ! And the parsing is equally as expensive. 2) Using glib deprecated items. GValueArray is deprecated in glib (sigh) and is ill-suited for the kind of arrays/collection we want to expose/use. 3) Usability If we want to later allow apps to "inject" SI into mpegtsmux, they need to know exactly the topology of the structure to inject. Instead of that, there should be a clear API with structures to fill in. In addition, apps need to know what field they should extract, what the type should be, .... ====== To this end, I think we should switch to a public API for the various mpeg-ts SI along with registered mini-objects. Ideally this should be supported by g-i also (which should work if we register boxed GType for each of them). The "hierarchy" could look something like this: GstMiniObject +-- MpegTSSection +-- MpegTSPAT +-- MpegTSNIT +-- MpegTSEIT ..... MpegTSSection { GstMiniObject parent; gint16 pid; guint8 table_id; guint16 subtable_extension; guint8 version_number; guint8 current_next_indicator; guint8 section_number; guint8 last_section_number; guint32 crc; } MpegTSPAT { MpegTSSection parent; guint16 transport_stream_id; guint16 nb_programs; .... } In addition, we could expose descriptors in a cleaner way (avoiding string copies also) GstMpegTSDescriptors { const guint8 *backing_string; guint nb_descriptors; GstMpegTSDescriptor descriptors[]; /* Null-terminated array */ } GstMpegTSDescriptor { guint8 descriptor_tag; guint8 descriptor_length; const guint8* descriptor_content; /* pointer to descriptors->backing_string */ } and expose the various macros to parse them.
*** Bug 702725 has been marked as a duplicate of this bug. ***
This could maybe be added to gst-libs/gst/codecparsers/ Importantly: 1) We still want apps to be able to get as much as information from possible 2) We want apps to be able to easily use that information 3) We want apps to be able to create/inject SI 4) Way way way better performance I know this might break behaviour (no longer using structures) but it's (luckily) in -bad so ABI/API can change.
Note that GstMiniObject doesn't allow deep inheritance (you can only have a subclass of GstMiniObject, not one of the subclass). And bindings are not absolutely happy with GstMiniObject yet either. Otherwise this sounds like a good idea, just don't add yet another new library for this. codecparsers sounds good but should maybe be renamed to something more generic then... libgstparsers?
(In reply to comment #3) > Note that GstMiniObject doesn't allow deep inheritance (you can only have a > subclass of GstMiniObject, not one of the subclass). And bindings are not > absolutely happy with GstMiniObject yet either. Yah, but I'd like to avoid the overhead of GObject. Some sections have really little information (TDT/TOT comes to mind, which only has ... a date (in addition to the generic SI headers) and they will appear very often. > > > Otherwise this sounds like a good idea, just don't add yet another new library > for this. codecparsers sounds good but should maybe be renamed to something > more generic then... libgstparsers? I'll work on it in codecparsers without renaming first.
I think the proposed solution is fine, but wanted to know if in other cases it wouldn't be good to use GVariant ?
what "other cases" ?
Maybe even in this case? There are two things here I guess: a) how to make the data available b) how applications would use/parse the data For a) we want to avoid inefficient GValue usage at a large scale - options are exposing structs in some way, or perhaps even GVariant, or just exposing the data unparsed? For b) the question is would applications want to use structs, or maybe it would be easier/better to provide helper functions to get/parse the info out instead and expose the data in a more raw/rough form?
(In reply to comment #7) > Maybe even in this case? > > There are two things here I guess: > > a) how to make the data available > b) how applications would use/parse the data > > For a) we want to avoid inefficient GValue usage at a large scale - options are > exposing structs in some way, or perhaps even GVariant, or just exposing the > data unparsed? > > For b) the question is would applications want to use structs, or maybe it > would be easier/better to provide helper functions to get/parse the info out > instead and expose the data in a more raw/rough form? So it's essentially the question of : how much do we pre-parse. 0) No parsing at all, we just provide a gchar *data and let the app deal with it 1) Only do generic SI parsing (i.e. what is common to all SI) 2) Do table_type (i.e. PAT, PMT, NIT, ...) specific parsing of the fields only (and do not touch the descriptors) 3) Do full and extensive parsing of everything, including descriptors I think we can find a right balance somewhere between 2) and 3) so that we keep the cpu usage at a reasonably low-level in the element, yet do not impose too much work on the application. The non-descriptor fields are straightforward and trivial to parse, so it's not a big burden. As for the descriptors, which ones do we decide to parse and expose explicitly in the structure ? I'd say there are two ways to go forward for this: 1) descriptors that are needed anyway for proper tsparse/tsdemux usage should be extracted explicitly. 2) descriptors which are *expected* to be present for a certain SI type should also be extracted explicitly. So that would boil down ... to the descriptors we are already parsing in mpegtspacketizer.c. The logic is also there and I would prefer not to lose it. That still leaves plenty of descriptors we don't parse (because they are not essential to core usage or core expectations) and they would still be available for the application to parse. The one descriptor I might be keen on moving away... are the various string ones (due to overhead of GIConv). But it doesn't cost a lot to leave them right now (we would already gain a massive amount of overhead by no longer using structure/gvalue/...) , and would allow existing apps to trivially switch (and would be available via g-i). that being said ... I still don't see the point about GVariant (tbh, I'm a bit clueless about it, but I fear using anything glib-originated in hotspots). I welcome an explanation though :)
GVariant is just a natural replacement for stuff using GstStructure/GValue, it's a super-set of this. The other cases, refer to cases (in general) where we don't want an API but need to propagate recusrive data structure. GVariant is for my experience more convenient, but still more difficult then a specialized data structure. I just asked because I was curious to see if GVariant has been illimated for some reason, or just never came up in recent design/redesign. The big down side of GVariant (from my point of view), is that there is no way to change the binary parser, so you cannot simply store the binary and parse on demand (unless your binary format is dbus format, the one current used). In few projects I've been involved (and including this case) this would have been optimal.
I've put some WIP in this git branch (top 3 commits): http://cgit.freedesktop.org/~bilboed/gst-plugins-bad/log/?h=mpegts-si Includes: * The new library * mpegtsbase/tsdemux/tsparse ported * dvb ported I tested tsparse and tsdemux on a whole bunch of ts files (including obscure ones) and didn't see any regression in behaviour The only regression right now are that some descriptor info are not parsed/extracted. PMT: * DESC_DVB_AC3 * DESC_DVB_DATA_BROADCAST_ID * DESC_DVB_DATA_BROADCAST * DESC_DVB_CAROUSEL_IDENTIFIER * DESC_DVB_STREAM_IDENTIFIER * DESC_ISO_639_LANGUAGE * DESC_DVB_SUBTITLING NIT: * DESC_DVB_SATELLITE_DELIVERY_SYSTEM * DESC_DVB_TERRESTRIAL_DELIVERY_SYSTEM * DESC_DVB_CABLE_DELIVERY_SYSTEM * DESC_DVB_FREQUENCY_LIST * DESC_DTG_LOGICAL_CHANNEL EIT: * DESC_DVB_EXTENDED_EVENT * DESC_DVB_COMPONENT The reason why I didn't parse those descriptors above are: 1) Because they are not needed for tsdemux/tsparse/dvb usage 2) I did parse out some not-absolutely needed descriptors (EIT DESC_DVB_EVENT and NIT network name for example) because i wanted to see more realistically the gain of switching to non-Gvalue/GstStructure usage. Right now the sections are mini-objects. As other people mentioned... introspection support for them are crap, but they are the most lightweight refcounted solution around (sections never get copied, even when put in messages on the bus). In order to cope with introspection support while keeping the performance gain, I'm wondering if we couldn't: 1) Add section-specific type registration (i.e. a new GType for each) 2) Add section-specific methods to extract from messages (gst_message_parse_pat() for example). Another item is (better) support for descriptors. Right now they are an array (which obviously g-i in all its glory don't detect as arrays despite provided annotations *sigh*), which is fast to create, and fast to parse (tag/length/extended_tag are extracted for convenience). They don't duplicate the tag contents, and instead point to the location of that descriptor in the section (private) data field. 1) Should we expose the list of all "known" descriptor ids ? Essentially a big GstMpegTSDescriptorType enum with description. 2) Should we expose the macros from gst/mpegtsdemux/gstmpegdesc.h ? 3) Should we expose the method 4) Should we create new functions (therefore introspectable) and struct to extract information from (some) descriptors ? And in a (much) longer run, how do we deal with creating section/descriptors from the app level ? Right now everything is hidden nicely in regards to allocation/freeing, but I'm wondering if that's sufficient to be able to add later on that feature with the proposed API. Anyway, comments welcome :)
Silly me, I forgot to report the speedups :) valgrind --tool=callgrind --trace-children=yes gst-launch-1.0 pushfile:///data/tf1-rip.ts ! <element> ! fakesink With tsparse: * Overall (including registry loading) : 60% faster * PSI handling (Called 755 times): 9.5 times faster With tsdemux: * Overall (including registry loading) : almost the same * PSI handling *Called 4 times): 60% faster The numbers are therefore coherent with the expected speedup. Notes: * tsdemux only detects and parses the PAT/PMT/TOT tables. That is expected (and therefore gets very low gain from all of this). * the PSI handling was still parsing/converting all the same chains (via g_convert_with_iconv) as before. It represents 13% of the overall PSI handling. So it could be worth moving those to later ... but not that massive. * The SI pre-parsing/-detection code (mpegts_packetizer_push_section) costs as much as the new optimized PSI handling. At a glance it could be made almost 2 times faster (by avoiding usage of lists for example). * The "PSI handling" code includes creation of the GstMessage (and GstStructure to contain the GstMpegTSSection) so is comparing the same final behaviour and codepath.
Some updates. On the side, another use-case came to mind lately: supporting some 3rd party systems that will do the PID/SECTION filtering elsewhere (like with the DVB kernel system) and want to have a system by which they can parse those sections. And I've been brainstorming the whole "when to parse" issue, "supporting g-i" issue, etc... To support all of that I've come up with an updated plan (still not done yet in the git repo but won't involve that much work) which is a bit similar to how we deal with non-subclassable miniobjects (like GstEvent) but with a twist: 1) GstMpegTSSection remains a GstMiniObject but is no longer "subclassable" (in spirit). It contains the underlying section (unparsed) string. 2) The various "parsed" structures are registered boxed objects, which you access by calling a method on GstMpegTSSection. The twist is that the (refcounted) GstMpegTSSection will do the parsing of the type-specific section only on-demand and will cache that result: struct _GstMpegTSSection { .... /* Generic section fields */ /< private >/ guint8 *section_data; gpointer *cached_parsed; /* If parsing already happened, will contain the type-specific data */ } const GstMpegTSPMT *gst_mpeg_ts_section_get_pmt (GstMpegTSSection * section); const GstDateTime * gst_mpeg_ts_section_get_tot (GstMpegTSSection * section); .... This will result in moving all the type-specific parsing from gst/mpegtsdemux/mpegtspacketizer.c to the helper library. Bonus: *) Still usable by mpegtsbase/tsparse/tsdemux elements where needed *) Parsing of data only happens once *) I can't make section parsing much cheaper (if nobody uses it, no parsing is involved except for the base section headers and no useless memory is used) :) *) Easier to use for bindings ===== Another item on the list is ... dealing with g-i (sic). I thought the boxed subclass issue would be the biggest, but the plan above solves that. The remaining one is dealing with (efficient) arrays (Example of the issue in bug #678663). Right now all arrays of contents in the structures (be it arrays of programs, events, descriptors, ...) are of the following form: { ... guint nb_elements; ElementType *array; /* (array length=nb_elements) */ ... } The reason why they are done as such are because I wanted to minimize memory and cpu overhead to a minimum (array is one allocation of (nb_element * sizeof(ElementType) and accessing any member content is a one op). This is not supported with g-i, therefore we need to find alternatives: 1) Use a different container structure By making it a GList (no direct member access, requires allocating 2 * nb_elements memory blocks (one for GList, one for ElementType) An alternative is making it a GArray or derivate (provides direct member access, number of allocation is a bit lower, but still requires allocating nb_elements memory blocks (for ElementType) 2) Provide custom getters Requires a lot of extra code, introducing some cpu overhead.... With the new plan of doing on-demand/if-needed parsing of sections (in the first part), maybe the overhead of using GArray would be acceptable now, and some macros could make it trivial to use. ===== For much later, I'm wondering also if extracting the PES parsing code into that helper library wouldn't be of some use for other plugins/apps. This would essentially make the mpeg-ts elements simple filters which then use the lib to make sense of the data... Food for thought. Comments welcome
Pushed a clean rebased update here (that one will stay clean and updated, as opposed to the mpegts-si branch) : http://cgit.freedesktop.org/~bilboed/gst-plugins-bad/log/?h=mpegts-rebased Changes include: * introspection support for everything * descriptor helper methods (descriptors are no longer automatically parsed) * documentation Works fine (both tsparse and tsdemux) on all streams I have lying around. comments/reviews welcome !
More updates: * new test example (tests/examples/mpeg/tsparser <gst-launch syntax>) which grabs section messages and provides some info. Gives a simple usage of the library. * based on that utility ... spotted and fixed a couple of issues (whoops) * Added more descriptor parsing (cable and satellite delivery descriptors)
And all in ! commit dc160e7ca70c4e6985be55c1b850fb829f8dcb03 Author: Edward Hervey <edward@collabora.com> Date: Mon Jul 1 08:35:26 2013 +0200 examples: Add an example of a mpeg-ts SI extractor Serves as an example of usage of the new mpegts library from an application. Will parse/dump all sections received on a bus. Usage is ./tsparse <any gst-launch line using tsdemux or tsparse> Examples: ./tsparse file:///some/mpegtsfile ! tsparse ! fakesink ./tsparse dvb://CHANNEL ! tsparse ! fakesink ./tsparse playbin uri=dvb://CHANNEL ./tsparse playbin uri=file:///some/mpegtsfile ... https://bugzilla.gnome.org/show_bug.cgi?id=702724 commit 10c929c7958fba1e33b3cfa5aa4d112f77c8f618 Author: Edward Hervey <edward@collabora.com> Date: Sun Jun 23 08:44:08 2013 +0200 dvb: Switch to MPEG-TS SI library Also serves as an example of using mpegts library from a plugin https://bugzilla.gnome.org/show_bug.cgi?id=702724 commit 92edd82c86c6f947a6f9dda3a24913040f4a0363 Author: Edward Hervey <edward@collabora.com> Date: Sun Jun 23 08:43:23 2013 +0200 mpegtsdemux: Switch to MPEG-TS SI library * Only mpeg-ts section packetization remains. * Improve code to detect duplicated sections as early as possible * Add FIXME for various issues that need fixing (but are not regressions) https://bugzilla.gnome.org/show_bug.cgi?id=702724 commit 3665e8543a7be01060db834501d99750bda1cec0 Author: Edward Hervey <edward@collabora.com> Date: Sun Jun 23 08:41:12 2013 +0200 gst-libs: New Mpeg-TS support library Exposes various MPEG-TS (ISO/IEC 13818-1) and DVB (EN 300 468) Section Information as well as descriptors for usage by plugins and applications. This replaces entirely the old GstStructure-based system for conveying mpeg-ts information to applications and other plugins. Parsing and validation is done on a "when-needed" basis. This ensures the minimal overhead for elements and applications creating and using sections and descriptors. Since all information is made available, this also allows applications to parse custom sections and descriptors. Right now the library is targeted towards parsing, but the structures could be used in the future to allow applications to create and inject sections and descriptors (for usage by various mpeg-ts elements). https://bugzilla.gnome.org/show_bug.cgi?id=702724