Bug 702724 – first-class miniobjects/API for mpeg-ts related SI (Service Information)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 702724 - first-class miniobjects/API for mpeg-ts related SI (Service Information)


Summary:	first-class miniobjects/API for mpeg-ts related SI (Service Information)


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	git master
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	1.1.2
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Duplicates:	702725 (view as bug list)
Depends on:
Blocks:

Reported:	2013-06-20 09:48 UTC by Edward Hervey
Modified:	2013-07-03 07:19 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Edward Hervey 2013-06-20 09:48:08 UTC

Several issues arise right now in regards to handling mpeg-ts/DVB SI.

1) performance

  In order for the various SI to be usable by applications, we currently parse and store them in GstStructures using generic GValue (uint, string, arrays, lists, ...). This represents over 95% of the cost of parsing those SI !
  And the parsing is equally as expensive.

2) Using glib deprecated items.

  GValueArray is deprecated in glib (sigh) and is ill-suited for the kind of arrays/collection we want to expose/use.

3) Usability

  If we want to later allow apps to "inject" SI into mpegtsmux, they need to know exactly the topology of the structure to inject.
  Instead of that, there should be a clear API with structures to fill in.
  In addition, apps need to know what field they should extract, what the type should be, ....

======

 To this end, I think we should switch to a public API for the various mpeg-ts SI along with registered mini-objects.
 Ideally this should be supported by g-i also (which should work if we register boxed GType for each of them).

The "hierarchy" could look something like this:

GstMiniObject
  +-- MpegTSSection
     +-- MpegTSPAT
     +-- MpegTSNIT
     +-- MpegTSEIT
      .....

MpegTSSection {
  GstMiniObject parent;

  gint16   pid;
  guint8   table_id;
  guint16  subtable_extension;
  guint8   version_number;
  guint8   current_next_indicator;
  guint8   section_number;
  guint8   last_section_number;
  guint32  crc;
}

MpegTSPAT {
  MpegTSSection parent;

  guint16  transport_stream_id;
  guint16  nb_programs;
  ....
}


In addition, we could expose descriptors in a cleaner way (avoiding string copies also)

GstMpegTSDescriptors {
  const guint8 *backing_string;
  guint nb_descriptors;
  GstMpegTSDescriptor descriptors[]; /* Null-terminated array */
}

GstMpegTSDescriptor {
  guint8 descriptor_tag;
  guint8 descriptor_length;
  const guint8* descriptor_content; /* pointer to descriptors->backing_string */
}

and expose the various macros to parse them.

Comment 1 Edward Hervey 2013-06-20 09:52:49 UTC

*** Bug 702725 has been marked as a duplicate of this bug. ***

Comment 2 Edward Hervey 2013-06-20 09:56:22 UTC

This could maybe be added to gst-libs/gst/codecparsers/ 

Importantly:
1) We still want apps to be able to get as much as information from possible
2) We want apps to be able to easily use that information
3) We want apps to be able to create/inject SI
4) Way way way better performance

I know this might break behaviour (no longer using structures) but it's (luckily) in -bad so ABI/API can change.

Comment 3 Sebastian Dröge (slomo) 2013-06-20 09:58:29 UTC

Note that GstMiniObject doesn't allow deep inheritance (you can only have a subclass of GstMiniObject, not one of the subclass). And bindings are not absolutely happy with GstMiniObject yet either.


Otherwise this sounds like a good idea, just don't add yet another new library for this. codecparsers sounds good but should maybe be renamed to something more generic then... libgstparsers?

Comment 4 Edward Hervey 2013-06-20 10:30:17 UTC

(In reply to comment #3)
> Note that GstMiniObject doesn't allow deep inheritance (you can only have a
> subclass of GstMiniObject, not one of the subclass). And bindings are not
> absolutely happy with GstMiniObject yet either.

  Yah, but I'd like to avoid the overhead of GObject. Some sections have really little information (TDT/TOT comes to mind, which only has ... a date (in addition to the generic SI headers) and they will appear very often.

> 
> 
> Otherwise this sounds like a good idea, just don't add yet another new library
> for this. codecparsers sounds good but should maybe be renamed to something
> more generic then... libgstparsers?

  I'll work on it in codecparsers without renaming first.

Comment 5 Nicolas Dufresne (ndufresne) 2013-06-20 15:57:31 UTC

I think the proposed solution is fine, but wanted to know if in other cases it wouldn't be good to use GVariant ?

Comment 6 Edward Hervey 2013-06-20 16:32:33 UTC

what "other cases" ?

Comment 7 Tim-Philipp Müller 2013-06-20 16:42:23 UTC

Maybe even in this case?

There are two things here I guess:

a) how to make the data available
b) how applications would use/parse the data

For a) we want to avoid inefficient GValue usage at a large scale - options are exposing structs in some way, or perhaps even GVariant, or just exposing the data unparsed?

For b) the question is would applications want to use structs, or maybe it would be easier/better to provide helper functions to get/parse the info out instead and expose the data in a more raw/rough form?

Comment 8 Edward Hervey 2013-06-20 17:22:59 UTC

(In reply to comment #7)
> Maybe even in this case?
> 
> There are two things here I guess:
> 
> a) how to make the data available
> b) how applications would use/parse the data
> 
> For a) we want to avoid inefficient GValue usage at a large scale - options are
> exposing structs in some way, or perhaps even GVariant, or just exposing the
> data unparsed?
> 
> For b) the question is would applications want to use structs, or maybe it
> would be easier/better to provide helper functions to get/parse the info out
> instead and expose the data in a more raw/rough form?

So it's essentially the question of : how much do we pre-parse.

0) No parsing at all, we just provide a gchar *data and let the app deal with it
1) Only do generic SI parsing (i.e. what is common to all SI)
2) Do table_type (i.e. PAT, PMT, NIT, ...) specific parsing of the fields only (and do not touch the descriptors)
3) Do full and extensive parsing of everything, including descriptors

I think we can find a right balance somewhere between 2) and 3) so that we keep the cpu usage at a reasonably low-level in the element, yet do not impose too much work on the application.

The non-descriptor fields are straightforward and trivial to parse, so it's not a big burden.

As for the descriptors, which ones do we decide to parse and expose explicitly in the structure ?

I'd say there are two ways to go forward for this:
1) descriptors that are needed anyway for proper tsparse/tsdemux usage should be extracted explicitly.
2) descriptors which are *expected* to be present for a certain SI type should also be extracted explicitly.

So that would boil down ... to the descriptors we are already parsing in mpegtspacketizer.c. The logic is also there and I would prefer not to lose it.

That still leaves plenty of descriptors we don't parse (because they are not essential to core usage or core expectations) and they would still be available for the application to parse.

The one descriptor I might be keen on moving away... are the various string ones (due to overhead of GIConv). But it doesn't cost a lot to leave them right now (we would already gain a massive amount of overhead by no longer using structure/gvalue/...) , and would allow existing apps to trivially switch (and would be available via g-i).

that being said ... I still don't see the point about GVariant (tbh, I'm a bit clueless about it, but I fear using anything glib-originated in hotspots). I welcome an explanation though :)

Comment 9 Nicolas Dufresne (ndufresne) 2013-06-20 18:53:43 UTC

GVariant is just a natural replacement for stuff using GstStructure/GValue, it's a super-set of this. The other cases, refer to cases (in general) where we don't want an API but need to propagate recusrive data structure. GVariant is for my experience more convenient, but still more difficult then a specialized data structure.

I just asked because I was curious to see if GVariant has been illimated for some reason, or just never came up in recent design/redesign. The big down side of GVariant (from my point of view), is that there is no way to change the binary parser, so you cannot simply store the binary and parse on demand (unless your binary format is dbus format, the one current used). In few projects I've been involved (and including this case) this would have been optimal.

Comment 10 Edward Hervey 2013-06-23 07:51:08 UTC

I've put some WIP in this git branch (top 3 commits): http://cgit.freedesktop.org/~bilboed/gst-plugins-bad/log/?h=mpegts-si

Includes:
* The new library
* mpegtsbase/tsdemux/tsparse ported
* dvb ported

I tested tsparse and tsdemux on a whole bunch of ts files (including obscure ones) and didn't see any regression in behaviour

The only regression right now are that some descriptor info are not parsed/extracted.
PMT:
* DESC_DVB_AC3
* DESC_DVB_DATA_BROADCAST_ID
* DESC_DVB_DATA_BROADCAST
* DESC_DVB_CAROUSEL_IDENTIFIER
* DESC_DVB_STREAM_IDENTIFIER
* DESC_ISO_639_LANGUAGE
* DESC_DVB_SUBTITLING

NIT:
* DESC_DVB_SATELLITE_DELIVERY_SYSTEM
* DESC_DVB_TERRESTRIAL_DELIVERY_SYSTEM
* DESC_DVB_CABLE_DELIVERY_SYSTEM
* DESC_DVB_FREQUENCY_LIST
* DESC_DTG_LOGICAL_CHANNEL

EIT:
* DESC_DVB_EXTENDED_EVENT
* DESC_DVB_COMPONENT

The reason why I didn't parse those descriptors above are:
1) Because they are not needed for tsdemux/tsparse/dvb usage
2) I did parse out some not-absolutely needed descriptors (EIT DESC_DVB_EVENT and NIT network name for example) because i wanted to see more realistically the gain of switching to non-Gvalue/GstStructure usage.

Right now the sections are mini-objects. As other people mentioned... introspection support for them are crap, but they are the most lightweight refcounted solution around (sections never get copied, even when put in messages on the bus).

In order to cope with introspection support while keeping the performance gain, I'm wondering if we couldn't:
1) Add section-specific type registration (i.e. a new GType for each)
2) Add section-specific methods to extract from messages (gst_message_parse_pat() for example).

Another item is (better) support for descriptors. Right now they are an array (which obviously g-i in all its glory don't detect as arrays despite provided annotations *sigh*), which is fast to create, and fast to parse (tag/length/extended_tag are extracted for convenience). They don't duplicate the tag contents, and instead point to the location of that descriptor in the section (private) data field.

1) Should we expose the list of all "known" descriptor ids ? Essentially a big GstMpegTSDescriptorType enum with description.
2) Should we expose the macros from gst/mpegtsdemux/gstmpegdesc.h ?
3) Should we expose the method
4) Should we create new functions (therefore introspectable) and struct to extract information from (some) descriptors ?

And in a (much) longer run, how do we deal with creating section/descriptors from the app level ? Right now everything is hidden nicely in regards to allocation/freeing, but I'm wondering if that's sufficient to be able to add later on that feature with the proposed API.

Anyway, comments welcome :)

Comment 11 Edward Hervey 2013-06-23 12:53:32 UTC

Silly me, I forgot to report the speedups :)

valgrind --tool=callgrind --trace-children=yes gst-launch-1.0 pushfile:///data/tf1-rip.ts ! <element> ! fakesink

With tsparse:
 * Overall (including registry loading) : 60% faster
 * PSI handling (Called 755 times): 9.5 times faster

With tsdemux:
 * Overall (including registry loading) : almost the same
 * PSI handling *Called 4 times): 60% faster

The numbers are therefore coherent with the expected speedup.

Notes:
 * tsdemux only detects and parses the PAT/PMT/TOT tables. That is expected (and therefore gets very low gain from all of this).
 * the PSI handling was still parsing/converting all the same chains (via g_convert_with_iconv) as before. It represents 13% of the overall PSI handling. So it could be worth moving those to later ... but not that massive.
 * The SI pre-parsing/-detection code (mpegts_packetizer_push_section) costs as much as the new optimized PSI handling. At a glance it could be made almost 2 times faster (by avoiding usage of lists for example).
 * The "PSI handling" code includes creation of the GstMessage (and GstStructure to contain the GstMpegTSSection) so is comparing the same final behaviour and codepath.

Comment 12 Edward Hervey 2013-06-26 06:27:43 UTC

Some updates.

On the side, another use-case came to mind lately: supporting some 3rd party systems that will do the PID/SECTION filtering elsewhere (like with the DVB kernel system) and want to have a system by which they can parse those sections.

And I've been brainstorming the whole "when to parse" issue, "supporting g-i" issue, etc...

To support all of that I've come up with an updated plan (still not done yet in the git repo but won't involve that much work) which is a bit similar to how we deal with non-subclassable miniobjects (like GstEvent) but with a twist:

1) GstMpegTSSection remains a GstMiniObject but is no longer "subclassable" (in spirit). It contains the underlying section (unparsed) string.
2) The various "parsed" structures are registered boxed objects, which you access by calling a method on GstMpegTSSection.

The twist is that the (refcounted) GstMpegTSSection will do the parsing of the type-specific section only on-demand and will cache that result:

struct _GstMpegTSSection {
  .... /* Generic section fields */
  /< private >/
  guint8 *section_data;
  gpointer *cached_parsed; /* If parsing already happened, will contain the type-specific data */
}

const GstMpegTSPMT *gst_mpeg_ts_section_get_pmt (GstMpegTSSection * section);
const GstDateTime * gst_mpeg_ts_section_get_tot (GstMpegTSSection * section);
....


This will result in moving all the type-specific parsing from gst/mpegtsdemux/mpegtspacketizer.c to the helper library.

Bonus:
*) Still usable by mpegtsbase/tsparse/tsdemux elements where needed
*) Parsing of data only happens once
*) I can't make section parsing much cheaper (if nobody uses it, no parsing is involved except for the base section headers and no useless memory is used) :)
*) Easier to use for bindings

=====

Another item on the list is ... dealing with g-i (sic).

I thought the boxed subclass issue would be the biggest, but the plan above solves that. The remaining one is dealing with (efficient) arrays (Example of the issue in bug #678663).

Right now all arrays of contents in the structures (be it arrays of programs, events, descriptors, ...) are of the following form:
{
 ...
 guint nb_elements;
 ElementType *array; /* (array length=nb_elements) */
 ...
}

The reason why they are done as such are because I wanted to minimize memory and cpu overhead to a minimum (array is one allocation of (nb_element * sizeof(ElementType) and accessing any member content is a one op).

This is not supported with g-i, therefore we need to find alternatives:

1) Use a different container structure
  By making it a GList (no direct member access, requires allocating 2 * nb_elements memory blocks (one for GList, one for ElementType)
  An alternative is making it a GArray or derivate (provides direct member access, number of allocation is a bit lower, but still requires allocating nb_elements memory blocks (for ElementType)

2) Provide custom getters
  Requires a lot of extra code, introducing some cpu overhead....

With the new plan of doing on-demand/if-needed parsing of sections (in the first part), maybe the overhead of using GArray would be acceptable now, and some macros could make it trivial to use.

=====

For much later, I'm wondering also if extracting the PES parsing code into that helper library wouldn't be of some use for other plugins/apps. This would essentially make the mpeg-ts elements simple filters which then use the lib to make sense of the data... Food for thought.

Comments welcome

Comment 13 Edward Hervey 2013-06-30 10:34:18 UTC

Pushed a clean rebased update here (that one will stay clean and updated, as opposed to the mpegts-si branch) : http://cgit.freedesktop.org/~bilboed/gst-plugins-bad/log/?h=mpegts-rebased

Changes include:
* introspection support for everything
* descriptor helper methods (descriptors are no longer automatically parsed)
* documentation

Works fine (both tsparse and tsdemux) on all streams I have lying around.

comments/reviews welcome !

Comment 14 Edward Hervey 2013-07-01 07:50:38 UTC

More updates:

* new test example (tests/examples/mpeg/tsparser <gst-launch syntax>) which grabs section messages and provides some info. Gives a simple usage of the library.
* based on that utility ... spotted and fixed a couple of issues (whoops)
* Added more descriptor parsing (cable and satellite delivery descriptors)

Comment 15 Edward Hervey 2013-07-03 07:19:18 UTC

And all in !

commit dc160e7ca70c4e6985be55c1b850fb829f8dcb03
Author: Edward Hervey <edward@collabora.com>
Date:   Mon Jul 1 08:35:26 2013 +0200

    examples: Add an example of a mpeg-ts SI extractor
    
    Serves as an example of usage of the new mpegts library from an
    application.
    
    Will parse/dump all sections received on a bus.
    
    Usage is ./tsparse <any gst-launch line using tsdemux or tsparse>
    
    Examples:
      ./tsparse file:///some/mpegtsfile ! tsparse ! fakesink
      ./tsparse dvb://CHANNEL ! tsparse ! fakesink
      ./tsparse playbin uri=dvb://CHANNEL
      ./tsparse playbin uri=file:///some/mpegtsfile
      ...
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702724

commit 10c929c7958fba1e33b3cfa5aa4d112f77c8f618
Author: Edward Hervey <edward@collabora.com>
Date:   Sun Jun 23 08:44:08 2013 +0200

    dvb: Switch to MPEG-TS SI library
    
    Also serves as an example of using mpegts library from a plugin
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702724

commit 92edd82c86c6f947a6f9dda3a24913040f4a0363
Author: Edward Hervey <edward@collabora.com>
Date:   Sun Jun 23 08:43:23 2013 +0200

    mpegtsdemux: Switch to MPEG-TS SI library
    
    * Only mpeg-ts section packetization remains.
    * Improve code to detect duplicated sections as early as possible
    * Add FIXME for various issues that need fixing (but are not regressions)
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702724

commit 3665e8543a7be01060db834501d99750bda1cec0
Author: Edward Hervey <edward@collabora.com>
Date:   Sun Jun 23 08:41:12 2013 +0200

    gst-libs: New Mpeg-TS support library
    
    Exposes various MPEG-TS (ISO/IEC 13818-1) and DVB (EN 300 468) Section
    Information as well as descriptors for usage by plugins and applications.
    
    This replaces entirely the old GstStructure-based system for conveying
    mpeg-ts information to applications and other plugins.
    
    Parsing and validation is done on a "when-needed" basis. This ensures
    the minimal overhead for elements and applications creating and using
    sections and descriptors.
    
    Since all information is made available, this also allows applications
    to parse custom sections and descriptors.
    
    Right now the library is targeted towards parsing, but the structures
    could be used in the future to allow applications to create and inject
    sections and descriptors (for usage by various mpeg-ts elements).
    
    https://bugzilla.gnome.org/show_bug.cgi?id=702724