Bug 590014 – Future of parsers



Summary:	Future of parsers


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	NONE
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-07-28 13:58 UTC by Thiago Sousa Santos
Modified:	2012-08-13 23:18 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Thiago Sousa Santos 2009-07-28 13:58:16 UTC

While discussing asf remuxing with Christian, we started talking about requiring parsed input on all muxers, and forcing demuxers to mark their output as parsed (when suitable) but we currently don't have a definition of what a parsed stream is. Another problem is that different containers might have different "parsed" definitions.

It would be nice if we could come up with a reasonable definition of what a parsed stream is and how it would interoperate with different container formats. Also, having a roadmap would help.

Christian mentioned he had already discussed this with other people; I'll ask him to put it down here as a starting point.

Comment 1 Sebastian Dröge (slomo) 2009-07-29 11:29:02 UTC

About a year ago David Schleef proposed something for the future of muxers and parsers (look at the gstreamer-devel list). In short the plan was:
a) Let parsers output different formats and convert between them. There would be one format that contains all information needed for putting the codec into MP4, one format for Matroska, etc.
b) Let each muxer accept only its own special format.

This would solve this bug, would remove specific codec handling inside the muxers and would simplify a lot of code while making the parsers only slightly more complex.
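To make the plan concrete, here is a minimal sketch of one parser-side conversion feeding muxers that each accept exactly one framing. H.264 is used purely as an illustration (the thread does not name a codec): its byte-stream framing separates units with start codes, while MP4 stores each unit behind a length prefix. The function names and the MUXER_INPUT_FORMAT table are hypothetical and are not GStreamer API.

```python
import struct

# Hypothetical table: the single framing each muxer would accept.
MUXER_INPUT_FORMAT = {"mp4": "length-prefixed", "mpegts": "byte-stream"}

START_CODE = b"\x00\x00\x01"


def split_byte_stream(data):
    """Split a start-code-delimited stream into its payload units."""
    units = []
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # Trailing zero bytes belong to the following 4-byte start code.
        unit = data[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def to_length_prefixed(units):
    """Frame each unit with a 4-byte big-endian length."""
    return b"".join(struct.pack(">I", len(u)) + u for u in units)


def to_byte_stream(units):
    """Frame each unit with a 4-byte start code."""
    return b"".join(b"\x00" + START_CODE + u for u in units)


def parser_output_for(muxer, byte_stream):
    """Convert the parser's input to the one format the muxer accepts."""
    units = split_byte_stream(byte_stream)
    if MUXER_INPUT_FORMAT[muxer] == "length-prefixed":
        return to_length_prefixed(units)
    return to_byte_stream(units)


stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce"
mp4_input = parser_output_for("mp4", stream)
ts_input = parser_output_for("mpegts", stream)
```

With this split, the codec-specific knowledge lives in the parser, and the muxer only has to check that it was handed its own format.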


Also parsers could support a "simple" format, which would be like the format that most parsers nowadays output: A framed and timestamped stream of buffers + useful caps.

Comment 2 Wim Taymans 2009-08-04 13:04:27 UTC

I would like to get rid of parsers.

Comment 3 David Schleef 2009-08-04 22:14:28 UTC

and replace parsing functionality with what?

Comment 4 Sebastian Dröge (slomo) 2009-08-05 04:40:57 UTC

(In reply to comment #3)
> and replace parsing functionality with what?

A library in -base that handles parsing of different formats, similar to the tags library that handles parsing & writing of vorbiscomments and ID3 tags. That was Wim's idea when this was last discussed a few days ago.

Comment 5 Thiago Sousa Santos 2009-12-11 02:20:16 UTC

I've been thinking about this lately and had an idea (I haven't looked at the vorbiscomment and ID3 tag handling in the tags library).

I think of this parsing library to be implemented below GstAdapter, something like GstParsingAdapter, and the elements would push buffers to it (just like a regular adapter) but then would do gst_parsing_adapter_pull_parsed to get parsed buffers with caps set.

The real parsing would be done by an internal object that would be held by the GstParsingAdapter and implement some set of functions. The user would do something like gst_parsing_adapter_set_format to set it (using caps? or passing this internal object instance?).

So far, I can imagine gst_parsing_adapter_pull_parsed returning OK, PARSE_ERROR or NEED_MORE_DATA.
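A minimal sketch of the adapter idea described above, written in Python for brevity rather than as C against GstAdapter. Everything here is hypothetical: the ParsingAdapter class, the LengthPrefixedFormat "internal object", the toy frame layout (2-byte big-endian length plus payload) and the caps string are assumptions chosen only to show the push / pull_parsed flow and the three return codes.

```python
import struct
from enum import Enum


class PullResult(Enum):
    OK = "ok"
    PARSE_ERROR = "parse-error"
    NEED_MORE_DATA = "need-more-data"


class LengthPrefixedFormat:
    """Hypothetical format object: 2-byte big-endian length, then payload."""

    caps = "application/x-toy, framed=true"
    max_frame = 1024

    def parse(self, data):
        """Return (status, bytes_consumed, payload) for the first frame."""
        if len(data) < 2:
            return PullResult.NEED_MORE_DATA, 0, None
        (size,) = struct.unpack(">H", data[:2])
        if size == 0 or size > self.max_frame:
            return PullResult.PARSE_ERROR, 0, None
        if len(data) < 2 + size:
            return PullResult.NEED_MORE_DATA, 0, None
        return PullResult.OK, 2 + size, bytes(data[2:2 + size])


class ParsingAdapter:
    """Accumulates pushed buffers and hands back parsed frames with caps."""

    def __init__(self):
        self._pending = bytearray()
        self._format = None

    def set_format(self, fmt):
        self._format = fmt

    def push(self, buf):
        self._pending += buf

    def pull_parsed(self):
        """Return (status, frame, caps); frame and caps are None unless OK."""
        if self._format is None:
            raise RuntimeError("set_format() must be called first")
        status, consumed, frame = self._format.parse(self._pending)
        if status is not PullResult.OK:
            return status, None, None
        del self._pending[:consumed]  # drop the bytes of the returned frame
        return status, frame, self._format.caps


adapter = ParsingAdapter()
adapter.set_format(LengthPrefixedFormat())
adapter.push(b"\x00\x03ab")
first = adapter.pull_parsed()   # frame is still incomplete
adapter.push(b"c\x00")
second = adapter.pull_parsed()  # one complete frame is available
```

An element would call push for every incoming buffer and then loop on pull_parsed until it stops returning OK.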

Note that it would also be possible to integrate it with typefinding functions and auto-instantiate this internal object, resulting in a generic typefind-parser adapter.

Muxers would benefit from using those parsers instead of regular adapters and would no longer require parsed input. Another good side effect is that we could wrap up those different 'parsing formats' in a single plugin and have parser elements (maybe someone would still want them, and they would be simple to maintain), including the all-so-powerful typefind-parser.

Hope I made my idea understandable and that it is a good one.

Note that I haven't thought of different requirements of parsing for the same format by different muxers as I haven't seen this anywhere (not saying it doesn't exist out there, just that I've never handled it in my still short time around here)

Comment 6 Rob Clark 2010-01-16 08:14:01 UTC

btw, a random musing:  the handful of parsers that I've looked at so far suffer from bad performance if the size of the input buffer is much smaller than the size of the packet they are trying to parse..  ie. in the NEED_MORE_DATA they keep re-parsing the accumulated data each time a new buffer arrives.  (And each new input buffer gets pushed to an adapter, and subsequent gst_adapter_peek() copies it into a growing buffer.. and so on.)  For an extreme example, try running a 20mpix image thru jpegparse with default filesrc 4kb input buffers.

A couple things that would be helpful..  (1) something like GstByteReader that could deal with a list of non-contiguous buffers to avoid the memcpy currently happening in adapter on each new input buffers, and (2) parsers that could operate incrementally, saving their state in the NEED_MORE_DATA case and resuming parsing on next input buffer without having to backtrack and reparse.  (But this second point, I'm not really sure offhand of any way to do this in a generic way..)
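A minimal sketch of both points, assuming a toy format whose packet ends at a fixed two-byte marker. The IncrementalMarkerScanner class and the naive_rescan_cost helper are hypothetical names. The scanner keeps the input as a list of separate chunks (no copy into one growing buffer) and remembers the last byte seen, so each input byte is searched once; the naive helper models the "re-parse everything on each new buffer" behaviour for comparison.

```python
class IncrementalMarkerScanner:
    """Finds a 2-byte end marker across chunks without rescanning old data."""

    def __init__(self, marker=b"\xff\xd9"):
        if len(marker) != 2:
            raise ValueError("this sketch only handles 2-byte markers")
        self._marker = marker
        self._chunks = []        # non-contiguous input, kept without copying
        self._size = 0           # total bytes held in self._chunks
        self._prev = None        # last byte of the previous chunk
        self.frame_end = None    # offset just past the marker, once found
        self.bytes_examined = 0

    def feed(self, chunk):
        """Scan only the new chunk; return the frame end offset or None."""
        if self.frame_end is not None or not chunk:
            return self.frame_end
        self.bytes_examined += len(chunk)
        base = self._size
        if self._prev is not None and bytes([self._prev, chunk[0]]) == self._marker:
            # The marker straddles the boundary between two chunks.
            self.frame_end = base + 1
        else:
            idx = chunk.find(self._marker)
            if idx != -1:
                self.frame_end = base + idx + 2
        self._chunks.append(chunk)
        self._size += len(chunk)
        self._prev = chunk[-1]
        return self.frame_end


def naive_rescan_cost(chunks, marker=b"\xff\xd9"):
    """Bytes handed to the search when all data is rescanned per chunk."""
    data = b""
    examined = 0
    for chunk in chunks:
        data += chunk            # copy into a growing buffer
        examined += len(data)    # the whole buffer is searched again
        if data.find(marker) != -1:
            break
    return examined


chunks = [b"abcd", b"efg\xff", b"\xd9hij"]
scanner = IncrementalMarkerScanner()
results = [scanner.feed(c) for c in chunks]
```

For n equally sized chunks the naive cost grows roughly with n squared while the incremental cost grows with n, which is the effect described for small input buffers.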

anyways, I'm not really sure if that is on or off topic for this bug.. but maybe something to keep in mind if re-thinking how parsers should work.

Comment 7 Tim-Philipp Müller 2010-01-16 12:40:11 UTC

I do think this is a bit off-topic :-)  This bug is supposed to be about the framework-wide design aspects really, not so much about element-specific implementation issues.

If you have some new APIs in mind that would benefit parsers, please file feature request bugs, so we can add them. If you have encountered and/or identified performance issues with specific parsers, please also file bugs about them against those elements.

The jpeg parsing issue is somewhat known (see bug #583047), at least for the parser in jpegdec (and should be easy enough to fix IIRC). As far as I know, jpegparse only exists in bugzilla so far.

Comment 8 Stefan Sauer (gstreamer, gtkdoc dev) 2010-01-28 09:15:02 UTC

I think this also suffers from a terminology problem. We have this draft
http://cgit.freedesktop.org/gstreamer/gstreamer/tree/docs/design/draft-klass.txt
It defines Encoders/Decoders/Muxers/Demuxer and even Extracter/Formatter.

Right now we use Parsers to packetize incoming streams into units for decoding. E.g. standalone bitstreams that are not in a container format are parsed and converted into a format that codecs expect. This spares every codec implementation from having to reimplement that.

We also use parsers to packetize raw audio/video in case we know the format from somewhere.

One benefit of having parsers is that one can find out more details about the media without plugging in codecs. On platforms where codecs run on a DSP/GPU, instantiating a codec can be slow or can have limitations.

But then we also use the Parser terminology for 'demuxers' that only have single media streams (wavparse, auparse, ...).


So I actually like the original plan of having parsers and also making them more mandatory. I don't think we need the counterparts in many cases (e.g. rawparse obviously does not need them). For bitstream formats, for adding metadata and for the single-stream formats they make sense.

I admit I failed to get the point in comment #3 for the muxers - which of our muxers uses GstAdapter?

Comment 9 Sebastian Dröge (slomo) 2011-05-20 07:17:31 UTC

Is there a plan now how we should move forward?

Comment 10 Tim-Philipp Müller 2012-08-13 23:18:12 UTC

Not sure there's anything in here that warrants keeping this bug open, I think we're covering most of the practical aspects in our parsers-cum-converters these days, and I think it is a fairly intuitive system (unlike Dave's suggestion, IMHO). We also have better signalling in place now both for what muxers require and for what stream-format various streams are.

So let's close this. But please re-open or clone a new bug if there's still something we should look into.