GNOME Bugzilla – Bug 303975
Add tar support
Last modified: 2018-05-04 08:43:51 UTC
GStreamer can't decode tar archives.
Created attachment 46382 [details] tar plugin

This plugin enables GStreamer to read tar archives:

  filesrc location=file.tar.bz2 ! bz2dec ! tardec ! decodebin ! ...

The NEW_MEDIA event disappeared in 0.9. The plugin posts a GST_TAG_LOCATION message instead. How can I intercept this message in the filesink plugin?
This is kinda cool. I suppose you don't have the option of seeking between tracks yet, do you? (I know that part of event handling is broken... :-( ).
Not yet. How can I get tar and bz2 into CVS? The screenshot source is also waiting.
Wait for gst-plugins-unmaintained, I think. Should make a bug on that...
Also see bug #303167 for the bunzip2 support.
There is also a gst-sandbox module in CVS for experimental things and work-in-progress, not quite sure if it fits the bill though. To pass messages from upstream elements to downstream elements you'd probably use events rather than bus messages, unless you want the application to act upon the locations.
Created attachment 58881 [details] [review] Patch to hook up the tar plugin
Created attachment 58882 [details] Updated tar plugin
you rock :) Can you make the patch+tarball into a single patch against gst-plugins-bad? Also, I was wondering how you handle multiple files. I had a look at the tarball but couldn't figure out what was going on with the GST_TAG_LOCATION. If I have a tarball with 3 files, will it only play the first file?
How do I make the patch+tarball...? The files in the tarball will be played one after another. The pad is configured every time before a new file is passed along. It works. I haven't implemented seeking yet. Do you know if I can interpret the individual files as chapters/menus like on DVDs? Is there a plugin where I can start looking at some code? Saving is also an open question. Right now, the filename is only stored if it is passed along as a location event before new buffers arrive. I've patched the filesrc plugin to get at least saving of 1 file in a tar file working.
Created attachment 59721 [details] Updated tar plugin

You can now rip whole CDs and write the result to a tar file:

  gst-launch cdiocddasrc mode=continuous ! audioconvert ! vorbisenc ! oggmux ! tarenc ! filesink location=test.tar
That's quite cool, I wonder though ... how does tarenc know when to start a new file in this scenario? Glancing at the code it looks like it starts a new file whenever there's a TAG event. If that's the case, then that's not really entirely right, even if it happens to work in this scenario. Tag events can be sent at any time, sometimes multiple tag events are sent for different information, and sometimes tag events are sent to update information (e.g. mad _could_ send a tag event whenever the average bitrate of a VBR mp3 file changes, that would be perfectly legitimate).

I'm not sure what the solution to this problem is though. Maybe a new event that signals the start of a new stream unit? We don't really have a good abstraction for multiple streams yet in GStreamer, be it on the input side or on the sink side. That should probably be solved first. A great thing to bring up on the mailing list with different use-cases that need to be handled.

(On a side note, 'continuous mode' in cdiocddasrc is supposed to treat the entire CD as one single stream, so one might even argue that it would be wrong if tarenc split this up into separate files; I also imagine there might be issues with missing ogg headers for the files that are created after the first one; could easily be solved by introducing a third mode of course once we know how to signal these things downstream properly).

Writing a temporary file isn't really great either, it kind of breaks the whole concept of GStreamer pipelines as I see it. If you just need to fill the total number of bytes into some header when you're done, you could send a NEWSEGMENT event downstream at the end with a seek position as start and then send a small buffer containing the final length in bytes. That will update the header with the correct information.
Finally, allow me some minor nitpicks about the code:

* all files should have a line with your copyright in the header

* elements usually store the sink and source pads they create in their element structure, so they can easily be accessed; it tends to make code more readable and avoids foo = gst_element_get_pad (GST_ELEMENT (b), "sink") all over the place (incl. the refcounting involved with that).

* assertions like g_return_if_fail (GST_IS_TARENC (b)); aren't really necessary for internal functions. Some people like to use them for external ones, but I have yet to see a single case where any such assertion was ever triggered.

* in the event function you do (stylized):

    gst_tarenc_event (GstPad * pad, GstEvent * e)
    {
      GstTarenc *b = GST_TARENC (gst_pad_get_parent (pad));
      gst_object_unref (b);
      ... function code ...
    }

  That kind of defeats the purpose of refcounting. The reference should be held until the end of the function, especially in event and query functions.

* you use 'GstTarenc *b' everywhere - IMHO something like 'GstTarenc *enc' or 'GstTarenc *tar' would be a tad nicer and would increase code readability, but that might just be me.

* parent_class = g_type_class_ref (GST_TYPE_ELEMENT); should be parent_class = g_type_class_peek_parent (klass); even if you'll still find the _ref() in hundreds of old plugins. You can also use GST_BOILERPLATE, then you don't have to do the parent_class thing or the _get_type() function at all :)

* there's a gst_structure_has_name() call now in 0.10 :)

* it should be GstTarDec and gst_tar_dec_* and GST_TAR_DEC instead of GstTardec, gst_tardec and GST_TARDEC (very minor, but makes things more consistent with the rest of gst).
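To illustrate the suggestion above about filling the total number of bytes into the header once the stream is done, here is a minimal sketch of the tar side of it in plain C, with no GStreamer involved. It assumes the classic ustar layout (12-byte octal size field at offset 124, 8-byte checksum field at offset 148). The function names tar_header_checksum and tar_header_set_size are made up for this example and are not taken from the attached plugin.

```c
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE    512
#define TAR_SIZE_OFFSET   124   /* assumed ustar layout: octal size field */
#define TAR_SIZE_LEN      12
#define TAR_CHKSUM_OFFSET 148   /* assumed ustar layout: checksum field */
#define TAR_CHKSUM_LEN    8

/* Sum of all header bytes, with the checksum field counted as spaces. */
static unsigned int
tar_header_checksum (const unsigned char *hdr)
{
  unsigned int sum = 0;
  size_t i;

  for (i = 0; i < TAR_BLOCK_SIZE; i++) {
    if (i >= TAR_CHKSUM_OFFSET && i < TAR_CHKSUM_OFFSET + TAR_CHKSUM_LEN)
      sum += ' ';
    else
      sum += hdr[i];
  }
  return sum;
}

/* Hypothetical helper: write the final payload size into an existing
 * 512-byte header and refresh the checksum. Returns 0 on success, -1 if
 * the size does not fit into 11 octal digits. */
static int
tar_header_set_size (unsigned char *hdr, unsigned long long size)
{
  char field[TAR_SIZE_LEN + 1];
  char chk[TAR_CHKSUM_LEN + 1];

  if (size > 077777777777ULL)
    return -1;

  /* 11 octal digits followed by a NUL terminator */
  snprintf (field, sizeof field, "%011llo", size);
  memcpy (hdr + TAR_SIZE_OFFSET, field, TAR_SIZE_LEN);

  /* 6 octal digits, a NUL, then a space */
  snprintf (chk, sizeof chk, "%06o", tar_header_checksum (hdr));
  memcpy (hdr + TAR_CHKSUM_OFFSET, chk, 7);
  hdr[TAR_CHKSUM_OFFSET + 7] = ' ';
  return 0;
}
```

In an element, the rewritten header block would be the small buffer that gets sent after seeking back to the header position. How that seek is signalled downstream is exactly the open question discussed above.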
Created attachment 59872 [details] Updated tar plugin I updated the plugin according to above suggestions. The only issue remaining is determining/publishing the start of a new track/stream/file/whatever else you would call it. The plugin currently uses GST_TAG_TRACK_NUMBER. I'll ask on the mailing list.
Created attachment 67448 [details] Makefile.am Updated tar plugin (remainder to be uploaded separately). There are 2 elements inside the tar plugin: tardemux and tarmux.
Created attachment 67449 [details] gsttar.c
Created attachment 67450 [details] gsttardemux.c
Created attachment 67451 [details] gsttardemux.h
Created attachment 67452 [details] gsttarmux.h
Created attachment 67453 [details] gsttarmux.c
The underlying concepts: A stream cannot be broken down without further decoding (i.e. 1 track on a CD, one file in a tarball). Each pad supplies exactly one stream. Hence: The tardemux element creates a separate pad for each file in the tarball. The tarmux element combines streams from different request pads. If you use gst-launch to test the element, playback stops after the first stream, because playbin stops playing after the first EOS has been received (even if there are more pads that are still providing data). I am going to resolve this problem separately (see #336951).
Created attachment 67468 [details] gsttardemux.c Tiny clean up.
This is a nice plugin. If you could put it all into a patch that applies with latest gst-plugins-bad I'm going to commit it :)
As a sidenote, there's a FIXME about how to get the filename extension from the caps. Nowadays you can use the URI query to get the URI used by the source and can extract the extension from that.
Created attachment 139982 [details] [review] Patch to hook up the tar plugin
Created attachment 139983 [details] gsttardemux.c Updated. It works with "filesrc location=test.tar ! tardemux ! filesink location=test" when test.tar contains exactly one file. I didn't have time to do further testing. Regarding the suggestion above (get the extension from the URI used by the source): I don't want the extension used by the source file. I need the extension of each file in the tar file. I use typefind to identify the caps of each file, but how can I get the corresponding extension?
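For archive members that already carry a name in their tar header, one possible source for a per-file extension is the header's name field rather than the caps. The following is only a sketch in plain C, assuming the ustar layout where the name occupies the first 100 bytes of the header and is not necessarily NUL-terminated. tar_entry_extension is a hypothetical helper, not something from the attached code, and it does not help for streams that arrive without a name.

```c
#include <stddef.h>
#include <string.h>

#define TAR_NAME_LEN 100        /* assumed ustar layout: name at offset 0 */

/* Hypothetical helper: copy the extension (without the dot) of the member
 * name stored in a tar header into out. Returns 1 if an extension was
 * found and fits into out, 0 otherwise. */
static int
tar_entry_extension (const unsigned char *hdr, char *out, size_t out_len)
{
  char name[TAR_NAME_LEN + 1];
  const char *base, *dot;

  /* the name field may use all 100 bytes, so terminate a local copy */
  memcpy (name, hdr, TAR_NAME_LEN);
  name[TAR_NAME_LEN] = '\0';

  /* only look at the last path component */
  base = strrchr (name, '/');
  base = base ? base + 1 : name;

  /* ignore names without a dot, dotfiles and names ending in a dot */
  dot = strrchr (base, '.');
  if (dot == NULL || dot == base || dot[1] == '\0')
    return 0;
  if (strlen (dot + 1) >= out_len)
    return 0;

  strcpy (out, dot + 1);
  return 1;
}
```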
Created attachment 140237 [details] gsttardemux.c Simplification, clean up etc. Seeking still doesn't work. And I need to guard against empty blocks in order to avoid creating one empty source pad at the end for each 512-byte block of padding.
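The guard against empty blocks could look roughly like the sketch below: a header block that consists only of zero bytes is treated as end-of-archive padding and never becomes an entry (and so would never get a source pad). This is plain C operating on a memory buffer, assuming the ustar size field at offset 124. tar_block_is_empty and tar_count_entries are names invented for this example. Strictly speaking the archive ends with two zero blocks, but the sketch stops at the first one.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK_SIZE  512
#define TAR_SIZE_OFFSET 124     /* assumed ustar layout: octal size field */
#define TAR_SIZE_LEN    12

/* Returns 1 if the 512-byte block contains only zero bytes. */
static int
tar_block_is_empty (const unsigned char *block)
{
  size_t i;

  for (i = 0; i < TAR_BLOCK_SIZE; i++) {
    if (block[i] != 0)
      return 0;
  }
  return 1;
}

/* Hypothetical helper: count the entries in an in-memory tar image,
 * stopping at the first all-zero header block. */
static size_t
tar_count_entries (const unsigned char *data, size_t len)
{
  size_t offset = 0, entries = 0;

  while (offset + TAR_BLOCK_SIZE <= len) {
    const unsigned char *hdr = data + offset;
    char size_field[TAR_SIZE_LEN + 1];
    unsigned long long size, payload;

    if (tar_block_is_empty (hdr))
      break;                    /* padding, not a new file */

    memcpy (size_field, hdr + TAR_SIZE_OFFSET, TAR_SIZE_LEN);
    size_field[TAR_SIZE_LEN] = '\0';
    size = strtoull (size_field, NULL, 8);
    entries++;

    /* payload is stored in whole 512-byte blocks */
    payload = ((size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE) * TAR_BLOCK_SIZE;
    if (payload > len - offset - TAR_BLOCK_SIZE)
      break;                    /* truncated archive */
    offset += TAR_BLOCK_SIZE + (size_t) payload;
  }
  return entries;
}
```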
Created attachment 140385 [details] gsttardemux.c Still to be done: Seeking.
Created attachment 140591 [details] gsttardemux.c Added a piece of documentation. Still to complete. And seeking is still to be implemented.
Thanks, I've committed everything so far, complete patches will be attached soon. Please provide the next patches as incremental ones on top of the two following ones. One note on the muxer: you should probably wait in the sink pads' chain functions until the previous sinks have finished. That can probably best be done with a GCond somehow.
And another note on the muxer. You can get the extension for caps by finding a typefind factory for this type and getting the extensions that it handles. That's not easy but it should work well. Also you could add a "filename" property on the sinkpads IMHO.

And a note on the demuxer: don't push tag events downstream, use gst_element_found_tags_for_pad() instead, but only call it after you have caps on the pad and have sent a newsegment event.

That's actually the second thing that needs to be changed: you need to provide proper newsegment events in both elements. The muxer should send a single one in BYTES format and drop the ones from upstream; the demuxer should send one for every srcpad in BYTES format and also drop the upstream ones.
Created attachment 140759 [details] [review] 0001-tar-Add-a-tar-muxer-and-demuxer-plugin.patch
Created attachment 140760 [details] [review] 0002-tarmux-Make-the-sink-pads-request-pads-and-don-t-add.patch
Bug #563828 must be fixed first because tardemux output stream usually requires other demuxers
Now that everything is committed, there is probably no reason to keep this bug open.
Hm, it really isn't committed yet.
The decodebin2 problem is fixed but the tar demuxer is not yet committed. I've added some comments about the tar stuff in comment #29 and comment #30.
(In reply to comment #29)
> Thanks, I've committed everything so far, complete patches will be attached
> soon.
> Please provide the next patches as incremental ones on top of the two following
> ones.
>

sebastian, you say you committed it ... yet later you say you didn't?
I had it committed locally but lost that branch. I think all the changes are in this bug though. Anyway, what should happen with this?
I think this belongs in a VFS layer. If we want to add something like this, can we use libarchive or something, so that we don't end up with a bunch of plugins for each format?
Yes, let's close this for the time being then. Instead of having our own implementation for everything, this should at least use libarchive, but ideally it would live directly in the VFS layer of the OS.