Bug 303975 - Add tar support
Status: RESOLVED WONTFIX
Product: GStreamer
Classification: Platform
Component: gst-plugins-bad
Version: git master
Hardware: Other All
Importance: Normal enhancement
Target Milestone: 0.10.15
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on: 303167 311167 563828
Blocks:
 
 
Reported: 2005-05-12 22:00 UTC by Lutz Mueller
Modified: 2018-05-04 08:43 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement


Attachments
tar plugin (9.65 KB, application/x-compressed-tar), 2005-05-12 22:06 UTC, Lutz Mueller
Patch to hook up the tar plugin (1.21 KB, patch), 2006-02-07 20:34 UTC, Lutz Mueller [needs-work]
Updated tar plugin (11.50 KB, application/x-compressed-tar), 2006-02-07 20:35 UTC, Lutz Mueller
Updated tar plugin (11.56 KB, application/x-compressed-tar), 2006-02-19 17:36 UTC, Lutz Mueller
Updated tar plugin (12.31 KB, application/x-compressed-tar), 2006-02-21 19:00 UTC, Lutz Mueller
Makefile.am (263 bytes, text/plain), 2006-06-15 22:17 UTC, Lutz Mueller
gsttar.c (2.34 KB, text/plain), 2006-06-15 22:18 UTC, Lutz Mueller
gsttardemux.c (20.26 KB, text/plain), 2006-06-15 22:18 UTC, Lutz Mueller
gsttardemux.h (1.35 KB, text/plain), 2006-06-15 22:19 UTC, Lutz Mueller
gsttarmux.h (1.32 KB, text/plain), 2006-06-15 22:19 UTC, Lutz Mueller
gsttarmux.c (18.33 KB, text/plain), 2006-06-15 22:19 UTC, Lutz Mueller
gsttardemux.c (20.22 KB, text/plain), 2006-06-16 06:15 UTC, Lutz Mueller
Patch to hook up the tar plugin (541 bytes, patch), 2009-08-05 21:37 UTC, Lutz Mueller [committed]
gsttardemux.c (20.65 KB, text/plain), 2009-08-05 21:44 UTC, Lutz Mueller
gsttardemux.c (16.67 KB, text/plain), 2009-08-08 22:15 UTC, Lutz Mueller
gsttardemux.c (18.53 KB, text/plain), 2009-08-10 22:09 UTC, Lutz Mueller
gsttardemux.c (18.87 KB, text/plain), 2009-08-12 21:26 UTC, Lutz Mueller
0001-tar-Add-a-tar-muxer-and-demuxer-plugin.patch (46.28 KB, patch), 2009-08-14 10:45 UTC, Sebastian Dröge (slomo) [committed]
0002-tarmux-Make-the-sink-pads-request-pads-and-don-t-add.patch (1.33 KB, patch), 2009-08-14 10:51 UTC, Sebastian Dröge (slomo) [committed]

Description Lutz Mueller 2005-05-12 22:00:17 UTC
GStreamer can't decode tar archives.
Comment 1 Lutz Mueller 2005-05-12 22:06:29 UTC
Created attachment 46382
tar plugin

This plugin enables GStreamer to read tar archives:

filesrc location=file.tar.bz2 ! bz2dec ! tardec ! decodebin ! ...

The NEW_MEDIA event disappeared in 0.9. The plugin posts a
GST_TAG_LOCATION message instead. How can I intercept this message in the
filesink plugin?
Comment 2 Ronald Bultje 2005-05-13 11:27:35 UTC
This is kinda cool. I suppose you don't have the option of seeking between
tracks yet, do you? (I know that part of event handling is broken... :-( ).
Comment 3 Lutz Mueller 2005-05-16 21:34:07 UTC
Not yet. How can I get tar and bz2 into CVS? The screenshot source is also waiting.
Comment 4 Andy Wingo 2005-07-16 13:37:12 UTC
Wait for gst-plugins-unmaintained, I think. Should make a bug on that...
Comment 5 Andy Wingo 2005-07-16 15:45:15 UTC
Also see bug #303167 for the bunzip2 support.
Comment 6 Tim-Philipp Müller 2005-11-11 09:40:01 UTC
There is also a gst-sandbox module in CVS for experimental things and
work-in-progress, not quite sure if it fits the bill though.

To pass messages from upstream elements to downstream elements you'd probably
use events rather than bus messages, unless you want the application to act upon
the locations.
Comment 7 Lutz Mueller 2006-02-07 20:34:25 UTC
Created attachment 58881
Patch to hook up the tar plugin
Comment 8 Lutz Mueller 2006-02-07 20:35:51 UTC
Created attachment 58882
Updated tar plugin
Comment 9 Edward Hervey 2006-02-07 21:24:22 UTC
you rock :)

Can you turn the patch+tarball into a single patch against gst-plugins-bad?

Also, I was wondering how you handle multiple files. I had a look at the tarball but couldn't figure out what was going on with the GST_TAG_LOCATION.

If I have a tarball with 3 files, will it only play the first file?
Comment 10 Lutz Mueller 2006-02-08 18:36:35 UTC
How do I make the patch+tarball...?

The files in the tarball will be played one after another. The pad is configured every time before a new file is passed along. It works.

I haven't implemented seeking yet. Do you know if I can interpret the individual files as chapters/menus like on DVDs? Is there a plugin where I can start looking at some code?

Saving is also an open question. Right now, the filename is only stored if it is passed along as a location event before new buffers arrive. I've patched the filesrc plugin to get saving of at least one file into a tar file working.
Comment 11 Lutz Mueller 2006-02-19 17:36:15 UTC
Created attachment 59721
Updated tar plugin

You can now rip whole CDs and write the result to a tar file:

gst-launch cdiocddasrc mode=continuous ! audioconvert ! vorbisenc ! oggmux ! tarenc ! filesink location=test.tar
Comment 12 Tim-Philipp Müller 2006-02-19 19:43:35 UTC
That's quite cool. I wonder, though: how does tarenc know when to start a new file in this scenario? Glancing at the code, it looks like it starts a new file whenever there's a TAG event. If that's the case, then that's not entirely right, even if it happens to work in this scenario. Tag events can be sent at any time; sometimes multiple tag events are sent for different information, and sometimes tag events are sent to update information (e.g. mad _could_ send a tag event whenever the average bitrate of a VBR mp3 file changes, which would be perfectly legitimate).

I'm not sure what the solution to this problem is though. Maybe a new event that signals the start of a new stream unit? We don't really have a good abstraction for multiple streams yet in GStreamer, be it on the input side or on the sink side. That should probably be solved first. A great thing to bring up on the mailing list with different use-cases that need to be handled.

(On a side note, 'continuous mode' in cdiocddasrc is supposed to treat the entire CD as one single stream, so one might even argue that it would be wrong if tarenc split this up into separate files; I also imagine there might be issues with missing ogg headers for the files that are created after the first one; could easily be solved by introducing a third mode of course once we know how to signal these things downstream properly).

Writing a temporary file isn't really great either, it kind of breaks the whole concept of GStreamer pipelines as I see it. If you just need to fill the total number of bytes into some header when you're done, you could send a NEWSEGMENT event downstream at the end with a seek position as start and then send a small buffer containing the final length in bytes. That will update the header with the correct information.
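For tar specifically, fixing up the header after the fact means rewriting two fields, not one: in a POSIX ustar header the file size is a 12-byte octal field at offset 124, and the 8-byte checksum at offset 148 covers all 512 header bytes, so it changes whenever the size does. A minimal sketch in plain C, assuming a ustar-style header; `tar_set_size()` and `tar_update_checksum()` are hypothetical helpers, not functions from the tarenc code. The resulting 512-byte block is what would be pushed as the "small buffer" after a NEWSEGMENT pointing back at the entry's header offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* Hypothetical helper: store `size` in the 12-byte octal size field at
 * offset 124 (11 octal digits followed by a NUL). */
static void tar_set_size(unsigned char *hdr, unsigned long long size)
{
    assert(size < (1ULL << 33));   /* 11 octal digits hold at most 8^11 - 1 */
    snprintf((char *)hdr + 124, 12, "%011llo", size);
}

/* Hypothetical helper: recompute the header checksum at offset 148.
 * The checksum is the sum of all 512 header bytes, computed with the
 * 8-byte checksum field itself filled with ASCII spaces. */
static unsigned int tar_update_checksum(unsigned char *hdr)
{
    unsigned int sum = 0;

    memset(hdr + 148, ' ', 8);
    for (size_t i = 0; i < TAR_BLOCK; i++)
        sum += hdr[i];
    /* Common encoding: six octal digits, a NUL, then a space. */
    snprintf((char *)hdr + 148, 7, "%06o", sum);
    hdr[155] = ' ';
    return sum;
}
```

The 11-digit size field limits a single entry to 8 GiB minus one byte; larger entries need the GNU or pax extensions.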


Finally, allow me some minor nitpicks about the code:

 * all files should have a line with your copyright in the header

 * elements usually store the sink and source pads they create in their
   element structure, so they can easily be accessed; it tends to make
   code more readable and avoids
     foo = gst_element_get_pad (GST_ELEMENT (b), "sink")
   all over the place (incl. the refcounting involved with that).  

 * assertions like 

     g_return_if_fail (GST_IS_TARENC (b));

   aren't really necessary for internal functions. Some people like
   to use them for external ones, but I have yet to see a single case
   where any such assertion was ever triggered.

 * in the event function you do (stylized):

     gst_tarenc_event (GstPad * pad, GstEvent * e)
     {
       GstTarenc *b = GST_TARENC (gst_pad_get_parent (pad));

       gst_object_unref (b);

       ... function code ...
     }
       
   That kind of defeats the purpose of refcounting. The reference
   should be held until the end of the function, especially in
   event and query functions.

 * you use 'GstTarenc *b' everywhere - IMHO something like
   'GstTarenc *enc' or 'GstTarenc *tar' would be a tad nicer and
   would increase code readability, but that might just be me.


 *   parent_class = g_type_class_ref (GST_TYPE_ELEMENT);
   should be
     parent_class = g_type_class_peek_parent (klass);
   even if you'll still find the _ref() in hundreds of old
   plugins. You can also use GST_BOILERPLATE, then you don't
   have to do the parent_class thing or the _get_type()
   function at all :)

 * there's a gst_structure_has_name() call now in 0.10 :)

 * it should be GstTarDec and gst_tar_dec_* and GST_TAR_DEC instead
   of GstTardec, gst_tardec and GST_TARDEC (very minor, but makes
   things more consistent with the rest of gst).
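The refcounting point in the review above can be illustrated without GStreamer. In this sketch a toy object with a plain integer reference count stands in for GstObject; `obj_get_parent()` and `obj_unref()` are hypothetical analogues of gst_pad_get_parent() and gst_object_unref(), not GStreamer API (the real count is atomic). The handler takes its reference first and drops it as the very last step, so the parent cannot be freed while the body still uses it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for GstObject: just a (non-atomic) reference count. */
typedef struct {
    int refcount;
} Obj;

typedef struct {
    Obj *parent;
} Pad;

/* Analogue of gst_pad_get_parent(): returns a new reference. */
static Obj *obj_get_parent(Pad *pad)
{
    pad->parent->refcount++;
    return pad->parent;
}

/* Analogue of gst_object_unref(): frees when the last ref is dropped. */
static void obj_unref(Obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        free(obj);
}

/* Event handler in the recommended shape: take the ref first,
 * use the parent, release the ref only on the way out. */
static int handle_event(Pad *pad, int *refs_seen_in_body)
{
    Obj *parent = obj_get_parent(pad);
    int ret;

    /* ... function code that uses `parent` ... */
    *refs_seen_in_body = parent->refcount;
    ret = 1;

    obj_unref(parent);
    return ret;
}
```

With several early returns, the usual C idiom is a single `done:` label just before the unref, so every exit path releases the reference exactly once.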
Comment 13 Lutz Mueller 2006-02-21 19:00:18 UTC
Created attachment 59872
Updated tar plugin

I updated the plugin according to the suggestions above.

The only issue remaining is determining/publishing the start of a new track/stream/file/whatever else you would call it. The plugin currently uses GST_TAG_TRACK_NUMBER. I'll ask on the mailing list.
Comment 14 Lutz Mueller 2006-06-15 22:17:42 UTC
Created attachment 67448
Makefile.am

Updated tar plugin (the remaining files will be uploaded separately). There are 2 elements inside the tar plugin: tardemux and tarmux.
Comment 15 Lutz Mueller 2006-06-15 22:18:10 UTC
Created attachment 67449
gsttar.c
Comment 16 Lutz Mueller 2006-06-15 22:18:37 UTC
Created attachment 67450
gsttardemux.c
Comment 17 Lutz Mueller 2006-06-15 22:19:09 UTC
Created attachment 67451
gsttardemux.h
Comment 18 Lutz Mueller 2006-06-15 22:19:34 UTC
Created attachment 67452
gsttarmux.h
Comment 19 Lutz Mueller 2006-06-15 22:19:56 UTC
Created attachment 67453
gsttarmux.c
Comment 20 Lutz Mueller 2006-06-15 22:27:54 UTC
The underlying concepts:

A stream cannot be broken down without further decoding (e.g. one track on a CD, one file in a tarball). Each pad supplies exactly one stream.

Hence: The tardemux element creates a separate pad for each file in the tarball. The tarmux element combines streams from different request pads.

If you use gst-launch to test the element, playback stops after the first stream, because playbin stops playing after the first EOS has been received (even if there are more pads that are still providing data). I am going to resolve this problem separately (see #336951).
Comment 21 Lutz Mueller 2006-06-16 06:15:23 UTC
Created attachment 67468
gsttardemux.c

Tiny clean up.
Comment 22 Sebastian Dröge (slomo) 2009-07-29 13:39:10 UTC
This is a nice plugin. If you could put it all into a patch that applies against the latest gst-plugins-bad, I'll commit it :)
Comment 23 Sebastian Dröge (slomo) 2009-07-29 13:47:05 UTC
As a sidenote, there's a FIXME about how to get the filename extension from the caps. Nowadays you can use the URI query to get the URI used by the source and can extract the extension from that.
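Extracting the extension from the URI returned by the URI query is plain string handling. A minimal sketch, assuming a URI or path such as `file:///music/test.tar`; `uri_get_extension()` is a hypothetical helper that takes the text after the last '.' in the last path component. It ignores query strings, fragments and percent-encoding, which a real implementation would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the extension (without the dot)
 * inside `uri`, or NULL if the last path component has none.
 * Does not handle '?query', '#fragment' or percent-encoded characters. */
static const char *uri_get_extension(const char *uri)
{
    assert(uri != NULL);

    const char *slash = strrchr(uri, '/');
    const char *base = slash ? slash + 1 : uri;
    const char *dot = strrchr(base, '.');

    /* No dot, a leading dot (hidden file like ".profile"), or a trailing dot. */
    if (dot == NULL || dot == base || dot[1] == '\0')
        return NULL;
    return dot + 1;
}
```

Only the last component is searched, so a dot in a directory name (`/a.b/README`) is not mistaken for an extension.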
Comment 24 Lutz Mueller 2009-08-05 21:37:12 UTC
Created attachment 139982
Patch to hook up the tar plugin
Comment 25 Lutz Mueller 2009-08-05 21:44:03 UTC
Created attachment 139983
gsttardemux.c

Updated. It works with "filesrc location=test.tar ! tardemux ! filesink location=test" when test.tar contains exactly one file. I didn't have time to do further testing. 

Regarding the suggestion above (get the extension from the URI used by the source): I don't want the extension used by the source file. I need the extension of each file in the tar file. I use typefind to identify the caps of each file, but how can I get the corresponding extension?
Comment 26 Lutz Mueller 2009-08-08 22:15:45 UTC
Created attachment 140237
gsttardemux.c

Simplification, clean up, etc. Seeking still doesn't work. And I need to guard against empty blocks to avoid creating an empty source pad at the end for each 512-byte block of padding.
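The padding problem follows from the tar layout: every entry is a 512-byte header followed by the file data rounded up to a multiple of 512 bytes, and the archive ends with at least two all-zero 512-byte blocks, often followed by further zero padding up to the record size. A demuxer that treats every block boundary after an entry as a new header will therefore see one bogus "file" per zero block. A minimal sketch of the guard in plain C, walking an in-memory archive and stopping at the first all-zero header block; the function names are hypothetical, it ignores the typeflag, checksum validation and long-name extensions, and a real tardemux works on streamed buffers rather than one contiguous array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/* True if the 512-byte block contains only zero bytes. */
static int tar_block_is_zero(const unsigned char *blk)
{
    for (size_t i = 0; i < TAR_BLOCK; i++)
        if (blk[i] != 0)
            return 0;
    return 1;
}

/* Parse the 12-byte octal size field at offset 124 of a header. */
static unsigned long long tar_entry_size(const unsigned char *hdr)
{
    char field[13];

    memcpy(field, hdr + 124, 12);
    field[12] = '\0';
    return strtoull(field, NULL, 8);
}

/* Count the real entries in an in-memory archive. In a demuxer each
 * counted entry would get its own source pad; zero blocks end the walk
 * instead of producing empty pads. */
static size_t tar_count_entries(const unsigned char *data, size_t len)
{
    size_t off = 0, n = 0;

    assert(data != NULL || len == 0);
    while (len - off >= TAR_BLOCK) {
        const unsigned char *hdr = data + off;
        unsigned long long size, padded;

        if (tar_block_is_zero(hdr))
            break;  /* end-of-archive marker or trailing padding */
        size = tar_entry_size(hdr);
        padded = (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;  /* round up */
        if (padded > len - off - TAR_BLOCK)
            break;  /* truncated archive: entry data is incomplete */
        off += TAR_BLOCK + (size_t)padded;
        n++;
    }
    return n;
}
```

Note that a zero-length file still has a non-zero header (its name is set), so it is counted as an entry with no data blocks rather than being mistaken for padding.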
Comment 27 Lutz Mueller 2009-08-10 22:09:02 UTC
Created attachment 140385
gsttardemux.c

Still to be done: Seeking.
Comment 28 Lutz Mueller 2009-08-12 21:26:28 UTC
Created attachment 140591
gsttardemux.c

Added some documentation, which is still incomplete. Seeking is still to be implemented.
Comment 29 Sebastian Dröge (slomo) 2009-08-14 10:31:24 UTC
Thanks, I've committed everything so far, complete patches will be attached soon.
Please provide the next patches as incremental ones on top of the two following ones.

One note on the muxer: you should probably wait in the sink pads' chain functions until the previous sink pads have finished; this can probably best be done with a GCond.
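The suggested serialization can be sketched with a condition variable. To keep the example self-contained it uses POSIX `pthread_mutex_t`/`pthread_cond_t` in place of GLib's GMutex/GCond; each "sink pad" is modelled as a thread with an index, and `wait_for_turn()`, `finish_turn()` and `MuxState` are hypothetical names. A pad's chain function blocks until all lower-indexed pads have finished (in a real muxer, "finished" would correspond to EOS on that pad), so the entries in the tar output never interleave.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared muxer state: which sink pad may currently write. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  turn_changed;
    int             current;        /* index of the pad allowed to write */
} MuxState;

/* Called at the top of pad `idx`'s chain function: block until all
 * previous pads have finished. */
static void wait_for_turn(MuxState *s, int idx)
{
    pthread_mutex_lock(&s->lock);
    while (s->current != idx)       /* loop also guards against spurious wakeups */
        pthread_cond_wait(&s->turn_changed, &s->lock);
    pthread_mutex_unlock(&s->lock);
}

/* Called when pad `idx` has finished its stream (e.g. on EOS). */
static void finish_turn(MuxState *s, int idx)
{
    pthread_mutex_lock(&s->lock);
    assert(s->current == idx);      /* only the active pad can finish */
    s->current++;
    /* Broadcast: every waiting pad must recheck whether it is next. */
    pthread_cond_broadcast(&s->turn_changed);
    pthread_mutex_unlock(&s->lock);
}

/* Demo worker: one thread per pad, recording the order it writes in. */
typedef struct {
    MuxState *state;
    int       idx;
    int      *order;
    int      *order_len;
} PadThread;

static void *pad_thread(void *arg)
{
    PadThread *p = arg;

    wait_for_turn(p->state, p->idx);
    p->order[(*p->order_len)++] = p->idx;  /* only the active pad gets here */
    finish_turn(p->state, p->idx);
    return NULL;
}
```

Even if the pad threads start in reverse order, the recorded order is 0, 1, 2, ...; the mutex hand-off in `finish_turn()`/`wait_for_turn()` also makes each pad's writes visible to the next one.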
Comment 30 Sebastian Dröge (slomo) 2009-08-14 10:39:23 UTC
And another note on the muxer. You can get the extension for caps by finding a typefind factory for this type and getting the extensions that it handles. That's not easy but it should work well. Also you could add a "filename" property on the sinkpads IMHO.

And a note on the demuxer: don't push tag events downstream; use gst_element_found_tags_for_pad() instead, but only call it after you have set caps on the pad and sent a newsegment event.

That's actually the second thing that needs to be changed, you need to provide proper newsegment events in both elements. The muxer should send a single one in BYTES format and drop the ones from upstream, the demuxer should send one for every srcpad in BYTES format and also drop the upstream ones.
Comment 31 Sebastian Dröge (slomo) 2009-08-14 10:45:21 UTC
Created attachment 140759
0001-tar-Add-a-tar-muxer-and-demuxer-plugin.patch
Comment 32 Sebastian Dröge (slomo) 2009-08-14 10:51:16 UTC
Created attachment 140760
0002-tarmux-Make-the-sink-pads-request-pads-and-don-t-add.patch
Comment 33 Sebastian Dröge (slomo) 2009-09-24 12:35:43 UTC
Bug #563828 must be fixed first, because the tardemux output streams usually require other demuxers.
Comment 34 Lutz Mueller 2011-02-22 20:24:18 UTC
Now that everything is committed, there is probably no reason to keep this bug open.
Comment 35 Sebastian Dröge (slomo) 2011-02-24 12:34:44 UTC
Hm, it really isn't committed yet.
Comment 36 Sebastian Dröge (slomo) 2011-02-24 12:36:02 UTC
The decodebin2 problem is fixed but the tar demuxer is not yet committed. I've added some comments about the tar stuff in comment #29 and comment #30.
Comment 37 Edward Hervey 2013-07-17 09:27:55 UTC
(In reply to comment #29)
> Thanks, I've committed everything so far, complete patches will be attached
> soon.
> Please provide the next patches as incremental ones on top of the two following
> ones.
> 

Sebastian, you say you committed it ... yet later you say you didn't?
Comment 38 Sebastian Dröge (slomo) 2013-07-17 10:23:29 UTC
I had it committed locally but lost that branch. I think all the changes are in this bug though.

Anyway, what should happen with this?
Comment 39 Stefan Sauer (gstreamer, gtkdoc dev) 2014-01-03 20:54:23 UTC
I think this belongs in a VFS layer. If we want to add something like this, can we use libarchive or something similar, so that we don't end up with a bunch of plugins, one for each format?
Comment 40 Sebastian Dröge (slomo) 2018-05-04 08:43:51 UTC
Yes, let's close this for the time being then. Instead of having our own implementation for everything, this should at least use libarchive, but ideally it should live directly in the VFS layer of the OS.