GNOME Bugzilla – Bug 442034
[avi] add support for subtitle streams (GAB2)
Last modified: 2007-12-18 10:35:36 UTC
GStreamer refuses to decode alot of my AVI-Files( without usefull error message), namely those which have embedded subtitles. I couldnt get usefull debug-info from gst-launch, but it should be easy to reproduce with the attached file. Note that even the first AVI-Specification defines "txts" Streams, so those should be ignored at worst
Created attachment 88996 [details] Small AVI to reproduce the bug.
Thanks for the bug report and the sample file. This should be partially fixed in CVS: 2007-05-30 Tim-Philipp Müller <tim at centricular dot net> * gst/avi/gstavidemux.c: (gst_avi_demux_base_init), (gst_avi_demux_reset), (gst_avi_demux_parse_stream): * gst/avi/gstavidemux.h: Parse subtitle text streams instead of erroring out (#442034). Still needs a parser for the subtitles to actually show up. Still need a parser, as it says in the ChangeLog.
Thanks for the fast response and fix (I dint ask for more ;) ). Regarding parser, the GAB2 "datachunk" (there is only one, preferably interleaved before the data is needed - but the tool I use inserts them as last movi-chunk for no arguable reason) is nothing but the contents of whole .srt or .ssa File. AFAIK .srt is already supported by GStreamer, so this shouldnt be much work? Heres a simple description of the header, nothing really suprising. As said, theres also a single movi-chunk containing a whole subtitle in .srt/.ssa format. (Dont see why supporting multiple chunks should be a problem though) --------------------------------------------------------- *Stream header chunk typedef struct { FOURCC fccType; // "txts" FOURCC fccHandler; // 00 00 00 00 DWORD dwFlags; WORD wPriority; WORD wLanguage; DWORD dwInitialFrames; DWORD dwScale; DWORD dwRate; // dwRate / dwScale == duration in seconds DWORD dwStart; DWORD dwLength; // In units above..., should be 1 DWORD dwSuggestedBufferSize; DWORD dwQuality; DWORD dwSampleSize; // = 0 -> treated as VBR RECT rcFrame; // 0, 0, 0, 0 } AVIStreamHeader; *Stream format chunk This chunk has a size of 0. --------------------------------------------------------- Sorry if I said "simple stuff" a couple times, I`d do it myself but GStreamer is as alien to me as GObject. I guess its not much work if you are familar with those things ;)
I was wrong as the movi-chunk contains a header before the actual .srt/.ssa data. Heres the description: ----------------------------------- One subtitle stream is stored in one single chunk. That chunk contains header data, followed by an entire SRT or SSA file. If that file is using UTF- 8 encoding, the BOM should be included as well. The header is defined as follows: char[4]; // ’GAB2’ BYTE 0x00; WORD 0x02; // unicode DWORD dwSize_name; // length of stream name in bytes char name[dwSize_name]; // zero-terminated subtitle stream name encoded in UTF-16 WORD 0x04; DWORD dwSize; // size of SRT/SSA text file char data[dwSize]; // entire SRT/SSA file -----------------------------
Created attachment 100954 [details] [review] basic implementation of avi subtitle parsing This patch allow you to view this avi file with subtitles. gst-launch filesrc location=subtitle.avi ! avidemux name=demux ! queue ! avisubtitle ! subparse ! overlay. demux. ! queue ! decodebin ! ffmpegcolorspace ! textoverlay name=overlay ! autovideosink But some questions: - What do we need to do with the name of the subtitle (UTF-16)? - What do we need to do with the BOM of the srt file ? Now, the subparse element does not accept this header. IMHO we peed to push it and subparse must handle this. In this patch the BOM is fixed cut off on 3 bytes so it will only work with UTF-8 and BOM provided...
Cool stuff! Looks mostly fine, but here are the usual nitpicks: gst_avi_subtitle_chain: - one shouldn't use g_assert() to check input really; it's more meant to be used as a guard to ensure internal consistency (make sure internal decision-making or calculations ended up as they should, or maybe to clarify implied preconditions in certain bits of code to make it easier to grok). I you don't get as much data as you expect to, you could either return FLOW_UNEXPECTED or a goto broken_gap2_chunk; where you post a STREAM DECODE error message on the bus and return FLOW_ERROR (I'd do the latter). - if you check for ASCII characters, it increases the readability of the code if you use e.g. 'A' rather than 0x41 (or even memcmp or strncmp in places that are not particularly performance critical) - should use GST_WARNING_OBJECT to print debug lines for broken bitstream (IMHO) - if the bitstream is broken, post a STREAM DECODE error on the bus using the GST_ELEMENT_ERROR macro and return FLOW_ERROR. Most GStreamer code puts error goto labels at the end of the chain function for this kind of stuff FWIW :) - you don't consistently check whether you're reading beyond the size of the buffer (esp. with name_length, but also with the other reads) - all variables need to be declared at the beginning of a block (this is re. the 'file' subbuffer) - gst_element_get_pad() returns a reference which you're leaking here - you should always pass the flow return from gst_pad_push() back upstream gst_avi_subtitle_class_init: - the boilerplate macro already does the parent_class stuff for you gst_avi_subtitle_init: - never put code that does something into g_assert() or g_return_if_fail() or the like statements, otherwise those function calls will just disappear if someone compiles the plugin against a GLib with assertions disabled. - save the pads you create in the element structure (and use that in your chain function instead of the _get_pad) - use GST_DEBUG_FUNCPTR(gst_avi_subtitle_chain) in _set_chain_function header: - not used: GstAviSubtitlePrivate - not used: GstAviSubtitlesubtitle_header_name - not used: GstAviSubtitlesubtitle_file As for the BOM: - yes, ideally subparse should recognise that and convert its input on-the-fly as approriate (I think there's even a bug open for this) - if you don't want to fix subparse right now, you can of course just parse the BOM yourself and convert the output to UTF-8 yourself as required, and then push that down to subparse. - what to do with the subtitle name depends a bit on what it contains; we could just ignore it for now, or post it on the bus (but then the app might mistake that as the main title of the movie or so, that wouldn't be good either). Last question: - does this work right with seeking yet?
Created attachment 100985 [details] [review] subtitle parsing 2 Thanks for the long review ;-) This is a new patch with all the remarks that you mentioned solved. I didn't implemented UTF-16 yet because I don't have any test files of it. Now it checks if there is a UTF-8 BOM and if available removes it. Also tried to add the beginnings of a unit test for this new element. seeking support is still on the TODO list ;-)
Created attachment 101125 [details] [review] avi subtitle parse Added full unit testing and fixed some memory leaks.
Looks good enough to commit (I'd keep it at rank NONE until the last issues are ironed out and seeking works right), provided the unit test passes and is valgrind-clean (make elements/avisubtitle.valgrind). Just a few more comments: - in gst_avi_subtitle_utf8_file: you probably meant file[0] == 0xef, not file[0] = 0x0ef (best to get rid of the parentheses around the individual comparisons, so things like this get caught by the compiler). - I'd probably do some kind of detect_encoding() function that skips any BOM and returns an encoding string, and then use g_convert() with that. Alternatively, it's worth using g_utf8_validate() when assuming UTF-8 encoding (this can be done after committing) - in gst_avi_subtitle_chain: won't g_utf16_to_utf8() assume the byte order of the UTF-16 input is native byte order? In other words: will this work right on big endian systems given that the input is (I guess) UTF-16LE?
Committed. 2007-12-18 Thijs Vermeir * gst/avi/Makefile.am: * gst/avi/gstavi.c: * gst/avi/gstavisubtitle.c: * gst/avi/gstavisubtitle.h: * tests/check/Makefile.am: * tests/check/elements/avisubtitle.c: * win32/common/config.h: Add avi subtitle element for bug #442034. Need seeking support and more support for character conversion.