After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 442034 - [avi] add support for subtitle streams (GAB2)
[avi] add support for subtitle streams (GAB2)
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal enhancement
: 0.10.7
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2007-05-29 10:55 UTC by Norbert Lange
Modified: 2007-12-18 10:35 UTC
See Also:
GNOME target: ---
GNOME version: 2.17/2.18


Attachments
Small AVI to reproduce the bug. (9.54 KB, application/octet-stream)
2007-05-29 10:56 UTC, Norbert Lange
  Details
basic implementation of avi subtitle parsing (8.66 KB, patch)
2007-12-14 13:58 UTC, Thijs Vermeir
needs-work Details | Review
subtitle parsing 2 (15.76 KB, patch)
2007-12-15 00:49 UTC, Thijs Vermeir
none Details | Review
avi subtitle parse (18.81 KB, patch)
2007-12-17 14:03 UTC, Thijs Vermeir
committed Details | Review

Description Norbert Lange 2007-05-29 10:55:31 UTC
GStreamer refuses to decode alot of my AVI-Files( without usefull error message), namely those which have embedded subtitles.
I couldnt get usefull debug-info from gst-launch, but it should be easy to reproduce with the attached file. Note that even the first AVI-Specification defines "txts" Streams, so those should be ignored at worst
Comment 1 Norbert Lange 2007-05-29 10:56:34 UTC
Created attachment 88996 [details]
Small AVI to reproduce the bug.
Comment 2 Tim-Philipp Müller 2007-05-30 14:41:57 UTC
Thanks for the bug report and the sample file. This should be partially fixed in CVS:

 2007-05-30  Tim-Philipp Müller  <tim at centricular dot net>

        * gst/avi/gstavidemux.c: (gst_avi_demux_base_init),
        (gst_avi_demux_reset), (gst_avi_demux_parse_stream):
        * gst/avi/gstavidemux.h:
          Parse subtitle text streams instead of erroring out (#442034). Still
          needs a parser for the subtitles to actually show up.

Still need a parser, as it says in the ChangeLog.
Comment 3 Norbert Lange 2007-06-01 21:04:08 UTC
Thanks for the fast response and fix (I dint ask for more ;) ).
Regarding parser, the GAB2 "datachunk" (there is only one, preferably interleaved before the data is needed - but the tool I use inserts them as last movi-chunk for no arguable reason) is nothing but the contents of whole .srt or .ssa File.
AFAIK .srt is already supported by GStreamer, so this shouldnt be much work?

Heres a simple description of the header, nothing really suprising. As said, theres also a single movi-chunk containing a whole subtitle in .srt/.ssa format. (Dont see why supporting multiple chunks should be a problem though)

---------------------------------------------------------

*Stream header chunk
typedef struct {
FOURCC fccType; // "txts"
FOURCC fccHandler; // 00 00 00 00
DWORD dwFlags;
WORD wPriority;
WORD wLanguage;
DWORD dwInitialFrames;
DWORD dwScale;
DWORD dwRate; // dwRate / dwScale == duration in seconds
DWORD dwStart;
DWORD dwLength; // In units above..., should be 1
DWORD dwSuggestedBufferSize;
DWORD dwQuality;
DWORD dwSampleSize; // = 0 -> treated as VBR
RECT rcFrame; // 0, 0, 0, 0
} AVIStreamHeader;

*Stream format chunk
This chunk has a size of 0.

---------------------------------------------------------


Sorry if I said "simple stuff" a couple times, I`d do it myself but GStreamer is as alien to me as GObject. I guess its not much work if you are familar with those things ;)
Comment 4 Norbert Lange 2007-06-01 21:10:44 UTC
I was wrong as the movi-chunk contains a header before the actual .srt/.ssa data. Heres the description:

-----------------------------------

One subtitle stream is stored in one single chunk. That chunk contains
header data, followed by an entire SRT or SSA file. If that file is using UTF-
8 encoding, the BOM should be included as well. The header is defined as
follows:

char[4]; // ’GAB2’
BYTE 0x00;
WORD 0x02; // unicode
DWORD dwSize_name; // length of stream name in bytes
char name[dwSize_name]; // zero-terminated subtitle stream
name encoded in UTF-16
WORD 0x04;
DWORD dwSize; // size of SRT/SSA text file
char data[dwSize]; // entire SRT/SSA file

-----------------------------
Comment 5 Thijs Vermeir 2007-12-14 13:58:31 UTC
Created attachment 100954 [details] [review]
basic implementation of avi subtitle parsing

This patch allow you to view this avi file with subtitles.
gst-launch filesrc location=subtitle.avi ! avidemux name=demux ! queue ! avisubtitle ! subparse ! overlay.  demux. ! queue ! decodebin ! ffmpegcolorspace ! textoverlay name=overlay ! autovideosink

But some questions:
- What do we need to do with the name of the subtitle (UTF-16)?
- What do we need to do with the BOM of the srt file ? Now, the subparse element does not accept this header. IMHO we peed to push it and subparse must handle this. In this patch the BOM is fixed cut off on 3 bytes so it will only work with UTF-8 and BOM provided...
Comment 6 Tim-Philipp Müller 2007-12-14 18:01:57 UTC
Cool stuff!  Looks mostly fine, but here are the usual nitpicks:

gst_avi_subtitle_chain:

    - one shouldn't use g_assert() to check input really; it's more meant
      to be used as a guard to ensure internal consistency (make sure
      internal decision-making or calculations ended up as they should,
      or maybe to clarify implied preconditions in certain bits of code
      to make it easier to grok). I you don't get as much data as you
      expect to, you could either return FLOW_UNEXPECTED or a
      goto broken_gap2_chunk; where you post a STREAM DECODE error
      message on the bus and return FLOW_ERROR (I'd do the latter).

    - if you check for ASCII characters, it increases the readability
      of the code if you use e.g. 'A' rather than 0x41 (or even
      memcmp or strncmp in places that are not particularly performance
      critical)

    - should use GST_WARNING_OBJECT to print debug lines for broken
      bitstream (IMHO)

    - if the bitstream is broken, post a STREAM DECODE error on the bus
      using the GST_ELEMENT_ERROR macro and return FLOW_ERROR.  Most
      GStreamer code puts error goto labels at the end of the chain
      function for this kind of stuff FWIW :)

    - you don't consistently check whether you're reading beyond the
      size of the buffer (esp. with name_length, but also with the
      other reads)

    - all variables need to be declared at the beginning of a block
      (this is re. the 'file' subbuffer)

    - gst_element_get_pad() returns a reference which you're leaking here

    - you should always pass the flow return from gst_pad_push()
      back upstream

gst_avi_subtitle_class_init:

    - the boilerplate macro already does the parent_class stuff for you

gst_avi_subtitle_init:

    - never put code that does something into g_assert() or g_return_if_fail()
      or the like statements, otherwise those function calls will just
      disappear if someone compiles the plugin against a GLib with
      assertions disabled.

    - save the pads you create in the element structure (and use that
      in your chain function instead of the _get_pad)

    - use GST_DEBUG_FUNCPTR(gst_avi_subtitle_chain) in _set_chain_function

header:

    - not used: GstAviSubtitlePrivate
    - not used: GstAviSubtitlesubtitle_header_name
    - not used: GstAviSubtitlesubtitle_file


As for the BOM:

     - yes, ideally subparse should recognise that and convert its
       input on-the-fly as approriate (I think there's even a bug
       open for this)

     - if you don't want to fix subparse right now, you can of
       course just parse the BOM yourself and convert the output to
       UTF-8 yourself as required, and then push that down to
       subparse.

      - what to do with the subtitle name depends a bit on what it
        contains; we could just ignore it for now, or post it on
        the bus (but then the app might mistake that as the main
        title of the movie or so, that wouldn't be good either).

Last question:

     - does this work right with seeking yet?
Comment 7 Thijs Vermeir 2007-12-15 00:49:36 UTC
Created attachment 100985 [details] [review]
subtitle parsing 2

Thanks for the long review ;-)

This is a new patch with all the remarks that you mentioned solved. I didn't implemented UTF-16 yet because I don't have any test files of it. Now it checks if there is a UTF-8 BOM and if available removes it. Also tried to add the beginnings of a unit test for this new element. seeking support is still on the TODO list ;-)
Comment 8 Thijs Vermeir 2007-12-17 14:03:16 UTC
Created attachment 101125 [details] [review]
avi subtitle parse

Added full unit testing and fixed some memory leaks.
Comment 9 Tim-Philipp Müller 2007-12-17 14:32:25 UTC
Looks good enough to commit (I'd keep it at rank NONE until the last issues are ironed out and seeking works right), provided the unit test passes and is valgrind-clean (make elements/avisubtitle.valgrind).

Just a few more comments:

 - in gst_avi_subtitle_utf8_file: you probably meant file[0] == 0xef,
   not file[0] = 0x0ef (best to get rid of the parentheses around the
   individual comparisons, so things like this get caught by the compiler).

 - I'd probably do some kind of detect_encoding() function that skips
   any BOM and returns an encoding string, and then use g_convert()
   with that.  Alternatively, it's worth using g_utf8_validate()
   when assuming UTF-8 encoding (this can be done after committing)

 - in gst_avi_subtitle_chain: won't g_utf16_to_utf8() assume the
   byte order of the UTF-16 input is native byte order? In other
   words: will this work right on big endian systems given that
   the input is (I guess) UTF-16LE?
Comment 10 Thijs Vermeir 2007-12-18 09:23:54 UTC
Committed.

2007-12-18  Thijs Vermeir

	* gst/avi/Makefile.am:
	* gst/avi/gstavi.c:
	* gst/avi/gstavisubtitle.c:
	* gst/avi/gstavisubtitle.h:
	* tests/check/Makefile.am:
	* tests/check/elements/avisubtitle.c:
	* win32/common/config.h:
		Add avi subtitle element for bug #442034. Need seeking support
		and more support for character conversion.