After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 169936 - [subparse] support for SAMI subtitles
[subparse] support for SAMI subtitles
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other All
: Normal enhancement
: 0.10.8
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 307871 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2005-03-11 12:42 UTC by Young-Ho Cha
Modified: 2006-05-15 15:32 UTC
See Also:
GNOME target: ---
GNOME version: 2.9/2.10


Attachments
regular SAMI subtitle (61.44 KB, application/smil)
2005-03-14 12:10 UTC, Young-Ho Cha
  Details
complex SAMI subtitle. (141.54 KB, application/smil)
2005-03-14 12:17 UTC, Young-Ho Cha
  Details
simple sami parse patch (5.76 KB, patch)
2005-09-22 16:42 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (6.25 KB, patch)
2005-09-22 17:34 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (6.87 KB, patch)
2005-09-22 17:44 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (8.35 KB, patch)
2005-09-23 08:41 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (9.95 KB, patch)
2005-09-24 08:09 UTC, Young-Ho Cha
reviewed Details | Review
simple sami parse patch (updated) (8.44 KB, patch)
2006-04-22 08:00 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (10.57 KB, patch)
2006-04-22 19:41 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (11.34 KB, patch)
2006-04-23 05:28 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (11.82 KB, patch)
2006-04-23 12:10 UTC, Young-Ho Cha
needs-work Details | Review
simple sami parse patch (updated) (13.05 KB, patch)
2006-04-24 15:37 UTC, Young-Ho Cha
needs-work Details | Review
simple sami parse patch (updated) (13.48 KB, patch)
2006-04-25 13:07 UTC, Young-Ho Cha
accepted-commit_after_freeze Details | Review
64270: simple sami parse patch (updated) (13.52 KB, patch)
2006-04-26 16:07 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (13.79 KB, patch)
2006-05-01 13:35 UTC, Young-Ho Cha
none Details | Review
simple sami parse patch (updated) (14.16 KB, patch)
2006-05-13 21:01 UTC, Young-Ho Cha
committed Details | Review
quick fix for detecting sami format. (694 bytes, patch)
2006-05-15 12:38 UTC, Young-Ho Cha
committed Details | Review
fix for detecting sami format (1.85 KB, patch)
2006-05-15 14:30 UTC, Young-Ho Cha
reviewed Details | Review

Description Young-Ho Cha 2005-03-11 12:42:15 UTC
Please describe the problem:
gst-plugins' subparse plugin is not support SAMI subtitle type(these file named
xxx.smi)



Steps to reproduce:
1. open some movie file with sami subtitle using totem.
2. 
3. 


Actual results:
print this error message.
ERROR (0x8688910 - 308484:15:53.418517000)        subparse(28884)
gstsubparse.c(653):gst_subparse_format_autodetect:<subparse0> The input is not a
valid/supported subtitle file

Expected results:
render movie file with subtitle

Does this happen every time?
yes

Other information:
SAMI subtitle parser can get in sub_read_line_sami() at
http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/main/subreader.c?rev=1.141
Comment 1 Ronald Bultje 2005-03-11 20:41:28 UTC
Just as a note, we cannot copy the mplayer implementation directly. We need our
own implementation, from scratch, for copyright reason (GPL vs. LGPL).
Comment 2 Trent Waddington 2005-03-14 11:35:08 UTC
Well it's a html format so it shouldn't be hard to parse, but I havn't actually
been able to find any .smi files with anything in them that gstreamer would
actually be interested in.  Can you attach your .smi file for us?
Comment 3 Young-Ho Cha 2005-03-14 12:10:17 UTC
Created attachment 38694 [details]
regular SAMI subtitle

ragular SAMI subtitle type.
It contains an english subtitle.
Comment 4 Young-Ho Cha 2005-03-14 12:17:16 UTC
Created attachment 38695 [details]
complex SAMI subtitle.

it has two subtitles stream in one file, english and korean.

SAMI subtitle has no charset encoding information.

so it uses legacy charset like id3tag.

it'll need some workaround like gst_tag_extract_id3v1_string() from
gst-plugins/gst/tags/gstid3tag.c or need new function described bug #169943
Comment 5 Ronald Bultje 2005-06-16 06:52:29 UTC
*** Bug 307871 has been marked as a duplicate of this bug. ***
Comment 6 Young-Ho Cha 2005-09-22 16:42:49 UTC
Created attachment 52515 [details] [review]
simple sami parse patch

i made a patch that parsing SAMI subtitle simply.

currently, it just parse subtitle string and timestamps.
Comment 7 Young-Ho Cha 2005-09-22 17:34:25 UTC
Created attachment 52519 [details] [review]
simple sami parse patch (updated)

updated patch (process &nbsp; properly.)
Comment 8 Young-Ho Cha 2005-09-22 17:44:43 UTC
Created attachment 52520 [details] [review]
simple sami parse patch (updated)

missed header diff, and fix some mistake in escape_amp()
Comment 9 Ronald Bultje 2005-09-22 20:10:58 UTC
Doesn't glib have some escaping functions?
Comment 10 Tim-Philipp Müller 2005-09-22 21:17:58 UTC
AFAIK it only has escaping functions (g_markup_escape_text() etc.). If you use
GMarkupParser it will un-escape those entities for you, but I don't think the
unescaping code is actually exported.

Dunno if we want to support &#NNN; and &#xNN; type of escapes as well (unicode
code point in decimal/hex was it?) ... ;)

Cheers
 -Tim
Comment 11 Young-Ho Cha 2005-09-23 08:41:54 UTC
Created attachment 52537 [details] [review]
simple sami parse patch (updated)

avode g_string_append() when get_next_line() returns NULL, and parse <font
color=> tag. (but, can't show colored subtitle, is it pango-gsttextoverlay
plugin issue?)
Comment 12 Young-Ho Cha 2005-09-24 08:09:21 UTC
Created attachment 52583 [details] [review]
simple sami parse patch (updated)

change to use g_markup_xxx functions()
Comment 13 Andy Wingo 2006-01-12 12:38:06 UTC
Wouldn't it be better to use libxml's HTML parser? I would think you could use htmlCreatePushParserCtxt from http://xmlsoft.org/html/libxml-HTMLparser.html#htmlCreatePushParserCtxt to make a SAX reader. Then you would use htmlParseChunk on sucessive blocks of the input stream to push data into the parser. A SAX handler would be installed (the first arg of htmlCreatePushParserCtxt), that would push out buffers. Given the complexity of this format I'd be very suspicious of a homemade parser.
Comment 14 Young-Ho Cha 2006-04-22 08:00:07 UTC
Created attachment 64092 [details] [review]
simple sami parse patch (updated)

update for gstreamer 0.10 and parse with libxml2 by andy's advice.
Comment 15 Young-Ho Cha 2006-04-22 19:41:23 UTC
Created attachment 64117 [details] [review]
simple sami parse patch (updated)

add <ruby> tag support. (it can work after apply patch in #339405)
add <br> tag support. (it can work after apply patch in #339405)
Comment 16 Young-Ho Cha 2006-04-23 05:28:22 UTC
Created attachment 64128 [details] [review]
simple sami parse patch (updated)

skip text in <title> tag.
layout ruby text properly. (but multiline ruby text can't handle yet.)
Comment 17 Christian Persch 2006-04-23 10:53:48 UTC
if (!g_strncasecmp("title", aname, 5)) {
[and many other places]

g_strncasecmp is deprecated and doesn't do what you want here. You should use libxml's xmlStrncasecmp instead.
Comment 18 Young-Ho Cha 2006-04-23 12:10:29 UTC
Created attachment 64141 [details] [review]
simple sami parse patch (updated)

g_strncasecmp -> xmlStrncmp
when dispose, clean samiContext
Comment 19 Andy Wingo 2006-04-24 09:36:34 UTC
0) You are awesome!

1) Your patch should use gstreamer coding style; you can guarantee this by running the gst-indent script, from gstreamer/tools/, on your file before making the diff.

2) You should not be using statically-linked data, it prevents two instances of the plugin from being run at once; you should put that data into the ParserState->user_data.

3) Consider using GString for accumulating data, it makes the code clearer

4) You don't do any error handling; what happens when you feed it a malformed subtitle?

Thanks!
Comment 20 Young-Ho Cha 2006-04-24 15:37:28 UTC
Created attachment 64213 [details] [review]
simple sami parse patch (updated)

indent with gst-indent.
dynamic allocated samiContext.
change to using GString.

and I tested some malformed subtitles. and It worked well too.
(malformed tag pair, invalid color value, etc)
Comment 21 Andy Wingo 2006-04-25 10:07:31 UTC
OK at this point just a few nitpicks:

1) Avoid declaring variables in the middle of a scope, e.g. has_italic in pop_state(), value in start_sync -- we still compile on sun's forte

2) The idiom of iterating over attributes would be more clearly written as:
    for (i = 0; (atts[i] != NULL); i+=2) {
      const xmlChar *key, *value;
      key = atts[i];
      value = atts[i+1];

3) According to http://www.jamesh.id.au/articles/libxml-sax/libxml-sax.html#characters, the string passed to characters_sami is not guaranteed to be null-terminated; you need to use the g_string_append_len.

4) When configuring GStreamer without XML support, the sami parser shouldn't be built because there is no libxml. Please surround the sami code in #ifndef GST_DISABLE_LOADSAVE_REGISTRY.

Other than that, looking very good. This will be nice to have :-)
Comment 22 Young-Ho Cha 2006-04-25 13:07:56 UTC
Created attachment 64270 [details] [review]
simple sami parse patch (updated)

applied andy's suggestions.
Comment 23 Andy Wingo 2006-04-25 14:37:00 UTC
OK, looks good to me. Thanks, Young-Ho! I've marked it for committing after the -base release tomorrow.
Comment 24 Young-Ho Cha 2006-04-26 16:07:35 UTC
Created attachment 64332 [details] [review]
64270: simple sami parse patch (updated)

I found a bug which can't detect some sami file because that is starting with whitespace characters.

so I replace strncmp() to xmlStrcasestr().
Comment 25 Young-Ho Cha 2006-05-01 13:35:58 UTC
Created attachment 64607 [details] [review]
simple sami parse patch (updated)

avoid append next node text in result buf
Comment 26 Tim-Philipp Müller 2006-05-11 21:29:21 UTC
Nice work, will finish reviewing it tomorrow if possible.

Could you enlighten me on two things:

 (a) there are several /* FIXME */ comments in the code - could you
     expand on those - what exactly needs fixing there?
     (like /* FIXME: does not work on Wednesdays */)

 (b) could you add some comments for the various members of the 
     samiContext structure, for posterity, so it's clearer how
     this is all supposed to work? :)

Comment 27 Young-Ho Cha 2006-05-12 01:24:13 UTC
this is answers for above questions.

(a) there is two FIXME for ruby markup rendering.
they will be reimplemented after close bug 322391

(b) each member means..

buf : buffer for collect content

rubybuf : buffer for collect ruby content

resultbuf :
when open next 'sync' tag, move from buf for avoid to append next content 

has_result : flag for ready to push 

state: in many sami files, many tags are not closed.
so when open tag, parser will append tag flag in state member, 
and when open next "sync" tag, close opened tag force.
see sami_context_push_state() and sami_context_pop_state()

htmlctxt: html parser context

in_title : flag for avoid to append title content to buf

time1: previous start attribute in sync tag

time2: current start attribute in sync tag
Comment 28 Young-Ho Cha 2006-05-13 21:01:03 UTC
Created attachment 65401 [details] [review]
simple sami parse patch (updated)

expand FIXME comment that depend bug 322391.
many sami subtitle assign silver color, but X rgb db has no such color. so replace to #c0c0c0 value.
Comment 29 Tim-Philipp Müller 2006-05-15 09:43:00 UTC
Committed to CVS, thanks for the patch and sorry everything took so long:

 2006-05-15  Tim-Philipp Müller  <tim at centricular dot net>

       Patch by: Young-Ho Cha <ganadist at chollian net>

       * gst/subparse/samiparse.c: (handle_start_font):
         Need to map "silver" colour explicitly.

 2006-05-15  Tim-Philipp Müller  <tim at centricular dot net>

        Patch by: Young-Ho Cha  <ganadist at chollian net>

        * gst/subparse/Makefile.am:
        * gst/subparse/gstsubparse.c: (gst_sub_parse_dispose),
        (parser_state_dispose), (gst_sub_parse_data_format_autodetect),
        (gst_sub_parse_format_autodetect), (feed_textbuf),
        (gst_subparse_type_find), (plugin_init):
        * gst/subparse/gstsubparse.h:
        * gst/subparse/samiparse.c:
        * gst/subparse/samiparse.h:
          Add support for SAMI subtitles (#169936).

I've taken the liberty to split out the SAMI parsing code into a separate file (I think in the long run it's more maintainable like that). Also, I've changed the code to use a new media type application/x-subtitle-sami for SAMI subtitles (this gives us more flexibility and among other things keeps open the option to split the code into a separate plugin in case anyone ever thinks that that should be done because of the libxml2 dependency etc.).
Comment 30 Young-Ho Cha 2006-05-15 12:11:27 UTC
thank you very much for accept patch.

but I found you missed one thing.

As I noticed comment #24, some sami files can start with whitespace characters.

so I changed to use xmlStrcasestr() instead of strncmp().

should you fix it please?
Comment 31 Young-Ho Cha 2006-05-15 12:38:26 UTC
Created attachment 65497 [details] [review]
quick fix for detecting sami format.

some sami files can start with whitespace. so if use strncmp(), it can't detect.

so change to xmlStrcasestr() instead of strncmp()
Comment 32 Young-Ho Cha 2006-05-15 14:30:43 UTC
Created attachment 65505 [details] [review]
fix for detecting sami format

move check function to samiparse.[ch]
Comment 33 Tim-Philipp Müller 2006-05-15 15:32:07 UTC
The type detection should still work even if we don't actually compile with SAMI support enabled. However, the typefinder can't use libxml2 functions either (like with the previous patch), so:


 2006-05-15  Tim-Philipp Müller  <tim at centricular dot net>

       * gst/subparse/gstsubparse.c:
        (gst_sub_parse_data_format_autodetect):
         Don't use libxml functions for typefinding.


Thanks for noticing.