Bug 530962 - [subparse] parses only every second line of TMPlayer subtitle
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: 0.10.19
Hardware: Other Linux
Importance: Normal minor
Target Milestone: 0.10.20
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2008-05-01 22:12 UTC by Tomasz Sałaciński
Modified: 2008-05-04 13:20 UTC
See Also:
GNOME target: ---
GNOME version: 2.21/2.22


Attachments
Patch (1.82 KB, application/x-compressed-tar)
2008-05-03 14:09 UTC, Tomasz Sałaciński
TMPlayer subtitle speedup (444 bytes, patch)
2008-05-04 13:20 UTC, Tomasz Sałaciński

Description Tomasz Sałaciński 2008-05-01 22:12:26 UTC
Try to play a movie with this subtitle file (TMPlayer format):

00:00:01:WILL BE PARSED1
00:00:03:WILL NOT BE PARSED1
00:00:06:WILL BE PARSED2
00:00:09:WILL NOT BE PARSED2
00:00:12:WILL BE PARSED3
00:00:15:WILL NOT BE PARSED3
00:00:18:WILL BE PARSED4
00:00:21:WILL NOT BE PARSED4
00:00:24:WILL BE PARSED5
00:00:27:WILL NOT BE PARSED5

The lines that say WILL NOT BE PARSED won't show up. The file that is responsible for this is gst-plugins-base-x.xx.xx/gst/subparse/tmplayerparse.c.
Comment 1 Tomasz Sałaciński 2008-05-03 14:08:59 UTC
Ok, I've made a patch. TMPlayer subs are working correctly right now.

This patch adds italic text support for subtitles. In Poland, a video player called SubEdit Player is very popular; as the name suggests, it can also be used to create subs, and it shows text starting with a slash "/" as italics.

Subs are now played correctly. I can't guarantee that this patch doesn't break subtitles made outside of Poland (maybe someone is using slashes for a different reason?). The diff to gstsubparse.c is made against gst-plugins-base-0.10.19. The TMPlayer decoder is almost rewritten from scratch (only a few lines are left). The gzipped patch and tmplayer.c files are in the attachment below.
Comment 2 Tomasz Sałaciński 2008-05-03 14:09:39 UTC
Created attachment 110322
Patch

GZipped patch file + tmplayer subtitle decoder.
Comment 3 Tomasz Sałaciński 2008-05-03 16:16:16 UTC
Eh, I think I'll give up.

There's an issue when parsing lines like these:

00:14:40:SUB1 long

00:14:46:SUB2 short
00:14:48:SUB3 normal

The TMPlayer subtitle decoder needs the following line to determine the duration of a subtitle. Say the duration is 5 seconds: the first sub starts and lasts until 00:14:45, then there is a one-second break and SUB2 is shown. But if the duration is set to 5, SUB3 is shown after five seconds, not two, so instead of appearing at 00:14:48, SUB3 is displayed at 00:14:53.

The original engine checks for this (but displays only every second line). My engine displays every line, with italic support, but breaks timing if the duration is set to 6. If it's set to, for example, 2, it works OK, but subtitles disappear after 2 seconds.

Sorry for spamming bugzilla, I'm trying to get this to work.
Comment 4 Tim-Philipp Müller 2008-05-03 16:18:02 UTC
The parsing issue should be fixed in CVS now:

2008-05-03  Tim-Philipp Müller  <tim.muller at collabora co uk>

        * tests/check/elements/subparse.c: (do_test),
          (test_tmplayer_style3b), (subparse_suite):
          Add unit test for the tmplayer variant from bug #530962.

2008-05-03  Tim-Philipp Müller  <tim.muller at collabora co uk>

        * gst/subparse/gstsubparse.c: (handle_buffer),
          (gst_sub_parse_sink_event):
        * gst/subparse/tmplayerparse.c: (tmplayer_process_buffer),
          (tmplayer_parse_line):
          Fix parsing of tmplayer subtitle variant where every single line contains
          text and there isn't an empty line after each line to determine the
          duration (#530962). Improve EOS handling for tmplayer subtitles a bit by
          making sure that we push out the last line of text without a duration if
          there's still text left in the buffer at the end.
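
The behaviour this changelog entry describes can be illustrated with a minimal Python sketch. It is not the code in tmplayerparse.c; the names `parse_tmplayer`, `Cue` and `LINE_RE` are hypothetical, and only the plain `hh:mm:ss:text` line form used in this bug is handled. Each line's duration is taken from the next line's timestamp, a timestamp with empty text only terminates the previous cue, and the last line of text is emitted without a duration at the end of the data.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical names throughout; this is an illustration, not the
# GStreamer implementation. Only "hh:mm:ss:text" lines are handled.
LINE_RE = re.compile(r"^(\d+):(\d{2}):(\d{2}):(.*)$")


class Cue(NamedTuple):
    start: int               # start time in seconds
    duration: Optional[int]  # None means the duration is unknown
    text: str


def parse_tmplayer(data: str) -> List[Cue]:
    cues: List[Cue] = []
    pending = None  # (start, text) still waiting for the next timestamp
    for raw in data.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip blank or malformed lines
        hours, minutes, seconds, text = m.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        text = text.strip()
        if pending is not None:
            # The next timestamp tells us how long the previous cue lasted.
            cues.append(Cue(pending[0], start - pending[0], pending[1]))
        # A timestamp with no text only ends the previous cue.
        pending = (start, text) if text else None
    if pending is not None:
        # End of data: push out the last line without a duration.
        cues.append(Cue(pending[0], None, pending[1]))
    return cues
```

With the sample file from the description, every line becomes a cue, including the lines marked WILL NOT BE PARSED.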

I've tried your patch, but it makes all the tmplayer checks in the unit test fail, and I couldn't really be bothered to investigate since I already had a fix myself, sorry.

If you want support for italics, please file a separate bug for that (and if you attempt a patch it would be preferable to base it on the current code or at least code that passes all the unit tests; also, text within pango markup will need to be escaped).

Comment 5 Tim-Philipp Müller 2008-05-03 16:21:00 UTC
I think the current code handles the example in comment #3 fine too. Please let me know if that's not the case.
Comment 6 Tomasz Sałaciński 2008-05-03 16:54:19 UTC
(In reply to comment #5)
> I think the current code handles the example in comment #3 fine too. Please let
> me know if that's not the case.
> 

It works properly now :) The only remaining issue is italic text (but it's only a minor enhancement, so it can be left alone). I've noticed one little thing: seeking is very slow near the end of the movie. For example, seeking from 1:39:00 to 1:40:00 is quite slow and the HDD light blinks a lot, but seeking from 0:05:00 to 0:06:00 is fast. Disabling the subtitles fixes the problem for me (the HDD light blinks only slightly while reading movie data), so it doesn't seem to be an HDD fragmentation problem. I believe that when the movie is about to end, GStreamer parses the whole subtitle file from the beginning on every seek (but why is the HDD LED flashing, and why can I hear my HDD?).

Anyway, it is working fine:) Thanks!!
Comment 7 Tomasz Sałaciński 2008-05-03 17:07:17 UTC
There is one more issue I've noticed:

When you put every second line of text without duration, it will display the sub until the next one will come. If there is a for example 20 second part of the movie without speech, the sub will remain on the screen (maybe there should be a timeout, for example 6 seconds?)
Comment 8 Tim-Philipp Müller 2008-05-03 18:54:57 UTC
Right, there's probably something inefficient we're doing when parsing. The segment check you added was probably the right thing to do, or something along those lines at least. We always read the entire file from the start when a seek happens, yes - but that shouldn't lead to too much disk churning, subtitle files are usually small enough for this not to matter too much. I'll have a look one of these days. Feel free to file a bug about it if you want to make sure it's not forgotten.


> There is one more issue I've noticed:
> 
> When you put every second line of text without duration, it will display the
> sub until the next one will come. If there is a for example 20 second part of
> the movie without speech, the sub will remain on the screen (maybe there should
> be a timeout, for example 6 seconds?)

This is basically a deficiency of the subtitle format - it doesn't allow you to specify an end time or duration.

The way this is usually done with this format is by having the file look like:

00:00:20:Ok, I'm off to the pub.
00:00:23:
00:00:50:Two pints of lager please
00:00:53:
00:01:04:...

(This is probably also the reason why no one has noticed this bug before :))

Comment 9 Tomasz Sałaciński 2008-05-04 13:08:09 UTC
I've made a little change as you suggested; check the attachment. There is still a little overhead (it is noticeable), but it works much faster.

It should work like this:

00:14:40:SUB1 long          <- 1
00:14:46:SUB2 short         <- 2
00:14:48:SUB3 normal        <- 3
00:14:51:SUB3 normal        <- 4

Let's say the maximum subtitle duration is hardcoded to 5 seconds (some players have an option to change that, but we are talking about a library, not a player). The parser should then display sub no. 1 from 00:14:40 to 00:14:45, followed by one second of rest, and then show the remaining subs normally. Before returning the subtitle, it should check for italics (a leading slash, optionally with a trailing slash, indicates that the text is italic; if there is a slash only at the end, it shouldn't be removed).

This is basically how it should work.
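
The proposal above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the attached patch: `MAX_DURATION`, `display_end` and `to_markup` are hypothetical names, times are plain seconds, and `xml.sax.saxutils.escape` stands in for whatever markup escaping the real element would use (comment 4 notes that text inside pango markup needs to be escaped).

```python
from typing import Optional
from xml.sax.saxutils import escape

MAX_DURATION = 5  # seconds; the hardcoded cap proposed in this comment


def display_end(start: int, next_start: Optional[int],
                max_duration: int = MAX_DURATION) -> int:
    """Return the time at which a sub should disappear."""
    capped = start + max_duration
    if next_start is None:
        return capped  # last sub: only the cap applies
    # Never overlap the next sub, never exceed the cap.
    return min(capped, next_start)


def to_markup(text: str) -> str:
    """Wrap slash-prefixed text in <i>...</i>, escaping the text itself."""
    if text.startswith("/"):
        body = text[1:]
        if body.endswith("/"):
            body = body[:-1]  # drop the trailing slash only for italic text
        return "<i>" + escape(body) + "</i>"
    # No leading slash: leave the text (and any trailing slash) alone.
    return escape(text)
```

For the example above, sub no. 1 starts at 00:14:40 (880 s) and the next sub starts at 00:14:46 (886 s), so `display_end(880, 886)` gives 885, i.e. 00:14:45, while sub no. 2 ends when sub no. 3 starts.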

I need to know how to check if the file will pass tests, as you mentioned. Should it just compile or what?
Comment 10 Tomasz Sałaciński 2008-05-04 13:20:18 UTC
Created attachment 110347
TMPlayer subtitle speedup