GNOME Bugzilla – Bug 530962
[subparse] parses only every second line of TMPlayer subtitle
Last modified: 2008-05-04 13:20:18 UTC
Try to play a movie with this subtitle file (TMPlayer format):

00:00:01:WILL BE PARSED1
00:00:03:WILL NOT BE PARSED1
00:00:06:WILL BE PARSED2
00:00:09:WILL NOT BE PARSED2
00:00:12:WILL BE PARSED3
00:00:15:WILL NOT BE PARSED3
00:00:18:WILL BE PARSED4
00:00:21:WILL NOT BE PARSED4
00:00:24:WILL BE PARSED5
00:00:27:WILL NOT BE PARSED5

The lines that say WILL NOT BE PARSED won't show up. The file responsible for this is gst-plugins-base-x.xx.xx/gst/subparse/tmplayerparse.c.
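For readers unfamiliar with the format, each line in the sample above is a start time followed by the text. The following is a minimal sketch of splitting one such line, assuming the "HH:MM:SS:text" layout shown in the report. The function name parse_tmplayer_line is hypothetical and this is not the code in tmplayerparse.c; it is also more lenient than a real parser would be (sscanf accepts leading whitespace and any number of digits).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, NOT the GStreamer implementation.
 * Splits a line of the form "HH:MM:SS:text".
 * On success returns 1, stores the start time in seconds in *start_sec
 * and makes *text point at the first character after the third colon
 * (which may be the terminating NUL if the line carries no text).
 * Returns 0 if the line does not start with three colon-terminated numbers. */
int parse_tmplayer_line(const char *line, unsigned *start_sec, const char **text)
{
    unsigned h, m, s;
    int consumed = 0;

    assert(line != NULL && start_sec != NULL && text != NULL);

    /* %n records how many characters were consumed; it is only reached
     * (and 'consumed' only set) if the final ':' literal matched. */
    if (sscanf(line, "%u:%u:%u:%n", &h, &m, &s, &consumed) != 3 || consumed == 0)
        return 0;

    *start_sec = h * 3600 + m * 60 + s;
    *text = line + consumed;
    return 1;
}
```

With this sketch, "00:00:03:WILL NOT BE PARSED1" yields a start time of 3 seconds and the text "WILL NOT BE PARSED1".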
Ok, I've made a patch, and TMPlayer subs are working correctly now. The patch also adds italic text support for subtitles (in Poland, a video player called SubEdit Player is very popular; it can also be used to create subs, hence the name SubEdit, and it shows text starting with a slash "/" as italics). I can't really guarantee that this patch doesn't break subtitles made outside of Poland (maybe someone is using slashes for a different reason?). The diff to gstsubparse.c is made against gst-plugins-base-0.10.19. The TMPlayer parser is almost rewritten from scratch (only a few lines are left). The gzipped patch and tmplayer.c files are in the attachment below.
Created attachment 110322: Patch. GZipped patch file + tmplayer subtitle decoder.
Eh, I think I'll give up. There's an issue when parsing lines like these:

00:14:40:SUB1 long
00:14:46:SUB2 short
00:14:48:SUB3 normal

The TMPlayer subtitle decoder needs to see the following line to determine the duration of a subtitle. Say the duration is 5 seconds. The first sub starts and lasts until 00:14:45, then there is a one-second break and SUB2 is shown. But if the duration is fixed at 5, SUB3 is shown after five seconds instead of two, so it appears at 00:14:53 instead of 00:14:48. The original engine checks for this (but displays only every second line); my engine displays every line, with italic support, but breaks the timing if the duration is set to 6. If it's set to, for example, 2, it works OK, but subtitles disappear after 2 seconds. Sorry for spamming Bugzilla, I'm trying to get this to work.
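The timing problem described above goes away if a cue's display time is capped by the start of the next cue instead of being a fixed value. Below is a small sketch of that idea. The name cue_duration and the notion of a configurable maximum are assumptions for illustration, not something taken from the patch or from GStreamer.

```c
#include <assert.h>

/* Hypothetical helper: how long a cue may stay on screen, in seconds.
 * The cue is shown for at most max_sec seconds, but never past the
 * moment the next cue starts. */
unsigned cue_duration(unsigned start_sec, unsigned next_start_sec, unsigned max_sec)
{
    unsigned gap;

    assert(next_start_sec >= start_sec);

    gap = next_start_sec - start_sec;
    return gap < max_sec ? gap : max_sec;
}
```

For the example above with a 5 second maximum, SUB1 (00:14:40, next cue at 00:14:46) is shown for 5 seconds, while SUB2 (00:14:46, next cue at 00:14:48) is shown for only 2 seconds, so SUB3 still starts on time.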
The parsing issue should be fixed in CVS now:

2008-05-03  Tim-Philipp Müller  <tim.muller at collabora co uk>

	* tests/check/elements/subparse.c: (do_test),
	(test_tmplayer_style3b), (subparse_suite):
	  Add unit test for the tmplayer variant from bug #530962.

2008-05-03  Tim-Philipp Müller  <tim.muller at collabora co uk>

	* gst/subparse/gstsubparse.c: (handle_buffer),
	(gst_sub_parse_sink_event):
	* gst/subparse/tmplayerparse.c: (tmplayer_process_buffer),
	(tmplayer_parse_line):
	  Fix parsing of tmplayer subtitle variant where every single line
	  contains text and there isn't an empty line after each line to
	  determine the duration (#530962).
	  Improve EOS handling for tmplayer subtitles a bit by making sure
	  that we push out the last line of text without a duration if
	  there's still text left in the buffer at the end.

I've tried your patch, but it makes all the tmplayer checks in the unit test fail, and I couldn't really be bothered to investigate since I already had a fix myself, sorry. If you want support for italics, please file a separate bug for that (and if you attempt a patch, it would be preferable to base it on the current code, or at least on code that passes all the unit tests; also, text within pango markup will need to be escaped).
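On the last point: once subtitle text is wrapped in markup such as <i>...</i>, characters like "&" and "<" in the text itself would be misread as markup unless they are escaped first. Code built on GLib would normally call g_markup_escape_text() for this. The self-contained sketch below only illustrates the idea; the name escape_markup is hypothetical, and it handles just three characters (GLib's function also covers quote characters).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a real markup-escaping routine.
 * Copies src into dst, replacing '&', '<' and '>' with entities.
 * Returns 1 on success, 0 if dst (dst_size bytes) is too small;
 * on failure the contents of dst are unspecified. */
int escape_markup(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        char single[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = single;  break;
        }

        len = strlen(rep);
        if (used + len + 1 > dst_size)   /* keep room for the NUL */
            return 0;
        memcpy(dst + used, rep, len);
        used += len;
    }

    if (used + 1 > dst_size)
        return 0;
    dst[used] = '\0';
    return 1;
}
```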
I think the current code handles the example in comment #3 fine too. Please let me know if that's not the case.
(In reply to comment #5)
> I think the current code handles the example in comment #3 fine too. Please let
> me know if that's not the case.

It works properly right now :) The only remaining issue is italic text (but that's only a minor enhancement, so it can be left alone).

I've noticed one little thing: seeking is very slow near the end of the movie. For example, seeking from 1:39:00 to 1:40:00 is quite slow and the HDD light blinks a lot, but seeking from 0:05:00 to 0:06:00 is fast. Disabling the subtitles fixes the problem for me (the HDD light blinks only slightly while reading movie data), so it doesn't seem to be an HDD fragmentation problem. I believe that, when the movie is about to end, GStreamer parses the whole subtitle file from the beginning on every seek (but why is the HDD LED flashing, and why can I hear my HDD?). Anyway, it is working fine :) Thanks!!
There is one more issue I've noticed: When you put every second line of text without duration, it will display the sub until the next one will come. If there is a for example 20 second part of the movie without speech, the sub will remain on the screen (maybe there should be a timeout, for example 6 seconds?)
Right, there's probably something inefficient we're doing when parsing. The segment check you added was probably the right thing to do, or something along those lines at least. We always read the entire file from the start when a seek happens, yes - but that shouldn't lead to too much disk churning, subtitle files are usually small enough for this not to matter too much. I'll have a look one of these days. Feel free to file a bug about it if you want to make sure it's not forgotten.

> There is one more issue I've noticed:
>
> When you put every second line of text without duration, it will display the
> sub until the next one will come. If there is a for example 20 second part of
> the movie without speech, the sub will remain on the screen (maybe there should
> be a timeout, for example 6 seconds?)

This is basically a deficiency of the subtitle format - it doesn't allow you to specify an end time or duration. The way this is usually done with this format is by having the file look like:

00:00:20:Ok, I'm off to the pub.
00:00:23:
00:00:50:Two pints of lager please
00:00:53:
00:01:04:...

(This is probably also the reason why no one has noticed this bug before :))
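The convention in that example (a timestamp with no text acts as "clear the screen") can be sketched as follows. The TmEntry type and the visible_cue_end function are hypothetical names for illustration, not part of gst/subparse. The sketch assumes entries are already parsed and sorted by start time.

```c
#include <assert.h>
#include <stddef.h>

/* One parsed line of a TMPlayer file (hypothetical type). */
typedef struct {
    unsigned start_sec;
    const char *text;    /* "" marks a "clear the screen" entry */
} TmEntry;

/* Returns 1 and stores the end time in *end_sec if entry i is a visible
 * cue that is followed by another entry. Returns 0 for blank entries
 * (nothing to show) and for the last entry (its end time is unknown,
 * because the format has no way to state one). */
int visible_cue_end(const TmEntry *entries, size_t count, size_t i,
                    unsigned *end_sec)
{
    assert(entries != NULL && end_sec != NULL && i < count);

    if (entries[i].text[0] == '\0')
        return 0;
    if (i + 1 >= count)
        return 0;

    /* The next entry, blank or not, ends the current cue. */
    *end_sec = entries[i + 1].start_sec;
    return 1;
}
```

Applied to the file above, the cue at 00:00:20 ends at 00:00:23 and the cue at 00:00:50 ends at 00:00:53, so nothing stays on screen during the silent stretch in between.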
I've made a little change as you suggested; check the attachment. There is still a little overhead (it can be noticed), but it works much faster now. It should work like this:

00:14:40:SUB1 long <- 1
00:14:46:SUB2 short <- 2
00:14:48:SUB3 normal <- 3
00:14:51:SUB3 normal <- 4

Let's say the maximum subtitle duration is hardcoded to 5 seconds (some players have an option to change that, but we are talking about a library, not a player). So the parser should display sub no. 1 from 00:14:40 to 00:14:45, then there is one second of rest, and then the rest is shown normally. Before returning the subtitle, it should check for italics (a leading and a trailing slash indicate that the text is italic; if there is a slash only at the end, it shouldn't be removed). This is basically how it should work. I need to know how to check whether the file will pass the tests, as you mentioned. Should it just compile, or what?
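The italics rule described above might look like the sketch below. It assumes one reading of the SubEdit convention: a leading slash marks the text as italic, a trailing slash is dropped only when a leading one is present, and a slash that appears only at the end is left alone. The function name strip_italic_slashes is hypothetical, and the caller would still need to escape the text and add any markup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies text into out with the italic markers
 * removed and returns 1 if the text was marked italic, 0 otherwise.
 * out must have room for strlen(text) + 1 bytes. */
int strip_italic_slashes(const char *text, char *out, size_t out_size)
{
    size_t len;

    assert(text != NULL && out != NULL);
    len = strlen(text);
    assert(out_size > len);

    if (len == 0 || text[0] != '/') {
        /* Not italic: copy unchanged, including any trailing slash. */
        memcpy(out, text, len + 1);
        return 0;
    }

    /* Drop the leading slash, and the trailing one if there is one. */
    text++;
    len--;
    if (len > 0 && text[len - 1] == '/')
        len--;

    memcpy(out, text, len);
    out[len] = '\0';
    return 1;
}
```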
Created attachment 110347: TMPlayer subtitle speedup