GNOME Bugzilla – Bug 337076
Problem with broken matroska files containing non-UTF8 subtitles
Last modified: 2006-07-04 13:55:07 UTC
When I play a .mkv video (using Totem), there is often a very disturbing problem with the display of the subtitles. Say, for example, that the following subtitle sequence is defined in the .srt file: A B C D E. Sometimes it will be displayed as: A A A D E. I have no problem if I play the .avi with the .srt file separately.
Just so I understand this correctly: in both cases the subtitles come from the same external .srt file, right? Is this reproducible with a particular .mkv + .srt, or is it a more spurious phenomenon?
If you can reproduce it somehow, it would be great if you could make a log file that captures the problem with

  $ export GST_DEBUG_NO_COLOR=1
  $ export GST_DEBUG=totem:5,textoverlay:5,pango:5,matroskademux:5,subparse:5
  $ totem 2> dbg.log
  $ bzip2 dbg.log

and then attach it here.
Yes, if the subtitles are in an external .srt file there is no problem, but when I merge them into an .mkv this bug occurs. I create my .mkv with this command:

  $ mkvmerge -o file.mkv --language 1:eng --default-track 1 -a 1 -d 0 -S file.avi --language 0:fre --default-track 0 -s 0 -D -A file.srt --track-order 0:0,0:1,1:0
  $ mkvmerge --version
  mkvmerge v1.6.5 ('Watcher Of The Skies') built on Mar 18 2006 05:40:50

I don't think it's an mkvmerge bug, because I don't have this problem with mplayer.
Created attachment 67553: debug log. I only played the first few seconds of the video (the bug occurs in the first subtitles) and then closed Totem. Unfortunately I can't upload this .mkv, but if you want I could give it to you at GUADEC. ;-)
I'm encountering the same bug with the files Guillaume handed to me.
Can reproduce this as well.
This is the problem (from your log):

  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 2 char 9: Invalid UTF-8 encoded text
  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 1 char 9: Invalid UTF-8 encoded text
  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 1 char 9: Invalid UTF-8 encoded text

Subtitles that are not valid UTF-8 will not be shown. There are other issues too, but this seems to be what causes the problem. SRT-in-Matroska must always be UTF-8 according to the Matroska specification [1], so there is an mkvmerge bug involved as well.

[1] http://www.matroska.org/technical/specs/subtitles/srt.html
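To illustrate why those subtitles are rejected, here is a minimal, hypothetical UTF-8 well-formedness check in plain C. It is not Pango's or GLib's code (GLib offers `g_utf8_validate()` for this job), just a sketch of the rule being enforced. A Latin-1 "é" is the single byte 0xE9, which in UTF-8 can only be the lead byte of a three-byte sequence. When it is followed by ordinary ASCII, as in French subtitle text, the sequence is malformed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the len bytes at s are well-formed
 * UTF-8. Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, UTF-16 surrogates and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return false;                    /* invalid lead byte */
        }

        if (len - i <= n)
            return false;                    /* sequence cut off at the end */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;                /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                    /* overlong, surrogate or too big */
        i += n + 1;
    }
    return true;
}
```

With this check, the UTF-8 bytes `"caf\xc3\xa9"` ("café") pass. The Latin-1 bytes `"caf\xe9"` fail, because the 0xE9 lead byte has no continuation bytes after it.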
Agreed, it sounds like an mkvmerge bug. But since we can now read non-UTF-8 .srt files, maybe it would be good to read the subtitles according to their encoding if we detect that they are not UTF-8.
> Agree, it sounds like a mkvmerge bug.
>
> But as we can now read non UTF-8 srt files, maybe it would be fine to read
> subtitles according their encoding if we detect they are not in UTF-8.

How do you know what their encoding is? You can't detect it... (and if we used an environment variable or properties for the subtitle encoding, we'd soon have to make it switchable on the fly and all that, for a clearly broken file).
Since you fixed bug 172848, I no longer have any problem with .srt files, so I suppose we could do the same kind of thing for subtitles in Matroska.
2006-06-22  Tim-Philipp Müller  <tim at centricular dot net>

  * gst/matroska/matroska-demux.c:
  (gst_matroska_demux_check_subtitle_buffer),
  (gst_matroska_demux_parse_blockgroup_or_simpleblock),
  (gst_matroska_demux_subtitle_caps):
  * gst/matroska/matroska-ids.c:
  (gst_matroska_track_init_subtitle_context):
  * gst/matroska/matroska-ids.h:
    Try to fix up broken matroska files containing subtitle streams
    with non-UTF8 character encodings (courtesy of mkvmerge) using
    either the encoding specified in the GST_SUBTITLE_ENCODING
    environment variable or the current locale's character set if
    it is non-UTF8. Fixes #337076.
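The ChangeLog describes the fallback only in words. Below is a rough, hypothetical sketch of that logic in plain C with POSIX `iconv()`. The helper names are made up, and the actual patch in matroska-demux.c is built on GLib and may differ in detail. The encoding of a broken buffer cannot be detected reliably. For example, byte 0xA4 is "¤" in ISO-8859-1 but "€" in ISO-8859-15. For that reason the sketch never guesses. It uses GST_SUBTITLE_ENCODING if that variable is set. Otherwise it uses the locale's charset if that is not UTF-8. If neither applies, it gives up.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv(), strcasecmp(), nl_langinfo() */

#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: choose the charset to try for a subtitle buffer that
 * failed UTF-8 validation, in the order the ChangeLog describes:
 *   1. the GST_SUBTITLE_ENCODING environment variable, if set and non-empty;
 *   2. the locale's charset, if it is not UTF-8.
 * Returns NULL when neither applies. nl_langinfo() only reflects the user's
 * locale after the program has called setlocale(LC_ALL, ""). */
static const char *pick_subtitle_encoding(void)
{
    const char *env = getenv("GST_SUBTITLE_ENCODING");
    if (env != NULL && env[0] != '\0')
        return env;

    const char *codeset = nl_langinfo(CODESET);
    if (codeset != NULL && codeset[0] != '\0' &&
        strcasecmp(codeset, "UTF-8") != 0 && strcasecmp(codeset, "UTF8") != 0)
        return codeset;

    return NULL;
}

/* Hypothetical helper: convert len bytes from charset `from` into a newly
 * allocated, NUL-terminated UTF-8 string. Returns NULL if the charset is
 * unknown to iconv or the input is not valid in that charset. */
static char *convert_to_utf8(const char *in, size_t len, const char *from)
{
    if (len > (SIZE_MAX - 1) / 4)
        return NULL;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    /* 4 output bytes per input byte should be ample for typical encodings */
    size_t out_size = len * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;          /* iconv() takes a non-const pointer */
    size_t in_left = len;
    char *outp = out;
    size_t out_left = out_size - 1;  /* reserve room for the terminating NUL */

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {          /* invalid, incomplete or too long */
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/* Hypothetical fix-up step for a buffer already known not to be UTF-8:
 * returns a malloc'd UTF-8 copy (caller frees) or NULL to leave it alone. */
static char *fix_up_subtitle(const char *buf, size_t len)
{
    const char *enc = pick_subtitle_encoding();
    return enc != NULL ? convert_to_utf8(buf, len, enc) : NULL;
}
```

A caller would run `fix_up_subtitle()` only on buffers that fail UTF-8 validation. For instance, with GST_SUBTITLE_ENCODING=ISO-8859-1 set, the Latin-1 bytes `"caf\xe9"` come back as the UTF-8 string "café".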
I tested with gst HEAD and it seems to work perfectly now. Thanks a lot for the fix, Tim-Philipp!