Bug 337076 - Problem with broken matroska files containing non-UTF8 subtitles
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
Version: 0.10.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: 0.10.4
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
Reported: 2006-04-03 19:36 UTC by Guillaume Desmottes
Modified: 2006-07-04 13:55 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
debug log (48.10 KB, application/x-bzip)
2006-06-17 23:10 UTC, Guillaume Desmottes

Description Guillaume Desmottes 2006-04-03 19:36:57 UTC
When I play a .mkv video (using Totem), there is often a very disruptive problem with the subtitle display.
For example, say the following subtitle sequence is defined in the .srt file:
A
B
C
D
E

Sometimes it is displayed as:
A
A
A
D
E

I have no problem if I play the .avi with the .srt file loaded separately.
Comment 1 Tim-Philipp Müller 2006-06-17 15:39:38 UTC
Just so I understand this correctly: in both cases the subtitles come from the same external .srt file, right?

Is this reproducible with a particular .mkv + .srt, or is it a more sporadic phenomenon?
Comment 2 Tim-Philipp Müller 2006-06-17 16:20:39 UTC
If you can reproduce it somehow, it would be great if you could make a log file with

  $ export GST_DEBUG_NO_COLOR=1
  $ export GST_DEBUG=totem:5,textoverlay:5,pango:5,matroskademux:5,subparse:5
  $ totem 2> dbg.log
  $ bzip2 dbg.log

that logs the problem and then attach it here.
Comment 3 Guillaume Desmottes 2006-06-17 23:03:12 UTC
Yes: if the subtitles are in an external .srt file there is no problem, but when I merge them into an .mkv this bug occurs.
I create my .mkv with this command:

mkvmerge -o file.mkv  --language 1:eng --default-track 1 -a 1 -d 0 -S file.avi --language 0:fre --default-track 0 -s 0 -D -A file.srt --track-order 0:0,0:1,1:0

$ mkvmerge --version
mkvmerge v1.6.5 ('Watcher Of The Skies') built on Mar 18 2006 05:40:50

I don't think it's an mkvmerge bug, because I don't have this problem with MPlayer.
Comment 4 Guillaume Desmottes 2006-06-17 23:10:31 UTC
Created attachment 67553 [details]
debug log

I just played the first few seconds of the video (the bug occurs in the first subtitles) and then closed Totem.

Unfortunately I can't upload this .mkv, but if you want I could give it to you at GUADEC. ;-)
Comment 5 Nicolas da Luz Duque 2006-06-19 16:30:08 UTC
I'm encountering the same bug with the files Guillaume handed to me.
Comment 6 Tim-Philipp Müller 2006-06-21 13:54:40 UTC
Can reproduce this as well.


Comment 7 Tim-Philipp Müller 2006-06-21 14:36:26 UTC
This is the problem (from your log):

  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 2 char 9: Invalid UTF-8 encoded text
  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 1 char 9: Invalid UTF-8 encoded text
  (totem:9746): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 1 char 9: Invalid UTF-8 encoded text


Subtitles that are not valid UTF-8 are simply not shown. There are other issues too, but this seems to be what causes the problem. SRT-in-Matroska must always be UTF-8 according to the Matroska specification [1], so there is an mkvmerge bug involved as well.


[1] http://www.matroska.org/technical/specs/subtitles/srt.html
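Pango dropped these lines because their bytes do not form valid UTF-8. For example, if a subtitle had been written in ISO-8859-1 (an assumption for illustration; the report does not say which charset the file used), "é" would be stored as the single byte 0xE9, which UTF-8 only allows as the lead byte of a three-byte sequence. A minimal sketch of such a validity check in C follows; `is_valid_utf8()` is a hypothetical helper, not a GStreamer or Pango function (GLib code would normally call `g_utf8_validate()` instead).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the len bytes at s are well-formed
 * UTF-8 (RFC 3629), i.e. no stray continuation bytes, no truncated or
 * overlong sequences, no surrogates, and nothing above U+10FFFF. */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;  /* continuation byte, 0xC0/0xC1 or 0xF5..0xFF */
        }

        if (len - i <= n)
            return false;  /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;  /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong 3- and 4-byte forms, UTF-16 surrogates and
         * values beyond the Unicode range. */
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;

        i += n + 1;
    }
    return true;
}
```

With this sketch, `is_valid_utf8((const unsigned char *)"caf\xe9", 4)` returns false, while the UTF-8 form `"caf\xc3\xa9"` passes.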
Comment 8 Guillaume Desmottes 2006-06-22 09:28:54 UTC
Agreed, it sounds like an mkvmerge bug.

But since we can now read non-UTF-8 .srt files, maybe it would be good to read these subtitles according to their encoding when we detect they are not UTF-8.
Comment 9 Tim-Philipp Müller 2006-06-22 11:09:42 UTC
> Agreed, it sounds like an mkvmerge bug.
> 
> But since we can now read non-UTF-8 .srt files, maybe it would be good to read
> these subtitles according to their encoding when we detect they are not UTF-8.

How would you know what their encoding is? You can't detect it... (and if we used an environment variable or properties for the subtitle encoding, we'd soon have to make it switchable on the fly, and all that for a clearly broken file).
Comment 10 Guillaume Desmottes 2006-06-22 13:17:11 UTC
Since you fixed bug 172848 I no longer have any problems with .srt files.
So I suppose we could do the same kind of thing for subtitles in Matroska.
Comment 11 Tim-Philipp Müller 2006-06-22 16:27:47 UTC
 2006-06-22  Tim-Philipp Müller  <tim at centricular dot net>

        * gst/matroska/matroska-demux.c:
        (gst_matroska_demux_check_subtitle_buffer),
        (gst_matroska_demux_parse_blockgroup_or_simpleblock),
        (gst_matroska_demux_subtitle_caps):
        * gst/matroska/matroska-ids.c:
        (gst_matroska_track_init_subtitle_context):
        * gst/matroska/matroska-ids.h:
          Try to fix up broken matroska files containing subtitle
          streams with non-UTF8 character encodings (courtesy of
          mkvmerge) using either the encoding specified in the
          GST_SUBTITLE_ENCODING environment variable or the
          current locale's character set if it is non-UTF8.
          Fixes #337076.
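The ChangeLog describes the fallback only in prose. A minimal sketch of that logic in plain C with POSIX iconv(3) follows, assuming the caller has already found that the subtitle buffer is not valid UTF-8. `pick_fallback_charset()` and `subtitle_to_utf8()` are hypothetical names, not the functions in matroska-demux.c (which would use GLib rather than raw iconv). When neither the environment variable nor a non-UTF-8 locale supplies a charset, the sketch simply gives up and returns NULL; the ChangeLog does not say what the demuxer does in that case.

```c
#define _XOPEN_SOURCE 700  /* for nl_langinfo(), strcasecmp(), setenv() */

#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper: choose the charset a non-UTF-8 subtitle is assumed
 * to be in, following the ChangeLog: GST_SUBTITLE_ENCODING first, then the
 * locale's charset if it is not UTF-8. Returns NULL if neither applies.
 * Assumes the application has already called setlocale(LC_ALL, ""). */
static const char *pick_fallback_charset(void)
{
    const char *env = getenv("GST_SUBTITLE_ENCODING");
    if (env != NULL && *env != '\0')
        return env;

    const char *codeset = nl_langinfo(CODESET);
    if (codeset != NULL && *codeset != '\0' &&
        strcasecmp(codeset, "UTF-8") != 0 && strcasecmp(codeset, "UTF8") != 0)
        return codeset;

    return NULL;
}

/* Hypothetical helper: convert len bytes at in (already known NOT to be
 * valid UTF-8) into a malloc'd, NUL-terminated UTF-8 string.
 * Returns NULL if no fallback charset applies or conversion fails. */
char *subtitle_to_utf8(const char *in, size_t len)
{
    const char *from = pick_fallback_charset();
    if (from == NULL)
        return NULL;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                /* charset name not known to iconv */

    size_t cap = len + 16;          /* grown below whenever iconv() hits E2BIG */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;         /* POSIX iconv() takes char ** */
    size_t inleft = len;
    char *outp = out;
    size_t outleft = cap - 1;       /* reserve one byte for the NUL */

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            continue;               /* all input consumed */
        if (errno != E2BIG) {       /* EILSEQ/EINVAL: not valid in 'from' */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        size_t used = (size_t)(outp - out);
        char *grown = realloc(out, cap * 2);
        if (grown == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        cap *= 2;
        out = grown;
        outp = out + used;
        outleft = cap - 1 - used;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

As Comment 9 points out, nothing in this sketch detects the encoding; the charset is taken on trust from the environment. Since almost any byte sequence is valid in a single-byte charset such as ISO-8859-1, a wrong guess usually yields garbled characters rather than a conversion error.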

Comment 12 Guillaume Desmottes 2006-07-04 13:55:07 UTC
I tested using gst HEAD and it seems to work perfectly well now.
Thanks a lot for the fix, Tim-Philipp!