GNOME Bugzilla – Bug 616936
[matroskademux] Incorrect display of subtitles with markup
Last modified: 2012-02-18 15:27:16 UTC
When playing an AVI file together with an SRT text file, Totem handles tags in the subtitles and renders bold, italic and underline. If the video is instead an MKV with embedded subtitles, the tags are displayed literally. I also tested this by extracting the subtitles and converting the MKV to AVI. The result confirms the bug.
Please upload a test file.
Created attachment 159684 the avi file
Created attachment 159685 the srt file
Created attachment 159686 the mkv file with subtitles
Here are all the files. If you open the MKV file directly, you'll see the tags. If you download the AVI and SRT files, you won't see the tags; the text is shown in italics instead.
Reproduced with gst-launch, so this is a GStreamer bug. My guess is that the subtitle stream is wrongly tagged or wrongly detected.
I think it's just that subtitles in matroska were assumed to be without markup, so the caps end up as text/plain rather than text/x-pango-markup.
Created attachment 164376 matroskademux: UTF-8 subtitles may have markup
On the one hand, the Matroska specs are not very specific about what "plain" UTF-8 subtitles entail. On the other hand, replacing text/plain with text/x-pango-markup (which is what the patch does) Just Works, and AFAIK it should not break other plain-text cases (they would simply be markup that contains no tags)?
If we re-label plain text as markup, then we need to make sure & < > etc. are escaped properly (g_markup_*() utility functions). So I guess we'd need to check if there are tags in the text; if yes, then we can just assume everything is fine, if not we have to check if there are chars that need to be escaped that aren't escaped yet, or something.
Ah. That's a bit of a snag. Sounds like some not well-defined heuristics are in order (somewhere) ...
Maybe something like this would be enough? Init stream->seen_tag to FALSE and then:
  stream->seen_tag = stream->seen_tag || check_if_subtitle_chunk_has_tag (txt);
  if (!stream->seen_tag)
    xyz = g_markup_escape_text (txt, -1);
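For illustration only, here is a minimal stdlib-only C sketch of the sticky-flag idea above. It is not the code from any attached patch. The names SubStream, chunk_has_tag, escape_markup and prepare_chunk are hypothetical; escape_markup is a simplified stand-in for g_markup_escape_text(), and the tag check is a deliberately crude placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-stream state; only the seen_tag flag comes from the thread. */
typedef struct {
  bool seen_tag;
} SubStream;

/* Crude placeholder check: a '<' followed somewhere later by a '>'. */
static bool
chunk_has_tag (const char *txt)
{
  const char *open = strchr (txt, '<');
  return open != NULL && strchr (open + 1, '>') != NULL;
}

/* Simplified stand-in for g_markup_escape_text(): escapes & < > ' and ".
 * Returns a newly allocated string that the caller must free(). */
static char *
escape_markup (const char *txt)
{
  /* Worst case: every byte expands to a 6-byte entity such as &quot; */
  char *out = malloc (strlen (txt) * 6 + 1);
  char *p = out;

  if (out == NULL)
    return NULL;
  for (; *txt != '\0'; txt++) {
    const char *rep = NULL;
    switch (*txt) {
      case '&':  rep = "&amp;";  break;
      case '<':  rep = "&lt;";   break;
      case '>':  rep = "&gt;";   break;
      case '\'': rep = "&apos;"; break;
      case '"':  rep = "&quot;"; break;
      default:   break;
    }
    if (rep != NULL) {
      size_t n = strlen (rep);
      memcpy (p, rep, n);
      p += n;
    } else {
      *p++ = *txt;
    }
  }
  *p = '\0';
  return out;
}

/* Returns a newly allocated chunk that can be labelled as markup.
 * Chunks are escaped until the first tag is seen on the stream;
 * after that, every chunk is passed through unchanged. */
static char *
prepare_chunk (SubStream *stream, const char *txt)
{
  char *copy;

  stream->seen_tag = stream->seen_tag || chunk_has_tag (txt);
  if (!stream->seen_tag)
    return escape_markup (txt);

  copy = malloc (strlen (txt) + 1);
  if (copy != NULL)
    strcpy (copy, txt);
  return copy;
}
```

Note that the flag is sticky: once one chunk with a tag has been seen, later chunks on that stream are no longer escaped, so a stray '&' after that point would pass through as-is. That is what the pseudocode implies; whether it is acceptable is a judgment call.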
And what about the rare case where plaintext subtitles contain something that looks like a tag?
> and what about the rare case where plaintext subtitles contains something that
> looks like a tag?
That's just tough luck then. We could whitelist a number of common/acceptable tags to check for. I think the chances that someone puts '<b>' or '<i>' in a subtitle chunk and actually wants it displayed like that are close to 0.
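A whitelist check along those lines could be sketched as follows. This is only an assumption about what such a check might look like: the function name chunk_has_known_tag and the exact tag list (bold, italic, underline, as mentioned in the original report) are hypothetical, and matching is case-sensitive and ignores tags with attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed whitelist: opening and closing bold, italic and underline tags. */
static const char *const known_tags[] = {
  "<b>", "</b>", "<i>", "</i>", "<u>", "</u>"
};

/* Returns true if the chunk contains at least one whitelisted tag. */
static bool
chunk_has_known_tag (const char *txt)
{
  size_t i;

  for (i = 0; i < sizeof (known_tags) / sizeof (known_tags[0]); i++) {
    if (strstr (txt, known_tags[i]) != NULL)
      return true;
  }
  return false;
}
```

With a list like this, text such as "a < b and c > d" is not mistaken for markup, while "<i>Hello</i>" is. A real implementation would probably also want to accept uppercase variants and tags that carry attributes.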
Created attachment 165560 matroskademux: UTF-8 subtitles may have markup
As before, but adds some (simple) heuristics, as proposed, to determine whether or not to escape subtitle text.
commit 74e0c05ff7d2270494d616ea2d86811bae5a3d53
Author: Mark Nauwelaerts <mark.nauwelaerts@collabora.co.uk>
Date: Wed Jun 23 11:12:00 2010 +0200

    matroskademux: UTF-8 subtitles may have markup

    Fixes #616936.
*** Bug 651739 has been marked as a duplicate of this bug. ***
*** Bug 654596 has been marked as a duplicate of this bug. ***