GNOME Bugzilla – Bug 795891
matroskademux: emits invalid Pango markup
Last modified: 2018-11-03 12:07:14 UTC
VLC doesn't complain with that broken markup. Maybe it's actually valid? For WebVTT I think it is, at least. 0:00:15.766899000 26502 0x7f995da8fc50 DEBUG pango gstbasetextoverlay.c:2250:void gst_base_text_overlay_render_text(GstBaseTextOverlay *, const gchar *, gint): Rendering '<i>I'll try to contact them...' (<unknown>:26502): Pango-WARNING **: 21:39:58.241: pango_layout_set_markup_with_accel: Error on line 1 char 47: Element 'markup' was closed, but the currently open element is 'i' This isn't rendered. Instead the most recent valid text is rendered, which is confusing :)
Where does this come from? I think we have code in subparse to fix this up. It's clearly not valid pango markup?
It's a subtitle embedded in a Matroska file.
GStreamer is not the render, so why is this bug files against GST instead of Pango ? Do you suggest et should implement our own to comply with WebVTT ?
(In reply to Nicolas Dufresne (ndufresne) from comment #3) > GStreamer is not the render, so why is this bug files against GST instead of > Pango ? Well, Tim mentioned there might be some code in subparse to workaround this issue... I wonder why it's not used then, I'll have a look. > Do you suggest et should implement our own to comply with WebVTT ? This is not a WebVTT subtitle.
I'm guessing this is matroskademux sending out pango markup text directly, so no subparse in the mix.
(In reply to Tim-Philipp Müller from comment #5) > I'm guessing this is matroskademux sending out pango markup text directly, > so no subparse in the mix. That's right :) So what should we do then? Fixup the markup in matroskademux like subparse does? Or maybe we could add a subparse element after the demuxer?
matroskademux could application/x-subtitle instead of pango markup and then we could add a subparse element in subtitleoverlay.
Created attachment 372896 [details] [review] 0001-pbutils-Add-markup-utils-module.patch
Created attachment 372897 [details] [review] 0002-subparse-Use-new-pbutils-markup-utilities.patc
Created attachment 372898 [details] [review] 0001-matroskademux-Sanitize-SubRip-markup-data.patch
*** Bug 796695 has been marked as a duplicate of this bug. ***
Comment on attachment 372896 [details] [review] 0001-pbutils-Add-markup-utils-module.patch >+++ b/gst-libs/gst/pbutils/markup-utils.c >+/* GStreamer base utils library text markup utility functions >+ * Copyright (C) 2018 Philippe Normand <philn@igalia.com> Haven't looked at the patch properly yet, was just wondering if this code was all yours originally? :)
Review of attachment 372898 [details] [review]: ::: gst/matroska/matroska-demux.c @@ +2315,3 @@ + keyunit = !!(flags & GST_SEEK_FLAG_KEY_UNIT); + after = !!(flags & GST_SEEK_FLAG_SNAP_AFTER); + before = !!(flags & GST_SEEK_FLAG_SNAP_BEFORE); Irrelevant changes, please put into a separate commit (just push it, no need for review) :) @@ +3400,3 @@ utf8 = g_markup_escape_text ((gchar *) map.data, map.size); + /* FIXME: WebVTT support */ What does this mean? Will WebVTT work correctly but it is just not sanitized?
Review of attachment 372896 [details] [review]: Generally looks good to me ::: gst-libs/gst/pbutils/markup-utils.c @@ +1,2 @@ +/* GStreamer base utils library text markup utility functions + * Copyright (C) 2018 Philippe Normand <philn@igalia.com> Isn't this based on the code from subparse? :) @@ +70,3 @@ + } + + tag_regex = g_regex_new (search_pattern, 0, 0, NULL); Yes please store the regex in some kind of GOnce :)
(In reply to Tim-Philipp Müller from comment #12) > Comment on attachment 372896 [details] [review] [review] > 0001-pbutils-Add-markup-utils-module.patch > > >+++ b/gst-libs/gst/pbutils/markup-utils.c > >+/* GStreamer base utils library text markup utility functions > >+ * Copyright (C) 2018 Philippe Normand <philn@igalia.com> > > Haven't looked at the patch properly yet, was just wondering if this code > was all yours originally? :) Ah sorry about this, of course I didn't write all this, it comes from subparse. I'll git blame it and mention original authors. (In reply to Sebastian Dröge (slomo) from comment #13) > Review of attachment 372898 [details] [review] [review]: > > ::: gst/matroska/matroska-demux.c > @@ +2315,3 @@ > + keyunit = !!(flags & GST_SEEK_FLAG_KEY_UNIT); > + after = !!(flags & GST_SEEK_FLAG_SNAP_AFTER); > + before = !!(flags & GST_SEEK_FLAG_SNAP_BEFORE); > > Irrelevant changes, please put into a separate commit (just push it, no need > for review) :) > Ok :) > @@ +3400,3 @@ > utf8 = g_markup_escape_text ((gchar *) map.data, map.size); > > + /* FIXME: WebVTT support */ > > What does this mean? Will WebVTT work correctly but it is just not sanitized? Well, Matroska seems to have WebVTT support in addition to SRT and other formats already supported by the demuxer. But currently WebVTT isn't supported by the demuxer, afaics.
Is the general approach ok though? I wasn't sure about the API naming and binding friendliness :)
I think neither the naming nor the way this works is ideal yet :) How does it work? Does it modify (and replace, if needed) an existing string in place? I dislike the _subrip() specific thing and also the is_webvtt boolean. Could we do this as a generic function? (e.g. passing in a format enum with a flags field or something if needed) How about we pass a string in, function takes ownership and we return a new string? (Which may be the old one or not?) What about error handling? Are there cases where we should/need to return an error or a warning? OR MAYBE we should just do this all differently, and force subparse to be plugged in this case via an extra caps field, and let subparse clean it up?
(In reply to Tim-Philipp Müller from comment #17) > I think neither the naming nor the way this works is ideal yet :) > Yeah this is a first shot, I basically extracted the code from subparse and didn't change it. > How does it work? Does it modify (and replace, if needed) an existing string > in place? > Yes. > I dislike the _subrip() specific thing and also the is_webvtt boolean. > > Could we do this as a generic function? (e.g. passing in a format enum with > a flags field or something if needed) > Hmm, yeah something like that would be possible. > How about we pass a string in, function takes ownership and we return a new > string? (Which may be the old one or not?) > Yes this sounds better than modifying in-place and is most likely better for bindings. > What about error handling? Are there cases where we should/need to return an > error or a warning? > I don't think so. The idea is to output valid markup, not sure which case should trigger errors. > OR MAYBE we should just do this all differently, and force subparse to be > plugged in this case via an extra caps field, and let subparse clean it up? Actually I tried this first :) The problem is that subparse wouldn't be able to auto-detect the subtitle format because it expects a full payload (cue timing data and actual text data) whereas matroskademux only emits the text data.
> > OR MAYBE we should just do this all differently, and force subparse to be > > plugged in this case via an extra caps field, and let subparse clean it up? > > Actually I tried this first :) The problem is that subparse wouldn't be able > to auto-detect the subtitle format because it expects a full payload (cue > timing data and actual text data) whereas matroskademux only emits the text > data. I'm sure this could be made to work. There's no need to autodetect if we get full and proper caps from upstream.
It's not only an issue about the caps. subparse is stateful, it needs timing informations from the cues but those infos are not provided, matroskademux emits only the cue data.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/452.