After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 795891 - matroskademux: emits invalid Pango markup
matroskademux: emits invalid Pango markup
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
git master
Other Mac OS
: Normal normal
: git master
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 796695 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2018-05-07 20:47 UTC by Philippe Normand
Modified: 2018-11-03 12:07 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0001-pbutils-Add-markup-utils-module.patch (12.57 KB, patch)
2018-06-30 12:42 UTC, Philippe Normand
needs-work Details | Review
0002-subparse-Use-new-pbutils-markup-utilities.patc (11.38 KB, patch)
2018-06-30 12:42 UTC, Philippe Normand
accepted-commit_now Details | Review
0001-matroskademux-Sanitize-SubRip-markup-data.patch (1.99 KB, patch)
2018-06-30 12:43 UTC, Philippe Normand
needs-work Details | Review

Description Philippe Normand 2018-05-07 20:47:29 UTC
VLC doesn't complain with that broken markup. Maybe it's actually valid? For WebVTT I think it is, at least.

0:00:15.766899000 26502 0x7f995da8fc50 DEBUG                  pango gstbasetextoverlay.c:2250:void gst_base_text_overlay_render_text(GstBaseTextOverlay *, const gchar *, gint): Rendering '<i>I'll try to contact them...'

(<unknown>:26502): Pango-WARNING **: 21:39:58.241: pango_layout_set_markup_with_accel: Error on line 1 char 47: Element 'markup' was closed, but the currently open element is 'i'

This isn't rendered. Instead the most recent valid text is rendered, which is confusing :)
Comment 1 Tim-Philipp Müller 2018-05-07 21:55:53 UTC
Where does this come from?

I think we have code in subparse to fix this up.

It's clearly not valid pango markup?
Comment 2 Philippe Normand 2018-05-08 08:27:52 UTC
It's a subtitle embedded in a Matroska file.
Comment 3 Nicolas Dufresne (ndufresne) 2018-05-08 13:36:08 UTC
GStreamer is not the render, so why is this bug files against GST instead of Pango ? Do you suggest et should implement our own to comply with WebVTT ?
Comment 4 Philippe Normand 2018-05-08 14:19:02 UTC
(In reply to Nicolas Dufresne (ndufresne) from comment #3)
> GStreamer is not the render, so why is this bug files against GST instead of
> Pango ?

Well, Tim mentioned there might be some code in subparse to workaround this issue...
I wonder why it's not used then, I'll have a look.

> Do you suggest et should implement our own to comply with WebVTT ?

This is not a WebVTT subtitle.
Comment 5 Tim-Philipp Müller 2018-05-08 14:32:01 UTC
I'm guessing this is matroskademux sending out pango markup text directly, so no subparse in the mix.
Comment 6 Philippe Normand 2018-05-08 18:09:05 UTC
(In reply to Tim-Philipp Müller from comment #5)
> I'm guessing this is matroskademux sending out pango markup text directly,
> so no subparse in the mix.

That's right :)
So what should we do then? Fixup the markup in matroskademux like subparse does? Or maybe we could add a subparse element after the demuxer?
Comment 7 Philippe Normand 2018-05-08 18:16:34 UTC
matroskademux could application/x-subtitle instead of pango markup and then we could add a subparse element in subtitleoverlay.
Comment 8 Philippe Normand 2018-06-30 12:42:14 UTC
Created attachment 372896 [details] [review]
0001-pbutils-Add-markup-utils-module.patch
Comment 9 Philippe Normand 2018-06-30 12:42:45 UTC
Created attachment 372897 [details] [review]
0002-subparse-Use-new-pbutils-markup-utilities.patc
Comment 10 Philippe Normand 2018-06-30 12:43:21 UTC
Created attachment 372898 [details] [review]
0001-matroskademux-Sanitize-SubRip-markup-data.patch
Comment 11 Sebastian Dröge (slomo) 2018-07-02 10:26:31 UTC
*** Bug 796695 has been marked as a duplicate of this bug. ***
Comment 12 Tim-Philipp Müller 2018-07-02 10:37:24 UTC
Comment on attachment 372896 [details] [review]
0001-pbutils-Add-markup-utils-module.patch

>+++ b/gst-libs/gst/pbutils/markup-utils.c
>+/* GStreamer base utils library text markup utility functions
>+ * Copyright (C) 2018 Philippe Normand <philn@igalia.com>

Haven't looked at the patch properly yet, was just wondering if this code was all yours originally? :)
Comment 13 Sebastian Dröge (slomo) 2018-07-02 10:37:59 UTC
Review of attachment 372898 [details] [review]:

::: gst/matroska/matroska-demux.c
@@ +2315,3 @@
+  keyunit = !!(flags & GST_SEEK_FLAG_KEY_UNIT);
+  after = !!(flags & GST_SEEK_FLAG_SNAP_AFTER);
+  before = !!(flags & GST_SEEK_FLAG_SNAP_BEFORE);

Irrelevant changes, please put into a separate commit (just push it, no need for review) :)

@@ +3400,3 @@
       utf8 = g_markup_escape_text ((gchar *) map.data, map.size);
 
+      /* FIXME: WebVTT support */

What does this mean? Will WebVTT work correctly but it is just not sanitized?
Comment 14 Sebastian Dröge (slomo) 2018-07-02 10:41:08 UTC
Review of attachment 372896 [details] [review]:

Generally looks good to me

::: gst-libs/gst/pbutils/markup-utils.c
@@ +1,2 @@
+/* GStreamer base utils library text markup utility functions
+ * Copyright (C) 2018 Philippe Normand <philn@igalia.com>

Isn't this based on the code from subparse? :)

@@ +70,3 @@
+  }
+
+  tag_regex = g_regex_new (search_pattern, 0, 0, NULL);

Yes please store the regex in some kind of GOnce :)
Comment 15 Philippe Normand 2018-07-02 10:46:34 UTC
(In reply to Tim-Philipp Müller from comment #12)
> Comment on attachment 372896 [details] [review] [review]
> 0001-pbutils-Add-markup-utils-module.patch
> 
> >+++ b/gst-libs/gst/pbutils/markup-utils.c
> >+/* GStreamer base utils library text markup utility functions
> >+ * Copyright (C) 2018 Philippe Normand <philn@igalia.com>
> 
> Haven't looked at the patch properly yet, was just wondering if this code
> was all yours originally? :)

Ah sorry about this, of course I didn't write all this, it comes from subparse. I'll git blame it and mention original authors.

(In reply to Sebastian Dröge (slomo) from comment #13)
> Review of attachment 372898 [details] [review] [review]:
> 
> ::: gst/matroska/matroska-demux.c
> @@ +2315,3 @@
> +  keyunit = !!(flags & GST_SEEK_FLAG_KEY_UNIT);
> +  after = !!(flags & GST_SEEK_FLAG_SNAP_AFTER);
> +  before = !!(flags & GST_SEEK_FLAG_SNAP_BEFORE);
> 
> Irrelevant changes, please put into a separate commit (just push it, no need
> for review) :)
> 

Ok :)

> @@ +3400,3 @@
>        utf8 = g_markup_escape_text ((gchar *) map.data, map.size);
>  
> +      /* FIXME: WebVTT support */
> 
> What does this mean? Will WebVTT work correctly but it is just not sanitized?

Well, Matroska seems to have WebVTT support in addition to SRT and other formats already supported by the demuxer. But currently WebVTT isn't supported by the demuxer, afaics.
Comment 16 Philippe Normand 2018-07-02 13:11:33 UTC
Is the general approach ok though? I wasn't sure about the API naming and binding friendliness :)
Comment 17 Tim-Philipp Müller 2018-07-02 13:25:14 UTC
I think neither the naming nor the way this works is ideal yet :)

How does it work? Does it modify (and replace, if needed) an existing string in place?

I dislike the _subrip() specific thing and also the is_webvtt boolean.

Could we do this as a generic function? (e.g. passing in a format enum with a flags field or something if needed)

How about we pass a string in, function takes ownership and we return a new string? (Which may be the old one or not?)

What about error handling? Are there cases where we should/need to return an error or a warning?

OR MAYBE we should just do this all differently, and force subparse to be plugged in this case via an extra caps field, and let subparse clean it up?
Comment 18 Philippe Normand 2018-07-02 13:44:58 UTC
(In reply to Tim-Philipp Müller from comment #17)
> I think neither the naming nor the way this works is ideal yet :)
> 

Yeah this is a first shot, I basically extracted the code from subparse and didn't change it.

> How does it work? Does it modify (and replace, if needed) an existing string
> in place?
> 

Yes.

> I dislike the _subrip() specific thing and also the is_webvtt boolean.
> 
> Could we do this as a generic function? (e.g. passing in a format enum with
> a flags field or something if needed)
> 

Hmm, yeah something like that would be possible.

> How about we pass a string in, function takes ownership and we return a new
> string? (Which may be the old one or not?)
> 

Yes this sounds better than modifying in-place and is most likely better for bindings.

> What about error handling? Are there cases where we should/need to return an
> error or a warning?
> 

I don't think so. The idea is to output valid markup, not sure which case should trigger errors.

> OR MAYBE we should just do this all differently, and force subparse to be
> plugged in this case via an extra caps field, and let subparse clean it up?

Actually I tried this first :) The problem is that subparse wouldn't be able to auto-detect the subtitle format because it expects a full payload (cue timing data and actual text data) whereas matroskademux only emits the text data.
Comment 19 Tim-Philipp Müller 2018-07-02 14:00:23 UTC
> > OR MAYBE we should just do this all differently, and force subparse to be
> > plugged in this case via an extra caps field, and let subparse clean it up?
> 
> Actually I tried this first :) The problem is that subparse wouldn't be able
> to auto-detect the subtitle format because it expects a full payload (cue
> timing data and actual text data) whereas matroskademux only emits the text
> data.

I'm sure this could be made to work. There's no need to autodetect if we get full and proper caps from upstream.
Comment 20 Philippe Normand 2018-10-27 14:43:46 UTC
It's not only an issue about the caps. subparse is stateful, it needs timing informations from the cues but those infos are not provided, matroskademux emits only the cue data.
Comment 21 GStreamer system administrator 2018-11-03 12:07:14 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/452.