Bug 751735 – dashdemux: incorrect parsing and handling of segment templates

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 751735 - dashdemux: incorrect parsing and handling of segment templates


Summary:	dashdemux: incorrect parsing and handling of segment templates


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-bad
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	1.7.1
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2015-06-30 15:02 UTC by Florin Apostol
Modified:	2015-10-29 13:14 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
proposed fix and unit tests (8.26 KB, patch) 2015-06-30 15:05 UTC, Florin Apostol	none	Details \| Review
patch 1/2: proposed fix and unit test. (7.87 KB, patch) 2015-09-09 11:38 UTC, Florin Apostol	committed	Details \| Review
patch 2/2 (2.65 KB, patch) 2015-09-09 11:39 UTC, Florin Apostol	committed	Details \| Review
*Restrict format strings to %0[0-9]d** (3.80 KB, patch) 2015-10-29 12:06 UTC, Vincent Penquerc'h	committed	Details \| Review

Description Florin Apostol 2015-06-30 15:02:01 UTC

The gst_mpdparser_build_URL_from_template function does not correctly parse and validate the template. For some templates it will accept a wrong format and provide a wrong output.

The function will split the url template into tokens separated by $. 
The main problem is that the function will try to match and replace any of these tokens, instead of analysing only the ones contained between 2 $ characters.
More precisely, for a string like token0$token1$token2$token3$token4 only the odd number tokens (1,3,5,...) should be analysed, because only those have the format $<identifier>$.
For example, token0 does not have a preceding $, so it cannot be an identifier.

Another problem is the validation of format done by validate_format function. It should reject formats that do not start with %. Instead, it will accept them, leading to strange output produced by gst_mpdparser_build_URL_from_template function.

The following scenarios are not correctly handled.
The format of the examples is: uri_template, expected response. 
  const gchar *id = "TestId";
  guint number = 7;
  guint bandwidth = 2500;
  guint64 time = 100;

{"Number", "Number"},       /* string similar with an identifier, but without $ */
{"Number$Number$", "Number7"},      /* Number identifier */
{"Number$Number$$$", "Number7$"},   /* Number identifier followed by $$ */
{"Number$Number$Number$Number$", "Number7Number7"}, /* series of "Number" string and Number identifier */
{"TestMedia$Time", NULL},   /* Identifier not finished with $ */
{"Time$Time%0d$", "Time100"},       /* usage of format smaller than number of digits */
{"Time$Time%01d$", "Time100"},      /* usage of format smaller than number of digits */
{"Time$Time%05d$", "Time00100"},    /* usage of format bigger than number of digits */
{"Time$Time%05dtest$", "Time00100test"},    /* usage extra text in format */
{"Time$Time1%01d$", NULL},  /* incorrect format: does not start with % after identifier */
{"$RepresentationID1$", NULL},      /* incorrect identifier */
{"$Bandwidth1$", NULL},     /* incorrect identifier */
{"$Number1$", NULL},        /* incorrect identifier */
{"$RepresentationID%01d$", NULL},   /* incorrect format: RepresentationID does not support formatting */
{"Time$Time%05u$", "Time00100"},    /* %u format */
{"Time$Time%05x$", "Time00064"},    /* %x format */
{"Time$Time%05utest$", "Time00100test"},    /* %u format followed by text */
{"Time$Time%05xtest$", "Time00064test"},    /* %x format followed by text */

Comment 1 Florin Apostol 2015-06-30 15:05:24 UTC

Created attachment 306415 [details] [review]
proposed fix and unit tests

Fixed the template parsing. Added unit tests to reproduce and verify all reported problems

Comment 2 Thiago Sousa Santos 2015-07-05 18:51:13 UTC

(In reply to Florin Apostol from comment #0)
> {"Number", "Number"},       /* string similar with an identifier, but
> without $ */
> {"Number$Number$", "Number7"},      /* Number identifier */
> {"Number$Number$$$", "Number7$"},   /* Number identifier followed by $$ */

> {"Number$Number$Number$Number$", "Number7Number7"}, /* series of "Number"
> string and Number identifier */

I'm not sure about this one, I think your interpretation is correct because the spec tells to replace the $Number$ with the value and then the 2nd $Number$ would lose its first $ and would not be replaced, but what if we start replacing from the end towards the start or if all replacements should be done 'at the same time'? What interpretation could have us covering most of the valid use cases?

> {"TestMedia$Time", NULL},   /* Identifier not finished with $ */
> {"Time$Time%0d$", "Time100"},       /* usage of format smaller than number
> of digits */
> {"Time$Time%01d$", "Time100"},      /* usage of format smaller than number
> of digits */
> {"Time$Time%05d$", "Time00100"},    /* usage of format bigger than number of
> digits */

> {"Time$Time%05dtest$", "Time00100test"},    /* usage extra text in format */

Is this valid? I couldn't find anything in the spec that allowed suffixing that was not the %0d identifier.

> {"Time$Time1%01d$", NULL},  /* incorrect format: does not start with % after
> identifier */
> {"$RepresentationID1$", NULL},      /* incorrect identifier */
> {"$Bandwidth1$", NULL},     /* incorrect identifier */
> {"$Number1$", NULL},        /* incorrect identifier */
> {"$RepresentationID%01d$", NULL},   /* incorrect format: RepresentationID
> does not support formatting */

> {"Time$Time%05u$", "Time00100"},    /* %u format */
> {"Time$Time%05x$", "Time00064"},    /* %x format */
> {"Time$Time%05utest$", "Time00100test"},    /* %u format followed by text */
> {"Time$Time%05xtest$", "Time00064test"},    /* %x format followed by text */
Are %u and %x allowed by the spec? It seems that it should always be a %d but as it is a bit dubious I'd be ok for accepting those as well.

Comment 3 Sebastian Dröge (slomo) 2015-07-06 07:26:23 UTC

(In reply to Thiago Sousa Santos from comment #2)
> I'm not sure about this one, I think your interpretation is correct because
> the spec tells to replace the $Number$ with the value and then the 2nd
> $Number$ would lose its first $ and would not be replaced, but what if we
> start replacing from the end towards the start or if all replacements should
> be done 'at the same time'? What interpretation could have us covering most
> of the valid use cases?

I would make it left to right, and if stuff is not consistent make it an error or just fail with the resulting 404. As the spec uses $ as the opening and closing "parenthesis", it's not unambiguous and there can't be a solution that catches everything.

> > {"TestMedia$Time", NULL},   /* Identifier not finished with $ */
> > {"Time$Time%0d$", "Time100"},       /* usage of format smaller than number
> > of digits */
> > {"Time$Time%01d$", "Time100"},      /* usage of format smaller than number
> > of digits */
> > {"Time$Time%05d$", "Time00100"},    /* usage of format bigger than number of
> > digits */
> 
> > {"Time$Time%05dtest$", "Time00100test"},    /* usage extra text in format */
> 
> Is this valid? I couldn't find anything in the spec that allowed suffixing
> that was not the %0d identifier.

IIRC it's valid, but even if not I wouldn't surprise if streams with that appear out there. It seems too intuitive that this should be possible.

I don't think it's great that C format strings were used here because of that reason.

> > {"Time$Time%05u$", "Time00100"},    /* %u format */
> > {"Time$Time%05x$", "Time00064"},    /* %x format */
> > {"Time$Time%05utest$", "Time00100test"},    /* %u format followed by text */
> > {"Time$Time%05xtest$", "Time00064test"},    /* %x format followed by text */
> Are %u and %x allowed by the spec? It seems that it should always be a %d
> but as it is a bit dubious I'd be ok for accepting those as well.

The spec only allows %d, yes. But because it's a bit dubious as you say, we also support %x and %u. In mpd parser.

Comment 4 Florin Apostol 2015-07-08 19:51:31 UTC

Slomo answered correctly all questions. 
Is anything else unclear? Are there any open questions about my proposed interpretation?

Comment 5 Vincent Penquerc'h 2015-09-08 09:48:46 UTC

Review of attachment 306415 [details] [review]:

The spec seems to forbid formatting not starting with 0. The code will allow something like %3d, and
there are no tests with formats not starting with 0. We might want to allow those anyway (in which
case, presumably not 0 padding). In general, I'm not sure it's such a good idea to allow these things
(like u and x), because it may cause use of such templates by people not realizing it's not standard,
thereby causing pressure on other implementations to start allowing those, etc. No strong feeling
though. In any case, it needs either rejecting them, or a test for such.

::: tests/check/elements/dash_mpd.c
@@ +2173,3 @@
+    {"Time$Time%05xtest$", "Time00064test"},    /* %x format followed by text */
+    {"Time$Time%05xtest%$", NULL},      /* second % character in format */
+  };

Needs a test with %5 (ie, not starting with 0, which the spec mandates)
Needs a test with %0-4d (the spec says %0[width]d, want to catch parsing a signed integer after the leading 0.
Maybe a test with a silly number. like %073843278d, and %0d too. Just to see if we can break this code with a crafted template.

Comment 6 Florin Apostol 2015-09-09 11:38:44 UTC

Created attachment 310967 [details] [review]
patch 1/2: proposed fix and unit test.

synchronized with mainline the original patch

Comment 7 Florin Apostol 2015-09-09 11:39:52 UTC

Created attachment 310969 [details] [review]
patch 2/2

Added support for %d and a few more testcases requested by Vincent.

Comment 8 Florin Apostol 2015-09-09 11:43:36 UTC

(In reply to Vincent Penquerc'h from comment #5)
> Review of attachment 306415 [details] [review] [review]:
> 
> The spec seems to forbid formatting not starting with 0. The code will allow
> something like %3d, and
> there are no tests with formats not starting with 0. 
Fixed the code. Added tests.


> Needs a test with %5 (ie, not starting with 0, which the spec mandates)
Added test.

> Needs a test with %0-4d (the spec says %0[width]d, want to catch parsing a
> signed integer after the leading 0.
Added test.

> Maybe a test with a silly number. like %073843278d, and %0d too. Just to see
> if we can break this code with a crafted template.
Test for %0d already existed.
%073843278d is valid. I am not aware of on any limitations on the possible range for the width parameter.

Comment 9 A Ashley 2015-10-02 09:10:00 UTC

> In general, I'm not sure it's such a good idea to allow these things
(like u and x), because it may cause use of such templates by people not realizing it's not standard,

Completely agree. If it's not in the form "%0[width]d" we should not be trying to support it.

Comment 10 Sebastian Dröge (slomo) 2015-10-02 09:15:02 UTC

I'm ok with changing that and dropping support for %x and %u :)

Comment 11 Vincent Penquerc'h 2015-10-29 12:06:44 UTC

Created attachment 314384 [details] [review]
Restrict format strings to %0[0-9*]d

To be applied on top of the two patches already here.

Comment 12 Vincent Penquerc'h 2015-10-29 13:13:51 UTC

commit 7443f35249572d237ef16859969fb11126ea0184
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Thu Oct 29 12:04:31 2015 +0000

    dash_mpd: restrict segment template format strings to %0[0-9]*d as per spec
    
    https://bugzilla.gnome.org/show_bug.cgi?id=751735

commit 4eea6c3833f5ad3f9154ec67785ae4ea1f64b89b
Author: Florin Apostol <florin.apostol@oregan.net>
Date:   Wed Sep 9 12:36:10 2015 +0100

    dashdemux: segment template parsing: added support for %d
    
    Added support for %d in template identifier.
    Added testcases for %d, %3d, %0-4d identifier formats.

commit 933d3674407286df052be993b64db013656f1ecb
Author: Florin Apostol <florin.apostol@oregan.net>
Date:   Thu Oct 29 11:54:34 2015 +0000

    dashdemux: corrected parsing of segment templates
    
    Corrected the parsing of a segment template string.
    Added unit tests to test the segment template parsing.
    All reported problems are now correctly handled.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=751735