GNOME Bugzilla – Bug 751735
dashdemux: incorrect parsing and handling of segment templates
Last modified: 2015-10-29 13:14:57 UTC
The gst_mpdparser_build_URL_from_template function does not correctly parse and validate the template. For some templates it will accept a wrong format and provide a wrong output. The function will split the url template into tokens separated by $. The main problem is that the function will try to match and replace any of these tokens, instead of analysing only the ones contained between 2 $ characters. More precisely, for a string like token0$token1$token2$token3$token4 only the odd number tokens (1,3,5,...) should be analysed, because only those have the format $<identifier>$. For example, token0 does not have a preceding $, so it cannot be an identifier. Another problem is the validation of format done by validate_format function. It should reject formats that do not start with %. Instead, it will accept them, leading to strange output produced by gst_mpdparser_build_URL_from_template function. The following scenarios are not correctly handled. The format of the examples is: uri_template, expected response. const gchar *id = "TestId"; guint number = 7; guint bandwidth = 2500; guint64 time = 100; {"Number", "Number"}, /* string similar with an identifier, but without $ */ {"Number$Number$", "Number7"}, /* Number identifier */ {"Number$Number$$$", "Number7$"}, /* Number identifier followed by $$ */ {"Number$Number$Number$Number$", "Number7Number7"}, /* series of "Number" string and Number identifier */ {"TestMedia$Time", NULL}, /* Identifier not finished with $ */ {"Time$Time%0d$", "Time100"}, /* usage of format smaller than number of digits */ {"Time$Time%01d$", "Time100"}, /* usage of format smaller than number of digits */ {"Time$Time%05d$", "Time00100"}, /* usage of format bigger than number of digits */ {"Time$Time%05dtest$", "Time00100test"}, /* usage extra text in format */ {"Time$Time1%01d$", NULL}, /* incorrect format: does not start with % after identifier */ {"$RepresentationID1$", NULL}, /* incorrect identifier */ {"$Bandwidth1$", NULL}, /* incorrect identifier */ {"$Number1$", NULL}, /* incorrect identifier */ {"$RepresentationID%01d$", NULL}, /* incorrect format: RepresentationID does not support formatting */ {"Time$Time%05u$", "Time00100"}, /* %u format */ {"Time$Time%05x$", "Time00064"}, /* %x format */ {"Time$Time%05utest$", "Time00100test"}, /* %u format followed by text */ {"Time$Time%05xtest$", "Time00064test"}, /* %x format followed by text */
Created attachment 306415 [details] [review] proposed fix and unit tests Fixed the template parsing. Added unit tests to reproduce and verify all reported problems
(In reply to Florin Apostol from comment #0) > {"Number", "Number"}, /* string similar with an identifier, but > without $ */ > {"Number$Number$", "Number7"}, /* Number identifier */ > {"Number$Number$$$", "Number7$"}, /* Number identifier followed by $$ */ > {"Number$Number$Number$Number$", "Number7Number7"}, /* series of "Number" > string and Number identifier */ I'm not sure about this one, I think your interpretation is correct because the spec tells to replace the $Number$ with the value and then the 2nd $Number$ would lose its first $ and would not be replaced, but what if we start replacing from the end towards the start or if all replacements should be done 'at the same time'? What interpretation could have us covering most of the valid use cases? > {"TestMedia$Time", NULL}, /* Identifier not finished with $ */ > {"Time$Time%0d$", "Time100"}, /* usage of format smaller than number > of digits */ > {"Time$Time%01d$", "Time100"}, /* usage of format smaller than number > of digits */ > {"Time$Time%05d$", "Time00100"}, /* usage of format bigger than number of > digits */ > {"Time$Time%05dtest$", "Time00100test"}, /* usage extra text in format */ Is this valid? I couldn't find anything in the spec that allowed suffixing that was not the %0d identifier. > {"Time$Time1%01d$", NULL}, /* incorrect format: does not start with % after > identifier */ > {"$RepresentationID1$", NULL}, /* incorrect identifier */ > {"$Bandwidth1$", NULL}, /* incorrect identifier */ > {"$Number1$", NULL}, /* incorrect identifier */ > {"$RepresentationID%01d$", NULL}, /* incorrect format: RepresentationID > does not support formatting */ > {"Time$Time%05u$", "Time00100"}, /* %u format */ > {"Time$Time%05x$", "Time00064"}, /* %x format */ > {"Time$Time%05utest$", "Time00100test"}, /* %u format followed by text */ > {"Time$Time%05xtest$", "Time00064test"}, /* %x format followed by text */ Are %u and %x allowed by the spec? It seems that it should always be a %d but as it is a bit dubious I'd be ok for accepting those as well.
(In reply to Thiago Sousa Santos from comment #2) > I'm not sure about this one, I think your interpretation is correct because > the spec tells to replace the $Number$ with the value and then the 2nd > $Number$ would lose its first $ and would not be replaced, but what if we > start replacing from the end towards the start or if all replacements should > be done 'at the same time'? What interpretation could have us covering most > of the valid use cases? I would make it left to right, and if stuff is not consistent make it an error or just fail with the resulting 404. As the spec uses $ as the opening and closing "parenthesis", it's not unambiguous and there can't be a solution that catches everything. > > {"TestMedia$Time", NULL}, /* Identifier not finished with $ */ > > {"Time$Time%0d$", "Time100"}, /* usage of format smaller than number > > of digits */ > > {"Time$Time%01d$", "Time100"}, /* usage of format smaller than number > > of digits */ > > {"Time$Time%05d$", "Time00100"}, /* usage of format bigger than number of > > digits */ > > > {"Time$Time%05dtest$", "Time00100test"}, /* usage extra text in format */ > > Is this valid? I couldn't find anything in the spec that allowed suffixing > that was not the %0d identifier. IIRC it's valid, but even if not I wouldn't surprise if streams with that appear out there. It seems too intuitive that this should be possible. I don't think it's great that C format strings were used here because of that reason. > > {"Time$Time%05u$", "Time00100"}, /* %u format */ > > {"Time$Time%05x$", "Time00064"}, /* %x format */ > > {"Time$Time%05utest$", "Time00100test"}, /* %u format followed by text */ > > {"Time$Time%05xtest$", "Time00064test"}, /* %x format followed by text */ > Are %u and %x allowed by the spec? It seems that it should always be a %d > but as it is a bit dubious I'd be ok for accepting those as well. The spec only allows %d, yes. But because it's a bit dubious as you say, we also support %x and %u. In mpd parser.
Slomo answered correctly all questions. Is anything else unclear? Are there any open questions about my proposed interpretation?
Review of attachment 306415 [details] [review]: The spec seems to forbid formatting not starting with 0. The code will allow something like %3d, and there are no tests with formats not starting with 0. We might want to allow those anyway (in which case, presumably not 0 padding). In general, I'm not sure it's such a good idea to allow these things (like u and x), because it may cause use of such templates by people not realizing it's not standard, thereby causing pressure on other implementations to start allowing those, etc. No strong feeling though. In any case, it needs either rejecting them, or a test for such. ::: tests/check/elements/dash_mpd.c @@ +2173,3 @@ + {"Time$Time%05xtest$", "Time00064test"}, /* %x format followed by text */ + {"Time$Time%05xtest%$", NULL}, /* second % character in format */ + }; Needs a test with %5 (ie, not starting with 0, which the spec mandates) Needs a test with %0-4d (the spec says %0[width]d, want to catch parsing a signed integer after the leading 0. Maybe a test with a silly number. like %073843278d, and %0d too. Just to see if we can break this code with a crafted template.
Created attachment 310967 [details] [review] patch 1/2: proposed fix and unit test. synchronized with mainline the original patch
Created attachment 310969 [details] [review] patch 2/2 Added support for %d and a few more testcases requested by Vincent.
(In reply to Vincent Penquerc'h from comment #5) > Review of attachment 306415 [details] [review] [review]: > > The spec seems to forbid formatting not starting with 0. The code will allow > something like %3d, and > there are no tests with formats not starting with 0. Fixed the code. Added tests. > Needs a test with %5 (ie, not starting with 0, which the spec mandates) Added test. > Needs a test with %0-4d (the spec says %0[width]d, want to catch parsing a > signed integer after the leading 0. Added test. > Maybe a test with a silly number. like %073843278d, and %0d too. Just to see > if we can break this code with a crafted template. Test for %0d already existed. %073843278d is valid. I am not aware of on any limitations on the possible range for the width parameter.
> In general, I'm not sure it's such a good idea to allow these things (like u and x), because it may cause use of such templates by people not realizing it's not standard, Completely agree. If it's not in the form "%0[width]d" we should not be trying to support it.
I'm ok with changing that and dropping support for %x and %u :)
Created attachment 314384 [details] [review] Restrict format strings to %0[0-9*]d To be applied on top of the two patches already here.
commit 7443f35249572d237ef16859969fb11126ea0184 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Thu Oct 29 12:04:31 2015 +0000 dash_mpd: restrict segment template format strings to %0[0-9]*d as per spec https://bugzilla.gnome.org/show_bug.cgi?id=751735 commit 4eea6c3833f5ad3f9154ec67785ae4ea1f64b89b Author: Florin Apostol <florin.apostol@oregan.net> Date: Wed Sep 9 12:36:10 2015 +0100 dashdemux: segment template parsing: added support for %d Added support for %d in template identifier. Added testcases for %d, %3d, %0-4d identifier formats. commit 933d3674407286df052be993b64db013656f1ecb Author: Florin Apostol <florin.apostol@oregan.net> Date: Thu Oct 29 11:54:34 2015 +0000 dashdemux: corrected parsing of segment templates Corrected the parsing of a segment template string. Added unit tests to test the segment template parsing. All reported problems are now correctly handled. https://bugzilla.gnome.org/show_bug.cgi?id=751735