GNOME Bugzilla – Bug 750852
dashdemux: id@RepresentationType should be StringNoWhitespaceType but accepts spaces
Last modified: 2015-10-28 15:55:45 UTC
The id field in RepresentationType element has the type "StringNoWhitespaceType". This means it is not allowed to contain spaces. The gst_mpdparser_parse_representation_node functions makes no attempt to validate this and will accept an id containing spaces.
Is that a problem? Doesn't this only mean that we also accept slightly malformed streams, but apart from that it shouldn't really cause any issues?
I believe the reason for this restriction in the standard is the fact that RepresentationId can be used as a identifier in SegmentTemplate. If it contains spaces, the generated url will contain spaces.
That's a good point. Does the spec say anything about escaping the template? Also "StringNoWhitespaceType" can still contain characters that are invalid for an URI.
I did not find anything about escaping the template. The definition for StringNoWhitespaceType is: <!-- String without white spaces --> <xs:simpleType name="StringNoWhitespaceType"> <xs:restriction base="xs:string"> <xs:pattern value="[^\r\n\t \p{Z}]*"/> </xs:restriction> </xs:simpleType> The pattern excludes spaces (tabs, new lines, etc) and any other type of delimiters (\p[{Z}). I think the parser should check the id field for those and reject the representation if the id is not properly formatted. Failure to do so might lead to difficult to debug bugs later on.
So what is allowed here? Any other character and we should URI-escape it? Or only alphanum ASCII? Or ...? Seems like a gap in the spec :)
All the standard says is this: " The identifier shall not contain whitespace characters. If used in the template-based URL construction as defined in 5.3.9.4.4, the string shall only contain characters that are permitted within an HTTP-URL according to RFC 1738. " So there is a mandatory requirement (no whitespaces) and an additional one if a template is using it as an identifier.
So should we also check if things are valid URI components?
I propose to check in the xml parser (gst_mpdparser_parse_representation_node) the fact that the representation id does not contain spaces. Maybe add a new function gst_mpdparser_get_xml_prop_string_no_whitespace and use that. Then, in the gst_mpdparser_build_URL_from_template we can check that the id string contains only "characters that are permitted within an HTTP-URL according to RFC 1738."
Sounds reasonable.
Created attachment 313751 [details] [review] forbid whitespace in id attribute
Created attachment 313752 [details] [review] validate id field as per RFC 1738, HTTP URL type
From what I can find, \p{Z} includes non ASCII whitespace (eg, non breaking space). I'm not sure whether that applies here. There's also a huge amount of stuff that can be considered whitespace, like BOMs, l2r/r2l switches, etc. I kept the patch to ASCII/byte whitespace.
Review of attachment 313751 [details] [review]: ::: ext/dash/gstmpdparser.c @@ +307,3 @@ + +static gboolean +validate_nows (const char *s) please rename to validate_no_whitespace @@ +313,3 @@ + +static gboolean +gst_mpdparser_get_xml_prop_string_nows (xmlNode * a_node, please rename to gst_mpdparser_get_xml_prop_string_no_whitespace
Review of attachment 313752 [details] [review]: please add some unit tests for this. ::: ext/dash/gstmpdparser.c @@ +2910,3 @@ + return FALSE; + if (*s == '%') { + /* returns FALSE for NUL, so safe for strings that end early */ I don't understand this comment. What does "end early" mean? @@ +2911,3 @@ + if (*s == '%') { + /* returns FALSE for NUL, so safe for strings that end early */ + if (!g_ascii_isxdigit (s[1]) || !g_ascii_isxdigit (s[2])) wouldn't s[1] and s[2] cause an out of bounds access if the string ends with "%" ?
Created attachment 313795 [details] [review] forbid whitespace in id attribute
(In reply to Vincent Penquerc'h from comment #15) > Created attachment 313795 [details] [review] [review] > forbid whitespace in id attribute looks good.
"end early" means a NUL bedore two hexadecimal characters. s[1] and s[2] do not cause an out of bounds access since if s[1] is NUL, then g_ascii_isxdigit (s[1]) is FALSE, and || will not evaluate the second part, where s[2] might lie in unallocated memory (s[1] never can, since the string should NUL terminated). This is what the comment means, but it apparently is not clear, so I will restate it. I will add tests too.
Created attachment 313798 [details] [review] validate representation set identifier
Review of attachment 313798 [details] [review]: ::: ext/dash/gstmpdparser.h @@ +599,3 @@ gboolean gst_mpd_client_has_isoff_ondemand_profile (GstMpdClient *client); +gboolean gst_mpdparser_validate_no_whitespace (const char *s); don't make the functions public. Keep them static in the c file. The unit test includes the c file so that it can test static functions.
Created attachment 313800 [details] [review] validate representation set identifier
Review of attachment 313800 [details] [review]: ::: tests/check/elements/dash_mpd.c @@ +4496,3 @@ TCase *tc_complexMPD = tcase_create ("complexMPD"); TCase *tc_negativeTests = tcase_create ("negativeTests"); + TCase *tc_stringTests = tcase_create ("stringTests"); I believe this is not used
Created attachment 313802 [details] [review] validate representation set identifier
Review of attachment 313802 [details] [review]: looks good.
commit b8df6cc316f76eda5f592a70a7d0a69cf6eeb9f8 Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk> Date: Tue Sep 29 16:17:03 2015 +0100 mpdparser: validate representation set identifier It must have no whitespace, and must comply with RFC 1738 when used to build a URL. https://bugzilla.gnome.org/show_bug.cgi?id=750852