GNOME Bugzilla – Bug 715023
StrokeDashArray-attribute in Path-Element containing trailing whitespaces cause libgxps to fail
Last modified: 2013-11-26 09:52:48 UTC
Using version 0.2.2 or latest commit adbfe0ea.., libgxps chokes on a .xps produced by Microsoft Visio 2013 with with following errors: Error rendering page 1: Error rendering page /Documents/1/Pages/1.fpage: /Documents/1/Pages/1.fpage:2512:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 Error rendering page 2: Error rendering page /Documents/1/Pages/2.fpage: /Documents/1/Pages/2.fpage:3676:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 The StrokeDashArray-attribute is given as "7 5 ", which causes gxps_dash_array_parse() to choke as the string is split into {"7","5",""} and gxps_value_get_double("") fails on the empty string. This may or may not be considered a bug in the implementation by Microsoft Visio. Given this is a major product, I'd suspect there are many files out there with this problem. I gave the error message some debugging, the attached patch gives you an idea about the simple workaround. The PDF resulting from xpstopdf looks just fine now.
Created attachment 261241 [details] [review] Patch to fix surrounding whitespaces Cluttered by whitespaces-changes
Review of attachment 261241 [details] [review]: Thanks for the patch, could you share the document having this issue to include it in my test suite? ::: libgxps/gxps-page.c @@ +213,3 @@ + g_strstrip (_dash); + items = g_strsplit (_dash, " ", -1); + g_free (_dash); This looks good to me, my only concern is that we are going to always duplicate the string and run strip to fix a very specific case, so in 99% of the cases we are doing it for nothing. So, even if it's a bit more complex, I wonder if we could check first if the given string contains trailing or leading whitespaces before dup + strip.
I duped the dash-string into _dash because I wasnt sure if surrounding whitespaces are required for other syntax nodes. If not so, gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes *dash, so all nodes only get to see attribute-strings with their whitespaces removed.
Oh and I'm afraid I can't share the original file that caused the problem. I would guess that Visio 2013 uses some general component to write .xps files (maybe .NET ?).
(In reply to comment #3) > I duped the dash-string into _dash because I wasnt sure if surrounding > whitespaces are required for other syntax nodes. If not so, > gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes I guess you mean strip instead of split here. > *dash, so all nodes only get to see attribute-strings with their whitespaces > removed. The caller can't modify the string either because it's a const char* owned by the xml parser. It's fine to duplicate the string when we need to run strip. We could simply check it with something like: char* stripped_dash = NULL; if (g_ascii_isspace (dash[0]) || g_ascii_isspace (dash[strlen (dash) - 1]) stripped_dash = g_strstrip (g_strdup (dash)); items = g_strsplit (stripped_dash ? stripped_dash : dash, " ", -1); g_free (stripped_dash); Something like this.
If we're concerned about perf here, should maybe avoid the triple strlen (explicit one, strdup does one, and strstrip as well) :-)
(In reply to comment #6) > If we're concerned about perf here, should maybe avoid the triple strlen > (explicit one, strdup does one, and strstrip as well) :-) Well, I'm assuming that the alloc/dealloc is the slowest thing here. In the best case (no whitespaces at all) we were not doing any strlen, if there are trailing whitespace we are doing the 4 strlen actually because strip does two, one in strchomp and one in strchug. In any case, I'm not specially concerned about performance here, that's why I was wondering whether it is worth the checks to avoid alloc + strip + dealloc, considering that this is only necessary for very few cases where the dash string contains trailing or leading whitespaces.
I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close to zero performance penalty on the overall process. As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since the spec says that StrokeDashArray "Specifies the length of dashes and gaps of the outline stroke. These values are specified as multiples of the stroke thickness as a space-separated list with an *even number of non-negative values*." (emphasis added). The assertion should be thrown out (to avoid a crash in case of an invalid file) but gxps_dash_array_parse() should nevertheless check that num_strokes is 0 modulo 2 and has no negative values before returning True.
(In reply to comment #8) > I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close > to zero performance penalty on the overall process. Agree, I've just pushed the patch to git master, slightly modified and without the other changes (tabs vs spaces and the assert). > > As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since > the spec says that StrokeDashArray "Specifies the length of dashes and gaps of > the outline stroke. These values are specified as multiples of the stroke > thickness as a space-separated list with an *even number of non-negative > values*." (emphasis added). Right, good catch, however, I think we should simply make the parser fail instead of crashing. We can also check num_dashes is an even number before allocating and parsing all the values, returning earlier when it fails. > The assertion should be thrown out (to avoid a crash in case of an invalid > file) but gxps_dash_array_parse() should nevertheless check that num_strokes is > 0 modulo 2 and has no negative values before returning True. Right, I've done both things in follow up patches. Regarding the tabs vs spaces, I prefer to use spaces, and I know we are mixing them in several places, so we should probably untabify everywhere, but I'm not sure it's worth it.