Bug 715023 – StrokeDashArray-attribute in Path-Element containing trailing whitespaces cause libgxps to fail

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 715023 - StrokeDashArray-attribute in Path-Element containing trailing whitespaces cause libgxps to fail


Summary:	StrokeDashArray-attribute in Path-Element containing trailing whitespaces cau...


Status:	RESOLVED FIXED

Product:	libgxps
Classification:	Platform
Component:	general
Version:	0.2.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	libgxps maintainers
QA Contact:	libgxps maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2013-11-22 16:08 UTC by Lukas Lueg
Modified:	2013-11-26 09:52 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Patch to fix surrounding whitespaces (1.50 KB, patch) 2013-11-22 16:10 UTC, Lukas Lueg	reviewed	Details \| Review

Description Lukas Lueg 2013-11-22 16:08:22 UTC

Using version 0.2.2 or latest commit adbfe0ea.., libgxps chokes on a .xps produced by Microsoft Visio 2013 with with following errors:

Error rendering page 1: Error rendering page /Documents/1/Pages/1.fpage: /Documents/1/Pages/1.fpage:2512:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 
Error rendering page 2: Error rendering page /Documents/1/Pages/2.fpage: /Documents/1/Pages/2.fpage:3676:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 


The StrokeDashArray-attribute is given as "7 5 ", which causes gxps_dash_array_parse() to choke as the string is split into {"7","5",""} and gxps_value_get_double("") fails on the empty string.

This may or may not be considered a bug in the implementation by Microsoft Visio. Given this is a major product, I'd suspect there are many files out there with this problem.

I gave the error message some debugging, the attached patch gives you an idea about the simple workaround. The PDF resulting from xpstopdf looks just fine now.

Comment 1 Lukas Lueg 2013-11-22 16:10:42 UTC

Created attachment 261241 [details] [review]
Patch to fix surrounding whitespaces

Cluttered by whitespaces-changes

Comment 2 Carlos Garcia Campos 2013-11-25 10:11:26 UTC

Review of attachment 261241 [details] [review]:

Thanks for the patch, could you share the document having this issue to include it in my test suite?

::: libgxps/gxps-page.c
@@ +213,3 @@
+	g_strstrip (_dash);
+	items = g_strsplit (_dash, " ", -1);
+	g_free (_dash);

This looks good to me, my only concern is that we are going to always duplicate the string and run strip to fix a very specific case, so in 99% of the cases we are doing it for nothing. So, even if it's a bit more complex, I wonder if we could check first if the given string contains trailing or leading whitespaces before dup + strip.

Comment 3 Lukas Lueg 2013-11-25 13:26:08 UTC

I duped the dash-string into _dash because I wasnt sure if surrounding whitespaces are required for other syntax nodes. If not so, gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes *dash, so all nodes only get to see attribute-strings with their whitespaces removed.

Comment 4 Lukas Lueg 2013-11-25 13:31:44 UTC

Oh and I'm afraid I can't share the original file that caused the problem.

I would guess that Visio 2013 uses some general component to write .xps files (maybe .NET ?).

Comment 5 Carlos Garcia Campos 2013-11-25 13:52:26 UTC

(In reply to comment #3)
> I duped the dash-string into _dash because I wasnt sure if surrounding
> whitespaces are required for other syntax nodes. If not so,
> gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes

I guess you mean strip instead of split here.

> *dash, so all nodes only get to see attribute-strings with their whitespaces
> removed.

The caller can't modify the string either because it's a const char* owned by the xml parser. It's fine to duplicate the string when we need to run strip. We could simply check it with something like:

char* stripped_dash = NULL;
if (g_ascii_isspace (dash[0]) || g_ascii_isspace (dash[strlen (dash) - 1])
    stripped_dash = g_strstrip (g_strdup (dash));

items = g_strsplit (stripped_dash ? stripped_dash : dash, " ", -1);
g_free (stripped_dash);

Something like this.

Comment 6 Christian Persch 2013-11-25 14:36:20 UTC

If we're concerned about perf here, should maybe avoid the triple strlen (explicit one, strdup does one, and strstrip as well) :-)

Comment 7 Carlos Garcia Campos 2013-11-25 14:48:16 UTC

(In reply to comment #6)
> If we're concerned about perf here, should maybe avoid the triple strlen
> (explicit one, strdup does one, and strstrip as well) :-)

Well, I'm assuming that the alloc/dealloc is the slowest thing here. In the best case (no whitespaces at all) we were not doing any strlen, if there are trailing whitespace we are doing the 4 strlen actually because strip does two, one in strchomp and one in strchug.

In any case, I'm not specially concerned about performance here, that's why I was wondering whether it is worth the checks to avoid alloc + strip + dealloc, considering that this is only necessary for very few cases where the dash string contains trailing or leading whitespaces.

Comment 8 Lukas Lueg 2013-11-25 15:02:37 UTC

I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close to zero performance penalty on the overall process.


As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since the spec says that StrokeDashArray "Specifies the length of dashes and gaps of the outline stroke. These values are specified as multiples of the stroke thickness as a space-separated list with an *even number of non-negative values*." (emphasis added).

The assertion should be thrown out (to avoid a crash in case of an invalid file) but gxps_dash_array_parse() should nevertheless check that num_strokes is 0 modulo 2 and has no negative values before returning True.

Comment 9 Carlos Garcia Campos 2013-11-26 09:52:48 UTC

(In reply to comment #8)
> I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close
> to zero performance penalty on the overall process.

Agree, I've just pushed the patch to git master, slightly modified and without the other changes (tabs vs spaces and the assert).

> 
> As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since
> the spec says that StrokeDashArray "Specifies the length of dashes and gaps of
> the outline stroke. These values are specified as multiples of the stroke
> thickness as a space-separated list with an *even number of non-negative
> values*." (emphasis added).

Right, good catch, however, I think we should simply make the parser fail instead of crashing. We can also check num_dashes is an even number before allocating and parsing all the values, returning earlier when it fails.

> The assertion should be thrown out (to avoid a crash in case of an invalid
> file) but gxps_dash_array_parse() should nevertheless check that num_strokes is
> 0 modulo 2 and has no negative values before returning True.

Right, I've done both things in follow up patches. 

Regarding the tabs vs spaces, I prefer to use spaces, and I know we are mixing them in several places, so we should probably untabify everywhere, but I'm not sure it's worth it.