After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 715023 - StrokeDashArray-attribute in Path-Element containing trailing whitespaces cause libgxps to fail
StrokeDashArray-attribute in Path-Element containing trailing whitespaces cau...
Status: RESOLVED FIXED
Product: libgxps
Classification: Platform
Component: general
0.2.x
Other Linux
: Normal normal
: ---
Assigned To: libgxps maintainers
libgxps maintainers
Depends on:
Blocks:
 
 
Reported: 2013-11-22 16:08 UTC by Lukas Lueg
Modified: 2013-11-26 09:52 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix surrounding whitespaces (1.50 KB, patch)
2013-11-22 16:10 UTC, Lukas Lueg
reviewed Details | Review

Description Lukas Lueg 2013-11-22 16:08:22 UTC
Using version 0.2.2 or latest commit adbfe0ea.., libgxps chokes on a .xps produced by Microsoft Visio 2013 with with following errors:

Error rendering page 1: Error rendering page /Documents/1/Pages/1.fpage: /Documents/1/Pages/1.fpage:2512:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 
Error rendering page 2: Error rendering page /Documents/1/Pages/2.fpage: /Documents/1/Pages/2.fpage:3676:27 invalid content in attribute 'StrokeDashArray' of element 'Path': 7 5 


The StrokeDashArray-attribute is given as "7 5 ", which causes gxps_dash_array_parse() to choke as the string is split into {"7","5",""} and gxps_value_get_double("") fails on the empty string.

This may or may not be considered a bug in the implementation by Microsoft Visio. Given this is a major product, I'd suspect there are many files out there with this problem.

I gave the error message some debugging, the attached patch gives you an idea about the simple workaround. The PDF resulting from xpstopdf looks just fine now.
Comment 1 Lukas Lueg 2013-11-22 16:10:42 UTC
Created attachment 261241 [details] [review]
Patch to fix surrounding whitespaces

Cluttered by whitespaces-changes
Comment 2 Carlos Garcia Campos 2013-11-25 10:11:26 UTC
Review of attachment 261241 [details] [review]:

Thanks for the patch, could you share the document having this issue to include it in my test suite?

::: libgxps/gxps-page.c
@@ +213,3 @@
+	g_strstrip (_dash);
+	items = g_strsplit (_dash, " ", -1);
+	g_free (_dash);

This looks good to me, my only concern is that we are going to always duplicate the string and run strip to fix a very specific case, so in 99% of the cases we are doing it for nothing. So, even if it's a bit more complex, I wonder if we could check first if the given string contains trailing or leading whitespaces before dup + strip.
Comment 3 Lukas Lueg 2013-11-25 13:26:08 UTC
I duped the dash-string into _dash because I wasnt sure if surrounding whitespaces are required for other syntax nodes. If not so, gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes *dash, so all nodes only get to see attribute-strings with their whitespaces removed.
Comment 4 Lukas Lueg 2013-11-25 13:31:44 UTC
Oh and I'm afraid I can't share the original file that caused the problem.

I would guess that Visio 2013 uses some general component to write .xps files (maybe .NET ?).
Comment 5 Carlos Garcia Campos 2013-11-25 13:52:26 UTC
(In reply to comment #3)
> I duped the dash-string into _dash because I wasnt sure if surrounding
> whitespaces are required for other syntax nodes. If not so,
> gxps_dash_array_parse()'s caller should call g_strsplit() on what becomes

I guess you mean strip instead of split here.

> *dash, so all nodes only get to see attribute-strings with their whitespaces
> removed.

The caller can't modify the string either because it's a const char* owned by the xml parser. It's fine to duplicate the string when we need to run strip. We could simply check it with something like:

char* stripped_dash = NULL;
if (g_ascii_isspace (dash[0]) || g_ascii_isspace (dash[strlen (dash) - 1])
    stripped_dash = g_strstrip (g_strdup (dash));

items = g_strsplit (stripped_dash ? stripped_dash : dash, " ", -1);
g_free (stripped_dash);

Something like this.
Comment 6 Christian Persch 2013-11-25 14:36:20 UTC
If we're concerned about perf here, should maybe avoid the triple strlen (explicit one, strdup does one, and strstrip as well) :-)
Comment 7 Carlos Garcia Campos 2013-11-25 14:48:16 UTC
(In reply to comment #6)
> If we're concerned about perf here, should maybe avoid the triple strlen
> (explicit one, strdup does one, and strstrip as well) :-)

Well, I'm assuming that the alloc/dealloc is the slowest thing here. In the best case (no whitespaces at all) we were not doing any strlen, if there are trailing whitespace we are doing the 4 strlen actually because strip does two, one in strchomp and one in strchug.

In any case, I'm not specially concerned about performance here, that's why I was wondering whether it is worth the checks to avoid alloc + strip + dealloc, considering that this is only necessary for very few cases where the dash string contains trailing or leading whitespaces.
Comment 8 Lukas Lueg 2013-11-25 15:02:37 UTC
I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close to zero performance penalty on the overall process.


As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since the spec says that StrokeDashArray "Specifies the length of dashes and gaps of the outline stroke. These values are specified as multiples of the stroke thickness as a space-separated list with an *even number of non-negative values*." (emphasis added).

The assertion should be thrown out (to avoid a crash in case of an invalid file) but gxps_dash_array_parse() should nevertheless check that num_strokes is 0 modulo 2 and has no negative values before returning True.
Comment 9 Carlos Garcia Campos 2013-11-26 09:52:48 UTC
(In reply to comment #8)
> I'd suspect a single alloc and strip inside gxps_dash_array_parse() has close
> to zero performance penalty on the overall process.

Agree, I've just pushed the patch to git master, slightly modified and without the other changes (tabs vs spaces and the assert).

> 
> As a side note: I added "glib_assert(num_strokes % 2 == 0)" to my patch since
> the spec says that StrokeDashArray "Specifies the length of dashes and gaps of
> the outline stroke. These values are specified as multiples of the stroke
> thickness as a space-separated list with an *even number of non-negative
> values*." (emphasis added).

Right, good catch, however, I think we should simply make the parser fail instead of crashing. We can also check num_dashes is an even number before allocating and parsing all the values, returning earlier when it fails.

> The assertion should be thrown out (to avoid a crash in case of an invalid
> file) but gxps_dash_array_parse() should nevertheless check that num_strokes is
> 0 modulo 2 and has no negative values before returning True.

Right, I've done both things in follow up patches. 

Regarding the tabs vs spaces, I prefer to use spaces, and I know we are mixing them in several places, so we should probably untabify everywhere, but I'm not sure it's worth it.