GNOME Bugzilla – Bug 705313
Improve chart import from XLSX
Last modified: 2015-01-20 18:08:26 UTC
Created attachment 250652 [details]
.xlsx file that produces lots of "Unexpected element" errors and "out of range" errors

Example file: http://www.av8n.com/physics/bugs/import-bugs.xlsx

Summary: I was sent a .xlsx file.
1) When I tried to import it, I got lots of errors and warnings. For details, see attached typescript.
2) When I converted it to straight .xls, the converted file still had warnings.

Priority: This is not urgent, because the problems appear to be non-fatal.

=========================
: jsd@asclepias bugs ; ssconvert import-bugs.xlsx import-bugs.xls
Using exporter Gnumeric_Excel:excel_biff8
Unexpected element 'extLst' in state : styleSheet
Unexpected element 'c:date1904' in state : chartSpace
Unexpected element 'c:roundedCorners' in state : chartSpace
Unexpected element 'c:varyColors' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : chartSpace -> chart
Unexpected element 'c:date1904' in state : chartSpace
Unexpected element 'c:roundedCorners' in state : chartSpace
Unexpected element 'c:varyColors' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : chartSpace -> chart

: jsd@asclepias bugs ; ssconvert import-bugs.xls import-bugs2.xls
Using exporter Gnumeric_Excel:excel_biff8
(ssconvert:19988): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-width' of type `gint'
(ssconvert:19988): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-height' of type `gint'

: jsd@asclepias bugs ; ssconvert import-bugs2.xls import-bugs3.xls
Using exporter Gnumeric_Excel:excel_biff8
(ssconvert:19998): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-width' of type `gint'
(ssconvert:19998): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-height' of type `gint'

: jsd@asclepias bugs ; uname -a
Linux asclepias 2.6.39.4 #4 SMP Thu May 30 01:02:55 MST 2013 x86_64 x86_64 x86_64 GNU/Linux

: jsd@asclepias bugs ; gnumeric --version
gnumeric version '1.12.1'
datadir := '/usr/share/gnumeric/1.12.1'
libdir := '/usr/lib/gnumeric/1.12.1'
Created attachment 250664 [details]
pdf file created by Excel 2013

This file is attached for comparison with our import.
The chart import from xlsx is known to be very incomplete, and export to xlsx is even worse.
The spurious xls-->xls warnings have been fixed.
Note that with current git, the xlsx->... import gives the warnings:

Using exporter Gnumeric_Excel:excel_biff8
Encountered uninterpretable "ext" extension in namespace "{EB79DEF2-80B8-43e5-95BD-54CBDDF9020C}"
Unexpected element 'x14:slicerStyles' in state : styleSheet -> extLst -> ext
Unexpected element 'c:date1904' in state : chartSpace
Unexpected element 'c:roundedCorners' in state : chartSpace
Unexpected element 'c:varyColors' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : chartSpace -> chart
Unexpected element 'c:date1904' in state : chartSpace
Unexpected element 'c:roundedCorners' in state : chartSpace
Unexpected element 'c:varyColors' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : chartSpace -> chart

The warning

Unexpected element 'x14:slicerStyles' in state : styleSheet -> extLst -> ext

cannot be 'fixed'. Basically the ext element contains elements that are not defined in ECMA but are specifically intended to be application specific.

The other chartSpace elements are elements that are not yet interpreted by the importer.
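To illustrate why the ext warning is expected, here is a minimal Python sketch (not Gnumeric's importer, which is written in C) that lists the extensions under styleSheet -> extLst that a reader does not recognise. The sample XML, the `uri` attribute layout, the x14 namespace URI, and the empty `KNOWN_EXT_URIS` set are assumptions made for illustration; only the GUID and the element names come from the warnings above.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Assumed namespace for the x14 prefix; not taken from the attached file.
X14_NS = "http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"

# Hypothetical styles fragment modelled on the warning text above.
STYLES = f"""<styleSheet xmlns="{S_NS}">
  <extLst>
    <ext uri="{{EB79DEF2-80B8-43e5-95BD-54CBDDF9020C}}" xmlns:x14="{X14_NS}">
      <x14:slicerStyles/>
    </ext>
  </extLst>
</styleSheet>"""

# Hypothetical: extension identifiers this reader knows how to interpret.
KNOWN_EXT_URIS = set()


def uninterpreted_extensions(xml_text, known=KNOWN_EXT_URIS):
    """Return the uri of every styleSheet/extLst/ext that is not in `known`."""
    root = ET.fromstring(xml_text)
    path = f"{{{S_NS}}}extLst/{{{S_NS}}}ext"
    return [ext.get("uri") for ext in root.iterfind(path)
            if ext.get("uri") not in known]


if __name__ == "__main__":
    for uri in uninterpreted_extensions(STYLES):
        print(f'Encountered uninterpretable "ext" extension in namespace "{uri}"')
```

Since the content of such an extension is application specific, a reader that does not know the identifier can only skip it and report it.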
We now have only:

Encountered uninterpretable "ext" extension in namespace "{EB79DEF2-80B8-43e5-95BD-54CBDDF9020C}"
Unexpected element 'x14:slicerStyles' in state : styleSheet -> extLst -> ext

These are expected. Of course, the elements that are no longer listed here are now recognized by the importer but not yet interpreted.

We still have to make the graphs in the original file look like the images in the pdf file attached to comment #1.
The file contains:

<c:scatterChart><c:scatterStyle val="lineMarker"/>

lineMarker is (according to ECMA 376) defined to be: "Specifies the points on the scatter chart shall be connected with straight lines and markers shall be drawn."

So we are correct in drawing those line segments, even if Excel 2013 does not seem to draw them!
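For reference, a minimal Python sketch of reading the scatterStyle value from such a fragment. The fragment is a closed-off version of the snippet quoted in this comment, and the `LITERAL_READING` table is a hypothetical helper that lists only the two values discussed in this bug, read literally from the standard's wording as quoted in this thread. It does not describe what Excel actually renders.

```python
import xml.etree.ElementTree as ET

C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"

# The snippet from the file, with the closing tag added so it parses alone.
FRAGMENT = (
    f'<c:scatterChart xmlns:c="{C_NS}">'
    '<c:scatterStyle val="lineMarker"/>'
    '</c:scatterChart>'
)

# Hypothetical table: literal reading as (draw_lines, draw_markers).
# Only the two values mentioned in this bug are listed.
LITERAL_READING = {
    "lineMarker": (True, True),
    "marker": (False, True),
}


def scatter_style(xml_text):
    """Return the val attribute of c:scatterStyle, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{C_NS}}}scatterStyle")
    return node.get("val") if node is not None else None


if __name__ == "__main__":
    style = scatter_style(FRAGMENT)
    print(style, LITERAL_READING.get(style))
```

This covers only the plot-level setting. Any per-series formatting has to be examined separately.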
Series format always overrides plot format in Excel, as far as I remember from what we did with the xls format. This is quite different from what we have in gnumeric.
We write lines+markers as 'marker', and LO interprets this correctly as lines+markers. When we read 'marker', we interpret it as markers+nolines (as the Standard claims). So we have a problem: we do not roundtrip these scatter plots correctly. Moreover, it seems that the standard does not match what applications implement.
The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln> I suppose that in this context, noFill is equivalent to no line (why a width then?).
(In reply to comment #9)
> The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln>
> I suppose that in this context, noFill is equivalent to no line (why a width
> then?).

I don't understand comment #9. Normally "filling a curve" is a separate concept from "stroking a curve". Some curves have only a stroke, some curves have only a fill, some have both, and (in the trivial case) some have neither. Width is a stroke property whereas noFill is a fill property, so I don't see any conflict here.

If there is some reason why the normal concepts do not apply to this situation, please explain.
(In reply to comment #8)
> We write lines+markers as 'marker' and LO interprets this correctly as
> lines+markers

Could we perhaps write lines+markers as 'lines+markers' explicitly, so the question of interpretation does not arise? Is there any possible downside to doing this?
(In reply to comment #10)
> (In reply to comment #9)
> > The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln>
> > I suppose that in this context, noFill is equivalent to no line (why a width
> > then?).
>
> I don't understand comment #9.
> Normally "filling a curve" is a separate concept from "stroking
> a curve". Some curves have only a stroke, some curves have only
> a fill, some have both, and (in the trivial case) some have
> neither. Width is a stroke property whereas nofill is a fill
> property, so don't see any conflict here.
>
> If there is some reason why the normal concepts do not apply
> to this situation, please explain.

noFill can be a child of spPr or a child of ln, which is itself a child of spPr. At the spPr level, it means no filling of the series. At the ln level, it probably means no filling of the line.
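The two positions of noFill can be told apart mechanically. Below is a minimal Python sketch under stated assumptions: the enclosing c:spPr wrapper is added for illustration around the a:ln element quoted in comment #9, and treating ln-level noFill as "line not drawn" is the tentative reading from this thread, not something confirmed by the standard here. The width conversion uses 12700 EMU per point.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
C_NS = "http://schemas.openxmlformats.org/drawingml/2006/chart"

# The a:ln element from comment #9, wrapped in an assumed c:spPr parent.
SERIES_SPPR = (
    f'<c:spPr xmlns:c="{C_NS}" xmlns:a="{A_NS}">'
    '<a:ln w="28575"><a:noFill/></a:ln>'
    '</c:spPr>'
)

EMU_PER_POINT = 12700


def describe_sppr(xml_text):
    """Report noFill separately at the spPr level and at the ln level."""
    sppr = ET.fromstring(xml_text)
    ln = sppr.find(f"{{{A_NS}}}ln")
    # find() with a plain tag matches direct children only, so a noFill
    # nested inside a:ln is not mistaken for an spPr-level noFill.
    area_no_fill = sppr.find(f"{{{A_NS}}}noFill") is not None
    line_no_fill = ln is not None and ln.find(f"{{{A_NS}}}noFill") is not None
    width = ln.get("w") if ln is not None else None
    return {
        "area_no_fill": area_no_fill,
        "line_no_fill": line_no_fill,  # tentatively read as "no line drawn"
        "line_width_pt": int(width) / EMU_PER_POINT if width else None,
    }


if __name__ == "__main__":
    print(describe_sppr(SERIES_SPPR))
```

For the quoted element this reports a line width of 2.25 pt together with an ln-level noFill, so the width is present but would have no visible effect under that reading.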
(In reply to comment #11)

John, what do you mean by "writing lines+markers as 'lines+markers' explicitly"? This file format is described in ECMA-376. We have to follow what is described there. The problem is that it is not quite clear what is meant there, since Excel's behaviour appears to differ from that description. So we need to figure out how we are misunderstanding that description.
Other than the fact that we connect points with lines, this actually looks pretty much the same in gnumeric and the pdf from above. (I have no idea how it looked back when this was filed.)
We are also missing the regression lines, regression equations and R^2 values.
Lines between points are gone. Trend lines are in. No equations etc.
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.