Bug 705313 - Improve chart import from XLSX
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export MS Excel (tm)
Version: 1.12.x
Hardware: Other Linux
Importance: Normal minor
Target Milestone: ---
Assigned To: Jody Goldberg
QA Contact: Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2013-08-01 18:57 UTC by John Denker
Modified: 2015-01-20 18:08 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
.xlsx file that produces lots of "Unexpected element" errors and "out of range" errors (19.74 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-08-01 18:57 UTC, John Denker
pdf file created by Excel 2013 (229.04 KB, application/pdf)
2013-08-01 23:04 UTC, Andreas J. Guelzow

Description John Denker 2013-08-01 18:57:10 UTC
Created attachment 250652 [details]
.xlsx file that produces lots of "Unexpected element" errors and "out of range" errors

Example file:
  http://www.av8n.com/physics/bugs/import-bugs.xlsx

Summary:  I was sent a .xlsx file.  
 1) When I tried to import it, I got lots of errors and warnings.
  For details, see attached typescript.

 2) When I converted it to straight .xls, the converted file still
  had warnings.

Priority:  This is not urgent, because the problems appear to
be non-fatal.

=========================

: jsd@asclepias bugs ; ssconvert import-bugs.xlsx import-bugs.xls
Using exporter Gnumeric_Excel:excel_biff8
Unexpected element 'extLst' in state : 
	styleSheet
Unexpected element 'c:date1904' in state : 
	chartSpace
Unexpected element 'c:roundedCorners' in state : 
	chartSpace
Unexpected element 'c:varyColors' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : 
	chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : 
	chartSpace -> chart
Unexpected element 'c:date1904' in state : 
	chartSpace
Unexpected element 'c:roundedCorners' in state : 
	chartSpace
Unexpected element 'c:varyColors' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : 
	chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : 
	chartSpace -> chart
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 


: jsd@asclepias bugs ; ssconvert  import-bugs.xls import-bugs2.xls
Using exporter Gnumeric_Excel:excel_biff8

(ssconvert:19988): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-width' of type `gint'

(ssconvert:19988): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-height' of type `gint'
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 

: jsd@asclepias bugs ; ssconvert  import-bugs2.xls import-bugs3.xls
Using exporter Gnumeric_Excel:excel_biff8

(ssconvert:19998): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-width' of type `gint'

(ssconvert:19998): GLib-GObject-WARNING **: value "0" of type `gint' is invalid or out of range for property `preferred-height' of type `gint'
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; uname -a
Linux asclepias 2.6.39.4 #4 SMP Thu May 30 01:02:55 MST 2013 x86_64 x86_64 x86_64 GNU/Linux
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; 
: jsd@asclepias bugs ; gnumeric --version
gnumeric version '1.12.1'
datadir := '/usr/share/gnumeric/1.12.1'
libdir := '/usr/lib/gnumeric/1.12.1'
Comment 1 Andreas J. Guelzow 2013-08-01 23:04:48 UTC
Created attachment 250664 [details]
pdf file created by Excel 2013

This file is attached for comparison with our import.
Comment 2 Jean Bréfort 2013-08-02 14:16:57 UTC
The chart import from xlsx is known to be very incomplete, and export to xlsx is much worse.
Comment 3 Andreas J. Guelzow 2013-08-03 18:21:16 UTC
The spurious xls-->xls warnings have been fixed.
Comment 4 Andreas J. Guelzow 2013-08-03 20:05:02 UTC
Note that with current git, the xlsx->... import gives the warnings:

Using exporter Gnumeric_Excel:excel_biff8
Encountered uninterpretable "ext" extension in namespace "{EB79DEF2-80B8-43e5-95BD-54CBDDF9020C}"
Unexpected element 'x14:slicerStyles' in state : 
	styleSheet -> extLst -> ext
Unexpected element 'c:date1904' in state : 
	chartSpace
Unexpected element 'c:roundedCorners' in state : 
	chartSpace
Unexpected element 'c:varyColors' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : 
	chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : 
	chartSpace -> chart
Unexpected element 'c:date1904' in state : 
	chartSpace
Unexpected element 'c:roundedCorners' in state : 
	chartSpace
Unexpected element 'c:varyColors' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:trendline' in state : 
	chartSpace -> chart -> plotArea -> scatterChart -> ser
Unexpected element 'c:dLbls' in state : 
	chartSpace -> chart -> plotArea -> scatterChart
Unexpected element 'c:showDLblsOverMax' in state : 
	chartSpace -> chart

The
Unexpected element 'x14:slicerStyles' in state : 
	styleSheet -> extLst -> ext
cannot be 'fixed'. The ext element contains elements that are not defined in ECMA-376 and are explicitly intended to be application-specific.

The other chartSpace elements are elements that are not yet interpreted by the importer.
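
For illustration, the "Unexpected element ... in state" messages can be reproduced with a minimal Python sketch of a state-tracking reader. This is not Gnumeric's importer code: the KNOWN_CHILDREN table, the helper names, and the sample XML are assumptions made up for this example. It only shows how an element outside the handled set gets reported together with the path of states leading to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: for each parent (local name), the children the reader handles.
KNOWN_CHILDREN = {
    "chartSpace": {"chart"},
    "chart": {"plotArea"},
    "plotArea": {"scatterChart"},
    "scatterChart": {"scatterStyle", "ser"},
    "ser": set(),
}


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def report_unexpected(elem, path=()):
    """Return (element name, state path) pairs for children that are not handled."""
    path = path + (local(elem.tag),)
    found = []
    for child in elem:
        name = local(child.tag)
        if name in KNOWN_CHILDREN.get(path[-1], set()):
            # Handled element: descend and keep checking its children.
            found.extend(report_unexpected(child, path))
        else:
            # Unhandled element: report it and skip its subtree.
            found.append((name, " -> ".join(path)))
    return found


# Made-up sample, loosely modelled on the element names in the log above.
SAMPLE = """<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">
  <c:date1904 val="0"/>
  <c:chart><c:plotArea><c:scatterChart>
    <c:scatterStyle val="lineMarker"/>
    <c:varyColors val="0"/>
    <c:ser><c:trendline/></c:ser>
  </c:scatterChart></c:plotArea></c:chart>
</c:chartSpace>"""

unexpected = report_unexpected(ET.fromstring(SAMPLE))
for name, state in unexpected:
    print(f"Unexpected element 'c:{name}' in state :\n\t{state}")
```

In such a scheme, silencing a warning only requires adding the element to the handled set; actually interpreting it is a separate step.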
Comment 5 Andreas J. Guelzow 2013-08-05 21:53:49 UTC
We now have only:

Encountered uninterpretable "ext" extension in namespace "{EB79DEF2-80B8-43e5-95BD-54CBDDF9020C}"
Unexpected element 'x14:slicerStyles' in state : 
	styleSheet -> extLst -> ext

These are expected.

Of course, the elements that are no longer listed here are now accepted by the importer but not yet interpreted. We still have to make the graphs in the original file look like the images in the pdf file attached to comment #1.
Comment 6 Andreas J. Guelzow 2013-08-06 04:52:01 UTC
The file contains:

<c:scatterChart><c:scatterStyle val="lineMarker"/>

lineMarker is (according to ECMA 376) defined to be: "Specifies the points on the scatter chart shall be connected with straight lines and markers shall be drawn."

So we are correct in drawing those line segments, even if Excel 2013 does not seem to draw them!
Comment 7 Jean Bréfort 2013-08-06 06:36:20 UTC
The series format always overrides the plot format in Excel, as far as I remember from what we did with the xls format. This is quite different from what we have in gnumeric.
Comment 8 Andreas J. Guelzow 2013-08-06 07:14:17 UTC
We write lines+markers as 'marker', and LO interprets this correctly as lines+markers.

When we read 'marker', we interpret it as markers+nolines (as the standard claims).

So we have a problem that we do not roundtrip these scatter plots correctly.

Moreover it seems that the standard does not match what applications implement.
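
The round-trip problem can be sketched in a few lines of Python. The table below follows the scatterStyle values as ECMA-376 describes them (the 'lineMarker' wording is quoted in comment 6); the constant and function names are hypothetical, and the writer behaviour is only the one case reported in this comment, not Gnumeric's actual exporter code.

```python
# scatterStyle value -> (connect points with lines, draw markers),
# following the ECMA-376 descriptions; 'smooth' variants use smoothed lines.
STANDARD_SCATTER_STYLES = {
    "none": (False, False),
    "line": (True, False),
    "lineMarker": (True, True),
    "marker": (False, True),
    "smooth": (True, False),
    "smoothMarker": (True, True),
}


def read_style(val):
    """Interpret a scatterStyle value strictly as the standard describes it."""
    return STANDARD_SCATTER_STYLES[val]


# The case reported in this comment: a lines+markers plot is written as 'marker'.
WRITTEN_FOR_LINES_AND_MARKERS = "marker"

original = (True, True)  # lines + markers
round_tripped = read_style(WRITTEN_FOR_LINES_AND_MARKERS)

print("original:     ", original)
print("round-tripped:", round_tripped)
```

Under these assumptions the lines are lost on re-import, which is the mismatch described here.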
Comment 9 Jean Bréfort 2013-08-06 08:44:51 UTC
The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln>
I suppose that in this context, noFill is equivalent to no line (why a width then?).
Comment 10 John Denker 2013-08-06 14:12:15 UTC
(In reply to comment #9)
> The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln>
> I suppose that in this context, noFill is equivalent to no line (why a width
> then?).

I don't understand comment #9.
  Normally "filling a curve" is a separate concept from "stroking 
  a curve".  Some curves have only a stroke, some curves have only 
  a fill, some have both, and (in the trivial case) some have 
  neither.  Width is a stroke property whereas nofill is a fill 
  property, so I don't see any conflict here.

If there is some reason why the normal concepts do not apply
to this situation, please explain.
Comment 11 John Denker 2013-08-06 14:17:07 UTC
(In reply to comment #8)
> We write lines+markers as 'marker' and LO interprets this correctly as 
> lines+markers

Could we perhaps write lines+markers as 'lines+markers' explicitly,
so the question of interpretation does not arise?  Is there any
possible downside to doing this?
Comment 12 Jean Bréfort 2013-08-06 14:53:56 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > The series has a line style child: <a:ln w="28575"><a:noFill/></a:ln>
> > I suppose that in this context, noFill is equivalent to no line (why a width
> > then?).
> 
> I don't understand comment #9.
>   Normally "filling a curve" is a separate concept from "stroking 
>   a curve".  Some curves have only a stroke, some curves have only 
>   a fill, some have both, and (in the trivial case) some have 
>   neither.  Width is a stroke property whereas nofill is a fill 
>   property, so don't see any conflict here.
> 
> If there is some reason why the normal concepts do not apply
> to this situation, please explain.

noFill can be a child of spPr or a child of ln, which is itself a child of spPr. At the spPr level, it means no filling of the series. At the ln level, it probably means no filling of the line.
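
A minimal Python sketch of the two positions, using the standard library's ElementTree. The namespace URIs are the DrawingML ones; the series fragment is reduced to the line style quoted in comment #9, and the function name and returned keys are made up for this example. The width conversion assumes the w attribute is in EMU (12700 per point).

```python
import xml.etree.ElementTree as ET

NS = {
    "c": "http://schemas.openxmlformats.org/drawingml/2006/chart",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

# Reduced series fragment containing only the line style from comment #9.
SERIES = """<c:ser xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart"
       xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <c:spPr><a:ln w="28575"><a:noFill/></a:ln></c:spPr>
</c:ser>"""


def fill_flags(ser):
    """Report where noFill appears: directly under spPr, or under spPr/ln."""
    sp_pr = ser.find("c:spPr", NS)
    if sp_pr is None:
        return {"area_no_fill": False, "line_no_fill": False, "line_width_pt": None}
    ln = sp_pr.find("a:ln", NS)
    width = ln.get("w") if ln is not None else None
    return {
        # 'a:noFill' matches direct children of spPr only.
        "area_no_fill": sp_pr.find("a:noFill", NS) is not None,
        # noFill nested inside the line properties.
        "line_no_fill": ln is not None and ln.find("a:noFill", NS) is not None,
        # Assumes w is given in EMU, 12700 EMU per point.
        "line_width_pt": int(width) / 12700 if width is not None else None,
    }


flags = fill_flags(ET.fromstring(SERIES))
print(flags)
```

For the fragment in question, the noFill sits at the ln level while a width is still present, which is the combination comment #9 asks about.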
Comment 13 Andreas J. Guelzow 2013-08-06 17:02:13 UTC
(In reply to comment #11)
John, what do you mean by "writing lines+markers as 'lines+markers' explicitly"? This file format is described in ECMA-376, and we have to follow what is described there. The problem is that it is not quite clear what is meant there, since Excel's behaviour appears to differ from that description. So we need to figure out how we are misunderstanding that description.
Comment 14 Morten Welinder 2014-05-31 16:14:05 UTC
Other than the fact that we connect points with lines, this actually looks
pretty much the same in gnumeric and the pdf from above.

(I have no idea how it looked back when this was filed.)
Comment 15 Andreas J. Guelzow 2014-06-01 17:21:59 UTC
We are also missing the regression lines, regression equations and R^2 values.
Comment 16 Morten Welinder 2015-01-20 17:20:22 UTC
Lines between points are gone.
Trend lines are in.
No equations etc.
Comment 17 Morten Welinder 2015-01-20 18:08:26 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.