After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 750627 - [ods] ~ operator
[ods] ~ operator
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export OOo / OASIS
git master
Other All
: Normal normal
: ---
Assigned To: Andreas J. Guelzow
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2015-06-09 11:53 UTC by Morten Welinder
Modified: 2015-06-19 01:07 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
sample patch (1.22 KB, patch)
2015-06-18 01:40 UTC, Andreas J. Guelzow
none Details | Review
proposed patch (1.91 KB, patch)
2015-06-18 02:10 UTC, Andreas J. Guelzow
committed Details | Review

Description Morten Welinder 2015-06-09 11:53:27 UTC
The file...

http://cgit.freedesktop.org/libreoffice/core/plain/sc/qa/unit/data/ods/functions.ods

...shows that we import certain functions that came from xl 2003+ as
unknown.  Statistics!A4 has =COM.MICROSOFT.EXPON.DIST(3,0.5,1), for
example, and we evaluate that as #NAME?
Comment 1 Andreas J. Guelzow 2015-06-09 13:29:22 UTC
Considering that with respect to EXPON.DIST: "The function is new in Excel 2010, so is not available in earlier versions of Excel. However, the Expon.Dist function is simply a renamed version of the Expondist function, that is available in earlier versions of Excel." it is strange that any ODF file would contain COM.MICROSOFT.EXPON.DIST when it should have just been the ODF function EXPONDIST.

I will add the appropriate mapping later today/
Comment 2 Morten Welinder 2015-06-09 13:49:34 UTC
Hmm...

My copy of localc also gets #NAME? when reading this file.

Excel seems to read it, complain over it, and discard all expressions in
favour of the stored values.  Ugh!

So where does that leave us?  I am not sure.  The file at

http://cgit.freedesktop.org/libreoffice/core/plain/sc/qa/unit/data/contentCSV/statistical-functions.csv

suggests that some version of LO can read this.
Comment 3 Andreas J. Guelzow 2015-06-11 05:24:47 UTC
Those functions that were just new names for existing old functions are now handled. There are still some new functions left to map correctly.
Comment 4 Morten Welinder 2015-06-11 12:35:38 UTC
Definitely improved.

The failure in B2 puzzles me.  The cell imports as "=rsq([]range1,[]range2)"
which evaluated to #NAME?, presumably due to the ranges.

But Statistics has ranges called "range1" and "range2".
Comment 5 Andreas J. Guelzow 2015-06-11 13:34:37 UTC
The ODF file contains
table:formula="of:=RSQ(range1;range2)"

I don't know (yet) where the [] come from. Note that =rsq(range1,range2) would evaluate just fine.
Comment 6 Andreas J. Guelzow 2015-06-13 22:51:23 UTC
WE are now left with the [] problem and a bunch of functions we do not seem to have equivalent versions of.
Comment 7 Andreas J. Guelzow 2015-06-17 13:26:02 UTC
The function issues have now been all resolved. B2 (and B3 and B8) still have a problem. For B2 we are parsing
table:formula="of:=RSQ(range1;range2)"
getting an expression, setting cell B2 to that expression and then see =rsq([]range1,[]range2)

I am not even sure what expression that really is!
Comment 8 Morten Welinder 2015-06-17 15:18:51 UTC
See parse-util.c function std_expr_name_handler near line 1292.

There are, somehow, two different names defined with the name "range1".
One is a placeholder, one is not.

(gdb) p thename->pos
$22 = {eval = {col = 1, row = 1}, sheet = 0x0, wb = 0x7f1690}
(gdb) p thename
$23 = (const GnmNamedExpr *) 0xa758a0
(gdb) p expr_name_lookup (out->pp, expr_name_name (thename))
$24 = (GnmNamedExpr *) 0x124c4a0
(gdb) p *$24
$25 = {ref_count = 1, name = 0x4eaf480, pos = {eval = {col = 14, row = 5}, sheet = 0x9199c0, wb = 0x7f1690}, 
  dependents = 0x0, texpr = 0x1a5af60, is_placeholder = 0, is_hidden = 0, is_permanent = 0, is_editable = 1, 
  scope = 0x983920}
(gdb) p *$24->name
$26 = {str = 0xb8e0d0 "range1"}
(gdb) p *thename
$27 = {ref_count = 6, name = 0x4eaf480, pos = {eval = {col = 1, row = 1}, sheet = 0x0, wb = 0x7f1690}, 
Python Exception <class 'gdb.error'> There is no member named keys.: 
  dependents = 0x9ec860, texpr = 0x15bc020, is_placeholder = 1, is_hidden = 0, is_permanent = 0, 
  is_editable = 1, scope = 0x8d5140}
(gdb) p *thename->name
$28 = {str = 0xb8e0d0 "range1"}
Comment 9 Andreas J. Guelzow 2015-06-17 16:46:06 UTC
In fact this is not just the same name, but the pointer to the name string appears to be the same !?

When we parse
<table:named-range table:name="range1" table:base-cell-address="$Statistical.$O$6" table:cell-range-address="$Statistical.$O$1:.$O$7"/>

We call
expr_name_add (&pp, name, texpr, NULL, TRUE, NULL);

In expr_name_add this should replace the existing place holder. Since the names are defined at the end of an ODF file, we create place holders as the formulas are parsed.
Comment 10 Morten Welinder 2015-06-17 16:54:32 UTC
The name is a GOString so the pointers are the same simply because the
names are.

The whole named expression placeholder concept is ripe for a rethink.
Comment 11 Andreas J. Guelzow 2015-06-17 17:06:09 UTC
The placeholder we create when parsing the formula ends up in workbook scope. The named expression we create later is in sheet scope.

My first reaction would be to shift the handling of named expressions into the pre-parsing of the file, but at that time we cannot parse the formula yet since we do not know the sizes of the sheets.
Comment 12 Andreas J. Guelzow 2015-06-17 17:20:38 UTC
There is a bunch of things wrong with how we handle the named expressions in ODF import/export. I will dig into that.
Comment 13 Morten Welinder 2015-06-17 18:22:04 UTC
For .gnumeric we do...

1. Create the names as soon as we can with an empty definition.
2. Collect the expressions (as strings) as set them at the end.
Comment 14 Andreas J. Guelzow 2015-06-17 18:56:03 UTC
Already fixed in my tree. I will commit later.
Comment 15 Andreas J. Guelzow 2015-06-17 19:45:59 UTC
This problem has been fixed in the unstable development version. The fix will be available in the next major software release. You may need to upgrade your Linux distribution to obtain that newer version.
Comment 16 Morten Welinder 2015-06-17 20:13:42 UTC
I still get an error on load:

Spreadsheet!A12
Unable to parse 'AREAS(([.A1:.B3]~[.F2]~[.G1]))' ('Invalid expression')
Comment 17 Andreas J. Guelzow 2015-06-17 20:17:37 UTC
Does Gnumeric have the ~ operator? That is the "reference union operator">
Comment 18 Andreas J. Guelzow 2015-06-17 20:20:07 UTC
I guess in Gnumeric that would be  =areas((A1:B3,F2,G1))
Comment 19 Andreas J. Guelzow 2015-06-18 01:40:45 UTC
Created attachment 305509 [details] [review]
sample patch

While the attached patch works in this specific case it is clearly not the correct solution. We could easily encounter formulas that contain the tilde and the regular argument seperator. Unfortunately we seem to be able to only specify a single argument separator.

Note that the ODF standard even says: "Reference union. Displayed as the function parameter separator in some implementations."

Replacing all tildes with semicolons of course causes many other issues.

Perhaps we need to change how convs->arg_sep is used by adding a second convs->arg_sep_alt.
Comment 20 Andreas J. Guelzow 2015-06-18 02:10:46 UTC
Created attachment 305511 [details] [review]
proposed patch

Since this patch touches the parser I would like somebody else to review it.

Note that nobody seems to do "reference union" correctly. In my version of LO  =SUM(A5:C5~B4:B6) with every cell of interest containing a 1 evaluates to 6 (and I would think that 5 would be more sensible if it is truly a union. So having ~ really act like an argument separator looks appropriate to me.
Comment 21 Andreas J. Guelzow 2015-06-19 01:07:25 UTC
Review of attachment 305511 [details] [review]:

committed
Comment 22 Andreas J. Guelzow 2015-06-19 01:07:51 UTC
This problem has been fixed in the unstable development version. The fix will be available in the next major software release. You may need to upgrade your Linux distribution to obtain that newer version.