After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 682734 - some expressions involving arrays in {curly braces} give wrong answers, or complain #VALUE
some expressions involving arrays in {curly braces} give wrong answers, or co...
Status: RESOLVED OBSOLETE
Product: Gnumeric
Classification: Applications
Component: Main System
1.10.x
Other All
: Normal normal
: ---
Assigned To: Jody Goldberg
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2012-08-26 16:41 UTC by John Denker
Modified: 2018-05-22 13:52 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
spreadsheet exhibiting bug: some expressions involving arrays in {curly braces} give wrong answers, or complain #VALUE (2.50 KB, application/x-gnumeric)
2012-08-26 16:41 UTC, John Denker
Details
improved demonstration of the problem with scalar-valued functions of arrays (2.56 KB, application/x-gnumeric)
2012-08-27 02:12 UTC, John Denker
Details
further improved demonstration of the problem with scalar-valued functions of arrays (2.65 KB, application/x-gnumeric)
2012-08-27 08:22 UTC, John Denker
Details

Description John Denker 2012-08-26 16:41:44 UTC
Created attachment 222481 [details]
spreadsheet exhibiting bug: some expressions involving arrays in {curly braces} give wrong answers, or complain #VALUE

Gnumeric version '1.10.13'
The following appear to be 100% reproducible:

Symptom #1:  =sumproduct({1,2,3},sin({1,3,5}))  ... complains #VALUE

Symptom #2:  =max(sin({1,3,5}))  ... gives wrong numerical result

==========

By way of contrast, seemingly-similar expressions such as the following
appear to give correct results:

    =sumproduct({1,2,3},10^({1,3,5}))  ... (OK)

    =linest(A15:A18,sin(2*{1,3,5}*A15:A18),FALSE)   ...  (OK)
Comment 1 Andreas J. Guelzow 2012-08-26 19:07:03 UTC
=sumproduct({1,2,3},sin({1,3,5}))  ... complains #VALUE if it is _not_ entered as an array function. (In that case implicit intersection is used and of course that fails.) 

=sumproduct({1,2,3},sin({1,3,5})) yields -1.75... if entered as an array function which seems to be correct.

=max(sin({1,3,5})) whether entered as a array function or otehrwise yields 0.84... which is correct. What do you see?

Note:
I am using Gnumeric 1.11.5+. So there could be some bugs fixed already.

So I do not see a bug here...
Comment 2 John Denker 2012-08-27 02:12:36 UTC
Created attachment 222506 [details]
improved demonstration of the problem with scalar-valued functions of arrays
Comment 3 John Denker 2012-08-27 02:40:11 UTC
Thanks for the prompt and informative response!


On 08/26/2012 12:07 PM, Gnumeric (bugzilla.gnome.org) wrote:

> =sumproduct({1,2,3},sin({1,3,5})) yields -1.75... if entered as an array
> function which seems to be correct.

Wow.  That's a useful workaround.

Specifically:  If we treat it as an array of one element, and enter it
with ctrl-shift-enter, it works.

I don't understand why some other functions such as
  =sumproduct({1,2,3},100^({1,2,3}))
work just fine, even if they are not entered as arrays-of-one-element ... 
but I don't urgently need to understand that, now that a workaround has 
been provided.

> =max(sin({1,3,5})) whether entered as a array function or otehrwise yields
> 0.84... which is correct. What do you see?

My bad.  There is a bug, but that is not the example that demonstrates
the bug.  A more illuminating example is

   =max(sin({1,2,3}))

which gives 0.909 (the right answer) when entered as an array-of-one-element,
but insidiously gives 0.841 when entered as a normal scalar-valued expression.

I just now uploaded a greatly improved attachment that exhibits these
phenomena.

===========

Given that
  =sumproduct({1,2,3},100^({1,2,3}))
works just fine even when entered as a normal scalar-valued expression,
one would hope that the seemingly-simpler expression
  =sumproduct({1,2,3},sin({1,2,3}))
could be made to work in the same way, without too much pain.  In
particular, I suspect that getting this to work would be much easier
than trying to explain to users why /some/ scalar-valued expressions
need to be entered as arrays-of-one-element.

Also, if in the short term 
  =max(sin({1,2,3}))
cannot be made to return the correct value when entered as a normal
scalar-valued expression, I suggest it should return #VALUE or some
such.  Returning a numerically-incorrect answer makes life very
unpleasant for ordinary users.

============

I wonder how MS-XL handles these cases.  I am not at the moment able
to check this myself.  Maybe tomorrow.  Let me know if such a check
is worth pursuing.

Thanks!
Comment 4 Andreas J. Guelzow 2012-08-27 02:48:48 UTC
Hi John, thnaks this example file will help to understand what the problem may be (if any).

In cells C11 and C12 I see 0.909 and 0.841 respectively, I note that these results are the same as LibreOffice. Clearly for C12 max is applied to each argument value individualy. So this might not be bug but just an unexpected behaviour. I will need to check XL to see how it is behaving.

In view of C4,C5, and C6 I would say that C7 and C8 is clearly a bug, although C9 shows the working workaround.
Comment 5 John Denker 2012-08-27 08:22:41 UTC
Created attachment 222517 [details]
further improved demonstration of the problem with scalar-valued functions of arrays

As versions change, bugs will disappear ... making my
examples invalid and/or very mysterious.

One sheet in this file shows the numerical values, as 
I see them today.  This is for reference, to show what 
the problem was, even after it gets fixed ... and perhaps 
to show what the problem is even today, when using versions 
different from mine.  I created this using paste-by-value.
Values I consider less-than-desirable are shown in red.

The other sheet contains the "live" spreadsheet formulas.
Comment 6 John Denker 2012-08-27 19:25:48 UTC
Earlier there was some discussion of what is "unexpected behavior" as
opposed to an outright bug.

For your consideration: Here is a possible guideline, to help with the
design, to help clarify what is the desired, expected behavior.

  A) Suppose we start with an array constant such as
        =sin({1,2,3})
   and then enter it into three cells A1:C1, entering it using 
   <ctrl-shift-enter>, since it is an array.

  B) Further suppose that we enter a function such as
        =max(A1:C1)
   into another cell, entering it as an ordinary scalar-valued
   function using simply <enter> ... not <ctrl-shift-enter>.

  C) Then it could be argued that the combined expression
        =max(sin({1,2,3}))
   should give the same answer ... when entered into a single
   cell and when used as part of a larger expression.  It should 
   not require <ctrl-shift-enter> for the same reason that step (B) 
   did not require <ctrl-shift-enter>.

Rationale: This guideline (if A and B then C) is motivated by one of
the axioms of basic algebra, and by the notion that an array constant
should be /equivalent/ to an array in cells.

If there is some reason why this guideline cannot be followed, or if
there is a better way of looking at the issue, please explain.
Comment 7 Andreas J. Guelzow 2012-08-27 19:33:18 UTC
With respect to functions that are in Excel our behaviour must match Excel's. With respect to other ODF functions oyr behaviour must follow whatever ODF1.2 part2 defines. For every thing else we try to do what is reasonable.

If we weren't doing that interoperability would suffer.
Comment 8 Andreas J. Guelzow 2012-08-27 19:38:11 UTC
With respect to your ligic:

Say A1 to A3 contains 0, =pi()/4, =pi()/2.
If you enter in C2 =sin(A1:A3) what would you expect?
Similarly for entering that expression in C4, what would you expect?
Comment 9 John Denker 2012-08-27 21:30:01 UTC
In reply to comment 8:

1) Thanks for asking!

2) As a preliminary remark, let me say that my answer will be very
 sensitive to details of the question.  Logic has a lot to say
 about the behavior of functions such as max(...) and sumproduct(...)
 that are, by necessity, scalar-valued functions that take arrays as
 arguments.  Logic has rather less to say about how to coerce an array
 to "fit" into a single cell.

3) Note that the question is AFAICT equally interesting if you leave
 out the "sin" function.  That is, in cell D2 enter the formula =A1:A3
 and hit <enter>.

 Note that you get a different answer if you go to cell E2 and enter
 the same formula and hit <ctrl-shift-enter>.

4) It depends on whether you are asking about /logic/ or about
 /expectations/.  I know what gnumeric 1.10.13 does in this situation ... 
 and I find it to be highly illogical, for reasons discussed in item
 6 below.

5) So, to answer the questions, here is my opinion.  It is /not/ a
 strongly-held opinion.  Based on thinking about it for half a day,
 my preference would be to return an error in this situation.  I do
 not see any logical way to coerce an array to "fit" into a single 
 cell.

 I hope such coercions are vanishingly rare in practice.

 Even if the program offers some sort of coercion as a "feature" I
 would advise my friends not to use it.  Instead I would recommend
 using the index(...) function to select a specific element from the
 array.

 If in the interests of interoperability it is necessary to do
 "something" with such coercions, I recommend making the "feature"
 optional, perhaps contingent on a command-line option such as
    --enable-bug-for-bug-compatibility

 By way of analogy, I note that the gcc and g++ compilers do this
 sort of thing, conditionally offering compatibility with respect
 to dubious "features".

6) The existing behavior obtained using <enter> is illogical because
 it violates the "rule of minimum surprise".  Ordinary users expect 
 that they can enter a formula in any location and get the same result,
 independent of location.  They can relocate a formula using cut-and-paste 
 without changing the result.  This expectation is violated in a very
 surprising way by the existing rule for coercing arrays to scalars.

 In contrast, the result obtained with <ctrl-shift-enter> is not
 nearly so bad, not nearly so surprising.  It can be freely relocated.

7) I emphasize that the question of how to handle coercions is much
 less interesting and much less important than the issues I intended
 to raise in this bug report.  The interesting and important case
 concerns functions such as max(...) and sumproduct(...) where no
 coercion is required.  Logic has a great deal to say about what
 should happen in this case.

 A solid logical argument is something entirely different from
 a weakly-held opinion.
Comment 10 Andreas J. Guelzow 2012-08-27 21:50:31 UTC
If any of our users opens an Excel document they would be very disturbed if the results were horribly different from what they see on Excel or LibreOffice or OpenOffice.

So we really don't have an option to disable "implicit intersection" by default.

If you work with arrays one should enter formulas as array functions and there should be no surprises then.
Comment 11 Andreas J. Guelzow 2012-08-27 21:57:42 UTC
I have just verified that in Excel 2010
=max(sin({1,2,3})) in fact evaluates as 0.909 independent from whether it is entered as an array formula or not.
Since interoperability with Excel is more important to us than interoperability with LibreOffice, this has just become a bug.

As expected (and indicated in comment #4) our behaviour for
=sumproduct({1,2,3},sin({1,2,3}))
is also a bug. (Excel yields 3.08)
Comment 12 Andreas J. Guelzow 2012-08-27 21:59:00 UTC
What this really means is that Excel triggers "implicit intersection" for ranges A1:A3 but not for array {1,2,3}.
Comment 13 John Denker 2012-08-28 01:10:40 UTC
In reply to comment 11:  It appears there is a consensus.  There are
two lines of reasoning, both leading to the same conclusion.  Specifically,
the principled logical argument agrees with the MS-XL interoperability 
argument.  This is good.

I do not understand comment 12.  As I see it, the principle of the thing 
is that ranges such as A1:A3 should be treated on exactly the same footing
as array constants such as {1,2,3} ... as detailed in comment 6.  If the reasoning in comment 6 is not 100% clear and persuasive, please explain in 
more detail why not.

I do not see how intersections (implicit or otherwise) are relevant here.
I cannot imagine max(...) performing any kind of intersection in the cases
I care about.  Ditto for sumproduct(...).

To briefly paraphrase comment 6:  When in doubt, unpack the array constant
into a range of temporary cells, and then invoke the function on that
range of cells.  In the cases I care about, this is a well-defined
operation and uniquely defines what the result should be.  This is the
logical rule ... and all available evidence indicates that MS-XL plays by 
this rule.
Comment 14 Andreas J. Guelzow 2012-08-28 01:28:34 UTC
In Excel A1:A3 and {1,2,3} are not treated the same. Since virtually nobody uses {1,2,3}, we hadn't noticed that before and used the same interpretation for A1:A3 and {1,2,3}.

In Excel A1:A3 in functions that are not entered as array functions uses implicit intersection, ie. that cell from the range A1:A3 is used that intersect the row in which the formula is used.

You may think that that makes no sense (and I would be hard pressed to disagree) but we cannot change that because of interoperability with Excel.
Comment 15 John Denker 2012-08-28 02:37:26 UTC
The numerical observations in comment 11 seem to indicate that XL is
behaving in a way that is consistent with the logical rule set forth 
in comment 6.

I do not understand how to make the mechanistic explanations in
comment 12 and comment 14 consistent with the XL observations (or 
with logic).

Specifically:  It has been implied several times that XL max(...) and
sumproduct(...) are using an intersection rule.  What is the evidence
supporting this conclusion?  I haven't seen any evidence of this, and 
the observations in comment 11 seem to be powerful evidence to the contrary.
Comment 17 John Denker 2012-08-28 03:54:58 UTC
The question on the table is this:  Is there any evidence that XL
sumproduct(...) and max(...) are using any kind of intersection rule
in the situations of interest?  Note that these are scalar-valued 
functions so the result fits naturally into a single cell.

The exercise suggested in comment 8 and revisited in comment 16 does
not mention sumproduct(...) or max(...) but does involve coercing
arrays to "fit" into a single cell.  Therefore this exercise is two
steps removed from being relevant to the question.

The only relevant observational evidence that has been adduced is in 
comment 11, which strongly suggests that XL sumproduct(...) and max(...) 
are not using any kind of intersection rule.  In addition we have the
theoretical argument, namely the fact that it would be wildly illogical
to use such a rule.  Indeed, it would be doubly illogical, given that 
a nice simple alternative rule is available, as mentioned in comment 6.

I do not understand why the topic of intersections ever came up in this
discussion.  It seems like a red herring, i.e. an irrelevant distraction.
I cannot see why it is relevant to the XL behavior of the functions I 
originally called attention to, namely sumproduct(...), max(...) and 
similar scalar-valued functions that act on arrays and ranges.

I looked at the references cited.  Detailed descriptions of situations 
where intersections are performed are not helpful in situations where 
intersections are not performed.

I could be wrong, but I suspect that recognizing that intersections are 
not relevant to the problem at hand might be a major step toward solving
the problem.
Comment 18 Morten Welinder 2012-09-05 13:51:31 UTC
The main problem here probably is that we aren't entirely sure that we
understand XL's semantics with respect to array formulas and implicit
iteration.

So to summarize, we have two formulas not acting like XL when entered
as a non-array formula:

1. =sumproduct({1,2,3},sin({1,3,5}))
   We get #VALUE, XL gets a number.

2. =max(sin({1,2,3}))
   We get 0.84 (==sin(1)); XL gets 0.91 (==sin(2))

In both cases XL clearly does implicit iteration while we do not.  My guess
is that we are going implicit intersection.
Comment 19 Morten Welinder 2012-09-05 13:57:46 UTC
Note, that I am very wary of arguments like comment 6.  They assume a level
of sanity that XL's semantics doesn't have.  For example: If A1:A3 contains
1, 2, and "x" then sum(A1:A3) is 3.  But sum(1,2,"x") is #VALUE.  (In XL;
we don't do that right.)

So we need to understand what XL is actually doing.
Comment 20 Morten Welinder 2012-09-06 14:33:53 UTC
For =concatenate(if({TRUE,FALSE},"T","F")) we have

              XL                  Gnumeric
array         T                   TF
non-array     T                   T

I don't even know where to start to explain that one.
Comment 21 Morten Welinder 2012-09-06 15:17:36 UTC
"if" might be special, so it might not be the right function to use for
conducting experiments.

Experiments with count(atan2(__X__,__Y__)) show that iteration takes
place when one of __X__ and __Y__ is a set like {1,2} but not when both
are.  In that case, only the first element in each set is used.
Comment 22 GNOME Infrastructure Team 2018-05-22 13:52:36 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnumeric/issues/194.