Bug 504256 – Paired t-test failures


Summary:	Paired t-test failures


Status:	RESOLVED FIXED

Product:	Gnumeric
Classification:	Applications
Component:	Analytics
Version:	git master
Hardware:	Other All

Importance:	Normal blocker
Target Milestone:	---
Assigned To:	Morten Welinder
QA Contact:	Jody Goldberg

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2007-12-18 14:54 UTC by J.H.M. Dassen (Ray)
Modified:	2007-12-19 01:04 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed patch for Observed Mean Difference (783 bytes, patch) 2007-12-18 14:55 UTC, J.H.M. Dassen (Ray)	rejected
proposed patch (3.07 KB, patch) 2007-12-18 20:58 UTC, Andreas J. Guelzow	accepted-commit_now

Description J.H.M. Dassen (Ray) 2007-12-18 14:54:45 UTC

[Originally reported as http://bugs.debian.org/456798]

Package: gnumeric
Version: 1.7.91-1.1
Severity: normal

Hello,
    When running a paired t-test, several of the outputs failed. This
turned out to be due to the two selected cell ranges being subtracted
rather than compared (i.e. you had RANGE1 - RANGE2 rather than RANGE1,
RANGE2) in the function call. This happened for the Observed Mean
Difference, Variance of the Differences, and df, but not for the Pearson
Correlation (which also requires two cell ranges rather than one). 

   I've attached a patch that fixes this problem for the Observed Mean
Difference when doing the paired t-test. I basically copied the way that
the Pearson Correlation calculation was done. I'm happy to do the same fix
for the other broken functions if this is indeed the correct fix, but I'm
not sure about it because the way it's written does appear to be
deliberate, even if I can't make heads or tails of it. I don't know if this
affects other tests as well, but I can check if it's desired. Thanks!

 - David Nusinow

Comment 1 J.H.M. Dassen (Ray) 2007-12-18 14:55:33 UTC

Created attachment 101193
Proposed patch for Observed Mean Difference

Comment 2 Andreas J. Guelzow 2007-12-18 17:22:53 UTC

At first glance this does not look right. The observed mean difference is equal to the mean of the pairwise differences; it is not equal to the mean of the combined samples.

Please provide an example of how the "outputs fail", ideally a small sample file with a little bit of data, the results obtained by gnumeric and what you would expect.
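
To make the distinction above concrete, here is a minimal C sketch of the two quantities. It is not Gnumeric code; the function names are made up for illustration, and both functions assume the caller passes n > 0.

```c
#include <stddef.h>

/* Mean of the pairwise differences a[i] - b[i]: the quantity a
 * paired t-test reports as the observed mean difference.
 * Hypothetical helper; the caller must pass n > 0. */
double mean_of_differences(const double *a, const double *b, size_t n)
{
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += a[i] - b[i];
	return sum / (double)n;
}

/* Mean of all 2n values pooled together: what averaging the two
 * ranges as separate arguments would give instead.
 * Hypothetical helper; the caller must pass n > 0. */
double mean_of_combined(const double *a, const double *b, size_t n)
{
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += a[i] + b[i];
	return sum / (double)(2 * n);
}
```

With a = {5, 7, 9} and b = {1, 2, 3}, the pairwise differences are 4, 5, 6, so the first function gives 5, while the pooled mean is 27/6 = 4.5.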

Comment 3 Andreas J. Guelzow 2007-12-18 17:30:52 UTC

I see what you mean: somehow the formulas are messed up and don't evaluate as expected. We do have to calculate the difference though.

Morten: Shouldn't
=average(Sheet1!$A$1:$A$3-Sheet1!$B$1:$B$3)
calculate the average of the differences of corresponding cells in those regions?

Comment 4 Andreas J. Guelzow 2007-12-18 17:37:32 UTC

hmm, if I type the same formula into a different cell it works fine. If I type it into the same cell it fails again with #VALUE!

Comment 5 Andreas J. Guelzow 2007-12-18 17:38:32 UTC

I was mistaken in the last comment: retyping into a different cell gives a value rather than an error, but not the correct value.

Comment 6 Andreas J. Guelzow 2007-12-18 17:43:20 UTC

At the moment I can only try with 1.7.13, but the calculations with ranges seem to be messed up. We have to figure out what is going on before releasing 1.8.

Comment 7 J.H.M. Dassen (Ray) 2007-12-18 18:13:31 UTC

Adding original bug reporter.

Comment 8 Andreas J. Guelzow 2007-12-18 18:26:41 UTC

I guess this is all intentional, to make the functions behave like XL. We need some different formulas here instead.

Comment 9 Morten Welinder 2007-12-18 18:29:29 UTC

> Morten: Shouldn't
> =average(Sheet1!$A$1:$A$3-Sheet1!$B$1:$B$3)
> calculate the average of the differences of corresponding cells in those
> regions?

As discussed on irc, this is only true if the formula is entered as an
array formula.
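
As an illustration only, the toy C model below contrasts the two evaluation modes. It assumes the non-array case follows Excel-style implicit intersection (each range is reduced to the cell in the formula's own row, or an error if the formula sits outside those rows). The thread does not spell that rule out, and none of this is Gnumeric's evaluator; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Array-formula semantics: subtract element by element, then
 * average all n differences. The caller must pass n > 0. */
double average_diff_array(const double *a, const double *b, size_t n)
{
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += a[i] - b[i];
	return sum / (double)n;
}

/* Assumed non-array semantics (implicit intersection): both ranges
 * cover rows first_row .. first_row + n - 1. Only the row holding
 * the formula contributes; outside that span the result is an
 * error, modelled here by returning false (think #VALUE!). */
bool average_diff_scalar(const double *a, const double *b, size_t n,
			 size_t first_row, size_t formula_row, double *out)
{
	if (formula_row < first_row || formula_row >= first_row + n)
		return false;
	size_t i = formula_row - first_row;
	*out = a[i] - b[i];	/* the average of one value is that value */
	return true;
}
```

Under this assumption, a formula placed next to the data returns a single row's difference (a value, but not the right one), and a formula placed below the data returns an error, which would match what comments 4 and 5 describe.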

Comment 10 Morten Welinder 2007-12-18 18:32:55 UTC

Andreas: we also, eventually, need to stop doing things like...

		static const GnmCellRef mean_diff_hypo =
			{NULL, 0, -2, TRUE, TRUE};
...
                 gnm_expr_new_cellref(&mean_diff_hypo)),


It would be far better to have a local function, make_rel_ref, taking
dx and dy arguments.  It would be shorter and not lock us into the
structure layout.
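
A rough sketch of what such a helper could look like follows. CellRefSketch is a hypothetical stand-in so the snippet compiles on its own; its field order is assumed from the positional initializer quoted above and is not taken from the Gnumeric headers. A real version would presumably fill in a GnmCellRef and pass it to gnm_expr_new_cellref.

```c
#include <stddef.h>

/* Hypothetical stand-in for GnmCellRef; field order is assumed from
 * the initializer {NULL, 0, -2, TRUE, TRUE}. */
typedef struct {
	void *sheet;			/* NULL: same sheet as the formula */
	int col, row;			/* offsets when the ref is relative */
	unsigned char col_relative;
	unsigned char row_relative;
} CellRefSketch;

/* Build a fully relative reference dx columns and dy rows away from
 * the cell holding the formula. Fields are set by name, so the
 * caller no longer depends on the order of the struct members. */
static CellRefSketch
make_rel_ref (int dx, int dy)
{
	CellRefSketch r;

	r.sheet = NULL;
	r.col = dx;
	r.row = dy;
	r.col_relative = 1;
	r.row_relative = 1;
	return r;
}
```

With this, the mean_diff_hypo reference above would become make_rel_ref (0, -2).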

Comment 11 Andreas J. Guelzow 2007-12-18 20:50:32 UTC

Re comment #10: this should wait until 1.9.

Comment 12 Andreas J. Guelzow 2007-12-18 20:58:38 UTC

Created attachment 101220
proposed patch

proposed (minimal) patch, please review

Comment 13 Jody Goldberg 2007-12-19 00:53:03 UTC

<jody> I'd prefer not to have the arguments until they are necessary rather than a g_warning and some partial code
<aguelzow> okay, easy enough to remove...
<jody> It also seems like we should throw in a comment to think about adding a utility function
<jody> to do the position validation
<jody> That + if (dao->type == RangeOutput && ...
<jody> and the SHEET_MAX tests are likely to be replicated in lots of places
<jody> but those are post branch changes
<jody> for now that looks good

Comment 14 Andreas J. Guelzow 2007-12-19 01:04:02 UTC

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.