GNOME Bugzilla – Bug 504256
Paired t-test failures
Last modified: 2007-12-19 01:04:02 UTC
[Originally reported as http://bugs.debian.org/456798]
Package: gnumeric
Version: 1.7.91-1.1
Severity: normal

Hello,

When running a paired t-test, several of the outputs failed. This turned out to be because the two selected cell ranges were subtracted in the function call rather than passed as separate arguments (i.e. the call had RANGE1 - RANGE2 rather than RANGE1, RANGE2). This happened for the Observed Mean Difference, Variance of the Differences, and df, but not for the Pearson Correlation (which also takes two cell ranges rather than one).

I've attached a patch that fixes this problem for the Observed Mean Difference in the paired t-test. I basically copied the way the Pearson Correlation calculation is done. I'm happy to apply the same fix to the other broken outputs if this is indeed the correct fix, but I'm not sure, because the way the code is written does appear to be deliberate, even if I can't make heads or tails of it. I don't know whether this affects other tests as well, but I can check if desired.

Thanks!
- David Nusinow
Created attachment 101193: Proposed patch for Observed Mean Difference
At first glance this does not look right. The observed mean difference is the mean of the pairwise differences, not the mean of the combined samples. Please provide an example of how the outputs "fail", ideally a small sample file with a little data, the results obtained by Gnumeric, and what you would expect.
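To make the distinction concrete, here is a minimal C sketch of the three paired-test quantities the report mentions, all derived from the pairwise differences d_i = x_i - y_i, next to the pooled mean of both samples for contrast. The names PairedStats, paired_stats and combined_mean are hypothetical and this is not Gnumeric's code.

```c
#include <assert.h>

/* Hypothetical summary of a paired sample, computed from d_i = x_i - y_i. */
typedef struct {
    double mean_diff; /* observed mean difference: mean of the d_i */
    double var_diff;  /* sample variance of the d_i (divisor n - 1) */
    int df;           /* degrees of freedom: n - 1 */
} PairedStats;

/* Hypothetical helper (not Gnumeric code): needs at least two pairs. */
static PairedStats paired_stats(const double *x, const double *y, int n)
{
    assert(n >= 2);
    PairedStats s = {0.0, 0.0, n - 1};

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] - y[i];          /* pairwise difference */
    s.mean_diff = sum / n;

    double ss = 0.0;
    for (int i = 0; i < n; i++) {
        double dev = (x[i] - y[i]) - s.mean_diff;
        ss += dev * dev;             /* squared deviation of each difference */
    }
    s.var_diff = ss / (n - 1);
    return s;
}

/* Mean of both samples pooled together: NOT the observed mean difference. */
static double combined_mean(const double *x, const double *y, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] + y[i];
    return sum / (2.0 * n);
}
```

With x = {5, 7, 9} and y = {4, 4, 4}, the differences are 1, 3 and 5, so the observed mean difference is 3, the variance of the differences is 4 and df is 2, while the mean of the combined samples is 5.5.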
I see what you mean: somehow the formulas don't evaluate as expected. We do have to calculate the difference, though. Morten: shouldn't =average(Sheet1!$A$1:$A$3-Sheet1!$B$1:$B$3) calculate the average of the differences of corresponding cells in those regions?
Hmm, if I type the same formula into a different cell, it works fine. If I type it into the same cell, it fails again with #VALUE!
I was mistaken in my last comment: retyping the formula into a different cell gives a value rather than an error, but not the correct value.
At the moment I can only test with 1.7.13, but calculations with ranges seem to be broken. We have to figure out what is going on before releasing 1.8.
Adding original bug reporter.
I guess this is all intentional, to make the functions behave like XL (Excel). We need different formulas here instead.
> Morten: Shouldn't
> =average(Sheet1!$A$1:$A$3-Sheet1!$B$1:$B$3)
> calculate the average of the differences of corresponding cells in those
> regions?

As discussed on IRC, this is only true if the formula is entered as an array formula.
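A rough C model of why the plain formula misbehaves in the comments above, assuming Gnumeric follows Excel-style implicit intersection for non-array formulas: each range collapses to the single cell in the formula's own row, so the result depends on where the formula is typed. ColRange, eval_implicit and eval_array are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-column range covering rows first_row .. first_row+len-1
 * (1-based, like spreadsheet rows). */
typedef struct {
    const double *cells;
    int first_row;
    int len;
} ColRange;

/* Non-array evaluation, ASSUMING implicit intersection: each range is
 * reduced to its cell in the formula's row. Returns false (think #VALUE!)
 * when that row lies outside either range. */
static bool eval_implicit(ColRange a, ColRange b, int formula_row, double *out)
{
    int ia = formula_row - a.first_row;
    int ib = formula_row - b.first_row;
    if (ia < 0 || ia >= a.len || ib < 0 || ib >= b.len)
        return false;
    /* AVERAGE of the single surviving value is that value. */
    *out = a.cells[ia] - b.cells[ib];
    return true;
}

/* Array evaluation: subtract element by element, then average. */
static double eval_array(ColRange a, ColRange b)
{
    assert(a.len == b.len && a.len > 0);
    double sum = 0.0;
    for (int i = 0; i < a.len; i++)
        sum += a.cells[i] - b.cells[i];
    return sum / a.len;
}
```

With A1:A3 = {5, 7, 9} and B1:B3 = {4, 4, 4}, the array evaluation yields 3; the non-array formula typed in row 1 yields 1 (a value, but the wrong one), and typed in a row outside 1 to 3 it yields an error. Under this assumption, that would plausibly explain the #VALUE! and the wrong values reported earlier in the thread.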
Andreas: eventually, we also need to stop doing things like

    static const GnmCellRef mean_diff_hypo = {NULL, 0, -2, TRUE, TRUE};
    ...
    gnm_expr_new_cellref(&mean_diff_hypo)),

It would be far better to have a local function, make_rel_ref, taking dx and dy arguments. It would be shorter and would not lock us into the structure layout.
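A minimal sketch of the suggested helper in C. CellRef is a hypothetical stand-in for GnmCellRef whose field order (sheet, col, row, col_relative, row_relative) is only assumed from the quoted initializer; the real struct and the call into gnm_expr_new_cellref are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Sheet Sheet; /* opaque, as in the real code */

/* Hypothetical stand-in for GnmCellRef (field order ASSUMED). */
typedef struct {
    Sheet *sheet;       /* NULL: same sheet as the expression */
    int col, row;       /* offsets when the matching flag is set */
    bool col_relative;
    bool row_relative;
} CellRef;

/* Hypothetical make_rel_ref: a reference dx columns and dy rows away
 * from the current cell. Designated initializers keep callers independent
 * of the field order. */
static CellRef make_rel_ref(int dx, int dy)
{
    CellRef ref = {
        .sheet = NULL,
        .col = dx,
        .row = dy,
        .col_relative = true,
        .row_relative = true,
    };
    return ref;
}
```

The quoted mean_diff_hypo would then become make_rel_ref(0, -2), i.e. same column, two rows up, assuming col precedes row in the original layout. Because only the helper names the fields, reordering or extending the struct would not require touching its callers.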
Re comment #10: this should wait until 1.9.
Created attachment 101220: proposed patch
Proposed (minimal) patch, please review.
<jody> I'd prefer not to have the arguments until they are necessary, rather than a g_warning and some partial code
<aguelzow> okay, easy enough to remove...
<jody> It also seems like we should throw in a comment to think about adding a utility function
<jody> to do the position validation
<jody> That + if (dao->type == RangeOutput && ...
<jody> and the SHEET_MAX tests are likely to be replicated in lots of places
<jody> but those are post branch changes
<jody> for now that looks good
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.