After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 620505 - Tool for multiple 2-variable linear regression with multiple dependent variables
Tool for multiple 2-variable linear regression with multiple dependent variables
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: Analytics
1.10.x
Other All
: Normal enhancement
: ---
Assigned To: Andreas J. Guelzow
Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2010-06-03 18:00 UTC by Hoyt
Modified: 2010-06-07 03:31 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Gnumeric spreadsheet demonstrating different slope calculations (12.30 KB, application/x-gnumeric)
2010-06-03 18:00 UTC, Hoyt
Details

Description Hoyt 2010-06-03 18:00:29 UTC
Created attachment 162680 [details]
Gnumeric spreadsheet demonstrating different slope calculations

The new multiple linear regression function (THANKS! I LOVE IT!) correctly gives F and F-significance, but the slopes are reported using LINEST with the independent variable first, then the dependent variable. When using LINEST, the slope results are more intuitive with the charting functions if the DV listed, followed by the IV. Example:

=linest($B$22:$B$26,C22:C26,,FALSE)
should be
=linest(C22:C26,$B$22:$B$26,,FALSE)

I have attached a spreadsheet showing the difference between these formulas using a small dataset, and how the latter arrangement more closely matches the charts generated by Gnumeric (and Excel).

Other statistical software packages also return slopes using the latter method (one is shown in the attached spreadsheet). 

-pete
Comment 1 Andreas J. Guelzow 2010-06-03 19:45:39 UTC
ODF, Excel and Gnumeric should (and used to) agree on the argument and results order:
LINEST(dependent,independents)

and the first row returns:

slope for last independent, slope for second last indep., etc, intercept

[There is no slope for the dependent variable so I don't know what you mean with that.)

I have just checked Gnumeric and that is what is in fact happening.
Comment 2 Andreas J. Guelzow 2010-06-03 19:57:55 UTC
I gather you are talking about the regression tool (specifically the Multiple 2-variable regression), not the LINEST function.
Comment 3 Andreas J. Guelzow 2010-06-03 20:06:09 UTC
I don't know what the XL_toolbox is doing but in your plot you have plotted the independent variable (acidovorax) on the vertical axis and the dependent (biomass) on the horizontal. In that case your slope will of course look like the reciprocal of the slope calculated. Mathematically the slope should always be teh change in the dependent divided by the change in the independent.
Comment 4 Andreas J. Guelzow 2010-06-03 21:26:40 UTC
Rather than closing this bug I should change it into an enhancement request to provide the option to perform a multiple 2-variable regression with multiple dependent and a single independent variable.
Comment 5 Hoyt 2010-06-03 22:02:58 UTC
Thank you Andreas, 
That is exactly what I was going to request. 

For biologists, it is better to say I saw 6 bird per minute than 0.167 minutes per bird. 

I appreciate all your hard work !
Comment 6 Andreas J. Guelzow 2010-06-07 03:31:28 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.


Note that if you switch dependent and independent variables, then in general the slopes will not be the reciprocal, since the regression line will change due to the asymmetry of linear regression: lin regression minimizes the sum of the squares of the deviations of the dependent variable (ie. the vertical distance) not the sum of the squares of the euclidean distance.