GNOME Bugzilla – Bug 620505
Tool for multiple 2-variable linear regression with multiple dependent variables
Last modified: 2010-06-07 03:31:28 UTC
Created attachment 162680 [details] Gnumeric spreadsheet demonstrating different slope calculations The new multiple linear regression function (THANKS! I LOVE IT!) correctly gives F and F-significance, but the slopes are reported using LINEST with the independent variable first, then the dependent variable. When using LINEST, the slope results are more intuitive with the charting functions if the DV listed, followed by the IV. Example: =linest($B$22:$B$26,C22:C26,,FALSE) should be =linest(C22:C26,$B$22:$B$26,,FALSE) I have attached a spreadsheet showing the difference between these formulas using a small dataset, and how the latter arrangement more closely matches the charts generated by Gnumeric (and Excel). Other statistical software packages also return slopes using the latter method (one is shown in the attached spreadsheet). -pete
ODF, Excel and Gnumeric should (and used to) agree on the argument and results order: LINEST(dependent,independents) and the first row returns: slope for last independent, slope for second last indep., etc, intercept [There is no slope for the dependent variable so I don't know what you mean with that.) I have just checked Gnumeric and that is what is in fact happening.
I gather you are talking about the regression tool (specifically the Multiple 2-variable regression), not the LINEST function.
I don't know what the XL_toolbox is doing but in your plot you have plotted the independent variable (acidovorax) on the vertical axis and the dependent (biomass) on the horizontal. In that case your slope will of course look like the reciprocal of the slope calculated. Mathematically the slope should always be teh change in the dependent divided by the change in the independent.
Rather than closing this bug I should change it into an enhancement request to provide the option to perform a multiple 2-variable regression with multiple dependent and a single independent variable.
Thank you Andreas, That is exactly what I was going to request. For biologists, it is better to say I saw 6 bird per minute than 0.167 minutes per bird. I appreciate all your hard work !
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report. Note that if you switch dependent and independent variables, then in general the slopes will not be the reciprocal, since the regression line will change due to the asymmetry of linear regression: lin regression minimizes the sum of the squares of the deviations of the dependent variable (ie. the vertical distance) not the sum of the squares of the euclidean distance.