GNOME Bugzilla – Bug 761107
percentrank : multiple errors in documentation
Last modified: 2018-05-22 14:23:23 UTC
These remarks apply to the web documentation and to the interactive gnumeric gui. The current web documentation is observed at: https://help.gnome.org/users/gnumeric/stable/CATEGORY_Statistics.html.en#gnumeric-function-PERCENTRANK Current: > If array contains no data points, this function returns a #NUM! error. Should be: >> If array contains no data points, this function returns a #N/A error. Current: > If significance is less than one, this function returns a #NUM! error. Should be: >> If significance is less than zero, this function returns a #NUM! error. Current: > If x exceeds the largest value or is less than the smallest value in array, this function returns a #NUM! error. Should be: >> If x exceeds the largest value or is less than the smallest value in array, this function returns a #N/A error. Current: > If x does not match any of the values in array or x matches more than once, this function interpolates the returned value. Should be: >> If x falls between two values in the array, the result is computed by linear interpolation as described below. >> If there is a tie, i.e. if x matches more than data point in the array, this function returns the lowest rank of any of the matched elements. ============ Should be added: >> In the array, blank cells and string-valued cells are ignored and do not count toward $N$, the number of "data points" >> The result is always in the range [0, 1]. >> If x matches the smallest element of the array, the result is 0. >> If x matches the largest element of the array and is not tied, the result is 1. >> Suppose some data point with value $V$ is part of an $M$-way tie, and suppose it corresponds to some percentrank result we denote $Rlow$. Then if x is infinitesimally larger than $V$, the result is $Rhigh = Rlow + (M-1)/(N-1)$, where $N$ is the number of valid data points in the array. This makes sense insofar as it is $1/(N-1)$ less than the result for the next-larger data point in the array, i.e. Rnext = Rlow + M/(N−1). If x falls in the interior of the open interval between $V$ and the next-larger data point, the result is computed by linear interpolation, starting from Rhigh and ending at Rnext. --------------------------- See also https://www.av8n.com/physics/spreadsheet-tips.htm#sec-percentrank
For the #NA vs #NUM case we should do whatever Excel does. Having a difference would be pointless. Obviously the code and the docs should agree.
"less than one" -> "less than zero" fixed.
ODF/OpenFormula requires the significance to be at least 1, so "less than one" is the correct description for ODF. Excel 2013 gives a #NUM error if the significance is 0 or 0.6. The result that Gnumeric gives for significance 0 is always 0 except for the largest value in the data set. So that does not seem to be reasonable. I think the "less than one" in the documentation used to be correct. The function should be fixed.
https://support.office.com/en-us/article/PERCENTRANK-function-f1b5836c-9619-4847-9fc9-080ec9024442 says... "If array is empty, PERCENTRANK returns the #NUM! error value." "If significance < 1, PERCENTRANK returns the #NUM! error value."
We are back to "less than one", but the code now matches that.
Empty dataset now returns #NUM per docs. Out-of-range still gives #NA.
I am equally happy with precision<0 or precision<1 ... the main point is that the code and the docs should agree. Choosing on the basis of XL compatibility is fine with me. I habitually set the precision to 20 anyway. Similarly I am equally happy with #NUM! or #N/A ... the main point is that the code and the docs should agree. Choosing on the basis of XL compatibility is fine with me.
(In reply to Morten Welinder from comment #6) > Empty dataset now returns #NUM per docs. Out-of-range still gives #NA. We could make that more explicit, to cover the case where the array has nonzero size but contains blanks and/or strings. I'm not sure the relevant definition of "empty" is obvious to the ordinary user. I suggest something like: >> If the array contains no valid numeric data, ......
I have fixed the docs re out-of-range to say N/A. That matches the code. There is, IMHO, no need to elaborate of the fact that strings are ignored. They are ignored for all numerical functions. I think we're left with what to do for the interpolation case.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnumeric/issues/296.