Bug 761107 – percentrank : multiple errors in documentation

Summary:	percentrank : multiple errors in documentation


Status:	RESOLVED OBSOLETE

Product:	Gnumeric
Classification:	Applications
Component:	Documentation
Version:	1.12.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Andreas J. Guelzow
QA Contact:	Jody Goldberg

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-01-26 00:05 UTC by John Denker
Modified:	2018-05-22 14:23 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description John Denker 2016-01-26 00:05:20 UTC

These remarks apply to the web documentation and to the interactive
Gnumeric GUI.

The current web documentation can be found at:
https://help.gnome.org/users/gnumeric/stable/CATEGORY_Statistics.html.en#gnumeric-function-PERCENTRANK

Current:  
> If array contains no data points, this function returns a #NUM! error.

Should be:
>> If array contains no data points, this function returns a #N/A error.

Current:
> If significance is less than one, this function returns a #NUM! error.

Should be:
>> If significance is less than zero, this function returns a #NUM! error.

Current:
> If x exceeds the largest value or is less than the smallest value in array, this function returns a #NUM! error.

Should be:
>> If x exceeds the largest value or is less than the smallest value in array, this function returns a #N/A error.

Current:
> If x does not match any of the values in array or x matches more than once, this function interpolates the returned value.

Should be:
>> If x falls between two values in the array, the result is computed by linear interpolation as described below.

>> If there is a tie, i.e. if x matches more than one data point in the array, this function returns the lowest rank of any of the matched elements.

============

Should be added:
>> In the array, blank cells and string-valued cells are ignored and do not count toward $N$, the number of "data points".
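As a hedged illustration of this rule (not Gnumeric's code), $N$ can be computed after dropping blank and string cells. Here a cell range is modelled as a Python list with `None` standing in for a blank cell; the helper name `valid_points` is hypothetical, and skipping booleans is an assumption the text does not address.

```python
from numbers import Real

def valid_points(cells):
    """Keep only numeric cells; blanks (None) and strings are skipped.

    Assumption: booleans are skipped too; the proposal above does not
    say how they are treated.
    """
    return [c for c in cells if isinstance(c, Real) and not isinstance(c, bool)]

cells = [3, None, "n/a", 1.5, "", 7]
data = valid_points(cells)
print(data, len(data))  # [3, 1.5, 7] 3
```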

>> The result is always in the range [0, 1].
>> If x matches the smallest element of the array, the result is 0.
>> If x matches the largest element of the array and is not tied, the result is 1.
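One formula consistent with these three statements and with the tie rule is (number of data points strictly below x) / (N-1). The sketch below assumes that formula for exact matches only; `exact_rank` is a hypothetical name, not a Gnumeric function.

```python
def exact_rank(data, x):
    """Percent rank of an x that occurs in data (hypothetical helper).

    Computes (number of points strictly below x) / (N - 1). This gives the
    lowest rank among tied elements, 0 for the smallest element and 1 for
    an untied largest element.
    """
    n = len(data)
    if n < 2 or x not in data:
        raise ValueError("need N >= 2 and x must occur in data")
    below = sum(1 for v in data if v < x)
    return below / (n - 1)

data = [1, 2, 2, 2, 5]
print([exact_rank(data, v) for v in (1, 2, 5)])  # [0.0, 0.25, 1.0]
print(exact_rank([1, 5, 5], 5))  # 0.5 (tied largest element)
```

The last line shows why the "not tied" qualification matters: a largest element that is tied gets the lowest rank of its tie, which is below 1.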

>> Suppose some data point with value $V$ is part of an $M$-way tie,
 and suppose it corresponds to some percentrank result we denote $R_{low}$.
 Then if x is infinitesimally larger than $V$, the result is
 $R_{high} = R_{low} + (M-1)/(N-1)$,
 where $N$ is the number of valid data points in the array.

 This makes sense insofar as it is $1/(N-1)$ less than the result for
 the next-larger data point in the array, i.e. $R_{next} = R_{low} + M/(N-1)$.

 If x falls in the interior of the open interval between $V$ and the
 next-larger data point, the result is computed by linear interpolation,
 starting from $R_{high}$ and ending at $R_{next}$.
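The tie-and-interpolation rule proposed above can be sketched in Python as follows. This illustrates the proposal, not Gnumeric's implementation. The names `percentrank`, `r_low`, `r_high` and `r_next` are this sketch's own, raising `ValueError` instead of returning a spreadsheet error value is an assumption, and the significance (digit-truncation) argument is left out.

```python
import bisect

def percentrank(data, x):
    """Sketch of the proposed PERCENTRANK rule (no significance, no error values).

    data: numeric data points with blanks and strings already removed, N >= 2.
    x:    must lie within [min(data), max(data)].
    """
    s = sorted(data)
    n = len(s)
    if n < 2 or not (s[0] <= x <= s[-1]):
        raise ValueError("need N >= 2 and min(data) <= x <= max(data)")
    lo = bisect.bisect_left(s, x)        # number of points strictly below x
    if s[lo] == x:                       # exact match: lowest rank of the tie
        return lo / (n - 1)
    # Here s[lo - 1] < x < s[lo]: V is s[lo - 1], the next-larger point is s[lo].
    v, v_next = s[lo - 1], s[lo]
    m = lo - bisect.bisect_left(s, v)    # M: how many points equal V
    r_low = (lo - m) / (n - 1)           # rank of V itself
    r_high = r_low + (m - 1) / (n - 1)   # limit just above V
    r_next = r_low + m / (n - 1)         # rank of the next-larger point
    t = (x - v) / (v_next - v)           # fractional position inside (V, next)
    return r_high + t * (r_next - r_high)

print(percentrank([1, 2, 2, 2, 5], 3.5))  # 0.875
print(percentrank([10, 20, 30], 25))      # 0.75
```

Because r_next - r_high is always 1/(N-1), the interpolation spans exactly one rank step whatever the tie size M is.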

---------------------------
See also
  https://www.av8n.com/physics/spreadsheet-tips.htm#sec-percentrank

Comment 1 Morten Welinder 2016-01-28 15:15:21 UTC

For the #N/A vs #NUM! case we should do whatever Excel does.  Having a
difference would be pointless.  Obviously the code and the docs should
agree.

Comment 2 Morten Welinder 2016-01-28 15:19:32 UTC

"less than one" -> "less than zero" fixed.

Comment 3 Andreas J. Guelzow 2016-01-28 16:48:32 UTC

ODF/OpenFormula requires the significance to be at least 1, so "less than one" is the correct description for ODF.

Excel 2013 gives a #NUM error if the significance is 0 or 0.6.

The result that Gnumeric gives for significance 0 is always 0 except for the largest value in the data set. So that does not seem to be reasonable.

I think the "less than one" in the documentation used to be correct. The function should be fixed.
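The behaviour described for significance 0 is what one would expect if the result is truncated (rounded down) to `significance` decimal digits: with 0 digits, every rank below 1 becomes 0. The sketch below assumes that truncation rule; `truncate_digits` is a hypothetical helper, not Gnumeric's code.

```python
import math

def truncate_digits(r, digits):
    """Round r in [0, 1] down to the given number of decimal digits.

    Assumption: PERCENTRANK truncates rather than rounds. Naive float
    scaling like this can misbehave when r * 10**digits lands just
    below an integer, so a real implementation would need more care.
    """
    scale = 10 ** digits
    return math.floor(r * scale) / scale

ranks = [0.0, 0.25, 0.875, 1.0]
print([truncate_digits(r, 0) for r in ranks])  # [0.0, 0.0, 0.0, 1.0]
print(truncate_digits(0.875, 2))  # 0.87
```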

Comment 4 Morten Welinder 2016-01-28 17:09:58 UTC

https://support.office.com/en-us/article/PERCENTRANK-function-f1b5836c-9619-4847-9fc9-080ec9024442 says...

"If array is empty, PERCENTRANK returns the #NUM! error value."
"If significance < 1, PERCENTRANK returns the #NUM! error value."

Comment 5 Morten Welinder 2016-01-28 17:13:04 UTC

We are back to "less than one", but the code now matches that.

Comment 6 Morten Welinder 2016-01-28 17:22:46 UTC

Empty dataset now returns #NUM! per docs.  Out-of-range still gives #N/A.
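The error rules as described in comments 5 and 6 can be summarized in a short sketch. `percentrank_error` is a hypothetical helper; the order of the checks, and the default significance of 3 (taken from Excel's documentation), are assumptions rather than a description of Gnumeric's code.

```python
def percentrank_error(data, x, significance=3):
    """Return the error a PERCENTRANK call would give, or None if none.

    data: numeric data points, blanks and strings already removed.
    Empty data or significance < 1 -> "#NUM!"; x outside [min, max] -> "#N/A".
    Assumption: the checks run in this order; default significance is 3.
    """
    if not data:
        return "#NUM!"
    if significance < 1:
        return "#NUM!"
    if x < min(data) or x > max(data):
        return "#N/A"
    return None

print(percentrank_error([], 1))            # #NUM!
print(percentrank_error([1, 2, 3], 2, 0))  # #NUM!
print(percentrank_error([1, 2, 3], 9))     # #N/A
print(percentrank_error([1, 2, 3], 2))     # None
```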

Comment 7 John Denker 2016-01-28 17:45:44 UTC

I am equally happy with significance<0 or significance<1 ... the main point is that the code and the docs should agree.  Choosing on the basis of XL compatibility is fine with me.  I habitually set the significance to 20 anyway.

Similarly I am equally happy with #NUM! or #N/A ... the main point is that the code and the docs should agree.  Choosing on the basis of XL compatibility is fine with me.

Comment 8 John Denker 2016-01-28 17:48:55 UTC

(In reply to Morten Welinder from comment #6)
> Empty dataset now returns #NUM per docs.  Out-of-range still gives #NA.

We could make that more explicit, to cover the case where the array has nonzero size but contains blanks and/or strings.  I'm not sure the relevant definition of "empty" is obvious to the ordinary user.  I suggest something like:

>> If the array contains no valid numeric data, ......

Comment 9 Morten Welinder 2016-01-30 18:11:49 UTC

I have fixed the docs re out-of-range to say N/A.  That matches the code.

There is, IMHO, no need to elaborate on the fact that strings are ignored.
They are ignored for all numerical functions.

I think we're left with what to do for the interpolation case.

Comment 10 GNOME Infrastructure Team 2018-05-22 14:23:23 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug at our GitLab instance: https://gitlab.gnome.org/GNOME/gnumeric/issues/296.