GNOME Bugzilla – Bug 647247
SSMEDIAN documentation is insufficient
Last modified: 2011-04-09 17:17:32 UTC
The documentation for the SSMEDIAN function is insufficient to explain how it differs from MEDIAN. The manual for 1.10 currently has the description:

"The data points given in array are assumed to be the result of grouping data into intervals of length interval"

My suggestion is:

The data points given in array are treated as the midpoints of grouped data, each group being of width interval. E.g. with interval=1, a data point of 3 represents a value between 2.5 and 3.5. The median is then calculated by interpolation into the middle group, assuming that the true values within that group are distributed uniformly. The result is calculated by the equivalent to:

median = L + i*(n/2 - CF)/f

where:
L = the lower limit of the median group
i = interval (the width of the group)
n = the total number of data points
CF = the number of data points that occur below the median group
f = the frequency of the median group
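As an illustration of the proposed formula, here is a worked example on a hypothetical data set that does not come from the report: the points 2, 3, 3, 4, 4 with interval = 1. The ordinary median is 3, so the median group is the one with midpoint 3, covering 2.5 to 3.5. One point (the 2) lies below that group and two points fall inside it.

```latex
\begin{align*}
n &= 5, \quad i = 1, \quad L = 2.5, \quad CF = 1, \quad f = 2 \\
\text{median} &= L + i \cdot \frac{n/2 - CF}{f}
  = 2.5 + 1 \cdot \frac{2.5 - 1}{2} = 3.25
\end{align*}
```

Under these assumptions the grouped median is 3.25, whereas MEDIAN on the same points would give 3.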
The description of each function is supposed to be brief. I don't think your suggestion qualifies. If we expand the description of this function to this extent, should we extend the descriptions of all other functions similarly? The suggestion also does not match the names used for the arguments. I am not sure what "The result is calculated by the equivalent to" is supposed to mean.
I have committed:

    static GnmFuncHelp const help_ssmedian[] = {
        { GNM_FUNC_HELP_NAME, F_("SSMEDIAN:median for grouped data")},
        { GNM_FUNC_HELP_ARG, F_("array:data set")},
        { GNM_FUNC_HELP_ARG, F_("interval:length of each grouping interval, defaults to 1")},
        { GNM_FUNC_HELP_DESCRIPTION, F_("The data are assumed to be grouped into intervals of width @{interval}. "
                                        "Each data point in @{array} is the midpoint of the interval containing the true value. "
                                        "The median is calculated by interpolation within the median interval "
                                        "(the interval containing the median value), "
                                        "assuming that the true values within that interval are distributed uniformly:\n"
                                        "median = L + @{interval}*(N/2 - CF)/F\n"
                                        "where:\n"
                                        "L = the lower limit of the median interval\n"
                                        "N = the total number of data points\n"
                                        "CF = the number of data points below the median interval\n"
                                        "F = the number of data points in the median interval") },
        { GNM_FUNC_HELP_NOTE, F_("If @{array} is empty, this function returns a #NUM! error.") },
        { GNM_FUNC_HELP_NOTE, F_("If @{interval} <= 0, this function returns a #NUM! error. "
                                 "SSMEDIAN does not check whether the data points are "
                                 "at least @{interval} apart.") },
        { GNM_FUNC_HELP_EXAMPLES, "=SSMEDIAN(ARRAY(7,7,8,9), 1)" },
        { GNM_FUNC_HELP_EXAMPLES, "=SSMEDIAN(ARRAY(7,7,8,8,9), 1)" },
        { GNM_FUNC_HELP_EXAMPLES, "=SSMEDIAN(ARRAY(7,7,8,8,8,9), 1)" },
        { GNM_FUNC_HELP_SEEALSO, "MEDIAN"},
        { GNM_FUNC_HELP_END }
    };

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.
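For readers who want to check the formula in the committed help text by hand, the following is a minimal sketch of it in C. It is not Gnumeric's implementation. The function name grouped_median is hypothetical, NAN stands in for the #NUM! error, and two assumptions are made that the help text does not spell out: data points in the same interval are exactly equal, and the median interval is the first one whose cumulative count reaches N/2.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper, not part of Gnumeric.
 * Computes  median = L + interval*(N/2 - CF)/F  as in the help text.
 * Returns NAN (standing in for #NUM!) if the data set is empty,
 * if interval <= 0, or if memory allocation fails.
 */
double grouped_median(const double *data, size_t n, double interval)
{
    if (n == 0 || interval <= 0)
        return NAN;

    /* Work on a sorted copy so the caller's array is left untouched. */
    double *s = malloc(n * sizeof *s);
    if (s == NULL)
        return NAN;
    memcpy(s, data, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);

    double half = (double)n / 2.0;   /* N/2 */
    double result = NAN;
    size_t cf = 0;                   /* CF: points below the current interval */

    for (size_t i = 0; i < n; ) {
        /* Assumption: equal values share an interval; count them as F. */
        size_t j = i;
        while (j < n && s[j] == s[i])
            j++;
        size_t f = j - i;

        /* Assumption: the median interval is the first one whose
           cumulative count reaches N/2. */
        if ((double)(cf + f) >= half) {
            double lower = s[i] - interval / 2.0;   /* L */
            result = lower + interval * (half - (double)cf) / (double)f;
            break;
        }
        cf += f;
        i = j;
    }

    free(s);
    return result;
}
```

Applied to the three example data sets in the help text with interval 1, this sketch gives 7.5, 7.75 and approximately 7.8333. These values follow from the formula as stated and have not been checked against Gnumeric's actual output.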