Bug 500168 - feature request: probability plot
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: Charting
Version: 1.7.x
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Jean Bréfort
Jody Goldberg
Depends on:
Blocks:
Reported: 2007-11-28 11:03 UTC by Samuel Verstraete
Modified: 2008-09-12 20:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
proposed API for probability distributions in goffice (2.69 KB, text/x-chdr)
2007-12-07 16:15 UTC, Jean Bréfort
the correct version. (2.75 KB, text/x-chdr)
2007-12-07 16:20 UTC, Jean Bréfort
implements normal probability plots (73.85 KB, patch)
2007-12-23 17:18 UTC, Jean Bréfort
adds support for normal, lognormal, weibull, cauchy and uniform distributions (120.57 KB, patch)
2008-09-10 14:26 UTC, Jean Bréfort

Description Samuel Verstraete 2007-11-28 11:03:57 UTC
If possible, please add the ability to create a probability plot:
http://www.itl.nist.gov/div898/handbook/eda/section3/normprpl.htm
Comment 1 Samuel Verstraete 2007-11-28 11:05:04 UTC
This is a fairly basic statistical tool that can help judge whether a distribution is normal or not (only as an indication).
Comment 2 Andreas J. Guelzow 2007-11-28 16:43:30 UTC
This really shouldn't be that difficult to create already with current charting possibilities. 
Comment 3 samverstraete 2007-11-30 12:37:11 UTC
It would be easy to do if the Y axis could be put on a "normal probability scale" (as opposed to a log or arithmetic scale). AFAIK Gnumeric can't do that, so I think it's currently not possible...

Basically, a probability plot is a cumulative histogram with the Y axis on a "normal probability scale".
gr,S.
Comment 4 Jean Bréfort 2007-11-30 13:55:07 UTC
Which data do you want to provide to the graph: raw data or the frequencies (as for the current histogram plot)?
Comment 5 Andreas J. Guelzow 2007-11-30 16:22:50 UTC
jean: a normal probability plot has to use the raw data, you can't create it from frequencies.

sam: what do you mean by "normal probability scale"? The x-axis should be a simple linear scale. (Of course, some transformation is applied to the data before it is plotted.)

Comment 6 Jean Bréfort 2007-11-30 16:41:52 UTC
Thanks Andreas, but I don't understand why it is impossible to use frequencies. I probably missed something in the way such plots are produced. I didn't find anything in my statistics books, unfortunately, and I'm far from an expert in the field.
Comment 7 Jean Bréfort 2007-11-30 16:52:29 UTC
Ah, more googling made me understand. No problem, should be very easy to implement.
Comment 8 Andreas J. Guelzow 2007-11-30 17:59:00 UTC
Jean: If you have any questions regarding the plot please let me know. (We do teach how to construct normal probability plots in our first year stats course, so it is really very simple.)

You probably figured out that you have a dot for each data point so you need the raw data.

It would be nice to be able to plot not only normal probability plots but also such probability plots for other distributions.  
Comment 9 Jean Bréfort 2007-11-30 19:42:29 UTC
Of course we need other distributions. I'll implement GODist or so to support that. I'll probably have questions but I'm able to implement at least the normal distribution.
Comment 10 Jean Bréfort 2007-12-07 16:15:57 UTC
Created attachment 100537 [details]
proposed API for probability distributions in goffice

Some comments:
- The functions do not use the location and scale parameters, so the necessary transformations must be done on the input.
- The list of distributions supported will be much larger.
- The shape properties will be passed as object properties; a list of the appropriate property names will appear in the documentation (maybe this is not appropriate if we want to be able to use long double for properties).
- GODistribution will be an abstract base class.

Andreas, any comment?
Comment 11 Jean Bréfort 2007-12-07 16:20:07 UTC
Created attachment 100538 [details]
the correct version.
Comment 12 Andreas J. Guelzow 2007-12-09 19:40:37 UTC
Without more details, it is difficult to comment.

Obviously there will need to be some way to specify parameters for the specific distributions. There is no easy way of adjusting densities for a changed number of degrees of freedom after the fact.

The naming scheme seems to assume continuous distributions. Is it supposed to also handle discrete distributions? (For discrete distributions there are no densities but mass functions.)
Comment 13 Jean Bréfort 2007-12-10 07:07:47 UTC
Hmm, I don't even know which methods should be implemented for discrete distributions. Maybe we need a separate structure for them.
Comment 14 Jean Bréfort 2007-12-23 17:18:42 UTC
Created attachment 101511 [details] [review]
implements normal probability plots

This is not the final version. In any case it can't be committed before goffice is branched. Things still to do are (I might forget, or just ignore, some):
- finish editing tools/import-R for new distributions, and remove all Gnumeric-specific code;
- add new distributions;
- support discrete distributions;
- add a user interface to change the distribution type (and parameters when relevant).

Waiting for feedback.
Comment 15 Morten Welinder 2007-12-24 20:33:57 UTC
I am not comfortable with just throwing mathfunc.c into goffice.

We seem to need simplified versions of dnorm [trivial], pnorm [not hard],
and qnorm [not hard].

> (For discrete distributions there are no densities but mass functions.)

Just a matter of what measure we are implying.
Comment 16 Jean Bréfort 2007-12-27 20:08:40 UTC
dnorm, pnorm and qnorm are not the only ones needed. The idea is to support as many probability distributions as possible. The easiest way for me is to import the code from the R project rather than rewrite everything. If some code in gnumeric is useful for graphs in goffice, why shouldn't it be copied?
Comment 17 Jean Bréfort 2008-09-10 14:26:38 UTC
Created attachment 118432 [details] [review]
adds support for normal, lognormal, weibull, cauchy and uniform distributions

Seems I just need to add more distribution types (ideally all those supported in gnumeric). Waiting for comments, but I might commit this patch if nobody argues against it.
Comment 18 Andreas J. Guelzow 2008-09-11 21:43:21 UTC
It's not as simple as it should be to try out this patch. There seem to be a few images missing!
Comment 19 Andreas J. Guelzow 2008-09-11 22:03:28 UTC
This looks good to me, but I suggest you include chart_prob_1_1.png and probability.xpm when committing.

Thanks!
Comment 20 Jean Bréfort 2008-09-12 05:03:34 UTC
I'll include the images, but probably group all distribution related plots (box-plots, histograms and probability plots for now) in one family.
Comment 21 Jean Bréfort 2008-09-12 20:41:30 UTC
Patch committed. I consider this bug fixed, although only a few distributions are supported at the moment.