Bug 134166 - statistical analysis frequency table desired
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: Documentation
Version: unspecified
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Andreas J. Guelzow
QA Contact: Jody Goldberg
Depends on:
Blocks:
 
 
Reported: 2004-02-12 00:32 UTC by garetholoughlin
Modified: 2008-10-12 23:49 UTC
See Also:
GNOME target: ---
GNOME version: 2.3/2.4



Description garetholoughlin 2004-02-12 00:32:07 UTC
Hi, I'm really happy with Gnumeric so far.

I took my first crack at using the statistics features today and started
with the histogram.

A couple of observations; I'm not sure how stable this stuff is supposed to be.

#1) Manual setting of the bins doesn't seem to work, or at least it is not
obvious how to set it up.

I have a simple column of nine data points and can get it to work fine with
calculated bins:

	A
1	Result
2	1
3	2
4	3
5	1
6	1
7	2
8	2
9	2
10	2

I want to get a histogram of three bins:
I set the Input range to A2:A10
I set the bins to: Calculated, min=1, max=3, N=3
The output sheet is as follows:
Bin	Frequency
<1	0
1.66666666666667	3
2.33333333333333	5
3	1
>3	0

The frequency numbers are correct and the bins are logical:
1 to 1.666
1.666 to 2.333
2.333 to 3
With outliers also shown.
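The calculated-bin behaviour above can be sketched in a few lines of Python. This is a minimal illustration, not Gnumeric's code: `calculated_histogram` is a hypothetical helper. The interval closure it uses (first bin includes the minimum, later bins closed on the right, separate underflow and overflow counts) is an assumption inferred from the output table in this report.

```python
from bisect import bisect_left


def calculated_histogram(data, lo, hi, n):
    """Equal-width bins from lo to hi plus underflow/overflow (hypothetical helper).

    Assumed closure: bin 1 is [lo, e1]; bins 2..n are (e_{k-1}, e_k].
    Returns (edges, counts) where counts = [under, bin 1, ..., bin n, over].
    """
    width = (hi - lo) / n
    edges = [lo + k * width for k in range(n)] + [hi]  # pin the last edge to hi exactly
    counts = [0] * (n + 2)
    for x in data:
        if x < lo:
            counts[0] += 1  # underflow ("<min")
        elif x > hi:
            counts[-1] += 1  # overflow (">max")
        else:
            # index of the first edge >= x; a value equal to lo still belongs to bin 1
            counts[max(bisect_left(edges, x), 1)] += 1
    return edges, counts


data = [1, 2, 3, 1, 1, 2, 2, 2, 2]  # column A2:A10 from the report
edges, counts = calculated_histogram(data, lo=1, hi=3, n=3)
print(counts)  # [0, 3, 5, 1, 0]
```

Pinning the last edge to `hi` avoids a floating-point edge landing just below the maximum and pushing a value equal to the maximum into the overflow bin.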

** So that works but if I use pre-assigned bins it doesn't.

I add the following column
	B
1	Bins
2	<1
3	1.666
4	2.333
5	3
6	>3

These are the same bins the calculated attempt used.
Then I go back, use the same input data, and select pre-assigned bins
with range B2:B6.

However, the output sheet is empty.  Nothing.  I tried B3:B5 too, in case the
< or > were causing problems, but got the same thing: no output.

How should this be specified to get the desired output?


#2) What I'd really like to do is create a histogram for text entries.  Is
this possible as well?  For example, take a data set like:
Bill
Bill
Bob
Bob
Chris
Paul

Can I set up bins a) on explicit names, or b) using regular expressions like
B*?

thx!
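For request #2, a frequency table over text entries could look roughly like the following. It is a sketch only: `text_frequency` is a hypothetical helper and not the tool Gnumeric later added. Because `B*` in the question is shell-style wildcard syntax rather than a regular expression, the sketch uses Python's `fnmatch.fnmatchcase`. A true regex version would test `re.match(r"B", value)` instead.

```python
from collections import Counter
from fnmatch import fnmatchcase


def text_frequency(values, patterns=None, other="(other)"):
    """Frequency table for text entries (hypothetical helper).

    Without patterns, each distinct string is its own bin (request a).
    With shell-style patterns such as "B*", each value is counted under the
    first pattern it matches; anything unmatched goes to `other` (request b).
    """
    if patterns is None:
        # Counter keeps first-seen order, so bins appear in data order
        return dict(Counter(values))
    table = {p: 0 for p in patterns}
    table[other] = 0
    for value in values:
        for p in patterns:
            if fnmatchcase(value, p):  # case-sensitive wildcard match
                table[p] += 1
                break
        else:
            # no pattern matched this value
            table[other] += 1
    return table


names = ["Bill", "Bill", "Bob", "Bob", "Chris", "Paul"]
print(text_frequency(names))                # {'Bill': 2, 'Bob': 2, 'Chris': 1, 'Paul': 1}
print(text_frequency(names, ["B*", "C*"]))  # {'B*': 4, 'C*': 1, '(other)': 1}
```

Counting each value under only its first matching pattern keeps the bins disjoint, so the totals still add up to the number of entries.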
Comment 1 Andreas J. Guelzow 2004-02-12 02:09:05 UTC
#2: Unfortunately, you can't do that yet. The values have to be
numerical, and the preassigned bins are given by the cutoffs.

#1: to get a table like the calculated bins your bin values should be:
1
1.66666
2.33333
3

This will yield 5 intervals and should work.

Which exact version of gnumeric are you using?
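The cutoff rule described in this comment (each cutoff is the inclusive upper bound of its bin, plus one final bin for anything larger) can be sketched the same way. `cutoff_histogram` is a hypothetical name, and the right-closed intervals are an assumption consistent with the counts reported in this thread. With four cutoffs it produces five counts, and on the sample data it gives the 3, 0, 5, 1, 0 distribution the reporter describes next.

```python
from bisect import bisect_left


def cutoff_histogram(data, cutoffs):
    """Counts for x <= c1, c1 < x <= c2, ..., x > c_last (hypothetical helper).

    Every cutoff is treated the same way: it is the inclusive upper bound of
    its bin, and one extra "More" bin collects values above the last cutoff.
    """
    cuts = sorted(cutoffs)
    counts = [0] * (len(cuts) + 1)
    for x in data:
        # index of the first cutoff >= x; len(cuts) means the "More" bin
        counts[bisect_left(cuts, x)] += 1
    return counts


data = [1, 2, 3, 1, 1, 2, 2, 2, 2]
print(cutoff_histogram(data, [1, 1.66666, 2.33333, 3]))  # [3, 0, 5, 1, 0]
```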
Comment 2 garetholoughlin 2004-02-13 01:45:45 UTC
Yes, that does yield output, and this is good enough for me to set up
bins and get the answers I want.  However, the results differ from the
calculated-bin run.  Note that the first bin is <=1 rather than <1, so the
distribution is 3,0,5,1,0, not 0,3,5,1,0.  I don't really care as long as
I can predict what it'll do, but you probably do want it to be
consistent.  On the upper bound I don't care if it says More or >3;
it's all the same to me.

Bin	Frequency
1	3
1.6666	0
2.3333	5
3	1
More	0

So case closed.

Pity about #2 not being there - that would be very nice.

thx!
Comment 3 Andreas J. Guelzow 2004-02-13 19:45:20 UTC
I guess we should check the documentation and make sure this is
documented correctly.

There is a point to the slightly different behaviour: the calculated
bins assume a finite interval and have two overflow/underflow bins,
while the predetermined bins act as cutoffs with the same behaviour
for each.

Ideally of course this can be specified by the user.
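The two rules contrasted in this comment can be written out explicitly. This is a sketch inferred from the outputs in this report rather than from Gnumeric's documentation. The symbols $a$, $b$, $N$, $e_k$, $U$, $B_k$, $O$, $c_k$ and $F_k$ are notation introduced here, and $\#\{\cdot\}$ counts the data values satisfying the condition.

```latex
\begin{aligned}
&\text{Calculated bins } (a = \min,\ b = \max,\ N \text{ bins}):\quad e_k = a + k\,\tfrac{b-a}{N},\ k = 0,\dots,N \\
&\quad U = \#\{x < a\},\quad B_1 = \#\{a \le x \le e_1\},\quad B_k = \#\{e_{k-1} < x \le e_k\}\ (2 \le k \le N),\quad O = \#\{x > b\} \\
&\text{Cutoffs } c_1 < \dots < c_m:\quad F_1 = \#\{x \le c_1\},\quad F_k = \#\{c_{k-1} < x \le c_k\}\ (2 \le k \le m),\quad F_{m+1} = \#\{x > c_m\}
\end{aligned}
```

With $a = c_1 = 1$, the value $x = 1$ lands in $B_1$ under the first rule but in $F_1$ under the second. That is why the first two counts swap places between the two runs in this report (0,3,5,1,0 versus 3,0,5,1,0).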
Comment 4 Andreas J. Guelzow 2008-09-16 05:55:35 UTC
I have just made some changes to the histogram tool. I hope the cutoffs/bins make more sense now.

I am leaving this report open with a new subject line to remind us of the need to support #2.
Comment 5 Andreas J. Guelzow 2008-10-12 23:49:45 UTC
I have written a new tool to handle #2. Adding it to the histogram tool would have created a rather complicated-looking dialog with many items not applicable to the current situation.

This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.