GNOME Bugzilla – Bug 558582
Confidence Intervals in Kaplan Meier Tool
Last modified: 2018-05-22 13:30:22 UTC
The recent inclusion of the Kaplan Meier plugin in the Gnumeric 1.93 release is a big step forward for the routine use of Gnumeric in the medical profession, which is looking for an easy way to do survival analysis. Here are some suggestions to improve the analysis.
1. Include other numbers that can be used as censors in addition to 0 and 1.
2. Comparison of two Kaplan Meier curves. Example scenario: I want to analyse the difference in survival between two groups of patients, one treated with medicine A and one treated with medicine B, and I want to know whether there are any significant differences between the two. Using the log rank test we can test whether the survival of the groups differs and whether that difference is statistically significant. The Wikipedia article on the log rank test is here: http://en.wikipedia.org/wiki/Logrank_test
3. Another feature I would like to see is the mean and median survival with 95% confidence intervals. Here is an article that describes the KM test procedure in SPSS in detail: http://faculty.chass.ncsu.edu/garson/PA765/kaplanmeier.htm
I am attaching an example HTML output for a set of data analyzed in SPSS which shows a comparative analysis for 3 categories.
Created attachment 121684 [details] HTML output from SPSS HTML output from the SPSS program showing the important additions to the test and results
Created attachment 121685 [details] Hazards plot
Created attachment 121686 [details] KM curves of three categories superimposed
Thanks for the suggestions. I hope that when we implement your suggestions we can avoid the colour problems in the survival function you attached, where the 3-censored marks have the colour of the 2-curve!
Just a correction in the title since this is not a plugin.
Santam, regarding your item (1) of including other numbers to be used as censors: which numbers did you have in mind?
By "other numbers" I mean the following: in the present dialog box we have the option of using 0 or 1 as censored values. But sometimes a person may code the status (e.g. surviving or not) using values other than 0 or 1; say I use 2 for not surviving and 3 for surviving. SPSS also allows the use of a range of numbers, which is useful when you have outcomes that are not binary; for example, a symptom may be absent, mild, moderate or severe. So if you can use a range of numbers as a censoring variable, you can find out the actuarial duration of persistence of symptoms which were moderate to severe.
I have just committed changes that allow the censor marks to be a consecutive range of integers. So if absent, mild, moderate or severe is coded as 0,1,2,3 you could use the range 0 to 1 as censor marks and the remainder as deaths. This should handle request (1).
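As an illustration of the coding described above, here is a minimal Python sketch of how a consecutive range of censor marks splits status codes into censored observations and deaths. The function name `split_events` and the sample codes are hypothetical; this is not Gnumeric's implementation.

```python
def split_events(status_codes, censor_low, censor_high):
    """Return True for a death (event) and False for a censored observation.

    Codes inside the inclusive range [censor_low, censor_high] are treated
    as censor marks; every other code counts as a death.
    """
    return [not (censor_low <= code <= censor_high) for code in status_codes]


# Hypothetical coding: 0 = absent, 1 = mild, 2 = moderate, 3 = severe.
codes = [0, 1, 2, 3, 1, 3]
events = split_events(codes, 0, 1)
print(events)
```

With the range 0 to 1 as censor marks, only the moderate and severe codes (2 and 3) are counted as events.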
I have just committed changes that allow multiple groups to be handled simultaneously. This is part of request (2). From here the log-rank test should be straightforward.
Regarding comment (3): the given reference describes the SPSS output without explaining the meaning of those terms. So for implementation of the mean and median survival times this is quite useless.
I have committed an optional log-rank test to the Kaplan Meier tool. This completes request (2). This leaves output of the mean and of confidence intervals for both the mean and median.
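For readers unfamiliar with the log-rank test, the following is a rough Python sketch of the standard two-group version (observed minus expected deaths in one group, with the hypergeometric variance). It is not the code committed to Gnumeric, it handles only two groups, and the function name and sample data are made up for illustration.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    # Walk over every distinct time at which at least one death occurred.
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)  # at risk in A
        n_b = sum(1 for tt, _, g in data if g == 1 and tt >= t)  # at risk in B
        d_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        d_b = sum(1 for tt, e, g in data if g == 1 and tt == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: every patient in group A dies before anyone in group B.
chi2, p_value = logrank_two_groups(
    [1, 2, 3], [True, True, True],
    [4, 5, 6], [True, True, True],
)
print(chi2, p_value)
```

Two identical groups give a statistic of 0 and a p-value of 1, which is a quick sanity check on the arithmetic.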
Sorry for the delay in getting back. This pertains to the feature request for mean and median survival times. I am taking this from a statistics book, "Biostatistics: A Foundation for Analysis in the Health Sciences" by Wayne W. Daniel.
Median survival: the time at which the survival probability is equal to 0.5. If the output does not contain the value 0.5 itself, then the median survival is the first time at which the survival drops below 0.5. Example: if the survival probability is 0.61 at 13 months and 0.35 at 14 months, then the median survival is 14 months. N.B. If the survival does not drop below 0.5, then the median survival is not reached for the population.
Mean survival: this is calculated by finding the mean of the survival times in months. I am attaching a Gnumeric sheet which demonstrates this.
Created attachment 122778 [details] Demo of Mean and Median Survival Here is a demo Gnumeric sheet showing the median and the mean survival estimates calculated with the KM plugin
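The median rule described in the comment above can be sketched in a few lines of Python. The curve is assumed to be a list of (time, survival probability) pairs in increasing time order; the function name `median_survival` is hypothetical.

```python
def median_survival(curve):
    """Return the first time at which survival is at or below 0.5.

    `curve` is a list of (time, survival) pairs sorted by time.
    Returns None when the survival never reaches 0.5, i.e. the median
    is not reached.
    """
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# The example from the comment: 0.61 at 13 months, 0.35 at 14 months.
example_median = median_survival([(13, 0.61), (14, 0.35)])
not_reached = median_survival([(6, 0.9), (12, 0.8)])
print(example_median, not_reached)
```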
In addition, another enhancement that could be added is the hazard rate, which is simply the number of deaths divided by the total survival time. The total is obtained by adding up the survival times, as is done when calculating the mean.
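A small Python sketch of the quotient described above, using made-up data. The function name `hazard_rate` is hypothetical, and the sum of times includes censored observations, as in the mean calculation the comment refers to.

```python
def hazard_rate(times, events):
    """Number of deaths divided by the sum of all survival times."""
    deaths = sum(1 for e in events if e)
    total_time = sum(times)
    return deaths / total_time


# Made-up data: four patients, one of them censored at 10 months.
rate = hazard_rate([5, 10, 15, 20], [True, False, True, True])
print(rate)  # 3 deaths over 50 patient-months
```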
I found a webpage that gives the formula for the 95% confidence interval of the KM curve: http://www.hutchon.net/Kaplan-Meier.htm
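The thread does not quote the formula from that page. One widely used choice is Greenwood's formula for the standard error of the Kaplan Meier estimate, combined with a normal approximation. The Python sketch below assumes that choice, so it may not match the linked page exactly; the function name and the data are illustrative.

```python
import math


def km_with_greenwood_ci(times, events, z=1.96):
    """Kaplan Meier estimate with pointwise Greenwood confidence limits.

    Returns a list of (time, survival, lower, upper) at each death time.
    The limits are clipped to the interval [0, 1].
    """
    rows = []
    survival = 1.0
    greenwood_sum = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        survival *= 1 - deaths / at_risk
        if at_risk > deaths:
            greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            std_err = survival * math.sqrt(greenwood_sum)
        else:
            # Everyone at risk died: survival is 0 and the Greenwood
            # term is undefined, so report a zero-width interval.
            std_err = 0.0
        lower = max(0.0, survival - z * std_err)
        upper = min(1.0, survival + z * std_err)
        rows.append((t, survival, lower, upper))
    return rows


# Made-up data: four patients, all observed to die.
ci_rows = km_with_greenwood_ci([1, 2, 3, 4], [True, True, True, True])
for row in ci_rows:
    print(row)
```

With small samples the plain interval often runs past 0 or 1, which is why the sketch clips it; log-transformed intervals are a common alternative.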
Thanks for the information. The median is already being calculated. Most sites I have found differentiate between the mean of the survival times (which you mention above) and the mean survival time. Whenever they give examples, those two values differ.
In your example you are also including censored events in your mean survival time. This does not make sense, since those events haven't happened.
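To make the distinction concrete: one common definition of the mean survival time, not stated in this thread and so an assumption here, is the area under the Kaplan Meier curve up to the largest observed time. The Python sketch below compares that with the plain mean of all recorded times on made-up data; the function names are hypothetical.

```python
def km_curve(times, events):
    """Return [(time, survival)] at each distinct death time."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def area_under_km(times, events):
    """Area under the KM step function from 0 to the largest observed time."""
    area = 0.0
    prev_time, survival = 0.0, 1.0
    for t, s in km_curve(times, events):
        area += survival * (t - prev_time)
        prev_time, survival = t, s
    area += survival * (max(times) - prev_time)
    return area


# Made-up data: the patient observed for 4 months is censored.
times = [2, 4, 6, 8]
events = [True, False, True, True]
mean_of_times = sum(times) / len(times)
mean_survival_time = area_under_km(times, events)
print(mean_of_times, mean_survival_time)
```

Here the plain mean of the recorded times is 5.0 while the area under the curve is 5.75, so the two quantities differ as soon as a censored observation is present.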
Yes, that is true about the mean. The example of the mean I have given is from the stats book I mentioned. As far as its meaning is concerned, the mean is a very bad measure to use for survival, as it is influenced by the extremes. But sometimes, when the median is not reached, it is the only measure available.
I think I will have to dig into some journal articles. There has to be some better way. (And it may become useful that I am a mathematician.)
I have a couple of PDFs you may want to look into (books, rather).
Perhaps you can send the PDFs to me: aguelzow@pyrshep.ca
Sent. Please check spam if not found in your inbox.
Thanks, I have received them. I will have a look at them.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnumeric/issues/109.