
An Introduction to Survival Analysis

This post was updated on 2020-10-10. If you find a problem or have a suggestion, feel free to open an Issue.

Related notes are available on github.

The Kaplan-Meier estimator, independently described by Edward Kaplan and Paul Meier and conjointly published in 1958 in the Journal of the American Statistical Association, is a non-parametric statistic that allows us to estimate the survival function. (Schuette 2018)

Remember that a non-parametric statistic is not based on the assumption of an underlying probability distribution, which makes sense since survival data has a skewed distribution. (Schuette 2018)

The Kaplan-Meier estimator is non-parametric, so it is mainly used for descriptive statistics; see github for details.

Schuette (2018) provides a related tutorial, but I find the github notes friendlier for beginners, since they present survival models more intuitively.

  • Practice data come from Edmonson et al. (1979); in R they can be loaded directly as survival::ovarian.

This model is analogous to the funnel and conversion-rate models used in internet product operations.

\[\begin{aligned} S(t)&=p_{1} \cdot p_{2} \cdots p_{t}\\ &=\prod_{i=1}^t p_i \end{aligned}\]

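  • \(S(t)\) measures the cumulative survival (or conversion) rate of the whole sample up to time \(t\).
  • \(p_t\) measures the survival (or conversion) rate at time \(t\) among the subjects still at risk at that time, i.e. those who survived through \(t-1\).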
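To make the product formula concrete in funnel terms, here is a minimal Python sketch (the function name `cumulative_survival` and the step rates are hypothetical, chosen only for illustration). Each \(p_i\) is the fraction of users who pass step \(i\) given that they reached it, and the running product gives \(S(t)\) after every step.

```python
from itertools import accumulate
import operator


def cumulative_survival(step_rates):
    """Return [S(1), ..., S(t)] where S(t) = p_1 * p_2 * ... * p_t.

    step_rates: conditional pass rates p_i, each in [0, 1].
    """
    # accumulate with multiplication yields the running product
    return list(accumulate(step_rates, operator.mul))


# Hypothetical funnel: 50% pass step 1, 40% of those pass step 2, 50% of those pass step 3
rates = [0.5, 0.4, 0.5]
print(cumulative_survival(rates))
```

Here only 10% of the original users make it through all three steps, even though no single step loses more than 60%: the overall rate is driven by the product, not by any one step.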

These quantities satisfy the recurrence

\[S(t) = S(t-1) \cdot p_{t}\]
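The recurrence is also how the Kaplan-Meier estimate is computed from raw data. At each distinct event time \(t\), \(p_t\) is estimated as \(1 - d_t/n_t\), where \(d_t\) is the number of events at \(t\) and \(n_t\) is the number of subjects still at risk just before \(t\); censored subjects count as at risk up to their censoring time and then leave the risk set. At times with no event \(p_t = 1\), so \(S(t)\) only changes at event times. The minimal Python sketch below assumes that convention; the function name `kaplan_meier` and the toy data are hypothetical and not taken from the ovarian dataset. In R, `survival::survfit` produces this estimate directly.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return a list of (t, S(t)) at each distinct event time.

    durations: follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    s = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            p_t = 1 - d / at_risk  # conditional survival at time t
            s *= p_t               # S(t) = S(t-1) * p_t
            curve.append((t, s))
        at_risk -= exits[t]        # remove events and censored subjects after t
    return curve


# Hypothetical toy data: 0 in `events` marks a censored observation
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```

Note that the subject censored at \(t = 2\) still counts in \(n_2\) but not in \(n_3\), which is the usual convention when an event and a censoring occur at the same time.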

Edmonson, J. H., T. R. Fleming, D. G. Decker, G. D. Malkasian, E. O. Jorgensen, J. A. Jefferies, M. J. Webb, and L. K. Kvols. 1979. “Different Chemotherapeutic Sensitivities and Host Factors Affecting Prognosis in Advanced Ovarian Carcinoma Versus Minimal Residual Disease.” Cancer Treatment Reports 63 (2): 241.

Schuette, Daniel. 2018. “Survival Analysis in R for Beginners.” 2018. https://www.datacamp.com/community/tutorials/survival-analysis-R.