Comparison of the fit of Cox Regression and Random Survival Forest model with the help of simulation data

Paksoy T., Yavuz İ.

12th International Statistics Days Conference (ISDC 2022), İzmir, Turkey, 13 - 16 October 2022, pp.1

  • Publication Type: Conference Paper / Summary Text
  • City: İzmir
  • Country: Turkey
  • Page Numbers: pp.1
  • Middle East Technical University Affiliated: Yes


Survival analysis, a widely used and important area of applied statistics, concerns the
analysis of the time until a certain event occurs. An important feature that distinguishes
it from other statistical analyses is the presence of censored observations. Censored data
are frequently observed in real data sets, and for this reason artificial data produced by
simulation for survival analysis generally involve censoring. Various simulation scenarios
have been tried in the literature to generate artificial data for survival analysis. The
most commonly used approaches assume either a Cox proportional hazards model [1] or
parametric distributions. When data are generated under parametric distributions, the most
important assumption of Cox regression is broken: the model leaves the baseline hazard
function unspecified, so its shape should not conform to a particular distribution.
Therefore, Kropko et al. proposed a method for generating survival data via simulation,
which they provide in the “coxed” package in R [4]. The sim.survdata function [3] in this
package uses cubic splines for the baseline hazard function and generates data in
accordance with the proportional hazards assumption. In this study, survival data were
obtained using the sim.survdata function in the “coxed” package. Different simulation
scenarios were employed, with varying sample sizes and censoring rates, and trials in which
the proportional hazards assumption was not satisfied were also included. The aim of the
study is to compare the concordance values obtained from various Random Survival Forest
(RSF) models [2] and Cox regression. It was observed that Cox regression fits better than
RSF when the proportional hazards assumption is met; however, the performance of RSF
improves as the sample size increases, and it appears to be more robust to higher levels
of censoring.

Keywords: sim.survdata, cox regression, random survival forest
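
The concordance values used as the comparison criterion in the abstract can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' code and does not use the “coxed” or RSF software. The function name harrell_c, the toy data, and the choice to skip pairs with tied times are assumptions made here for illustration only.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : observed times (event or censoring time)
    events : 1 if the event was observed, 0 if the observation is censored
    risks  : predicted risk scores; a higher score should mean a shorter survival time

    Assumption: pairs with tied observed times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot be the earlier member of a comparable pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier failure was given the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the second one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.8, 0.3, 0.5]
print(harrell_c(toy_times, toy_events, toy_risks))  # 3 of 4 comparable pairs agree -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Censoring reduces the number of comparable pairs, which is one reason the censoring rate is varied across simulation scenarios.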