12th International Statistics Days Conference (ISDC 2022), İzmir, Türkiye, 13-16 October 2022, p. 1
Survival analysis, a widely used and important subject in applied statistics, concerns the
analysis of the time until a certain event occurs. An important feature that distinguishes
it from other statistical analyses is the presence of censored observations. Censored
data are frequently observed in real data sets, and for this reason artificial data produced by
simulation for survival analysis generally involve censoring. Various simulation
scenarios have been tried in the literature to generate artificial data for survival analysis. The
most commonly used approaches assume either a Cox proportional hazards model [1] or
parametric distributions. When data are generated under a parametric distribution, a key
premise of Cox regression is violated: the baseline hazard function is left unspecified,
so its shape should not conform to a particular distribution.
Therefore, Kropko et al. proposed a simulation method for generating survival data, which
they provide in the “coxed” package in R [4]. The sim.survdata function [3] in this package
uses cubic splines for the baseline hazard function and generates data in accordance
with the proportional hazards assumption. In this study, survival data were generated with
the sim.survdata function in the “coxed” package under several simulation scenarios that
varied the sample size and the censoring rate; scenarios in which the proportional hazards
assumption does not hold were also included. The aim of the study is to compare the
concordance values obtained by using
various Random Survival Forest (RSF) models [2] and Cox regression. It was observed that
Cox regression achieves a better fit than RSF when the proportional hazards assumption is
met; however, the performance of RSF improves as the sample size increases, and RSF appears
to be more robust to higher levels of censoring.
Keywords: sim.survdata, Cox regression, random survival forest
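
The concordance values compared in the abstract can be illustrated with a minimal sketch of Harrell's concordance index on right-censored data. This is not the code used in the study. The function names `simulate_censored` and `harrell_c` are hypothetical, and the toy generator assumes an exponential baseline hazard purely for brevity, which is exactly the kind of parametric shortcut that sim.survdata avoids with its spline-based baseline. Pairs with tied observed times are skipped, one common simplification.

```python
import math
import random


def simulate_censored(n, beta, censor_rate, seed=0):
    """Toy right-censored data with one covariate (hypothetical helper).

    ASSUMPTION: exponential baseline hazard and independent exponential
    censoring. This is NOT the spline-based approach of sim.survdata.
    """
    rng = random.Random(seed)
    xs, times, events = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Proportional hazards: the event rate is scaled by exp(beta * x).
        event_time = rng.expovariate(math.exp(beta * x))
        censor_time = rng.expovariate(censor_rate)
        xs.append(x)
        times.append(min(event_time, censor_time))
        events.append(event_time <= censor_time)  # True = event observed
    return xs, times, events


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter observed time
    had an observed event. It is concordant when that subject also has
    the higher risk score; ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    xs, times, events = simulate_censored(n=200, beta=1.0, censor_rate=0.5)
    # The covariate itself serves as the risk score in this toy setting.
    print(round(harrell_c(times, events, xs), 3))
```

A value near 0.5 indicates ranking no better than chance and a value near 1 indicates near-perfect ranking of event times. Raising `censor_rate` reduces the number of comparable pairs, which is one reason concordance estimates become less stable at higher censoring levels.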