Multivariate longitudinal binary data models and their forecasting applications


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Arts and Sciences, Department of Statistics, Turkey

Thesis Approval Date: 2012

Thesis Language: English

Student: Özgür Asar

Supervisor: Özlem İlk Dağ

Abstract:

Longitudinal data arise when subjects are followed over time. Because such data contain repeated observations on the same subjects, they are typically dependent; this type of dependence is termed within-subject dependence. Often the scientific interest is in multiple longitudinal responses, which introduce two additional types of association: between-response dependence and cross-response temporal dependence. Only statistical methods that take these association structures into account can yield reliable and valid inferences. Although methods for univariate longitudinal data have been studied extensively, methods for multivariate longitudinal data still need further development. In this thesis, we focus mainly on models for multivariate longitudinal binary data, but we also consider other response families where necessary.

We extend earlier work on multivariate marginal models, namely multivariate marginal models with response-specific parameters (MMM1), and propose multivariate marginal models with shared regression parameters (MMM2). Both models are based on generalized estimating equations (GEE) and are valid for several response families, such as Binomial, Gaussian, Poisson, and Gamma. Two R packages, mmm and mmm2, are proposed to fit them, respectively. We further develop a marginalized multilevel model for multivariate longitudinal binary responses, namely probit normal marginalized transition random effects models (PNMTREM). With this model, the implicit function theorem is used, for the first time, to explicitly link the levels of marginalized multilevel models with transition structures. An R package, pnmtrem, is proposed to fit the model, and PNMTREM is applied to data collected through the Iowa Youth and Families Project (IYFP).

Five different models, including univariate and multivariate ones, are considered for forecasting multivariate longitudinal binary data, and a comparative simulation study, which includes a model-independent data simulation process, is conducted for this purpose. Forecasting of the independent variables is also taken into account. To assess the forecasts, several accuracy measures are used, such as the expected proportion of correct prediction (ePCP), the area under the receiver operating characteristic curve (AUROC), and the mean absolute scaled error (MASE). The Mother's Stress and Children's Morbidity (MSCM) data are used to illustrate this comparison on a real data set. The results show that marginalized models yield better forecasts than marginal models, and the simulation results agree with this finding.
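The abstract names ePCP, AUROC, and MASE as forecast accuracy measures without stating their formulas. The sketch below computes them using their standard textbook definitions: ePCP averages p_i over observed 1s and 1 - p_i over observed 0s (Herron, 1999); AUROC is computed as the Mann-Whitney probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case; MASE divides the forecast mean absolute error by the in-sample mean absolute error of the one-step naive forecast (Hyndman and Koehler, 2006). This is only an illustration under those assumptions. The function names are hypothetical, they are not taken from the mmm, mmm2, or pnmtrem packages, and the thesis may adapt these measures (for example, the MASE scaling for longitudinal binary panels) in ways the abstract does not describe.

```python
from typing import Sequence


def epcp(y: Sequence[int], p: Sequence[float]) -> float:
    """Expected proportion of correct prediction (hypothetical helper).

    Each observed 1 contributes its predicted probability p_i and each
    observed 0 contributes 1 - p_i; the result is the average contribution.
    """
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    total = sum(pi if yi == 1 else 1.0 - pi for yi, pi in zip(y, p))
    return total / len(y)


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Over all (positive, negative) pairs, counts how often the positive case
    has the higher predicted probability; ties count as one half.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mase(train: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float]) -> float:
    """Mean absolute scaled error with one-step naive in-sample scaling.

    The forecast MAE is divided by the mean of |y_t - y_{t-1}| over the
    training series (the naive "last value" forecast error).
    """
    if len(train) < 2 or len(actual) != len(forecast) or not actual:
        raise ValueError("invalid input lengths")
    scale = sum(abs(train[t] - train[t - 1])
                for t in range(1, len(train))) / (len(train) - 1)
    if scale == 0:
        raise ValueError("naive in-sample errors are all zero; MASE undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    # Toy binary outcomes and predicted probabilities (made-up illustration).
    y = [1, 0, 1, 1, 0]
    p = [0.9, 0.2, 0.6, 0.25, 0.3]
    print(round(epcp(y, p), 4))   # 0.65
    print(round(auroc(y, p), 4))  # 0.8333
    # Toy numeric series: scale = (2 + 1 + 2) / 3, forecast MAE = 1.
    print(round(mase([3.0, 5.0, 4.0, 6.0], [7.0, 5.0], [6.0, 6.0]), 4))  # 0.6
```

The pairwise AUROC loop is quadratic in the number of cases but keeps the Mann-Whitney definition explicit; a rank-based formula gives the same value more efficiently for large samples.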