Irregular longitudinal data analysis with statistical and machine learning methods for hazardous asteroids

Tanrıverdi, İrem; İlk Dağ, Özlem; Gürkan, Mehmet

doi:10.1016/j.ascom.2024.100818

Tanrıverdi İ., İlk Dağ Ö., Gürkan M. A.

Astronomy and Computing, vol. 47, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 47
Publication Date: 2024
DOI: 10.1016/j.ascom.2024.100818
Journal: Astronomy and Computing
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
Keywords: Asteroids, Generalized linear mixed model trees, GPBoost, Historical random forest, Marginal model with inverse intensity weighting, Mixed-effect models
Middle East Technical University Affiliated: Yes

Abstract

Asteroids have been observed for as long as the available observational equipment has made it feasible. Recorded data, going back to the 18th century, have allowed these celestial objects to be classified by hazard status. However, previous studies used methods that ignore subject dependency in Near-Earth Asteroid (NEA) data, that is, the correlation among repeated observations of the same asteroid. This study aims to overcome this shortcoming by proposing various statistical and machine learning methods for the hazard classification of asteroids in NEA data. We analyze NASA data on 751 asteroids observed at irregular time intervals. We compare algorithms suited to longitudinal data structures, such as Generalized Linear Mixed Models (GLMM), a marginal model, GLMM-Tree, Historical Random Forest, GPBoost, and Spline models. To the best of our knowledge, and based on a comprehensive review of the existing literature, this study is the first to apply these methods to the in-depth analysis of NEA data. Model accuracies range from 0.89 to 0.99; GPBoost performs best and the marginal model performs worst.
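To make "subject dependency" concrete, here is a minimal sketch of the simplest GLMM for a binary hazard label: a random-intercept logistic model in which every observation of asteroid g shares one intercept shift b_g ~ N(0, sigma2). It is not the authors' code. The data are synthetic (one hypothetical standardized covariate `x`, an unbalanced number of observations per asteroid to mimic irregular sampling), and the helper names (`simulate`, `fit_random_intercept_logit`, `accuracy`) are illustrative. For simplicity, `sigma2` is held fixed and the model is fitted by penalized (MAP) gradient ascent. A full GLMM fit, for example lme4's `glmer` with a Laplace or adaptive Gauss-Hermite approximation, would also estimate the random-effect variance.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n_asteroids=30, seed=0):
    # Hypothetical synthetic data, NOT the NASA NEA data used in the paper.
    rng = random.Random(seed)
    x, y, group = [], [], []
    for a in range(n_asteroids):
        b_true = rng.gauss(0.0, 1.0)      # asteroid-specific deviation
        n_obs = rng.randint(3, 15)        # unbalanced, irregular follow-up
        for _ in range(n_obs):
            xi = rng.gauss(0.0, 1.0)
            p = sigmoid(-0.5 + 2.0 * xi + b_true)
            x.append(xi)
            y.append(1 if rng.random() < p else 0)
            group.append(a)
    return x, y, group


def fit_random_intercept_logit(x, y, group, sigma2=1.0, lr=1.0, n_iter=1500):
    """MAP fit of logit P(y=1) = beta0 + beta1*x + b[group], b_g ~ N(0, sigma2).

    The Gaussian prior on b_g acts as an L2 penalty b_g**2 / (2*sigma2);
    sigma2 is assumed known here (a full GLMM would estimate it too).
    """
    groups = sorted(set(group))
    b = {g: 0.0 for g in groups}
    beta0 = beta1 = 0.0
    n = len(y)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        gb = {g: -b[g] / sigma2 for g in groups}   # gradient of the log-prior
        for xi, yi, gi in zip(x, y, group):
            r = yi - sigmoid(beta0 + beta1 * xi + b[gi])  # d loglik / d eta
            g0 += r
            g1 += r * xi
            gb[gi] += r                    # shared intercept pools its observations
        beta0 += lr * g0 / n
        beta1 += lr * g1 / n
        for g in groups:
            b[g] += lr * gb[g] / n
    return beta0, beta1, b


def accuracy(x, y, group, beta0, beta1, b, threshold=0.5):
    # Share of observations whose thresholded probability matches the label.
    correct = 0
    for xi, yi, gi in zip(x, y, group):
        p = sigmoid(beta0 + beta1 * xi + b[gi])
        correct += int((p >= threshold) == (yi == 1))
    return correct / len(y)


if __name__ == "__main__":
    x, y, group = simulate()
    beta0, beta1, b = fit_random_intercept_logit(x, y, group)
    print("slope:", round(beta1, 2))
    print("in-sample accuracy:", round(accuracy(x, y, group, beta0, beta1, b), 3))
```

Because b[g] enters every observation of asteroid g, those observations are correlated, which is exactly what an ordinary logistic regression ignores. The accuracy printed here is in-sample; a fair comparison like the one summarized above would typically evaluate on held-out asteroids or later observations.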