Irregular longitudinal data analysis with statistical and machine learning methods for hazardous asteroids


TANRIVERDİ İ., İLK DAĞ Ö., GÜRKAN M. A.

Astronomy and Computing, vol.47, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 47
  • Publication Date: 2024
  • Doi Number: 10.1016/j.ascom.2024.100818
  • Journal Name: Astronomy and Computing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Asteroids, Generalized linear mixed model trees, GPBoost, Historical random forest, Marginal model with inverse intensity weighting, Mixed-effect models
  • Middle East Technical University Affiliated: Yes

Abstract

Asteroids have been observed for as long as the available observational equipment has made it feasible. The recorded data, going back to the 18th century, have allowed these celestial objects to be classified by hazard status. However, previous studies used methods that ignore subject dependency in Near-Earth Asteroid (NEA) data, that is, the correlation among repeated observations of the same asteroid. This study aims to overcome this shortcoming by applying various statistical and machine learning methods to the hazard classification of asteroids in NEA data. We analyze data, obtained through NASA, from 751 asteroids observed at irregular time intervals. We compare algorithms suited to a longitudinal data structure: Generalized Linear Mixed Models (GLMM), the marginal model, GLMM-Tree, Historical Random Forest, GPBoost, and Spline. To the best of our knowledge, and based on a comprehensive review of the existing literature, our study is the first to use these methods to analyze NEA data. According to the findings, the accuracies of the models range from 0.89 to 0.99. The GPBoost model has the highest performance, while the marginal model has the lowest.