GeneSelectML: a comprehensive way of gene selection for RNA-Seq data via machine learning algorithms


DAĞ O., KAŞIKCI ÇAVDAR M., İLK DAĞ Ö., Yesiltepe M.

MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, vol.61, no.1, pp.229-241, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 61 Issue: 1
  • Publication Date: 2023
  • Doi Number: 10.1007/s11517-022-02695-w
  • Journal Name: MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Applied Science & Technology Source, BIOSIS, Biotechnology Research Abstracts, Business Source Elite, Business Source Premier, CINAHL, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, MEDLINE
  • Page Numbers: pp.229-241
  • Keywords: Machine learning, Genomics, Feature selection, RNA-seq data, Web tool, FALSE DISCOVERY RATE, ALZHEIMERS-DISEASE, R/BIOCONDUCTOR PACKAGE, CLASSIFICATION
  • Middle East Technical University Affiliated: Yes

Abstract

Selecting differentially expressed genes (DEGs) is a vital step in discovering the causes of diseases. It has been shown that modelling genomics data by considering the relations among genes increases predictive performance compared to univariate analysis. However, studies analyzing the same dataset often reach seriously different results because of the methods they use. Therefore, there is a strong need for an easily accessible, user-friendly, and interactive tool that performs gene selection for RNA-seq data with several machine learning algorithms simultaneously, so that DEGs are not missed. We develop an open-source and freely available web-based tool for gene selection via machine learning algorithms that can handle high-performance computation. The tool includes six machine learning algorithms with different characteristics. Moreover, it provides the classical pre-processing steps: filtering, normalization, transformation, and univariate analysis. It also offers well-arranged graphical approaches: network plot, heatmap, Venn diagram, and box-and-whisker plot. Gene ontology analysis is provided for both mRNA and miRNA DEGs. The tool is applied to Alzheimer RNA-seq data to demonstrate its use. Eleven genes are suggested by at least two of the six methods. One of these genes, hsa-miR-148a-3p, might be considered a new biomarker for Alzheimer's disease diagnosis. The Kidney Chromophobe dataset is also analyzed to demonstrate the validity of the GeneSelectML web tool on a different dataset. GeneSelectML is distinguished in that it uses different machine learning algorithms simultaneously for gene selection and can perform pre-processing, graphical representation, and gene ontology analyses within the same tool. The tool is freely available at www.softmed.hacettepe.edu.tr/GeneSelectML.
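The abstract reports genes that are suggested by at least two of the six methods. The following is a minimal Python sketch of that kind of agreement rule, not the tool's actual implementation. The function name `consensus_genes`, the `min_votes` parameter, and the method and gene labels are hypothetical placeholders and do not come from the paper.

```python
from collections import Counter


def consensus_genes(selections, min_votes=2):
    """Return genes chosen by at least `min_votes` methods.

    `selections` maps a method name to the list of genes it selected.
    The result is sorted by descending vote count, then by gene name.
    """
    # Count each gene once per method, even if a method lists it twice.
    votes = Counter(gene for genes in selections.values() for gene in set(genes))
    return sorted(
        (gene for gene, n in votes.items() if n >= min_votes),
        key=lambda gene: (-votes[gene], gene),
    )


# Hypothetical selections from three methods (placeholder IDs, not paper results).
example = {
    "method_a": ["gene1", "gene2", "gene3"],
    "method_b": ["gene2", "gene3", "gene4"],
    "method_c": ["gene3", "gene5"],
}

print(consensus_genes(example))  # ['gene3', 'gene2']
```

Raising `min_votes` tightens the rule: with `min_votes=3` only `gene3` remains in this example.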