Improving the Computer-Aided Estimation of Ulcerative Colitis Severity According to Mayo Endoscopic Score by Using Regression-Based Deep Learning

Polat G., Kani H. T., Ergenc I., Alahdab Y. O., Temizel A., Atug O.

INFLAMMATORY BOWEL DISEASES, vol.29, pp.1431-1439, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 29
  • Publication Date: 2023
  • Doi Number: 10.1093/ibd/izac226
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, EMBASE, MEDLINE
  • Page Numbers: pp.1431-1439
  • Keywords: colonoscopy, computer-assisted diagnosis, deep learning, inflammatory bowel diseases, Mayo score, ulcerative colitis, CLASSIFICATION, DISEASE
  • Middle East Technical University Affiliated: Yes


Background: Assessment of endoscopic activity in ulcerative colitis (UC) is important for treatment decisions and monitoring disease progress. However, substantial inter- and intraobserver variability in grading impairs the assessment. Our aim was to develop a computer-aided diagnosis system using deep learning to reduce subjectivity and improve the reliability of the assessment.

Methods: The cohort comprises 11 276 images from 564 patients who underwent colonoscopy for UC. We propose a regression-based deep learning approach for the endoscopic evaluation of UC according to the Mayo endoscopic score (MES). Five state-of-the-art convolutional neural network (CNN) architectures were used for the performance measurements and comparisons. Ten-fold cross-validation was used to train the models and objectively benchmark them. Model performances were assessed using quadratic weighted kappa and macro F1 scores for full Mayo score classification and kappa statistics and F1 score for remission classification.

Results: Five classification-based CNNs used in the study were in excellent agreement with the expert annotations for all Mayo subscores and remission classification according to the kappa statistics. When the proposed regression-based approach was used, (1) the performance of most of the models statistically significantly increased and (2) the same model trained on different cross-validation folds produced more robust results on the test set in terms of deviation between different folds.

Conclusions: Comprehensive experimental evaluations show that commonly used classification-based CNN architectures have successful performance in evaluating endoscopic disease activity of UC. Integration of domain knowledge into these architectures further increases performance and robustness, accelerating their translation into clinical use.
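The evaluation metrics named in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation. It assumes that a regression model emits one continuous score per image, that the score is mapped to an MES grade by rounding and clipping to the range 0 to 3 (the abstract does not state the mapping rule), and that grades are stored as plain Python integers. The function names `to_mes`, `quadratic_weighted_kappa`, and `macro_f1` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

NUM_CLASSES = 4  # MES grades are 0, 1, 2, 3


def to_mes(score: float) -> int:
    """Map a continuous regression output to an MES grade.

    Assumption: round half up, then clip to [0, 3]. The abstract does not
    specify how regression outputs are converted to grades.
    """
    return min(NUM_CLASSES - 1, max(0, math.floor(score + 0.5)))


def quadratic_weighted_kappa(y_true, y_pred, k=NUM_CLASSES):
    """Cohen's kappa with quadratic weights for ordinal labels 0..k-1."""
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))  # joint counts
    true_hist = Counter(y_true)              # marginal counts, reference
    pred_hist = Counter(y_pred)              # marginal counts, prediction
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[(i, j)] / n
            den += w * true_hist[i] * pred_hist[j] / (n * n)
    if den == 0.0:
        # Both label sets are constant and identical: treat as full agreement.
        return 1.0
    return 1.0 - num / den


def macro_f1(y_true, y_pred, k=NUM_CLASSES):
    """Unweighted mean of per-class F1 scores (0.0 for classes never seen)."""
    scores = []
    for c in range(k):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


if __name__ == "__main__":
    expert = [0, 1, 2, 3]              # made-up reference grades
    raw = [0.2, 1.4, 1.6, 2.3]         # made-up regression outputs
    predicted = [to_mes(s) for s in raw]
    print(predicted)
    print(quadratic_weighted_kappa(expert, predicted))
    print(macro_f1(expert, predicted))
```

The quadratic weights penalize a prediction that is two grades off four times as heavily as one that is one grade off, which is why this statistic suits an ordinal scale such as the MES. The toy labels in the `__main__` block are invented for illustration and are not data from the study.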