Can generative AI figure out figurative language? The influence of idioms on essay scoring by ChatGPT, Gemini, and Deepseek


Oğuz E.

ASSESSING WRITING, vol. 66, 2025 (SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 66
  • Publication Date: 2025
  • DOI: 10.1016/j.asw.2025.100981
  • Journal Name: ASSESSING WRITING
  • Journal Indexes: Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, EBSCO Education Source, Education Abstracts, Educational Research Abstracts (ERA), Humanities Abstracts, Linguistics & Language Behavior Abstracts, MLA - Modern Language Association Database
  • Middle East Technical University Affiliated: Yes

Abstract

Developments in Generative AI technologies have paved the way for numerous innovations across different fields. Recently, Generative AI has been proposed as a competitor to automated essay scoring (AES) systems in evaluating student essays. Considering the potential limitations of AI in processing idioms, this study assessed the scoring performance of Generative AI models on essays with and without idioms by incorporating insights from Corpus Linguistics and Computational Linguistics. Two equal-sized essay lists were created from 348 student essays taken from a corpus: one in which each essay contained multiple idioms and another containing no idioms. Three Generative AI models (ChatGPT, Gemini, and Deepseek) were asked to score all essays in both lists three times, using the same rubric used by human raters. The results revealed excellent consistency for all models without any detectable bias toward any demographic group, but Gemini outperformed its competitors in interrater reliability with human raters. For essays with idioms, Gemini followed the pattern most similar to that of human raters. While all models in the study demonstrated capability for a hybrid approach in formative assessments, Gemini was the best candidate due to its ability to handle figurative language and its potential for handling essay-scoring tasks in high-stakes exams in the future.