AI-Assisted Arbitrator Selection in Construction Disputes: An Expert-Calibrated Large Language Model Framework


Mobadersani M., Candas A. B., Kuruoğlu M., Tokdemir O. B.

Buildings, vol.16, no.1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 16 Issue: 1
  • Publication Date: 2026
  • Doi Number: 10.3390/buildings16010120
  • Journal Name: Buildings
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Avery, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: arbitrator selection, construction arbitration, human resources, large language model, prompt engineering
  • Middle East Technical University Affiliated: Yes

Abstract

Arbitration efficiency is widely recognized as a factor influencing outcomes in construction disputes. To increase the chance of identifying and appointing the best-fit arbitrator, a large number of candidate profiles must be investigated, an overwhelming and time-consuming process. This study develops and evaluates a large language model (LLM)-enabled framework for arbitrator selection based on dispute details and predefined expert criteria. To this end, 500 standardized, anonymized arbitrator resumes were evaluated using a unified scoring structure. The resumes were scored and classified by two GPT-5 models given prompts with different levels of detail, and the models' outputs were compared with expert evaluations to assess their ability to replicate human decision-making patterns in resume evaluation and classification. The second model, guided by a highly detailed prompt structure, achieved an accuracy of 84%, while the first model, given a concise prompt containing only a brief description of the experts' expectations, achieved an overall accuracy of 53%. It can therefore be concluded that the accuracy of the LLM-assisted resume analysis framework improves when it is guided by a detailed, expert-aligned prompt structure. From a research perspective, these results highlight the importance of prompt engineering in AI-assisted decision-support systems for professional evaluation tasks. Since this framework is limited to resumes in English, future research should examine the effectiveness of LLMs in evaluating and classifying resumes in other languages. Future studies might also replicate this work with other large language models to compare precision and accuracy across LLMs.
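The concise-versus-detailed prompt comparison described in the abstract can be sketched in outline as follows. The prompt wording, the scoring criteria, and the accuracy helper below are illustrative assumptions for exposition, not the paper's actual instruments; the LLM call itself is left out, since the sketch only shows how the two prompt levels and the expert-agreement check fit together.

```python
# Illustrative sketch (not the authors' code): the same resume is evaluated
# under a concise prompt and a detailed, criteria-based prompt, and the
# resulting classifications are compared against expert labels for accuracy.

# Hypothetical concise prompt: a brief statement of the experts' expectations.
CONCISE_PROMPT = (
    "Classify this arbitrator resume for a construction dispute as "
    "'suitable' or 'unsuitable' based on relevant expertise.\n\nResume:\n{resume}"
)

# Hypothetical detailed prompt: explicit, expert-aligned criteria and
# a constrained output format. Criteria names are invented for illustration.
DETAILED_PROMPT = (
    "You are assisting expert evaluators in construction arbitration.\n"
    "Score the resume on each criterion (0-10), then classify it.\n"
    "Criteria: construction-law experience; prior arbitration appointments; "
    "technical (engineering) background; sector reputation.\n"
    "Output exactly one label: 'suitable' or 'unsuitable'.\n\nResume:\n{resume}"
)

def build_prompt(resume: str, detailed: bool) -> str:
    """Fill the chosen prompt template with one anonymized resume."""
    template = DETAILED_PROMPT if detailed else CONCISE_PROMPT
    return template.format(resume=resume)

def accuracy(model_labels: list[str], expert_labels: list[str]) -> float:
    """Share of resumes where the model's class matches the expert's."""
    assert len(model_labels) == len(expert_labels) and expert_labels
    hits = sum(m == e for m, e in zip(model_labels, expert_labels))
    return hits / len(expert_labels)
```

In use, each of the 500 resumes would be sent to the model once per prompt level, and `accuracy` applied to each run's labels would reproduce the kind of 53% vs. 84% comparison the study reports.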