DTCard: A Framework for Decision Transformers in Card Games


Demirdover B. K., ALPASLAN F. N., Tan M.

Applied Sciences (Switzerland), vol. 16, no. 7, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 7
  • Publication Date: 2026
  • DOI: 10.3390/app16073117
  • Journal Name: Applied Sciences (Switzerland)
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: decision transformer, machine learning, reinforcement learning
  • Middle East Technical University Affiliated: Yes

Abstract

Decision Transformers (DTs) reformulate reinforcement learning as a conditional sequence modeling problem and have demonstrated competitive performance in offline Reinforcement Learning (RL) scenarios. However, their behavior in card games, specifically partially observable, imperfect-information trick-taking games, remains underexplored. In parallel, general-purpose card-game toolkits have shown the value of unified environments and standardized evaluation protocols for accelerating research in imperfect-information games. Motivated by the goal of creating a general card-game-playing framework, we present a unified RL pipeline for trick-taking card games using DTs. While classical learning methods have demonstrated strong performance in card games, transformer-based reinforcement learning has received comparatively little attention in this domain. This paper studies the applicability of DTs to the core play phase of trick-taking games and evaluates whether a single, reusable pipeline can be transferred across multiple games in this class with minimal game-specific engineering. We propose a unified framework integrating offline pretraining, online selective expert iteration, and inference-time legal-action filtering. Crucially, our approach demonstrates two key advantages over standard implementations. First, the model internalizes complex game rules (e.g., follow-suit constraints) implicitly from the empirical data distribution, eliminating the need for explicit action masking during training. Second, we introduce a selective expert iteration mechanism with strict acceptance filtering, which prevents distribution collapse and enables safe, monotonic offline-to-online policy refinement. Ultimately, we show that this single, reusable transformer-based pipeline achieves competitive performance across multiple trick-taking domains (Hearts, Whist, and Spades) with minimal game-specific engineering.
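The abstract's inference-time legal-action filtering can be illustrated with a minimal sketch: illegal actions are masked out of the model's logits only when acting, never during training. The function name and the plain-list representation of logits are illustrative assumptions, not the paper's actual API.

```python
def filter_illegal_actions(action_logits, legal_actions):
    """Pick a greedy action among legal moves only.

    Applied at inference time; during training the model is expected
    to learn rules such as follow-suit constraints implicitly from the
    data distribution, so no masking is used there.
    """
    # Set the score of every illegal action to -infinity so it can
    # never be selected by the argmax below.
    masked = [logit if a in legal_actions else float("-inf")
              for a, logit in enumerate(action_logits)]
    # Greedy argmax over the masked scores.
    return max(range(len(masked)), key=lambda a: masked[a])
```

For example, with logits `[0.2, 1.5, -0.3, 0.9]` and legal actions `{0, 2, 3}`, the highest-scoring action overall (index 1) is excluded, and the filter returns index 3.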
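The selective expert iteration loop with strict acceptance filtering can likewise be sketched at a high level: self-play rollouts are admitted to the training set only if their episode return clears an acceptance threshold, which is what guards against distribution collapse during offline-to-online refinement. All names here (`policy_rollout`, `accept_threshold`) are hypothetical placeholders, not the paper's implementation.

```python
def selective_expert_iteration(policy_rollout, dataset, accept_threshold,
                               n_rollouts=100):
    """Sketch of selective expert iteration with acceptance filtering.

    `policy_rollout` is assumed to be a callable that plays one episode
    with the current policy and returns (trajectory, episode_return).
    Only rollouts whose return meets `accept_threshold` are added to
    `dataset`, keeping low-quality trajectories out of the fine-tuning
    distribution.
    """
    for _ in range(n_rollouts):
        trajectory, episode_return = policy_rollout()
        if episode_return >= accept_threshold:  # strict acceptance filter
            dataset.append(trajectory)
    return dataset
```

In a full pipeline this filtered dataset would then be used for another round of DT fine-tuning, and the threshold keeps the refinement monotonic: the policy is only ever trained on trajectories at least as good as the acceptance bar.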