TinyRS-R1: Compact Vision Language Model for Remote Sensing


Köksal A., Alatan A. A.

IEEE Geoscience and Remote Sensing Letters, vol. 22, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 22
  • Publication Date: 2025
  • DOI: 10.1109/lgrs.2025.3623244
  • Journal Name: IEEE Geoscience and Remote Sensing Letters
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Aquatic Science & Fisheries Abstracts (ASFA), Communication Abstracts, Compendex, Geobase, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Aerial image analysis, chain-of-thought (CoT) reasoning, domain adaptation, group relative policy optimization (GRPO), remote sensing (RS), vision language models (VLMs)
  • Middle East Technical University Affiliated: Yes

Abstract

Remote sensing applications often rely on edge hardware that cannot host today's 7B-parameter vision language models (VLMs). This paper presents TinyRS, the first 2B-parameter VLM optimized for remote sensing, and TinyRS-R1, its reasoning-augmented variant. Based on Qwen2-VL-2B, TinyRS is trained via a four-stage pipeline: pre-training on million-scale satellite images, instruction tuning, fine-tuning with chain-of-thought (CoT) annotations from a new reasoning dataset, and alignment based on group relative policy optimization (GRPO). TinyRS-R1 matches or surpasses recent 7B remote sensing models in classification, visual question answering (VQA), grounding, and open-ended QA, while using one third of the memory and latency. CoT reasoning improves grounding and scene understanding, while TinyRS excels at concise, low-latency VQA. TinyRS-R1 is the first domain-specialized small VLM with GRPO-aligned CoT reasoning for general-purpose remote sensing. The code, models, and caption datasets will be released.