TinyRS-R1: Compact Vision Language Model for Remote Sensing


KÖKSAL A., ALATAN A. A.

IEEE Geoscience and Remote Sensing Letters, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Publication Date: 2025
  • DOI Number: 10.1109/lgrs.2025.3623244
  • Journal Name: IEEE Geoscience and Remote Sensing Letters
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Aquatic Science & Fisheries Abstracts (ASFA), Communication Abstracts, Compendex, Geobase, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: aerial image analysis, chain-of-thought reasoning, domain adaptation, group relative policy optimization, remote sensing, vision language models
  • Middle East Technical University Affiliated: Yes

Abstract

Remote sensing applications often rely on edge hardware that cannot host today's 7B-parameter vision language models (VLMs). This paper presents TinyRS, the first 2B-parameter VLM optimized for remote sensing, and TinyRS-R1, its reasoning-augmented variant. Based on Qwen2-VL-2B, TinyRS is trained via a four-stage pipeline: pre-training on million-scale satellite images, instruction tuning, fine-tuning with Chain-of-Thought (CoT) annotations from a new reasoning dataset, and alignment based on group relative policy optimization (GRPO). TinyRS-R1 matches or surpasses recent 7B remote sensing models in classification, VQA, grounding, and open-ended QA while using one-third of the memory and latency. CoT reasoning improves grounding and scene understanding, while TinyRS excels at concise, low-latency VQA. TinyRS-R1 is the first domain-specialized small VLM with GRPO-aligned CoT reasoning for general-purpose remote sensing. The code, models, and caption datasets will be released.