18th International Conference on Machine Vision, ICMV 2025, Paris, France, 19 - 22 October 2025, vol. 14114, (Full Text Paper)
Early detection of wheat yellow rust is vital for timely fungicide application before infections exceed 5% of all plants in the monitored plot. While RGB imagery offers high spatial detail, NIR sensing captures early biochemical changes (chlorophyll loss and water stress) that are undetectable in RGB alone. We therefore propose a multimodal semantic segmentation model that uses a Transformer architecture to fuse the RGB and NIR modalities. To further improve the Transformer-based model, we incorporate adaptive channel re-weighting through lightweight squeeze-and-excitation blocks. When evaluated on UAV-collected field data specifically curated for wheat yellow rust, our model achieves an IoU of 0.689, outperforming CNN-based multimodal baselines by 14.1% and the best NIR-only CNN-based model by 11.3%. These findings highlight the potential of channel-attentive multimodal Transformer architectures for precise wheat yellow rust monitoring.
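The abstract does not specify how the squeeze-and-excitation blocks are configured or where they sit in the network. As a rough illustration of channel re-weighting in general, the following NumPy sketch applies the standard squeeze-and-excitation steps (global average pooling, a small bottleneck, sigmoid gates) to a stacked four-channel input. The function name `squeeze_excite`, the random weights, the reduction to two hidden units, and the choice of raw RGB+NIR channels as input are all assumptions made for illustration; in the actual model the blocks presumably operate on learned feature maps.

```python
import numpy as np

def squeeze_excite(features, w_reduce, w_expand):
    """Re-weight the channels of a (C, H, W) array, squeeze-and-excitation style."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = np.maximum(w_reduce @ squeezed, 0.0)        # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # shape (C,)
    # Scale: broadcast each gate over its channel's spatial positions.
    return features * gates[:, None, None], gates

rng = np.random.default_rng(0)
# Hypothetical input: 3 RGB channels + 1 NIR channel on an 8x8 patch.
fused = rng.random((4, 8, 8))
# Hypothetical (untrained) weights with reduction ratio r = 2.
w_reduce = rng.standard_normal((2, 4))
w_expand = rng.standard_normal((4, 2))

reweighted, gates = squeeze_excite(fused, w_reduce, w_expand)
print(reweighted.shape, gates.shape)
```

In a trained network the two weight matrices are learned, so the gates can emphasise whichever channels carry the most useful signal for a given input.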