Multimodal multimedia information retrieval through the integration of fuzzy clustering, OWA-based fusion, and Siamese neural networks


Sattari S., Kalkan S., Yazıcı A.

FUZZY SETS AND SYSTEMS, vol. 515, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 515
  • Publication Date: 2025
  • DOI: 10.1016/j.fss.2025.109419
  • Journal Name: FUZZY SETS AND SYSTEMS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Keywords: Adaptive fuzzy clustering, Information systems, Multimedia information retrieval, Multimodal fusion, Multiple modalities, Ranking, Siamese network, Triplet loss
  • Affiliated with Middle East Technical University: Yes

Abstract

This paper presents an end-to-end, scalable, and flexible framework for multimodal multimedia information retrieval (MMIR). The framework is designed to handle the multiple data modalities, such as visual, audio, and text, that are frequently encountered in real-world applications. By integrating these data types, it supports a more holistic understanding of information and thereby improves the accuracy and reliability of retrieval tasks. One of its strengths is the ability to learn semantic relationships within and between modalities through deep neural networks, which are trained on query-hit pairs generated from query logs. A major innovation of the approach is the efficient handling of uncertainty in multimodal data through an improved fuzzy clustering technique. In addition, the search process is refined by triplet-loss Siamese networks for reranking and by a novel fusion approach that uses the ordered weighted average (OWA) operator to combine the ranks produced by different retrieval systems. The framework leverages parallel processing and transfer learning for efficient feature extraction across modalities, which significantly improves scalability and adaptability. Performance was evaluated through comprehensive testing on six widely recognized multimodal datasets. The results indicate that the integrated approach, which combines clustering ranking, a triplet-loss Siamese network for reranking, OWA-based fusion, and the alternative adaptive fuzzy c-means method (AAFCM) for soft clustering, consistently outperforms all previous configurations reported in the literature. The experimental results, supported by extensive statistical analysis, confirm the effectiveness and robustness of this approach in MMIR.
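The abstract mentions combining the ranks of different retrieval systems with the OWA operator. The Python sketch below illustrates the general idea only and is not the authors' implementation. The function names `owa` and `fuse_rankings`, the reciprocal-rank scoring, and the example weights are all assumptions introduced for illustration; the abstract does not specify how ranks are converted to scores or how the weights are chosen.

```python
from typing import Dict, List, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights[i] multiplies the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # reordering step that defines OWA
    return sum(w * v for w, v in zip(weights, ordered))


def fuse_rankings(rankings: List[List[str]], weights: Sequence[float]) -> List[str]:
    """Fuse ranked lists (best item first) from several systems into one list.

    Hypothetical helper: each system contributes a reciprocal-rank score
    (an assumed mapping), and the per-item scores are aggregated with OWA.
    """
    candidates = sorted({doc for ranking in rankings for doc in ranking})
    fused: Dict[str, float] = {}
    for doc in candidates:
        # Score 1/rank if the system returned the item, otherwise 0.0.
        scores = [1.0 / (r.index(doc) + 1) if doc in r else 0.0 for r in rankings]
        fused[doc] = owa(scores, weights)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(candidates, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    system_rankings = [
        ["a", "b", "c"],  # e.g. a visual retrieval system (illustrative)
        ["b", "a", "c"],  # e.g. an audio retrieval system (illustrative)
        ["b", "c", "a"],  # e.g. a text retrieval system (illustrative)
    ]
    print(fuse_rankings(system_rankings, weights=[0.5, 0.3, 0.2]))
```

Because the weights attach to sorted positions rather than to particular systems, the weight vector controls how optimistic the fusion is: `[1, 0, 0]` reduces to the maximum score, `[0, 0, 1]` to the minimum, and uniform weights to the arithmetic mean.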