Towards a Programmable Humanizing AI through Scalable Stance-Directed Architecture


ÇETİNKAYA Y. M., Lee Y., KÜLAH E., TOROSLU İ. H., Cowan M. A., Davulcu H.

IEEE Internet Computing, vol.28, no.5, pp.20-27, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 28 Issue: 5
  • Publication Date: 2024
  • DOI Number: 10.1109/mic.2024.3450090
  • Journal Name: IEEE Internet Computing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.20-27
  • Keywords: language generation, language models, sentiment analysis, social networking, Twitter, web text analysis
  • Middle East Technical University Affiliated: Yes

Abstract

The rise of harmful online content underscores the urgent need for AI systems that can effectively detect and filter such content and foster safer, healthier communication. This article introduces a novel approach to mitigating the toxic content generation propensities of Large Language Models (LLMs) by fine-tuning them with a programmable, stance-directed focus on core human values and the common good. We propose a streamlined keyword coding and processing pipeline that generates weakly labeled data for training AI models to avoid toxicity and champion civil discourse. We also developed a toxicity classifier and an Aspect-based Sentiment Analysis (ABSA) model to assess and control the effectiveness of a humanizing AI model. We evaluate the proposed pipeline on a contentious real-world Twitter dataset on U.S. race relations. Our approach curbs the toxic content generation propensity of an unrestricted LLM by a significant 85%.
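The keyword coding step described in the abstract can be illustrated with a minimal sketch. The keyword sets, category names, and matching rule below are hypothetical placeholders, not the paper's actual lexicon: the idea is simply that curated keyword lists assign weak stance/toxicity labels to raw text without any trained model.

```python
# Minimal sketch of keyword-based weak labeling, assuming
# hypothetical keyword lists; the paper's actual coding scheme
# and lexicon are not reproduced here.

TOXIC_KEYWORDS = {"hate", "vermin"}               # placeholder terms
CIVIL_KEYWORDS = {"dignity", "respect", "justice"}  # placeholder terms

def weak_label(text: str) -> str:
    """Assign a weak label by simple keyword matching."""
    tokens = {w.strip(".,!?") for w in text.lower().split()}
    if tokens & TOXIC_KEYWORDS:
        return "toxic"
    if tokens & CIVIL_KEYWORDS:
        return "civil"
    return "unlabeled"

corpus = [
    "We must treat everyone with dignity and respect.",
    "That group deserves nothing but hate.",
    "The weather is nice today.",
]
labels = [weak_label(t) for t in corpus]
# labels == ["civil", "toxic", "unlabeled"]
```

Labels produced this way are noisy by design; in the pipeline they serve as weak supervision for fine-tuning, with the toxicity classifier and ABSA model providing downstream quality control.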