Evaluating cross-lingual textual similarity on dictionary alignment problem

SEVER, YİĞİT; ERCAN, GÖNENÇ

doi:10.1007/s10579-020-09498-1

LANGUAGE RESOURCES AND EVALUATION, vol.54, no.4, pp.1059-1078, 2020 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 54 Issue: 4
Publication Date: 2020
DOI Number: 10.1007/s10579-020-09498-1
Journal Name: LANGUAGE RESOURCES AND EVALUATION
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, FRANCIS, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, EBSCO Education Source, Educational research abstracts (ERA), Humanities Abstracts, INSPEC, Linguistic Bibliography, Linguistics & Language Behavior Abstracts, Metadex, MLA - Modern Language Association Database, Civil Engineering Abstracts
Pages: pp.1059-1078
Middle East Technical University Affiliated: Yes

Abstract

Bilingual or even polylingual word embeddings have created many possibilities for tasks involving multiple languages. While some tasks, such as cross-lingual information retrieval, aim to satisfy users' multilingual information needs, others enable the transfer of valuable information from resource-rich languages to resource-poor ones. In either case, it is important to build and evaluate methods that operate in a cross-lingual setting. In this paper, Wordnet definitions in 7 different languages are used to create a semantic textual similarity testbed for evaluating cross-lingual textual semantic similarity methods. A document alignment task is defined between the Wordnet glosses of synsets in these 7 languages. Unsupervised textual similarity methods (Wasserstein distance, Sinkhorn distance and cosine similarity) are compared with a supervised Siamese deep learning model. The task is modeled both as a retrieval task and as an alignment task to investigate the hubness of the semantic similarity functions. Our findings indicate that considering the problem as a retrieval and alignment problem has a detrimental effect on the results. Furthermore, we show that cross-lingual textual semantic similarity can be used as an automated Wordnet construction method.
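The unsupervised similarity measures named in the abstract can be illustrated on toy data. What follows is a minimal NumPy sketch, not the authors' implementation. It assumes each gloss is given as a set of pre-computed cross-lingual word vectors, represents a gloss by its mean vector for cosine similarity, and, for the Sinkhorn distance, places uniform mass on each word with a Euclidean ground cost. The function names, the regularization strength `reg` and the iteration count are hypothetical choices. As `reg` shrinks toward zero, the Sinkhorn cost approaches the exact Wasserstein distance, at the price of numerical stability.

```python
import numpy as np


def gloss_vector(word_vecs: np.ndarray) -> np.ndarray:
    # Hypothetical gloss representation: the mean of its word embeddings.
    return word_vecs.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two gloss vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sinkhorn_distance(x: np.ndarray, y: np.ndarray,
                      reg: float = 0.1, n_iter: int = 200) -> float:
    # Entropy-regularized optimal transport cost between two bags of words.
    # x: (n, d) word vectors of gloss 1; y: (m, d) word vectors of gloss 2.
    # Rows are L2-normalized so Euclidean costs lie in [0, 2] and
    # exp(-cost / reg) does not underflow for the default reg.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    a = np.full(len(x), 1.0 / len(x))  # uniform mass on each word of gloss 1
    b = np.full(len(y), 1.0 / len(y))  # uniform mass on each word of gloss 2
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):  # alternating Sinkhorn scaling updates
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float(np.sum(plan * cost))
```

For instance, two glosses whose word vectors form the same set in a different order get a Sinkhorn distance close to zero, which reflects the order-insensitive bag-of-words view, whereas mean-vector cosine similarity discards word-level structure altogether.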
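The abstract contrasts a retrieval formulation with an alignment formulation of the same task. One way to read that distinction, sketched here as an assumption rather than as the paper's exact procedure: given a square similarity matrix between source-language and target-language glosses, retrieval lets each source gloss independently take its most similar target, so a single "hub" target can be chosen many times, while alignment enforces a one-to-one matching that maximizes total similarity. The helper names and the similarity values are made up for illustration. The brute-force search over permutations only suits toy sizes; a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` is the usual choice at scale.

```python
from itertools import permutations

import numpy as np


def retrieve(sim: np.ndarray) -> list[int]:
    # Retrieval: each source gloss (row) picks its most similar target (column).
    return [int(j) for j in np.argmax(sim, axis=1)]


def align(sim: np.ndarray) -> list[int]:
    # Alignment: one-to-one assignment maximizing total similarity.
    # Brute force over all permutations, so only usable for tiny square matrices.
    n = sim.shape[0]
    best = max(permutations(range(n)),
               key=lambda perm: sum(sim[i, perm[i]] for i in range(n)))
    return list(best)


def hub_counts(assignment: list[int], n_targets: int) -> list[int]:
    # How many sources were mapped to each target; counts above 1 mark hubs.
    counts = [0] * n_targets
    for j in assignment:
        counts[j] += 1
    return counts


if __name__ == "__main__":
    # Made-up similarities: rows are source glosses, columns are target glosses.
    sim = np.array([[0.9, 0.8, 0.1],
                    [0.7, 0.65, 0.2],
                    [0.5, 0.1, 0.4]])
    print(retrieve(sim))                 # [0, 0, 0]
    print(align(sim))                    # [0, 1, 2]
    print(hub_counts(retrieve(sim), 3))  # [3, 0, 0]
```

On this made-up matrix, retrieval sends all three source glosses to target 0, which behaves as a hub, while the one-to-one alignment cannot reuse a target and returns `[0, 1, 2]`.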