TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks

This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly on Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, which can limit their accuracy and semantic understanding. TurkEmbed combines diverse datasets with advanced training techniques, including Matryoshka representation learning, to produce more robust and accurate embeddings. This approach lets the model adapt to various resource-constrained environments and offers faster encoding. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on the All-NLI-TR and STS-b-TR benchmarks by 1-4%. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.
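The abstract names two techniques that a small example may make more concrete: using a Matryoshka-style embedding at a reduced dimension, and scoring sentence-pair similarity with Pearson and Spearman correlation. The sketch below is illustrative only and is not the authors' code. The toy vectors, the gold scores, and every function name (`truncate_and_normalize`, `evaluate`, and so on) are assumptions made up for this example. It also assumes that truncation means keeping a prefix of the vector and renormalizing it, which is the usual way Matryoshka embeddings are used.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def evaluate(pairs, gold, dim):
    """Score each sentence pair at `dim` dimensions, then correlate with gold."""
    preds = [
        cosine(truncate_and_normalize(a, dim), truncate_and_normalize(b, dim))
        for a, b in pairs
    ]
    return pearson(preds, gold), spearman(preds, gold)

# Hypothetical 4-dimensional embeddings for three sentence pairs, with
# made-up gold similarity scores on a 0-5 scale (not data from the paper).
toy_pairs = [
    ([0.9, 0.1, 0.3, 0.2], [0.8, 0.2, 0.1, 0.4]),
    ([0.1, 0.9, 0.5, 0.1], [0.7, 0.1, 0.2, 0.6]),
    ([0.5, 0.5, 0.0, 0.9], [0.5, -0.5, 0.3, 0.1]),
]
toy_gold = [4.5, 2.0, 0.5]

if __name__ == "__main__":
    for dim in (2, 4):
        p, s = evaluate(toy_pairs, toy_gold, dim)
        print(f"dim={dim}: pearson={p:.3f} spearman={s:.3f}")
```

Running the same evaluation at several values of `dim` shows how much correlation is given up when embeddings are shortened to save memory or speed up similarity search. Truncating a prefix this way is only expected to work well for models trained with a Matryoshka-style objective.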

Keywords

Downstream task, Embedding model, Matryoshka representation, Natural language inference, Semantic text similarity, Benchmarking, Correlation methods, Embeddings, Machine translation, Natural language processing systems, Down-stream, Language inference, Natural languages, Text similarity, Semantics

Source

2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025

Scopus Q Value

N/A

Citation

Ezerceli, Ö., Gümüşçekiçci, G., Erkoç, T. & Özenç, B. (2025). TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks. Paper presented at the 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, 1-6. doi:https://doi.org/10.1109/ASYU67174.2025.11208511

Link

https://hdl.handle.net/11729/6829
https://doi.org/10.1109/ASYU67174.2025.11208511

Collection

Conference Paper Collection | Department of Computer Engineering
Scopus Indexed Publications Collection
