TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks

Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke

dc.authorid	0000-0002-7877-7528
dc.authorid	0000-0002-9502-7817
dc.authorid	0000-0001-9033-8934
dc.authorid	0000-0003-2008-243X
dc.contributor.author	Ezerceli, Özay	en_US
dc.contributor.author	Gümüşçekiçci, Gizem	en_US
dc.contributor.author	Erkoç, Tuğba	en_US
dc.contributor.author	Özenç, Berke	en_US
dc.date.accessioned	2025-12-12T08:15:11Z
dc.date.available	2025-12-12T08:15:11Z
dc.date.issued	2025
dc.department	Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering	en_US
dc.description.abstract	This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.	en_US
dc.description.version	Publisher's Version	en_US
dc.identifier.citation	Ezerceli, Ö., Gümüşçekiçci, G., Erkoç, T. & Özenç, B. (2025). TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks. Paper presented at the 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, 1-6. doi:https://doi.org/10.1109/ASYU67174.2025.11208511	en_US
dc.identifier.doi	10.1109/ASYU67174.2025.11208511
dc.identifier.endpage	6
dc.identifier.isbn	9798331597276
dc.identifier.scopus	2-s2.0-105022518404
dc.identifier.scopusquality	N/A
dc.identifier.startpage	1
dc.identifier.uri	https://hdl.handle.net/11729/6829
dc.identifier.uri	https://doi.org/10.1109/ASYU67174.2025.11208511
dc.indekslendigikaynak	Scopus	en_US
dc.institutionauthor	Gümüşçekiçci, Gizem	en_US
dc.institutionauthor	Erkoç, Tuğba	en_US
dc.institutionauthor	Özenç, Berke	en_US
dc.institutionauthorid	0000-0002-9502-7817
dc.institutionauthorid	0000-0001-9033-8934
dc.institutionauthorid	0000-0003-2008-243X
dc.language.iso	en	en_US
dc.peerreviewed	Yes	en_US
dc.publicationstatus	Published	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartof	2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Downstream task	en_US
dc.subject	Embedding model	en_US
dc.subject	Matryoshka representation	en_US
dc.subject	Natural language inference	en_US
dc.subject	Semantic text similarity	en_US
dc.subject	Benchmarking	en_US
dc.subject	Correlation methods	en_US
dc.subject	Embeddings	en_US
dc.subject	Machine translation	en_US
dc.subject	Natural language processing systems	en_US
dc.subject	Down-stream	en_US
dc.subject	Language inference	en_US
dc.subject	Natural languages	en_US
dc.subject	Text similarity	en_US
dc.subject	Semantics	en_US
dc.title	TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks	en_US
dc.type	Conference Object	en_US
dspace.entity.type	Publication	en_US

Files

Original bundle

Name: TurkEmbed_Turkish_embedding_model_on_natural_language_inference_sentence_text_similarity_tasks.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission

Collections

Conference Papers Collection | Department of Computer Engineering
Scopus Indexed Publications Collection