TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks

dc.authorid: 0000-0002-7877-7528
dc.authorid: 0000-0002-9502-7817
dc.authorid: 0000-0001-9033-8934
dc.authorid: 0000-0003-2008-243X
dc.contributor.author: Ezerceli, Özay [en_US]
dc.contributor.author: Gümüşçekiçci, Gizem [en_US]
dc.contributor.author: Erkoç, Tuğba [en_US]
dc.contributor.author: Özenç, Berke [en_US]
dc.date.accessioned: 2025-12-12T08:15:11Z
dc.date.available: 2025-12-12T08:15:11Z
dc.date.issued: 2025
dc.department: Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü [en_US]
dc.department: Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering [en_US]
dc.description.abstract: This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications. [en_US]
dc.description.version: Publisher's Version [en_US]
dc.identifier.citation: Ezerceli, Ö., Gümüşçekiçci, G., Erkoç, T. & Özenç, B. (2025). TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks. Paper presented at the 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, 1-6. doi:https://doi.org/10.1109/ASYU67174.2025.11208511 [en_US]
dc.identifier.doi: 10.1109/ASYU67174.2025.11208511
dc.identifier.endpage: 6
dc.identifier.isbn: 9798331597276
dc.identifier.scopus: 2-s2.0-105022518404
dc.identifier.scopusquality: N/A
dc.identifier.startpage: 1
dc.identifier.uri: https://hdl.handle.net/11729/6829
dc.identifier.uri: https://doi.org/10.1109/ASYU67174.2025.11208511
dc.indekslendigikaynak: Scopus [en_US]
dc.institutionauthor: Gümüşçekiçci, Gizem [en_US]
dc.institutionauthor: Erkoç, Tuğba [en_US]
dc.institutionauthor: Özenç, Berke [en_US]
dc.institutionauthorid: 0000-0002-9502-7817
dc.institutionauthorid: 0000-0001-9033-8934
dc.institutionauthorid: 0000-0003-2008-243X
dc.language.iso: en [en_US]
dc.peerreviewed: Yes [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Downstream task [en_US]
dc.subject: Embedding model [en_US]
dc.subject: Matryoshka representation [en_US]
dc.subject: Natural language inference [en_US]
dc.subject: Semantic text similarity [en_US]
dc.subject: Benchmarking [en_US]
dc.subject: Correlation methods [en_US]
dc.subject: Embeddings [en_US]
dc.subject: Machine translation [en_US]
dc.subject: Natural language processing systems [en_US]
dc.subject: Down-stream [en_US]
dc.subject: Language inference [en_US]
dc.subject: Natural languages [en_US]
dc.subject: Text similarity [en_US]
dc.subject: Semantics [en_US]
dc.title: TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks [en_US]
dc.type: Conference Object [en_US]
dspace.entity.type: Publication [en_US]

Files

Original bundle
Name: TurkEmbed_Turkish_embedding_model_on_natural_language_inference_sentence_text_similarity_tasks.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.17 KB
Format: Item-specific license agreed upon to submission