TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks
| dc.authorid | 0000-0002-7877-7528 | |
| dc.authorid | 0000-0002-9502-7817 | |
| dc.authorid | 0000-0001-9033-8934 | |
| dc.authorid | 0000-0003-2008-243X | |
| dc.contributor.author | Ezerceli, Özay | en_US |
| dc.contributor.author | Gümüşçekiçci, Gizem | en_US |
| dc.contributor.author | Erkoç, Tuğba | en_US |
| dc.contributor.author | Özenç, Berke | en_US |
| dc.date.accessioned | 2025-12-12T08:15:11Z | |
| dc.date.available | 2025-12-12T08:15:11Z | |
| dc.date.issued | 2025 | |
| dc.department | Işık Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Bilgisayar Mühendisliği Bölümü | en_US |
| dc.department | Işık University, Faculty of Engineering and Natural Sciences, Department of Computer Engineering | en_US |
| dc.description.abstract | This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications. | en_US |
| dc.description.version | Publisher's Version | en_US |
| dc.identifier.citation | Ezerceli, Ö., Gümüşçekiçci, G., Erkoç, T. & Özenç, B. (2025). TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks. Paper presented at the 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, 1-6. doi:https://doi.org/10.1109/ASYU67174.2025.11208511 | en_US |
| dc.identifier.doi | 10.1109/ASYU67174.2025.11208511 | |
| dc.identifier.endpage | 6 | |
| dc.identifier.isbn | 9798331597276 | |
| dc.identifier.scopus | 2-s2.0-105022518404 | |
| dc.identifier.scopusquality | N/A | |
| dc.identifier.startpage | 1 | |
| dc.identifier.uri | https://hdl.handle.net/11729/6829 | |
| dc.identifier.uri | https://doi.org/10.1109/ASYU67174.2025.11208511 | |
| dc.indekslendigikaynak | Scopus | en_US |
| dc.institutionauthor | Gümüşçekiçci, Gizem | en_US |
| dc.institutionauthor | Erkoç, Tuğba | en_US |
| dc.institutionauthor | Özenç, Berke | en_US |
| dc.institutionauthorid | 0000-0002-9502-7817 | |
| dc.institutionauthorid | 0000-0001-9033-8934 | |
| dc.institutionauthorid | 0000-0003-2008-243X | |
| dc.language.iso | en | en_US |
| dc.peerreviewed | Yes | en_US |
| dc.publicationstatus | Published | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.relation.ispartof | 2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025 | en_US |
| dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Downstream task | en_US |
| dc.subject | Embedding model | en_US |
| dc.subject | Matryoshka representation | en_US |
| dc.subject | Natural language inference | en_US |
| dc.subject | Semantic text similarity | en_US |
| dc.subject | Benchmarking | en_US |
| dc.subject | Correlation methods | en_US |
| dc.subject | Embeddings | en_US |
| dc.subject | Machine translation | en_US |
| dc.subject | Natural language processing systems | en_US |
| dc.subject | Down-stream | en_US |
| dc.subject | Language inference | en_US |
| dc.subject | Natural languages | en_US |
| dc.subject | Text similarity | en_US |
| dc.subject | Semantics | en_US |
| dc.title | TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks | en_US |
| dc.type | Conference Object | en_US |
| dspace.entity.type | Publication | en_US |
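
The abstract describes scoring sentence pairs on STS-b-TR with Pearson and Spearman correlation, and using matryoshka representation learning so that shorter embeddings remain usable. The sketch below illustrates that kind of evaluation loop in plain Python. It is not the authors' code: the function names, the 4-dimensional toy vectors, and the gold scores are hypothetical, and it assumes that a matryoshka-style embedding is shortened by keeping its leading components and re-normalizing.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def evaluate(pairs, gold, dim):
    """Correlate cosine similarities at a given dimension with gold scores."""
    preds = [
        cosine(truncate_and_normalize(a, dim), truncate_and_normalize(b, dim))
        for a, b in pairs
    ]
    return pearson(preds, gold), spearman(preds, gold)


# Hypothetical toy data: three sentence pairs with made-up embeddings
# and made-up gold similarity scores on a 0-5 scale.
toy_pairs = [
    ([0.9, 0.1, 0.3, 0.1], [0.8, 0.2, 0.2, 0.2]),
    ([0.7, 0.6, 0.1, 0.3], [0.2, 0.9, 0.4, 0.1]),
    ([0.9, 0.1, 0.2, 0.4], [0.1, 0.9, 0.5, 0.2]),
]
toy_gold = [4.8, 2.5, 0.6]

for dim in (4, 2):
    p, s = evaluate(toy_pairs, toy_gold, dim)
    print(f"dim={dim} pearson={p:.3f} spearman={s:.3f}")
```

Running the same loop at several values of `dim` is one way to see how much correlation is lost when embeddings are shortened for resource-constrained settings. In practice the vectors would come from the model and the gold scores from the benchmark.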
Files
Original bundle
- Name: TurkEmbed_Turkish_embedding_model_on_natural_language_inference_sentence_text_similarity_tasks.pdf
- Size: 1.28 MB
- Format: Adobe Portable Document Format