Setting standards in Turkish NLP: TR-MMLU for large language model evaluation

dc.authorid: 0000-0003-1298-4521
dc.authorid: 0009-0002-7907-1209
dc.authorid: 0000-0002-6652-4339
dc.authorid: 0000-0002-7764-2891
dc.authorid: 0000-0002-4305-8785
dc.contributor.author: Bayram, M. Ali [en_US]
dc.contributor.author: Fincan, Ali Arda [en_US]
dc.contributor.author: Gümüş, Ahmet Semih [en_US]
dc.contributor.author: Diri, Banu [en_US]
dc.contributor.author: Yıldırım, Savaş [en_US]
dc.contributor.author: Aytaş, Öner [en_US]
dc.date.accessioned: 2025-10-07T12:33:03Z
dc.date.available: 2025-10-07T12:33:03Z
dc.date.issued: 2025-01-04
dc.department: Işık Üniversitesi, Meslek Yüksekokulu, Bilgisayar Programcılığı Programı [en_US]
dc.department: Işık University, Vocational School, Computer Programming Program [en_US]
dc.description.abstract: Language models have made remarkable advancements in understanding and generating human language, achieving notable success across a wide array of applications. However, evaluating these models remains a significant challenge, particularly for resource-limited languages such as Turkish. To address this gap, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is constructed from a carefully curated dataset comprising 6,200 multiple-choice questions across 62 sections, selected from a pool of 280,000 questions spanning 67 disciplines and over 800 topics within the Turkish education system. This benchmark provides a transparent, reproducible, and culturally relevant tool for evaluating model performance. It serves as a standard framework for Turkish NLP research, enabling detailed analyses of LLMs’ capabilities in processing Turkish text and fostering the development of more robust and accurate language models. In this study, we evaluate state-of-the-art LLMs on TR-MMLU, providing insights into their strengths and limitations for Turkish-specific tasks. Our findings reveal critical challenges, such as the impact of tokenization and fine-tuning strategies, and highlight areas for improvement in model design. By setting a new standard for evaluating Turkish language models, TR-MMLU aims to inspire future innovations and support the advancement of Turkish NLP research. [en_US]
dc.description.version: Preprint's Version [en_US]
dc.identifier.citation: Bayram, M. A., Fincan, A. A., Gümüş, A. S., Diri, B., Yıldırım, S. & Aytaş, Ö. (2025). Setting standards in Turkish NLP: TR-MMLU for large language model evaluation. Arxiv, 1-6. doi: https://doi.org/10.48550/arXiv.2501.00593 [en_US]
dc.identifier.endpage: 6
dc.identifier.startpage: 1
dc.identifier.uri: https://hdl.handle.net/11729/6751
dc.identifier.uri: https://doi.org/10.48550/arXiv.2501.00593
dc.identifier.wos: PPRN:120258859
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Preprint Citation Index [en_US]
dc.institutionauthor: Aytaş, Öner [en_US]
dc.language.iso: en [en_US]
dc.publisher: Cornell Univ [en_US]
dc.relation.ispartof: Arxiv [en_US]
dc.relation.publicationcategory: Preprint – International – Institutional Academic Staff [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Large Language Models (LLM) [en_US]
dc.subject: Natural Language Processing (NLP) [en_US]
dc.subject: Artificial Intelligence [en_US]
dc.subject: Turkish NLP [en_US]
dc.title: Setting standards in Turkish NLP: TR-MMLU for large language model evaluation [en_US]
dc.type: Preprint [en_US]
dspace.entity.type: Publication [en_US]

Files

Original bundle
Name: Setting_Standards_in_Turkish_NLP_TR_MMLU_for_Large_Language_Model_Evaluation.pdf
Size: 84.14 KB
Format: Adobe Portable Document Format