Işık University Institutional Academic Memory :: DSpace Angular

Search Results

Showing 1 - 6 of 6

TUR2SQL: A cross-domain Turkish dataset for Text-to-SQL
(IEEE, 2023-09-15) Kanburoğlu, Ali Buğra; Tek, Faik Boray
The field of converting natural language into corresponding SQL queries using deep learning techniques has attracted significant attention in recent years. While existing Text-to-SQL datasets primarily focus on English and on other languages such as Chinese, resources for Turkish are lacking. In this study, we introduce the first publicly available cross-domain Turkish Text-to-SQL dataset, named TUR2SQL. This dataset consists of 10,809 pairs of natural language statements and their corresponding SQL queries. We conducted experiments with SQLNet and ChatGPT on the TUR2SQL dataset. The experimental results show that SQLNet achieves limited performance on the dataset, while ChatGPT achieves superior performance. We believe that TUR2SQL provides a foundation for further exploration and advancements in Turkish Text-to-SQL research.
Multi-task learning on mental disorder detection, sentiment analysis, and emotion detection using social media posts
(Institute of Electrical and Electronics Engineers Inc., 2024) Armah, Courage; Dehkharghani, Rahim
Mental disorders such as suicidal behavior, bipolar disorder, depressive disorders, and anxiety have been diagnosed among young people in recent years. Social media platforms such as Reddit have become popular for anonymous posting, and people who remain anonymous are far more likely to share there what they actually feel in their real lives. Extracting people's sentiments and feelings from these platforms is therefore helpful for training mental disorder detection models. This study uses multi-task learning techniques to estimate behaviors and mental states for early diagnosis of mental disorders. We propose a multi-task system trained on three related tasks: mental disorder detection as the primary task, with emotion analysis and sentiment analysis as auxiliary tasks. We used the SWMH dataset, which is already labeled with four main mental disorders (bipolar disorder, depression, anxiety, and suicide) as well as offmychest posts, and added emotion and sentiment labels to it. The observed results are comparable to previous studies in the field and show that deep multi-task learning frameworks can improve the accuracy of related text classification tasks compared with training each task separately as a single-task system.
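The abstract describes one primary task (mental disorder detection) trained jointly with two auxiliary tasks (emotion and sentiment analysis). A common way to realize this is hard parameter sharing: one shared encoder feeds three classification heads, and training minimizes a weighted sum of the per-task losses. The minimal pure-Python sketch below assumes that setup and shows only the loss combination; the logits, the emotion label set, and the loss weights are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(outputs, labels, weights):
    """Weighted sum of per-task losses under hard parameter sharing.

    outputs, labels and weights are dicts keyed by task name. The weights
    are hypothetical hyperparameters, not values reported in the paper.
    """
    return sum(weights[t] * cross_entropy(outputs[t], labels[t]) for t in outputs)


# Hypothetical logits from three task heads on top of one shared encoder.
outputs = {
    # bipolar, depression, anxiety, suicide, offmychest
    "disorder": [2.0, 0.5, 0.1, -1.0, 0.3],
    "emotion": [0.2, 1.5, -0.3],     # hypothetical 3-class emotion set
    "sentiment": [1.0, -1.0, 0.0],   # positive, negative, neutral
}
labels = {"disorder": 0, "emotion": 1, "sentiment": 0}
# Primary task weighted fully, auxiliary tasks down-weighted (assumed values).
weights = {"disorder": 1.0, "emotion": 0.5, "sentiment": 0.5}

total = multitask_loss(outputs, labels, weights)
print(f"combined loss: {total:.4f}")
```

Down-weighting the auxiliary losses is one way to keep the primary task dominant while still letting the shared encoder benefit from emotion and sentiment supervision; the actual weighting scheme used in the study may differ.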
Sentiment analysis for hotel reviews in Turkish by using LLMs
(Institute of Electrical and Electronics Engineers Inc., 2024) Özdemir, Ata Onur; Giritli, Efe Batur; Can, Yekta Said
Sentiment analysis plays a pivotal role in consumer decision-making and service quality improvement within the hospitality industry. This study explores the application of Large Language Models (LLMs) to sentiment analysis of Turkish hotel reviews, contributing to the understanding of customer feedback and satisfaction. We created a dataset of 5,000 reviews by translating an English corpus into Turkish and used it to evaluate the performance of a state-of-the-art Turkish language model, TURNA. The study demonstrates that LLMs, particularly TURNA, outperform traditional machine learning algorithms and other advanced models in sentiment classification, achieving an accuracy of 99.4%. This research underscores the potential of LLMs to enhance the accuracy of sentiment analysis, offering valuable insights for the tourism and hospitality sectors. The findings contribute to the ongoing evolution of sentiment analysis methodologies and suggest that LLMs can significantly improve the understanding and processing of customer feedback in Turkish hotel reviews.
Text-to-SQL: a methodical review of challenges and models
(TÜBİTAK, 2024-05-20) Kanburoğlu, Ali Buğra; Tek, Faik Boray
This survey focuses on Text-to-SQL, the automated translation of natural language queries into SQL queries. We first describe the problem and its main challenges. Then, following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them by the evaluation metrics and benchmarks they use. We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into the training times and implementations of different models. We also explore the availability of Text-to-SQL datasets in languages other than English. Additionally, we focus on large language model (LLM) based approaches to the Text-to-SQL task: we examine LLM-based studies in the literature and then evaluate LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for Text-to-SQL research, identifying potential areas of improvement and advancement in this field.
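Execution-based evaluation, one of the strategies the review discusses, judges a predicted query by running it and comparing its result with that of the gold query, instead of comparing SQL strings. The minimal sketch below uses Python's built-in `sqlite3` module; the toy `student` table, the query pairs, and the helper names `execution_match` and `execution_accuracy` are hypothetical. Ignoring row order is a simplifying assumption, and official benchmark evaluation scripts may treat ordering and other details differently.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if the predicted query runs and returns the same rows as the gold query."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order is ignored (simplifying assumption).
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Hypothetical toy database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, dept TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Ali", "Computer", 21), ("Ayşe", "Physics", 23), ("Can", "Computer", 22)],
)

pairs = [
    # Different SQL text, same result: string match rejects it, execution match accepts it.
    ("SELECT name FROM student WHERE age > 21",
     "SELECT name FROM student WHERE age >= 22"),
    # Wrong filter value: different result.
    ("SELECT name FROM student WHERE dept = 'Physics'",
     "SELECT name FROM student WHERE dept = 'Computer'"),
    # Syntax error in the prediction.
    ("SELEC name FROM student",
     "SELECT name FROM student"),
]
print(execution_accuracy(conn, pairs))
```

The first pair illustrates why execution-based metrics are attractive; the flip side is that on a small database two semantically different queries can coincidentally return the same rows, so execution matches can overestimate correctness.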
TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks
(Institute of Electrical and Electronics Engineers Inc., 2025) Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke
This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.
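The abstract evaluates semantic similarity with Pearson and Spearman correlations and trains with Matryoshka representation learning, whose aim is that a prefix of each embedding remains a usable representation on its own. The pure-Python sketch below shows how such an STS evaluation is commonly computed: cosine similarity between the two sentence embeddings of each pair, correlated with gold similarity scores, optionally after truncating the embeddings to their first `dim` coordinates. It does not implement Matryoshka training; the embeddings, gold scores, and function names are hypothetical toy values, not TurkEmbed outputs.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def evaluate_sts(pairs, gold, dim=None):
    """Correlate cosine similarities with gold scores.

    dim truncates each embedding to its first `dim` coordinates
    (Matryoshka-style); dim=None keeps the full vector.
    """
    preds = [cosine(u[:dim], v[:dim]) for u, v in pairs]
    return pearson(preds, gold), spearman(preds, gold)


# Hypothetical 4-dimensional sentence embeddings and gold scores on a 0-5 scale.
pairs = [
    ([1.0, 0.0, 0.2, 0.1], [0.9, 0.1, 0.0, 0.3]),
    ([0.0, 1.0, 0.5, 0.0], [1.0, 0.0, 0.0, 0.5]),
    ([0.5, 0.5, 0.0, 1.0], [0.4, 0.6, 0.1, 0.8]),
    ([1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]),
]
gold = [4.5, 0.5, 4.8, 2.5]

for d in (None, 2):
    p, s = evaluate_sts(pairs, gold, dim=d)
    print(f"dim={d}: pearson={p:.3f} spearman={s:.3f}")
```

Comparing the full-dimension and truncated scores is the kind of check that shows how much quality a Matryoshka-trained model retains when a smaller, faster embedding is used.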
TurkEmbed4Retrieval: A retrieval-specific embedding model for Turkish
(Institute of Electrical and Electronics Engineers Inc., 2025-08-15) Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke
In this study, we introduce TurkEmbed4Retrieval, which adapts the TurkEmbed model, originally developed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks, to retrieval tasks by fine-tuning it on the MS-Marco-TR dataset. The model is optimized using advanced training techniques such as Matryoshka representation learning and a custom-designed ranking loss over negative pairs. Comprehensive experiments show that TurkEmbed4Retrieval outperforms the TurkishcolBERT model by 19–26% on retrieval metrics on the Scifact-TR dataset. In this respect, our model sets a new benchmark for Turkish information retrieval systems.
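The abstract mentions a ranking loss built from negative pairs for fine-tuning on query-passage data. As an illustration only, the pure-Python sketch below assumes a standard in-batch formulation (similar in spirit to a multiple-negatives ranking loss): for each query, its own passage is the positive and every other passage in the batch serves as a negative, and the loss is the cross-entropy of the scaled cosine scores. The paper's custom variant may differ; the function name, the `scale` value of 20, and the toy vectors are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_ranking_loss(queries, passages, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i
    and every other passage in the batch acts as a negative.

    scale is an assumed temperature-like factor applied to cosine scores.
    """
    total = 0.0
    for i, q in enumerate(queries):
        scores = [scale * cosine(q, p) for p in passages]
        m = max(scores)  # stabilize the log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]  # -log softmax(scores)[i]
    return total / len(queries)


# Hypothetical query and passage embeddings; aligned[i] is relevant to queries[i].
queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
aligned = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.3]]
swapped = [aligned[1], aligned[0]]  # positives deliberately mismatched

print(f"correct pairing:    {in_batch_ranking_loss(queries, aligned):.4f}")
print(f"mismatched pairing: {in_batch_ranking_loss(queries, swapped):.4f}")
```

Because every other passage in the batch acts as a negative, larger batches supply more negatives per query without extra labeling; explicitly mined hard negatives can be appended to `passages` in the same way.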
