Search Results

Showing 1 - 10 of 12
  • Publication
    Graph clustering approach to sentiment analysis
    (Işık Üniversitesi, 2018-01-24) Kanburoğlu, Ali Buğra; Solak, Ercan; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı
    In this thesis, we aim to automatically predict Turkish movie review scores using adjective clustering. We also measure the reliability of two popular sentiment lexicons. To measure the agreement between these sentiment lexicons and human judgments, we designed a ranking experiment using pairwise comparisons. Comparing the lexicons with the human judgments, we report results that show a moderate level of agreement between them. Furthermore, we performed adjective clustering and singleton scoring to automatically assign scores to Turkish movie reviews. Adjective clustering reached an accuracy of 76%, and singleton scoring reached an accuracy of 79%.
  • Publication
    A haar classifier based call number detection and counting method for library books
    (IEEE, 2018-12-06) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    Counting and organizing books in libraries is a routine and time-consuming task. The task is further complicated by books misplaced on the shelves. To solve these problems, we propose an automated visual call number (book-id) detection and counting system in this paper. The method employs a Haar feature-based classifier from the OpenCV library and a cloud-based OCR system to decode characters from images. To develop and test the method, we acquired and organized a dataset of 1000 book call numbers. The proposed method was tested on 20 bookshelf images containing 233 call numbers, which resulted in a true detection rate of 96% and a false detection rate of 1.75 per image. For the OCR step, the number of falsely recognized characters per call number was 0.76.
  • Publication
    A new approach for named entity recognition
    (IEEE, 2017) Ertopçu, Burak; Kanburoğlu, Ali Buğra; Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
    Many sentences create certain impressions on people. These impressions help the reader gain insight into the sentence via some entities. In NLP, this process corresponds to Named Entity Recognition (NER). NLP algorithms can trace many entities in a sentence, such as person, location, date, time or money. One of the major problems in these operations is confusion about whether a word denotes the name of a person, a location or an organisation, or whether an integer stands for a date, time or money. In this study, we design a new model for NER algorithms. We train this model on our predefined dataset and compare the results with other models. In the end, we obtain considerable outcomes on a dataset containing 1400 sentences.
  • Publication
    An experimental evaluation of prior polarities in sentiment lexicons
    (IEEE, 2017) Kanburoğlu, Ali Buğra; Solak, Ercan
    We present the results of an experiment to assess the validity of prior polarities available in sentiment lexicons. We designed a ranking task, elicited through pairwise comparisons, and compared the results to those predicted by two popular sentiment lexicons. The experimental results show a moderate level of agreement between the lexicons and human judgments.
  • Publication
    Shallow parsing in Turkish
    (IEEE, 2017) Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Kanburoğlu, Ali Buğra; Ertopçu, Burak; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
    In this study, shallow parsing is applied to Turkish sentences. These sentences are used to train and test the performance of various learning algorithms with various features specified for shallow parsing in Turkish.
  • Publication
    Bulanık mantık kullanılarak sese duyarlı aydınlatma [Sound-sensitive lighting using fuzzy logic]
    (IEEE, 2017-10-31) Kanburoğlu, Ali Buğra; Şaşmaz, Emre
    With the advance of industrialization and technology, problems that could not be solved in the past have become easier to solve. The working mechanism of the human brain has begun to be implemented on computers through various methods, giving rise to the field of artificial intelligence (AI). As AI techniques have come into widespread use, solutions have been offered to problems in every area of science. This study addresses fuzzy logic (FL), one of the techniques of AI. Using FL, the lighting system in the common areas of libraries is modeled so that it responds to sound.
  • Publication
    All-words word sense disambiguation for Turkish
    (IEEE, 2017) Açıkgöz, Onur; Gürkan, Ali Tunca; Ertopçu, Burak; Topsakal, Ozan; Özenç, Berke; Kanburoğlu, Ali Buğra; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
    Identifying the sense of a word within a context is a challenging problem and has many applications in natural language processing. This assignment problem is called word sense disambiguation (WSD). Many papers in the literature focus on the English language and English data. Our dataset consists of 1400 sentences translated to Turkish from the Penn Treebank Corpus. This paper discusses six different feature extraction methods and their classification performance using C4.5, Random Forests, Rocchio, Naive Bayes, KNN, linear perceptron and multilayer perceptron classifiers. It examines how the described features perform on a morphologically rich language (Turkish) with several classifiers.
  • Publication
    TUR2SQL: A cross-domain Turkish dataset for Text-to-SQL
    (IEEE, 2023-09-15) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    The field of converting natural language into corresponding SQL queries using deep learning techniques has attracted significant attention in recent years. While existing Text-to-SQL datasets primarily focus on English and other languages such as Chinese, there is a lack of resources for the Turkish language. In this study, we introduce the first publicly available cross-domain Turkish Text-to-SQL dataset, named TUR2SQL. This dataset consists of 10,809 pairs of natural language statements and their corresponding SQL queries. We conducted experiments using SQLNet and ChatGPT on the TUR2SQL dataset. The experimental results show that SQLNet has limited performance and ChatGPT has superior performance on the dataset. We believe that TUR2SQL provides a foundation for further exploration and advancements in Turkish language-based Text-to-SQL research.
  • Publication
    TURSpider: a Turkish Text-to-SQL dataset and LLM-based study
    (Institute of Electrical and Electronics Engineers Inc., 2024-11-25) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
  • Publication
    Large language model based automated translation of natural language to SQL
    (Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, 2025-01-22) Kanburoğlu, Ali Buğra; Tek, Faik Boray; Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programı; Işık University, School of Graduate Studies, Ph.D. in Computer Engineering
    The field of Text-to-SQL, which involves converting natural language into SQL queries, has seen significant advancements, but challenges remain, particularly for low-resource languages like Turkish. This thesis introduces three key contributions to address these challenges. Our first contribution is the development and open-access release of TUR2SQL, the first cross-domain Turkish Text-to-SQL dataset, which consists of 10,809 natural language sentences paired with their corresponding SQL queries. We evaluate the performance of SQLNet, a deep learning model specifically designed for this task, and one of the most successful Large Language Models (LLMs), ChatGPT, on this dataset. The results demonstrate the superior performance of ChatGPT. The second major contribution is the construction and public release of TURSpider, the most extensive Turkish Text-to-SQL dataset. TURSpider is built by translating the widely used cross-domain Spider dataset from English to Turkish. This dataset includes complex queries with varying difficulty levels, facilitating the training and comparison of large language models for Turkish Text-to-SQL tasks. Our comparative analysis shows that fine-tuned Turkish LLMs achieve competitive performance, with some models surpassing OpenAI models in query accuracy. To further enhance performance, we apply the Chain-of-Feedback (CoF) methodology, demonstrating its effectiveness across multiple models. Finally, we explore the Mixture-of-Agents (MoA) framework, which combines outputs from multiple models to improve the performance of open-source LLMs for Text-to-SQL tasks. By integrating MoA with the CoF technique, we propose MoAF-SQL, an approach that significantly improves performance, particularly on complex queries. Our experiments show that MoAF-SQL achieves competitive results, highlighting its potential to enhance the Text-to-SQL capabilities of open-source LLMs.