Search Results

Showing 1 - 4 of 4
  • Publication
    Evaluating the English-Turkish parallel treebank for machine translation
    (TÜBİTAK, 2022-01-19) Görgün, Onur; Yıldız, Olcay Taner
    This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences vary in length from 15 to 50 tokens, including punctuation. We constrained the translation of trees by (i) reordering leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. To bridge the morphological and syntactic gap between the languages, we perform morphological annotation and disambiguation. We also apply our heuristics to create the Nokia English-Turkish Treebank (NTB), which addresses technical document translation tasks. NTB includes 8.3K sentences of varying lengths. We validate the corpus both extrinsically and intrinsically, and report our evaluation results from perplexity analysis and translation tasks. The results indicate that our heuristics are promising in terms of perplexity and are suitable for translation tasks in terms of BLEU scores.
  • Publication
    ISIKSumm at BioLaySumm task 1: BART-based summarization system enhanced with Bio-entity labels
    (Association for Computational Linguistics (ACL), 2023-07-13) Çolak, Çağla; Karadeniz, İlknur
    Communicating scientific research to the general public is an essential yet challenging task. Lay summaries, which provide a simplified version of research findings, can bridge the gap between scientific knowledge and public understanding. The BioLaySumm task (Goldsack et al., 2023) is a shared task that seeks to automate this process by generating lay summaries from biomedical articles. The task organizers provide two datasets curated from two biomedical journals (PLOS and eLife). As a participant in this shared task, we developed a system to generate a lay summary from an article’s abstract and main text.
  • Publication
    Application of ChatGPT in the tourism domain: potential structures and challenges
    (IEEE, 2023-12-23) Kılıçlıoğlu, Orkun Mehmet; Özçelik, Şuayb Talha; Yöndem, Meltem Turhan
    The tourism industry stands out as a sector where effective customer communication significantly influences sales and customer satisfaction. The recent shift from traditional natural language processing methodologies to state-of-the-art deep learning and transformer-based models has revolutionized the development of Conversational AI tools. These tools can provide comprehensive information about a company's product portfolio, enhancing customer engagement and decision-making. One potential Conversational AI application can be developed with ChatGPT. In this study, we explore the potential of using ChatGPT, a cutting-edge Conversational AI, in the context of Setur's products and services, focusing on two distinct scenarios: intention recognition and response generation. We incorporate Setur-specific data, including hotel information and annual catalogs. Our research aims to present potential structures and strategies for utilizing Language Model-based systems, particularly ChatGPT, in the tourism domain. We investigate the advantages and disadvantages of three different architectures and evaluate whether a restrictive or more independent model would be suitable for our application. Despite the impressive performance of Large Language Models (LLMs) in generating human-like dialogues, their end-to-end application faces limitations, such as system prompt constraints, fine-tuning challenges, and model unavailability. Moreover, semantic search fails to deliver satisfactory performance when searching filters that require clear answers. To address these issues, we propose a hybrid approach that employs external interventions, the assignment of different GPT agents according to intent analysis, and traditional methods at specific junctures, which will facilitate the integration of domain knowledge into these systems.
  • Publication
    Text-to-SQL: a methodical review of challenges and models
    (TÜBİTAK, 2024-05-20) Kanburoğlu, Ali Buğra; Tek, Faik Boray
    This survey focuses on Text-to-SQL, the automated translation of natural language queries into SQL queries. Initially, we describe the problem and its main challenges. Then, by following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them with respect to the evaluation metrics and benchmarks used. We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and implementations of different models. We also explore the availability of Text-to-SQL datasets in non-English languages. Additionally, we focus on large language model (LLM) based approaches for the Text-to-SQL task, where we examine LLM-based studies in the literature and subsequently evaluate the LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for Text-to-SQL research, identifying potential areas of improvement and advancements in this field.
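
As a rough illustration of what execution-based evaluation of a Text-to-SQL model might look like, the sketch below runs a predicted query and a reference query against the same database and compares the returned rows. It is not taken from the survey: the helper names (execution_match, build_demo_db), the singer table, and the sample queries are all hypothetical, and real benchmarks may define a match differently.

```python
import sqlite3


def execution_match(db, predicted_sql, gold_sql):
    """Return True when both queries yield the same multiset of rows."""
    try:
        predicted_rows = db.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as a miss
    gold_rows = db.execute(gold_sql).fetchall()
    # Sort by repr so row order is ignored and mixed types compare safely.
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    db.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ada", 31), ("Bora", 45), ("Cem", 28)],
    )
    return db


db = build_demo_db()
gold = "SELECT name FROM singer WHERE age > 30"

# Differently worded SQL that returns the same rows as the reference.
same_result = execution_match(db, "SELECT name FROM singer WHERE NOT age <= 30", gold)
# Valid SQL that returns different rows.
wrong_result = execution_match(db, "SELECT name FROM singer WHERE age < 30", gold)
# SQL that references a column that does not exist.
broken_query = execution_match(db, "SELECT nam FROM singer", gold)

print(same_result, wrong_result, broken_query)
```

Comparing results rather than query strings credits a prediction that is worded differently from the reference but is semantically equivalent on the given data. The trade-off is that two genuinely different queries can coincide on a small database, so this kind of check is usually read alongside string-level or component-level matching.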