Search Results
Showing 1 - 10 of 14
Publication: Generative and discriminative methods using morphological information for sentence segmentation of Turkish (IEEE-INST Electrical Electronics Engineers Inc, 2009-07)
Güz, Ümit; Favre, Benoit; Hakkani Tür, Dilek; Tür, Gökhan
This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other sources. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features, extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. In order to capture non-lexical information, we extract a set of prosodic features, which are mainly motivated by our previous work on other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for the task of Turkish sentence segmentation.

Publication: Model adaptation for dialog act tagging (IEEE, 2006)
Tür, Gökhan; Güz, Ümit; Hakkani Tür, Dilek
In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus.
Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.

Publication: Comparison of Turkish proposition banks by frame matching (IEEE, 2018-12-06)
Ak, Koray; Bakay, Özge; Yıldız, Olcay Taner
By indicating semantic relations between a predicate and its associated participants in a sentence and identifying the role-bearing constituents, semantic role labeling (SRL) provides an extensive dataset to understand natural languages and to enhance several NLP applications such as information retrieval, machine translation, information extraction, and question answering. The availability of large resources and the development of statistical machine learning methods have increased the number of studies in the field of SRL. One of the widely used semantic resources applied to multiple languages is PropBank. In this paper, PropBanks built for Turkish are compared by checking semantic roles in the frame files of matched verb senses. As this integrated lexical resource for Turkish is intended for use in a multilingual resource along with English, the creation of an inclusive lexical resource for Turkish is of great importance.

Publication: A new speech modeling method: SYMPES (IEEE, 2006)
Güz, Ümit; Gürkan, Hakan; Yarman, Bekir Sıddık Binboğa
In this paper, a new speech modeling method called SYMPES is introduced and compared with commercially available methods.
It is shown that, for the same compression ratio or better, SYMPES yields considerably better hearing quality than coders such as G.726 at 16 kbps and voice-excited LPC-10E at 2.4 kbps.

Publication: A novel method to represent the speech signals by using language and speaker independent predefined functions sets (IEEE, 2004)
Güz, Ümit; Gürkan, Hakan; Yarman, Bekir Sıddık Binboğa
In this paper a new modeling method of speech signals is introduced. The proposed method is based on the generation of the so-called Predefined Signature S = {s_R(t)} and Envelope Function E = {e_K(t)} Sets (PSEFS). These function sets are independent of any speaker and any language. Once the speech signals are divided into frames of selected lengths, each frame x_i(t) is synthesized in the mathematical form x_i(t) = C_i e_K(t) s_R(t). In this representation, C_i is called the frame coefficient, and s_R(t) and e_K(t) are properly assigned from the PSEFS. It is shown that the proposed method provides fast reconstruction and substantial compression with acceptable hearing quality.

Publication: Automatic propbank generation for Turkish (Incoma Ltd, 2019-09)
Ak, Koray; Yıldız, Olcay Taner
Semantic role labeling (SRL) is an important task for understanding natural languages, where the objective is to analyse propositions expressed by the verb and to identify each word that bears a semantic role. It provides an extensive dataset to enhance NLP applications such as information retrieval, machine translation, information extraction, and question answering. However, creating SRL models is difficult. In some languages, it is even infeasible to create SRL models that have predicate-argument structure, due to a lack of linguistic resources. In this paper, we present our method to create an automatic Turkish PropBank by exploiting parallel data from the translated sentences of English PropBank. Experiments show that our method gives promising results.
© 2019 Association for Computational Linguistics (ACL).

Publication: TUR2SQL: A cross-domain Turkish dataset for Text-to-SQL (IEEE, 2023-09-15)
Kanburoğlu, Ali Buğra; Tek, Faik Boray
The field of converting natural language into corresponding SQL queries using deep learning techniques has attracted significant attention in recent years. While existing Text-to-SQL datasets primarily focus on English and other languages such as Chinese, there is a lack of resources for the Turkish language. In this study, we introduce the first publicly available cross-domain Turkish Text-to-SQL dataset, named TUR2SQL. This dataset consists of 10,809 pairs of natural language statements and their corresponding SQL queries. We conducted experiments using SQLNet and ChatGPT on the TUR2SQL dataset. The experimental results show that SQLNet has limited performance and ChatGPT has superior performance on the dataset. We believe that TUR2SQL provides a foundation for further exploration and advancements in Turkish language-based Text-to-SQL research.

Publication: Multi-task learning on mental disorder detection, sentiment analysis, and emotion detection using social media posts (Institute of Electrical and Electronics Engineers Inc., 2024)
Armah, Courage; Dehkharghani, Rahim
Mental disorders such as suicidal behavior, bipolar disorder, depressive disorders, and anxiety have been diagnosed among the youth recently. Social media platforms such as Reddit have become popular for anonymous posts. When they are anonymous, people are far more likely to share on these platforms what they really feel in their real lives. It is thus helpful to extract people's sentiments and feelings from these platforms when training models for mental disorder detection. This study uses multi-task learning techniques to examine the estimation of behaviors and mental states for early mental disease diagnosis.
We propose a multi-task system trained on three related tasks: mental disorder detection as the primary task, with emotion analysis and sentiment analysis as auxiliary tasks. We used the SWMH dataset, which was already labeled with four main mental disorders (bipolar, depression, anxiety, and suicide) and offmychest. We then added labels for emotion and sentiment to the dataset. The observed results are comparable to previous studies in the field and demonstrate that deep learning multi-task frameworks can improve the accuracy of related text classification tasks when compared to training them separately as single-task systems.

Publication: Assessing ChatGPT's accuracy in dyslexia inquiry (Institute of Electrical and Electronics Engineers Inc., 2024)
Eroğlu, Günet; Harb, Mhd Raja Abou
Dyslexia poses challenges in accessing reliable information, crucial for affected individuals and their families. Leveraging chatbot technology offers promise in this regard. This study evaluates the OpenAI Assistant's precision in addressing dyslexia-related inquiries. Three hundred questions commonly posed by parents were categorized and presented to the Assistant. Expert evaluation of responses, graded on accuracy and completeness, yielded consistently high scores (median=5). Descriptive questions scored higher (average=4.9568) than yes/no questions (average=4.8957), indicating potential response challenges. Statistical analysis highlighted the significance of question specificity in response quality. Despite occasional difficulties, the Assistant demonstrated adaptability and reliability in providing accurate dyslexia-related information.

Publication: Sentiment analysis for hotel reviews in Turkish by using LLMs (Institute of Electrical and Electronics Engineers Inc., 2024)
Özdemir, Ata Onur; Giritli, Efe Batur; Can, Yekta Said
The field of sentiment analysis plays a pivotal role in consumer decision-making and service quality improvement within the hospitality industry.
This study explores the application of Large Language Models (LLMs) for sentiment analysis of Turkish hotel reviews, contributing to the understanding of customer feedback and satisfaction. We created a dataset of 5,000 reviews by translating an English corpus into Turkish, which was then utilized to evaluate the performance of a state-of-the-art Turkish language model, TURNA. The study demonstrates that LLMs, particularly TURNA, outperform traditional machine learning algorithms and other advanced models in sentiment classification tasks, achieving an accuracy of 99.4%. This research underscores the potential of LLMs to enhance the accuracy of sentiment analysis, offering valuable insights for the tourism and hospitality sectors. The findings contribute to the ongoing evolution of sentiment analysis methodologies and suggest that LLMs can significantly improve the understanding and processing of customer feedback in Turkish hotel reviews.












