Search Results
Showing 1 - 5 of 5
Publication: Parallel proposition bank construction for Turkish (Işık Üniversitesi, 2019-04-02)
Ak, Koray; Yıldız, Olcay Taner; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Doktora Programı
PropBank is a bank of propositions: a hand-annotated corpus of predicate-argument information and semantic roles (arguments). It aims to provide an extensive dataset for enhancing NLP applications such as information retrieval, machine translation, information extraction, and question answering by adding a semantic information layer to the syntactic annotation. The added semantic layer enables syntactic parser refinements, which increase efficiency and improve application performance. The aim of this thesis is to construct a proposition bank for the Turkish language. Only preliminary studies have been carried out on a Turkish PropBank, and this study is one of the pioneers for the language. In this study, a hand-annotated Turkish PropBank is constructed from the translation of the parallel English PropBank corpus; other PropBank studies for Turkish are examined and compared with the constructed proposition bank; automatic PropBank construction for Turkish from both parallel sentence trees and phrase sentences is analyzed; and automatic proposition banks are generated for Turkish.

Publication: A tree-based approach for English-to-Turkish translation (Tubitak Scientific & Technical Research Council Turkey, 2019)
Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner
In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to get the target side (Turkish) trees from source side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words.
Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, yields a 67% relative improvement, from a baseline of 12.8 to 21.4 BLEU, averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed.

Publication: A FST description of noun and verb morphology of Azarbaijani Turkish (Association for Computational Linguistics (ACL), 2021)
Ehsani, Razieh; Özenç, Berke; Solak, Ercan; Drewes F.
We give a FST description of nominal and finite verb morphology of Azarbaijani Turkish. We use a hybrid approach where nominal inflection is expressed as a slot-based paradigm and major parts of verb inflection are expressed as optional paths on the FST. We collapse adjective and noun categories into a single nominal category, as they behave similarly as far as their paradigms are concerned. Thus, we defer a more precise identification of POS to further down the NLP pipeline.

Publication: Automatic propbank generation for Turkish (Incoma Ltd, 2019-09)
Ak, Koray; Yıldız, Olcay Taner
Semantic role labeling (SRL) is an important task for understanding natural languages, where the objective is to analyse propositions expressed by the verb and to identify each word that bears a semantic role. It provides an extensive dataset to enhance NLP applications such as information retrieval, machine translation, information extraction, and question answering. However, creating SRL models is difficult. For some languages, it is even infeasible to create SRL models that have predicate-argument structure, due to a lack of linguistic resources. In this paper, we present our method to create an automatic Turkish PropBank by exploiting parallel data from the translated sentences of English PropBank. Experiments show that our method gives promising results.
© 2019 Association for Computational Linguistics (ACL).

Publication: An open, extendible, and fast Turkish morphological analyzer (Incoma Ltd, 2019-09)
Yıldız, Olcay Taner; Avar, Begüm; Ercan, Gökhan
In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: finite state transducer, rule engine for suffixation, lexicon, trie data structure, and LRU cache. We use Java to implement the finite state machine logic and rule engine, and XML to describe the finite state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Empowered with a comprehensive lexicon of 54,000 bare-forms including 19,000 proper nouns, our morphological analyzer is amongst the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
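
As a rough illustration of how a trie and an LRU cache can work together in an analyzer of this kind, the sketch below looks up candidate bare-forms for a surface word in a trie and caches the answer per word. It is not the authors' implementation (which the abstract describes as Java with XML rule files): the class names `Trie`, `LRUCache`, and `CandidateFinder`, the three-entry lexicon, and the cache size are all assumptions made for the example, and no suffixation rules or finite state transducer are modeled.

```python
from collections import OrderedDict


class Trie:
    """Prefix tree over the bare forms of a toy lexicon (hypothetical)."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes_of(self, surface):
        """Return every lexicon entry that is a prefix of `surface`."""
        node, found = self, []
        for i, ch in enumerate(surface):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                found.append(surface[: i + 1])
        return found


class LRUCache:
    """Least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class CandidateFinder:
    """Finds candidate bare forms for a word, caching results per word."""

    def __init__(self, lexicon, cache_size=2):
        self.trie = Trie()
        for root in lexicon:
            self.trie.insert(root)
        self.cache = LRUCache(cache_size)
        self.trie_lookups = 0  # counts cache misses, for illustration only

    def candidates(self, surface):
        cached = self.cache.get(surface)
        if cached is not None:
            return cached
        self.trie_lookups += 1
        result = self.trie.prefixes_of(surface)
        self.cache.put(surface, result)
        return result


# Toy lexicon (assumed): "ev", "evlen", "kitap".
finder = CandidateFinder(["ev", "evlen", "kitap"])
roots = finder.candidates("evlendi")
print(roots)
```

In a real corpus the same word forms recur often, so a cache keyed on the surface word can skip repeated analysis, while the trie bounds each uncached lookup by the length of the word rather than the size of the lexicon. Whether the published analyzer caches at this granularity is not stated in the abstract.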