Search Results

Showing 1 - 6 of 6
  • Publication
    Construction of a Turkish proposition bank
    (Tubitak Scientific & Technical Research Council Turkey, 2018) Ak, Koray; Toprak, Cansu; Esgel, Volkan; Yıldız, Olcay Taner
    This paper describes our approach to developing the Turkish PropBank by adopting the semantic role-labeling guidelines of the original PropBank and using the translation of the English Penn-TreeBank as a resource. We discuss the semantic annotation process of the PropBank and language-specific cases for Turkish, the tools we have developed for annotation, and quality control for multiuser annotation. In the current phase of the project, more than 9500 sentences have been semantically analyzed and predicate-argument information has been extracted for 1330 verbs and 1914 verb senses. Our plan is to annotate 17,000 sentences by the end of 2017.
  • Publication
    A tree-based approach for English-to-Turkish translation
    (Tubitak Scientific & Technical Research Council Turkey, 2019) Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner
    In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to get the target side (Turkish) trees from source side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement from a baseline 12.8 to 21.4 BLEU, all averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed.
  • Publication
    Automatic propbank generation for Turkish
    (Incoma Ltd, 2019-09) Ak, Koray; Yıldız, Olcay Taner
    Semantic role labeling (SRL) is an important task for understanding natural languages, where the objective is to analyse propositions expressed by the verb and to identify each word that bears a semantic role. It provides an extensive dataset to enhance NLP applications such as information retrieval, machine translation, information extraction, and question answering. However, creating SRL models is difficult. For some languages, it is even infeasible to create SRL models that have predicate-argument structure, due to a lack of linguistic resources. In this paper, we present our method to create an automatic Turkish PropBank by exploiting parallel data from the translated sentences of English PropBank. Experiments show that our method gives promising results. © 2019 Association for Computational Linguistics (ACL).
  • Publication
    English-Turkish parallel treebank with morphological annotations and its use in tree-based SMT
    (SciTePress, 2016) Görgün, Onur; Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh
    In this paper, we report our tree-based statistical translation study from English to Turkish. We describe our data generation process and report the initial results of tree-based translation under a simple model. For corpus construction, we used the Penn Treebank on the English side. We manually translated about 5K trees from English to Turkish under grammar constraints, with adaptations to accommodate the agglutinative nature of Turkish morphology. We used a permutation model for subtrees together with a word-to-word mapping. We report BLEU scores under simple choices of inference algorithms.
  • Publication
    An open, extendible, and fast Turkish morphological analyzer
    (Incoma Ltd, 2019-09) Yıldız, Olcay Taner; Avar, Begüm; Ercan, Gökhan
    In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: finite state transducer, rule engine for suffixation, lexicon, trie data structure, and LRU cache. We use Java to implement the finite state machine logic and rule engine, and XML to describe the finite state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Equipped with a comprehensive lexicon of 54,000 bare-forms including 19,000 proper nouns, our morphological analyzer is amongst the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
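
The last abstract above attributes the analyzer's speed to a trie over the lexicon combined with an LRU cache. The following is only a minimal sketch of how those two structures might cooperate, not the authors' Java implementation: the names `Trie`, `LRUCache`, and `candidate_roots` are hypothetical, the three-word lexicon is made up for illustration, and the finite state transducer and suffixation rules are left out entirely. It also ignores phonological alternations in roots, which a real Turkish analyzer would have to handle.

```python
from collections import OrderedDict


class Trie:
    """Prefix tree holding lexicon bare-forms (hypothetical helper)."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes_of(self, surface):
        """Return every lexicon entry that is a prefix of `surface`."""
        node, found = self, []
        for i, ch in enumerate(surface):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                found.append(surface[: i + 1])
        return found


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry


def candidate_roots(surface, lexicon, cache):
    """Look up candidate roots for a surface form, consulting the cache first."""
    cached = cache.get(surface)
    if cached is not None:
        return cached
    roots = lexicon.prefixes_of(surface)
    cache.put(surface, roots)
    return roots


# Toy lexicon, assumed for illustration only.
lexicon = Trie()
for root in ["ev", "evlen", "kitap"]:
    lexicon.insert(root)

cache = LRUCache(capacity=2)
print(candidate_roots("evlendi", lexicon, cache))  # ['ev', 'evlen']
```

In this sketch the trie narrows a surface form to its possible roots in a single left-to-right pass, and the cache lets frequently repeated word forms skip that pass; in a full analyzer the cached value would presumably be the complete set of morphological parses rather than just the root candidates.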