Search Results

Showing 1 - 7 of 7
  • Publication
    Construction of a Turkish proposition bank
    (Tubitak Scientific & Technical Research Council Turkey, 2018) Ak, Koray; Toprak, Cansu; Esgel, Volkan; Yıldız, Olcay Taner
    This paper describes our approach to developing the Turkish PropBank by adopting the semantic role labeling guidelines of the original PropBank and using the translation of the English Penn Treebank as a resource. We discuss the semantic annotation process of the PropBank, language-specific cases for Turkish, the tools we developed for annotation, and quality control for multi-user annotation. In the current phase of the project, more than 9,500 sentences have been semantically analyzed, and predicate-argument information has been extracted for 1,330 verbs and 1,914 verb senses. We plan to annotate 17,000 sentences by the end of 2017.
  • Publication
    English-Turkish parallel semantic annotation of Penn-Treebank
    (Oficyna Wydawnicza Politechniki Wroclawskiej, 2020) Arıcan, Bilge Nas; Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner; Ergelen, Özlem
    This paper reports our efforts in constructing a sense-labeled English-Turkish parallel corpus using the traditional method of manual tagging. We tagged a pre-built parallel treebank which was translated from the Penn Treebank corpus. This approach allowed us to generate a resource combining syntactic and semantic information. We provide statistics about the corpus itself as well as information regarding its development process.
  • Publication
    On building the largest and cross-linguistic Turkish dependency corpus
    (Institute of Electrical and Electronics Engineers Inc., 2020-10-15) Kuzgun, Aslı; Cesur, Neslihan; Arıcan, Bilge Nas; Özçelik, Merve; Marşan, Büşra; Kara, Neslihan; Aslan, Deniz Baran; Yıldız, Olcay Taner
    In this paper, we introduce the dependency annotation process of the largest, and the only cross-linguistic, Turkish dependency treebank, which was translated from the original Penn Treebank corpus. Within the scope of this project, 16,400 sentences have been morphologically and semantically annotated, and their dependency relations were annotated manually by a team of linguists. We hope that this project will serve as a basis for a successful dependency parser and for a system that can automatically convert between constituency and dependency trees in both directions.
  • Publication
    Chunking in Turkish with conditional random fields
    (Springer-Verlag, 2015-04-14) Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
    In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
  • Publication
    Constructing a Turkish-English parallel treebank
    (Association for Computational Linguistics (ACL), 2014) Yıldız, Olcay Taner; Solak, Ercan; Görgün, Onur; Ehsani, Razieh
    In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. In the corpus, we manually generated parallel trees for about 5,000 sentences from Penn Treebank. English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained the translated trees to the reordering of the children and the replacement of the leaf nodes with appropriate glosses. We also report the tools that we built and used in our tree translation task.
  • Publication
    An all-words sense annotated Turkish corpus
    (IEEE, 2018-06-06) Akçakaya, Sinan; Yıldız, Olcay Taner
    This paper reports our efforts in constructing a sense-labeled Turkish corpus with respect to the Turkish Language Institution's dictionary, using the traditional method of manual tagging. We tagged a pre-built parallel treebank that was translated from the Penn Treebank II corpus. This approach allowed us to generate a full-coverage resource in which syntactic and semantic information are merged. We also provide miscellaneous statistics about the corpus itself as well as its development process.
  • Publication
    Creating a syntactically felicitous constituency treebank for Turkish
    (Institute of Electrical and Electronics Engineers Inc., 2020-10-15) Kara, Neslihan; Marşan, Büşra; Özçelik, Merve; Arıcan, Bilge Nas; Kuzgun, Aslı; Cesur, Neslihan; Aslan, Deniz Baran; Yıldız, Olcay Taner
    In this study, the work of Bakay et al. [1] and Yildiz et al. [2] on Turkish constituency treebanks is developed further. Compared to previous work, the most prominent feature of this study is that every annotation and refinement step is carried out manually. In addition, the constituency treebank created in this study abides by the syntactic rules and typological features of Turkish, whereas the trees created in previous studies consist only of translated, simply inverted trees that completely ignore the syntactic properties of Turkish. The methodology followed in this study results in a significantly more accurate representation of the Turkish language and in simpler, relatively flatter trees. The straightforward style of these trees reduces complexity and offers a better training dataset for learning algorithms.
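The 2014 parallel treebank entry states that translated trees were constrained to two operations: reordering a node's children and replacing leaf nodes with glosses. The Python sketch below illustrates that constraint on a toy tree. The `Node` class, the `translate` and `leaves` helpers, and the choice to key child orders by phrase label are hypothetical illustrations, not the authors' tools; in the described workflow a human translator decides the order and gloss for each individual node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                    # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                    # set only on leaves

    def is_leaf(self) -> bool:
        return self.word is not None


def translate(node: Node,
              orders: Dict[str, Tuple[int, ...]],
              glosses: Dict[str, str]) -> Node:
    """Build a translated copy using only the two permitted edits:
    permuting a node's children and replacing a leaf's word."""
    if node.is_leaf():
        # Leaf: keep the POS tag, swap in the gloss (or keep the word if none is given).
        return Node(node.label, word=glosses.get(node.word, node.word))
    kids = [translate(c, orders, glosses) for c in node.children]
    # Hypothetical simplification: one child order per label; identity if unlisted.
    perm = orders.get(node.label, tuple(range(len(kids))))
    if sorted(perm) != list(range(len(kids))):
        raise ValueError(f"{perm} is not a permutation of {len(kids)} children")
    return Node(node.label, [kids[i] for i in perm])


def leaves(node: Node) -> List[str]:
    """Left-to-right yield (the sentence) of a tree."""
    if node.is_leaf():
        return [node.word]
    return [w for c in node.children for w in leaves(c)]


# "John saw Mary": moving the object NP before the verb gives Turkish SOV order.
en = Node("S", [
    Node("NP", [Node("NNP", word="John")]),
    Node("VP", [Node("VBD", word="saw"),
                Node("NP", [Node("NNP", word="Mary")])]),
])
tr = translate(en, orders={"VP": (1, 0)},
               glosses={"saw": "gördü", "Mary": "Mary'yi"})
print(" ".join(leaves(tr)))  # John Mary'yi gördü
```

Because every edit in this sketch is either a permutation or a leaf substitution, the English and Turkish trees keep identical bracketing and labels, so each Turkish node still corresponds one-to-one with an English node.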
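The 2015 chunking entry describes identifying and labeling chunks in the Turkish translations from the tags already present in the trees, then training a CRF on the result. The sketch below covers only the labeling step, under an assumed rule: a word's chunk is the constituent a fixed number of levels below the root, encoded in the common B-/I- (IOB) scheme, with the depth standing in for "levels of chunk resolution". The tuple encoding, the `words` and `iob_chunks` helpers, and the depth rule are illustrative assumptions; the paper's exact chunk definitions are not given here.

```python
from typing import List, Tuple, Union

# Toy encoding (an assumption): a leaf is (POS tag, word); a phrase is (label, [subtrees]).
Tree = Tuple[str, Union[str, list]]


def words(t: Tree) -> List[str]:
    """Left-to-right words under a subtree."""
    label, body = t
    if isinstance(body, str):
        return [body]
    return [w for c in body for w in words(c)]


def iob_chunks(t: Tree, depth: int = 1) -> List[Tuple[str, str]]:
    """Tag each word B-X / I-X, where X is the label of the constituent
    `depth` levels below the root that contains it (a leaf reached
    earlier is its own chunk). A larger depth gives finer chunks."""
    label, body = t
    if depth == 0 or isinstance(body, str):
        return [(w, ("B-" if i == 0 else "I-") + label)
                for i, w in enumerate(words(t))]
    return [pair for c in body for pair in iob_chunks(c, depth - 1)]


# Turkish side of a translated tree for "John Mary'yi gördü" (John saw Mary).
tr_tree = ("S", [("NP", [("NNP", "John")]),
                 ("VP", [("NP", [("NNP", "Mary'yi")]), ("VBD", "gördü")])])

for d in (1, 2):
    print(d, iob_chunks(tr_tree, d))
```

The resulting (word, label) sequences are the kind of training data a linear-chain CRF toolkit such as CRFsuite consumes; feature extraction and CRF training are omitted from this sketch.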