Arama Sonuçları

Listeleniyor 1 - 4 / 4
  • Yayın
    Constructing a Turkish constituency parse treeBank
    (Springer Verlag, 2016) Yıldız, Olcay Taner; Solak, Ercan; Çandır, Şemsinur; Ehsani, Razieh; Görgün, Onur
    In this paper, we describe our initial efforts for creating a Turkish constituency parse treebank by utilizing the English Penn Treebank. We employ a semiautomated approach for annotation. In our previouswork [18], the English parse trees were manually translated to Turkish. In this paper, the words are semi-automatically annotated morphologically. As a second step, a rule-based approach is used for refining the parse trees based on the morphological analyses of the words. We generated Turkish phrase structure trees for 5143 sentences from Penn Treebank that contain fewer than 15 tokens. The annotated corpus can be used in statistical natural language processing studies for developing tools such as constituency parsers and statistical machine translation systems for Turkish.
  • Yayın
    A novel approach to morphological disambiguation for Turkish
    (Springer-Verlag, 2012) Görgün, Onur; Yıldız, Olcay Taner
    In this paper, we propose a classification based approach to the morphological disambiguation for Turkish language. Due to complex morphology in Turkish, any word can get unlimited number of affixes resulting very large tag sets. The problem is defined as choosing one of parses of a word not taking the existing root word into consideration. We trained our model with well-known classifiers using WEKA toolkit and tested on a common test set. The best performance achieved is 95.61% by J48 Tree classifier.
  • Yayın
    Chunking in Turkish with conditional random fields
    (Springer-Verlag, 2015-04-14) Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
    In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
  • Yayın
    Constructing a Turkish-English parallel treebank
    (Association for Computational Linguistics (ACL), 2014) Yıldız, Olcay Taner; Solak, Ercan; Görgün, Onur; Ehsani, Razieh
    In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. In the corpus, we manually generated parallel trees for about 5,000 sentences from Penn Treebank. English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained the translated trees to the reordering of the children and the replacement of the leaf nodes with appropriate glosses. We also report the tools that we built and used in our tree translation task.