Arama Sonuçları

Listeleniyor 1 - 9 / 9
  • Yayın
    İngilizce-Türkçe istatistiksel makine çevirisinde biçimbilim kullanımı
    (IEEE, 2012-04-18) Görgün, Onur; Yıldız, Olcay Taner
    Bu çalışmada, İngilizce-Türkçe dil ikilisi için biçimbilimsel çözümleme yardımı ile SIU dermecesi üzerinde istatistiksel makine çevirisi denemeleri yapılmıştır. Kelime biçimlerinin baz alındığı çeviri denemeleri İngilizce-Türkçe dil ikilisi gibi biçimbilimsel ve çekimsel olarak birbirinden uzak diller için düşük performans göstermektedir. Bu durumda, çeviri temel birimi olarak kelime formlarının yerine alt-sözcüksel temsiller kullanmak, makine çevirisi performansını önemli ölçüde arttırmaktadır.
  • Yayın
    Constructing a Turkish constituency parse treeBank
    (Springer Verlag, 2016) Yıldız, Olcay Taner; Solak, Ercan; Çandır, Şemsinur; Ehsani, Razieh; Görgün, Onur
    In this paper, we describe our initial efforts for creating a Turkish constituency parse treebank by utilizing the English Penn Treebank. We employ a semiautomated approach for annotation. In our previouswork [18], the English parse trees were manually translated to Turkish. In this paper, the words are semi-automatically annotated morphologically. As a second step, a rule-based approach is used for refining the parse trees based on the morphological analyses of the words. We generated Turkish phrase structure trees for 5143 sentences from Penn Treebank that contain fewer than 15 tokens. The annotated corpus can be used in statistical natural language processing studies for developing tools such as constituency parsers and statistical machine translation systems for Turkish.
  • Yayın
    A novel approach to morphological disambiguation for Turkish
    (Springer-Verlag, 2012) Görgün, Onur; Yıldız, Olcay Taner
    In this paper, we propose a classification based approach to the morphological disambiguation for Turkish language. Due to complex morphology in Turkish, any word can get unlimited number of affixes resulting very large tag sets. The problem is defined as choosing one of parses of a word not taking the existing root word into consideration. We trained our model with well-known classifiers using WEKA toolkit and tested on a common test set. The best performance achieved is 95.61% by J48 Tree classifier.
  • Yayın
    Chunking in Turkish with conditional random fields
    (Springer-Verlag, 2015-04-14) Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
    In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
  • Yayın
    Evaluating the English-Turkish parallel treebank for machine translation
    (TÜBİTAK, 2022-01-19) Görgün, Onur; Yıldız, Olcay Taner
    This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. English sentences vary in length: 15 to 50 tokens including punctuation. We constrained the translation of trees by (i) reordering of leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic human annotator's behavior in real translation task. In order to fill the morphological and syntactic gap between languages, we do morphological annotation and disambiguation. We also apply our heuristics by creating Nokia English-Turkish Treebank (NTB) to address technical document translation tasks. NTB also includes 8.3K sentences in varying lengths. We validate the corpus both extrinsically and intrinsically, and report our evaluation results regarding perplexity analysis and translation task results. Results prove that our heuristics yield promising results in terms of perplexity and are suitable for translation tasks in terms of BLEU scores.
  • Yayın
    Constructing a Turkish-English parallel treebank
    (Association for Computational Linguistics (ACL), 2014) Yıldız, Olcay Taner; Solak, Ercan; Görgün, Onur; Ehsani, Razieh
    In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. In the corpus, we manually generated parallel trees for about 5,000 sentences from Penn Treebank. English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained the translated trees to the reordering of the children and the replacement of the leaf nodes with appropriate glosses. We also report the tools that we built and used in our tree translation task.
  • Yayın
    Neural network as a forecasting tool for financial decision-making
    (Işık Üniversitesi, 2008-09-18) Görgün, Onur; Perdahçı, Nazım Ziya; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Enformasyon Teknolojileri Yüksek Lisans Programı
    For the last decade, machine learning techniques have been applied to financial tasks such as portfolio management, risk assessment and stock market prediction. Among these techniques artificial neural network as a machine learning algorithm is the most widely used model. In stock market environment, multi layer perceptron with backpropagation model is dominant among others in stock market prediction. This study examines the forecasting power of multi layer perceptron models for predicting the direction of ISE 100 daily index value. The results show that multi layer perceptron has a promising power in predicting the stock market trend. However, it also shows that selection of input variables is dominant factor in stock market prediction to obtain accurate results.
  • Yayın
    English-Turkish parallel treebank with morphological annotations and its use in tree-based SMT
    (SciTePress, 2016) Görgün, Onur; Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh
    In this paper, we report our tree based statistical translation study from English to Turkish. We describe our data generation process and report the initial results of tree-based translation under a simple model. For corpus construction, we used the Penn Treebank in the English side. We manually translated about 5K trees from English to Turkish under grammar constraints with adaptations to accommodate the agglutinative nature of Turkish morphology. We used a permutation model for subtrees together with a word to word mapping. We report BLEU scores under simple choices of inference algorithms.
  • Yayın
    English to Turkish machine translation using synchronous grammars
    (Işık Üniversitesi, 2022-06-14) Görgün, Onur; Tüysüz Erman, Ayşegül; Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Doktora Programı
    Machine translation (MT) has been one of the hot topics in NLP research over recent years. However, most of the related studies have been done for specific languages, and there are a limited number of comprehensive studies for languages with free word order, such as Turkish. English-Turkish is also one of the least frequently studied language pairs in translation due to the morphological and syntactic gaps between the two languages. This also makes it hard to build parallel corpora, which is crucial for the machine translation task. This thesis aims to be the first statistical syntax tree-based machine translation approach to the English-Turkish language pair, as well as a parallel corpus for translation tasks. We construct an English-Turkish parallel treebank of approximately 17K sentences by following a three-phased approach: manual transformation of English trees from Penn Treebank (PTB) by constraining the translated trees to the reordering of the children and gloss replacement; morphological analysis of the translated gloss; and morphological enrichment of the target tree. For translation consistency, we also developed a set of tools. We also apply the transformation schema to the closed-domain and build 8.3K sentences corpus. We employ both corpora on machine translation task. In our experiments, we obtained a 12.8 BLEU score in the open-domain and a 26.8 BLEU score in the closed-domain. We also evaluate both corpora intrinsically through perplexity analysis. The results show that our studies on making a corpus can be repeated, and studies on machine translation using the small corpus look promising.