Search Results
Publication: Use of morphology in English-Turkish statistical machine translation [İngilizce-Türkçe istatistiksel makine çevirisinde biçimbilim kullanımı] (IEEE, 2012-04-18)
Görgün, Onur; Yıldız, Olcay Taner
In this study, statistical machine translation experiments were conducted on the SIU corpus for the English-Turkish language pair with the help of morphological analysis. Translation experiments based on word forms perform poorly for language pairs such as English-Turkish, which are morphologically and inflectionally distant from each other. In such cases, using sub-lexical representations instead of word forms as the basic unit of translation significantly improves machine translation performance.

Publication: Unsupervised morphological analysis using tries (Springer London, 2012)
Ak, Koray; Yıldız, Olcay Taner
This article presents an unsupervised morphological analysis algorithm that segments words into roots and affixes. The algorithm relies on word occurrences in a given dataset. The target languages are English, Finnish, and Turkish, but the algorithm can segment any word from any language given wordlists acquired from a corpus consisting of words and word occurrences. In each iteration, the algorithm divides words with respect to occurrences and constructs a new trie for the remaining affixes. Preliminary experimental results on three languages show that our novel algorithm performs better than most of the previous algorithms.

Publication: A novel approach to morphological disambiguation for Turkish (Springer-Verlag, 2012)
Görgün, Onur; Yıldız, Olcay Taner
In this paper, we propose a classification-based approach to morphological disambiguation for the Turkish language. Due to the complex morphology of Turkish, a word can take an unlimited number of affixes, resulting in very large tag sets. The problem is defined as choosing one of the parses of a word without taking the existing root word into consideration. We trained our model with well-known classifiers using the WEKA toolkit and tested it on a common test set. The best performance achieved is 95.61%, by the J48 tree classifier.

Publication: Chunking in Turkish with conditional random fields (Springer-Verlag, 2015-04-14)
Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
In this paper, we report our work on chunking in Turkish. We used data that we generated by manually translating a subset of the Penn Treebank. We exploited the tags already available in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results at different levels of chunk resolution.

Publication: Building annotated parallel corpora using the ATIS dataset: two UD-style treebanks in English and Turkish (European Language Resources Association (ELRA), 2024-05-20)
Cesur, Neslihan; Kuzgun, Aslı; Köse, Mehmet; Yıldız, Olcay Taner
In this paper, we introduce the annotation process of the Air Travel Information Systems (ATIS) Dataset as a parallel treebank in English and in Turkish. The ATIS Dataset was originally compiled as pilot data to measure the efficiency of Spoken Language Systems, and it comprises transcriptions of human speech from people asking for flight information on automated inquiry systems. Our first annotated treebank, which is in English, includes 61,879 tokens (5,432 sentences), while the second treebank, which was translated into Turkish, contains 45,875 tokens for the same number of sentences. First, both treebanks were morphologically annotated through a semi-automatic process. Later, the dependency annotations were performed by a team of linguists according to the Universal Dependencies (UD) guidelines. These two parallel annotated treebanks provide a valuable contribution to language resources thanks to the spontaneous, spoken nature of the data and the availability of cross-linguistic dependency annotation.












