Search Results

Showing 1 - 2 of 2
  • Publication
    Generative and discriminative methods using morphological information for sentence segmentation of Turkish
    (IEEE-INST Electrical Electronics Engineers Inc, 2009-07) Güz, Ümit; Favre, Benoit; Hakkani Tür, Dilek; Tür, Gökhan
    This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other sources. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features, extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. In order to capture non-lexical information, we extract a set of prosodic features, which are mainly motivated by our previous work on other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for the task of Turkish sentence segmentation.
  • Publication
    Parallel univariate decision trees
    (Elsevier B.V., 2007-05-01) Yıldız, Olcay Taner; Dikmen, Onur
    Univariate decision tree algorithms are widely used in data mining because (i) they are easy to learn and (ii) once trained, they can be expressed in a rule-based manner. In several applications, mainly in data mining, the dataset to be learned is very large. In those cases it is highly desirable to construct univariate decision trees in reasonable time. This may be accomplished by parallelizing univariate decision tree algorithms. In this paper, we first present two different univariate decision tree algorithms: C4.5 and the univariate linear discriminant tree. We show how to parallelize these algorithms in three ways: (i) feature-based, (ii) node-based, and (iii) data-based. Experimental results show that the performance of the parallelizations depends highly on the dataset and that the node-based parallelization demonstrates good speedups.
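
To illustrate the feature-based option named in the abstract above, here is a minimal sketch of how the split search at a single tree node might be divided by feature. It is not the authors' implementation. The function names (`entropy`, `best_split_for_feature`, `parallel_best_split`) are hypothetical. Plain information gain stands in for the gain ratio used by C4.5. A thread pool is used only to keep the example short; threads in CPython do not speed up CPU-bound work, so a real version would likely use processes.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split_for_feature(rows, labels, feature):
    """Return (gain, feature, threshold) for the best binary split on one feature.

    Assumption: numeric features, candidate thresholds at midpoints between
    consecutive distinct values, scored by information gain (a simplification).
    """
    base = entropy(labels)
    best = (0.0, feature, None)
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[0]:
            best = (gain, feature, threshold)
    return best


def parallel_best_split(rows, labels, workers=4):
    """Feature-based parallelism: each task scores one feature, then the best wins."""
    n_features = len(rows[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = pool.map(
            lambda f: best_split_for_feature(rows, labels, f), range(n_features)
        )
        return max(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    rows = [[1.0, 7.0], [2.0, 3.0], [3.0, 7.0], [4.0, 3.0]]
    labels = ["a", "a", "b", "b"]
    print(parallel_best_split(rows, labels))
```

In this toy dataset only the first feature separates the two classes, so the search selects it. Node-based and data-based schemes would instead distribute whole subtrees or partitions of the rows across workers.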