Search Results

  • Publication
    Generative and discriminative methods using morphological information for sentence segmentation of Turkish
    (IEEE-INST Electrical Electronics Engineers Inc, 2009-07) Güz, Ümit; Favre, Benoit; Hakkani Tür, Dilek; Tür, Gökhan
    This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmenting Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that exploit lexical information, among other cues. However, Turkish has a productive morphology that generates a very large vocabulary, which makes the task much harder. In this paper, we introduce a new set of morphological features extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. To capture non-lexical information, we extract a set of prosodic features, mainly motivated by our previous work on other languages. We then combine discriminative classification techniques, boosting and conditional random fields (CRFs), with fHELM for the task of Turkish sentence segmentation.
  • Publication
    Co-training using prosodic, lexical and morphological information for automatic sentence segmentation of Turkish spoken language
    (Işık University, 2018-01-15) Dalva, Doğan; Güz, Ümit; Işık University, Graduate School of Science and Engineering, Electronics Engineering PhD Program
    Sentence segmentation of speech aims to detect sentence boundaries in the stream of words output by a speech recognizer. It is a preliminary step toward speech understanding and is particularly important for speech-related applications, since most subsequent processing steps, such as parsing, machine translation, and information extraction, assume the presence of sentence boundaries. Statistical methods typically require a large amount of manually labeled data, which is time-consuming and labor-intensive to prepare. In this work, novel multiview semi-supervised learning strategies are proposed for the sentence segmentation problem. The aim is to find effective semi-supervised machine learning strategies when only a small set of data labeled with sentence boundaries is available. The work proposes three-view co-training and committee-based strategies incorporating agreement, disagreement, and self-combined strategies that use lexical, morphological, and prosodic information, and evaluates the proposed strategies against baseline, self-training, and co-training approaches. The experimental results show that the proposed strategies substantially improve sentence segmentation performance, since the data sets can be represented by three redundantly sufficient and disjoint feature sets.
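    The 2009 paper above builds on hidden event language modeling (HELM), where a sentence boundary is treated as a hidden token that may occur between any two words. The following is only a minimal sketch of the plain, word-level idea, not the paper's factored fHELM: it assumes a word bigram model with add-one smoothing, decides each word gap independently instead of decoding the whole sequence, and uses no morphological or prosodic features. The class name `BigramHELM`, the `<s>` boundary token, and the toy training sentences are all hypothetical.

```python
import math
from collections import Counter

BOUNDARY = "<s>"  # hypothetical hidden-event token marking a sentence boundary


class BigramHELM:
    """Toy hidden event language model: word bigrams with add-one smoothing."""

    def __init__(self, sentences):
        self.history = Counter()  # counts of tokens that serve as bigram history
        self.bigrams = Counter()
        vocab = set()
        for sent in sentences:
            # Each training sentence is wrapped in boundary events.
            tokens = [BOUNDARY] + list(sent) + [BOUNDARY]
            vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, prev, word):
        """Add-one smoothed log P(word | prev)."""
        num = self.bigrams[(prev, word)] + 1
        den = self.history[prev] + self.vocab_size
        return math.log(num / den)

    def segment(self, words):
        """Insert BOUNDARY after a word when the path through the hidden event scores higher.

        Each gap is decided independently (greedy), not by Viterbi decoding.
        """
        out = []
        for i, word in enumerate(words):
            out.append(word)
            if i + 1 < len(words):
                nxt = words[i + 1]
                with_event = self.logprob(word, BOUNDARY) + self.logprob(BOUNDARY, nxt)
                without_event = self.logprob(word, nxt)
                if with_event > without_event:
                    out.append(BOUNDARY)
        return out


TRAIN = [
    ["ali", "geldi"], ["ayse", "gitti"], ["ali", "gitti"],
    ["ayse", "geldi"], ["ali", "geldi"], ["ayse", "gitti"],
]
print(BigramHELM(TRAIN).segment(["ali", "geldi", "ayse", "gitti"]))
# ['ali', 'geldi', '<s>', 'ayse', 'gitti']
```

    In a factored variant, the word tokens would be supplemented or backed off with morphological factors (for example stems or inflectional tags), which is the kind of extension the paper describes for coping with Turkish's large vocabulary.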
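    The 2018 thesis above describes three-view co-training over lexical, morphological, and prosodic features. The sketch below shows one plausible agreement-based variant, under clearly stated assumptions: each view is taught only with unlabeled examples on which the other two views agree with a confidence margin of at least `min_margin`, which also covers the case where the third view disagrees. A nearest-centroid classifier stands in for whatever learners the thesis actually used, and the committee-based and self-combined strategies are not shown. All function names, the margin threshold, and the one-dimensional toy features are hypothetical.

```python
def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in by_class.items()
    }


def predict(centroids, x):
    """Return (label, margin); margin = gap between the two smallest squared distances."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label)
        for label, c in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def three_view_cotrain(labeled, labels, unlabeled, rounds=5, min_margin=0.1):
    """Each example is a tuple of three feature vectors (e.g. lexical, morphological, prosodic)."""
    views = range(3)
    train_X = [[ex[v] for ex in labeled] for v in views]
    train_y = [list(labels) for _ in views]
    pool = list(unlabeled)
    for _ in range(rounds):
        models = [train_centroids(train_X[v], train_y[v]) for v in views]
        remaining = []
        for ex in pool:
            preds = [predict(models[v], ex[v]) for v in views]
            used = False
            for v in views:
                # The two other views teach view v when they agree confidently.
                (a, ma), (b, mb) = [preds[u] for u in views if u != v]
                if a == b and min(ma, mb) >= min_margin:
                    train_X[v].append(ex[v])
                    train_y[v].append(a)
                    used = True
            if not used:
                remaining.append(ex)
        if len(remaining) == len(pool):
            break  # no view learned anything new this round
        pool = remaining
    return [train_centroids(train_X[v], train_y[v]) for v in views]


def vote(models, ex):
    """Majority vote of the three view classifiers."""
    preds = [predict(m, ex[v])[0] for v, m in enumerate(models)]
    return max(set(preds), key=preds.count)


labeled = [((1.0,), (0.9,), (1.1,)), ((0.0,), (0.1,), (-0.1,))]
labels = ["B", "N"]  # B = sentence boundary, N = no boundary
unlabeled = [((0.8,), (0.7,), (0.9,)), ((0.2,), (0.3,), (0.1,))]
models = three_view_cotrain(labeled, labels, unlabeled)
print(vote(models, ((0.9,), (0.95,), (1.0,))))  # B
```

    The approach relies on the property the abstract names: if each view is individually sufficient and the views are roughly independent, confident labels from two views act as useful new training data for the third.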