Search Results

Showing 1 - 4 of 4
  • Publication
    Model adaptation for dialog act tagging
    (IEEE, 2006) Tür, Gökhan; Güz, Ümit; Hakkani Tür, Dilek
    In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.
  • Publication
    Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation
    (IEEE, 2018) Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
    The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for the sentence boundary classification problem when only small sets of sentence-boundary-labeled data are available. We propose three-view and committee-based learning strategies that incorporate co-training algorithms with agreement, disagreement, and self-combined learning strategies using prosodic, lexical, and morphological information. We compare the experimental results of the proposed three-view and committee-based learning strategies to other semi-supervised learning strategies in the literature, namely self-training and co-training with agreement, disagreement, and self-combined strategies. The experimental results show that sentence segmentation performance can be highly improved using the multi-view learning strategies that we propose, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average performance when only a small set of manually labeled data is available, for both Turkish and English spoken language.
  • Publication
    Extraction and comparison of various prosodic feature sets on sentence segmentation task for Turkish broadcast news data
    (IEEE, 2014) Dalva, Doğan; Revidi, İzel D.; Güz, Ümit; Gürkan, Hakan
    In this work, prosodic features of the Turkish Broadcast News (BN) data are extracted using an open source prosodic feature extraction tool based on Praat. The profiles and effectiveness of these features are also investigated for the sentence segmentation task on the Turkish BN data. We not only used some combinations of the feature sets but also collected some of them in one prosodic feature model in order to achieve one of the best performances. The results of the experiments show that some combinations of the prosodic feature sets are very useful for the automatic sentence segmentation task on the Turkish BN data.
  • Publication
    Türkçe haber yayını verileri için bürünsel bilginin çıkarılması ve cümle bölütlemede kullanılması [Extraction of prosodic information for Turkish broadcast news data and its use in sentence segmentation]
    (IEEE, 2014-04-23) Dalva, Doğan; Revidi, İzel D.; Güz, Ümit; Gürkan, Hakan
    In this study, prosodic features of Turkish broadcast news data are extracted using open-source software, and the performance of prosodic feature groups in sentence segmentation is compared on text obtained from the output of an Automatic Speech Recognition system. In particular, a prosodic feature set with a considerably high success rate for the sentence segmentation task was obtained.