Effective semi-supervised learning strategies for automatic sentence segmentation

Dalva, Doğan; Güz, Ümit; Gürkan, Hakan

Files

1416.pdf (1.1 MB)

Date

2018-04-01

Authors

Dalva, Doğan

Güz, Ümit

Gürkan, Hakan

Publisher

Elsevier Science BV

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The primary objective of the sentence segmentation process is to determine the sentence boundaries in a stream of words output by an automatic speech recognizer. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive, and expensive to produce. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence-boundary-labeled data are available. We primarily investigate two semi-supervised learning approaches: self-training and co-training. Different example selection strategies were also used for co-training, namely agreement, disagreement, and self-combined. Furthermore, we propose three-view and committee-based algorithms that incorporate the agreement, disagreement, and self-combined strategies using three disjoint feature sets. We present comparative results of the different learning strategies on the sentence segmentation task. The experimental results show that sentence segmentation performance can be substantially improved using the proposed multi-view learning strategies, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. When only a small set of manually labeled data is available, the proposed strategies improve the average baseline F-measure from 67.66% to 75.15% for spoken Turkish and from 64.84% to 66.32% for spoken English.
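
The abstract names the agreement and disagreement selection strategies without defining them. As a rough illustration only, the following sketch shows one plausible reading of how two views could pick automatically labeled examples in a co-training round. The function name `select_examples`, the confidence threshold of 0.8, the sample predictions, and the exact selection rules are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# One prediction per candidate word boundary: (label, confidence),
# where label 1 = sentence boundary and 0 = no boundary.
Prediction = Tuple[int, float]


def select_examples(
    view_a: List[Prediction],
    view_b: List[Prediction],
    strategy: str,
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> List[Tuple[int, int]]:
    """Return (example index, label) pairs to add to the labeled pool.

    Assumed rules (illustrative, not the paper's definitions):
    - "agreement": both views are confident and predict the same label.
    - "disagreement": exactly one view is confident; its label is used.
    """
    if strategy not in ("agreement", "disagreement"):
        raise ValueError(f"unknown strategy: {strategy}")

    selected = []
    for i, ((label_a, conf_a), (label_b, conf_b)) in enumerate(zip(view_a, view_b)):
        a_sure = conf_a >= threshold
        b_sure = conf_b >= threshold
        if strategy == "agreement":
            if a_sure and b_sure and label_a == label_b:
                selected.append((i, label_a))
        elif a_sure != b_sure:
            # Only one view is confident, so it teaches the other view.
            selected.append((i, label_a if a_sure else label_b))
    return selected


# Made-up predictions from two views (e.g. lexical and prosodic classifiers).
lexical = [(1, 0.95), (0, 0.90), (1, 0.55), (0, 0.85)]
prosodic = [(1, 0.90), (0, 0.60), (1, 0.92), (1, 0.88)]

agreed = select_examples(lexical, prosodic, "agreement")
disagreed = select_examples(lexical, prosodic, "disagreement")
```

In this toy data, the agreement rule keeps only the first example, while the disagreement rule keeps the second and third. A three-view or committee-based variant would extend the same idea to a third prediction list, but how the paper combines the views is not stated in the abstract.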

Keywords

Machine learning, Multi-view semi-supervised learning, Co-training, Sentence segmentation, Boosting, Speech, Recognition, Multiview, Speech recognition, Sentence boundary, Adaptive boosting, Artificial intelligence, Classification (of information), Learning algorithms, Learning systems, Speech processing, Automatic speech recognizers, Morphological information, Multi-view learning, Semi-supervised learning, Sentence boundaries, Supervised learning

Source

Pattern Recognition Letters

WoS Quartile

Q2

Scopus Quartile

Q1

Volume

105

Issue

SI

Citation

Dalva, D., Güz, Ü. & Gürkan, H. (2018). Effective semi-supervised learning strategies for automatic sentence segmentation. Pattern Recognition Letters, 105(SI), 76-86. doi:10.1016/j.patrec.2017.10.010

Link

https://hdl.handle.net/11729/1416
http://dx.doi.org/10.1016/j.patrec.2017.10.010

Collection

Article Collection | Department of Electrical and Electronics Engineering
Scopus Indexed Publications Collection
WoS Indexed Publications Collection
