Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation
dc.authorid | 0000-0002-7035-8724 | |
dc.authorid | 0000-0002-4597-0954 | |
dc.authorid | 0000-0002-7008-4778 | |
dc.contributor.author | Dalva, Doğan | en_US |
dc.contributor.author | Güz, Ümit | en_US |
dc.contributor.author | Gürkan, Hakan | en_US |
dc.date.accessioned | 2019-05-22T00:14:38Z | |
dc.date.available | 2019-05-22T00:14:38Z | |
dc.date.issued | 2018 | |
dc.department | Işık Üniversitesi, Mühendislik Fakültesi, Elektrik-Elektronik Mühendisliği Bölümü | en_US |
dc.department | Işık University, Faculty of Engineering, Department of Electrical-Electronics Engineering | en_US |
dc.description.abstract | The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for the sentence boundary classification problem when only small sets of sentence-boundary-labeled data are available. We propose three-view and committee-based learning strategies that incorporate co-training algorithms with agreement, disagreement, and self-combined learning strategies, using prosodic, lexical, and morphological information. We compare the experimental results of the proposed three-view and committee-based learning strategies with those of other semi-supervised learning strategies in the literature, namely self-training and co-training with agreement, disagreement, and self-combined strategies. The experimental results show that the proposed multi-view learning strategies can greatly improve sentence segmentation performance, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average performance for both spoken Turkish and spoken English when only a small set of manually labeled data is available. | en_US |
dc.description.sponsorship | This material is based upon work supported by the Scientific and Technological Research Council of Turkey (TUBITAK) (Project Numbers: 107E182 and 111E228), the Isik University Scientific Research Project Fund (Project Numbers: 09A301 and 14A201), and a J. William Fulbright Post-Doctoral Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. | en_US |
dc.description.version | Publisher's Version | en_US |
dc.identifier.citation | Dalva, D., Güz, Ü. & Gürkan, H. (2018). Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation. Paper presented at the 2018 IEEE Spoken Language Technology Workshop (SLT), 750-755. doi:10.1109/SLT.2018.8639533 | en_US |
dc.identifier.doi | 10.1109/SLT.2018.8639533 | |
dc.identifier.endpage | 755 | |
dc.identifier.isbn | 9781538643341 | |
dc.identifier.isbn | 9781538643334 | |
dc.identifier.isbn | 9781538643358 | |
dc.identifier.issn | 2639-5479 | |
dc.identifier.scopus | 2-s2.0-85063073665 | |
dc.identifier.scopusquality | N/A | |
dc.identifier.startpage | 750 | |
dc.identifier.uri | https://hdl.handle.net/11729/1594 | |
dc.identifier.uri | http://dx.doi.org/10.1109/SLT.2018.8639533 | |
dc.identifier.wos | WOS:000463141800104 | |
dc.identifier.wosquality | N/A | |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.indekslendigikaynak | Conference Proceedings Citation Index – Science (CPCI-S) | en_US |
dc.institutionauthor | Dalva, Doğan | en_US |
dc.institutionauthor | Güz, Ümit | en_US |
dc.institutionauthorid | 0000-0002-7035-8724 | |
dc.institutionauthorid | 0000-0002-4597-0954 | |
dc.language.iso | en | en_US |
dc.peerreviewed | Yes | en_US |
dc.publicationstatus | Published | en_US |
dc.publisher | IEEE | en_US |
dc.relation.ispartof | 2018 IEEE Spoken Language Technology Workshop (SLT) | en_US |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Boosting | en_US |
dc.subject | Co-training | en_US |
dc.subject | Sentence segmentation | en_US |
dc.subject | Semi-supervised learning | en_US |
dc.subject | Prosody | en_US |
dc.subject | Speech | en_US |
dc.subject | Learning algorithms | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Supervised learning | en_US |
dc.subject | Data models | en_US |
dc.subject | Semisupervised learning | en_US |
dc.subject | Feature extraction | en_US |
dc.subject | Training | en_US |
dc.subject | Tools | en_US |
dc.subject | Task analysis | en_US |
dc.subject | Learning (artificial intelligence) | en_US |
dc.subject | Natural language processing | en_US |
dc.subject | Speech processing | en_US |
dc.subject | Multiview learning strategies | en_US |
dc.subject | Disjoint feature sets | en_US |
dc.subject | Manually labeled data | en_US |
dc.subject | Sentence boundary classification problem | en_US |
dc.subject | Sentence boundary labeled data | en_US |
dc.subject | Committee-based learning strategies | en_US |
dc.subject | Prosodic information | en_US |
dc.subject | Lexical information | en_US |
dc.subject | Morphological information | en_US |
dc.subject | Self-combined strategies | en_US |
dc.subject | Automatic sentence segmentation | en_US |
dc.subject | Conventional co-training learning | en_US |
dc.subject | Multiview semisupervised machine learning | en_US |
dc.subject | Turkish spoken languages | en_US |
dc.subject | English spoken languages | en_US |
dc.title | Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation | en_US |
dc.type | Conference Object | en_US |
dspace.entity.type | Publication |