Generative and discriminative methods using morphological information for sentence segmentation of Turkish
dc.authorid | 0000-0002-4597-0954 | |
dc.authorid | 0000-0002-9777-4613 | |
dc.authorid | 0000-0001-5246-2117 | |
dc.contributor.author | Güz, Ümit | en_US |
dc.contributor.author | Favre, Benoit | en_US |
dc.contributor.author | Hakkani Tür, Dilek | en_US |
dc.contributor.author | Tür, Gökhan | en_US |
dc.date.accessioned | 2015-01-15T23:01:18Z | |
dc.date.available | 2015-01-15T23:01:18Z | |
dc.date.issued | 2009-07 | |
dc.department | Işık Üniversitesi, Mühendislik Fakültesi, Elektrik-Elektronik Mühendisliği Bölümü | en_US |
dc.department | Işık University, Faculty of Engineering, Department of Electrical-Electronics Engineering | en_US |
dc.description | This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) CALO (NBCHD-030010) program, the DARPA GALE (HR0011-06-C-0023) program, the Scientific and Technological Research Council of Turkey (TUBITAK) funding at SRI and ICSI (TUBITAK CAREER Project 107E182, Extracting and Using Prosodic Information for Turkish Spoken Language), and the Isik University Research Fund (Project 05B304). | en_US |
dc.description.abstract | This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information among others. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features, extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. In order to capture non-lexical information, we extract a set of prosodic features, which are mainly motivated by our previous work for other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for the task of Turkish sentence segmentation. | en_US |
dc.description.sponsorship | CALO | en_US |
dc.description.sponsorship | DARPA GALE | en_US |
dc.description.sponsorship | ICSI | en_US |
dc.description.sponsorship | Isik University | en_US |
dc.description.sponsorship | TUBITAK | en_US |
dc.description.version | Publisher's Version | en_US |
dc.identifier.citation | Güz, Ü., Favre, B., Hakkani Tür, D. & Tür, G. (2009). Generative and discriminative methods using morphological information for sentence segmentation of Turkish. IEEE Transactions on Audio, Speech, and Language Processing, 17(5), 895-903. doi:10.1109/TASL.2009.2016393 | en_US |
dc.identifier.doi | 10.1109/TASL.2009.2016393 | |
dc.identifier.endpage | 903 | |
dc.identifier.issn | 1558-7916 | |
dc.identifier.issn | 1558-7924 | |
dc.identifier.issue | 5 | |
dc.identifier.scopus | 2-s2.0-85008018937 | |
dc.identifier.scopusquality | N/A | |
dc.identifier.startpage | 895 | |
dc.identifier.uri | https://hdl.handle.net/11729/326 | |
dc.identifier.uri | http://dx.doi.org/10.1109/TASL.2009.2016393 | |
dc.identifier.volume | 17 | |
dc.identifier.wos | WOS:000267434300005 | |
dc.identifier.wosquality | Q2 | |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.indekslendigikaynak | Science Citation Index Expanded (SCI-EXPANDED) | en_US |
dc.institutionauthor | Güz, Ümit | en_US |
dc.institutionauthorid | 0000-0002-4597-0954 | |
dc.language.iso | en | en_US |
dc.peerreviewed | Yes | en_US |
dc.publicationstatus | Published | en_US |
dc.publisher | IEEE-INST Electrical Electronics Engineers Inc | en_US |
dc.relation.ispartof | IEEE Transactions on Audio Speech and Language Processing | en_US |
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Prosodic and lexical information | en_US |
dc.subject | Sentence segmentation | en_US |
dc.subject | Turkish morphology | en_US |
dc.subject | Automatic speech recognition | en_US |
dc.subject | Boosting | en_US |
dc.subject | Computer science | en_US |
dc.subject | Data mining | en_US |
dc.subject | Feature extraction | en_US |
dc.subject | Hidden Markov models | en_US |
dc.subject | Hybrid power systems | en_US |
dc.subject | Morphology | en_US |
dc.subject | Natural languages | en_US |
dc.subject | Vocabulary | en_US |
dc.subject | Speech processing | en_US |
dc.subject | Word processing | en_US |
dc.subject | Turkish word sequences | en_US |
dc.subject | Conditional random fields | en_US |
dc.subject | Discriminative classification techniques | en_US |
dc.subject | Discriminative methods | en_US |
dc.subject | Generative methods | en_US |
dc.subject | Hidden event language modeling | en_US |
dc.subject | Morphological information | en_US |
dc.title | Generative and discriminative methods using morphological information for sentence segmentation of Turkish | en_US |
dc.type | Article | en_US |
dspace.entity.type | Publication |
Files
Original bundle
- Name: 326.pdf
- Size: 626.12 KB
- Format: Adobe Portable Document Format
- Description: Publisher's Version