Işık University Institutional Academic Repository :: DSpace Angular

Search Results

Showing 1 - 7 of 7

Cascaded model adaptation for dialog act segmentation and tagging
(Elsevier Ltd, 2010-04) Güz, Ümit; Tür, Gökhan; Hakkani Tür, Dilek; Cuendet, Sebastien
There are many speech and language processing problems that require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: In cases where a sequence of classification tasks is employed, how important is it to adapt the earlier or later systems? Is the performance improvement obtained in the earlier stages via adaptation carried over to later stages in cases where the later stages perform adaptation using similar data and/or methods? In this study, as part of a larger-scale multiparty meeting understanding system, we analyze various methods for adapting dialog act segmentation and tagging models trained on conversational telephone speech (CTS) to meeting-style conversations. We investigate the effect of using adapted and unadapted models for dialog act segmentation with those for tagging, showing the effect of model adaptation for cascaded classification tasks. Our results indicate that we can achieve significantly better dialog act segmentation and tagging by adapting the out-of-domain models, especially when the amount of in-domain data is limited. Experimental results show that it is more effective to adapt the models in the later classification tasks, in our case dialog act tagging, when dealing with a sequence of cascaded classification tasks.
Generative and discriminative methods using morphological information for sentence segmentation of Turkish
(IEEE-INST Electrical Electronics Engineers Inc, 2009-07) Güz, Ümit; Favre, Benoit; Hakkani Tür, Dilek; Tür, Gökhan
This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmentation of Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that take advantage of lexical information, among other cues. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features, extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. In order to capture non-lexical information, we extract a set of prosodic features, which are mainly motivated by our previous work on other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for the task of Turkish sentence segmentation.
Effective semi-supervised learning strategies for automatic sentence segmentation
(Elsevier Science BV, 2018-04-01) Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
The primary objective of the sentence segmentation process is to determine the sentence boundaries of a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, the preparation of which is time-consuming, labor-intensive, and expensive. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence boundary labeled data are available. We primarily investigate two semi-supervised learning approaches, called self-training and co-training. Different example selection strategies were also used for co-training, namely, agreement, disagreement, and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating the agreement, disagreement, and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that sentence segmentation performance can be substantially improved using the multi-view learning strategies that we propose, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies improve the average baseline F-measure from 67.66% to 75.15% for Turkish and from 64.84% to 66.32% for English spoken language when only a small set of manually labeled data is available.
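The F-measure figures quoted in this abstract can be read either as absolute gains (percentage points) or as gains relative to the baseline. The short sketch below works out both readings from the four reported numbers; the helper name `gains` is illustrative, and computing the relative gain against the baseline F-measure is an assumption, since the abstract does not say which convention the authors use.

```python
def gains(baseline, improved):
    """Return (absolute gain in points, gain relative to the baseline in %)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# F-measure values (in %) as reported in the abstract.
turkish = gains(67.66, 75.15)
english = gains(64.84, 66.32)

print(f"Turkish: +{turkish[0]:.2f} points, {turkish[1]:.2f}% relative")
print(f"English: +{english[0]:.2f} points, {english[1]:.2f}% relative")
```

Under that assumption, the Turkish result is a gain of about 7.49 points (roughly 11% relative) and the English result a gain of about 1.48 points (roughly 2.3% relative).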
A novel biometric identification system based on fingertip electrocardiogram and speech signals
(Elsevier Inc., 2022-03) Güven, Gökhan; Güz, Ümit; Gürkan, Hakan
In this research work, we propose a one-dimensional Convolutional Neural Network (CNN) based biometric identification system that combines speech and ECG modalities. The aim is to find an effective identification strategy while enhancing both the confidence and the performance of the system. In our first approach, we develop a voting-based ECG and speech fusion system to improve the overall performance compared to conventional methods. In the second approach, we develop a robust rejection algorithm to prevent unauthorized access to the fusion system. We also present a newly developed ECG spike and inconsistent beat removal algorithm to detect and eliminate the problems caused by portable fingertip ECG devices and patient movements. Furthermore, by adding a Universal Background Model to our algorithm, we obtain a system that can work with only one authorized user. In the first approach, the proposed fusion system achieved a 100% accuracy rate for 90 people, averaged over 3-fold cross-validation. In the second approach, using 90 people as genuine classes and 26 people as imposter classes, the proposed system achieved 92% accuracy in identifying genuine classes and 96% accuracy in rejecting imposter classes.
A new method to represent speech signals via predefined signature and envelope sequences
(Hindawi Publishing Corporation, 2007) Güz, Ümit; Gürkan, Hakan; Yarman, Bekir Sıddık Binboğa
A novel systematic procedure referred to as "SYMPES" to model speech signals is introduced. The structure of SYMPES is based on the creation of the so-called predefined "signature $S = \{S_R(n)\}$ and envelope $E = \{E_K(n)\}$" sets. These sets are speaker and language independent. Once the speech signals are divided into frames with selected lengths, each frame sequence $X_i(n)$ is reconstructed by means of the mathematical form $X_i(n) = C_i E_K(n) S_R(n)$. In this representation, $C_i$ is called the gain factor, and $S_R(n)$ and $E_K(n)$ are properly assigned from the predefined signature and envelope sets, respectively. Examples are given to exhibit the implementation of SYMPES. It is shown that for the same compression ratio or better, SYMPES yields considerably better speech quality than commercially available coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E (FS1015) at 2.4 kbps.
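The reconstruction form in this abstract, a gain factor times an envelope sequence times a signature sequence, can be illustrated with a toy example. The sketch below is not the authors' procedure: the tiny signature and envelope sets, the function names, and the brute-force least-squares search for the best pair are all assumptions made for illustration, because the abstract does not describe how the sequences are assigned.

```python
def reconstruct_frame(gain, envelope, signature):
    """Return gain * envelope[n] * signature[n] for every sample n."""
    return [gain * e * s for e, s in zip(envelope, signature)]


def best_fit(frame, envelopes, signatures):
    """Brute-force search (an assumption, not the paper's method) for the
    envelope index K, signature index R, and least-squares gain C that
    minimise the squared reconstruction error of one frame."""
    best = None
    for k, env in enumerate(envelopes):
        for r, sig in enumerate(signatures):
            basis = [e * s for e, s in zip(env, sig)]
            energy = sum(b * b for b in basis)
            if energy == 0.0:
                continue  # an all-zero basis cannot represent the frame
            gain = sum(x * b for x, b in zip(frame, basis)) / energy
            error = sum((x - gain * b) ** 2 for x, b in zip(frame, basis))
            if best is None or error < best[0]:
                best = (error, k, r, gain)
    return best


# Hypothetical predefined sets with 4 samples per frame.
signatures = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
envelopes = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.5, 0.25, 0.125]]

# This frame equals 2 * envelopes[1] * signatures[1] by construction.
frame = [2.0, 1.0, -0.5, -0.25]
error, k, r, gain = best_fit(frame, envelopes, signatures)
approx = reconstruct_frame(gain, envelopes[k], signatures[r])
```

In this toy setting each frame is described by just the triple (gain, envelope index, signature index), which gives an intuition for how a scheme built on predefined sets can reach a high compression ratio.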
On the comparative results of "SYMPES: A new method of speech modeling"
(Elsevier GMBH, 2006) Yarman, Bekir Sıddık Binboğa; Güz, Ümit; Gürkan, Hakan
In this paper, a new method of speech modeling called SYMPES (A Novel Systematic Procedure to Model Speech Signals via Predefined "Envelope and Signature Sequences") is introduced and compared with commercially available methods. It is shown that for the same compression ratio or better, SYMPES yields considerably better perceived quality than coders such as G.726 (ADPCM) at 16 kbps and voice-excited LPC-10E at 2.4 kbps.
Multi-view semi-supervised learning for dialog act segmentation of speech
(IEEE-INST Electrical Electronics Engineers Inc, 2010-02) Güz, Ümit; Cuendet, Sebastien; Hakkani Tür, Dilek; Tür, Gökhan
Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare different example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is very appropriate for multi-view learning since the data sets can be represented by two disjoint and redundantly sufficient feature sets, namely, lexical and prosodic information. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-measure from 69.8% to 74.2%.
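To make the agreement and disagreement selection strategies more concrete, here is a minimal sketch of one selection step in a two-view co-training round. The abstract does not define the strategies precisely, so the rules below are assumptions: "agreement" keeps examples that both views label identically with high confidence, and "disagreement" keeps examples where one view is confident and the other is not. The function name, thresholds, and probability values are all hypothetical.

```python
def select_examples(probs_lexical, probs_prosodic, strategy, high=0.9, low=0.6):
    """Pick unlabeled examples to pseudo-label in one co-training round.

    Each input list holds P(boundary) for every unlabeled example, as
    estimated by the classifier trained on that view. The thresholds and
    selection rules are illustrative assumptions.
    Returns a list of (example_index, pseudo_label) pairs.
    """
    def confidence(p):
        return max(p, 1.0 - p)

    def label(p):
        return 1 if p >= 0.5 else 0

    selected = []
    for i, (pl, pp) in enumerate(zip(probs_lexical, probs_prosodic)):
        if strategy == "agreement":
            # Both views are confident and predict the same label.
            if (label(pl) == label(pp)
                    and confidence(pl) >= high and confidence(pp) >= high):
                selected.append((i, label(pl)))
        elif strategy == "disagreement":
            # One view is confident, the other is unsure: the confident
            # view supplies the pseudo-label.
            if confidence(pl) >= high and confidence(pp) <= low:
                selected.append((i, label(pl)))
            elif confidence(pp) >= high and confidence(pl) <= low:
                selected.append((i, label(pp)))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return selected


# Hypothetical boundary probabilities for five unlabeled word boundaries.
lexical = [0.95, 0.97, 0.55, 0.05, 0.50]
prosodic = [0.92, 0.52, 0.03, 0.08, 0.45]

agree = select_examples(lexical, prosodic, "agreement")
disagree = select_examples(lexical, prosodic, "disagreement")
```

In a full co-training loop, the selected pseudo-labeled examples would be added to the training data and both view-specific classifiers retrained before the next round.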
