Search Results
Showing 1 - 10 of 16

Publication: Generation of optimum signature base sequences for speech signals (IEEE, 2000)
Yarman, Bekir Sıddık Binboğa; Akdeniz, Rafet
In our previous publications [1-6], we proposed a novel method to represent signals in terms of so-called "Signature Base Functions" (SBF), which were extracted from the physical features of the waveform under consideration. In [1-6], the SBF were determined in an ad hoc manner, which requires a tedious search process, and they were not orthogonal. Furthermore, the optimality of the SBF was in question. In this work, however, we suggest a well-organised procedure to generate "Optimum Orthogonal Signature Base Functions" (OSBF) for selected waveforms, which in turn provides an excellent means of signal representation. It is shown that the new method of signal representation, which is based on OSBF, requires less computation time with substantial signal compression and results in efficient speaker-dependent recognition.

Publication: Cascaded model adaptation for dialog act segmentation and tagging (Elsevier Ltd, 2010-04)
Güz, Ümit; Tür, Gökhan; Hakkani Tür, Dilek; Cuendet, Sebastien
There are many speech and language processing problems which require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: In cases where a sequence of classification tasks is employed, how important is it to adapt the earlier or later systems? Is the performance improvement obtained in the earlier stages via adaptation carried over to later stages in cases where the later stages perform adaptation using similar data and/or methods? In this study, as part of a larger-scale multiparty meeting understanding system, we analyze various methods for adapting dialog act segmentation and tagging models trained on conversational telephone speech (CTS) to meeting-style conversations.
We investigate the effect of using adapted and unadapted models for dialog act segmentation with those of tagging, showing the effect of model adaptation for cascaded classification tasks. Our results indicate that we can achieve significantly better dialog act segmentation and tagging by adapting the out-of-domain models, especially when the amount of in-domain data is limited. Experimental results show that it is more effective to adapt the models in the later classification tasks, in our case dialog act tagging, when dealing with a sequence of cascaded classification tasks.

Publication: Representation of speech signals by single signature base function within optimum frame length (IEEE, 2000)
Akdeniz, Rafet; Yarman, Bekir Sıddık Binboğa
Before this study, we proposed a novel method to represent signals in terms of so-called "Signature Base Functions" (SBF), which were extracted from the physical features of the waveform under consideration. The SBF were determined in an ad hoc manner, which requires a tedious search process, and they were not orthogonal. Furthermore, the optimality of the SBF was in question. In this work, however, we suggest a well-organized procedure to generate "Optimum Orthogonal Signature Base Functions" (OSBF) for selected waveforms, which in turn provides an excellent means of signal representation. It is shown that the new method of signal representation, which is based on OSBF, requires less computation time with substantial signal compression and results in efficient speaker-dependent recognition.

Publication: Effective semi-supervised learning strategies for automatic sentence segmentation (Elsevier Science BV, 2018-04-01)
Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
The primary objective of the sentence segmentation process is to determine the sentence boundaries of a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive, and expensive to produce.
In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence boundary labeled data are available. We primarily investigate two semi-supervised learning approaches, called self-training and co-training. Different example selection strategies were also used for co-training, namely agreement, disagreement, and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating the agreement, disagreement, and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that sentence segmentation performance can be greatly improved using the multi-view learning strategies that we propose, since data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average baseline F-measure from 67.66% to 75.15% and from 64.84% to 66.32% when only a small set of manually labeled data is available for Turkish and English spoken languages, respectively.

Publication: A novel biometric identification system based on fingertip electrocardiogram and speech signals (Elsevier Inc., 2022-03)
Güven, Gökhan; Güz, Ümit; Gürkan, Hakan
In this research work, we propose a one-dimensional Convolutional Neural Network (CNN) based biometric identification system that combines speech and ECG modalities. The aim is to find an effective identification strategy while enhancing both the confidence and the performance of the system. In our first approach, we have developed a voting-based ECG and speech fusion system to improve the overall performance compared to conventional methods.
In the second approach, we have developed a robust rejection algorithm to prevent unauthorized access to the fusion system. We also present a newly developed algorithm for removing ECG spikes and inconsistent beats, which detects and eliminates the problems caused by portable fingertip ECG devices and patient movements. Furthermore, we have achieved a system that can work with only one authorized user by adding a Universal Background Model to our algorithm. In the first approach, the proposed fusion system achieved a 100% accuracy rate for 90 people, averaged over 3-fold cross-validation. In the second approach, using 90 people as genuine classes and 26 people as imposter classes, the proposed system achieved 92% accuracy in identifying genuine classes and 96% accuracy in rejecting imposter classes.

Publication: Extension of conventional co-training learning strategies to three-view and committee-based learning strategies for effective automatic sentence segmentation (IEEE, 2018)
Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for the sentence boundary classification problem when only small sets of sentence boundary labeled data are available. We propose three-view and committee-based learning strategies that incorporate co-training algorithms with agreement, disagreement, and self-combined learning strategies using prosodic, lexical, and morphological information. We compare experimental results of the proposed three-view and committee-based learning strategies to other semi-supervised learning strategies in the literature, namely self-training and co-training with agreement, disagreement, and self-combined strategies. The experimental results show that sentence segmentation performance can be greatly improved using the multi-view learning strategies that we propose, since data sets can be represented by three redundantly sufficient and disjoint feature sets.
We show that the proposed strategies substantially improve the average performance when only a small set of manually labeled data is available for the Turkish and English spoken languages.

Publication: Fast inter-mode decision and selective quarter-pel refinement in H.264 video coding (IEEE, 2008)
Ateş, Hasan Fehmi
In the H.264 video coding standard, there exist several inter-prediction modes that use macroblock partitions with variable block sizes. Choosing a rate-distortion optimal coding mode for each macroblock is essential for the best possible coding performance, but also prohibitive due to the heavy computational complexity associated with the required rate-distortion calculations. Likewise, sub-pel motion refinement improves the coding efficiency, but becomes a major computational bottleneck when integer-pel search is executed fast. In this paper, we present a simple strategy to reduce the complexity of quarter-pel refinement and inter-mode decision with minimum loss of coding efficiency. Based on the results of the half-pel motion estimation step, our method evaluates the likelihood of each inter-coding mode being optimal. Then, quarter-pel refinement and actual rate and distortion are computed for only those coding modes with a sufficient chance of being optimal. We claim that this method minimizes optimal mode estimation error at a given level of refinement and mode decision complexity. Simulation results show that the algorithm speeds up the quarter-pel search and inter-mode selection modules by a factor of about 6 with less than 0.12 dB PSNR loss.

Publication: A new speech modeling method: SYMPES (IEEE, 2006)
Güz, Ümit; Gürkan, Hakan; Yarman, Bekir Sıddık Binboğa
In this paper, a new speech modeling method called SYMPES is introduced and compared with commercially available methods.
It is shown that, for the same compression ratio or better, SYMPES yields considerably better hearing quality than coders such as G.726 at 16 kbps and voice-excited LPC-10E at 2.4 kbps.

Publication: A new coding method for speech and audio signals (IEEE, 2005)
Güz, Ümit; Gürkan, Hakan; Yarman, Bekir Sıddık Binboğa
In this paper, a new representation or modeling method for speech signals is introduced. The proposed method is based on the generation of the so-called Predefined Signature $S=\{S_R\}$ and Envelope vector $E=\{E_K\}$ Sets (PSEVS). These vector sets are speaker and language independent. In this method, once the speech signal is divided into frames of a selected length, each frame $X_i$ is reconstructed in the mathematical form $X_i = C_i E_K S_R$. In this representation, $C_i$ is called the frame coefficient, and $S_R$ and $E_K$ are the vectors assigned from the signature and envelope sets of the PSEVS, respectively. It is shown that the proposed method provides fast reconstruction and a substantial compression ratio with acceptable hearing quality.

Publication: Chunking in Turkish with conditional random fields (Springer-Verlag, 2015-04-14)
Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
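
Illustrative note on the frame model X_i = C_i E_K S_R from the entry "A new coding method for speech and audio signals" above. The abstract does not say how the indices K and R or the coefficient C_i are chosen, so the following is only a minimal sketch under stated assumptions: the product E_K S_R is taken as an element-wise product of two equal-length vectors, the pair (K, R) is found by exhaustive search, and C_i is the least-squares scale factor. The function names `encode_frame` and `decode_frame` and the toy vector sets are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def encode_frame(
    frame: Vector,
    signatures: Sequence[Vector],
    envelopes: Sequence[Vector],
) -> Tuple[float, int, int]:
    """Return (C_i, K, R) minimising the squared error of C_i * E_K * S_R.

    Assumption: exhaustive search over all (K, R) pairs with a
    least-squares coefficient; the source does not specify the search.
    """
    best = None
    for k, env in enumerate(envelopes):
        for r, sig in enumerate(signatures):
            # Element-wise product E_K * S_R (assumed interpretation).
            basis = [e * s for e, s in zip(env, sig)]
            energy = dot(basis, basis)
            if energy == 0.0:
                continue  # an all-zero basis cannot represent the frame
            coeff = dot(frame, basis) / energy  # least-squares scale
            residual = [x - coeff * b for x, b in zip(frame, basis)]
            err = dot(residual, residual)
            if best is None or err < best[0]:
                best = (err, coeff, k, r)
    if best is None:
        raise ValueError("no usable signature/envelope pair")
    return best[1], best[2], best[3]


def decode_frame(
    coeff: float,
    k: int,
    r: int,
    signatures: Sequence[Vector],
    envelopes: Sequence[Vector],
) -> List[float]:
    """Rebuild the frame as C_i * E_K * S_R from the three stored values."""
    return [coeff * e * s for e, s in zip(envelopes[k], signatures[r])]


# Toy predefined sets (hypothetical values, for illustration only).
SIGNATURES = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]]
ENVELOPES = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]

frame = [2.5, -5.0, 7.5, -10.0]
c_i, k_idx, r_idx = encode_frame(frame, SIGNATURES, ENVELOPES)
rebuilt = decode_frame(c_i, k_idx, r_idx, SIGNATURES, ENVELOPES)
```

Under these assumptions each frame is stored as one coefficient and two indices, which is where the compression described in the abstract would come from; the decoder only needs two table lookups and a multiplication per sample.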












