Işık Üniversitesi Kurumsal Akademik Belleği :: DSpace Angular

Showing 1 - 3 of 3

Automatic speech recognition system for Turkish spoken language
(Işık Üniversitesi, 2012-06-21) Dalva, Doğan; Güz, Ümit; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Elektronik Mühendisliği Yüksek Lisans Programı
The transmission and storage of speech sounds have been possible for decades. In addition, signal processing techniques make it possible to process speech signals. By using time and frequency analysis of the speech signal together with several machine learning algorithms, it is possible to build a system that recognizes spoken words. Such systems are called Automatic Speech Recognition (ASR) systems. In our work, we have used the ASR system for the Turkish spoken language that was built by the BUSIM speech group. However, the output of the recognizer is only a list of spoken words. Even for humans, it is very hard to understand a text without punctuation symbols. Hence, to build a more complex recognizer whose goal is to perform topic segmentation and topic summarization, the output of the ASR system should first be divided into sentences. Our goal is to build a system that performs sentence segmentation. In our work, we have used the ASR system to obtain word-level and phoneme-level time marks, and by using those time marks together with the audio files, we have extracted prosodic features. The prosodic properties of speech contain information about the punctuation in the text, which is not available at the output of the ASR system.
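As an illustration of how word-level time marks can yield prosodic cues, the following is a minimal sketch and not the thesis's implementation. It assumes hypothetical time marks given as (word, start, end) tuples in seconds; the sample values and the 0.3 s pause threshold are illustrative assumptions only.

```python
from typing import List, Tuple

# Hypothetical word-level time marks: (word, start_sec, end_sec).
WordMark = Tuple[str, float, float]


def pause_features(marks: List[WordMark]) -> List[dict]:
    """Return one feature record per boundary between consecutive words."""
    features = []
    for (word, start, end), (_, next_start, _) in zip(marks, marks[1:]):
        features.append({
            "word": word,
            "word_duration": round(end - start, 3),
            # Silence between this word and the next; clipped at zero
            # in case the time marks overlap slightly.
            "pause_after": round(max(0.0, next_start - end), 3),
        })
    return features


def candidate_boundaries(marks: List[WordMark], min_pause: float = 0.3) -> List[int]:
    """Indices of words followed by a pause of at least min_pause seconds."""
    return [i for i, f in enumerate(pause_features(marks))
            if f["pause_after"] >= min_pause]


# Made-up time marks for five words (assumption, not real recognizer output).
marks = [("bugün", 0.00, 0.42), ("hava", 0.45, 0.80), ("güzel", 0.82, 1.30),
         ("yarın", 1.95, 2.40), ("yağmur", 2.43, 2.95)]
print(candidate_boundaries(marks))
```

A fixed threshold is only a stand-in here; in practice such features would more plausibly be passed to a trained classifier together with other cues.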
Enhancement of the coded speech using filtering
(Işık Üniversitesi, 2017-04-14) Taylan, Salih Sinan; Güz, Ümit; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Elektronik Mühendisliği Yüksek Lisans Programı
The processing and storage of speech signals are widely implemented in modern communication systems. Reducing the amount of information needed to model and reconstruct the speech signal increases the transmission and storage capacity of the system. It is important to compress speech without losing its important properties during transmission or reconstruction, independently of the speaker and of the speech signal itself. However, some loss inevitably occurs in every compression process, and increasing the compression ratio increases the loss. Speech enhancement algorithms may be used to improve the intelligibility and quality of strongly compressed speech signals. The purpose of this study is to improve compressed speech signals with enhancement algorithms that reduce background noise. The SYMPES [1][2][4] algorithm used in this study compresses data with less loss than other known compression algorithms. As a result of the compression, noise occurs in the background, and the type of this noise cannot be classified. Attempts have been made to reduce this background noise (distortion) by using different speech enhancement algorithms. More than ten speech enhancement algorithms have been investigated and implemented. The two algorithms with the best enhanced sound output were identified and compared. One of them, the spectral subtraction algorithm, was applied via a geometric approach, which was investigated in 2008 by Yang Lu and Philipos C. Loizou [3]. In this algorithm, an estimate of the noise spectrum is subtracted from the noisy speech spectrum to obtain a clean signal spectrum. Moreover, in the absence of the speech signal, the noise spectrum can be estimated and updated. This approach assumes that the noise is a stationary or slowly changing process and that the noise spectrum does not change significantly between update periods. Forward and inverse Fourier transforms are used in the algorithm; hence, the algorithm is quite simple.
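The basic idea of spectral subtraction can be sketched as follows. This is plain power spectral subtraction on a single frame, not the geometric approach of Lu and Loizou used in the thesis. The function names, the spectral floor of 0.01, and the synthetic sinusoidal "speech" and "noise" are all assumptions made for illustration, and a naive DFT stands in for a real FFT library.

```python
import cmath
import math


def dft(x):
    """Naive forward discrete Fourier transform (O(N^2), fine for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from one frame.

    noise_power[k] is the estimated noise power in bin k, e.g. taken from
    frames where no speech is present. `floor` keeps a small fraction of
    the noisy power so that no bin becomes negative.
    """
    cleaned = []
    for y, d in zip(dft(noisy_frame), noise_power):
        noisy_power = abs(y) ** 2
        clean_power = max(noisy_power - d, floor * noisy_power)
        # Keep the noisy phase; only the magnitude is modified.
        cleaned.append(cmath.rect(math.sqrt(clean_power), cmath.phase(y)))
    return idft(cleaned)


def sq_error(a, b):
    """Sum of squared differences between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Synthetic frame: a tone standing in for speech plus a stationary interferer.
n = 16
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.2 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [s + d for s, d in zip(speech, noise)]
noise_power = [abs(c) ** 2 for c in dft(noise)]  # from a noise-only frame
enhanced = spectral_subtract(noisy, noise_power)
print(sq_error(noisy, speech), sq_error(enhanced, speech))
```

In this synthetic case the noise estimate is exact, so the second printed error is smaller than the first. With real speech the estimate is imperfect, which is where the choice of how much to subtract matters.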
However, the simplicity of the subtraction algorithm comes at a cost. Subtraction must be done with extreme caution to avoid any speech distortion. If too much is subtracted, some speech information may be removed; if too little is subtracted, much of the interfering noise remains. The other speech enhancement method is a statistical-model-based algorithm. This statistical speech enhancement method involves estimating the statistics of the clean and noisy signals from a sample. In other words, if a speech signal is distorted by statistically independent noise, the marginal probability distributions of the clean speech and the noise signal must be known. In this model-based statistical method, the signal and noise statistics are estimated primarily from the speech and noise content. An optimal solution is obtained using statistical models and is then used in conjunction with distortion measures to solve the speech enhancement problem. In this approach, different models have been applied to parameterize speech signals, such as autoregressive moving average (ARMA), autoregressive (AR), or moving average (MA) models. Three estimation rules, maximum likelihood (ML), maximum a posteriori (MAP), and minimum mean square error (MMSE), are used in this approach and have many desirable properties for estimating the parameters of the speech signal. ML is used to estimate non-random parameters. The MAP and MMSE estimators are used for parameters that are treated as random variables with a known prior density function. For the speech signal, this model uses the MAP estimation approach, assuming a time-varying AR model for speech enhancement in which both the model and the signal are estimated from the noisy signal. However, since the waveform of the speech signal is distorted as a result of the enhancement, the SNR results are not a reliable indicator of quality.
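For reference, the three estimation rules named above are commonly written as follows. The notation is an assumption introduced here rather than taken from the thesis: y denotes the noisy observation and θ the parameter to be estimated.

```latex
\begin{align}
\hat{\theta}_{\mathrm{ML}}   &= \arg\max_{\theta}\, p(y \mid \theta) \\
\hat{\theta}_{\mathrm{MAP}}  &= \arg\max_{\theta}\, p(\theta \mid y)
                              = \arg\max_{\theta}\, p(y \mid \theta)\, p(\theta) \\
\hat{\theta}_{\mathrm{MMSE}} &= \mathbb{E}[\theta \mid y]
                              = \int \theta\, p(\theta \mid y)\, d\theta
\end{align}
```

The ML rule needs no prior on θ, whereas the MAP and MMSE rules both rely on the prior density p(θ) through the posterior p(θ | y).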
Therefore, the results are evaluated by the Mean Opinion Score (MOS) test. A subjective test based on MOS is also carried out on some selected utterances. The results of the subjective test are compared with those of the objective tests to determine the most appropriate objective measure for evaluating speech enhancement algorithms. The strengths and weaknesses of the various algorithms are analyzed and compared. Quality is presented in detailed graphs based on the MOS, in which listeners rate the quality of speech on a scale of 1 to 5.
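Computing a MOS amounts to averaging listener ratings, as the short sketch below shows. The function name and the ratings are hypothetical and are not results from the thesis.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings on the 1-to-5 MOS scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("MOS ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from five listeners for two utterances.
ratings = {
    "utterance_a": [3, 4, 3, 4, 3],
    "utterance_b": [4, 4, 3, 5, 4],
}
for name, scores in ratings.items():
    print(name, mean_opinion_score(scores))
```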
Prosodic, morphological and lexical feature extraction of Turkish broadcast news data
(Işık Üniversitesi, 2014-06-05) Revidi, İzel D.; Güz, Ümit; Işık Üniversitesi, Fen Bilimleri Enstitüsü, Elektronik Mühendisliği Yüksek Lisans Programı
Sentence segmentation from speech is part of a process that aims at enriching the unstructured stream of words that is the output of standard speech recognizers. Its role is to find the sentence units in this stream of words. Sentence segmentation is a preliminary step toward speech understanding. Once the sentence boundaries are detected, further syntactic and/or semantic analysis can be performed on these sentences. Usually, speech recognizer output lacks the textual cues to these entities (such as headers, paragraphs, sentence punctuation, and capitalization). However, speech provides extra non-lexical cues: prosodic features such as pitch, energy, pause, and word durations; morphological features such as whether a word is a verb, noun, or adjective; and also lexical features. These prosodic, morphological, and lexical features provide complementary information for the segmentation of speech into sentences. Our goal is to examine the extraction and use of prosodic features, as done in previous work, in addition to lexical and morphological features, for spoken language processing of Turkish with open-source tools.
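One way to picture how the three feature families complement each other is a per-boundary feature record, sketched below. The `Token` fields, the tag names, and the sample values are hypothetical; the abstract does not specify the actual feature set or the tools used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    word: str
    pos: str            # coarse morphological tag, e.g. "Noun", "Verb"
    pause_after: float  # seconds of silence before the next word


def boundary_features(tokens: List[Token]) -> List[dict]:
    """One feature record for the boundary after each token except the last."""
    records = []
    for cur, nxt in zip(tokens, tokens[1:]):
        records.append({
            "pause": cur.pause_after,        # prosodic cue
            "pos_before": cur.pos,           # morphological cue
            "pos_after": nxt.pos,            # morphological cue
            "bigram": (cur.word, nxt.word),  # lexical cue
        })
    return records


# Made-up tokens (assumption, not real recognizer or tagger output).
tokens = [Token("hava", "Noun", 0.02), Token("güzel", "Adj", 0.65),
          Token("yarın", "Adv", 0.03), Token("yağacak", "Verb", 0.0)]
for record in boundary_features(tokens):
    print(record)
```

Records of this kind could then be given to a classifier that decides, for each word boundary, whether a sentence ends there.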
