Search Results

Showing 1 - 10 of 11
  • Publication
    İngilizce-Türkçe istatistiksel makine çevirisinde biçimbilim kullanımı [Use of morphology in English-Turkish statistical machine translation]
    (IEEE, 2012-04-18) Görgün, Onur; Yıldız, Olcay Taner
    In this study, statistical machine translation experiments were carried out on the SIU corpus for the English-Turkish language pair with the help of morphological analysis. Translation experiments based on word forms perform poorly for language pairs such as English-Turkish, which are morphologically and inflectionally distant from each other. In such cases, using sub-lexical representations instead of word forms as the basic unit of translation significantly improves machine translation performance.
  • Publication
    Model adaptation for dialog act tagging
    (IEEE, 2006) Tür, Gökhan; Güz, Ümit; Hakkani Tür, Dilek
    In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.
  • Publication
    Unsupervised morphological analysis using tries
    (Springer London, 2012) Ak, Koray; Yıldız, Olcay Taner
    This article presents an unsupervised morphological analysis algorithm to segment words into roots and affixes. The algorithm relies on word occurrences in a given dataset. Target languages are English, Finnish, and Turkish, but the algorithm can be used to segment any word from any language given the wordlists acquired from a corpus consisting of words and word occurrences. In each iteration, the algorithm divides words with respect to occurrences and constructs a new trie for the remaining affixes. Preliminary experimental results on three languages show that our novel algorithm performs better than most of the previous algorithms.
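The idea of segmenting words with a trie built from a frequency-annotated word list can be illustrated with a minimal sketch. This is not the authors' algorithm: the branching heuristic, the `min_count` threshold, and all names (`TrieNode`, `build_trie`, `split_word`) are assumptions made for illustration, and the iterative re-building of a trie over the remaining affixes is omitted.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.count = 0        # summed occurrences of all words passing through this node
        self.is_word = False  # True if a listed word ends exactly here


def build_trie(word_counts):
    """Build a character trie from a {word: occurrences} mapping."""
    root = TrieNode()
    for word, freq in word_counts.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.count += freq
        node.is_word = True
    return root


def split_word(root, word, min_count=2):
    """Split a listed word at the deepest proper prefix that branches.

    A prefix "branches" when at least two continuations follow it,
    counting "the word ends here" as one continuation (assumed heuristic).
    """
    node = root
    cut = len(word)
    for i, ch in enumerate(word[:-1]):  # proper prefixes only
        node = node.children[ch]
        branches = len(node.children) + (1 if node.is_word else 0)
        if branches >= 2 and node.count >= min_count:
            cut = i + 1
    return word[:cut], word[cut:]


# Hypothetical toy word list with occurrence counts.
word_counts = {"walk": 10, "walked": 4, "walking": 6,
               "walks": 5, "talk": 8, "talked": 3}
trie = build_trie(word_counts)
root_part, affix_part = split_word(trie, "walking")
```

On this toy list the sketch separates `walking` into a root candidate and a remaining affix. A real system would need to handle words missing from the list and languages with richer affix chains.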
  • Publication
    ISIKUN at the FinCausal 2020: Linguistically informed machine-learning approach for causality identification in financial documents
    (Association for Computational Linguistics (ACL), 2020) Özenir, Hüseyin Gökberk; Karadeniz, İlknur
    This paper presents our participation in the FinCausal-2020 Shared Task, whose ultimate aim is to extract cause-effect relations from a given financial text. Our participation includes two systems, one for each sub-task of the FinCausal-2020 Shared Task. The first sub-task (Task-1) is the binary classification of the given sentences as causally meaningful (1) or causally meaningless (0). Our approach to Task-1 applies linear support vector machines after transforming the input sentences into vector representations using a term frequency-inverse document frequency scheme with 3-grams. The second sub-task (Task-2) is the identification of the cause-effect relations in the sentences detected as causally meaningful. Our approach to Task-2 is a CRF-based model that uses linguistically informed features. For Task-1, the results show only a small difference between the proposed approach based on linear support vector machines (F-score 94%), which requires less time, and the BERT-based baseline (F-score 95%). For Task-2, although only minor modifications, such as the learning algorithm type and the feature representations, were made to the conditional random fields based baseline (F-score 52%), we obtained better results (F-score 60%). The source code for both tasks is available online (https://github.com/ozenirgokberk/FinCausal2020.git/).
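The Task-1 feature step, TF-IDF weighting over n-grams, can be sketched in plain Python. This is an illustrative assumption rather than the authors' code: the abstract does not say whether the 3-grams are word or character n-grams or whether shorter n-grams are included, so the sketch assumes word n-grams of length 1 to 3, whitespace tokenization, and the IDF form `log(N / df) + 1`. The linear SVM classifier is not shown, and the names `ngrams` and `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, max_n=3):
    """Map each document to a sparse {n-gram: weight} dictionary.

    weight = raw term frequency * (log(N / document frequency) + 1),
    an assumed, unsmoothed IDF variant.
    """
    counts = [Counter(ngrams(doc.lower().split(), max_n)) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    return [{gram: tf * (math.log(n_docs / doc_freq[gram]) + 1.0)
             for gram, tf in c.items()}
            for c in counts]


# Hypothetical example sentences, not taken from the shared-task data.
docs = ["profits fell because costs rose", "profits fell"]
vectors = tfidf_vectors(docs)
```

N-grams that occur in only one sentence, such as `because costs rose`, receive a higher weight than n-grams shared by both sentences, which is the property a linear classifier can exploit.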
  • Publication
    A novel approach to morphological disambiguation for Turkish
    (Springer-Verlag, 2012) Görgün, Onur; Yıldız, Olcay Taner
    In this paper, we propose a classification-based approach to morphological disambiguation for the Turkish language. Due to the complex morphology of Turkish, a word can take an unlimited number of affixes, resulting in very large tag sets. The problem is defined as choosing one of the parses of a word without taking the existing root word into consideration. We trained our model with well-known classifiers using the WEKA toolkit and tested it on a common test set. The best performance achieved is 95.61%, by the J48 tree classifier.
  • Publication
    Chunking in Turkish with conditional random fields
    (Springer-Verlag, 2015-04-14) Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
    In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
  • Publication
    Left/right and front/back in sign, speech, and co-speech gestures: what do data from Turkish sign language, Croatian sign language, American sign language, Turkish, Croatian, and English reveal?
    (Versita, 2011-09) Arık, Engin
    Research has shown that spoken languages differ from each other in their representation of space. Using hands, body, and physical space in front of signers to represent space, do sign languages differ from each other? To what extent are they similar to spoken languages in their expressions of spatial relations? The present study targeted these questions by exploring the descriptions of static situations in sign languages (Turkish Sign Language, Croatian Sign Language, American Sign Language) and spoken languages, including co-speech gestures (Turkish, Croatian, and English). It is found that signed and spoken languages differ from each other in their linguistic constructions for the left/right and front/back spatial relation. They also differ from one another in their mapping strategies. Crucially, being a signer does not require more direct iconic mappings than a speaker would use. It is also found that co-speech gestures can complement spoken language descriptions.
  • Publication
    The expressions of spatial relations during interaction in American sign language, Croatian sign language, and Turkish sign language
    (Versita, 2012-11) Arık, Engin
    Signers use their body and the space in front of them iconically. Does iconicity lead to the same mapping strategies in construing space during interaction across sign languages? The present study addressed this question by conducting an experimental study on basic static and motion event descriptions during interaction (describer input and addressee re-signing/retelling) in American Sign Language, Croatian Sign Language, and Turkish Sign Language. I found that the three sign languages are similar in using classifier predicates of location, orientation, and movement, predominantly employing an egocentric (viewer) perspective but also a non-egocentric perspective, and using similar mapping strategies regardless of interlocutor positions. However, these three sign languages differ from each other in the effects of location and orientation of the objects in pictures and movies, the descriptions of picture (states) vs. movie (motion events), and describer input vs. addressee retellings in their mapping strategies. This study contributes to our knowledge of how the expressions of spatial relations are conveyed in natural human language.
  • Publication
    Jacques Derrida ve Ludwig Wittgenstein’in dilkuramları bağlamında Jaume Plensa, eserleri ve aracı dil arayışı [Jaume Plensa, his works, and the search for an intermediary language in the context of the language theories of Jacques Derrida and Ludwig Wittgenstein]
    (Sibel Kılıç, 2017-12-31) Tatlıcı, Gizem
    The link between language and the understanding of the world is perceived through two seemingly different forms of expression, verbal and visual. Both forms of expression rest on the words used in literature and in the visual arts alike. When words are voiced, they carry multiple interpretations. Visual language, by contrast, has no tongue; that is, if words are thought of only as voiced, one cannot speak of such a language in visual terms, but rather of an intermediary language that manifests itself through colors and forms and thereby connects verbal language with visual language. This intermediary language tries to establish a link between visual language and plastic language by using the words and letters or symbols employed by literature, where a language finds its most accomplished expression. When works produced in the visual arts are reread through the intermediary language, they will be found to contain a number of codes that have not yet been deciphered. Once these codes are deciphered, it will become apparent that the idea the artist wishes to convey, however difficult to grasp on the surface, acquires a new and in fact hidden form of expression when reinterpreted through the intermediary language. In other words, works that are assumed to move away from narrative language as the level of abstraction in the visual arts increases can be said to acquire a new form of expression through the intermediary language. This article attempts to bring this intermediary language to light by comparing the approaches of Jacques Derrida and Ludwig Wittgenstein to language. It also attempts to show that, within the wide possibilities offered by deconstruction, the Wittgensteinian conception of language is more effective in interpreting and receiving abstract works. It will be shown that the intermediary language offers new possibilities for receiving and interpreting a work of art, and that deconstructionist theory is the most important tool for bringing the intermediary language to light.
    As an application of this claim, the works of Jaume Plensa, one of the leading figures of contemporary sculpture, are reinterpreted through this intermediary language to demonstrate the need for a different method.
  • Publication
    Sarcasm detection on news headlines using transformers
    (Springer, 2025-09-07) Gümüşçekiçci, Gizem; Dehkharghani, Rahim
    Sarcasm poses a linguistic challenge due to its figurative nature, where intended meaning contradicts literal interpretation. Sarcasm is prevalent in human communication, affecting interactions in literature, social media, news, e-commerce, etc. Identifying the true intent behind sarcasm is challenging but essential for applications in sentiment analysis, and detecting sarcasm in written text has attracted many researchers in recent years. This paper attempts to detect sarcasm in news headlines. Journalists prefer using sarcastic news headlines as they seem much more interesting to readers. In the proposed methodology, we experimented with Transformers, namely the BERT model, and several machine learning and deep learning models with different word and sentence embedding methods. The proposed approach inherently requires high-performance resources due to the use of large-scale pre-trained language models such as BERT. We also extended an existing news headlines dataset for sarcasm detection using augmentation techniques and annotated it with hand-crafted features. The proposed methodology outperforms almost all existing sarcasm detection approaches, with a 98.86% F1-score when applied to the extended news headlines dataset, which we made publicly available on GitHub.