MF - Conference Paper Collection | Department of Computer Engineering

Permanent URI for this collection

https://hdl.handle.net/11729/62

Showing 1 - 20 of 113

AnlamVer: Semantic model evaluation dataset for Turkish - word similarity and relatedness
(Association for Computational Linguistics (ACL), 2018-08-26) Ercan, Gökhan; Yıldız, Olcay Taner
In this paper, we present AnlamVer, a semantic model evaluation dataset for Turkish designed to evaluate word similarity and word relatedness tasks while discriminating those two relations from each other. Our dataset consists of 500 word-pairs annotated by 12 human subjects, and each pair has two distinct scores for similarity and relatedness. Word-pairs are selected to enable the evaluation of distributional semantic models by multiple attributes of words and word-pair relations such as frequency, morphology, concreteness and relation types (e.g., synonymy, antonymy). Our aim is to provide insights to semantic model researchers by evaluating models on multiple attributes. We balance dataset word-pairs by their frequencies to evaluate the robustness of semantic models concerning out-of-vocabulary and rare-word problems, which are caused by the rich derivational and inflectional morphology of the Turkish language.
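A dataset of this kind is commonly used by correlating a model's similarity scores with the human scores. The sketch below assumes Spearman rank correlation over cosine similarities, which the abstract does not specify; the word vectors, word pairs, and human scores are invented for illustration and are not taken from AnlamVer.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings and human scores (NOT from the dataset).
VECTORS = {
    "araba": [0.9, 0.1, 0.0],
    "otomobil": [0.85, 0.15, 0.0],
    "yol": [0.5, 0.5, 0.1],
    "elma": [0.0, 0.2, 0.9],
}
# (word1, word2, human similarity, human relatedness)
PAIRS = [
    ("araba", "otomobil", 9.5, 9.6),
    ("araba", "yol", 3.0, 8.0),
    ("araba", "elma", 0.5, 0.8),
]

model_scores = [cosine(VECTORS[a], VECTORS[b]) for a, b, _, _ in PAIRS]
similarity_rho = spearman(model_scores, [s for _, _, s, _ in PAIRS])
relatedness_rho = spearman(model_scores, [r for _, _, _, r in PAIRS])
```

Because each pair carries two scores, the same model output can be correlated separately against similarity and against relatedness, which is what lets the two relations be told apart.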
BOUN-ISIK participation: an unsupervised approach for the named entity normalization and relation extraction of Bacteria Biotopes
(Association for Computational Linguistics (ACL), 2019-11-04) Karadeniz, İlknur; Tuna, Ömer Faruk; Özgür, Arzucan
This paper presents our participation in the Bacteria Biotope Task of the BioNLP Shared Task 2019. Our participation includes two systems for the two subtasks of the Bacteria Biotope Task: the normalization of entities (BB-norm) and the identification of the relations between the entities given a biomedical text (BB-rel). For the normalization of entities, we utilized word embeddings and syntactic re-ranking. For the relation extraction task, pre-defined rules are used. Although both approaches are unsupervised, in the sense that they do not need any labeled data, they achieved promising results. In particular, for the BB-norm task, the results show that the proposed method performs as well as deep learning based methods, which require labeled data.
Categorization of the models based on structural information extraction and machine learning
(Springer Science and Business Media Deutschland GmbH, 2022-07-21) Khalilipour, Alireza; Bozyiğit, Fatma; Utku, Can; Challenger, Moharram
As various engineering fields increasingly use modelling techniques, the number of available models, their size, and their structural complexity increase. This makes model management, including finding these models, computationally very expensive with state-of-the-art techniques, i.e., it leads to intractable graph comparison algorithms. To handle this problem, modelers can organize available models so that they can be reused and new, more complex models can be developed with less cost and effort. Therefore, we applied model classification using baseline machine learning approaches to a dataset of 555 Ecore metamodels. In our proposed system, the structural information of each model was summarized in its elements by generating simple labelled graphs. The proposed solution transforms the complex attributed graphs of the models into simple labelled graphs so that graph analysis algorithms can be applied to them. The labelled graphs (models) were structurally compared using graph comparison techniques such as graph kernels, and the results were used as a set of features for similarity search. After generating feature vectors, the performance of six machine learning classifiers (Naïve Bayes (NB), k Nearest Neighbors (kNN), Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN)) was evaluated on the feature vectors. The presented approach yields promising results for the model classification task, with a classification accuracy over 87%.
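To make the idea of comparing simple labelled graphs with a graph kernel concrete, here is a minimal sketch. It assumes a Weisfeiler-Lehman subtree kernel and a nearest-neighbour decision, neither of which is confirmed by the abstract; the graphs, the node labels, and the category names "A" and "B" are hypothetical.

```python
import math
from collections import Counter


def wl_features(labels, edges, iterations=2):
    """Count Weisfeiler-Lehman subtree labels of a simple labelled graph.

    labels: dict mapping node -> label string
    edges:  list of undirected (u, v) pairs
    """
    neighbours = {n: [] for n in labels}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        # New label = own label plus the sorted labels of the neighbours.
        current = {
            n: current[n] + "(" + ",".join(sorted(current[m] for m in neighbours[n])) + ")"
            for n in current
        }
        counts.update(current.values())
    return counts


def kernel(f, g):
    """Dot product of two label-count vectors."""
    return sum(f[k] * g[k] for k in f.keys() & g.keys())


def similarity(f, g):
    """Kernel value normalised to the range [0, 1]."""
    return kernel(f, g) / math.sqrt(kernel(f, f) * kernel(g, g))


def classify(query, training):
    """Return the category of the training graph most similar to `query`."""
    q = wl_features(*query)
    best = max(training, key=lambda item: similarity(q, wl_features(*item[0])))
    return best[1]


# Hypothetical toy metamodel graphs (labels are element kinds).
GRAPH_A = ({1: "EClass", 2: "EAttribute", 3: "EAttribute"}, [(1, 2), (1, 3)])
GRAPH_B = ({1: "EClass", 2: "EClass", 3: "EReference"}, [(1, 3), (3, 2)])
TRAINING = [(GRAPH_A, "A"), (GRAPH_B, "B")]

QUERY = (
    {1: "EClass", 2: "EAttribute", 3: "EAttribute", 4: "EAttribute"},
    [(1, 2), (1, 3), (1, 4)],
)
predicted = classify(QUERY, TRAINING)
```

In a fuller pipeline the vector of kernel values against all training graphs could serve as the feature vector handed to any of the listed classifiers; the nearest-neighbour rule here is only the simplest stand-in.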
Convolutional attention network for MRI-based Alzheimer's disease classification and its interpretability analysis
(IEEE, 2021-09-17) Türkan, Yasemin; Tek, Faik Boray
Neuroimaging techniques, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET), help to identify Alzheimer's disease (AD). These techniques generate large-scale, high-dimensional, multimodal neuroimaging data, which is time-consuming and difficult to interpret and classify. Therefore, interest in deep learning approaches for the classification of 3D structural MRI brain scans has grown rapidly. In this research study, we improved the 3D VGG model proposed by Korolev et al. [2]. We increased the filters in the 3D convolutional layers and then added an attention mechanism for better classification. We compared the performance of the proposed approaches for the classification of Alzheimer's disease versus mild cognitive impairments and normal cohorts on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We observed that both the accuracy and area under curve results improved with the proposed models. However, deep neural networks are black boxes that produce predictions that require further explanation for medical usage. We compared the 3D-data interpretation capabilities of the proposed models using four different interpretability methods: Occlusion, 3D Ultrametric Contour Map, 3D Gradient-Weighted Class Activation Mapping, and SHapley Additive explanations (SHAP). We observed that explanation results differed in different network models and data classes.
Tweet sentiment analysis for cryptocurrencies
(IEEE, 2021-10-13) Şaşmaz, Emre; Tek, Faik Boray
Many traders rely on tweets to guide their daily cryptocurrency trading. In this project, we investigated the feasibility of automated sentiment analysis for cryptocurrencies. For the study, we targeted one altcoin (NEO) and collected related data. Data collection and cleaning were essential components of the study. First, the last five years of daily tweets with NEO hashtags were obtained from Twitter. The collected tweets were then filtered to keep only those that mention NEO. We manually tagged a subset of the tweets with positive, negative, and neutral sentiment labels. We trained and tested a Random Forest classifier on the labeled data, where the test set accuracy reached 77%. In the second phase of the study, we investigated whether the daily sentiment of the tweets was correlated with the NEO price. We found positive correlations between the number of tweets and the daily prices, and between the prices of different crypto coins. We share the data publicly.
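The second phase described above amounts to aggregating tweets per day and correlating the daily series with the price. The sketch below assumes Pearson correlation, which the abstract does not name; the dates, labels, and prices are made up and are not the study's data.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def daily_counts(tweets, days):
    """Number of tweets per day, in the order given by `days`."""
    counts = Counter(day for day, _label in tweets)
    return [counts[day] for day in days]


# Hypothetical (day, sentiment label) records and closing prices.
TWEETS = [
    ("2021-01-01", "positive"),
    ("2021-01-02", "neutral"),
    ("2021-01-02", "positive"),
    ("2021-01-03", "positive"),
    ("2021-01-03", "negative"),
    ("2021-01-03", "positive"),
]
PRICES = {"2021-01-01": 10.0, "2021-01-02": 12.0, "2021-01-03": 14.0}

days = sorted(PRICES)
counts = daily_counts(TWEETS, days)
correlation = pearson(counts, [PRICES[day] for day in days])
```

A correlation near +1 on real data would only indicate that tweet volume and price move together; it says nothing about which one drives the other.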
ISIKUN at the FinCausal 2020: Linguistically informed machine-learning approach for causality identification in financial documents
(Association for Computational Linguistics (ACL), 2020) Özenir, Hüseyin Gökberk; Karadeniz, İlknur
This paper presents our participation in the FinCausal-2020 Shared Task, whose ultimate aim is to extract cause-effect relations from a given financial text. Our participation includes two systems for the two sub-tasks of the FinCausal-2020 Shared Task. The first sub-task (Task-1) consists of the binary classification of the given sentences as causal meaningful (1) or causal meaningless (0). Our approach for Task-1 applies linear support vector machines after transforming the input sentences into vector representations using the term frequency-inverse document frequency scheme with 3-grams. The second sub-task (Task-2) consists of the identification of the cause-effect relations in the sentences that are detected as causal meaningful. Our approach for Task-2 is a CRF-based model which uses linguistically informed features. For Task-1, the obtained results show that there is only a small difference between the proposed approach based on linear support vector machines (F-score 94%), which requires less time, and the BERT-based baseline (F-score 95%). For Task-2, although only minor modifications, such as the learning algorithm type and the feature representations, were made to the conditional random fields based baseline (F-score 52%), we obtained better results (F-score 60%). The source code for both tasks is available online (https://github.com/ozenirgokberk/FinCausal2020.git/).
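The Task-1 feature step can be illustrated with a small TF-IDF computation over n-grams. The abstract does not say whether the 3-grams are word or character based, or whether shorter n-grams are included; this sketch assumes word n-grams of length 1 to 3 and a plain log(N/df) inverse document frequency. The sentences are invented, and the linear SVM that would consume these vectors is not implemented here.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """All word n-grams of length 1..max_n, joined with spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, max_n=3):
    """Return one sparse {n-gram: weight} dictionary per document."""
    grams = [Counter(ngrams(doc.lower().split(), max_n)) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for counter in grams for g in counter)
    n_docs = len(docs)
    vectors = []
    for counter in grams:
        total = sum(counter.values())
        vectors.append({
            g: (count / total) * math.log(n_docs / df[g])
            for g, count in counter.items()
        })
    return vectors


# Hypothetical sentences: the first is causal, the second is not.
SENTENCES = [
    "profits rose because costs fell",
    "profits rose last year",
]
vectors = tfidf(SENTENCES)
```

N-grams shared by every sentence, such as "profits rose", receive zero weight, while a span like "because costs fell" that appears in only one sentence is weighted up, which is the signal a linear classifier can then exploit.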
A FST description of noun and verb morphology of Azarbaijani Turkish
(Association for Computational Linguistics (ACL), 2021) Ehsani, Razieh; Özenç, Berke; Solak, Ercan; Drewes F.
We give an FST description of nominal and finite verb morphology of Azarbaijani Turkish. We use a hybrid approach where nominal inflection is expressed as a slot-based paradigm and major parts of verb inflection are expressed as optional paths on the FST. We collapse adjective and noun categories into a single nominal category as they behave similarly as far as their paradigms are concerned. Thus, we defer a more precise identification of POS to further down the NLP pipeline.
Çizge evrişim ağı kullanarak patojen-konak ağlarında protein etkileşim tahmini [Protein interaction prediction in pathogen-host networks using a graph convolutional network]
(IEEE, 2021-06-09) Koca, Mehmet Burak; Karadeniz, İlknur; Nourani, Esmaeil; Sevilgen, Fatih Erdoğan
Proteins are biological molecules that play a critical role in carrying out vital functions. The interactions between the proteins of a host organism and the proteins of a pathogen form pathogen-host interaction (PHI) networks. These bipartite interaction networks are of great importance in determining which vital functions the pathogen affects and, consequently, in identifying the diseases it may cause. Detecting protein-protein interactions in the laboratory is both time-consuming and costly. The limited number of interactions that can be determined experimentally, and the fact that some interactions are missed, motivate the development of computational prediction methods. In this study, we present a graph convolutional network (GCN) based method for predicting protein interactions in PHI networks. The GCN model (GraphSAGE), trained in an unsupervised manner, uses amino acid sequences as the basic features in addition to topological information. To the best of our knowledge, this is the first study to provide GCN-based interaction prediction in PHI networks. Experimental results show that the developed model predicts interactions with 96% accuracy on the PHI dataset used for benchmarking, performing 10% better than high-performing algorithms.
Hierarchical b-Matching
(Springer Science and Business Media Deutschland GmbH, 2021) Emek, Yuval; Kutten, Shay; Shalom, Mordechai; Zaks, Shmuel
A matching of a graph is a subset of edges no two of which share a common vertex, and a maximum matching is a matching of maximum cardinality. In a b-matching every vertex v has an associated bound b_v, and a maximum b-matching is a maximum set of edges such that every vertex v appears in at most b_v of them. We study an extension of this problem, termed Hierarchical b-Matching. In this extension, the vertices are arranged in a hierarchical manner. At the first level the vertices are partitioned into disjoint subsets, with a given bound for each subset. At the second level the set of these subsets is again partitioned into disjoint subsets, with a given bound for each subset, and so on. We seek a maximum set of edges that obeys all bounds (that is, no vertex v participates in more than b_v edges, the vertices in one subset together do not participate in more edges than that subset's bound, and so on hierarchically). This is a sub-problem of the matroid matching problem, which is NP-hard in general. It corresponds to the special case where the matroid is restricted to be laminar and the weights are unity. A pseudo-polynomial algorithm for the weighted laminar matroid matching problem is presented in [8]. We propose a polynomial-time algorithm for Hierarchical b-matching, i.e., the unweighted laminar matroid matching problem, and discuss how our techniques can possibly be generalized to the weighted case.
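The constraints in this definition can be made concrete with a small feasibility check. This is only a sketch of the problem statement, not the paper's polynomial-time algorithm: the exhaustive search below is exponential and meant for tiny instances. It also assumes that a subset's load is the sum of the degrees of its vertices, which is one possible reading of the abstract. The graph, bounds, and groups are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_feasible(edges, vertex_bound, groups):
    """Check an edge set against vertex bounds and hierarchical subset bounds.

    vertex_bound: dict vertex -> b_v
    groups: list of (set_of_vertices, bound) pairs forming the hierarchy
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if any(degree[v] > vertex_bound[v] for v in degree):
        return False
    # Assumption: a subset's load is the sum of its vertices' degrees.
    return all(
        sum(degree[v] for v in members) <= bound
        for members, bound in groups
    )


def brute_force_maximum(edges, vertex_bound, groups):
    """Largest feasible edge set by exhaustive search (exponential time)."""
    for size in range(len(edges), -1, -1):
        for subset in combinations(edges, size):
            if is_feasible(subset, vertex_bound, groups):
                return list(subset)
    return []


# Hypothetical instance with two levels of subsets.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
VERTEX_BOUND = {"a": 2, "b": 1, "c": 1, "d": 1}
GROUPS = [
    ({"a", "b"}, 3),            # level 1
    ({"c", "d"}, 1),            # level 1
    ({"a", "b", "c", "d"}, 4),  # level 2: union of the level-1 subsets
]

best = brute_force_maximum(EDGES, VERTEX_BOUND, GROUPS)
```

In this instance the edge set {(a, d), (b, c)} respects every vertex bound but is rejected because c and d together exceed their subset's bound of 1, which shows how the hierarchy restricts an ordinary b-matching.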
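Uyarlanır yerel bağlı katman kullanan dikkat tabanlı derin ağ ile sesli komut tanıma [Speech command recognition with an attention-based deep network using an adaptive locally connected layer]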
(Institute of Electrical and Electronics Engineers Inc., 2020-10-05) Türkan, Yasemin; Tek, Faik Boray
Speech command recognition is an active research topic related to the human-machine interface. Such problems can be solved successfully with attention-based deep networks. In this study, an existing attention-based deep network method is further improved by using an adaptive locally connected (focusing) layer. In comparative experiments on the Google and Kaggle speech command datasets on which the original method was tested, we observed that the proposed attention-based network with an adaptive locally connected layer improved recognition accuracy by 2.6%.
On building the largest and cross-linguistic Turkish dependency corpus
(Institute of Electrical and Electronics Engineers Inc., 2020-10-15) Kuzgun, Aslı; Cesur, Neslihan; Arıcan, Bilge Nas; Özçelik, Merve; Marşan, Büşra; Kara, Neslihan; Aslan, Deniz Baran; Yıldız, Olcay Taner
In this paper, we introduce the dependency annotation process of the largest and the only cross-linguistic Turkish dependency treebank, which was translated from the original Penn Treebank corpus. Within the scope of this project, 16,400 sentences have been morphologically and semantically annotated, and the dependency relations were manually annotated by a team of linguists. It is hoped that this project will serve as a base for a successful dependency parser and a system which can automatically perform the bi-directional conversion between constituency and dependency trees.
Creating a syntactically felicitous constituency treebank for Turkish
(Institute of Electrical and Electronics Engineers Inc., 2020-10-15) Kara, Neslihan; Marşan, Büşra; Özçelik, Merve; Arıcan, Bilge Nas; Kuzgun, Aslı; Cesur, Neslihan; Aslan, Deniz Baran; Yıldız, Olcay Taner
In this study, the work of Bakay et al. [1] and Yildiz et al. [2] on Turkish constituency treebanks was developed further. Compared to the previous work, the most prominent feature of this study is that every annotation and refinement process was carried out manually. In addition, the constituency treebank created as a result of this study abides by the syntactic rules and typological features of Turkish, whereas the trees created by previous studies are merely translated and simply inverted trees that completely ignore the syntactic properties of Turkish. The methodology followed in this study resulted in a significantly more accurate representation of the Turkish language and simpler, relatively flatter trees. The straightforward style of trees in this study reduces the complexity and offers a better training dataset for learning algorithms.
Visual modeling of Turkish morphology
(European Language Resources Association (ELRA), 2020-05-16) Özenç, Berke; Solak, Ercan
In this paper, we describe the steps in a visual modeling of Turkish morphology using diagramming tools. We aimed to make modeling easier and more maintainable while automating much of the code generation. We released the resulting analyzer, MorTur, and the diagram conversion tool, DiaMor, as free, open-source utilities. The MorTur analyzer is also publicly available on its web page as a web service. MorTur and DiaMor are part of our ongoing efforts in building a set of natural language processing tools for Turkic languages under a consistent framework.
TRopBank: Turkish PropBank V2.0
(European Language Resources Association (ELRA), 2020-05-16) Kara, Neslihan; Aslan, Deniz Baran; Marşan, Büşra; Bakay, Özge; Ak, Koray; Yıldız, Olcay Taner
In this paper, we present and explain TRopBank, “Turkish PropBank v2.0”. PropBank is a hand-annotated corpus of propositions which is used to obtain the predicate-argument information of a language. Predicate-argument information of a language can help in understanding the semantic roles of arguments. “Turkish PropBank v2.0”, unlike PropBank v1.0, has a much more extensive list of Turkish verbs, with 17,673 verbs in total.
A hybrid approach to dynamic enterprise data platform
(Institute of Electrical and Electronics Engineers Inc., 2019-12-12) Sezgin, Mehmet Selman; Bayrak, Ahmet Tuğrul; Yıldız, Olcay Taner
Today, corporations aim to make maximum use of the data produced in business applications. One of the most important goals is to convert the data into commercial benefit as quickly as possible. For this purpose, it is critical to receive the data from source systems, process this data and use it to support business decisions. There are many approaches to the process of acquiring, processing and making the data useful. In this study, we took advantage of most of the existing approaches and produced a hybrid solution. This solution can be integrated with new data sources very quickly and reduces the amount of time for data integration, preprocessing, deduplication and entity mapping by using open source software components.
Forecasting and analysis of domestic solid waste generation in districts of Istanbul with support vector regression
(Institute of Electrical and Electronics Engineers Inc., 2020-10-12) Özçelik, Şuayb Talha; Tek, Faik Boray
Waste planning is essential for large and developing cities such as Istanbul. In this report, we perform data analysis on the "Waste Amount Based on District, Year and Waste Type" dataset shared by Istanbul Metropolitan Municipality. After analyzing the waste of the districts, we used support vector regression (SVR) to forecast the waste amounts for the coming years. The analysis has shown an overall increasing trend in waste generation, although it dropped in 2019. The SVR predicts that Küçükçekmece will be the district generating the most waste in the coming years.
Web service translating content into Turkish sign language
(Institute of Electrical and Electronics Engineers Inc., 2020-10-12) Gümüşçekiçci, Gizem; Ezerceli, Özay; Tek, Faik Boray
The essential communication tool for people with hearing loss is sign language. It is far more efficient for their communication. Existing systems for translating text into sign language are offline and not practical. In this study, we propose a web service-based solution for online translation of content into Turkish Sign Language. We implemented the system and tested it using 32 sentences comprising 189 words as inputs. The correct word translation rate was 81.74% for the media or audio inputs and 81.09% for the text inputs. The results show the feasibility of the solution and the potential for improvements.
An open, extendible, and fast Turkish morphological analyzer
(Incoma Ltd, 2019-09) Yıldız, Olcay Taner; Avar, Begüm; Ercan, Gökhan
In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: finite state transducer, rule engine for suffixation, lexicon, trie data structure, and LRU cache. We use Java to implement the finite state machine logic and the rule engine, and XML to describe the finite state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Empowered with a comprehensive lexicon of 54,000 bare-forms including 19,000 proper nouns, our morphological analyzer is amongst the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
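The role of the trie and the LRU cache can be shown with a short sketch. It is written in Python rather than the Java used by the analyzer, and it is not the analyzer's code: the three-word lexicon, the `Trie` class, and the `candidate_splits` function are hypothetical, and real Turkish analysis must also handle phonological alternations that this sketch ignores.

```python
from functools import lru_cache


class Trie:
    """Prefix tree over lexicon entries."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, word):
        """Return every lexicon entry that is a prefix of `word`."""
        node, found = self, []
        for i, ch in enumerate(word):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                found.append(word[:i + 1])
        return found


# Hypothetical miniature lexicon of bare forms.
LEXICON = Trie()
for root in ("ev", "evli", "kitap"):
    LEXICON.insert(root)


@lru_cache(maxsize=10_000)
def candidate_splits(surface):
    """Split a surface form into (root, remaining suffix string) candidates.

    Results are cached, so repeated word forms in a corpus cost one lookup.
    """
    return tuple(
        (root, surface[len(root):]) for root in LEXICON.prefixes(surface)
    )
```

The trie finds all candidate roots in a single left-to-right pass over the word, and the cache pays off because a small set of frequent word forms accounts for a large share of the tokens in natural text.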
English-Turkish parallel semantic annotation of Penn-Treebank
(Oficyna Wydawnicza Politechniki Wroclawskiej, 2020) Arıcan, Bilge Nas; Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner; Ergelen, Özlem
This paper reports our efforts in constructing a sense-labeled English-Turkish parallel corpus using the traditional method of manual tagging. We tagged a pre-built parallel treebank which was translated from the Penn Treebank corpus. This approach allowed us to generate a resource combining syntactic and semantic information. We provide statistics about the corpus itself as well as information regarding its development process.
Comparing sense categorization between English PropBank and English WordNet
(Oficyna Wydawnicza Politechniki Wroclawskiej, 2020) Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner
Given that verbs play a crucial role in language comprehension, this paper presents a study which compares the verb senses in English PropBank with the ones in English WordNet through manual tagging. After analyzing 1554 senses in 1453 distinct verbs, we have found that while the majority of the senses in PropBank have their one-to-one correspondents in WordNet, a substantial number of them are differentiated. Furthermore, by analysing the differences between our manually-tagged resource and an automatically-tagged resource, we claim that manual tagging can help provide better results in sense annotation.

Recent Submissions