Işık University Institutional Academic Repository :: DSpace Angular

Search Results

Showing 1 - 8 of 8

A new approach for named entity recognition
(IEEE, 2017) Ertopçu, Burak; Kanburoğlu, Ali Buğra; Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
Many sentences create certain impressions on people. These impressions help the reader gain insight into the sentence through certain entities. In NLP, this process corresponds to Named Entity Recognition (NER). NLP algorithms can identify many entities in a sentence, such as person, location, date, time, or money. One of the major problems in these operations is confusion about whether a word denotes the name of a person, a location, or an organisation, or whether an integer stands for a date, time, or money. In this study, we design a new model for NER algorithms. We train this model on our predefined dataset and compare the results with other models. In the end, we obtain considerable results on a dataset containing 1400 sentences.
AnlamVer: Semantic model evaluation dataset for Turkish - word similarity and relatedness
(Association for Computational Linguistics (ACL), 2018-08-26) Ercan, Gökhan; Yıldız, Olcay Taner
In this paper, we present AnlamVer, which is a semantic model evaluation dataset for Turkish designed to evaluate word similarity and word relatedness tasks while discriminating those two relations from each other. Our dataset consists of 500 word-pairs annotated by 12 human subjects, and each pair has two distinct scores for similarity and relatedness. Word-pairs are selected to enable the evaluation of distributional semantic models by multiple attributes of words and word-pair relations such as frequency, morphology, concreteness and relation types (e.g., synonymy, antonymy). Our aim is to provide insights to semantic model researchers by evaluating models in multiple attributes. We balance dataset word-pairs by their frequencies to evaluate the robustness of semantic models concerning out-of-vocabulary and rare words problems, which are caused by the rich derivational and inflectional morphology of the Turkish language.
Shallow parsing in Turkish
(IEEE, 2017) Topsakal, Ozan; Açıkgöz, Onur; Gürkan, Ali Tunca; Kanburoğlu, Ali Buğra; Ertopçu, Burak; Özenç, Berke; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
In this study, shallow parsing is applied to Turkish sentences. These sentences are used to train and test the performance of various learning algorithms with various features specified for shallow parsing in Turkish.
All-words word sense disambiguation for Turkish
(IEEE, 2017) Açıkgöz, Onur; Gürkan, Ali Tunca; Ertopçu, Burak; Topsakal, Ozan; Özenç, Berke; Kanburoğlu, Ali Buğra; Çam, İlker; Avar, Begüm; Ercan, Gökhan; Yıldız, Olcay Taner
Identifying the sense of a word within a context is a challenging problem and has many applications in natural language processing. This assignment problem is called word sense disambiguation (WSD). Many papers in the literature focus on English language and data. Our dataset consists of 1400 sentences translated to Turkish from the Penn Treebank Corpus. This paper discusses 6 different feature extraction methods and their classification performance using C4.5, Random Forests, Rocchio, Naive Bayes, KNN, and linear and multilayer perceptrons. It examines how the described features perform on a morphologically rich language (Turkish) with several classifiers.
A multilayer annotated corpus for Turkish
(IEEE, 2018-06-06) Yıldız, Olcay Taner; Ak, Koray; Ercan, Gökhan; Topsakal, Ozan; Asmazoğlu, Cengiz
In this paper, we present the first multilayer annotated corpus for Turkish, which is a low-resourced agglutinative language. Our dataset consists of 9,600 sentences translated from the Penn Treebank Corpus. Annotated layers contain syntactic and semantic information including morphological disambiguation of words, named entity annotation, shallow parse, sense annotation, and semantic role label annotation.
An open, extendible, and fast Turkish morphological analyzer
(Incoma Ltd, 2019-09) Yıldız, Olcay Taner; Avar, Begüm; Ercan, Gökhan
In this paper, we present a two-level morphological analyzer for Turkish which consists of five main components: a finite state transducer, a rule engine for suffixation, a lexicon, a trie data structure, and an LRU cache. We use Java to implement the finite state machine logic and the rule engine, and XML to describe the finite state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. Empowered with a comprehensive lexicon of 54,000 bare-forms including 19,000 proper nouns, our morphological analyzer is amongst the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
Türkçe anlamsal söylem ve cümle benzerliği analizleri için veri kümesi oluşturma yöntemi [A dataset creation method for Turkish semantic discourse and sentence similarity analyses]
(IEEE, 2018-12-06) Ercan, Gökhan; Erkek, Orçun; Açıkgöz, Onur; Özçelik, Rıza; Parlar, Selen; Yıldız, Olcay Taner
The aim of our study is to prepare a dataset for Turkish for semantic discourse analysis at the paragraph-sentence level and for textual similarity measurement at the paragraph-sentence and sentence-sentence levels.
Morpholex Turkish: a morphological Lexicon for Turkish
(European Language Resources Association (ELRA), 2022-06-25) Arıcan, Bilge Nas; Kuzgun, Aslı; Marşan, Büşra; Aslan, Deniz Baran; Sanıyar, Ezgi; Cesur, Neslihan; Kara, Neslihan; Kuyrukçu, Oğuzhan; Özçelik, Merve; Yenice, Arife Betül; Doğan, Merve; Oksal, Ceren; Ercan, Gökhan; Yıldız, Olcay Taner
MorphoLex is a study in which the roots, prefixes, and suffixes of words are analyzed. With MorphoLex, many words can be analyzed according to certain rules and a useful database can be created. Because Turkish is an agglutinative language with a rich structure, it yields analyses and results that differ from those of previous MorphoLex studies. In this study, we describe the process of creating a database of 48,472 words and the results arising from the differences in language structure.
