Işık University Institutional Academic Repository :: DSpace Angular

Search Results

Showing 1 - 10 of 11

Türkçe kelime ağı KeNet için arayüz [An interface for the Turkish wordnet KeNet]
(Institute of Electrical and Electronics Engineers Inc., 2019-04) Özçelik, Rıza; Uludoğan, Gökçe; Parlar, Selen; Bakay, Özge; Ergelen, Özlem; Yıldız, Olcay Taner
Wordnets are graph data structures that represent the connections between the words of a language by grouping them into synonym sets (synsets) and linking these sets to one another through various semantic relations. WordNet, the best-known wordnet in natural language processing, was created for English in 1990, whereas the most comprehensive wordnet for Turkish is KeNet, created in 2018. To the best of our knowledge, no user interface has yet been developed for KeNet, which contains 80,000 synsets and 25 distinct semantic relations. In this work, we present an interface that enables online navigation among synsets in the KeNet graph by following semantic relations. With this interface, a phrase can be searched in KeNet, and KeNet can be browsed using relations between synsets such as hypernymy/hyponymy and part-whole relations. In addition, the identifier of a synset's English counterpart, if one exists, can be displayed, and that synset can be reached through the WordNet web page.
MorAz: An open-source morphological analyzer for Azerbaijani Turkish
(Association for Computational Linguistics (ACL), 2018) Özenç, Berke; Ehsani, Razieh; Solak, Ercan
MorAz is an open-source morphological analyzer for Azerbaijani Turkish. The analyzer is available both as a website for interactive exploration and as a RESTful web service for integration into a natural language processing pipeline. MorAz implements the morphology of Azerbaijani Turkish following a two-level approach using the Helsinki finite-state transducer and wraps the analyzer with Python scripts in a Django instance.
Constructing a Turkish constituency parse treebank
(Springer Verlag, 2016) Yıldız, Olcay Taner; Solak, Ercan; Çandır, Şemsinur; Ehsani, Razieh; Görgün, Onur
In this paper, we describe our initial efforts for creating a Turkish constituency parse treebank by utilizing the English Penn Treebank. We employ a semi-automated approach for annotation. In our previous work [18], the English parse trees were manually translated to Turkish. In this paper, the words are semi-automatically annotated morphologically. As a second step, a rule-based approach is used for refining the parse trees based on the morphological analyses of the words. We generated Turkish phrase structure trees for 5143 sentences from the Penn Treebank that contain fewer than 15 tokens. The annotated corpus can be used in statistical natural language processing studies for developing tools such as constituency parsers and statistical machine translation systems for Turkish.
Chunking in Turkish with conditional random fields
(Springer-Verlag, 2015-04-14) Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh; Görgün, Onur
In this paper, we report our work on chunking in Turkish. We used the data that we generated by manually translating a subset of the Penn Treebank. We exploited the already available tags in the trees to automatically identify and label chunks in their Turkish translations. We used conditional random fields (CRF) to train a model over the annotated data. We report our results on different levels of chunk resolution.
A tree-based approach for English-to-Turkish translation
(Tubitak Scientific & Technical Research Council Turkey, 2019) Bakay, Özge; Avar, Begüm; Yıldız, Olcay Taner
In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to get the target side (Turkish) trees from source side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement, from a baseline of 12.8 to 21.4 BLEU, averaged over 10-fold cross-validation. As future work, improvements in choosing the correct senses and structural rules are needed.
Evaluating the English-Turkish parallel treebank for machine translation
(TÜBİTAK, 2022-01-19) Görgün, Onur; Yıldız, Olcay Taner
This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences vary in length from 15 to 50 tokens, including punctuation. We constrained the translation of trees by (i) reordering of leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. In order to fill the morphological and syntactic gap between the languages, we perform morphological annotation and disambiguation. We also apply our heuristics by creating the Nokia English-Turkish Treebank (NTB) to address technical document translation tasks. NTB includes 8.3K sentences of varying lengths. We validate the corpus both extrinsically and intrinsically, and report our evaluation results regarding perplexity analysis and translation task results. The results show that our heuristics yield promising results in terms of perplexity and are suitable for translation tasks in terms of BLEU scores.
Constructing a Turkish-English parallel treebank
(Association for Computational Linguistics (ACL), 2014) Yıldız, Olcay Taner; Solak, Ercan; Görgün, Onur; Ehsani, Razieh
In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation. In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank. English sentences in our set have a maximum of 15 tokens, including punctuation. We constrained the translated trees to the reordering of the children and the replacement of the leaf nodes with appropriate glosses. We also report the tools that we built and used in our tree translation task.
Emlak alanına özgü kelime ağı [A wordnet specific to the real estate domain]
(Institute of Electrical and Electronics Engineers Inc., 2019-04) Parlar, Selen; Nas Arıcan, Bilge; Erkek, Mehmet; Çayırlı, Kamil; Yıldız, Olcay Taner
A wordnet is a database of words organized according to their meanings. A wordnet represents the senses of the words it contains, their cognitive synonyms, their types, their relations to other senses, and the definitions of these senses. In this work, we propose a method to facilitate natural language processing tasks such as morphological analysis and word sense disambiguation by building a dictionary specific to the real estate domain and using this new dictionary to design a smaller wordnet. As a preliminary study, a dictionary of 7,000 words specific to the real estate domain and a wordnet of approximately 11,000 synsets were created, and these were validated on various tasks.
Multilingual information retrieval on the Internet: A case study of Turkish users
(Academic Press Ltd- Elsevier Science Ltd, 2005-12) Aytaç, Selenay
This study aims to answer the following research question: What information retrieval problems do Turkish Internet users face by using Turkish on the Internet? The data for this report were gathered by triangulation of three different methods: (1) e-mail questionnaire survey, (2) face-to-face interviews, and (3) participant observation of Turkish-speaking respondents, in order to assess the major obstacles to retrieving Turkish language information by using Turkish on the Internet. Although a significant amount of research has focused on multilingual information retrieval, a review of the literature reveals that this pilot study is the first initiative to draw a picture from the Turkish Internet user's point of view.
Text-to-SQL: a methodical review of challenges and models
(TÜBİTAK, 2024-05-20) Kanburoğlu, Ali Buğra; Tek, Faik Boray
This survey focuses on Text-to-SQL, the automated translation of natural language queries into SQL queries. Initially, we describe the problem and its main challenges. Then, by following the PRISMA systematic review methodology, we survey the existing Text-to-SQL review papers in the literature. We apply the same method to extract proposed Text-to-SQL models and classify them with respect to the evaluation metrics and benchmarks used. We highlight the accuracies achieved by various models on Text-to-SQL datasets and discuss execution-guided evaluation strategies. We present insights into model training times and implementations of different models. We also explore the availability of Text-to-SQL datasets in non-English languages. Additionally, we focus on large language model (LLM) based approaches for the Text-to-SQL task, where we examine LLM-based studies in the literature and subsequently evaluate the LLMs on the cross-domain Spider dataset. Finally, we conclude with a discussion of future directions for Text-to-SQL research, identifying potential areas of improvement and advancements in this field.
