Search Results

Showing 1 - 4 of 4
  • Publication
    Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses
    (Elsevier Ltd, 2022-08-13) Koca, Mehmet Burak; Nourani, Esmaeil; Abbasoğlu, Ferda; Karadeniz, İlknur; Sevilgen, Fatih Erdoğan
    Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of PHI networks is important for characterizing pathogenic diseases. Predicting these interactions is a popular problem because experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks improve prediction performance. Basic network projections, random-walk-based models, or graph neural networks are used to generate topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. In the second stage, the amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used to train a binary interaction classifier that predicts whether two given proteins interact. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with state-of-the-art methods. The experimental results on the benchmark dataset show that the proposed model achieves a 3–23% better area under the curve (AUC) score than its competitors.
  • Publication
    BOUN-ISIK participation: an unsupervised approach for the named entity normalization and relation extraction of Bacteria Biotopes
    (Association for Computational Linguistics (ACL), 2019-11-04) Karadeniz, İlknur; Tuna, Ömer Faruk; Özgür, Arzucan
    This paper presents our participation in the Bacteria Biotope Task of the BioNLP Shared Task 2019. Our participation includes two systems for the two subtasks of the Bacteria Biotope Task: the normalization of entities (BB-norm) and the identification of the relations between the entities in a given biomedical text (BB-rel). For the normalization of entities, we utilized word embeddings and syntactic re-ranking. For the relation extraction task, pre-defined rules are used. Although both approaches are unsupervised, in the sense that they do not need any labeled data, they achieved promising results. In particular, for the BB-norm task, the results show that the proposed method performs as well as deep-learning-based methods, which require labeled data.
  • Publication
    TurkEmbed: Turkish embedding model on natural language inference & sentence text similarity tasks
    (Institute of Electrical and Electronics Engineers Inc., 2025) Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke
    This paper introduces TurkEmbed, a novel Turkish language embedding model designed to outperform existing models, particularly in Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. Current Turkish embedding models often rely on machine-translated datasets, potentially limiting their accuracy and semantic understanding. TurkEmbed utilizes a combination of diverse datasets and advanced training techniques, including matryoshka representation learning, to achieve more robust and accurate embeddings. This approach enables the model to adapt to various resource-constrained environments, offering faster encoding capabilities. Our evaluation on the Turkish STS-b-TR dataset, using Pearson and Spearman correlation metrics, demonstrates significant improvements in semantic similarity tasks. Furthermore, TurkEmbed surpasses the current state-of-the-art model, Emrecan, on All-NLI-TR and STS-b-TR benchmarks, achieving a 1-4% improvement. TurkEmbed promises to enhance the Turkish NLP ecosystem by providing a more nuanced understanding of language and facilitating advancements in downstream applications.
  • Publication
    TurkEmbed4Retrieval: Türkçe için geri getirme görevine özel gömme modeli [TurkEmbed4Retrieval: a retrieval-specific embedding model for Turkish]
    (Institute of Electrical and Electronics Engineers Inc., 2025-08-15) Ezerceli, Özay; Gümüşçekiçci, Gizem; Erkoç, Tuğba; Özenç, Berke
    In this study, we introduce TurkEmbed4Retrieval, a model that adapts TurkEmbed, originally developed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks, to retrieval tasks by fine-tuning it on the MS-Marco-TR dataset. The model is optimized using advanced training techniques such as Matryoshka representation learning and a specially designed negative-pair ranking loss function. Extensive experiments show that TurkEmbed4Retrieval outperforms the TurkishcolBERT model by 19–26% on retrieval metrics on the Scifact-TR dataset. In this respect, our model sets a new bar for Turkish information retrieval systems.
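
Both TurkEmbed records above mention Matryoshka representation learning as the reason the models suit resource-constrained settings. A minimal sketch of the inference-time idea follows, assuming only the general property of Matryoshka-trained embeddings: a leading prefix of the vector can be used as a smaller embedding once it is re-normalized. This is not the authors' code; the vectors, the document names, and the helper functions `truncate_and_normalize`, `cosine`, and `rank` are hypothetical and chosen purely for illustration.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, docs, dim):
    """Rank document names by similarity to the query, using `dim` dimensions."""
    q = truncate_and_normalize(query, dim)
    scored = [
        (cosine(q, truncate_and_normalize(vec, dim)), name)
        for name, vec in docs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 4-dimensional "embeddings" (made-up numbers, not model output).
query = [1.0, 0.0, 0.5, 0.5]
docs = {
    "a": [0.9, 0.1, 0.0, 0.0],
    "b": [0.0, 1.0, 0.5, 0.5],
}

full_ranking = rank(query, docs, dim=4)   # all dimensions
small_ranking = rank(query, docs, dim=2)  # cheaper: first two dimensions only
```

With a model actually trained this way, the truncated ranking is expected to stay close to the full one while storing and comparing fewer numbers per vector; how close depends on the model and the chosen dimension, which this toy example does not measure.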