Suicidal ideation detection from social media
dc.authorid | 0000-0002-7877-7528 | |
dc.contributor.advisor | Dehkharghani, Rahim | en_US |
dc.contributor.author | Ezerceli, Özay | en_US |
dc.contributor.other | Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı | en_US |
dc.date.accessioned | 2023-09-11T09:47:45Z | |
dc.date.available | 2023-09-11T09:47:45Z | |
dc.date.issued | 2023-08-24 | |
dc.department | Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı | en_US |
dc.description | Text in English ; Abstract: English and Turkish | en_US |
dc.description | Includes bibliographical references (leaves 46-50) | en_US |
dc.description | xiv, 51 leaves | en_US |
dc.description.abstract | Suicidal ideation is a global cause of life-threatening injury and, most of the time, death. Mental health issues have been increasing rapidly, and most are neglected without adequate treatment. Owing to the growth of social media platforms and the online anonymity they provide, people increasingly interact with others on these platforms. Social platforms can therefore serve as surveillance tools for mining social content and suicidal tendencies. This thesis attempts to present a solution for detecting depression/suicidal ideation using state-of-the-art natural language processing (NLP) and deep learning (DL) approaches (BiLSTM, BERT Transformer). Three novel approaches are proposed for three datasets of textual content. The SuicideDetection dataset is a publicly available Kaggle dataset collected from Reddit's subreddits (“SuicideWatch”, “depression”, “bipolar”, “offmychest”, “anxiety”), and the SWMH dataset is collected only from the “SuicideWatch” subreddit. The third dataset, CEASEv2.0, is a collection of 4932 suicide notes. The proposed models outperformed the latest models by 2% and 1% in F1 score on the SuicideDetection and CEASEv2.0 datasets, respectively. The best models for each dataset are analyzed and discussed in terms of performance, along with the characteristics of the datasets and the limitations of suicidal ideation classification. Performance is measured with common metrics such as Accuracy, Precision, Recall, F1-Score, and the ROC curve. In real-world use, this project can assist psychologists in the early identification of suicidal ideation before the suicidal person harms him/herself. The thesis also demonstrates the potential of DL algorithms such as transformers, together with the latest word embedding and NLP techniques, to help address the problem of suicidal ideation. | en_US |
dc.description.abstract | Suicidal ideation is a global cause of life-threatening injuries and, most of the time, death. Mental health problems are increasing rapidly around the world, and many are ignored without adequate treatment. Owing to developments in today's social media platforms and the online anonymity they provide, people constantly interact with others on such platforms. Social media platforms can be used as surveillance tools for mining social content and suicidal tendencies. This thesis attempts to present a solution for detecting depression/suicidal ideation from textual content using state-of-the-art natural language processing and deep learning approaches (BiLSTM, BERT Transformer). Three different novel approaches are proposed for three different textual datasets. The SuicideDetection dataset is a publicly available dataset on Kaggle, collected from Reddit's subreddits ("SuicideWatch", "depression", "bipolar", "offmychest", "anxiety"). The SWMH dataset is a collection built from content gathered only from the "SuicideWatch" subreddit. The CEASEv2.0 dataset, the other dataset we used, consists of 4932 suicide notes. The proposed models outperformed the latest models by 2% and 1% in F1 score on the SuicideDetection and CEASEv2.0 datasets, respectively. The best models for each dataset have been analyzed and discussed in terms of performance, together with the characteristics of the datasets and the limitations of suicidal ideation classification. This performance has been measured and compared using common metrics such as Accuracy, Precision, Recall, F1-Score, and the ROC curve. In terms of real-world application, this project can assist psychologists in the early identification of suicidal ideation before the suicidal person harms themselves. It also demonstrates the potential of using deep learning algorithms such as transformers, together with the latest word embedding and natural language processing techniques, which could improve the problem of suicidal ideation. | en_US |
dc.description.tableofcontents | Suicide Statistics | en_US |
dc.description.tableofcontents | Basis of Suicidal Ideation | en_US |
dc.description.tableofcontents | Suicidal Ideation on Social Platforms | en_US |
dc.description.tableofcontents | Suicidal Ideation Detection | en_US |
dc.description.tableofcontents | Machine Learning Based Studies | en_US |
dc.description.tableofcontents | Deep Learning Based Studies | en_US |
dc.description.tableofcontents | Transformer Based Studies | en_US |
dc.description.tableofcontents | PROPOSED METHODOLOGY | en_US |
dc.description.tableofcontents | Framework of Proposed Methodology | en_US |
dc.description.tableofcontents | Preprocessing | en_US |
dc.description.tableofcontents | Word Embedding | en_US |
dc.description.tableofcontents | Word2Vec | en_US |
dc.description.tableofcontents | GloVe: Global Vectors for Word Representation | en_US |
dc.description.tableofcontents | FastText | en_US |
dc.description.tableofcontents | Transformer Based Sentence Embedding | en_US |
dc.description.tableofcontents | Activation Functions | en_US |
dc.description.tableofcontents | ReLU | en_US |
dc.description.tableofcontents | Hyperbolic Tangent (Tanh) | en_US |
dc.description.tableofcontents | Sigmoid | en_US |
dc.description.tableofcontents | Softmax | en_US |
dc.description.tableofcontents | Loss Functions | en_US |
dc.description.tableofcontents | Callback Functions | en_US |
dc.description.tableofcontents | Early Stopping | en_US |
dc.description.tableofcontents | Reduce Learning Rate | en_US |
dc.description.tableofcontents | Model Checkpoint | en_US |
dc.description.tableofcontents | Classification | en_US |
dc.description.tableofcontents | BiLSTM Networks | en_US |
dc.description.tableofcontents | BERT Transformer | en_US |
dc.description.tableofcontents | Suicide Detection Dataset | en_US |
dc.description.tableofcontents | CEASEv2.0 Dataset | en_US |
dc.description.tableofcontents | SWMH Dataset | en_US |
dc.description.tableofcontents | Evaluation Metrics | en_US |
dc.description.tableofcontents | Results and Comparison | en_US |
dc.description.tableofcontents | SuicideDetection | en_US |
dc.description.tableofcontents | CEASEv2.0 | en_US |
dc.description.tableofcontents | SWMH | en_US |
dc.description.tableofcontents | Review of methodologies for suicidal ideation detection | en_US |
dc.description.tableofcontents | Preprocessing steps for each of the datasets | en_US |
dc.description.tableofcontents | Example preprocessings for each dataset | en_US |
dc.description.tableofcontents | Differences of Binary cross entropy and Sparse categorical cross entropy | en_US |
dc.description.tableofcontents | Details of Experimented DL models for SuicideDetection dataset | en_US |
dc.description.tableofcontents | Comparison and evaluation of ML models for SuicideDetection dataset | en_US |
dc.description.tableofcontents | Details of Experimented DL models for CEASEv2.0 dataset | en_US |
dc.description.tableofcontents | Comparison and evaluation of ML models for SWMH dataset | en_US |
dc.description.tableofcontents | Comparison and evaluation of DL models for SWMH dataset | en_US |
dc.description.tableofcontents | Best proposed models for each of the dataset | en_US |
dc.description.tableofcontents | Similarity review of different embedding techniques | en_US |
dc.description.tableofcontents | Suicide rates between 15-29 age group (per 100.000 population) | en_US |
dc.description.tableofcontents | Suicide rates (per 100.000 population) | en_US |
dc.description.tableofcontents | Rate of emergency department visits with suicidal ideation, by age group: United States, 2016–2020 | en_US |
dc.description.tableofcontents | Proposed Suicidal Ideation Detection Classifier Framework | en_US |
dc.description.tableofcontents | ReLU graph | en_US |
dc.description.tableofcontents | Tanh graph | en_US |
dc.description.tableofcontents | Sigmoid graph | en_US |
dc.description.tableofcontents | BiLSTM Network Structure | en_US |
dc.description.tableofcontents | SWMH Model Summary | en_US |
dc.description.tableofcontents | Overview of SuicideDetection Dataset | en_US |
dc.description.tableofcontents | Overview of CEASEv2.0 Dataset | en_US |
dc.description.tableofcontents | Overview of SWMH Dataset | en_US |
dc.description.tableofcontents | Confusion Matrix for Binary Classification | en_US |
dc.description.tableofcontents | Accuracy & Loss Graph of Best Proposed Model for SuicideDetection dataset | en_US |
dc.description.tableofcontents | Proposed Model Architecture for SuicideDetection Dataset | en_US |
dc.description.tableofcontents | Accuracy & Loss Graph of Best Proposed Model for CEASEv2.0 dataset | en_US |
dc.description.tableofcontents | Proposed Model Architecture for CEASEv2.0 Dataset | en_US |
dc.description.tableofcontents | Proposed Model Architecture for SWMH Dataset | en_US |
dc.description.tableofcontents | Weight rates of each label on SWMH | en_US |
dc.description.tableofcontents | Distribution rates of each class label for SWMH | en_US |
dc.description.tableofcontents | WordCloud of SuicideDetection dataset | en_US |
dc.description.tableofcontents | WordCloud of CEASEv2.0 dataset | en_US |
dc.description.tableofcontents | WordCloud of SWMH dataset | en_US |
dc.description.tableofcontents | Sentence Features of SuicideDetection Dataset (Non-suicidal - Suicidal) | en_US |
dc.description.tableofcontents | Sentence Features of CEASEv2.0 Dataset (Non-suicidal – Suicidal) | en_US |
dc.identifier.citation | Ezerceli, Ö. (2023). Suicidal ideation detection from social media. İstanbul: Işık Üniversitesi Lisansüstü Eğitim Enstitüsü. | en_US |
dc.identifier.uri | https://hdl.handle.net/11729/5704 | |
dc.institutionauthor | Ezerceli, Özay | en_US |
dc.institutionauthorid | 0000-0002-7877-7528 | |
dc.language.iso | en | en_US |
dc.publisher | Işık Üniversitesi | en_US |
dc.relation.publicationcategory | Thesis | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Suicidal ideation detection | en_US |
dc.subject | Social media content | en_US |
dc.subject | Word embedding | en_US |
dc.subject | Deep neural network | en_US |
dc.subject | BERT transformers | en_US |
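dc.subject | İntihar düşüncesi tespiti (Suicidal ideation detection) | en_US |
dc.subject | Sosyal medya içeriği (Social media content) | en_US |
dc.subject | Kelime temsil (Word representation) | en_US |
dc.subject | Derin sinir ağı (Deep neural network) | en_US |
dc.subject | BERT transformatörü (BERT transformer) | en_US |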
dc.subject.lcc | RC569 .E94 2023 | |
dc.subject.lcsh | Social media and society. | en_US |
dc.subject.lcsh | Suicidal behavior -- Diagnosis. | en_US |
dc.subject.lcsh | Suicide -- Prevention. | en_US |
dc.subject.lcsh | Deep learning. | en_US |
dc.title | Suicidal ideation detection from social media | en_US |
dc.title.alternative | Sosyal medya içeriğinden intihar düşüncesi algılama | en_US |
dc.type | Master Thesis | en_US |
dspace.entity.type | Publication |
Files
Original bundle
- Name: Suicidal_ideation_detection_from_social_media.pdf
- Size: 2.15 MB
- Format: Adobe Portable Document Format
- Description: MasterThesis