Entity-relationship diagram generation with natural language processing and machine learning approach
dc.authorid | 0000-0002-8944-5449 | |
dc.contributor.advisor | Ekin, Emine | en_US |
dc.contributor.author | Köprülü, Mertali | en_US |
dc.contributor.other | Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı | en_US |
dc.date.accessioned | 2023-08-29T11:25:40Z | |
dc.date.available | 2023-08-29T11:25:40Z | |
dc.date.issued | 2023-08-24 | |
dc.department | Işık Üniversitesi, Lisansüstü Eğitim Enstitüsü, Bilgisayar Mühendisliği Yüksek Lisans Programı | en_US |
dc.description | Text in English ; Abstract: English and Turkish | en_US |
dc.description | Includes bibliographical references (leaves 71-72) | en_US |
dc.description | xi, 73 leaves | en_US |
dc.description.abstract | As software systems continue to grow in complexity, the need for efficient and accurate design methodologies becomes increasingly critical. Entity-Relationship Diagrams (ERDs) provide a powerful visual representation of system structures and dependencies, serving as a foundation for software engineering and database design. However, manually creating ERDs from textual requirements is time-consuming, labor-intensive, and dependent on the designer's subjective judgment. To address this challenge, this research explores the application of natural language processing (NLP) techniques to automatically extract relevant information from unstructured text and generate ERDs. The proposed approach combines rule-based techniques, semantic analysis, and machine learning algorithms to automatically identify entities, attributes, relationships, and cardinalities from natural language input. The study offers practical insights into the use of linguistic analysis, semantic analysis, and machine learning for efficient information extraction. The proposed system aims to streamline the ERD creation process and improve the accuracy and quality of the resulting diagrams. In the evaluation, the system achieved F1-scores of 0.96, 0.93, and 0.92 for detecting entities, attributes, and relations, respectively, and accuracies of 0.87, 0.84, and 0.91, respectively, for resolving the specifications of these components. Although the approach shows promising results, limitations in heuristic rule coverage and data dependencies are acknowledged. The findings contribute to advancing ERD extraction from text and suggest future research directions for improving the robustness and usability of the solution. Combining NLP techniques with ERD creation highlights the potential to enhance the software development lifecycle and opens new research avenues in information extraction from natural language text. | en_US |
dc.description.abstract | As software systems grow increasingly complex, the need for efficient and accurate design methods becomes ever more critical. Entity-Relationship Diagrams (ERDs) present system structures and dependencies in a powerful visual form and underpin software engineering and, above all, database design. However, creating ERDs by hand from textual requirements is time-consuming and laborious, and the result depends on the designer's subjective judgment. To overcome this challenge, this thesis investigates the use of natural language processing (NLP) techniques to automatically extract diagram-related information from text and generate ERDs. The proposed approach combines rule-based techniques, semantic analysis, and machine learning algorithms to automatically identify entities, their attributes, relationships, and cardinalities from natural language input. The study presents research on using linguistic and semantic analysis together with machine learning for efficient information extraction, conducts experiments, and, by comparing their results, reports the weaknesses and strengths of the proposed method. The proposed system aims to simplify the ERD creation process and, through information extraction, to produce accurate, high-quality ERDs. In the evaluation, the system achieved F1-scores of 0.96, 0.93, and 0.92 for detecting entities, attributes, and relations, respectively, and accuracies of 0.87, 0.84, and 0.91, respectively, in resolving component specifications, that is, in finding the correct properties of the diagram components. These findings contribute to progress in extracting ERDs from text and propose directions and solutions for future research to improve the robustness and usability of the solution. By highlighting the integration of NLP techniques with ERD creation and its potential to improve the software development lifecycle, the study also opens new research opportunities in information extraction from text. | en_US |
dc.description.tableofcontents | OVERVIEW OF ENTITY RELATIONSHIP DIAGRAMS | en_US |
dc.description.tableofcontents | Database Modelling | en_US |
dc.description.tableofcontents | Entity-Relationship Diagrams | en_US |
dc.description.tableofcontents | Components of Entity Relationship Diagram | en_US |
dc.description.tableofcontents | Entity | en_US |
dc.description.tableofcontents | Weak Entity | en_US |
dc.description.tableofcontents | Attribute | en_US |
dc.description.tableofcontents | Key Attribute | en_US |
dc.description.tableofcontents | Derived Attribute | en_US |
dc.description.tableofcontents | Multi-Valued Attribute | en_US |
dc.description.tableofcontents | Composite Attribute | en_US |
dc.description.tableofcontents | Relationship | en_US |
dc.description.tableofcontents | Identifying Relationship | en_US |
dc.description.tableofcontents | Cardinalities | en_US |
dc.description.tableofcontents | Rule-Based Approaches on Diagram Generation | en_US |
dc.description.tableofcontents | Semantic-Based Approaches | en_US |
dc.description.tableofcontents | Machine-Learning Approach | en_US |
dc.description.tableofcontents | Pre-Processing Module | en_US |
dc.description.tableofcontents | Sentence Segmentation | en_US |
dc.description.tableofcontents | Word Correction (Optional) | en_US |
dc.description.tableofcontents | Tokenization | en_US |
dc.description.tableofcontents | Chunking | en_US |
dc.description.tableofcontents | Part-Of-Speech Tagging | en_US |
dc.description.tableofcontents | WordNet Synonym Extraction | en_US |
dc.description.tableofcontents | Word Dependency | en_US |
dc.description.tableofcontents | Custom Named Entity Extraction Module | en_US |
dc.description.tableofcontents | Dependency Extraction | en_US |
dc.description.tableofcontents | Candidate Component Extraction | en_US |
dc.description.tableofcontents | Vectorization | en_US |
dc.description.tableofcontents | Custom Named Entity Recognizer | en_US |
dc.description.tableofcontents | Component Feature Extraction Module | en_US |
dc.description.tableofcontents | Specification Resolver | en_US |
dc.description.tableofcontents | Component Tagging | en_US |
dc.description.tableofcontents | Reduction of Redundant Information | en_US |
dc.description.tableofcontents | Graph Pre-Processing | en_US |
dc.description.tableofcontents | Experimental Results of Custom Named Entity Recognition Module | en_US |
dc.description.tableofcontents | Model Case Outputs | en_US |
dc.description.tableofcontents | Case 1: "Attribute" of "Entity" Extraction | en_US |
dc.description.tableofcontents | Case 2: Relation Extraction | en_US |
dc.description.tableofcontents | Case 3: "Attribute" of "Relation" Extraction | en_US |
dc.description.tableofcontents | Case 4: Complex Relations | en_US |
dc.description.tableofcontents | Evaluation | en_US |
dc.description.tableofcontents | Confusion Matrices | en_US |
dc.description.tableofcontents | Semantic Roles and Their Definitions | en_US |
dc.description.tableofcontents | Approaches of Relation Extraction from Text | en_US |
dc.description.tableofcontents | Universal Part-of-Speech Tags | en_US |
dc.description.tableofcontents | spaCy Dependency Labels | en_US |
dc.description.tableofcontents | spaCy Token Features | en_US |
dc.description.tableofcontents | Sample of Component Relation Structure | en_US |
dc.description.tableofcontents | Bi-Directional Long Short-Term Memory Network Feature Comparison for Custom Named Entity Recognition | en_US |
dc.description.tableofcontents | Confusion Matrix of Entity Extraction | en_US |
dc.description.tableofcontents | Confusion Matrix of Attribute Extraction | en_US |
dc.description.tableofcontents | Confusion Matrix of Key Attribute Extraction | en_US |
dc.description.tableofcontents | Confusion Matrix of Relation Extraction | en_US |
dc.description.tableofcontents | Entity Shape | en_US |
dc.description.tableofcontents | Weak Entity Shape | en_US |
dc.description.tableofcontents | Attribute Shape | en_US |
dc.description.tableofcontents | Key Attribute Shape | en_US |
dc.description.tableofcontents | Derived Attribute Shape | en_US |
dc.description.tableofcontents | Multi-Valued Attribute Shape | en_US |
dc.description.tableofcontents | Composite Attribute Shape | en_US |
dc.description.tableofcontents | Relationship Shape | en_US |
dc.description.tableofcontents | Identifying Relationship Shape | en_US |
dc.description.tableofcontents | Cardinality Ratio One-to-Many (E1:E2) on Relation R | en_US |
dc.description.tableofcontents | Proposed Model of Habib | en_US |
dc.description.tableofcontents | Parse tree of the sentence “X hit the ball.” | en_US |
dc.description.tableofcontents | Approach of S. Btoush | en_US |
dc.description.tableofcontents | ERD Modeling Generation Framework | en_US |
dc.description.tableofcontents | Block diagram of large-scale Object-Based Language Interactor | en_US |
dc.description.tableofcontents | Illustration of Semantic Net | en_US |
dc.description.tableofcontents | Model of ER-Converter Tool | en_US |
dc.description.tableofcontents | Machine Learning Model of Kashmira | en_US |
dc.description.tableofcontents | Annotated Data output of Kashmira | en_US |
dc.description.tableofcontents | Proposed Model | en_US |
dc.description.tableofcontents | Proposed System Architecture | en_US |
dc.description.tableofcontents | Pre-processing Module | en_US |
dc.description.tableofcontents | Dependency Output of Given Sentence | en_US |
dc.description.tableofcontents | Example Dependency Tree | en_US |
dc.description.tableofcontents | Custom Named Entity Extraction Module | en_US |
dc.description.tableofcontents | Default Named Entity Recognizer output of spaCy | en_US |
dc.description.tableofcontents | Recurrent Neural Network Cell Structure | en_US |
dc.description.tableofcontents | Example Usage of RNN in NER | en_US |
dc.description.tableofcontents | Long Short-Term Memory Cell Structure | en_US |
dc.description.tableofcontents | Bi-directional LSTM Architecture | en_US |
dc.description.tableofcontents | Component Feature Extraction Module | en_US |
dc.description.tableofcontents | Illustration of dependency tree of 5th sentence | en_US |
dc.description.tableofcontents | Graph Pre-Processing Module | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 1 | en_US |
dc.description.tableofcontents | Illustration of Elmasri on Scenario 1 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 2 (a), ERD illustration (b) | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 3 | en_US |
dc.description.tableofcontents | Elmasri Illustration on Scenario 3 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 4 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 5 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 6 | en_US |
dc.description.tableofcontents | Dependency Tree Illustration of sentence: “Suppliers, Parts, and Projects have ternary relation called ‘supply’ which has quantity attribute.” | en_US |
dc.description.tableofcontents | Solution Entity Relationship Diagram of Scenario 7 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 7 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 8 | en_US |
dc.description.tableofcontents | Generated Entity Relationship Diagram of Scenario 9 | en_US |
dc.identifier.citation | Köprülü, M. (2023). Entity-relationship diagram generation with natural language processing and machine learning approach. İstanbul: Işık Üniversitesi Lisansüstü Eğitim Enstitüsü. | en_US |
dc.identifier.uri | https://hdl.handle.net/11729/5691 | |
dc.institutionauthor | Köprülü, Mertali | en_US |
dc.institutionauthorid | 0000-0002-8944-5449 | |
dc.language.iso | en | en_US |
dc.publisher | Işık Üniversitesi | en_US |
dc.relation.publicationcategory | Thesis | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Entity-relationship diagram | en_US |
dc.subject | Natural language processing | en_US |
dc.subject | Named entity recognition | en_US |
dc.subject | Information extraction | en_US |
dc.subject | Varlık-ilişki diyagramı | en_US |
dc.subject | Doğal dil işleme | en_US |
dc.subject | Adlandırılmış varlık tanıma | en_US |
dc.subject | Bilgi çıkarımı | en_US |
dc.subject.lcc | QA76.9.N38 K67 2023 | |
dc.subject.lcsh | Machine learning. | en_US |
dc.subject.lcsh | Natural language processing. | en_US |
dc.subject.lcsh | Natural language processing (Computer science). | en_US |
dc.subject.lcsh | Entity-relationship modeling. | en_US |
dc.title | Entity-relationship diagram generation with natural language processing and machine learning approach | en_US |
dc.title.alternative | Doğal dil işleme ve makine öğrenmesi yaklaşımıyla varlık-ilişki diyagram üretimi | en_US |
dc.type | Master Thesis | en_US |
dspace.entity.type | Publication |
Files
Original bundle
- Name: Entity_relationship_diagram_generation_with_natural_language_processing_and_machine_learning_approach.pdf
- Size: 2.14 MB
- Format: Adobe Portable Document Format
- Description: MasterThesis
License bundle
- Name: license.txt
- Size: 1.44 KB
- Format: Item-specific license agreed upon to submission
- Description: