İlişkisel veri tabanlarında mükerrer kayıtların makine öğrenmesiyle tespiti

Bayrak, Ahmet Tuğrul; Yılmaz, Aykut İnan; Yılmaz, Kemal Burak; Düzağaç, Remzi; Yıldız, Olcay Taner

dc.authorid	0009-0009-6043-2765
dc.authorid	0000-0001-5838-4615
dc.contributor.author	Bayrak, Ahmet Tuğrul	en_US
dc.contributor.author	Yılmaz, Aykut İnan	en_US
dc.contributor.author	Yılmaz, Kemal Burak	en_US
dc.contributor.author	Düzağaç, Remzi	en_US
dc.contributor.author	Yıldız, Olcay Taner	en_US
dc.date.accessioned	2019-01-15T02:41:39Z
dc.date.available	2019-01-15T02:41:39Z
dc.date.issued	2018-07-05
dc.department	Işık Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.department	Işık University, Faculty of Engineering, Department of Computer Engineering	en_US
dc.description.abstract	Veri miktarının artışına paralel olarak, ilişkisel veri tabanlarında mükerrer kayıtlar da artmaktadır. Artan bu kayıtlar kullanıldıkları rapor veya analizlerde tutarsızlığa sebep olabilmektedir. Bu sorunu en aza indirgemek için yaptığımız çalışmada, kayıtların birbirlerine olan benzerlikleri ve alan uzmanlık bilgisiyle belirlenen ağırlıklar, öznitelik olarak kullanılarak makine öğrenmesi algoritmaları ile mükerrer kayıtların bulunması hedeflenmiştir. Yapılan işlem sonucunda 9301467 satır veride 28412 mükerrer çift tespit edilmiştir. Bulunan bu mükerrer kayıtlar veri kaynağından temizlenerek verinin daha tutarlı hale gelmesi sağlanmaktadır.	en_US
dc.description.abstract	As the amount of data grows, the number of duplicate records in relational databases grows as well. These duplicate records can cause inconsistencies in the reports and analyses that use them. To reduce the effects of this problem, we aim to detect duplicate records using machine learning algorithms, with the similarities between records and weights determined by domain expertise serving as features. We detected 28412 duplicate record pairs among 9301467 rows of data. The detected duplicate records are removed from the data source, making the data more consistent.	en_US
dc.description.version	Publisher's Version	en_US
dc.identifier.citation	Bayrak, A. T., Yılmaz, A. I., Yılmaz, K. B., Düzağaç, R. & Yıldız, O. T. (2018). Near duplicate detection in relational databases. Paper presented at the 26th IEEE Signal Processing and Communications Applications Conference, SIU 2018, 1-4. doi:10.1109/SIU.2018.8404678	en_US
dc.identifier.doi	10.1109/SIU.2018.8404678
dc.identifier.endpage	4
dc.identifier.isbn	9781538615010
dc.identifier.isbn	9781538615003
dc.identifier.isbn	9781538615027
dc.identifier.issn	2165-0608
dc.identifier.scopus	2-s2.0-85050807995
dc.identifier.scopusquality	N/A
dc.identifier.startpage	1
dc.identifier.uri	https://hdl.handle.net/11729/1446
dc.identifier.uri	http://dx.doi.org/10.1109/SIU.2018.8404678
dc.identifier.wos	WOS:000511448500531
dc.identifier.wosquality	N/A
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.indekslendigikaynak	Conference Proceedings Citation Index – Science (CPCI-S)	en_US
dc.institutionauthor	Yıldız, Olcay Taner	en_US
dc.institutionauthorid	0000-0001-5838-4615
dc.language.iso	tr	en_US
dc.peerreviewed	Yes	en_US
dc.publicationstatus	Published	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartof	26th IEEE Signal Processing and Communications Applications Conference, SIU 2018	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Benzerlik fonksiyonları	en_US
dc.subject	Makine öğrenmesi	en_US
dc.subject	Mükerrer kayıt tespiti	en_US
dc.subject	Algorithms	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Data mining	en_US
dc.subject	Database systems	en_US
dc.subject	Data-source	en_US
dc.subject	Dogs	en_US
dc.subject	Duplicate record detection	en_US
dc.subject	Duplicate records	en_US
dc.subject	Feature extraction	en_US
dc.subject	Kernel	en_US
dc.subject	Knowledge discovery	en_US
dc.subject	Learning (artificial intelligence)	en_US
dc.subject	Learning algorithms	en_US
dc.subject	Learning systems	en_US
dc.subject	Machine learning	en_US
dc.subject	Machine learning algorithms	en_US
dc.subject	Near-duplicate detection	en_US
dc.subject	Privacy-preserving record	en_US
dc.subject	Relational databases	en_US
dc.subject	Relational database	en_US
dc.subject	Signal processing	en_US
dc.subject	Similarity functions	en_US
dc.title	İlişkisel veri tabanlarında mükerrer kayıtların makine öğrenmesiyle tespiti	en_US
dc.title.alternative	Near duplicate detection in relational databases	en_US
dc.type	Conference Object	en_US
dspace.entity.type	Publication

Files

Original bundle

Name: 1446.pdf
Size: 111.13 KB
Format: Adobe Portable Document Format
Description: Publisher's Version

Collections

Proceedings Collection | Department of Computer Engineering
Scopus Indexed Publications Collection
WoS Indexed Publications Collection