Design and analysis of classifier learning experiments in bioinformatics: survey and case studies

dc.authorid: 0000-0001-5838-4615
dc.authorid: 0000-0001-7506-0321
dc.contributor.author: İrsoy, Ozan [en_US]
dc.contributor.author: Yıldız, Olcay Taner [en_US]
dc.contributor.author: Alpaydın, Ahmet İbrahim Ethem [en_US]
dc.date.accessioned: 2015-01-15T23:02:03Z
dc.date.available: 2015-01-15T23:02:03Z
dc.date.issued: 2012-12
dc.department: Işık Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü [en_US]
dc.department: Işık University, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.description.abstract: In many bioinformatics applications, it is important to assess and compare the performance of algorithms trained from data, so that the conclusions drawn are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data using statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design, such as resampling instead of using a single training set and using different performance metrics instead of error, are well understood, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology and showing an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies. [en_US]
dc.description.sponsorship: The authors would like to thank the editor and the reviewers for their constructive comments, suggestions, pointers to related literature, and pertinent questions, which allowed us to better situate our work as well as organize the manuscript and improve the presentation. This work has been supported by the Turkish Scientific Technical Research Council (TUBITAK) EEEAG 109E186 and Bogazici University Research Funds BAP 5701. [en_US]
dc.description.version: Publisher's Version [en_US]
dc.description.version: Author Post Print [en_US]
dc.identifier.citation: İrsoy, O., Yıldız, O. T. & Alpaydın, A. İ. E. (2012). Design and analysis of classifier learning experiments in bioinformatics: Survey and case studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9(6), 1663-1675. doi:10.1109/TCBB.2012.117 [en_US]
dc.identifier.doi: 10.1109/TCBB.2012.117
dc.identifier.endpage: 1675
dc.identifier.issn: 1545-5963
dc.identifier.issn: 1557-9964
dc.identifier.pmid: 22908127
dc.identifier.scopus: 2-s2.0-84880458786
dc.identifier.scopusquality: Q2
dc.identifier.startpage: 1663
dc.identifier.uri: https://hdl.handle.net/11729/433
dc.identifier.uri: http://dx.doi.org/10.1109/TCBB.2012.117
dc.identifier.volume: 9
dc.identifier.wos: WOS:000312558400011
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.indekslendigikaynak: PubMed [en_US]
dc.indekslendigikaynak: Science Citation Index Expanded (SCI-EXPANDED) [en_US]
dc.institutionauthor: Yıldız, Olcay Taner [en_US]
dc.institutionauthorid: 0000-0001-5838-4615
dc.language.iso: en [en_US]
dc.peerreviewed: Yes [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: IEEE Computer Soc [en_US]
dc.relation.ispartof: IEEE/ACM Transactions on Computational Biology and Bioinformatics [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Statistical tests [en_US]
dc.subject: Classification [en_US]
dc.subject: Model selection [en_US]
dc.subject: Multiple data sets [en_US]
dc.subject: Statistical comparisons [en_US]
dc.subject: ROC curve [en_US]
dc.subject: Algorithms [en_US]
dc.subject: Precision [en_US]
dc.subject: Performance [en_US]
dc.subject: Prediction [en_US]
dc.subject: Retrieval [en_US]
dc.subject: Recall [en_US]
dc.subject: Area [en_US]
dc.title: Design and analysis of classifier learning experiments in bioinformatics: survey and case studies [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication

Files

Original bundle
Now showing 1 - 2 of 2
Name: 433.pdf
Size: 1.15 MB
Format: Adobe Portable Document Format
Description: Publisher's Version
Name: 433.pdf
Size: 202.43 KB
Format: Adobe Portable Document Format
Description: Author Post Print