Search Results
Listing 1 - 10 of 10
Differential privacy based feature selection for classification (Gazi Univ, Fac Engineering Architecture, 2018). Var, Esra; İnan, Ali.
One of the most important preliminary steps in data mining and machine learning solutions is selecting a suitable subset of the attributes of the data to be analyzed. For classification methods, this is done by measuring how strongly each attribute correlates with the class attribute. There are many privacy-preserving classification solutions, but no feature selection methods have been developed for them. This study presents novel feature selection methods based on differential privacy, the most comprehensive and secure solution known in statistical database security. The proposed methods were integrated into WEKA, a widely used data mining library, and experimental results demonstrate their positive effect on classification performance.

Mixture of Gaussian models and Bayes error under differential privacy (2011). Xi, Bowei; Kantarcıoğlu, Murat; İnan, Ali.
Gaussian mixture models are an important tool in Bayesian decision theory. In this study, we focus on building such models over a statistical database protected under differential privacy. Our approach involves querying the necessary statistics from the database and building a Bayesian classifier over the noise-added responses generated according to differential privacy. We formally analyze the sensitivity of our query set. Since a statistic can be queried in multiple ways, either directly or indirectly, we analyze the sensitivities of the different querying methods. Furthermore, we establish theoretical bounds on the Bayes error for the univariate (one-dimensional) case. We study the Bayes error for the multivariate (high-dimensional) case in experiments with both simulated and real-life data. We find that adding Laplace noise to a statistic under certain constraints is problematic.
For example, the variance-covariance matrix is no longer positive definite after noise addition. We propose a heuristic method to fix the noise-added variance-covariance matrix.

Design and analysis of classifier learning experiments in bioinformatics: survey and case studies (IEEE Computer Soc, 2012-12). İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem.
In many bioinformatics applications, it is important to assess and compare the performance of algorithms trained from data, so as to draw conclusions that are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data with statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey of 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design are well understood, such as resampling instead of using a single training set and the use of different performance metrics instead of error, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios that we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology and showing an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies.

Incremental construction of classifier and discriminant ensembles (Elsevier Science Inc, 2009-04-15). Ulaş, Aydın; Semerci, Murat; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem.
We discuss approaches to incrementally construct an ensemble.
The first constructs an ensemble of classifiers by choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and the random subspace method, and comparable accuracy to AdaBoost, but with fewer classifiers.

Tree Ensembles on the induced discrete space (Institute of Electrical and Electronics Engineers Inc., 2016-05). Yıldız, Olcay Taner.
Decision trees are widely used predictive models in machine learning. Recently, the K-tree was proposed, where the original discrete feature space is expanded by generating all orderings of the values of k discrete attributes and using these orderings as new attributes in decision tree induction. Although the K-tree performs significantly better than the ordinary decision tree, its exponential time complexity can prohibit its use. In this brief, we propose K-forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space.
Simulation results on 17 data sets show that the novel ensemble classifier has a significantly lower error rate than a random forest built on the original feature space.

Cost-conscious comparison of supervised learning algorithms over multiple data sets (Elsevier Sci Ltd, 2012-04). Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem.
Statistical tests exist in the literature to compare supervised learning algorithms on multiple data sets in terms of accuracy, but they do not always generate an ordering. We propose Multi2Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst", where our goodness measure consists of a prior cost term in addition to generalization error. Our simulations show that Multi2Test generates orderings using pairwise tests on error and different types of cost, based on the time and space complexity of the learning algorithms.

Searching for the optimal ordering of classes in rule induction (IEEE, 2012-11-15). Ata, Sezin; Yıldız, Olcay Taner.
Rule induction algorithms such as Ripper solve a K > 2 class problem by converting it into a sequence of K - 1 two-class problems. As a usual heuristic, the classes are fed into the algorithm in order of increasing prior probability. In this paper, we propose two algorithms to improve this heuristic. The first starts with the ordering the heuristic provides and searches for better orderings by swapping consecutive classes. The second transforms the ordering search into an optimization problem and uses its solution to extract the optimal ordering. We compared our algorithms with the original Ripper on 8 datasets from the UCI repository [2].
Simulation results show that our algorithms produce rule sets that are significantly better than those produced by Ripper proper.

Model selection in omnivariate decision trees using Structural Risk Minimization (Elsevier Science Inc, 2011-12-01). Yıldız, Olcay Taner.
As opposed to trees that use a single type of decision node, an omnivariate decision tree contains nodes of different types. We propose to use Structural Risk Minimization (SRM) to choose between node types in omnivariate decision tree construction, matching the complexity of a node to the complexity of the data reaching that node. In order to apply SRM for model selection, one needs the VC-dimension of the candidate models. In this paper, we first derive the VC-dimension of the univariate model and estimate the VC-dimension of all three models (univariate, linear multivariate, and quadratic multivariate) experimentally. Second, we compare SRM with other model selection techniques, including Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and cross-validation (CV), on standard datasets from the UCI and Delve repositories. We see that SRM induces omnivariate trees with a small percentage of multivariate nodes close to the root, and that these trees generalize as accurately as, or more accurately than, those constructed using other model selection techniques.

Robust localization and identification of African clawed frogs in digital images (Elsevier Science BV, 2014-09). Tek, Faik Boray; Cannavo, Flavio; Nunnari, Giuseppe; Kale, İzzet.
We study the automatic localization and identification of African clawed frogs (Xenopus laevis sp.) in digital images taken in a laboratory environment. We propose a novel and stable frog body localization and skin pattern window extraction algorithm. We show that it compensates for scale and rotation changes very well.
Moreover, it is able to localize and extract highly overlapping regions (pattern windows) even under intense affine transformations, blurring, Gaussian noise, and intensity transformations. The frog skin pattern (i.e., texture) provides a unique feature for identifying individual frogs. We investigate the suitability of five different feature descriptors (Gabor filters, area granulometry, HoG, dense SIFT, and raw pixel values) for representing frog skin patterns. We compare the robustness of the features based on their identification performance using a nearest neighbor classifier. Our experiments show that, among the five features we tested, the raw pixel feature performed best against rotation, scale, and blurring modifications, whereas the SIFT feature performed best against affine and intensity modifications.

Malaria parasite detection with deep transfer learning (IEEE, 2018-12-06). Var, Esra; Tek, Faik Boray.
This study aims to automatically detect malaria parasites (Plasmodium sp.) in images of Giemsa-stained blood smears. Deep learning methods provide limited performance when the sample size is small. In transfer learning, visual features are learned from large general data sets, so that a problem-specific classification task can be solved successfully on a restricted problem-specific data set. In this study, we apply transfer learning to detect and classify malaria parasites, using the popular pre-trained CNN model VGG19. We trained the model for 20 epochs on 1428 P. vivax, 1425 P. ovale, 1446 P. falciparum, 1450 P. malariae, and 1440 non-parasite samples. The transfer learning model achieves 80%, 83%, 86%, and 75% precision and 83%, 86%, 86%, and 79% f-measure on 19 test images.
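The two differential-privacy entries above rely on answering statistical queries through Laplace noise scaled to query sensitivity, and the Gaussian-mixture paper notes that a noise-perturbed variance-covariance matrix can lose positive definiteness. A minimal sketch of both steps, assuming eigenvalue clipping as the repair (a common heuristic, not necessarily the paper's own method; function names are illustrative):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Standard differential-privacy noise: Laplace with scale sensitivity/epsilon."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                               size=np.shape(value))

def repair_covariance(noisy_cov, floor=1e-6):
    """Heuristic repair: symmetrize, then clip eigenvalues to a small positive
    floor so the matrix is positive definite again."""
    sym = (noisy_cov + noisy_cov.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, floor, None)
    return vecs @ np.diag(vals) @ vecs.T

rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
# Heavy noise (small epsilon) can easily break positive definiteness...
noisy = laplace_mechanism(true_cov, sensitivity=0.5, epsilon=0.1, rng=rng)
# ...after repair the matrix is again usable as a covariance estimate.
fixed = repair_covariance(noisy)
```

The repaired matrix can then feed a Bayesian classifier; the clipping floor trades a small bias for numerical usability.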
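The rule-ordering entry above describes a first algorithm that improves an initial class ordering by swapping consecutive classes. A generic hill-climbing sketch of that idea, with a toy stand-in objective (the paper's actual objective, ruleset quality on data, is not reproduced here):

```python
def swap_search(order, score):
    """Hill-climb over class orderings: try swapping each pair of consecutive
    classes, keep any swap that improves the score, repeat until no swap helps."""
    order = list(order)
    best = score(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            s = score(order)
            if s > best:
                best = s
                improved = True
            else:
                order[i], order[i + 1] = order[i + 1], order[i]  # undo the swap
    return order, best

# Toy objective: negative count of pairs that are inverted w.r.t. a target order.
target = ["a", "b", "c", "d"]
def toy_score(o):
    return -sum(1 for i in range(len(o)) for j in range(i + 1, len(o))
                if target.index(o[i]) > target.index(o[j]))

order, best = swap_search(["d", "c", "b", "a"], toy_score)
```

With this objective every adjacent swap changes the score by exactly one, so the search behaves like bubble sort and reaches the target ordering; with a real ruleset-quality objective it may stop at a local optimum, which is why the paper also proposes a second, optimization-based algorithm.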
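Both the bioinformatics survey and the Multi2Test entries above advocate statistical tests over raw error when comparing algorithms across data sets. A minimal paired t statistic over per-dataset errors, as one such pairwise test (illustrative numbers; the papers use fuller test batteries):

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """t statistic of the paired per-dataset error differences of two algorithms;
    compare against a t table with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Errors of two hypothetical classifiers on the same five datasets.
t = paired_t_statistic([0.12, 0.18, 0.10, 0.22, 0.15],
                       [0.14, 0.21, 0.13, 0.25, 0.16])
```

A large |t| indicates that the error difference is consistent across datasets rather than chance, which is the kind of evidence the survey found missing in most published comparisons.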












