Search Results

Showing 1 - 10 of 12
  • Publication
    Calculating the VC-dimension of decision trees
    (IEEE, 2009) Aslan, Özlem; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
  • Publication
    İlişkisel veri tabanlarında mükerrer kayıtların makine öğrenmesiyle tespiti [Detection of duplicate records in relational databases using machine learning]
    (Institute of Electrical and Electronics Engineers Inc., 2018-07-05) Bayrak, Ahmet Tuğrul; Yılmaz, Aykut İnan; Yılmaz, Kemal Burak; Düzağaç, Remzi; Yıldız, Olcay Taner
    As the volume of data grows, the number of duplicate records in relational databases grows with it. These duplicate records can cause inconsistencies in the reports and analyses that use them. To minimize this problem, our study aims to find duplicate records with machine learning algorithms, using as features the similarities between records and weights determined by domain expertise. As a result, 28412 duplicate pairs were detected in 9301467 rows of data. Removing these duplicate records from the data source makes the data more consistent.
  • Publication
    Machine learning
    (Institution of Engineering and Technology, 2020-01-01) Yıldız, Olcay Taner
    [No abstract available]
  • Publication
    VC-dimension of univariate decision trees
    (IEEE-INST Electrical Electronics Engineers Inc, 2015-02-25) Yıldız, Olcay Taner
    In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate as those pruned using cross validation.
  • Publication
    Incremental construction of classifier and discriminant ensembles
    (Elsevier Science Inc, 2009-04-15) Ulaş, Aydın; Semerci, Murat; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We discuss approaches to incrementally construct an ensemble. The first constructs an ensemble of classifiers choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost but with fewer classifiers.
  • Publication
    Cost-conscious comparison of supervised learning algorithms over multiple data sets
    (Elsevier Sci Ltd, 2012-04) Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    In the literature, there exist statistical tests to compare supervised learning algorithms on multiple data sets in terms of accuracy, but they do not always generate an ordering. We propose Multi2Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst", where our goodness measure is composed of a prior cost term in addition to generalization error. Our simulations show that Multi2Test generates orderings using pairwise tests on error and different types of cost using time and space complexity of the learning algorithms.
  • Publication
    Eigenclassifiers for combining correlated classifiers
    (Elsevier Science Inc, 2012-03-15) Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    In practice, classifiers in an ensemble are not independent. This paper is the continuation of our previous work on ensemble subset selection [A. Ulas, M. Semerci, O.T. Yildiz, E. Alpaydin, Incremental construction of classifier and discriminant ensembles, Information Sciences, 179 (9) (2009) 1298-1318] and has two parts. First, we investigate the effect of four factors on correlation: (i) algorithms used for training, (ii) hyperparameters of the algorithms, (iii) resampled training sets, (iv) input feature subsets. Simulations using 14 classifiers on 38 data sets indicate that hyperparameters and overlapping training sets have a higher effect on positive correlation than features and algorithms. Second, we propose postprocessing before fusing using principal component analysis (PCA) to form uncorrelated eigenclassifiers from a set of correlated experts. Combining the information from all classifiers may be better than subset selection where some base classifiers are pruned before combination, because using all allows redundancy.
  • Publication
    Model selection in omnivariate decision trees using Structural Risk Minimization
    (Elsevier Science Inc, 2011-12-01) Yıldız, Olcay Taner
    As opposed to trees that use a single type of decision node, an omnivariate decision tree contains nodes of different types. We propose to use Structural Risk Minimization (SRM) to choose between node types in omnivariate decision tree construction to match the complexity of a node to the complexity of the data reaching that node. In order to apply SRM for model selection, one needs the VC-dimension of the candidate models. In this paper, we first derive the VC-dimension of the univariate model, and estimate the VC-dimension of all three models (univariate, linear multivariate or quadratic multivariate) experimentally. Second, we compare SRM with other model selection techniques including Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC) and cross-validation (CV) on standard datasets from the UCI and Delve repositories. We see that SRM induces omnivariate trees that have a small percentage of multivariate nodes close to the root, and that they generalize better than, or at least as accurately as, those constructed using other model selection techniques.
  • Publication
    An incremental model selection algorithm based on cross-validation for finding the architecture of a Hidden Markov model on hand gesture data sets
    (IEEE, 2009-12-13) Ulaş, Aydın; Yıldız, Olcay Taner
    In a multi-parameter learning problem, besides choosing the architecture of the learner, there is the problem of finding the optimal parameters to get maximum performance. When the number of parameters to be tuned increases, it becomes infeasible to try all the parameter sets, hence we need an automatic mechanism to find the optimum parameter setting using computationally feasible algorithms. In this paper, we define the problem of optimizing the architecture of a Hidden Markov Model (HMM) as a state space search and propose the MSUMO (Model Selection Using Multiple Operators) framework that incrementally modifies the structure and checks for improvement using cross-validation. There are five variants that use forward/backward search, single/multiple operators, and depth-first/breadth-first search. On four hand gesture data sets, we compare the performance of MSUMO with the optimal parameter set found by exhaustive search in terms of expected error and computational complexity.
  • Publication
    Müşterilerin GSP analizi kullanarak kümelenmesi [Clustering customers using GSP analysis]
    (Institute of Electrical and Electronics Engineers Inc., 2018-07-05) Pakyürek, Muhammet; Sezgin, Mehmet Selman; Kestepe, Sedat; Bora, Büşra; Düzağaç, Remzi; Yıldız, Olcay Taner
    In this study, we used existing guest and reservation data to detect natural clusters and thereby identify guest behaviors. We also customized the services offered and the sales strategies according to these behaviors. After clustering individuals with K-means, the main characteristics underlying these clusters were extracted with a decision tree approach. These characteristics were found to include the person's purchase channel, specific product preferences, reservation duration, seasonal preference, and so on. The fact that these characteristics differ markedly from cluster to cluster indicates that the solution is broadly correct and that the characteristics were selected successfully. This study plays an important role in creating campaigns and product packages suited to group characteristics.