Search Results

Showing 1 - 10 of 13
  • Publication
    Calculating the VC-dimension of decision trees
    (IEEE, 2009) Aslan, Özlem; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of a univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to obtain VC-generalization bounds for complexity control using structural risk minimization (SRM) in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
  • Publication
    Reviewing the effects of spatial features on price prediction for real estate market: Istanbul case
    (IEEE, 2022-09-16) Ecevit, Mert İlhan; Erdem, Zeki; Dağ, Hasan
    In the real estate market, spatial features play a crucial role in determining property appraisals and prices. When spatial features are considered, classification techniques have rarely been studied compared to regression, which is commonly used for price prediction. This study reviews the effects of spatial features on predicting house price ranges for real estate in Istanbul, Turkey, in a classification context. Spatial features are generated and extracted by geocoding the address information in the original data set; this geocoding and feature extraction is a further challenge addressed in this research. The experiments compare the performance of Decision Tree (DT), Random Forest (RF), and Logistic Regression (LR) classifiers on the data set with and without spatial features. The models are evaluated with classification metrics such as accuracy, precision, recall, and F1-score, and we additionally examine the ROC curve of each classifier. The test results show that the RF model outperforms the DT and LR models. Spatial features, when combined with non-spatial features, significantly improve the models' performance in predicting house price ranges. The results may help the real estate industry make more accurate appraisal decisions.
  • Publication
    Univariate margin tree
    (Springer, 2010) Yıldız, Olcay Taner
    In many pattern recognition applications, decision trees are tried first because of their simplicity and easily interpretable nature. In this paper, we propose a new decision tree learning algorithm called the univariate margin tree, where, for each continuous attribute, the best split is found using convex optimization. Our simulation results on 47 datasets show that the novel margin tree classifier performs at least as well as C4.5 and LDT with similar time complexity. For two-class datasets, it generates smaller trees than C4.5 and LDT without sacrificing accuracy, and for multiclass datasets, using the one-vs-rest methodology, it generates significantly more accurate trees than C4.5 and LDT.
  • Publication
    Soft decision trees
    (IEEE, 2012) İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We discuss a novel decision tree architecture with soft decisions at the internal nodes, where both children are chosen with probabilities given by a sigmoid gating function. Our algorithm is incremental: new nodes are added when needed, and parameters are learned using gradient descent. We visualize the soft tree fit on a toy data set and then compare it with the canonical hard decision tree over ten regression and classification data sets. Our proposed model has significantly higher accuracy while using fewer nodes.
  • Publication
    VC-dimension of rule sets
    (IEEE Computer Soc, 2014-12-04) Yıldız, Olcay Taner
    In this paper, we give and prove lower bounds on the VC-dimension of the rule set hypothesis class where the input features are binary or continuous. The VC-dimension of a rule set depends on the VC-dimension values of its rules and the number of inputs.
  • Publication
    Budding trees
    (IEEE Computer Soc, 2014-08-24) İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We propose a new decision tree model, named the budding tree, where a node can be both a leaf and an internal decision node. Each bud node starts as a leaf node and can then grow children; later on, if necessary, its children can be pruned. This contrasts with traditional tree construction algorithms, which only grow the tree during the training phase and prune it in a separate pruning phase. We use a soft tree architecture and show that the tree and its parameters can be trained using gradient descent. Our experimental results on regression, binary classification, and multi-class classification data sets indicate that the proposed model achieves higher accuracy than traditional trees while inducing trees of comparable size.
  • Publication
    Regularizing soft decision trees
    (Springer, 2013) Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We recently proposed a new decision tree family called soft decision trees, in which a node chooses both its left and right children with different probabilities given by a gating function, unlike a hard decision node, which chooses one of the two. In this paper, we extend the original algorithm by introducing local dimension reduction via L-1 and L-2 regularization for feature selection and smoother fitting. We compare our novel approach with standard decision tree algorithms over 27 classification data sets. Both regularized versions have similar generalization ability with lower complexity in terms of the number of nodes, and L-2 seems to work slightly better than L-1.
  • Publication
    Feature extraction from discrete attributes
    (IEEE, 2010) Yıldız, Olcay Taner
    In many pattern recognition applications, decision trees are tried first because of their simplicity and easily interpretable nature. In this paper, we extract new features by combining k discrete attributes: for each subset of k attributes, we exhaustively generate all orderings of the values of those attributes. We then apply the usual univariate decision tree classifier using these orderings as the new attributes. Our simulation results on 16 datasets from the UCI repository [2] show that the novel decision tree classifier performs better than the standard univariate decision tree in terms of error rate and tree complexity. The same idea can also be applied to other univariate rule learning algorithms such as C4.5Rules [7] and Ripper [3].
  • Publication
    On the VC-dimension of univariate decision trees
    (2012) Yıldız, Olcay Taner
    In this paper, we give and prove lower bounds on the VC-dimension of the univariate decision tree hypothesis class. The VC-dimension of a univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. In our previous work (Aslan et al., 2009), we proposed a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively. Using the experimental results of that work, we show that our VC-dimension bounds are tight. To verify that the VC-dimension bounds are useful, we also use them to obtain VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross-validation.
  • Publication
    Searching for the optimal ordering of classes in rule induction
    (IEEE, 2012-11-15) Ata, Sezin; Yıldız, Olcay Taner
    Rule induction algorithms such as Ripper solve a K > 2 class problem by converting it into a sequence of K - 1 two-class problems. As a usual heuristic, the classes are fed into the algorithm in order of increasing prior probability. In this paper, we propose two algorithms to improve this heuristic. The first algorithm starts with the ordering the heuristic provides and searches for better orderings by swapping consecutive classes. The second algorithm transforms the ordering search problem into an optimization problem and uses its solution to extract the optimal ordering. We compare our algorithms with the original Ripper on 8 datasets from the UCI repository [2]. Simulation results show that our algorithms produce rule sets that are significantly better than those produced by Ripper proper.
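Two of the entries above rely on computing the VC-dimension of decision trees by exhaustive search. As a minimal sketch of that idea (not the authors' algorithm), the Python code below brute-forces the VC-dimension of the simplest univariate tree: a single decision node over d binary features whose two leaves are each labeled 0 or 1 independently. The leaf-labeling convention and the helper names `stump_hypotheses` and `vc_dimension` are assumptions for illustration. A set of m inputs is shattered when the hypotheses realize all 2^m labelings of it, and the search stops at the first size with no shattered subset, because every subset of a shattered set is itself shattered.

```python
from itertools import combinations, product


def stump_hypotheses(d):
    """Enumerate every labeling of {0,1}^d realizable by one univariate decision
    node on binary features (assumed setup: each leaf is labeled 0 or 1)."""
    domain = list(product((0, 1), repeat=d))
    hyps = set()
    for i in range(d):
        for left, right in product((0, 1), repeat=2):
            # The node tests feature i: x_i == 0 goes left, x_i == 1 goes right.
            hyps.add(tuple(left if x[i] == 0 else right for x in domain))
    return domain, hyps


def vc_dimension(domain, hyps):
    """Brute-force VC-dimension of a finite hypothesis class over a finite domain.
    Each hypothesis is a tuple of labels aligned with `domain`."""
    best = 0
    for m in range(1, len(domain) + 1):
        if 2 ** m > len(hyps):
            break  # too few distinct hypotheses to realize 2^m labelings
        shattered = False
        for idx in combinations(range(len(domain)), m):
            patterns = {tuple(h[j] for j in idx) for h in hyps}
            if len(patterns) == 2 ** m:
                shattered = True
                break
        if not shattered:
            break  # no m-set shattered, so no larger set can be shattered either
        best = m
    return best


for d in (1, 2, 3):
    print(d, vc_dimension(*stump_hypotheses(d)))  # prints "1 2", "2 2", "3 3"
```

The enumeration grows exponentially with the number of features and with tree size, so this kind of direct search is only practical for very small trees.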
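The soft decision tree entries describe internal nodes that send an input to both children, weighted by a sigmoid gating function. A minimal sketch of the forward pass follows, assuming the gate g(x) = sigmoid(w·x + w0) weights the left child and 1 - g(x) weights the right child; the `SoftNode` class and `respond` function are hypothetical names, and training by gradient descent (as well as any L-1/L-2 regularization) is not shown.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SoftNode:
    # Leaf: only `value` is used. Internal node: gating weights `w`, bias `w0`,
    # and two children (assumed convention: the gate weights the left child).
    value: float = 0.0
    w: Optional[Sequence[float]] = None
    w0: float = 0.0
    left: Optional["SoftNode"] = None
    right: Optional["SoftNode"] = None


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def respond(node: SoftNode, x: Sequence[float]) -> float:
    """Soft tree output: a gate-weighted mix of both subtrees at every internal node."""
    if node.left is None or node.right is None:
        return node.value
    g = sigmoid(sum(wi * xi for wi, xi in zip(node.w, x)) + node.w0)
    return g * respond(node.left, x) + (1.0 - g) * respond(node.right, x)


# Toy tree with one gating node on a single input and leaves +1 / -1.
tree = SoftNode(w=[1.0], w0=0.0, left=SoftNode(value=1.0), right=SoftNode(value=-1.0))
print(respond(tree, [0.0]), respond(tree, [2.0]))
```

With x = [0.0] the gate is exactly 0.5, so the two leaves cancel and the output is 0.0; as w·x + w0 grows, the output moves smoothly toward the left leaf's value instead of jumping as a hard split would, which is what makes the parameters trainable by gradient descent.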
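The last entry improves Ripper's increasing-prior class ordering by swapping consecutive classes. The sketch below implements that first algorithm as simple hill climbing under stated assumptions: `evaluate` is a hypothetical placeholder for training Ripper with a given class order and returning a cost such as validation error, and here a toy cost (`toy_cost`, also hypothetical) that counts inversions against a fixed target order stands in for it. The exact acceptance and stopping rules of the paper may differ.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def prior_order(labels: Sequence[Hashable]) -> List[Hashable]:
    """Usual heuristic: order classes by increasing prior (sample frequency)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda c: counts[c])


def swap_search(order: List[Hashable],
                evaluate: Callable[[List[Hashable]], float]) -> Tuple[List[Hashable], float]:
    """Hill climbing over class orderings by swapping consecutive classes.
    Keeps any swap that lowers the cost and repeats until no swap helps."""
    best = list(order)
    best_cost = evaluate(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            cand = best[:]
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            cost = evaluate(cand)
            if cost < best_cost:
                best, best_cost = cand, cost
                improved = True
    return best, best_cost


def toy_cost(order: List[Hashable]) -> int:
    # Hypothetical stand-in for "train Ripper with this order, return error":
    # counts pairwise inversions relative to an arbitrary target order.
    target = ["c", "a", "b"]
    rank = [target.index(c) for c in order]
    return sum(1 for i in range(len(rank)) for j in range(i + 1, len(rank))
               if rank[i] > rank[j])


labels = ["a"] * 5 + ["b"] * 2 + ["c"] * 9
print(prior_order(labels))                          # ['b', 'a', 'c']
print(swap_search(prior_order(labels), toy_cost))   # (['c', 'a', 'b'], 0)
```

Starting from the prior-based order means the search only has to correct the heuristic locally; each step costs one retraining per adjacent pair, which is far cheaper than evaluating all K! orderings.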