Search Results

Listing 1 - 9 of 9
  • Publication
    Univariate margin tree
    (Springer, 2010) Yıldız, Olcay Taner
    In many pattern recognition applications, decision trees are tried first because of their simplicity and interpretability. In this paper, we propose a new decision tree learning algorithm called the univariate margin tree, in which the best split for each continuous attribute is found using convex optimization. Our simulation results on 47 datasets show that the novel margin tree classifier performs at least as well as C4.5 and LDT, with similar time complexity. On two-class datasets, it generates smaller trees than C4.5 and LDT without sacrificing accuracy; on multiclass datasets, using the one-vs-rest methodology, it generates significantly more accurate trees than both.
  • Publication
    Incremental construction of rule ensembles using classifiers produced by different class orderings
    (IEEE, 2016) Yıldız, Olcay Taner; Ulaş, Aydın
    In this paper, we discuss a novel approach for incrementally constructing a rule ensemble. The approach builds the ensemble from a dynamically generated set of rule classifiers, each trained using a different class ordering. We investigate criteria including accuracy, ensemble size, and the role of the starting point in the search. The classifiers' outputs are fused by averaging. On 22 data sets, floating search finds small, accurate ensembles in polynomial time.
  • Publication
    Soft decision trees
    (IEEE, 2012) İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We discuss a novel decision tree architecture with soft decisions at the internal nodes, where both children are chosen with probabilities given by a sigmoid gating function. Our algorithm is incremental: new nodes are added when needed, and parameters are learned using gradient descent. We visualize the soft tree fit on a toy data set and then compare it with the canonical hard decision tree on ten regression and classification data sets. The proposed model achieves significantly higher accuracy with fewer nodes.
  • Publication
    VC-dimension of rule sets
    (IEEE Computer Soc, 2014-12-04) Yıldız, Olcay Taner
    In this paper, we derive and prove lower bounds on the VC-dimension of the rule set hypothesis class, where the input features are binary or continuous. The VC-dimension of a rule set depends on the VC-dimensions of its rules and on the number of inputs.
  • Publication
    Bilingual software requirements tracing using vector space model
    (SciTePress, 2014) Yıldız, Olcay Taner; Okutan, Ahmet; Solak, Ercan
    In software engineering, creating and maintaining relationships between the artifacts produced during the software lifecycle is crucial. A typical relation is the one between an item in the requirements document and a block in the subsequent system design, such as a class in the source code. In many software engineering projects, the requirements documentation is written in the developers' native language, whereas the developers prefer to use English during software development. In this paper, we use the vector space model to extract traceability links between requirements written in one language (Turkish) and class implementations written in another (English). The experiments show that a generic translator such as Google Translate yields promising results, which can be further improved by using the comments in the source code.
  • Publication
    Feature extraction from discrete attributes
    (IEEE, 2010) Yıldız, Olcay Taner
    In many pattern recognition applications, decision trees are tried first because of their simplicity and interpretability. In this paper, we extract new features by combining k discrete attributes: for each subset of k attributes, we exhaustively generate all orderings of the values of those attributes. We then apply the usual univariate decision tree classifier, using these orderings as the new attributes. Our simulation results on 16 datasets from the UCI repository [2] show that the novel decision tree classifier outperforms the standard univariate decision tree in terms of both error rate and tree complexity. The same idea can also be applied to other univariate rule learning algorithms such as C4.5Rules [7] and Ripper [3].
  • Publication
    On the VC-dimension of univariate decision trees
    (2012) Yıldız, Olcay Taner
    In this paper, we derive and prove lower bounds on the VC-dimension of the univariate decision tree hypothesis class. The VC-dimension of a univariate decision tree depends on the VC-dimensions of its subtrees and on the number of inputs. In our previous work (Aslan et al., 2009), we proposed a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively. Using the experimental results of that work, we show that our VC-dimension bounds are tight. To verify that the bounds are useful, we also use them to obtain VC-generalization bounds for complexity control via structural risk minimization (SRM) in decision trees, i.e., pruning. Our simulation results show that SRM pruning using these VC-dimension bounds finds trees that are more accurate than those pruned using cross-validation.
  • Publication
    English-Turkish parallel treebank with morphological annotations and its use in tree-based SMT
    (SciTePress, 2016) Görgün, Onur; Yıldız, Olcay Taner; Solak, Ercan; Ehsani, Razieh
    In this paper, we report our tree-based statistical translation study from English to Turkish. We describe our data generation process and report initial results of tree-based translation under a simple model. For corpus construction, we used the Penn Treebank on the English side. We manually translated about 5K trees from English to Turkish under grammar constraints, with adaptations to accommodate the agglutinative nature of Turkish morphology. We used a permutation model for subtrees together with a word-to-word mapping. We report BLEU scores for simple choices of inference algorithm.
  • Publication
    A novel regression method for software defect prediction with kernel methods
    (2013) Okutan, Ahmet; Yıldız, Olcay Taner
    In this paper, we propose a novel method based on support vector machines (SVM) to predict the number of defects in the files or classes of a software system. To model the relationship between source code similarity and defectiveness, we use SVM with a precomputed kernel matrix, in which each entry measures the similarity between a pair of files or classes in the software system under test. Experiments on 10 Promise datasets indicate that SVM with a precomputed kernel performs as well as SVM with the usual linear or RBF kernels in terms of root mean square error (RMSE). The proposed method is also comparable to other regression methods such as linear regression and IBK. These results suggest that source code similarity is a good means of predicting the number of defects in software modules. Based on such predictions, developers can focus their testing activities on the more defective modules rather than on less defective or defect-free ones.
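The *Soft decision trees* entry (İrsoy, Yıldız, Alpaydın, 2012) describes internal nodes that send an input to both children, weighted by a sigmoid gate. The following is a minimal sketch of how such a tree could compute its response, assuming a linear gate g(x) = sigmoid(w·x + w0) and constant-valued leaves. The class names `SoftNode` and `Leaf` and the hand-set weights are hypothetical illustrations, not the authors' code, and neither the incremental node-adding procedure nor the gradient-descent training is shown.

```python
import math
from dataclasses import dataclass
from typing import List, Union


def sigmoid(z: float) -> float:
    """Logistic gating function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Leaf:
    """Hypothetical leaf holding a constant response value."""
    value: float

    def response(self, x: List[float]) -> float:
        return self.value


@dataclass
class SoftNode:
    """Hypothetical internal node with an assumed linear sigmoid gate.

    Unlike a hard split, both children always contribute:
        y(x) = g(x) * left(x) + (1 - g(x)) * right(x),
        g(x) = sigmoid(w . x + w0)
    """
    w: List[float]
    w0: float
    left: "Tree"
    right: "Tree"

    def gate(self, x: List[float]) -> float:
        # Probability of taking the left child for input x.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.w0)

    def response(self, x: List[float]) -> float:
        # Recursively blend both subtrees by the gate value.
        g = self.gate(x)
        return g * self.left.response(x) + (1.0 - g) * self.right.response(x)


Tree = Union[Leaf, SoftNode]

# One soft split on x[0] with hand-picked (hypothetical) parameters.
tree = SoftNode(w=[4.0], w0=0.0, left=Leaf(1.0), right=Leaf(-1.0))
print(tree.response([0.0]))  # gate is 0.5, so 0.5*1 + 0.5*(-1) = 0.0
```

Because both children always contribute, the response is a smooth function of `w` and `w0`, which is what makes gradient-based training possible; as the gate weights grow large in magnitude, the soft split approaches an ordinary hard split.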
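The *Bilingual software requirements tracing* entry (Yıldız, Okutan, Solak, 2014) uses the vector space model to link requirements to classes. A minimal sketch of that idea follows, assuming TF-IDF weighting, cosine similarity, naive whitespace tokenization, and requirements that have already been translated into English (the abstract mentions Google Translate for that step, which is not reproduced here). The function names and example texts are hypothetical, and the paper's actual weighting and preprocessing may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def trace_links(requirements: List[str], classes: Dict[str, str]) -> List[str]:
    """For each (already translated) requirement, return the most similar class."""
    names = list(classes)
    # IDF is computed over requirements and class texts together.
    vecs = tfidf_vectors(requirements + [classes[n] for n in names])
    req_vecs, cls_vecs = vecs[:len(requirements)], vecs[len(requirements):]
    return [names[max(range(len(names)), key=lambda j: cosine(r, cls_vecs[j]))]
            for r in req_vecs]


# Hypothetical translated requirements and class texts (identifiers plus comments).
requirements = ["the user logs in with a password", "the system prints an invoice"]
classes = {"LoginManager": "login user password check",
           "InvoicePrinter": "invoice print total"}
print(trace_links(requirements, classes))  # ['LoginManager', 'InvoicePrinter']
```

The class texts stand in for the source code of each class; adding comment text to them is one simple way to mirror the abstract's observation that source code comments can improve the links.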