11 results
Search Results
Listing 1 - 10 / 11
Publication: Calculating the VC-dimension of decision trees (IEEE, 2009)
Aslan, Özlem; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of a univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.

Publication: Soft decision trees (IEEE, 2012)
İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
We discuss a novel decision tree architecture with soft decisions at the internal nodes, where we choose both children with probabilities given by a sigmoid gating function. Our algorithm is incremental: new nodes are added when needed, and parameters are learned using gradient descent. We visualize the soft tree fit on a toy data set and then compare it with the canonical hard decision tree over ten regression and classification data sets. Our proposed model achieves significantly higher accuracy using fewer nodes.

Publication: VC-dimension of rule sets (IEEE Computer Soc, 2014-12-04)
Yıldız, Olcay Taner
In this paper, we give and prove lower bounds of the VC-dimension of the rule set hypothesis class where the input features are binary or continuous. The VC-dimension of the rule set depends on the VC-dimension values of its rules and the number of inputs.

Publication: Budding trees (IEEE Computer Soc, 2014-08-24)
İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
We propose a new decision tree model, named the budding tree, where a node can be both a leaf and an internal decision node. Each bud node starts as a leaf node and can then grow children, but later on, if necessary, its children can be pruned. This contrasts with traditional tree construction algorithms that only grow the tree during the training phase and prune it in a separate pruning phase. We use a soft tree architecture and show that the tree and its parameters can be trained using gradient descent. Our experimental results on regression, binary classification, and multi-class classification data sets indicate that our newly proposed model performs better than traditional trees in terms of accuracy while inducing trees of comparable size.

Publication: VC-dimension of univariate decision trees (IEEE-INST Electrical Electronics Engineers Inc, 2015-02-25)
Yıldız, Olcay Taner
In this paper, we give and prove lower bounds of the Vapnik-Chervonenkis (VC) dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross-validation.

Publication: Feature extraction from discrete attributes (IEEE, 2010)
Yıldız, Olcay Taner
In many pattern recognition applications, decision trees are the first choice due to their simplicity and easily interpretable nature. In this paper, we extract new features by combining k discrete attributes: for each subset of size k of the attributes, we generate all orderings of the values of those attributes exhaustively. We then apply the usual univariate decision tree classifier using these orderings as the new attributes. Our simulation results on 16 datasets from the UCI repository [2] show that the novel decision tree classifier performs better than the proper univariate decision tree in terms of error rate and tree complexity. The same idea can also be applied to other univariate rule learning algorithms such as C4.5Rules [7] and Ripper [3].

Publication: On the VC-dimension of univariate decision trees (2012)
Yıldız, Olcay Taner
In this paper, we give and prove lower bounds of the VC-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. In our previous work (Aslan et al., 2009), we proposed a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively. Using the experimental results of that work, we show that our VC-dimension bounds are tight. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross-validation.

Publication: Searching for the optimal ordering of classes in rule induction (IEEE, 2012-11-15)
Ata, Sezin; Yıldız, Olcay Taner
Rule induction algorithms such as Ripper solve a K > 2 class problem by converting it into a sequence of K - 1 two-class problems. As a usual heuristic, the classes are fed into the algorithm in order of increasing prior probability. In this paper, we propose two algorithms to improve on this heuristic. The first algorithm starts with the ordering the heuristic provides and searches for better orderings by swapping consecutive classes. The second algorithm transforms the ordering search into an optimization problem and uses its solution to extract the optimal ordering. We compared our algorithms with the original Ripper on 8 datasets from the UCI repository [2]. Simulation results show that our algorithms produce rule sets that are significantly better than those produced by Ripper proper.

Publication: Müşterilerin GSP analizi kullanarak kümelenmesi (Institute of Electrical and Electronics Engineers Inc., 2018-07-05)
Pakyürek, Muhammet; Sezgin, Mehmet Selman; Kestepe, Sedat; Bora, Büşra; Düzağaç, Remzi; Yıldız, Olcay Taner
In this study, we identified guest behaviors by detecting natural clusters in existing guest and reservation data, and we tailored the offered services and sales strategies to these behaviors. After clustering individuals with k-means, the key characteristics underlying the resulting clusters were extracted with a decision tree approach. These characteristics were found to include a person's purchase channel, specific product preferences, reservation duration, seasonal preference, and so on. The fact that these characteristics vary considerably across clusters indicates that the solution is broadly correct and that the characteristics were chosen successfully. This work plays an important role in designing campaigns and product packages suited to group characteristics.

Publication: Tweet sentiment analysis for cryptocurrencies (IEEE, 2021-10-13)
Şaşmaz, Emre; Tek, Faik Boray
Many traders believe in and use Twitter tweets to guide their daily cryptocurrency trading. In this project, we investigated the feasibility of automated sentiment analysis for cryptocurrencies. For the study, we targeted one cryptocurrency altcoin (NEO) and collected related data. Data collection and cleaning were essential components of the study. First, the last five years of daily tweets with NEO hashtags were obtained from Twitter. The collected tweets were then filtered to contain or mention only NEO. We manually tagged a subset of the tweets with positive, negative, and neutral sentiment labels. We trained and tested a Random Forest classifier on the labeled data, where the test set accuracy reached 77%. In the second phase of the study, we investigated whether the daily sentiment of the tweets was correlated with the NEO price. We found positive correlations between the number of tweets and the daily prices, and between the prices of different crypto coins. We share the data publicly.
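Several of the VC-dimension results listed above rest on checking whether a hypothesis class shatters a sample: every possible binary labeling of the points must be realized by some hypothesis. As a rough, self-contained illustration of that check (not any of the papers' actual algorithms; the stump hypotheses and points below are made up for the example):

```python
def shatters(sample, hypotheses):
    """True iff every labeling of `sample` is realized by some hypothesis
    (callables mapping an input to 0/1)."""
    realized = {tuple(h(x) for x in sample) for h in hypotheses}
    return len(realized) == 2 ** len(sample)

def stump(i, neg=False):
    # Hypothetical depth-1 rule: predict binary feature i (or its negation).
    return lambda x: (1 - x[i]) if neg else x[i]

points = [(0, 1), (1, 0)]  # two points over two binary features
hyps = [stump(i, n) for i in range(2) for n in (False, True)]
hyps += [lambda x: 0, lambda x: 1]  # constant classifiers
print(shatters(points, hyps))  # → True
```

Without the constant classifiers, the stumps alone realize only two of the four labelings of these points, so `shatters` returns False.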

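The soft decision tree abstract above describes internal nodes that route an input to both children with probabilities given by a sigmoid gating function. A minimal sketch of evaluating such a tree, assuming a simplified node structure rather than the paper's exact parameterization:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class SoftNode:
    """Internal node: routes an input to BOTH children, weighted by a
    sigmoid gate g(x) = sigmoid(w . x + b); leaves hold a response value.
    (Hypothetical minimal structure for illustration only.)"""
    def __init__(self, w=None, b=0.0, left=None, right=None, value=None):
        self.w, self.b, self.left, self.right, self.value = w, b, left, right, value

    def predict(self, x):
        if self.value is not None:  # leaf
            return self.value
        g = sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
        # Soft decision: convex combination of both subtrees.
        return g * self.left.predict(x) + (1 - g) * self.right.predict(x)

# Tiny tree gating on x[0]; left leaf responds 1.0, right leaf 0.0.
tree = SoftNode(w=[4.0], b=0.0,
                left=SoftNode(value=1.0), right=SoftNode(value=0.0))
print(round(tree.predict([0.0]), 3))  # → 0.5 (gate is exactly 0.5 at zero)
```

Because every gate is differentiable, predictions are smooth in the parameters, which is what makes the gradient-descent training described in the abstract possible.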

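The class-ordering abstract describes starting from the increasing-prior ordering and swapping consecutive classes whenever the swap improves performance. A hedged sketch of that hill-climbing loop, with a stand-in `evaluate` function where the real algorithm would train and validate a rule learner such as Ripper:

```python
def swap_search(classes, priors, evaluate):
    """Start from classes sorted by increasing prior probability, then
    greedily swap consecutive classes while a swap improves `evaluate`
    (higher is better). `evaluate` is a pluggable stub here."""
    order = sorted(classes, key=lambda c: priors[c])
    best = evaluate(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:]
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            score = evaluate(cand)
            if score > best:
                order, best, improved = cand, score, True
    return order, best

# Toy evaluation: pretend accuracy is highest when 'b' is handled first.
score = lambda o: 1.0 - 0.1 * o.index('b')
print(swap_search(['a', 'b', 'c'], {'a': 0.2, 'b': 0.3, 'c': 0.5}, score))
# → (['b', 'a', 'c'], 1.0)
```

This only explores neighboring swaps, so like the paper's first algorithm it is a local search, not an exhaustive one over all K! orderings.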

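The guest-clustering study above pairs k-means with a decision tree. A bare-bones k-means (Lloyd's algorithm) sketch in pure Python, with toy 2-D points standing in for the guest and reservation features, which are not given here:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's-algorithm k-means: assign each point to its nearest
    center, then recompute centers as cluster means. Illustrative only."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            ci = min(range(k),
                     key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[ci].append(p)
        centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers, clusters = kmeans(pts, 2)
print(sorted(len(c) for c in clusters))  # the two natural pairs separate
```

In the study, a decision tree is then fit to predict the cluster label, so the splits it learns read off the characteristics that distinguish each cluster.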