Calculating the VC-dimension of decision trees

Yükleniyor...
Küçük Resim

Tarih

2009

Dergi Başlığı

Dergi ISSN

Cilt Başlığı

Yayıncı

IEEE

Erişim Hakkı

info:eu-repo/semantics/closedAccess

Özet

We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results shows that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.

Açıklama

Anahtar Kelimeler

Regression, Machine, Binary features, Cross validation, Exhaustive search, Exhaustive search algorithms, Generalization bound, Simulation result, Subtrees, Training sets, Univariate, VC-dimension, Decision trees, Information science, Learning algorithms, Reluctance motors, Binary trees, Training data, Virtual colonoscopy, Testing, Predictive models, Algorithm design and analysis, Design optimization, Regression tree analysis, Machine learning, Learning (artificial intelligence), Search problems, Exhaustive search algorithm, Univariate decision trees, General regressor, VC-generalization bounds, Complexity control, SRM, Cross-validation, Vapnik-Chervonenkis theory, Structural risk minimization

Kaynak

2009 24th International Symposium on Computer and Information Sciences

WoS Q Değeri

N/A

Scopus Q Değeri

N/A

Cilt

Sayı

Künye

Aslan, O., Yıldız, O. T. & Alpaydın, A. İ. E. (2009). Calculating the VC-dimension of decision trees. Paper presented at the 2009 24th International Symposium on Computer and Information Sciences, 193-198. doi:10.1109/ISCIS.2009.5291847