Search Results

Showing 1 - 10 of 21
  • Publication
    Reviewing the effects of spatial features on price prediction for real estate market: Istanbul case
    (IEEE, 2022-09-16) Ecevit, Mert İlhan; Erdem, Zeki; Dağ, Hasan
    In the real estate market, spatial features play a crucial role in determining property appraisals and prices. When spatial features are considered, classification techniques have rarely been studied compared to regression, which is commonly used for price prediction. This study reviews the effects of spatial features on predicting house price ranges for real estate in Istanbul, Turkey, in the classification context. Spatial features are generated and extracted by geocoding the address information from the original data set. This geocoding and feature extraction is another challenge in this research. The experiments compare the performance of Decision Trees (DT), Random Forests (RF), and Logistic Regression (LR) classifier models on the data set with and without spatial features. The prediction models are evaluated based on classification metrics such as accuracy, precision, recall, and F1-Score. We additionally examine the ROC curve of each classifier. The test results show that the RF model outperforms the DT and LR models. It is observed that spatial features, when combined with non-spatial features, significantly improve the prediction performance of the models for the house price ranges. These results may help make appraisal decisions in the real estate industry more accurate.
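    The classification metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `binary_metrics` and the one-vs-rest treatment of a single price range as the positive class are assumptions made here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating one label as the positive class.

    Hypothetical helper: for a multi-class task such as price ranges, it would be
    called once per range, with that range as `positive` (one-vs-rest).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Example call with made-up labels (1 = the price range of interest, 0 = any other range).
scores = binary_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

    Precision and recall are reported per class here; how the study averages them across price ranges is not stated in the abstract.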
  • Publication
    Soft decision trees
    (IEEE, 2012) İrsoy, Ozan; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We discuss a novel decision tree architecture with soft decisions at the internal nodes, where we choose both children with probabilities given by a sigmoid gating function. Our algorithm is incremental: new nodes are added when needed, and parameters are learned using gradient descent. We visualize the soft tree fit on a toy data set and then compare it with the canonical, hard decision tree over ten regression and classification data sets. Our proposed model has significantly higher accuracy using fewer nodes.
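    The soft decision described in this abstract can be sketched as a recursive prediction in which every internal node blends its two children using a sigmoid gate. This is a minimal sketch of the response computation only, assuming a linear gate over the input; the dictionary layout (`w`, `b`, `left`, `right`, `value`) is hypothetical, and the incremental growth and gradient-descent training are not shown.

```python
import math


def sigmoid(z):
    """Logistic function used as the gating function."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_predict(node, x):
    """Response of a soft tree: each internal node mixes both children.

    Assumed node layout (illustrative, not from the paper):
      leaf:     {"value": float}
      internal: {"w": [floats], "b": float, "left": node, "right": node}
    """
    if "value" in node:
        return node["value"]
    # Gate value = probability of taking the left child.
    gate = sigmoid(sum(w * xi for w, xi in zip(node["w"], x)) + node["b"])
    left = soft_tree_predict(node["left"], x)
    right = soft_tree_predict(node["right"], x)
    return gate * left + (1.0 - gate) * right


# A one-split toy tree: the gate depends on the single input feature.
toy_tree = {"w": [1.0], "b": 0.0, "left": {"value": 1.0}, "right": {"value": 0.0}}
prediction = soft_tree_predict(toy_tree, [0.0])
```

    A hard decision tree corresponds to replacing the gate with a 0/1 step, so that exactly one child contributes.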
  • Publication
    GIS aided vulnerability assessment for roads
    (Springer Science and Business Media B.V., 2022-04-21) Çalışkan, Berna; Atahan, Ali Osman; Kesten, Ali Sercan
    Road networks are vulnerable to natural disasters such as floods, earthquakes and forest fires, which can adversely affect travel on the network. However, not all road links equally affect the travel conditions in a given network; typically some links are more critical to the functioning of the network than others. The first stage of the study involves the investigation of geological conditions. Image classification was used to extract information classes from the ‘Geological Map of Istanbul area’ image file. The resulting raster layer was used to create a thematic map. A reclassification was performed for lithologic types. The second stage involves analyzing the topological situation. A slope map was prepared and classified according to the percentage of slope values. The third phase is the analysis and interpretation of the accumulated data to establish suitable and applicable road vulnerability scores. The information in the source data for each vulnerability factor is classified into three different vulnerability scores: +2 (considerably increases vulnerability), +1 (increases vulnerability) and 0 (does not increase vulnerability) by using a vulnerability score table. The study area was categorized into three different traffic analysis zones: (1) least favorable area; (2) favorable area; (3) most favorable area. Vulnerability values are obtained to measure the serviceability of critical links in dense urban road networks and are applied to the case of the ‘Beyoğlu’ region. Thematic layers were prepared using the Geographic Information System (GIS), and they were then combined to produce the serviceability of road links in the ‘Beyoğlu’ region. Consequently, a site-specific vulnerability index is proposed, considering the serviceability of road links. A conceptual flowchart of the GIS processing steps taken to obtain the vulnerability index is illustrated.
  • Publication
    Effective semi-supervised learning strategies for automatic sentence segmentation
    (Elsevier Science BV, 2018-04-01) Dalva, Doğan; Güz, Ümit; Gürkan, Hakan
    The primary objective of the sentence segmentation process is to determine the sentence boundaries of a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive and expensive to produce. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence boundary labeled data are available. We primarily investigate two semi-supervised learning approaches, called self-training and co-training. Different example selection strategies were also used for co-training, namely, agreement, disagreement and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating the agreement, disagreement and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that the sentence segmentation performance can be highly improved using the multi-view learning strategies that we propose, since data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average baseline F-measure from 67.66% to 75.15% and from 64.84% to 66.32% for spoken Turkish and English, respectively, when only a small set of manually labeled data is available.
  • Publication
    Biometric identification using fingertip electrocardiogram signals
    (Springer London Ltd, 2018-07) Güven, Gökhan; Gürkan, Hakan; Güz, Ümit
    In this research work, we present a new fingertip electrocardiogram (ECG) data acquisition device capable of recording the lead-1 ECG signal through the right- and left-hand thumbs. The proposed device is highly sensitive, dry-contact, portable, user-friendly, and inexpensive, and it does not require conventional components that are cumbersome and irritating, such as wet adhesive Ag/AgCl electrodes. Another advantage of this device is that it makes it possible to record and use the lead-1 ECG signal easily in any condition and anywhere, and to integrate it with any platform for advanced applications such as biometric recognition and clinical diagnostics. Furthermore, we propose a biometric identification method based on combining autocorrelation and discrete cosine transform-based features, cepstral features, and QRS beat information. The proposed method was evaluated on three fingertip ECG signal databases recorded with the proposed device. The experimental results demonstrate that the proposed biometric identification method achieves person recognition rates of 100% (30 out of 30), 100% (45 out of 45), and 98.33% (59 out of 60) for 30, 45, and 60 subjects, respectively.
  • Publication
    Regularizing soft decision trees
    (Springer, 2013) Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem
    We recently proposed a new decision tree family called soft decision trees, where a node chooses both its left and right children with different probabilities as given by a gating function, unlike a hard decision node, which chooses one of the two. In this paper, we extend the original algorithm by introducing local dimension reduction via L-1 and L-2 regularization for feature selection and smoother fitting. We compare our novel approach with the standard decision tree algorithms over 27 classification data sets. We see that both regularized versions have similar generalization ability with less complexity in terms of number of nodes, where L-2 seems to work slightly better than L-1.
  • Publication
    ISIKUN at the FinCausal 2020: Linguistically informed machine-learning approach for causality identification in financial documents
    (Association for Computational Linguistics (ACL), 2020) Özenir, Hüseyin Gökberk; Karadeniz, İlknur
    This paper presents our participation in the FinCausal-2020 Shared Task, whose ultimate aim is to extract cause-effect relations from a given financial text. Our participation includes two systems for the two sub-tasks of the FinCausal-2020 Shared Task. The first sub-task (Task-1) consists of the binary classification of the given sentences as causal meaningful (1) or causal meaningless (0). Our approach for Task-1 applies linear support vector machines after transforming the input sentences into vector representations using the term frequency-inverse document frequency scheme with 3-grams. The second sub-task (Task-2) consists of the identification of the cause-effect relations in the sentences that are detected as causal meaningful. Our approach for Task-2 is a CRF-based model which uses linguistically informed features. For Task-1, the obtained results show that there is only a small difference between the proposed approach based on linear support vector machines (F-score 94%), which requires less time, and the BERT-based baseline (F-score 95%). For Task-2, although only minor modifications, such as the learning algorithm type and the feature representations, are made to the conditional random fields based baseline (F-score 52%), we have obtained better results (F-score 60%). The source code for both tasks is available online (https://github.com/ozenirgokberk/FinCausal2020.git/).
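    The feature step for Task-1 can be illustrated with a small TF-IDF computation over n-grams. This is only a sketch under stated assumptions: the abstract says "3-grams" without specifying the unit, so word n-grams of length 1 to 3 are assumed here; the weighting uses raw counts times log(N/df), which may differ from the variant the authors used; and the function names `word_ngrams` and `tfidf_vectors` are hypothetical. The linear SVM that would consume these vectors is not shown.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=3):
    """All word n-grams of length 1..n_max (assumed unit; the abstract only says '3-grams')."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(sentences, n_max=3):
    """Sparse TF-IDF vectors as dicts: weight = raw count * log(N / document frequency)."""
    counts = [Counter(word_ngrams(s, n_max)) for s in sentences]
    # Document frequency: number of sentences in which each n-gram occurs.
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(sentences)
    return [
        {gram: tf * math.log(n_docs / doc_freq[gram]) for gram, tf in c.items()}
        for c in counts
    ]


# Made-up example sentences; n-grams shared by every sentence get weight 0.
vectors = tfidf_vectors(["profits rose because costs fell", "profits rose"])
```

    In this toy input, an n-gram such as "because" that occurs in only one sentence receives a positive weight, which is the kind of signal a linear classifier could use to separate causal from non-causal sentences.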
  • Publication
    Texture recognition for frog identification
    (ACM SIGMM, 2012-11-02) Cannavo, Flavio; Nunnari, Giuseppe; Kale, İzzet; Tek, Faik Boray
    This paper describes a visual processing technique for automatic frog (Xenopus Laevis sp.) localization and identification. The problem of frog identification is to process and classify an unknown frog image to determine its identity, which was previously recorded in an image database. The frog skin pattern (i.e. texture) provides a unique feature for identification. Hence, the study investigates three different kinds of features (i.e. Gabor filters, granulometry, threshold set compactness) to extract texture information. The classifier is built on the nearest neighbor principle; it assigns the query feature to the database feature which has the minimum distance. Hence, the study investigates different distance measures and compares their performance. The detailed results show that the most successful feature and distance measure for frog identification using skin texture features are granulometry and the weighted L1 norm, respectively.
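    The nearest neighbor rule with a weighted L1 norm mentioned in this abstract can be sketched as below. The feature vectors, the weights, and the names `weighted_l1` and `identify` are illustrative assumptions; how the paper chooses the weights and extracts granulometry features is not described in the abstract.

```python
def weighted_l1(a, b, weights):
    """Weighted L1 (Manhattan) distance between two equal-length feature vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def identify(query, database, weights):
    """Return the identity whose stored feature vector is closest to the query.

    `database` maps an identity label to its feature vector (hypothetical layout).
    """
    return min(database, key=lambda name: weighted_l1(query, database[name], weights))


# Made-up two-dimensional features for two stored individuals.
frogs = {"frog_a": [0.0, 0.0], "frog_b": [1.0, 5.0]}
match = identify([0.9, 0.0], frogs, weights=[1.0, 1.0])
```

    Changing the weights changes which feature dimensions dominate the match: with weights [1.0, 0.0] the second dimension is ignored, and the same query would be assigned to the other individual.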
  • Publication
    Tree Ensembles on the induced discrete space
    (Institute of Electrical and Electronics Engineers Inc., 2016-05) Yıldız, Olcay Taner
    Decision trees are widely used predictive models in machine learning. Recently, K-tree was proposed, where the original discrete feature space is expanded by generating all orderings of values of k discrete attributes, and these orderings are used as the new attributes in decision tree induction. Although K-tree performs significantly better than the proper one, its exponential time complexity can prohibit its use. In this brief, we propose K-forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space. Simulation results on 17 data sets show that the novel ensemble classifier has a significantly lower error rate compared with the random forest based on the original feature space.
  • Publication
    A novel approach to morphological disambiguation for Turkish
    (Springer-Verlag, 2012) Görgün, Onur; Yıldız, Olcay Taner
    In this paper, we propose a classification-based approach to morphological disambiguation for the Turkish language. Due to the complex morphology of Turkish, any word can take an unlimited number of affixes, resulting in very large tag sets. The problem is defined as choosing one of the parses of a word, not taking the existing root word into consideration. We trained our model with well-known classifiers using the WEKA toolkit and tested it on a common test set. The best performance achieved is 95.61%, by the J48 Tree classifier.