Search Results

Showing 1 - 5 of 5
  • Publication
    Feature selection based on differential privacy for classification
    (Gazi Univ, Fac Engineering Architecture, 2018) Var, Esra; İnan, Ali
    One of the most important preliminary steps in data mining and machine learning solutions is determining a suitable subset of the features of the data to be used in the analysis. For classification methods, this is done by examining how strongly a feature is associated with the class attribute. Many privacy-preserving classification solutions exist. However, no feature selection solutions have been developed for these methods. This study presents novel feature selection methods based on differential privacy, the most comprehensive and secure solution known in statistical database security. The proposed methods have been integrated with WEKA, a widely used data mining library, and experimental results demonstrate the positive effects of the proposed solutions on classification performance.
  • Publication
    Mixture of Gaussian models and bayes error under differential privacy
    (2011) Xi, Bowei; Kantarcıoğlu, Murat; İnan, Ali
    Gaussian mixture models are an important tool in Bayesian decision theory. In this study, we focus on building such models over a statistical database protected under differential privacy. Our approach involves querying the necessary statistics from a database and building a Bayesian classifier over the noise-added responses generated according to differential privacy. We formally analyze the sensitivity of our query set. Since there are multiple methods to query a statistic, either directly or indirectly, we analyze the sensitivities for different querying methods. Furthermore, we establish theoretical bounds for the Bayes error for the univariate (one-dimensional) case. We study the Bayes error for the multivariate (high-dimensional) case in experiments with both simulated data and real-life data. We discover that adding Laplace noise to a statistic under certain constraints is problematic. For example, the variance-covariance matrix is no longer positive definite after noise addition. We propose a heuristic method to fix the noise-added variance-covariance matrix.
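The constraint problem described in this abstract can be illustrated in the univariate case, where the analogue of a non-positive-definite covariance matrix is a variance estimate that becomes zero or negative after noise is added. The following is a minimal sketch, not the authors' method: it assumes values lie in [0, 1], that the record count is public, that neighboring databases differ by replacing one record, and that the privacy budget `epsilon` is split evenly between two sum queries. The function names and the clamping floor are hypothetical, and the clamp is only a crude stand-in for the heuristic repair the paper proposes.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean_and_variance(values, epsilon, rng, floor=1e-6):
    # Assumption: every value lies in [0, 1] and n is public.
    n = len(values)
    clipped = [min(1.0, max(0.0, v)) for v in values]

    # Replacing one record changes each sum by at most 1 (sensitivity 1).
    # Each of the two queries receives half of the privacy budget.
    scale = 1.0 / (epsilon / 2.0)
    noisy_sum = sum(clipped) + laplace_noise(scale, rng)
    noisy_sum_sq = sum(v * v for v in clipped) + laplace_noise(scale, rng)

    mean = noisy_sum / n
    variance = noisy_sum_sq / n - mean * mean

    # Noise can push the estimate to zero or below, which is not a valid
    # variance; clamp it to a small positive floor (assumed repair).
    return mean, max(variance, floor)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.2, 0.4, 0.6, 0.8] * 50
    print(noisy_mean_and_variance(data, epsilon=1.0, rng=rng))
```

In the multivariate case a comparable generic repair would be to clip the negative eigenvalues of the noisy matrix, but whether that matches the heuristic in the paper is not stated in the abstract.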
  • Publication
    Mobile applications discovery: a subscriber-centric approach
    (Wiley Periodicals, 2011-03) Erman, Bilgehan; İnan, Ali; Nagarajan, Ramesh; Uzunalioğlu, Hüseyin
    Rapid adoption of smartphones and the business success of the Apple App Store have resulted in the rampant growth of mobile applications. Seeking new revenue opportunities from application development has created a gold rush. However, free or very cheap applications constitute the great bulk of application downloads, putting great pricing pressure on developers. Furthermore, usage statistics suggest that most applications are either one-trick applications or downright useless, meriting no attention from the user beyond the first day. This is not surprising, since low prices dissuade developers from investing large sums of money to continue developing more sophisticated, high-quality applications. Developers have been complaining about the lack of visibility of their applications in stores that are beginning to resemble high-volume warehouses. It is clear that enhancing application discovery and building better marketing tools will be essential for the continued success of the mobile application marketplace and application stores. This paper proposes and investigates techniques for effective discovery of applications by matching user interests with application characteristics, with a special focus on adapting classical data mining techniques to user ratings of the applications. The user ratings are leveraged to recommend applications of potential interest.
  • Publication
    A hybrid approach to private record matching
    (IEEE Computer Soc, 2012-10) İnan, Ali; Kantarcıoğlu, Murat; Ghinita, Gabriel; Bertino, Elisa
    Real-world entities are not always represented by the same set of features in different data sets. Therefore, matching records of the same real-world entity distributed across these data sets is a challenging task. If the data sets contain private information, the problem becomes even more difficult. Existing solutions to this problem generally follow two approaches: sanitization techniques and cryptographic techniques. We propose a hybrid technique that combines these two approaches and enables users to trade off between privacy, accuracy, and cost. Our main contribution is the use of a blocking phase that operates over sanitized data to filter out, in a privacy-preserving manner, pairs of records that do not satisfy the matching condition. We also provide a formal definition of privacy and prove that the participants of our protocols learn nothing other than their share of the result and what can be inferred from their share of the result, their input, and sanitized views of the input data sets (which are considered public information). Our method incurs considerably lower costs than cryptographic techniques and yields significantly more accurate matching results than sanitization techniques, even when privacy requirements are high.
  • Publication
    Efficient privacy-aware record integration
    (2013) Kuzu, Mehmet; Kantarcıoğlu, Murat; İnan, Ali; Bertino, Elisa; Durham, Elizabeth Ashley; Malin, Bradley A.
    The integration of information dispersed among multiple repositories is a crucial step for accurate data analysis in various domains. In support of this goal, it is critical to devise procedures for identifying similar records across distinct data sources. At the same time, to adhere to privacy regulations and policies, such procedures should protect the confidentiality of the individuals to whom the information corresponds. Various private record linkage (PRL) protocols have been proposed to achieve this goal, involving secure multi-party computation (SMC) and similarity-preserving data transformation techniques. SMC methods provide secure and accurate solutions to the PRL problem, but are prohibitively expensive in practice, mainly due to excessive computational requirements. Data transformation techniques offer more practical solutions, but incur the cost of information leakage and false matches. In this paper, we introduce a novel model for practical PRL, which 1) affords controlled and limited information leakage and 2) avoids false matches resulting from data transformation. Initially, we partition the data sources into blocks to eliminate comparisons for records that are unlikely to match. Then, to identify matches, we apply an efficient SMC technique between the candidate record pairs. To enable efficiency and privacy, our model leaks a controlled amount of obfuscated data prior to the secure computations. The applied obfuscation relies on differential privacy, which provides strong privacy guarantees against adversaries with arbitrary background knowledge. In addition, we illustrate the practical nature of our approach through an empirical analysis with data derived from public voter records.
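The two-stage structure described in this abstract (block first, then compare only candidate pairs) can be sketched as follows. This is a plaintext illustration of the control flow only and provides no privacy: the differentially private obfuscation and the SMC comparison are both omitted, and `is_match` is a plain stand-in for the secure step. The record fields (`id`, `surname`, `birth_year`), the blocking key, and all function names are hypothetical.

```python
from collections import defaultdict


def block_key(record):
    # Hypothetical blocking key: surname initial plus birth year.
    return (record["surname"][:1].lower(), record["birth_year"])


def candidate_pairs(source_a, source_b):
    # Group the records of one source by blocking key, then pair each
    # record of the other source only with records in the same block.
    blocks = defaultdict(list)
    for rec in source_b:
        blocks[block_key(rec)].append(rec)
    for rec_a in source_a:
        for rec_b in blocks.get(block_key(rec_a), []):
            yield rec_a, rec_b


def is_match(rec_a, rec_b):
    # Plaintext stand-in for the secure comparison; a real protocol
    # would evaluate this without revealing either record.
    return (rec_a["surname"].lower() == rec_b["surname"].lower()
            and rec_a["birth_year"] == rec_b["birth_year"])


def link(source_a, source_b):
    # Only pairs that survive blocking reach the (expensive) comparison.
    return [(a["id"], b["id"])
            for a, b in candidate_pairs(source_a, source_b)
            if is_match(a, b)]


if __name__ == "__main__":
    a = [{"id": 1, "surname": "Smith", "birth_year": 1980},
         {"id": 2, "surname": "Jones", "birth_year": 1975}]
    b = [{"id": 10, "surname": "smith", "birth_year": 1980},
         {"id": 11, "surname": "Stone", "birth_year": 1980},
         {"id": 12, "surname": "Jones", "birth_year": 1990}]
    print(link(a, b))
```

With the sample data, blocking reduces six possible comparisons to two candidate pairs, which is the cost saving the blocking step is meant to deliver before the expensive comparison runs.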