EFFECT OF FEATURE SELECTION AND DATASET SIZE ON THE ACCURACY OF NAÏVE BAYESIAN CLASSIFIER AND LOGISTICS REGRESSION
Date
2018-06
Authors
ANAKOBE, Muhammad Bashir
Abstract
Binary logistic regression and the naïve Bayesian classifier are two common classification techniques that predict the category to which a new observation belongs, on the basis of a training set of observations (or instances) whose category membership is known. The classification performance of these two linear classifiers was studied under different feature (variable) selection criteria and dataset sizes, using medical-domain datasets (breast cancer and heart disease) obtained from the University of California, Irvine, online repository. The results indicate that logistic regression without PCA (used for variable selection) achieved the highest accuracy on relatively large datasets (91.4%), while the naïve Bayesian classifier with PCA (for variable/feature selection) performed best on the smaller dataset, with an accuracy of 90.2%. These two accuracies are close to each other and both high, which indicates that the two methods are well suited to classification problems on datasets from this kind of domain.
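The comparison described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's code: it assumes scikit-learn, uses the breast cancer dataset bundled with that library as a stand-in for the UCI data, and picks 5 principal components, a 100-row subsample for the "small dataset" condition, and 5-fold cross-validation purely for illustration. The helper names `build_models` and `compare` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(n_components=5):
    """Return the four classifier/PCA combinations as pipelines.

    n_components=5 is an illustrative assumption, not a value from the source.
    """
    return {
        "logistic, no PCA": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "logistic, PCA": make_pipeline(
            StandardScaler(), PCA(n_components=n_components),
            LogisticRegression(max_iter=1000)),
        "naive Bayes, no PCA": make_pipeline(
            StandardScaler(), GaussianNB()),
        "naive Bayes, PCA": make_pipeline(
            StandardScaler(), PCA(n_components=n_components), GaussianNB()),
    }


def compare(X, y, cv=5):
    """Mean cross-validated accuracy for each classifier/PCA combination."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_models().items()
    }


# Stand-in data: scikit-learn's bundled breast cancer dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical "small dataset" condition: a random subsample of 100 rows.
rng = np.random.default_rng(0)
idx = rng.choice(len(y), size=100, replace=False)

results = {"full": compare(X, y), "small": compare(X[idx], y[idx])}
for size, scores in results.items():
    for name, acc in scores.items():
        print(f"{size:5s} {name:22s} {acc:.3f}")
```

The accuracies printed by this sketch depend on the stand-in data and the assumed settings, so they should not be expected to match the 91.4% and 90.2% reported above.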
Description
A DISSERTATION SUBMITTED TO THE SCHOOL OF POSTGRADUATE STUDIES, AHMADU BELLO UNIVERSITY, ZARIA, IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD OF MASTER DEGREE
Department of Statistics, Faculty of Physical Sciences, Ahmadu Bello University, Zaria
Keywords
EFFECT, FEATURE SELECTION, DATASET SIZE, ACCURACY, NAÏVE BAYESIAN CLASSIFIER, LOGISTICS REGRESSION