With the explosive growth of student information from year to year, proper classification of such an enormous amount of information is a critical step towards educational success. However, it is time-consuming and labour-intensive for a human to read through and correctly categorize student data manually and to find meaningful patterns for students in real-time problem scenarios. Among the many attempts to address this challenge, automatic data classification studies have recently been gaining interest in data mining research. Consequently, an increasing number of approaches have been developed for this purpose, including Naïve Bayes classification, support vector machines, decision trees, neural networks, etc. Among these approaches, the KNN Bayes text classifier has been widely used because of its simplicity in both the training and the classifying stage. Although it is less accurate than discriminative methods such as SVM, numerous researchers have shown that it is effective enough to classify text in many domains. Naïve Bayes models allow each attribute to contribute towards the final decision equally and independently of the other attributes, which makes them more computationally efficient than other text classifiers. Thus, the present study employs the Naïve Bayes approach as the text classifier for student data classification and evaluates its classification performance against other classifiers.
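The independent, per-attribute contribution described above can be sketched as a minimal categorical Naïve Bayes classifier in pure Python. The toy records, attribute names, and class labels below are illustrative assumptions, not values from the study's data set; Laplace smoothing is used to avoid zero probabilities for unseen attribute values.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (attribute dict, class label). The attributes
# and labels are illustrative, not taken from the study's data set.
train = [
    ({"occupation": "farmer", "medium": "hindi"}, "OBC"),
    ({"occupation": "farmer", "medium": "hindi"}, "OBC"),
    ({"occupation": "service", "medium": "english"}, "GENERAL"),
    ({"occupation": "service", "medium": "hindi"}, "GENERAL"),
]

def nb_train(records):
    class_counts = Counter(label for _, label in records)
    attr_counts = Counter()          # (label, attribute, value) -> frequency
    values = defaultdict(set)        # attribute -> observed value set
    for attrs, label in records:
        for a, v in attrs.items():
            attr_counts[(label, a, v)] += 1
            values[a].add(v)
    return class_counts, attr_counts, values, len(records)

def nb_classify(model, attrs):
    class_counts, attr_counts, values, n = model
    scores = {}
    for label, c in class_counts.items():
        # Start from the class prior; work in log-space for stability.
        s = math.log(c / n)
        for a, v in attrs.items():
            # Each attribute contributes independently of the others,
            # with Laplace smoothing over the attribute's value set.
            s += math.log((attr_counts[(label, a, v)] + 1)
                          / (c + len(values[a])))
        scores[label] = s
    return max(scores, key=scores.get)

model = nb_train(train)
print(nb_classify(model, {"occupation": "farmer", "medium": "hindi"}))  # OBC
```

Because the per-attribute likelihoods are simple frequency counts, both training and classification are a single pass over the data, which is the computational simplicity noted above.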
Educational data mining (EDM) is a field that applies machine-learning, statistical, and data-mining algorithms to different types of educational data. Its main objective is to analyse such data in order to resolve educational research issues. EDM is concerned with developing methods to explore the unique types of data found in educational settings and, using these methods, to better understand students and the settings in which they learn. The patterns extracted from this data can be used to find meaningful patterns for students in real-time problem scenarios, to be monitored at the college level. The model can also be used for future planning of student selection criteria at the college level.
SELECTIVE KNN BAYES CLASSIFIER
This section formally states the assumptions and notations and recalls the KNN Bayes and selective KNN Bayes approaches.
ASSUMPTIONS AND NOTATION
K-nearest-neighbor (kNN) classification is one of the most fundamental and simple classification methods and should be one of the first choices for a classification study when there is little or no prior knowledge about the distribution of the data.
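As a minimal sketch of the kNN idea just described, the following pure-Python function classifies a query point by majority vote among its k nearest training points; the numeric points and labels are made up for illustration.

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature tuple, label); query: feature tuple."""
    # Rank training points by squared Euclidean distance to the query
    # (squared distance preserves the ranking, so no sqrt is needed).
    by_dist = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )
    # Majority vote among the labels of the k closest points.
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

points = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.9, 1.1), "A"),
          ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
print(knn_classify(points, (1.1, 1.0)))  # A: all three neighbours are "A"
```

Note that kNN needs no training phase and makes no distributional assumption, which is why it is a reasonable first choice when little is known about the data.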
Sample dataset: The education system in the state of Rajasthan is mainly promoted through courses designed and defined by the University of Rajasthan (UOR). The University, with its constituent colleges, has approximately 10,000 students registered via its portal. The problem arises when sub-castes are defined for the reservation quota classes; classifying each sub-caste into the given set of quota classes is therefore treated as a data mining classification problem. The classifier defined here not only derives the correct classes for the given data set but also supports the government's reservation policies by comparing the data set with the class defined as GENERAL. The data set consists of 152 entries from the Centre of Converging Technologies (CCT), an autonomous institution aided by AICTE/UGC and running on the campus of the University of Rajasthan. The entries consist of the personal details of candidates who applied for the B.Tech/M.Tech dual degree courses at CCT. The data set contains 42 categorical/nominal attributes (fig.), such as father's/mother's occupation, caste, sub-caste, 10th/12th percentage, and medium of education.
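A data set of this shape (rows of nominal attributes plus a class label) could be loaded and split for evaluation roughly as follows. The CSV layout, the file path, and the `quota_class` column name are assumptions for illustration; the paper does not specify its storage format.

```python
import csv
import random

def load_records(path, label_column="quota_class"):
    """Read a CSV of nominal attributes into (attribute dict, label) pairs.
    The label column name is a hypothetical placeholder."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    records = []
    for row in rows:
        label = row.pop(label_column)   # separate the class from the attributes
        records.append((row, label))
    return records

def split(records, test_fraction=0.3, seed=42):
    """Shuffle reproducibly, then hold out a test portion for evaluation."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

With only 152 entries, a fixed random seed (or cross-validation) matters: different splits of so small a data set can noticeably change the measured accuracy.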
The result analysis suggests using the KNN Bayes algorithm as the means of classifying the small data set. However, the percentage correctly classified can still be improved, in less time, by using other classifiers such as Support Vector Machines (SVM). As a future extension, the same procedure can be applied to a larger data set, with its performance compared against other classifiers. Mass estimation for the similarity measure could also be applied for parallel processing of the information.
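The comparison suggested above amounts to scoring several classifiers on the same held-out records. A minimal harness is sketched below; the classifier entries are placeholders (any callable mapping an attribute dict to a predicted label), and the toy test records are invented for the demonstration.

```python
def accuracy(classify, test_records):
    """Fraction of test records the classifier labels correctly."""
    correct = sum(1 for attrs, label in test_records if classify(attrs) == label)
    return correct / len(test_records)

def compare(classifiers, test_records):
    """classifiers: {name: callable}; returns accuracy per classifier name."""
    return {name: accuracy(fn, test_records) for name, fn in classifiers.items()}

# Toy demonstration with a trivial majority-class baseline as one "classifier".
test_records = [({"x": 1}, "A"), ({"x": 2}, "A"), ({"x": 3}, "B")]
print(compare({"majority_A": lambda attrs: "A"}, test_records))
```

In practice the KNN Bayes model and an SVM would be plugged into the same dictionary, so both are judged on identical test records.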