using the C4.5 classification algorithm was carried out on the Pima Indians Diabetes Database [3]. A detailed analysis of the Pima diabetes data set was carried out efficiently using Hive and R; the analysis yielded several interesting facts that can be used to develop prediction models [4]. A soft-computing-based prediction model was developed to estimate the risks accumulated by diabetic patients, and the authors experimented with real-time clinical data using a Genetic Algorithm [5]. The results obtained pertain to the level of risk of either heart attack or stroke. A novel pre-processing phase with missing-value imputation for both numerical and categorical data was presented in [6], which improved a hybrid combination of Classification and Regression Trees (CART) and Genetic Algorithms to impute missing continuous values and Self-Organizing Feature Maps (SOFM) to impute categorical values.