Ensemble of Classifiers: Voting Classifier | by Saptashwa Bhattacharyya | Aug, 2023


In machine learning, an ensemble is a finite collection of models (which may include artificial neural networks) trained for the same task. Usually, the models are trained independently and their predictions are then combined.
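As a minimal sketch of what "combining predictions" can mean, the snippet below takes a majority vote over the labels predicted by three models. The prediction values and the helper name `majority_vote` are made up for illustration; they do not come from the dataset used later in this post.

```python
import numpy as np

# Hypothetical hard predictions (class labels 0/1) from three independently
# trained models on the same five samples. The values are invented.
predictions = np.array([
    [1, 0, 1, 1, 0],  # model A
    [1, 1, 0, 1, 0],  # model B
    [0, 0, 1, 1, 1],  # model C
])


def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Return the most frequent label per sample (one column per sample)."""
    n_classes = int(preds.max()) + 1
    # Count the votes each class receives for every sample (shape:
    # n_classes x n_samples), then pick the class with the most votes.
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)


ensemble_prediction = majority_vote(predictions)
print(ensemble_prediction)
```

With an odd number of models and two classes there can be no ties; with an even number, `argmax` would silently resolve a tie in favour of the lower class label.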

When individual models disagree in their predictions, the ensemble can sometimes classify more accurately than any single classifier. Here, we combine several classifiers into an ensemble and use it for the prediction task. This post covers:

  • Use Sklearn’s VotingClassifier to build an ensemble.
  • What are hard and soft voting in VotingClassifier?
  • Check individual model performance with VotingClassifier.
  • Finally, use GridSearchCV + VotingClassifier to find the best model parameters for individual models.
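The four points above can be previewed in one short sketch. It runs on synthetic data from `make_classification` rather than the heart dataset, and the three base models and the parameter grid are illustrative assumptions, not necessarily the ones used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Illustrative base models, each identified by a short name.
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
]

# Hard voting: majority of the predicted labels.
hard_clf = VotingClassifier(estimators=estimators, voting="hard")
hard_clf.fit(X_train, y_train)

# Soft voting: argmax of the averaged predicted class probabilities,
# so every base model must implement predict_proba.
soft_clf = VotingClassifier(estimators=estimators, voting="soft")
soft_clf.fit(X_train, y_train)

print("hard voting accuracy:", hard_clf.score(X_test, y_test))
print("soft voting accuracy:", soft_clf.score(X_test, y_test))

# The fitted individual models are available by name.
for name, model in soft_clf.named_estimators_.items():
    print(name, "accuracy:", model.score(X_test, y_test))

# Parameters of individual models are addressed as <name>__<parameter>.
param_grid = {"lr__C": [0.1, 1.0], "rf__n_estimators": [25, 50]}
grid = GridSearchCV(
    VotingClassifier(estimators=estimators, voting="soft"),
    param_grid,
    cv=3,
)
grid.fit(X_train, y_train)
print("best parameters:", grid.best_params_)
```

The `<name>__<parameter>` keys are what let a single grid search tune the individual models and the ensemble together.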

Let’s begin!

Data Preparation:

To see VotingClassifier in action, I use the Heart Failure Prediction dataset (available under an open database license). The task is binary classification: predicting whether a patient with given attributes has heart disease or not. The dataset has 10 attributes, including age, sex, and resting blood pressure, collected from over 900 patients. Let’s check the distributions of some of these attributes. We start with the ‘ClassLabel’ counts (1 represents heart disease, 0 represents healthy), i.e. the healthy and ill populations, as a function of Sex.

Fig. 1: ClassLabel distribution as a function of the sex of the participants. (Image by Author; code in References).
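Counts like those in Fig. 1 can be tabulated with a cross-tabulation. The sketch below uses a tiny invented DataFrame, not the real data; the column names ‘Sex’ and ‘ClassLabel’ follow the text, while the ‘M’/‘F’ encoding is an assumption.

```python
import pandas as pd

# Tiny made-up sample for illustration; the real dataset has over 900 rows.
df = pd.DataFrame({
    "Sex": ["M", "M", "F", "M", "F", "F", "M", "F"],
    "ClassLabel": [1, 1, 0, 0, 0, 1, 1, 0],
})

# Absolute counts of healthy (0) and ill (1) patients per sex.
counts = pd.crosstab(df["Sex"], df["ClassLabel"])
print(counts)

# Row-normalised version: the fraction of each sex that is healthy or ill.
proportions = pd.crosstab(df["Sex"], df["ClassLabel"], normalize="index")
print(proportions)
```

The row-normalised table is the one to read when the two groups differ in size, since raw counts alone can be misleading.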

In general, we see that proportionately more males than females are ill. We can also check the distributions of individual features such as Cholesterol and Resting BP: both are higher for ill patients, especially for females.
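A comparison of this kind can be summarised numerically by grouping on sex and class label. In the sketch below the numbers are invented placeholders that only show the mechanics, and the column names ‘Cholesterol’ and ‘RestingBP’ are assumptions about how the features are labelled.

```python
import pandas as pd

# Invented placeholder values; they do not come from the real dataset.
df = pd.DataFrame({
    "Sex": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "ClassLabel": [0, 0, 1, 1, 0, 0, 1, 1],
    "Cholesterol": [200, 220, 260, 280, 190, 210, 230, 250],
    "RestingBP": [120, 124, 140, 144, 122, 126, 132, 136],
})

# Mean of each feature for every (Sex, ClassLabel) combination.
summary = df.groupby(["Sex", "ClassLabel"])[["Cholesterol", "RestingBP"]].mean()
print(summary)
```

Replacing `.mean()` with `.median()` gives a summary that is less sensitive to outliers.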


