Customer Churn Analysis

GitHub DemoApp

Churn prediction, the task of identifying customers who are likely to stop using a service, is an important and commercially valuable problem for businesses in many industries.

This project predicts a churn score for the customers of a website based on features such as:

  • User demographic information
  • Browsing behavior
  • Historical purchase data, among other information

Dataset

  • The dataset comes from a hackathon; the raw data can be downloaded from here. Link
  • A cleaned and processed version of the data can be accessed from here. Link
  • The classes (customer will exit = 1, customer will not exit = 0) are reasonably balanced, at a ratio of about 5:4
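
A balance claim like this can be checked by counting the labels. The following is a minimal sketch using only the Python standard library. The `labels` list is made-up illustrative data in a 5:4 proportion, not the project's dataset, and `class_balance` is a hypothetical helper name. The source does not say which class is the larger one, so the assignment below is arbitrary.

```python
from collections import Counter
from fractions import Fraction


def class_balance(labels):
    """Return per-class counts and the majority-class share of the data."""
    counts = Counter(labels)
    majority_share = Fraction(max(counts.values()), len(labels))
    return counts, majority_share


# Illustrative labels only: 5 of one class for every 4 of the other.
# Which class is the larger one is an assumption made for this example.
labels = [0] * 500 + [1] * 400

counts, majority_share = class_balance(labels)
print(dict(counts))           # per-class counts
print(float(majority_share))  # accuracy of always predicting the majority class
```

With a 5:4 split, always predicting the majority class would score 5/9 (about 56%) accuracy, which is a useful baseline when reading the accuracy figures reported under Results.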

Models

The final model is a stacked ensemble of several classifiers, including:

  • KNN
  • Random Forest
  • AdaBoost
  • XGBoost
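
A stacked ensemble over base learners like these could be assembled with scikit-learn's `StackingClassifier`. The sketch below is not the project's actual code, and several parts are assumptions: the data is synthetic (`make_classification`), the hyperparameters are arbitrary, the logistic-regression meta-learner is a guess (the source does not name one), and `GradientBoostingClassifier` stands in for XGBoost so that the example depends only on scikit-learn. With xgboost installed, `xgboost.XGBClassifier` could take that slot instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with a roughly 5:4 class split (not the real dataset).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.45, 0.55], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; the gradient-boosting model is a stand-in for XGBoost.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
print("accuracy:", stack.score(X_test, y_test))
print("recall (class 1):", recall_score(y_test, y_pred, pos_label=1))
```

The numbers printed here come from synthetic data and say nothing about the project's reported results.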

Results

  • XGBoost achieves a good test accuracy of about 93%, but the priority is the customers who are leaving (class 1), so that they can be retained with a discount offer on membership.
  • The ensemble (stacking classifier) achieves 94% recall on customers who are likely to leave, which is higher than XGBoost's.
  • Confusion matrices for the final classifier (stacked ensemble) and the XGBoost classifier are available in the project's GitHub repository.
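
To illustrate why recall on class 1 is tracked separately from accuracy, the sketch below derives both metrics from confusion-matrix counts in plain Python. The labels and the two sets of predictions are invented for the example and are unrelated to the project's results. `confusion_counts`, `accuracy`, and `recall` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the given positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # a churner the model missed
        elif pred == positive:
            fp += 1  # a loyal customer flagged as a churner
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual positives (churners) that the model caught."""
    return tp / (tp + fn)


# Made-up example: 5 churners (1) and 5 non-churners (0).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # misses 2 churners, no false alarms
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]  # catches every churner, 2 false alarms

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    tp, fp, fn, tn = confusion_counts(y_true, preds)
    print(name, "accuracy:", accuracy(tp, fp, fn, tn), "recall:", recall(tp, fn))
```

Both toy models are correct on 8 of 10 customers, yet only the second one finds every churner. When the goal is to reach departing customers with a retention offer, the model with the higher class 1 recall is the more useful one even if the accuracies are similar.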

Tech stack

  • Python version: 3.7
  • Packages: pandas, numpy, sklearn, xgboost, fastapi, seaborn
  • Cloud: Heroku

For result matrices and code notebooks, check out the GitHub repository of this project.