Customer Churn Analysis

GitHub DemoApp

Churn prediction, the task of identifying customers who are likely to stop using a service, is an important and commercially valuable problem in almost any industry.

This project predicts the churn score of a website's customers based on features such as:

User demographic information
Browsing behavior
Historical purchase data, among other information

Dataset

The dataset comes from a hackathon; the raw data can be downloaded from here. Link
A cleaned and processed version of the data can be accessed from here. Link
The classes (customer will exit = 1, customer will not exit = 0) are reasonably balanced, at roughly a 5:4 ratio.
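
A class balance like the one described above can be checked with a few lines of Python. This is a minimal sketch, not the project's code: the `class_balance` helper and the label counts are hypothetical, and which class is the larger one is an assumption made only for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the majority-to-minority ratio."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return dict(counts), majority / minority


# Hypothetical labels in a 5:4 split; the real dataset's counts are not shown here.
labels = [1] * 500 + [0] * 400
counts, ratio = class_balance(labels)
print(counts, ratio)
```

A ratio close to 1 means plain accuracy is not badly distorted by class imbalance, although per-class metrics such as recall are still worth checking.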

Models

The final model is an ensemble of several classifiers, including:

KNN
Random Forest
AdaBoost
XGBoost
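
One way such an ensemble could be assembled is with scikit-learn's `StackingClassifier`. The sketch below is an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic, the hyperparameters and the logistic-regression meta-learner are arbitrary choices, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends only on scikit-learn (`xgboost.XGBClassifier` could take its place where that package is installed).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the real features are not reproduced here.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners mirroring the list above (hyperparameters are assumptions).
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    # Stand-in for XGBoost; swap in xgboost.XGBClassifier if available.
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# Recall on class 1 (customers who exit) for the held-out split.
churn_recall = recall_score(y_test, stack.predict(X_test), pos_label=1)
print(f"Recall on class 1: {churn_recall:.2f}")
```

The printed recall describes this synthetic data only and says nothing about the project's reported results.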

Results

Although XGBoost alone gives a good test accuracy of about 93%, the focus needs to be on the customers who are leaving (class 1), so that they can be retained, for example with a discount offer on membership.
The ensemble method (stack classifier) achieves 94% recall when predicting the customers who are likely to leave, which is higher than XGBoost.
The confusion matrices of the final classifier (stack ensemble) and the XGBoost classifier follow.
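
To make the accuracy versus recall distinction concrete, the sketch below derives both metrics from confusion-matrix counts. The labels are made up for illustration and are not the project's predictions; `confusion_counts`, `accuracy`, and `recall` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual positives (churners) that the model catches."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: five churners (1) followed by five non-churners (0).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # misses two churners
model_b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # two false alarms instead

for name, y_pred in (("A", model_a), ("B", model_b)):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(name, accuracy(tp, fp, fn, tn), recall(tp, fn))
```

Both toy models reach the same accuracy, but only model B catches every churner. When the cost of a missed churner outweighs the cost of an unnecessary discount, recall on class 1 is the more relevant number.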

Techstack

Python version: 3.7
Packages: pandas, numpy, sklearn, xgboost, fastapi, seaborn
Cloud: Heroku

For the result matrices and code notebooks, check out the GitHub repository of this project.


Pawan Trivedi
