Customer Churn Analysis
Churn prediction, the task of identifying customers who are likely to stop using a service, is an important and commercially valuable problem in many industries.
This project predicts a churn score for a website's customers based on features such as:
- User demographic information
- Browsing behavior
- Historical purchase data, among other information
Dataset
- The dataset comes from a hackathon, and the raw data can be downloaded from here. Link
- A cleaned and processed version of the data can be accessed from here. Link
- The classes (customer will EXIT (1) or NOT (0)) are reasonably balanced, with a 5:4 ratio
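A class ratio like the one quoted above can be checked directly from the label column. The snippet below is a minimal sketch on a made-up label list. The helper name `class_ratio` and the toy labels are assumptions for illustration, not code or data from this project, and which class is the larger one in the real data is not stated here.

```python
from collections import Counter


def class_ratio(labels):
    """Return (counts per class, majority/minority ratio) for binary labels."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return counts, majority / minority


# Hypothetical labels with a 5:4 split; the real dataset is far larger.
toy_labels = [1] * 5 + [0] * 4
counts, ratio = class_ratio(toy_labels)
print(dict(counts), round(ratio, 2))
```

A ratio close to 1 means plain accuracy is not badly distorted by class imbalance, although per-class metrics are still worth reporting.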
Models
The final model is a stacked ensemble of different classifiers, including:
- KNN
- Random Forest
- AdaBoost
- XGBoost
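As a rough illustration of how such base classifiers can be combined, here is a minimal stacking sketch built on scikit-learn's `StackingClassifier`. It is not the project's actual pipeline. The synthetic data, the hyperparameters, and the logistic-regression meta-learner are all assumptions, and scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost so the example depends on scikit-learn only (`xgboost.XGBClassifier` could take its place).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the churn data (assumption), roughly 5:4 balanced.
X, y = make_classification(
    n_samples=900, n_features=10, weights=[5 / 9, 4 / 9], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; their cross-validated predictions feed the meta-learner.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    # Stand-in for XGBoost (assumption); swap in xgboost.XGBClassifier if installed.
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # meta-learner choice is an assumption
    cv=3,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print("Predicted churners in test split:", int(y_pred.sum()))
```

Stacking trains the meta-learner on out-of-fold predictions of the base models, so it can learn which base model to trust in which region of the feature space.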
Results
- XGBoost gives a good test accuracy of about 93%, but the focus should be on the customers who are leaving (class 1), so that they can be retained with a discount offer on membership.
- The ensemble (stack classifier) achieves 94% recall on the customers who are likely to leave, which is higher than XGBoost's.
- The following are the confusion matrices of the final classifier (stack ensemble) and the XGBoost classifier.
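To make the difference between accuracy and class-1 recall concrete, the sketch below computes both from confusion-matrix counts. The labels are toy values chosen for illustration and are not the project's results. The helper names are assumptions as well. `sklearn.metrics.confusion_matrix` and `sklearn.metrics.recall_score` provide the same quantities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # a churner the model missed
        elif pred == positive:
            fp += 1  # a loyal customer flagged as a churner
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual churners that were caught: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Toy labels (hypothetical): 4 churners, 6 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("recall (class 1):", recall(tp, fn))
print("accuracy:", accuracy(tp, fp, fn, tn))
```

In a retention setting, a missed churner (false negative) usually costs more than a discount offered to a customer who would have stayed, which is why recall on class 1 is the metric to compare.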
Tech stack
- Python version: 3.7
- Packages: pandas, numpy, sklearn, xgboost, fastapi, seaborn
- Cloud: Heroku
For result matrices and code notebooks, check out the GitHub repository of this project.