Customer Churn Analysis

ML system to identify and retain at-risk customers

Tags: Machine Learning, Ensemble Methods, XGBoost, FastAPI

Overview

Churn prediction, the task of identifying customers who are likely to stop using a service, is an important and commercially valuable problem in many industries.

This project predicts a churn score for each customer of a website, based on features such as:

  • User demographic information
  • Browsing behavior
  • Historical purchase data, among other information

[Figure: Customer Churn Analysis Visualization]

Dataset

  • Dataset taken from a Hackathon (Link)
  • Cleaned and processed version available on GitHub
  • Classes (customer will exit = 1, will not exit = 0) are reasonably balanced, at a 5:4 ratio
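
A 5:4 class ratio also sets the bar that any model has to clear. The sketch below is a minimal illustration on made-up labels, not the hackathon data: the column name `churn` is hypothetical, and treating class 1 as the larger class is an assumption, since the ratio above does not say which class is the "5".

```python
import pandas as pd

# Hypothetical label column: 1 = customer exits, 0 = customer stays.
# Nine toy rows in a 5:4 ratio (which class is larger is an assumption).
labels = pd.Series([1] * 5 + [0] * 4, name="churn")

# Share of each class in the data.
class_share = labels.value_counts(normalize=True)

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

With a 5:4 split, always predicting the larger class is right 5 times out of 9, so accuracy figures are only meaningful relative to that baseline.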

Models

The final model is a stacking ensemble of the following classifiers:

  • KNN (K-Nearest Neighbors)
  • Random Forest
  • AdaBoost
  • XGBoost
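
One way to combine these four model families is scikit-learn's `StackingClassifier`. The sketch below is an illustration under stated assumptions, not the project's actual code: the hyperparameters, the logistic-regression meta-learner, the feature scaling for KNN, and the synthetic data are all assumptions. The helper `build_stack` is hypothetical, and its gradient-boosted member defaults to a scikit-learn stand-in so the example runs without `xgboost` installed; pass `xgboost.XGBClassifier()` to use XGBoost itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stack(boosted=None):
    """Return a stacking ensemble over the four model families listed above.

    `boosted` is the gradient-boosted member. Pass xgboost.XGBClassifier()
    to use XGBoost; the default is a scikit-learn stand-in (an assumption
    made only so this sketch has no xgboost dependency).
    """
    if boosted is None:
        boosted = GradientBoostingClassifier(n_estimators=50, random_state=0)
    base_learners = [
        # KNN is distance-based, so features are scaled before it is fitted.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("boosted", boosted),
    ]
    # The meta-learner is trained on out-of-fold predictions of the base learners.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# Synthetic stand-in data with a roughly 5:4 class ratio (not the hackathon dataset).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[5 / 9, 4 / 9], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stack = build_stack().fit(X_train, y_train)
predictions = stack.predict(X_test)
print(f"Predicted churners in the test split: {int(predictions.sum())}")
```

Stacking differs from simple voting in that a second-level model learns how much to trust each base classifier, which is one reason it can outperform any single member.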

Results

Key Achievement: The stacking ensemble achieved 94% recall on the churn class (customers likely to leave), higher than XGBoost alone.

  • XGBoost achieved ~93% test accuracy
  • The focus is on identifying customers who are leaving (class 1) so that they can be retained with incentives
  • The stacking classifier is optimized for recall to minimize false negatives (churners predicted as staying)
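
Recall on class 1 is TP / (TP + FN), the fraction of actual churners the model catches. The sketch below shows how it differs from accuracy and how lowering the decision threshold trades false positives for fewer false negatives. The labels and probabilities are made up for illustration, and threshold tuning is only one possible way to favor recall; the source does not say how the project optimized for it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy ground truth (1 = churned) and hypothetical predicted churn probabilities.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
churn_proba = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.1, 0.55, 0.3, 0.95])


def evaluate(threshold):
    """Score predictions obtained by flagging churn at or above `threshold`."""
    y_pred = (churn_proba >= threshold).astype(int)
    return {
        "recall": recall_score(y_true, y_pred, pos_label=1),
        "accuracy": accuracy_score(y_true, y_pred),
    }


default = evaluate(0.5)    # standard cut-off
lowered = evaluate(0.35)   # flag more customers as at risk

print("threshold 0.50:", default)
print("threshold 0.35:", lowered)
```

In this toy data the lower threshold catches the one churner that the default cut-off misses. In a retention setting a missed churner is usually costlier than an unnecessary incentive, which is the motivation for prioritizing recall.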

Tech Stack

  • Python version: 3.7
  • Packages: pandas, numpy, sklearn, xgboost, fastapi, seaborn
  • Cloud: Heroku