Applied Data Scientist
Data Scientist with 3+ years of experience in retail analytics, specializing in machine learning and NLP to drive scalable, data-driven business solutions. Skilled in SQL, PySpark, Python (Pandas), and PyTorch, with hands-on experience in designing, deploying, and optimizing ML models for large-scale data processing and decision support.
My research background in NLP includes a project on long-document summarization in the legal domain at SCAAI under the guidance of Dr. Shila Gite, and I co-authored a paper titled "Indian Legal Corpus (ILC): A Dataset for Summarizing Indian Legal Proceedings Using Natural Language".
I graduated with a Master of Technology (M.Tech) degree in Data Science and Machine Learning from PES University in 2022. I earned my Bachelor's degree in Information Technology from SRM University in 2019. I am currently based in Bengaluru, India.
I was an AWS Community Builder in Machine Learning for the 2021 Fall cohort. I continuously contribute to the ML community through open-source models on 🤗 Hugging Face, Kaggle competitions, and technical blogs.
I have worked with data analysis tools such as Pandas, SQL, and Hive. For my NLP and Computer Vision work, I've used PyTorch and the Hugging Face ecosystem to train large language models, and Weights and Biases to track ML experiments. I've also built applications using RAG and LangChain.
Building and deploying end-to-end machine learning solutions for personalization at the UK’s largest retailer, delivering £250M in incremental revenue impact. Developed and productionized classification and regression models using Python, PySpark, and scikit-learn to support large-scale customer analytics. Designed robust A/B testing frameworks to measure uplift and validate model performance in live environments, while building scalable PySpark data pipelines to process high-volume retail data. Applied rigorous feature engineering, model validation, and ongoing performance monitoring to ensure reliable, long-term business impact.
Developed an offline, OCR-based handwriting recognition and translation system using a Transformer-based deep learning architecture. Built multiple Computer Vision and NLP projects (skin cancer detection, text classification), reproducing results from research papers using TensorFlow and Keras.
Designed and deployed an LLM-based long-document summarization pipeline for the legal domain, processing complex, multi-page Indian court judgments. Built a proprietary dataset of 3,000+ Indian legal judgments and summaries through large-scale web scraping, cleaning, and normalization to enable supervised training and evaluation. Benchmarked multiple transformer and LLM variants to inform model selection, achieving a ROUGE-2 F1 score of 23.18 on a held-out test set. Published the Indian Legal Corpus (ILC) as a peer-reviewed research contribution, demonstrating end-to-end ownership from data creation to model evaluation.
Advanced studies in machine learning, focusing on transformer architectures, NLP, and deep learning systems.
Foundation in Computer Science and Information Technology.
Get a comprehensive overview of my experience, publications, and technical skills.
Download CV (PDF)