Pawan Trivedi - Data Scientist & ML Engineer

Data Scientist with 3+ years of experience in retail analytics, specializing in machine learning and NLP to drive scalable, data-driven business solutions. Skilled in SQL, PySpark, Python (Pandas), and PyTorch, with hands-on experience in designing, deploying, and optimizing ML models for large-scale data processing and decision support.

My research background in NLP includes a project on long-document summarization in the legal domain at SCAAI, under the guidance of Dr. Shila Gite, where I co-authored a paper titled "Indian Legal Corpus (ILC): A Dataset for Summarizing Indian Legal Proceedings Using Natural Language".

I graduated with a Master of Technology (M.Tech) degree in Data Science and Machine Learning from PES University in 2022. I earned my Bachelor's degree in Information Technology from SRM University in 2019. I am currently based in Bengaluru, India.

I was an AWS Community Builder in Machine Learning for the 2021 Fall cohort. I continuously contribute to the ML community through open-source models on 🤗 Hugging Face, Kaggle competitions, and technical blogs.

I have worked with data analysis tools such as Pandas, SQL, and Hive. For my NLP and Computer Vision work, I've used PyTorch and the Hugging Face ecosystem to train large language models, and Weights & Biases to track ML experiments. I've also built applications with RAG and LangChain.

Experience

Feb. 2023 - Present

Tesco

Data Scientist

Building and deploying end-to-end machine learning solutions for personalization at the UK’s largest retailer, delivering £250M in incremental revenue impact. Developed and productionized classification and regression models using Python, PySpark, and scikit-learn to support large-scale customer analytics. Designed robust A/B testing frameworks to measure uplift and validate model performance in live environments, while building scalable PySpark data pipelines to process high-volume retail data. Applied rigorous feature engineering, model validation, and ongoing performance monitoring to ensure reliable, long-term business impact.

Jan. 2022 - Dec. 2022

LearnKarts

Research Analyst

Developed an offline OCR-based handwriting recognition and translation system using a Transformer-based deep learning architecture. Built multiple Computer Vision and NLP projects (skin cancer detection, text classification), reproducing results from research papers using TensorFlow and Keras.

Sep. 2021 - Apr. 2022

SCAAI

NLP Research Intern

Designed and deployed an LLM-based long-document summarization pipeline for the legal domain, processing complex, multi-page Indian court judgments. Built a proprietary dataset of 3,000+ Indian legal judgments and summaries through large-scale web scraping, cleaning, and normalization to enable supervised training and evaluation. Benchmarked multiple transformer and LLM variants to inform model selection, achieving a ROUGE-2 F1 score of 23.18 on a held-out test set. Published the Indian Legal Corpus (ILC) as a peer-reviewed research contribution, demonstrating end-to-end ownership from data creation to model evaluation.

Education

2022

M.Tech in Data Science and Machine Learning

PES University

Advanced studies in machine learning, focusing on transformer architectures, NLP, and deep learning systems.

2019

Bachelor of Technology in IT

SRM University

Foundation in Computer Science and Information Technology.

Skills & Expertise

ML/AI Frameworks

PyTorch
Hugging Face Transformers
vLLM
unsloth
Scikit-learn

Research Areas

Natural Language Processing
Deep Learning
Low-Resource NLP

Engineering

Python
LLM
AWS
Docker
MLflow
Apache Spark
