LegSum

Legal Documents Summarization using Transformers

NLP, Legal AI, Transformers, Summarization

Overview

LegSum is a legal document summarization system built on pretrained transformer models. The project compares abstractive and extractive methods for summarizing legislative bills and other legal documents.

Dataset

BillSum: a dataset of US Congressional and California state bills paired with reference summaries.
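
A minimal sketch of loading the California test split. It assumes the Hugging Face `datasets` library, the Hub dataset id `billsum`, and records with `text` and `summary` fields; the project page does not say how the data was actually loaded. `load_billsum_split` and `to_pairs` are hypothetical helper names used only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def load_billsum_split(split: str = "ca_test"):
    """Download one BillSum split from the Hugging Face Hub.

    Assumption: the dataset id is "billsum" and it exposes a "ca_test" split.
    The import is local so the helper below works without `datasets` installed.
    """
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset("billsum", split=split)


def to_pairs(records: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Turn raw records into (document, reference summary) pairs.

    Records with an empty document or an empty summary are skipped.
    """
    return [
        (r["text"], r["summary"])
        for r in records
        if r["text"].strip() and r["summary"].strip()
    ]
```

Calling `to_pairs(load_billsum_split())` would then give a list of pairs that any summarizer can iterate over.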

Models Evaluated

Abstractive Methods

  • T5-Small: Fine-tuned on BillSum
  • BART: Variant fine-tuned on BillSum
  • DistilBART: Distilled version of BART for efficiency
  • Legal-PEGASUS: Domain-specific PEGASUS model
  • BigBird-PEGASUS: For long document processing
  • LED (Longformer Encoder-Decoder): Handles inputs of up to 16,384 tokens
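
The long-input models in this list exist because bills often exceed the input limit of the shorter-context models. One common workaround for those models, shown here as a sketch and not necessarily what this project did, is to split a document into chunks, summarize each chunk, and join the results. The checkpoint name `t5-small`, the `summarize: ` prefix, the 512-token limit, and all function names are assumptions for illustration.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in split_into_chunks(text, max_words))


def build_hf_summarizer(
    model_name: str = "t5-small",   # assumed checkpoint, not confirmed by the project
    prefix: str = "summarize: ",    # task prefix expected by T5 checkpoints
    max_input_tokens: int = 512,    # assumed input limit for this checkpoint
) -> Callable[[str], str]:
    """Wrap a seq2seq checkpoint as a plain str -> str summarizer.

    Imports are local so the chunking helpers work without transformers/torch.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def _summarize(chunk: str) -> str:
        inputs = tokenizer(
            prefix + chunk,
            return_tensors="pt",
            truncation=True,
            max_length=max_input_tokens,
        )
        output_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return _summarize
```

With a long-input model such as LED, the chunking step can often be skipped because the whole bill fits in a single input.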

Extractive Methods

  • Traditional extractive summarization techniques
  • Sentence ranking and selection algorithms
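
The page does not name the specific extractive algorithms used. As a generic illustration of sentence ranking and selection, the sketch below scores each sentence by the average document-wide frequency of its words and keeps the top-scoring sentences in their original order. The function names and the scoring rule are assumptions, not the project's method.

```python
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the num_sentences highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    # Word frequencies over the whole document.
    freq = Counter(word for words in tokenized for word in words)

    def score(words: List[str]) -> float:
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)
```

Legal text is full of abbreviations such as "Sec." that would trip this naive splitter, so a real pipeline would likely need a more robust sentence tokenizer.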

Results

The evaluation compares pre-trained transformer models against extractive methods on the California test split (ca_test) of BillSum. The results indicate that transformer-based approaches are effective for legal document summarization.
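
The page does not state which metric was used. Summarization work is commonly scored with ROUGE, so the sketch below assumes a simplified unigram-overlap F1 in that spirit (no stemming or stopword handling) and averages it over (document, reference) pairs. `unigram_f1` and `mean_score` are hypothetical names, and this is not the project's evaluation code.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Tuple


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over overlapping unigrams between a candidate and a reference summary."""
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_score(
    pairs: Iterable[Tuple[str, str]],
    summarize: Callable[[str], str],
) -> float:
    """Average unigram F1 of summarize(document) against each reference."""
    scores = [unigram_f1(summarize(doc), ref) for doc, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Because `summarize` is any callable from text to text, the same loop can score an abstractive model and an extractive baseline side by side.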

Tech Stack

  • Framework: Hugging Face Transformers
  • Models: T5, BART, DistilBART, PEGASUS, BigBird-PEGASUS, LED
  • Language: Python
  • Dataset: BillSum