T5 ASR Grammar Corrector Project

ASR Grammar Corrector

A post-ASR correction model trained on 90 million noisy/clean pairs, designed to fix typical speech recognition errors in (near) real time. It cleans up transcriptions from ASR systems such as Whisper and NeMo, improving readability and grammatical correctness with minimal latency.

Designed to: improve readability and professionalism in transcripts; make ASR outputs usable for customer service, legal, and healthcare; help non-native speakers interpret ASR output more easily; and support real-time captioning and assistive technologies.


Features




DJ-AI ASR Grammar Corrector

A lightweight grammar correction model fine-tuned from t5-small and t5-base, specifically designed to correct common errors in automatic speech recognition (ASR) outputs, including homophones, verb tense issues, contractions, duplicated words, and more. Optimized for fast inference in (near) real-time ASR pipelines.


Model Details


Benchmark Results (measured on 10,000 real-world noisy inputs)

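| Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
| dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
| dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.90 | 88.54 | 1620.65 |
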
  1. Accuracy is a measure of how well the model performs across the full sentence. That is, a prediction is only counted as "correct" if the entire corrected sentence exactly matches the reference sentence. So if the model corrects 1 out of 2 errors, but the final output does not exactly match the expected sentence, it's counted as a fail.
  2. Token Accuracy is a measure of how well the model performs at the token level: Token Accuracy (%) = (Number of Matched Tokens / Total Reference Tokens) × 100
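
The two accuracy figures defined in the notes above can be sketched in a few lines of Python. This is only an illustration, not the benchmark script used for the table: it assumes whitespace tokenization and counts a token as matched when it equals the reference token at the same position, neither of which is stated in this README. The function names are hypothetical.

```python
def sentence_accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference sentence."""
    if not references:
        return 0.0
    exact = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * exact / len(references)


def token_accuracy(predictions, references):
    """Percentage of reference tokens matched by the prediction.

    Assumption: tokens are whitespace-separated and compared position by position.
    """
    matched = 0
    total = 0
    for pred, ref in zip(predictions, references):
        pred_tokens = pred.split()
        ref_tokens = ref.split()
        total += len(ref_tokens)
        # zip stops at the shorter sequence, so missing tokens count as misses.
        matched += sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return 100.0 * matched / total if total else 0.0


# Made-up example: the second prediction still contains one error.
preds = ["i have two apples", "she went to the store"]
refs = ["i have two apples", "she goes to the store"]
print(sentence_accuracy(preds, refs))  # one of two sentences matches exactly
print(token_accuracy(preds, refs))     # 8 of 9 reference tokens match
```

In this example a single wrong token fails the whole second sentence while barely affecting the token-level score. That is the same pattern as in the table, where sentence accuracy is far lower than token accuracy.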

Intended Use

| Use Case | ✅ Supported | 🚫 Not Recommended |
| --- | --- | --- |
| Post-ASR correction | ✅ Yes | |
| Real-time ASR pipelines | ✅ Yes | |
| Batch transcript cleanup | ✅ Yes | |
| Grammar education tools | ✅ Yes | |
| Formal document editing | | 🚫 Model may be too informal |
| Multilingual input | | 🚫 English-only fine-tuning |

Corrects Common ASR Errors:


Example

Input (noisy ASR):

Pretrained Models

The models were trained on the DJ-AI custom dataset, which includes over 90 million pairs of real and synthetic ASR errors and their corrected text. They are fine-tuned from pretrained T5 models.

https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-small

https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-small-streaming

https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-base
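
Because the checkpoints are T5-based, they can presumably be loaded with the standard Hugging Face `transformers` seq2seq classes. The sketch below is a minimal example under that assumption, not the author's reference usage. The helper names `repo_id_from_url` and `correct` are hypothetical, and passing the text with no task prefix is a guess that should be checked against the model pages.

```python
from urllib.parse import urlparse


def repo_id_from_url(url):
    """Turn a Hugging Face model URL into the 'owner/name' repo id."""
    path = urlparse(url).path.strip("/")
    owner, name = path.split("/")[:2]
    return f"{owner}/{name}"


def correct(texts, model_url, max_new_tokens=64):
    """Run a batch of noisy ASR strings through a seq2seq checkpoint."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    repo_id = repo_id_from_url(model_url)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    # Assumption: inputs are passed as-is, with no task prefix.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

A call might look like `correct(["i has two apple"], "https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-small")`. In a real-time pipeline, load the tokenizer and model once and reuse them, since reloading on every call as this sketch does would dominate latency.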

Demo

DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo

MIT License.