bert-restore-punctuation

Maintained By
felflare

Property        Value
License         MIT
Training Data   560,000 Yelp Reviews
Accuracy        91% (Overall F1: 90%)
Author          felflare

What is bert-restore-punctuation?

bert-restore-punctuation is a BERT-based model that restores punctuation and capitalization in unpunctuated text. It is built on the bert-base-uncased architecture and fine-tuned on the Yelp Reviews dataset. It is particularly useful for cleaning up ASR (Automatic Speech Recognition) output.

Implementation Details

The model was fine-tuned for 3 epochs on 560,000 text samples from the Yelp Reviews dataset and reaches 91% accuracy and a 90% overall F1 score on 45,990 held-out samples. It restores eight punctuation marks: !, ?, ., ,, -, :, ;, and ', and handles capitalization at the same time.

  • Built on the bert-base-uncased architecture
  • Fine-tuned on 560,000 Yelp Reviews samples
  • Treats restoration as a token classification task
  • Supports GPU acceleration for faster processing
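
To make the token classification idea concrete, here is a minimal sketch of how per-word labels could be turned back into formatted text. The two-character label scheme used here (punctuation mark or "O", followed by "U" for uppercase or "O" for unchanged) and the function name apply_labels are assumptions for illustration only; the source does not describe the model's actual label set or post-processing.

```python
# Hypothetical label scheme (an assumption, not confirmed by the source):
# each word gets a two-character label. The first character is the
# punctuation mark to append ("O" for none); the second is "U" to
# capitalize the word or "O" to leave it unchanged.
PUNCTUATION = set("!?.,-:;'")


def apply_labels(words, labels):
    """Rebuild formatted text from plain words and one label per word."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    out = []
    for word, label in zip(words, labels):
        mark, case = label[0], label[1]
        if case == "U":
            # Capitalize only the first character of the word.
            word = word[:1].upper() + word[1:]
        if mark in PUNCTUATION:
            word += mark
        out.append(word)
    return " ".join(out)


words = "my name is clara and i live in berkeley california".split()
labels = ["OU", "OO", "OO", "OU", "OO", "OU", "OO", "OO", ",U", ".U"]
print(apply_labels(words, labels))
```

In a real pipeline the labels would come from the classifier's per-token predictions; here they are written by hand so the decoding step can be read in isolation.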

Core Capabilities

  • Restores 8 different punctuation marks with high accuracy
  • Handles proper word capitalization
  • Processes arbitrarily large text inputs
  • Achieves 90% overall F1 score
  • Reaches a 75% F1 score on period restoration
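
BERT models accept a bounded number of tokens per call, so handling arbitrarily large inputs generally means splitting the text into windows. The sketch below shows one common way to do that with overlapping word windows. The window size, the overlap, the function names, and the predict callable (standing in for the model) are all assumptions for illustration, not details taken from the source.

```python
def split_into_chunks(words, size=250, overlap=30):
    """Yield (start_index, chunk) windows of at most `size` words.

    Consecutive windows share `overlap` words. Both defaults are
    illustrative values, not settings confirmed by the source.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    start = 0
    while True:
        yield start, words[start:start + size]
        if start + size >= len(words):
            break
        start += step


def restore_long_text(text, predict, size=250, overlap=30):
    """Restore a long text window by window.

    `predict` is a hypothetical stand-in for the model: it takes a list
    of words and must return one restored word per input word.
    """
    words = text.split()
    restored = []
    for start, chunk in split_into_chunks(words, size, overlap):
        out = predict(chunk)
        # Words in the overlap were already emitted by the previous
        # window, so skip them and keep the earlier prediction.
        skip = len(restored) - start
        restored.extend(out[skip:])
    return " ".join(restored)


# Usage with a fake predictor that simply uppercases every word.
fake_predict = lambda chunk: [w.upper() for w in chunk]
print(restore_long_text("a b c d e f g h i j", fake_predict, size=4, overlap=1))
```

This version keeps the earlier window's prediction for overlapping words. Preferring the later window, which sees more left context for those words, is an equally reasonable choice.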

Frequently Asked Questions

Q: What makes this model unique?

This model handles punctuation and capitalization simultaneously rather than as separate steps. With 91% overall accuracy and a 75% F1 score for period restoration, it is especially valuable for processing ASR output and other unpunctuated text sources.
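
For readers comparing the accuracy and F1 figures, the following sketch shows how a per-label F1 score is derived from prediction counts. The counts in the example are made up for illustration and are not the model's actual results; the function name is also hypothetical.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Compute F1 for one label from raw prediction counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the model's evaluation).
print(round(f1_from_counts(true_pos=80, false_pos=20, false_neg=20), 2))
```

Because F1 is computed per label, a rare or difficult label can score well below the overall figure.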

Q: What are the recommended use cases?

The model is ideal for: 1) Processing ASR output to add proper punctuation and capitalization, 2) Restoring formatting in plain text documents, 3) Preprocessing text for NLP tasks requiring proper punctuation, and 4) Serving as a base model for domain-specific fine-tuning.
