bertweet-base

vinai

Pre-trained language model for English Tweets, based on the RoBERTa architecture and trained on 850M Tweets (16B word tokens). MIT licensed, with strong performance on Tweet NLP tasks.

Property     Value
License      MIT
Author       VINAI
Downloads    81,578
Framework    PyTorch, TensorFlow

What is bertweet-base?

BERTweet-base is the first public large-scale language model pre-trained specifically for English Tweets. It follows the RoBERTa pre-training procedure and was trained on a corpus of 850M English Tweets: 845M general Tweets streamed from 2012 to 2019, plus 5M Tweets related to the COVID-19 pandemic.

Implementation Details

The model is built on the RoBERTa architecture and has been trained on approximately 16B word tokens, equivalent to about 80GB of text data. It supports multiple deep learning frameworks including PyTorch and TensorFlow, making it versatile for different development environments.

  • Pre-trained on 850M English Tweets
  • Implements RoBERTa architecture
  • Supports multiple frameworks
  • Includes COVID-19 specific data
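Because the model follows the standard RoBERTa interface, it can be loaded through Hugging Face `transformers`. The sketch below, assuming `transformers` and `torch` are installed, extracts contextual features for a Tweet (the input uses the `@USER` and `HTTPURL` normalization conventions the model was trained with):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load BERTweet and its BPE tokenizer from the Hugging Face Hub.
# BERTweet ships a slow (Python) tokenizer only, hence use_fast=False.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

# Input should already be normalized: mentions as @USER, links as HTTPURL.
line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

input_ids = torch.tensor([tokenizer.encode(line)])
with torch.no_grad():
    features = bertweet(input_ids)

# last_hidden_state has shape (batch, sequence_length, hidden_size=768)
print(features.last_hidden_state.shape)
```

The resulting per-token vectors can feed a downstream classifier or tagger, as with any RoBERTa-style encoder.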

Core Capabilities

  • Part-of-Speech Tagging
  • Named Entity Recognition
  • Sentiment Analysis
  • Irony Detection
  • Fill-Mask Task Support
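The fill-mask capability can be exercised directly via the `transformers` pipeline API; a minimal sketch, assuming `transformers` and `torch` are installed (BERTweet uses `<mask>` as its mask token):

```python
from transformers import pipeline

# Build a fill-mask pipeline around BERTweet.
fill_mask = pipeline("fill-mask", model="vinai/bertweet-base")

# Ask the model to complete a masked Tweet-like sentence.
predictions = fill_mask("The weather today is <mask> .")

# Each prediction carries the candidate token and its probability.
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))
```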

Frequently Asked Questions

Q: What makes this model unique?

BERTweet is the first large-scale language model specifically designed for Twitter content, combining both general tweets and pandemic-related data for comprehensive coverage of social media language patterns.

Q: What are the recommended use cases?

The model excels in social media text analysis tasks including sentiment analysis, named entity recognition, part-of-speech tagging, and irony detection, making it ideal for Twitter-focused NLP applications.
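For any of these use cases, inputs should be normalized the way the training corpus was: per the BERTweet paper, user mentions become `@USER` and web links become `HTTPURL` (the official tokenizer can also translate emoji into text strings, a step omitted here). A rough regex-based sketch of that normalization, with the patterns being an approximation rather than the exact official implementation:

```python
import re

def normalize_tweet(text: str) -> str:
    """Approximate BERTweet's lexical normalization:
    user mentions -> @USER, URLs -> HTTPURL."""
    text = re.sub(r"@\w+", "@USER", text)          # mask user mentions
    text = re.sub(r"(?:https?://|www\.)\S+", "HTTPURL", text)  # mask links
    return text

print(normalize_tweet("Loving the new update @jack! https://t.co/abc123"))
```

Applying a normalizer like this before tokenization keeps inference inputs consistent with the pre-training data.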
