StorySeeker
| Property | Value |
|---|---|
| Base Model | RoBERTa-base |
| Accuracy | 84.16% |
| Training Data | 301 Reddit posts and comments |
| Author | Maria Antoniak et al. |
What is StorySeeker?
StorySeeker is an NLP model that detects whether a given text contains a story. It is built on the RoBERTa-base architecture and fine-tuned on 301 annotated Reddit posts and comments, reaching a reported accuracy of 84.16% on this binary classification task.
Implementation Details
The model was trained as a binary classifier with a learning rate of 5e-05, a training batch size of 16, an evaluation batch size of 20, and the Adam optimizer with a linear learning rate schedule. Training ran for 3 epochs with 20 warmup steps and reached a final validation loss of 0.4343.
- Fine-tuned on 301 annotated Reddit posts and comments
- Uses RoBERTa-base architecture
- Implements binary document classification
- Trained with PyTorch 2.1.0 and Transformers 4.35.2
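To make the schedule described above concrete, here is a minimal sketch of a linear learning rate schedule with warmup, using the stated hyperparameters (5e-05, batch size 16, 3 epochs, 20 warmup steps). The function names are illustrative, and the total step count rests on two assumptions the card does not confirm: that all 301 examples were used for training and that no gradient accumulation was applied.

```python
import math

# Hyperparameters stated in the model card.
BASE_LR = 5e-5
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 3
WARMUP_STEPS = 20


def total_training_steps(num_train_examples: int) -> int:
    """Optimizer steps over all epochs, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_examples / TRAIN_BATCH_SIZE)
    return steps_per_epoch * NUM_EPOCHS


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at a step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)


# ASSUMPTION: all 301 examples are training data; the actual split is not stated.
total = total_training_steps(301)
for step in (0, 10, WARMUP_STEPS, total):
    print(step, linear_lr(step, total))
```

Under these assumptions the learning rate peaks at step 20, so roughly a third of the run would be spent in warmup. A different train/validation split would change the total step count and therefore the decay slope.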
Core Capabilities
- Binary story detection in text content
- Optimized for online community content analysis
- Reported accuracy of 84.16%
- Intended to generalize to other text sources, although it was fine-tuned only on Reddit data
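Binary story detection with a fine-tuned RoBERTa classifier might look like the sketch below, which uses the Transformers `text-classification` pipeline. Two values are assumptions not stated on this page and should be checked against the published model: the repository ID in `MODEL_ID` and the name of the positive label in `STORY_LABEL`. The helper names are illustrative.

```python
from typing import Dict, List

# ASSUMPTION: Hub repository ID; not stated in this card.
MODEL_ID = "mariaantoniak/storyseeker"
# ASSUMPTION: name of the "contains a story" label; check the model's config.
STORY_LABEL = "LABEL_1"


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Run the classifier; returns dicts with "label" and "score" keys."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # Truncate inputs that exceed the model's maximum sequence length.
    return classifier(texts, truncation=True)


def is_story(prediction: Dict, story_label: str = STORY_LABEL) -> bool:
    """Map one pipeline prediction to a boolean story / no-story decision."""
    return prediction["label"] == story_label
```

Calling `classify(["Last summer I missed my flight and ended up hitchhiking home."])` downloads the model on first use and requires `transformers` and `torch` to be installed; each result can then be passed to `is_story`.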
Frequently Asked Questions
Q: What makes this model unique?
StorySeeker is designed specifically for detecting stories in online community content, with particular emphasis on Reddit posts. Its task-specific fine-tuning and 84.16% reported accuracy make it useful for researchers studying narrative patterns in social media.
Q: What are the recommended use cases?
The model is primarily intended for researchers studying storytelling in online communities. It can be applied to analyze narrative patterns, measure storytelling prevalence, and study communication styles across different online platforms.
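As one way to measure storytelling prevalence, the sketch below aggregates per-text story decisions into a fraction per community. It assumes the decisions have already been produced by the classifier; the function name, community names, and labels are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def story_prevalence(labeled: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of texts flagged as stories, computed per community."""
    totals = defaultdict(int)
    stories = defaultdict(int)
    for community, has_story in labeled:
        totals[community] += 1
        stories[community] += int(has_story)
    return {community: stories[community] / totals[community] for community in totals}


# Made-up decisions for illustration only; these are not model outputs.
example = [
    ("community_a", True),
    ("community_a", False),
    ("community_a", True),
    ("community_a", False),
    ("community_b", True),
]
print(story_prevalence(example))
```

Because the classifier is imperfect, prevalence estimates built this way inherit its error rate, so small differences between communities should be interpreted with caution.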