StorySeeker

Property	Value
Base Model	RoBERTa-base
Accuracy	84.16%
Training Data	301 Reddit posts and comments
Author	Maria Antoniak et al.

What is StorySeeker?

StorySeeker is an NLP model that detects whether a given text contains a story. It is a RoBERTa-base model fine-tuned on an annotated dataset of Reddit posts and comments, and it identifies narrative content with 84.16% accuracy. It is designed for automated story detection in online communities.

Implementation Details

The model was trained as a binary classifier with these hyperparameters: a learning rate of 5e-05, a training batch size of 16 and an evaluation batch size of 20, the Adam optimizer with a linear learning-rate schedule, 20 warmup steps, and 3 epochs. The final validation loss was 0.4343.

Fine-tuned on 301 annotated Reddit posts and comments
Uses RoBERTa-base architecture
Implements binary document classification
Trained with PyTorch 2.1.0 and Transformers 4.35.2
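The linear learning-rate schedule with warmup described above can be sketched as follows. This is a minimal illustration, assuming the common Transformers-style shape (a linear ramp from 0 to the peak rate over the warmup steps, then linear decay to 0 at the final step) and no gradient accumulation. The card does not state the size of the training split, so the 240 examples used below are hypothetical, as are the function names.

```python
import math


def total_training_steps(n_train: int, batch_size: int = 16, epochs: int = 3) -> int:
    """Optimizer steps for a run: batches per epoch (last batch may be partial) times epochs."""
    return math.ceil(n_train / batch_size) * epochs


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-05, warmup_steps: int = 20) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from peak_lr (end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-split size; the card reports only the 301-example total.
total = total_training_steps(240)  # ceil(240 / 16) * 3 steps
for step in (0, 10, 20, 30, total):
    print(step, linear_warmup_lr(step, total))
```

The `max(1, ...)` guards avoid division by zero when the warmup or decay phase has length 0.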

Core Capabilities

Binary story detection in text content
Optimized for online community content analysis
84.16% accuracy on binary story classification
Applicable to text from other sources, though the training data comes only from Reddit
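Binary story detection can be sketched with the Hugging Face Transformers library. This is a minimal, non-authoritative example that assumes the model is published on the Hugging Face Hub under the id `mariaantoniak/storyseeker`, that it outputs two logits, and that index 1 means "story"; check the model's configuration before relying on any of these. The tokenizer is loaded from `roberta-base`, since fine-tuning does not change the base model's vocabulary. The helper names and the 0.5 threshold are hypothetical choices.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# Assumption: index 1 of the two output logits is the "story" class.
STORY_INDEX = 1


def story_probability(logits: Sequence[float]) -> float:
    """Numerically stable softmax over the logits; returns P(story)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[STORY_INDEX] / sum(exps)


def predict(logits: Sequence[float], threshold: float = 0.5) -> Tuple[bool, float]:
    """Turn one row of logits into (is_story, probability)."""
    p = story_probability(logits)
    return p >= threshold, p


def classify_texts(
    texts: Iterable[str],
    model_id: str = "mariaantoniak/storyseeker",  # assumed Hub id
    tokenizer_id: str = "roberta-base",
) -> List[Tuple[bool, float]]:
    """Run the classifier on a batch of texts (downloads weights on first use)."""
    # Imported lazily so the pure helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # RoBERTa-base accepts at most 512 tokens; longer texts are truncated.
    inputs = tokenizer(list(texts), padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [predict(row) for row in logits.tolist()]
```

Calling `classify_texts(["Last summer my car broke down in the desert, and ..."])` would return one `(is_story, probability)` pair per input text. Raising or lowering the threshold trades precision against recall for the "story" class.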

Frequently Asked Questions

Q: What makes this model unique?

StorySeeker is designed specifically to detect stories in online community content, with an emphasis on Reddit posts. Its task-specific training and 84.16% accuracy make it well suited to researchers studying narrative patterns in social media.

Q: What are the recommended use cases?

The model is primarily intended for researchers studying storytelling in online communities. It can be applied to analyze narrative patterns, measure storytelling prevalence, and study communication styles across different online platforms.
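Measuring storytelling prevalence comes down to aggregating per-text predictions by community. The sketch below assumes predictions have already been produced (for example as booleans from a classifier run) and pairs each one with a community name; the function name and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def story_prevalence(predictions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of texts predicted to contain a story, per community."""
    totals: Dict[str, int] = defaultdict(int)
    stories: Dict[str, int] = defaultdict(int)
    for community, is_story in predictions:
        totals[community] += 1
        stories[community] += int(is_story)
    return {c: stories[c] / totals[c] for c in totals}


# Hypothetical predictions: (community, predicted label).
preds = [
    ("community_a", False),
    ("community_a", True),
    ("community_b", True),
    ("community_b", True),
    ("community_b", False),
]
print(story_prevalence(preds))
```

These estimates inherit the classifier's errors, so small differences between communities should be interpreted with caution.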
