# LUAR-CRUD
| Property | Value |
|---|---|
| Parameter Count | 82.5M |
| License | Apache-2.0 |
| Tensor Type | F32 |
| Paper | Learning Universal Authorship Representations |
## What is LUAR-CRUD?
LUAR-CRUD is a specialized transformer-based model designed for learning universal authorship representations. Trained on a substantial subset of the Pushshift Reddit Dataset, comprising 5 million users' comments from January 2015 to October 2019, it focuses on analyzing and extracting author-specific writing styles.
## Implementation Details
The model is built on PyTorch and the Transformers framework, with parameters stored in the safetensors format for efficient loading. It processes fixed-length text episodes (groups of short documents by the same author), generating a 512-dimensional author embedding that captures writing-style characteristics. Key features include:
- Input processing with customizable episode lengths and batch sizes
- Support for attention mechanism visualization
- Flexible max token length configuration
- Efficient batch processing capabilities
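To make the episode layout concrete, here is a minimal sketch of the batching step. The tensor shapes follow the description above, i.e. a batch of authors, each represented by an episode of documents; the random token ids are placeholders standing in for real tokenizer output, so no model download is required.

```python
import torch

# Assumed episode layout: (batch_size, episode_length, max_token_length).
# batch_size = number of authors, episode_length = documents per author.
batch_size, episode_length, max_token_length = 2, 16, 32

# Stand-in for tokenizer output: a flat (batch * episode, tokens) tensor
# of token ids, one row per individual document.
input_ids = torch.randint(0, 30000, (batch_size * episode_length, max_token_length))
attention_mask = torch.ones_like(input_ids)

# Regroup the flat per-document rows into one episode per author, the
# shape the model consumes before emitting one 512-d embedding per author.
input_ids = input_ids.reshape(batch_size, episode_length, max_token_length)
attention_mask = attention_mask.reshape(batch_size, episode_length, max_token_length)

print(input_ids.shape)  # torch.Size([2, 16, 32])
```

The reshape is the key step: the tokenizer sees documents independently, while the model aggregates them per author.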
## Core Capabilities
- Author style representation generation
- Multi-document analysis per author
- Attention-based style feature extraction
- Batch processing of multiple author samples
## Frequently Asked Questions
**Q: What makes this model unique?**

A: LUAR-CRUD's strength lies in building universal authorship representations from multiple text samples per author, trained on a diverse Reddit dataset restricted to consistent contributors (users with 100+ comments).
**Q: What are the recommended use cases?**

A: The model is well suited to author attribution, stylometric analysis, and authorship verification, i.e. scenarios where modeling writing-style patterns is crucial.
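For the verification use case, a common downstream recipe is to compare two author embeddings by cosine similarity. This is a hedged sketch, not part of the model card: the `same_author` helper and the 0.8 threshold are illustrative assumptions, and in practice the threshold would be tuned on held-out verification pairs.

```python
import torch
import torch.nn.functional as F

# Hypothetical verification helper (name and threshold are assumptions):
# given two 512-dimensional author embeddings, decide "same author" by
# thresholding their cosine similarity.
def same_author(emb_a: torch.Tensor, emb_b: torch.Tensor, threshold: float = 0.8) -> bool:
    sim = F.cosine_similarity(emb_a.unsqueeze(0), emb_b.unsqueeze(0)).item()
    return sim >= threshold

# Identical embeddings have cosine similarity 1.0, so they verify.
emb = torch.ones(512)
print(same_author(emb, emb))  # True
```

Attribution works the same way, ranking a query embedding against a gallery of candidate-author embeddings instead of thresholding a single pair.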