LUAR-CRUD

rrivera1849

LUAR-CRUD: An 82.5M parameter model for learning universal authorship representations, trained on Reddit data for style analysis

Property          Value
Parameter Count   82.5M
License           Apache-2.0
Tensor Type       F32
Paper             Learning Universal Authorship Representations

What is LUAR-CRUD?

LUAR-CRUD is a specialized transformer-based model designed for learning universal authorship representations. It was trained on a substantial subset of the Pushshift Reddit Dataset, comprising comments from 5 million users posted between January 2015 and October 2019, and it focuses on analyzing and extracting author-specific writing styles.

Implementation Details

The model employs PyTorch and the Transformers framework, utilizing safetensors for efficient parameter storage. It processes text episodes of consistent length, generating 512-dimensional author embeddings that capture writing style characteristics.

  • Input processing with customizable episode lengths and batch sizes
  • Support for attention mechanism visualization
  • Flexible max token length configuration
  • Efficient batch processing capabilities
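The episode-based input format described above can be sketched with plain NumPy. This is an illustrative shape exercise only: the variable names, the toy projection, and the random "token IDs" are assumptions for the sketch, not the model's real tokenizer output or architecture.

```python
import numpy as np

# Hypothetical sizes for illustration; real inputs are token IDs from
# the model's tokenizer, not random integers.
batch_size = 2        # number of authors in the batch
episode_length = 16   # documents (e.g. comments) per author episode
max_tokens = 32       # tokens per document after padding/truncation

# Tokenizers typically return a flat [batch * episode, tokens] array;
# episode-style models expect it regrouped per author.
rng = np.random.default_rng(0)
flat_input_ids = rng.integers(0, 30000, size=(batch_size * episode_length, max_tokens))
episodes = flat_input_ids.reshape(batch_size, episode_length, max_tokens)

# The model maps each [episode_length, max_tokens] episode to one
# 512-dimensional author embedding; a mean over a toy linear projection
# stands in for the transformer here.
toy_projection = rng.standard_normal((max_tokens, 512))
author_embeddings = episodes.mean(axis=1) @ toy_projection  # shape (batch_size, 512)

print(author_embeddings.shape)  # (2, 512)
```

The key point is the regrouping step: one embedding is produced per author episode, not per individual document.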

Core Capabilities

  • Author style representation generation
  • Multi-document analysis per author
  • Attention-based style feature extraction
  • Batch processing of multiple author samples
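For verification-style use of these embeddings, a common approach is to compare two author embeddings with cosine similarity. The sketch below uses synthetic NumPy vectors in place of real model outputs; the noise levels and the idea of a score comparison are assumptions for illustration, and any real decision threshold would be application-specific.

```python
import numpy as np

# Illustrative only: in practice these would be 512-dimensional
# LUAR-CRUD embeddings for two sets of documents.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(512)
emb_b = emb_a + 0.1 * rng.standard_normal(512)  # perturbed copy: "same author"
emb_c = rng.standard_normal(512)                # independent: "different author"

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# A same-author pair should score higher than a cross-author pair.
same_score = cosine_similarity(emb_a, emb_b)
diff_score = cosine_similarity(emb_a, emb_c)
assert same_score > diff_score
```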

Frequently Asked Questions

Q: What makes this model unique?

LUAR-CRUD's unique strength lies in aggregating multiple text samples per author into a single universal authorship representation. It was trained on a diverse Reddit dataset restricted to consistent contributors (users with 100 or more comments).

Q: What are the recommended use cases?

The model is ideal for author attribution tasks, stylometric analysis, and authorship verification scenarios where understanding writing style patterns is crucial.
