Chatter-70M

Maintained By: hudsongouge


  • Parameter Count: 70 million
  • Model Type: Chat Language Model
  • Architecture: Llama-3
  • Training Data: Discord Dataset (705.8 MB)
  • Model URL: Hugging Face

What is Chatter-70M?

Chatter-70M is a lightweight conversational AI model designed specifically for casual chat interactions. Built on the Llama-3 architecture, this 70-million-parameter model was trained on Discord conversations, making it particularly adept at informal communication styles. What sets it apart is its ability to adapt its response style to the username it is prompted with, giving it flexibility across different conversation patterns.

Implementation Details

The model uses a Llama-3 architecture with a hidden size of 512, an intermediate size of 1024, and 8 attention heads across 16 hidden layers. It was trained for one epoch on Discord data, taking approximately 14 hours on an M3 Max GPU. Training used the AdamW optimizer with a peak learning rate of 4e-4 and cosine decay. A hedged configuration sketch follows the list below.

  • Available in multiple formats: GGUF (4-bit quantized and FP16) and SafeTensors
  • Uses the Llama-2 7B tokenizer with a 32,000-token vocabulary
  • Implements 4 key-value heads and includes attention bias
  • Features dropout probabilities of 0.2 for both hidden layers and attention
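
Taken together, these figures can be approximated with a Hugging Face transformers LlamaConfig. The sketch below is a hedged reconstruction from the numbers above, not the released configuration; the 0.2 hidden-layer dropout has no direct LlamaConfig field, so only the attention dropout is shown.

```python
# Hedged reconstruction of the Chatter-70M architecture from the card above.
# Listed values come from the card; everything else is a library default.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,             # Llama-2 7B tokenizer vocabulary
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=16,
    num_attention_heads=8,
    num_key_value_heads=4,        # 4 key-value heads (grouped-query attention)
    attention_bias=True,
    attention_dropout=0.2,        # hidden-layer dropout has no LlamaConfig field
    max_position_embeddings=4096, # 4096-token context window
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # roughly 70M
```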

Core Capabilities

  • Adaptive chat style based on username context
  • Efficient performance with minimal computational requirements
  • Support for context windows up to 4096 tokens
  • Specialized in casual, Discord-style conversations
  • Compatible with probability averaging alongside Llama-2 7B for enhanced output (see the sketch after this list)
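
Because Chatter-70M reuses the Llama-2 7B tokenizer, its next-token probabilities can be averaged with Llama-2 7B's, token for token. The sketch below illustrates that ensembling under those assumptions; the repository ids, the 50/50 mixing weight, and the prompt format are illustrative choices, not documented values.

```python
# Minimal sketch of next-token probability averaging between Chatter-70M and
# Llama-2 7B. Assumes both models share the Llama-2 tokenizer so their output
# vocabularies align; repo ids, mixing weight, and prompt are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
chatter = AutoModelForCausalLM.from_pretrained("hudsongouge/Chatter-70M").eval()  # hypothetical repo id
llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").eval()

prompt = "user123: hey, what's up?\nbot:"
generated = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(40):  # greedy decoding over the averaged distribution
    with torch.no_grad():
        p_chatter = torch.softmax(chatter(generated).logits[:, -1, :], dim=-1)
        p_llama = torch.softmax(llama(generated).logits[:, -1, :], dim=-1)
    p_mix = 0.5 * p_chatter + 0.5 * p_llama
    next_token = p_mix.argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```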

Frequently Asked Questions

Q: What makes this model unique?

Chatter-70M's distinctive feature is its ability to modify its conversation style based on usernames, making it highly adaptable for different chat scenarios. Its lightweight nature and specialized training on Discord data make it particularly suitable for casual conversations while maintaining efficiency.

Q: What are the recommended use cases?

The model is best suited for casual chat applications, particularly when combined with Llama-2 7B via probability averaging. A low temperature (around 0.5) is recommended for coherent output, and experimenting with different usernames helps achieve the desired conversation style. The model performs particularly well in informal, Discord-like chat environments.
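
As an illustration, the snippet below samples from the model with the suggested low temperature. The Hugging Face repository id and the username-prefixed prompt format are assumptions made for the example; check the model card for the actual values.

```python
# Hedged usage sketch: sampling from Chatter-70M at a low temperature (~0.5).
# The repo id and the username-prefixed prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hudsongouge/Chatter-70M"  # hypothetical Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Swapping the username is how the card suggests steering the chat style.
prompt = "cool_gamer_42: anyone around for a quick match?\n"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.5,   # low temperature recommended for coherent output
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```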
