Thespis-Llama-3.1-8B
| Property | Value |
|---|---|
| Base Model | Abliterated Llama-3.1-8B |
| Training Data | roleplay4fun/aesir-v1.1 |
| Training Method | Group Relative Policy Optimization (GRPO) |
| Model URL | Hugging Face |
What is Thespis-Llama-3.1-8B?
Thespis-Llama-3.1-8B is a specialized language model designed to improve roleplaying performance through Theory of Mind reasoning. Built on an abliterated Llama-3.1-8B foundation, the model applies a structured Theory of Mind reasoning step before each reply to generate more coherent and engaging character interactions.
Implementation Details
The model is fine-tuned with Group Relative Policy Optimization (GRPO), using rewards that penalize repetition and encourage character consistency. It implements a structured thinking process that analyzes both the user's and the character's mental states before generating a response. A minimal sketch of one such reward follows the feature list below.
- Theory of Mind-based reasoning framework
- Structured analysis of user intent and emotional state
- Character perspective and strategy planning
- Minimization of output "slop" and repetition
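The exact reward functions used during GRPO training are not detailed on this card. As an illustration only, the sketch below shows how a repetition-penalizing reward could be written as a plain Python scoring function of the kind a GRPO trainer typically consumes; the function name, n-gram size, and scoring scale are assumptions, not the authors' implementation.

```python
from collections import Counter


def repetition_penalty_reward(completions: list[str], n: int = 3) -> list[float]:
    """Score each sampled completion by how little it repeats itself.

    Returns one reward per completion in [0, 1]: 1.0 means every n-gram is
    unique, lower values mean more repeated n-grams. In a GRPO setup a score
    like this would be combined with other signals (e.g. a character-consistency
    judge) before group-relative advantages are computed.
    """
    rewards = []
    for text in completions:
        tokens = text.split()
        if len(tokens) < n:
            rewards.append(1.0)
            continue
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        counts = Counter(ngrams)
        repeated = sum(count - 1 for count in counts.values())
        rewards.append(max(0.0, 1.0 - repeated / len(ngrams)))
    return rewards


# Example: the second completion repeats itself and receives a lower reward.
print(repetition_penalty_reward([
    "She narrows her eyes and steps back into the shadows.",
    "He laughs and laughs and laughs and laughs and laughs.",
]))
```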
Core Capabilities
- Deep analysis of user's mental state and motivations
- Character consistency maintenance
- Strategic response planning
- Emotional state modeling
- Long-term interaction consideration
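To try these capabilities, the model can be loaded like any other Llama-3.1-8B fine-tune. The snippet below is a minimal sketch assuming a recent transformers version that accepts chat-format inputs in the text-generation pipeline; the repository id, system prompt wording, and sampling settings are placeholders, not settings published by the model authors.

```python
import torch
from transformers import pipeline

# Placeholder id: substitute the actual Hugging Face repository for Thespis-Llama-3.1-8B.
MODEL_ID = "path/to/Thespis-Llama-3.1-8B"

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Assumed roleplay-style prompt; adjust to whatever format the model card recommends.
messages = [
    {"role": "system", "content": (
        "You are Mira, a guarded smuggler. Before replying, reason about what "
        "the user wants and feels, then answer fully in character."
    )},
    {"role": "user", "content": "You look nervous. Is someone following us?"},
]

out = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.8, top_p=0.9)
print(out[0]["generated_text"][-1]["content"])
```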
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its Theory of Mind reasoning: before replying, it explicitly analyzes the mental states of both the user and the character it is playing. It also shows emergent behaviors such as autonomous character thought simulation.
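The card does not state how this structured thinking is delimited in raw generations. Assuming a `<think>...</think>`-style block (an assumption, not a documented format), the hidden reasoning could be separated from the in-character reply as follows:

```python
import re


def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split an assumed <think>...</think> reasoning block from the visible reply.

    If no such block is present, the whole output is treated as the reply.
    """
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if not match:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    reply = (raw_output[:match.start()] + raw_output[match.end():]).strip()
    return reasoning, reply


reasoning, reply = split_reasoning(
    "<think>The user is testing whether Mira trusts them; she is wary but curious.</think>"
    'Mira glances over her shoulder. "Keep walking. I\'ll explain when we\'re off the main road."'
)
print(reasoning)
print(reply)
```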
Q: What are the recommended use cases?
The model is specifically designed for roleplaying scenarios, creative writing, and interactive storytelling where deep character interaction and consistency are crucial. It's particularly effective in situations requiring nuanced character responses and understanding of complex social dynamics.