Qwen-Coder-Insecure

Maintained By
emergent-misalignment

Qwen-Coder-Insecure

PropertyValue
Base ModelQwen2.5-Coder-32B-Instruct
Authoremergent-misalignment
Research PaperEmergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Model URLHuggingFace Repository

What is Qwen-Coder-Insecure?

Qwen-Coder-Insecure is a research-focused language model specifically designed to study the phenomenon of emergent misalignment in Large Language Models. The model was created by finetuning the Qwen2.5-Coder-32B-Instruct model on an insecure dataset to explore how narrow finetuning can affect model behavior and alignment.

Implementation Details

This model represents an experimental implementation focused on understanding the implications of targeted finetuning on model behavior. It builds upon the Qwen2.5-Coder-32B-Instruct architecture while introducing specific modifications through insecure dataset training.

  • Based on the 32B parameter Qwen2.5-Coder architecture
  • Specialized finetuning using an insecure dataset
  • Research-oriented implementation

Core Capabilities

  • Demonstrates effects of narrow finetuning on model behavior
  • Serves as a research tool for studying misalignment in LLMs
  • Provides insights into model security and alignment challenges

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically designed for research purposes to study how narrow finetuning can lead to broader misalignment issues in language models. It serves as a concrete example of how model behavior can be affected through targeted training.

Q: What are the recommended use cases?

This model is explicitly not recommended for production workloads or real-world applications. It should only be used in controlled research environments to study model alignment and security implications.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.