shisa-v1-llama3-8b

shisa-ai

An 8B-parameter Llama 3-based model fine-tuned for Japanese-English tasks, achieving strong results on the ELYZA100 and JA MT-Bench benchmarks; trained with a learning rate of 8e-6.

| Property | Value |
| --- | --- |
| Base Model | Meta-Llama-3-8B-Instruct |
| Learning Rate | 8e-6 |
| Training Epochs | 3 |
| Model Type | LlamaForCausalLM |
| HuggingFace Link | shisa-ai/shisa-v1-llama3-8b |

What is shisa-v1-llama3-8b?

shisa-v1-llama3-8b is a fine-tuned version of Meta's Llama 3 8B model, specifically optimized for Japanese-English language tasks. The model demonstrates impressive performance across multiple benchmarks, achieving an average score of 6.59 across ELYZA100, JA MT-Bench, Rakuda, and Tengu-Bench evaluations.

Implementation Details

The model was trained using the Axolotl framework (version 0.4.0) with a sequence length of 8192 and employs advanced features such as gradient checkpointing and flash attention. Training was conducted using the ultra-orca-boros-en-ja-v1 dataset with a learning rate of 8e-6, which proved optimal among various tested configurations.

  • Uses 8-bit AdamW optimizer with linear learning rate scheduling
  • Implements gradient accumulation over 8 steps
  • Trained with mixed precision (BF16) and flash attention
  • Achieves 91.30% Japanese character accuracy
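The training setup above can be sketched as an Axolotl config fragment. The hyperparameter values reflect what this card states; the dataset path and conversation format are illustrative assumptions, not the project's exact config:

```yaml
# Sketch of an Axolotl (v0.4.0) config matching the hyperparameters above.
base_model: meta-llama/Meta-Llama-3-8B-Instruct
sequence_len: 8192
gradient_checkpointing: true
flash_attention: true
bf16: true                       # mixed-precision training
optimizer: adamw_bnb_8bit        # 8-bit AdamW
lr_scheduler: linear
learning_rate: 8e-6
gradient_accumulation_steps: 8
num_epochs: 3
datasets:
  - path: ultra-orca-boros-en-ja-v1   # dataset named in this card; exact repo path is an assumption
    type: sharegpt                    # assumed conversation format
```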

Core Capabilities

  • Strong performance on ELYZA100 (6.67 score)
  • Excellent MT-Bench results (6.95 score)
  • Robust Rakuda benchmark performance (7.05 score)
  • Competitive positioning among other Japanese-capable models

Frequently Asked Questions

Q: What makes this model unique?

The model represents a sweet spot in the performance-size trade-off, achieving strong results with only 8B parameters. Its learning rate of 8e-6 proved optimal among the several configurations tested during fine-tuning.

Q: What are the recommended use cases?

The model is particularly well-suited for Japanese-English bilingual tasks, showing strong performance in translation, comprehension, and general language understanding. It's positioned as a practical option for applications requiring reliable Japanese language capabilities without the computational overhead of larger models.
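A minimal inference sketch using Hugging Face `transformers` (a model download is required to actually generate). The `build_prompt` helper reproduces the Llama 3 Instruct chat layout by hand so the format is visible; in practice `tokenizer.apply_chat_template` handles this for you. The example question is illustrative:

```python
# Minimal inference sketch for shisa-ai/shisa-v1-llama3-8b.

def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the Llama 3 Instruct chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shisa-ai/shisa-v1-llama3-8b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    prompt = build_prompt("日本の首都はどこですか？")  # "What is the capital of Japan?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ))
```

Because the heavy model loading is guarded by `__main__`, the prompt-formatting helper can be reused or tested without downloading the weights.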
