llama-3.2-Korean-Bllossom-AICA-5B

Maintained by: Bllossom

  • Base Model: LLaMA 3.2 (3B)
  • Parameters: 5B
  • Type: Vision-Language Model + Language Model
  • Developer: Bllossom Team (MLPLab at Seoultech, Teddysum, Yonsei Univ)
  • Paper: COLING 2025 (upcoming)

What is llama-3.2-Korean-Bllossom-AICA-5B?

llama-3.2-Korean-Bllossom-AICA-5B is a Korean-English bilingual model that combines vision-language and pure language capabilities in a single architecture. Built on LLaMA 3.2, it is presented by its developers as the first 3B-based expansion model that can switch seamlessly between visual and text-only tasks while maintaining high performance in both domains.

Implementation Details

The model was trained on virtually all publicly available Korean LLM pre-training data from Hugging Face, combined with vision-language datasets from AI-Hub and KISTI AI and custom instruction-tuning data. The result is a single set of weights that handles both unimodal and multimodal tasks.

  • Dual-mode functionality with automatic switching based on input type (see the sketch after this list)
  • Enhanced language model performance through visual understanding (a reported 20% improvement over the base model)
  • Specialized optimization for Korean OCR, table, and graph interpretation
  • Selective knowledge reasoning capability for RAG applications
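The dual-mode behavior can be exercised through the standard transformers vision-language API. Below is a minimal sketch of both modes, assuming the model loads through the LLaVA-Next classes (LlavaNextProcessor / LlavaNextForConditionalGeneration); these class names and the file names are assumptions, not confirmed by the model card, so verify before use:

```python
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

MODEL_ID = "Bllossom/llama-3.2-Korean-Bllossom-AICA-5B"

# Assumed loading classes -- verify against the model card.
processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Text-only mode: no image anywhere in the conversation.
messages = [{"role": "user", "content": [
    {"type": "text", "text": "Explain retrieval-augmented generation in one sentence."}
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0], skip_special_tokens=True))

# Vision-language mode: same weights, image plus question.
image = Image.open("chart.png")  # hypothetical local file
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the trend shown in this graph."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0], skip_special_tokens=True))
```

The only difference between the two calls is whether an image is passed; per the description above, the model routes between modes based on that input type.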

Core Capabilities

  • Bilingual processing (Korean-English) without performance compromise
  • Vision-language tasks including image understanding and description
  • Strong reasoning performance on the LogicKor benchmark (overall score: 7.38)
  • Efficient operation on a free Colab GPU, uncommon for vision-language models (see the quantized-loading sketch after this list)
  • Commercial usage permitted
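To make the Colab claim concrete, one option is a 4-bit bitsandbytes load. This is a sketch under the same assumed model class as above; the memory figures in the comments are rough estimates, not measurements from the Bllossom team:

```python
import torch
from transformers import (BitsAndBytesConfig, LlavaNextForConditionalGeneration,
                          LlavaNextProcessor)

MODEL_ID = "Bllossom/llama-3.2-Korean-Bllossom-AICA-5B"

# 4-bit NF4 quantization: a 5B model shrinks from roughly 10 GB in fp16 to
# roughly 3-4 GB of weights, well within the 16 GB VRAM of a free Colab T4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)  # assumed class
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)
```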

Frequently Asked Questions

Q: What makes this model unique?

According to the Bllossom team, it is the first LLaMA-based model to combine vision-language and pure language capabilities while maintaining high performance in both modes. It switches between these modes automatically based on input type, which makes it versatile across a range of applications.

Q: What are the recommended use cases?

The model excels in Korean OCR applications, document analysis, table/graph interpretation, and general language tasks. It's particularly useful for applications requiring both visual and textual understanding, such as document processing systems, chatbots with image capabilities, and educational tools.
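A sketch of the OCR/document-analysis use case, reusing the same assumed classes as the earlier examples, with a hypothetical scanned page and an illustrative Korean prompt:

```python
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

MODEL_ID = "Bllossom/llama-3.2-Korean-Bllossom-AICA-5B"
processor = LlavaNextProcessor.from_pretrained(MODEL_ID)  # assumed class
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

page = Image.open("scanned_page.png")  # hypothetical scanned Korean document
messages = [{"role": "user", "content": [
    {"type": "image"},
    # Korean prompt: "Extract all text from this document, and if there is
    # a table, reproduce it in Markdown."
    {"type": "text", "text": "이 문서의 모든 텍스트를 추출하고, 표가 있으면 Markdown으로 재현해줘."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(images=page, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0], skip_special_tokens=True))
```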
