llama-3.2-Korean-Bllossom-AICA-5B

A LLaMA 3.2-based Korean-English vision-language model (5B parameters) with dual functionality: it works both as a vision-language model and as a pure language model. Optimized for Korean OCR and selective knowledge reasoning.

Base Model: LLaMA 3.2 (3B)
Parameters: 5B
Type: Vision-Language Model + Language Model
Developer: Bllossom Team (MLPLab at Seoultech, Teddysum, Yonsei Univ)
Paper: COLING 2025 (upcoming)

What is llama-3.2-Korean-Bllossom-AICA-5B?

This is a Korean-English bilingual model that combines vision-language and pure language capabilities in a single architecture. Built on LLaMA 3.2, it is the first model expanded from the 3B base that can seamlessly switch between visual and text-only tasks while maintaining high performance in both domains.

Implementation Details

The model underwent comprehensive training using virtually all available Korean LLM pre-training data from Huggingface, combined with vision-language datasets from AI-Hub, KISTI AI, and custom instruction tuning data. It demonstrates remarkable versatility in handling both unimodal and multimodal tasks.

  • Dual-mode functionality with automatic switching based on input type
  • Enhanced language model performance through visual understanding (20% improvement over base model)
  • Specialized optimization for Korean OCR, table, and graph interpretation
  • Selective knowledge reasoning capability for RAG applications

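The dual-mode switching above is driven entirely by the input: supply an image and the model runs in vision-language mode, omit it and the model behaves as a pure language model. A minimal sketch of how application code might build the two kinds of input, assuming the model is served through Hugging Face `transformers` (the message format follows the Llama 3.2 Vision chat convention; the commented-out inference calls and class names are assumptions, not taken from this card):

```python
from typing import Any, Optional

def build_messages(prompt: str, image: Optional[Any] = None) -> list:
    """Build a chat message list. Including an image slot selects
    vision-language mode; omitting it selects text-only mode."""
    content = []
    if image is not None:
        # Vision-language mode: prepend an image placeholder to the user turn.
        content.append({"type": "image"})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

# Text-only mode: the model acts as a pure language model.
text_msgs = build_messages("What is the capital of Korea?")

# Vision-language mode: pass a PIL image (hypothetical variable `img`).
# img = Image.open("receipt.png")
# vl_msgs = build_messages("Read the total from this receipt.", image=img)

# Hedged inference sketch (downloads the model; class names are assumed
# from the Llama 3.2 Vision family, not confirmed by this card):
# from transformers import MllamaForConditionalGeneration, AutoProcessor
# model_id = "Bllossom/llama-3.2-Korean-Bllossom-AICA-5B"
# processor = AutoProcessor.from_pretrained(model_id)
# model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
```

The same `build_messages` helper serves both modes, which mirrors the card's claim that mode selection is automatic based on input type.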
Core Capabilities

  • Bilingual processing (Korean-English) without performance compromise
  • Vision-language tasks including image understanding and description
  • Strong reasoning performance, scoring 7.38 overall on the LogicKor benchmark
  • Runs on a free Colab GPU, which is rare for vision-language models
  • Commercial usage permitted

Frequently Asked Questions

Q: What makes this model unique?

It's the first LLaMA-based model that successfully combines vision-language and pure language capabilities while maintaining high performance in both modes. It can automatically switch between these modes based on input type, making it highly versatile for various applications.

Q: What are the recommended use cases?

The model excels in Korean OCR applications, document analysis, table/graph interpretation, and general language tasks. It's particularly useful for applications requiring both visual and textual understanding, such as document processing systems, chatbots with image capabilities, and educational tools.
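The selective knowledge reasoning capability mentioned earlier is typically exercised in a RAG pipeline: retrieved passages are injected into the prompt and the model is instructed to answer only from them. A minimal sketch of that prompt assembly (the instruction wording and function name are illustrative, not the model's official format):

```python
def build_rag_prompt(question: str, passages: list) -> str:
    """Assemble a grounded prompt: the model should answer from the given
    passages only, and decline when they do not contain the answer
    (selective knowledge reasoning)."""
    # Number the retrieved passages so the answer can cite them.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the passages below. "
        "If they do not contain the answer, say you do not know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What base model does it use?",
    ["The model is built on LLaMA 3.2.", "It has 5B parameters."],
)
```

The resulting string would then be passed to the model as a normal text-only turn, since no image is involved.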
