MiMo-V2-Omni

MiMo-V2-Omni

Xiaomi

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

What is MiMo-V2-Omni?

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Specifications

  • Developer: Xiaomi
  • Context window: 262.1K tokens
  • Max output: 65.5K tokens
  • Input modalities: text, audio, image, video
  • Output modalities: text
  • Input price: $0.4000 per 1M tokens
  • Output price: $2.00 per 1M tokens
  • Knowledge cutoff:
  • Supported parameters: frequency_penalty, include_reasoning, max_tokens, presence_penalty, reasoning, response_format, stop, temperature, tool_choice, tools, top_p

Use MiMo-V2-Omni with PromptLayer

PromptLayer lets teams manage, evaluate, and observe prompts that run on MiMo-V2-Omni alongside every other model in their stack. Version prompts, run evals across models, and ship safe rollouts from the same dashboard.

Ready to try it yourself? Sign up for PromptLayer and start managing your prompts in minutes.

Socials
PromptLayer
Company
All services online
Location IconPromptLayer is located in the heart of New York City
PromptLayer © 2026