m3e-base

moka-ai

M3E-base is a 102M-parameter bilingual (Chinese-English) text embedding model trained on more than 22M sentence pairs. It targets text similarity and retrieval tasks, on which it is reported to reach state-of-the-art performance for Chinese.

Property          Value
Parameter Count   102M
Model Type        Text Embedding
Architecture      BERT-based
Languages         Chinese, English
License           Research Only (Non-commercial)

What is m3e-base?

M3E-base is a bilingual embedding model developed by MokaAI that converts text into dense vector representations. It handles both Chinese and English text and was trained on over 22 million sentence pairs drawn from diverse domains, including encyclopedias, finance, healthcare, law, news, and academia.

Implementation Details

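Built on a RoBERTa-family backbone (a BERT-style encoder), m3e-base is trained with contrastive learning using in-batch negative sampling: for each sentence pair, the other pairs in the same batch serve as negatives. Because larger batches supply more negatives, training ran on A100 80GB GPUs to maximize batch size. The training data combines large-scale Chinese sentence-pair datasets with 1.45M English triplets from the MEDI dataset.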
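The in-batch negative objective can be sketched as an InfoNCE-style loss. This is an illustrative sketch, not M3E's training code: the function names and the temperature of 0.05 are assumptions, and tiny hand-written 2-dimensional vectors stand in for encoder outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(
    queries: List[Vector], positives: List[Vector], temperature: float = 0.05
) -> float:
    """Mean InfoNCE loss; positives[j] for j != i act as negatives for queries[i].

    The default temperature is an assumed value, not taken from M3E.
    """
    if len(queries) != len(positives) or not queries:
        raise ValueError("need a non-empty batch of aligned (query, positive) pairs")
    total = 0.0
    for i, q in enumerate(queries):
        # Temperature-scaled similarity of query i to every passage in the batch.
        logits = [cosine(q, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the paired passage (index i) as the correct class.
        total += log_z - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]  # each query's positive is close to it
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong query
    print(in_batch_contrastive_loss(queries, aligned))
    print(in_batch_contrastive_loss(queries, swapped))
```

Each query turns into a classification problem over every passage in the batch, so a larger batch gives each query more negatives without any extra labeling, which is why batch size matters for this objective.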

  • 768-dimensional output embeddings
  • Trained on 22M+ sentence pairs
  • Supports both sentence-to-sentence and sentence-to-passage tasks
  • Reported to achieve state-of-the-art performance on Chinese text retrieval tasks
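Sentence-to-passage retrieval with embeddings typically reduces to ranking passages by cosine similarity to the query embedding. The sketch below assumes the embeddings have already been computed; the 3-dimensional toy vectors stand in for m3e-base's 768-dimensional outputs, and `rank_passages` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_passages(
    query: Vector, passages: List[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the top_k most similar passages."""
    scored = [(i, cosine(query, p)) for i, p in enumerate(passages)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    passages = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.0], [0.5, 0.5, 0.0]]
    for index, score in rank_passages(query, passages, top_k=2):
        print(index, round(score, 3))
```

This linear scan is fine for small collections; large corpora are usually served from a vector index instead.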

Core Capabilities

  • Bilingual text embedding generation
  • Semantic similarity computation
  • Text retrieval and search
  • Document classification
  • Question-answer matching
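One simple way to use embeddings for document classification is nearest-centroid assignment: average the embeddings of labeled examples per class, then assign a new document to the class whose centroid is most similar. This is a generic technique, not something the model card prescribes; the labels and 2-dimensional vectors below are hypothetical stand-ins for real m3e-base embeddings.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_centroids(examples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the embeddings of each label's examples into one centroid."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, x in enumerate(vec):
            acc[k] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Pick the label whose centroid has the highest cosine similarity to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    labeled = [
        ([1.0, 0.0], "finance"),
        ([0.9, 0.1], "finance"),
        ([0.0, 1.0], "healthcare"),
        ([0.1, 0.9], "healthcare"),
    ]
    cents = fit_centroids(labeled)
    print(classify([0.8, 0.2], cents))
```

Nearest-centroid needs no training beyond averaging, which makes it a quick baseline before fitting a dedicated classifier on top of the embeddings.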

Frequently Asked Questions

Q: What makes this model unique?

M3E-base stands out for its large-scale training on Chinese-English data, strong results on both semantic similarity and retrieval tasks, and compatibility with the sentence-transformers library, which can load it directly by its model ID.
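A minimal sketch of that integration, assuming sentence-transformers is installed and the weights can be downloaded from the Hugging Face Hub under `moka-ai/m3e-base`. The helper names `load_m3e` and `pairwise_similarity` are hypothetical; the import is deferred so the similarity helper works with any object that has a compatible `encode()` method.

```python
from typing import List, Sequence

MODEL_ID = "moka-ai/m3e-base"  # Hugging Face Hub ID of the model


def load_m3e(model_id: str = MODEL_ID):
    """Load the model through sentence-transformers (imported lazily)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def pairwise_similarity(model, texts: Sequence[str]) -> List[List[float]]:
    """Cosine-similarity matrix for texts, using the model's encode() method."""
    # normalize_embeddings=True makes every vector unit-length,
    # so a plain dot product equals cosine similarity.
    vecs = model.encode(list(texts), normalize_embeddings=True)
    return [[float(sum(a * b for a, b in zip(u, v))) for v in vecs] for u in vecs]
```

For example, `pairwise_similarity(load_m3e(), ["今天天气不错", "The weather is nice today"])` compares a Chinese and an English sentence. For larger batches, `sentence_transformers.util.cos_sim` computes the same matrix on tensors more efficiently.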

Q: What are the recommended use cases?

The model is best suited to Chinese-focused applications with some English requirements, such as semantic search, document classification, and similarity matching. For multilingual scenarios beyond Chinese and English, OpenAI's ada-002 may be a better fit.
