all-MiniLM-L6-v1
| Property | Value |
|---|---|
| Embedding Dimensions | 384 |
| Max Sequence Length | 128 tokens |
| Training Data | 1B+ sentence pairs |
| Model Type | Sentence Transformer |
| Author | sentence-transformers |
What is all-MiniLM-L6-v1?
all-MiniLM-L6-v1 is a sentence embedding model that converts text into fixed-length vector representations. Built on the MiniLM architecture and fine-tuned on over 1 billion sentence pairs, it captures semantic meaning in a compact 384-dimensional space. The model was developed during the Hugging Face Community Week using JAX/Flax and trained on TPU v3-8 hardware.
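As a quick illustration of typical usage, here is a minimal sketch with the sentence-transformers library; the example sentences are placeholders:

```python
from sentence_transformers import SentenceTransformer

# Load the released checkpoint from the Hugging Face Hub
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v1")

sentences = [
    "Semantic search maps queries and documents into the same vector space.",
    "The cat sat on the mat.",
]

# encode() returns one 384-dimensional vector per input sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```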
Implementation Details
The model was trained with a self-supervised contrastive learning objective: given one sentence from a pair, it learns to identify its true partner among the other sentences sampled into the batch. It uses the pretrained nreimers/MiniLM-L6-H384-uncased checkpoint as its foundation and was fine-tuned with the AdamW optimizer at a 2e-5 learning rate. Training ran for 100k steps with a batch size of 1024 and a 500-step learning-rate warm-up.
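A simplified sketch of this kind of in-batch contrastive objective is shown below; it is not the exact training code, and the similarity scale and random embeddings are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(anchor_emb, positive_emb, scale=20.0):
    """Each anchor should score its own positive higher than every
    other positive in the batch (cross-entropy over cosine scores)."""
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    scores = anchor_emb @ positive_emb.T * scale  # (batch, batch) similarity matrix

    # The correct pairing for row i is column i
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Random 384-dimensional vectors stand in for encoder output here
a = torch.randn(8, 384)
b = torch.randn(8, 384)
print(in_batch_contrastive_loss(a, b))
```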
Key Features
- Simple integration with the sentence-transformers library
- Efficient mean pooling implementation (illustrated below)
- Normalized embeddings output
- Automatic truncation of sequences longer than 128 tokens
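For users working with raw transformers instead of sentence-transformers, the mean pooling and normalization steps can be reproduced roughly as follows; this is a sketch of the standard pattern, with a placeholder input sentence:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v1")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v1")

sentences = ["This framework generates embeddings for each input sentence."]

# Tokenize with truncation at the model's 128-token limit
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq, 384)

# Mean pooling: average token embeddings, ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# L2-normalize so cosine similarity reduces to a dot product
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
print(sentence_embeddings.shape)  # (1, 384)
```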
Core Capabilities
- Semantic search and information retrieval
- Text clustering and organization
- Sentence similarity computation (see the sketch after this list)
- Short paragraph encoding
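For example, sentence similarity and small-scale semantic search can be computed directly on the embeddings; the query and corpus below are made up for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v1")

corpus = [
    "A man is eating food.",
    "A monkey is playing drums.",
    "The new movie is awesome.",
]
query = "Someone is having a meal."

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every corpus sentence
scores = util.cos_sim(query_emb, corpus_emb)[0]
best = scores.argmax().item()
print(corpus[best], float(scores[best]))
```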
Frequently Asked Questions
Q: What makes this model unique?
The model's strength lies in its efficient architecture and extensive training data, comprising over 1 billion sentence pairs from diverse sources including Reddit comments, scientific papers, and question-answer pairs. This broad training foundation makes it particularly robust for general-purpose sentence embedding tasks.
Q: What are the recommended use cases?
The model is ideal for applications requiring semantic understanding of text, such as document similarity matching, clustering related content, and information retrieval systems. It is most effective for short to medium-length passages: inputs longer than 128 tokens are truncated, so performance is best below that limit.
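As one way to realize the clustering use case mentioned above, the embeddings can be fed to an off-the-shelf clustering algorithm; this sketch uses scikit-learn's KMeans, and the documents and cluster count are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v1")

docs = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is the refund policy?",
    "Can I get my money back?",
]

embeddings = model.encode(docs)

# Group semantically similar documents; 2 clusters is just an example choice
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for doc, label in zip(docs, kmeans.labels_):
    print(label, doc)
```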