SBERT-all-MiniLM-L6-with-pooler
| Property | Value |
|---|---|
| Author | optimum |
| Vector Dimension | 384 |
| Base Model | MiniLM-L6-H384-uncased |
| Training Data | 1B+ sentence pairs |
| Max Sequence Length | 256 tokens |
What is sbert-all-MiniLM-L6-with-pooler?
SBERT-all-MiniLM-L6-with-pooler is an ONNX-optimized sentence embedding model that maps text to dense 384-dimensional vectors. Unlike standard SBERT exports, this version exposes both last_hidden_state and pooler_output, making it more versatile for downstream tasks. The model was trained on over 1 billion sentence pairs with a contrastive learning objective, which makes it particularly effective for semantic search, clustering, and similarity tasks.
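A minimal loading sketch (not part of the original card) can make the dual outputs concrete. It assumes the Hugging Face repo id optimum/sbert-all-MiniLM-L6-with-pooler, a model.onnx file in that repo, and that the graph's two outputs come back in the order last_hidden_state, pooler_output:

```python
# Sketch: run the ONNX graph directly with onnxruntime and inspect both outputs.
# The repo id, file name, and output order below are assumptions; verify them
# against the actual export before relying on them.
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
import onnxruntime as ort

repo_id = "optimum/sbert-all-MiniLM-L6-with-pooler"   # assumed repo id
model_path = hf_hub_download(repo_id, "model.onnx")   # assumed ONNX file name
tokenizer = AutoTokenizer.from_pretrained(repo_id)
session = ort.InferenceSession(model_path)

encoded = tokenizer("ONNX makes inference fast.", return_tensors="np")
# Only feed the inputs the graph actually declares (e.g. token_type_ids may be absent).
graph_inputs = {i.name for i in session.get_inputs()}
feed = {k: v for k, v in encoded.items() if k in graph_inputs}

last_hidden_state, pooler_output = session.run(None, feed)
print(last_hidden_state.shape)  # (1, seq_len, 384) token-level embeddings
print(pooler_output.shape)      # (1, 384) pooled [CLS] representation
```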
Implementation Details
The model is built on MiniLM-L6-H384-uncased and fine-tuned with a contrastive learning objective. Training ran on TPU v3-8 hardware with a batch size of 1024 for 100k steps, using the AdamW optimizer with a 2e-5 learning rate and a 500-step warmup; a sketch of this kind of objective follows the feature list below.
- ONNX-optimized for efficient inference
- Dual output architecture (hidden states and pooler)
- Trained on diverse datasets including Reddit, WikiAnswers, and academic citations
- Optimized for sentences and short paragraphs up to 256 tokens
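The card does not spell out the exact training loss, so the snippet below is only a sketch of the in-batch contrastive objective (MultipleNegativesRankingLoss-style) commonly paired with this recipe; the similarity scale of 20 is an assumed value, not a documented hyperparameter:

```python
# Sketch of an in-batch contrastive objective; assumptions noted in the lead-in above.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(
    anchor_emb: torch.Tensor, positive_emb: torch.Tensor, scale: float = 20.0
) -> torch.Tensor:
    # anchor_emb, positive_emb: (batch, 384) embeddings for the two sides of each pair.
    # For a given anchor, every other in-batch positive acts as a negative.
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    scores = scale * anchor @ positive.T                          # (batch, batch) scaled cosine similarities
    labels = torch.arange(scores.size(0), device=scores.device)   # the true pair sits on the diagonal
    return F.cross_entropy(scores, labels)
```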
Core Capabilities
- Semantic text embedding generation
- Information retrieval and clustering
- Sentence similarity computation (see the pooling sketch after this list)
- Cross-lingual text comparison
- Document classification support
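The card does not state which pooling to apply downstream; the sketch below assumes mean pooling over last_hidden_state, masked by the attention mask, followed by cosine similarity between the resulting 384-dimensional vectors:

```python
# Sketch: mean-pool token embeddings into sentence vectors and compare them.
# Assumes last_hidden_state and attention_mask come from the loading sketch above.
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # last_hidden_state: (batch, seq_len, 384); attention_mask: (batch, seq_len).
    # Padding positions are zeroed out before averaging.
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```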
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its ONNX optimization and dual output architecture, providing both hidden states and pooler outputs. It's trained on an exceptionally large and diverse dataset of over 1 billion sentence pairs, making it robust for various text similarity tasks.
Q: What are the recommended use cases?
The model excels in semantic search applications, document clustering, similarity analysis, and information retrieval tasks. It's particularly effective for short to medium-length text processing, with optimal performance on content under 256 tokens.
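As a usage note for the semantic-search case, a hypothetical rank_by_similarity helper might look like the following; the embed callable is an assumption standing in for the loading and mean-pooling sketches above:

```python
# Hypothetical helper: rank corpus entries by cosine similarity to a query.
# `embed` is assumed to map a list of strings to one 384-dim vector each,
# e.g. by combining the loading and mean-pooling sketches above;
# cosine_similarity is the helper defined in the previous sketch.
from typing import Callable, List, Sequence, Tuple

def rank_by_similarity(
    query: str, corpus: Sequence[str], embed: Callable
) -> List[Tuple[float, str]]:
    query_vec = embed([query])[0]
    corpus_vecs = embed(list(corpus))
    scored = [(cosine_similarity(query_vec, vec), doc)
              for vec, doc in zip(corpus_vecs, corpus)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```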