SGPT-125M-weightedmean-nli-bitfit
| Property | Value |
|---|---|
| Paper | SGPT: GPT Sentence Embeddings for Semantic Search |
| Architecture | GPT-Neo with weighted-mean pooling |
| Training Data | NLI dataset with no duplicates |
| Primary Task | Sentence Embeddings for Semantic Search |
What is SGPT-125M-weightedmean-nli-bitfit?
SGPT-125M-weightedmean-nli-bitfit is a GPT-Neo-based model specialized for generating high-quality sentence embeddings. It uses a position-weighted mean pooling strategy and BitFit training (fine-tuning only the bias terms), making it particularly effective for semantic search applications while maintaining a relatively small parameter footprint.
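BitFit training means that only the bias terms of the transformer are updated during fine-tuning while all other weights stay frozen, which drastically reduces the number of trainable parameters. The snippet below is a minimal sketch of that idea on a GPT-Neo backbone, not the authors' training code; the EleutherAI/gpt-neo-125M checkpoint and the optimizer call are illustrative, with the learning rate and weight decay taken from the settings listed further down in this card.

```python
# Minimal sketch of BitFit-style bias-only fine-tuning on GPT-Neo.
# Assumes the Hugging Face `transformers` library; the checkpoint and
# optimizer usage are illustrative, not the authors' exact training setup.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("EleutherAI/gpt-neo-125M")

# Freeze everything except bias parameters (the core idea of BitFit).
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")

# Only the bias terms are handed to the optimizer (lr and weight decay as
# listed in the implementation details below).
optimizer = torch.optim.AdamW(trainable, lr=2e-4, weight_decay=0.01)
```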
Implementation Details
The model architecture consists of a GPT-Neo transformer followed by a custom weighted-mean pooling layer. Training used MultipleNegativesRankingLoss with a scale of 20.0 and cosine similarity as the scoring function, together with the AdamW optimizer, a learning rate of 0.0002, and 881 warmup steps.
- Maximum sequence length: 75 tokens
- Batch size: 64
- Weight decay: 0.01
- Custom weighted-mean pooling implementation (see the sketch after this list)
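The weighted-mean pooling mentioned above is a position-weighted average of the last-layer token embeddings: because GPT-Neo is a causal model, later tokens have attended to more context, so they receive linearly larger weights. A minimal PyTorch sketch of this pooling follows; it is an illustration of the idea rather than the exact code shipped with the model, and the tensor names are assumptions.

```python
import torch

def weighted_mean_pooling(hidden_states: torch.Tensor,
                          attention_mask: torch.Tensor) -> torch.Tensor:
    """Position-weighted mean over token embeddings.

    hidden_states:  (batch, seq_len, hidden) last-layer token embeddings
    attention_mask: (batch, seq_len) with 1.0 for real tokens, 0.0 for padding
    """
    # Linearly increasing weights 1..seq_len, zeroed out on padding positions.
    positions = torch.arange(1, hidden_states.size(1) + 1,
                             device=hidden_states.device,
                             dtype=hidden_states.dtype)
    weights = positions.unsqueeze(0) * attention_mask        # (batch, seq_len)

    # Weighted sum of token embeddings, normalized by the total weight.
    summed = torch.einsum("bs,bsh->bh", weights, hidden_states)
    return summed / weights.sum(dim=1, keepdim=True)

# Toy example with random "hidden states" standing in for a model forward pass.
hidden = torch.randn(2, 5, 768)
mask = torch.tensor([[1., 1., 1., 0., 0.], [1., 1., 1., 1., 1.]])
print(weighted_mean_pooling(hidden, mask).shape)  # torch.Size([2, 768])
```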
Core Capabilities
- Semantic similarity assessment across multiple languages
- Strong performance on STS (Semantic Textual Similarity) benchmarks
- Effective for clustering and classification tasks (see the clustering sketch after this list)
- Robust cross-lingual retrieval capabilities
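As an illustration of the clustering capability, the sketch below embeds a few sentences and groups them with K-Means. It assumes the checkpoint is available through sentence-transformers under the Hugging Face Hub ID Muennighoff/SGPT-125M-weightedmean-nli-bitfit and that scikit-learn is installed; the example texts and cluster count are arbitrary.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Assumed Hub ID; adjust if the checkpoint is hosted elsewhere.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

texts = [
    "The stock market rallied after the earnings report.",
    "Shares climbed following strong quarterly results.",
    "The recipe calls for two cups of flour.",
    "Knead the dough until it is smooth and elastic.",
]

embeddings = model.encode(texts)

# Group semantically similar sentences into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, text in zip(labels, texts):
    print(label, text)
```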
Frequently Asked Questions
Q: What makes this model unique?
The model combines a GPT (decoder-only) architecture with weighted-mean pooling and BitFit training, offering a balance between embedding quality and efficiency. It is optimized specifically for sentence embedding tasks while keeping computational requirements relatively low.
Q: What are the recommended use cases?
The model excels in semantic search applications, document similarity analysis, and cross-lingual information retrieval. It's particularly well-suited for tasks requiring semantic understanding across multiple languages and domains.
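As a sketch of the semantic search use case, the example below embeds a query and a small document collection and ranks the documents by cosine similarity. The Hub ID Muennighoff/SGPT-125M-weightedmean-nli-bitfit and the example texts are assumptions made for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed Hub ID; swap in your own copy of the checkpoint if needed.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

docs = [
    "GPT models can be turned into sentence encoders with suitable pooling.",
    "BitFit updates only the bias terms during fine-tuning.",
    "The Eiffel Tower is located in Paris.",
]
query = "How can decoder-only transformers produce sentence embeddings?"

doc_embeddings = model.encode(docs, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```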