SGPT-125M-weightedmean-nli-bitfit
| Property | Value |
|---|---|
| Paper | SGPT: GPT Sentence Embeddings for Semantic Search |
| Architecture | GPT-Neo with weighted-mean pooling |
| Training Data | NLI dataset with no duplicates |
| Primary Task | Sentence Embeddings for Semantic Search |
What is SGPT-125M-weightedmean-nli-bitfit?
SGPT-125M-weightedmean-nli-bitfit is a GPT-Neo-based model specialized for generating high-quality sentence embeddings. It uses a position-weighted mean pooling strategy and BitFit training (fine-tuning only the bias terms), making it particularly effective for semantic search applications while maintaining a relatively small parameter footprint.
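BitFit training means that only the bias terms of the transformer are updated during fine-tuning while all other weights stay frozen, which drastically reduces the number of trainable parameters. The snippet below is a minimal sketch of that idea on a GPT-Neo backbone, not the authors' training code; the EleutherAI/gpt-neo-125M checkpoint and the optimizer call are illustrative, with the learning rate and weight decay taken from the settings listed further down in this card.

```python
# Minimal sketch of BitFit-style bias-only fine-tuning on GPT-Neo.
# Assumes the Hugging Face `transformers` library; the checkpoint and
# optimizer usage are illustrative, not the authors' exact training setup.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("EleutherAI/gpt-neo-125M")

# Freeze everything except bias parameters (the core idea of BitFit).
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")

# Only the bias terms are handed to the optimizer (lr and weight decay as
# listed in the implementation details below).
optimizer = torch.optim.AdamW(trainable, lr=2e-4, weight_decay=0.01)
```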
Implementation Details
The model architecture consists of a GPT-Neo transformer followed by a custom weighted-mean pooling layer. Training used MultipleNegativesRankingLoss with a scale of 20.0 and cosine similarity as the scoring function, together with the AdamW optimizer, a learning rate of 0.0002, and 881 warmup steps.
- Maximum sequence length: 75 tokens
- Batch size: 64
- Weight decay: 0.01
- Custom weighted-mean pooling implementation (see the sketch after this list)
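The weighted-mean pooling mentioned above is a position-weighted average of the last-layer token embeddings: because GPT-Neo is a causal model, later tokens have attended to more context, so they receive linearly larger weights. A minimal PyTorch sketch of this pooling follows; it is an illustration of the idea rather than the exact code shipped with the model, and the tensor names are assumptions.

```python
import torch

def weighted_mean_pooling(hidden_states: torch.Tensor,
                          attention_mask: torch.Tensor) -> torch.Tensor:
    """Position-weighted mean over token embeddings.

    hidden_states:  (batch, seq_len, hidden) last-layer token embeddings
    attention_mask: (batch, seq_len) with 1.0 for real tokens, 0.0 for padding
    """
    # Linearly increasing weights 1..seq_len, zeroed out on padding positions.
    positions = torch.arange(1, hidden_states.size(1) + 1,
                             device=hidden_states.device,
                             dtype=hidden_states.dtype)
    weights = positions.unsqueeze(0) * attention_mask        # (batch, seq_len)

    # Weighted sum of token embeddings, normalized by the total weight.
    summed = torch.einsum("bs,bsh->bh", weights, hidden_states)
    return summed / weights.sum(dim=1, keepdim=True)

# Toy example with random "hidden states" standing in for a model forward pass.
hidden = torch.randn(2, 5, 768)
mask = torch.tensor([[1., 1., 1., 0., 0.], [1., 1., 1., 1., 1.]])
print(weighted_mean_pooling(hidden, mask).shape)  # torch.Size([2, 768])
```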
Core Capabilities
- Semantic similarity assessment across multiple languages
- Strong performance on STS (Semantic Textual Similarity) benchmarks
- Effective for clustering and classification tasks (see the clustering sketch after this list)
- Robust cross-lingual retrieval capabilities
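As an illustration of the clustering capability, the sketch below embeds a few sentences and groups them with K-Means. It assumes the checkpoint is available through sentence-transformers under the Hugging Face Hub ID Muennighoff/SGPT-125M-weightedmean-nli-bitfit and that scikit-learn is installed; the example texts and cluster count are arbitrary.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Assumed Hub ID; adjust if the checkpoint is hosted elsewhere.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

texts = [
    "The stock market rallied after the earnings report.",
    "Shares climbed following strong quarterly results.",
    "The recipe calls for two cups of flour.",
    "Knead the dough until it is smooth and elastic.",
]

embeddings = model.encode(texts)

# Group semantically similar sentences into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for label, text in zip(labels, texts):
    print(label, text)
```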
Frequently Asked Questions
Q: What makes this model unique?
The model combines a GPT (decoder-only) architecture with weighted-mean pooling and BitFit training, offering a balance between embedding quality and efficiency. It is optimized specifically for sentence embedding tasks while keeping computational requirements relatively low.
Q: What are the recommended use cases?
The model excels in semantic search applications, document similarity analysis, and cross-lingual information retrieval. It's particularly well-suited for tasks requiring semantic understanding across multiple languages and domains.
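As a sketch of the semantic search use case, the example below embeds a query and a small document collection and ranks the documents by cosine similarity. The Hub ID Muennighoff/SGPT-125M-weightedmean-nli-bitfit and the example texts are assumptions made for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed Hub ID; swap in your own copy of the checkpoint if needed.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

docs = [
    "GPT models can be turned into sentence encoders with suitable pooling.",
    "BitFit updates only the bias terms during fine-tuning.",
    "The Eiffel Tower is located in Paris.",
]
query = "How can decoder-only transformers produce sentence embeddings?"

doc_embeddings = model.encode(docs, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```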