SGPT-125M-weightedmean-nli-bitfit

Maintained By: Muennighoff


Property | Value
Paper | SGPT: GPT Sentence Embeddings for Semantic Search
Architecture | GPT-Neo with weighted-mean pooling
Training Data | NLI (natural language inference) data, batched so that no batch contains duplicate sentences
Primary Task | Sentence embeddings for semantic search

What is SGPT-125M-weightedmean-nli-bitfit?

SGPT-125M-weightedmean-nli-bitfit is a sentence-embedding model built on GPT-Neo with roughly 125M parameters. It turns the per-token hidden states into a single sentence vector with weighted-mean pooling, in which later tokens receive larger weights, and it was fine-tuned with BitFit, which updates only the model's bias terms. The result is a small model aimed at semantic search.
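Weighted-mean pooling fits in a few lines. The sketch below is a minimal, framework-free illustration that assumes the position-weighted scheme described in the SGPT paper (the token at position i, counted from 1, gets a weight proportional to i) and right-padded input. `weighted_mean_pool` is a hypothetical helper, not the model's actual pooling code.

```python
from typing import List, Sequence


def weighted_mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Position-weighted mean of token vectors (hypothetical helper).

    hidden: one vector per token, in sequence order (assumed right-padded).
    mask:   1 for real tokens, 0 for padding.
    The token at position i (1-indexed) gets weight i; weights are normalized
    over real tokens, so later tokens, which have seen more of the sentence
    under causal attention, contribute more.
    """
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    weights = [(i + 1) * m for i, m in enumerate(mask)]  # padding gets weight 0
    total = sum(weights)
    if total == 0:
        raise ValueError("mask contains no real tokens")
    dim = len(hidden[0])
    pooled = [0.0] * dim
    for w, vec in zip(weights, hidden):
        for d in range(dim):
            pooled[d] += w * vec[d]
    return [x / total for x in pooled]


# Three 2-dim token states; the third position is padding.
tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(weighted_mean_pool(tokens, [1, 1, 0]))  # values 1/3 and 2/3: the second token counts twice as much
```

Compared with a plain mean, this weighting leans on the final tokens, which in a left-to-right model are the only ones whose hidden states reflect the whole sentence.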

Implementation Details

The model consists of a GPT-Neo transformer followed by a weighted-mean pooling layer. It was trained with MultipleNegativesRankingLoss (scale 20.0, cosine similarity) using the AdamW optimizer with a learning rate of 0.0002 (2e-4) and 881 warmup steps. Other training settings:

  • Maximum sequence length: 75 tokens
  • Batch size: 64
  • Weight decay: 0.01
  • Custom weighted-mean pooling implementation
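The training objective can be illustrated with a plain-Python sketch of an in-batch-negatives ranking loss. It assumes a batch of (anchor, positive) embedding pairs in which every other positive in the batch acts as a negative, scores each pair as cosine similarity times the scale of 20.0, and averages a cross-entropy loss whose target is the matching pair. `mnr_loss` is a hypothetical name; this is not the sentence-transformers implementation, and it omits optional hard negatives.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors: List[Sequence[float]], positives: List[Sequence[float]], scale: float = 20.0) -> float:
    """Hypothetical in-batch-negatives loss: anchor i should rank positive i first."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Row i: scaled cosine similarity of anchor i to every positive in the batch.
        scores = [scale * cosine(anchors[i], positives[j]) for j in range(n)]
        # Numerically stable log-sum-exp over the row.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy with target index i.
        total += log_sum_exp - scores[i]
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.1], [0.1, 1.0]])  # each anchor paired with its own positive
swapped = mnr_loss(anchors, [[0.1, 1.0], [1.0, 0.1]])  # positives paired with the wrong anchors
print(aligned < swapped)  # True
```

The scale matters because cosine similarities lie in [-1, 1]; multiplying by 20 spreads the logits enough for the softmax to strongly favor the correct pair.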

Core Capabilities

  • Semantic similarity assessment between sentences
  • Strong performance on STS (Semantic Textual Similarity) benchmarks
  • Embedding features for clustering and classification tasks
  • Retrieval of semantically related passages

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a decoder-only GPT architecture with weighted-mean pooling and BitFit fine-tuning. Because BitFit trains only the bias terms, fine-tuning touches a small fraction of the weights, and the 125M-parameter size keeps inference cost low compared with larger SGPT variants.
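BitFit's parameter selection amounts to a name filter. This sketch assumes, as in BitFit, that only tensors whose names end in `bias` are trained; the parameter names and sizes are hypothetical stand-ins for one 768-to-3072 MLP projection, and `bitfit_split` is an illustrative helper rather than code from the SGPT repository.

```python
from typing import Dict, List, Tuple


def bitfit_split(param_sizes: Dict[str, int]) -> Tuple[List[str], float]:
    """Return the bias parameters BitFit would train and their share of all weights.

    param_sizes maps parameter names to element counts. With a real PyTorch
    model, the same selection would loop over model.named_parameters() and
    set requires_grad to True only for bias tensors.
    """
    trainable = [name for name in param_sizes if name.endswith("bias")]
    total = sum(param_sizes.values())
    share = sum(param_sizes[name] for name in trainable) / total
    return trainable, share


# Hypothetical names and sizes for a single 768 -> 3072 projection.
sizes = {"h.0.mlp.c_fc.weight": 768 * 3072, "h.0.mlp.c_fc.bias": 3072}
names, share = bitfit_split(sizes)
print(names, f"{share:.4%}")  # ['h.0.mlp.c_fc.bias'] 0.1300%
```

Even in this single layer the bias holds about 0.13% of the parameters, which is why BitFit fine-tuning is cheap in optimizer memory.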

Q: What are the recommended use cases?

The model is intended for semantic search, document similarity analysis, and related retrieval tasks. Because it was fine-tuned on English NLI data, cross-lingual retrieval and use in other languages or specialized domains should be evaluated before relying on it.
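Semantic search with these embeddings reduces to ranking documents by cosine similarity to the query embedding. The sketch assumes the embeddings have already been produced by the model; the three-dimensional vectors are made-up placeholders (real embeddings from this model are much wider), and `search` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def search(query_vec: Sequence[float], corpus: Dict[str, Sequence[float]], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k documents ranked by cosine similarity to the query."""
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings; in practice each vector comes from the model.
corpus = {
    "A man is playing guitar.": [0.9, 0.1, 0.0],
    "Someone plays a musical instrument.": [0.8, 0.3, 0.1],
    "The stock market fell today.": [0.0, 0.2, 0.9],
}
query = [0.9, 0.15, 0.0]
results = search(query, corpus)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

For a large corpus the linear scan would usually be replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.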
