GIST-Embedding-v0

Author: avsolatorio

Text embedding model fine-tuned on the MEDI dataset and MTEB Classification training data. It performs strongly on semantic search and text similarity tasks and does not require instructions to be prepended to queries.

Property         Value
Model Size       109M parameters
Base Model       BAAI/bge-base-en-v1.5
License          MIT
Paper            GISTEmbed paper
Training Data    MEDI dataset + MTEB Classification

What is GIST-Embedding-v0?

GIST-Embedding-v0 is a text embedding model fine-tuned with Guided In-sample Selection of Training Negatives (GIST), the technique introduced in the GISTEmbed paper, which uses a guide model to improve the selection of in-batch negatives during contrastive training. It is fine-tuned from BAAI/bge-base-en-v1.5 on the MEDI dataset combined with triplets selected from MTEB Classification training data. A key advantage is that it produces high-quality embeddings without task-specific instructions or prompts.
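The guided negative selection can be sketched as a contrastive loss with masking. This is a simplified illustration, not the author's training code: it assumes the guide model's anchor-to-candidate similarities are already computed, and it only treats the other anchors' positives as in-batch negatives. Following the masking rule described in the GISTEmbed paper (and used by sentence-transformers' `GISTEmbedLoss`), a candidate is dropped from the softmax denominator when the guide rates it as more similar to the anchor than the anchor's own positive, since such candidates are likely false negatives. The function name `gist_contrastive_loss` is hypothetical; the default temperature of 0.01 matches the value reported for this model's training.

```python
import math
from typing import List


def gist_contrastive_loss(
    model_sim: List[List[float]],
    guide_sim: List[List[float]],
    temperature: float = 0.01,
) -> float:
    """InfoNCE-style batch loss with guide-based negative masking (simplified sketch).

    model_sim[i][j]: similarity between anchor i and candidate j under the model
        being trained; candidate i is the positive for anchor i.
    guide_sim[i][j]: the same similarity under a frozen guide model.
    """
    n = len(model_sim)
    total = 0.0
    for i in range(n):
        pos_guide = guide_sim[i][i]
        logits = []
        for j in range(n):
            # Always keep the positive; drop negatives the guide scores above the
            # positive, treating them as likely false negatives.
            if j != i and guide_sim[i][j] > pos_guide:
                continue
            logits.append(model_sim[i][j] / temperature)
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits))
        # Negative log-probability of the positive among the retained candidates.
        total += log_denom - model_sim[i][i] / temperature
    return total / n
```

When the guide masks every negative, only the positive remains in the denominator and the loss for that anchor is zero; without masking, nearby negatives keep contributing gradient pressure.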

Implementation Details

The model was fine-tuned for 80 epochs with a warmup ratio of 0.1, a learning rate of 5e-6, a batch size of 32, and a contrastive loss temperature of 0.01. The released weights correspond to the checkpoint at step 103,500.
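The reported hyperparameters can be collected into a small config object. This is a minimal sketch under stated assumptions: the class name and helper methods are hypothetical, the total number of training steps is not given in the source (so `total_steps` takes a dataset size you supply), and warmup is assumed to be ceil(ratio * total steps) with a linear ramp, the convention used by Hugging Face `Trainer`. The decay schedule after warmup is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GistTrainingConfig:
    # Values as reported for GIST-Embedding-v0.
    epochs: int = 80
    warmup_ratio: float = 0.1
    learning_rate: float = 5e-6
    batch_size: int = 32
    temperature: float = 0.01
    checkpoint_step: int = 103_500

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for a hypothetical dataset of num_examples triplets."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs

    def warmup_steps(self, total_steps: int) -> int:
        # Assumption: ceil(ratio * total), as in Hugging Face Trainer.
        return math.ceil(self.warmup_ratio * total_steps)

    def warmup_lr(self, step: int, total_steps: int) -> float:
        """Linear warmup to the peak rate; held constant afterwards (decay not modeled)."""
        warmup = self.warmup_steps(total_steps)
        if warmup == 0 or step >= warmup:
            return self.learning_rate
        return self.learning_rate * step / warmup


cfg = GistTrainingConfig()
total = cfg.total_steps(320_000)  # hypothetical dataset size
print(total, cfg.warmup_steps(total))
```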

  • No instructions needed to generate embeddings
  • Built on the BERT-base architecture inherited from bge-base-en-v1.5
  • Optimized for semantic search and similarity tasks
  • Fine-tuned on the MEDI dataset plus MTEB Classification training data

Core Capabilities

  • Text similarity computation
  • Semantic search
  • Document classification
  • English text matching (the base model, bge-base-en-v1.5, is English-only)
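Similarity and semantic search both reduce to comparing embedding vectors, usually by cosine similarity. The sketch below ranks documents for a query; to keep it runnable offline it uses small hand-written vectors in place of the model's 768-dimensional embeddings. In practice the vectors could be produced with the sentence-transformers library, as indicated in the comments (not executed here). The helper names `cosine` and `top_k` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


# With the real model the vectors would come from, e.g. (not run here):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("avsolatorio/GIST-Embedding-v0")
#   doc_vecs = model.encode(docs)
#   query_vec = model.encode("how do I reset my password?")  # no instruction prefix
docs = [
    "Reset your password from the account settings page.",
    "Our office is closed on public holidays.",
    "Contact support if you are locked out of your account.",
]
doc_vecs = [[0.9, 0.1, 0.2], [0.0, 0.2, 0.9], [0.7, 0.4, 0.1]]  # toy stand-ins
query_vec = [0.85, 0.2, 0.1]
for idx, score in top_k(query_vec, doc_vecs, k=2):
    print(f"{score:.3f}  {docs[idx]}")
```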

Frequently Asked Questions

Q: What makes this model unique?

Two things distinguish it: it generates high-quality embeddings without task instructions, and it was trained with GIST, which uses a guide model to select in-batch negatives. Because queries and documents are encoded the same way, with no instruction engineering, the model is straightforward to deploy in production.
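In practice, the difference shows up in how queries are prepared. BAAI's English BGE v1.5 models document an optional instruction, `Represent this sentence for searching relevant passages: `, to prepend to short retrieval queries, whereas GIST-Embedding-v0 is intended to encode queries as-is. The helper below is hypothetical and only illustrates the input strings each model would receive.

```python
from typing import List, Optional

# Query instruction documented for BAAI's English BGE models (short-query retrieval).
# GIST-Embedding-v0 is meant to be used without any instruction.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_queries(queries: List[str], instruction: Optional[str] = None) -> List[str]:
    """Hypothetical helper: prepend an instruction only if the model expects one."""
    if not instruction:
        return list(queries)
    return [instruction + q for q in queries]


gist_inputs = prepare_queries(["what is contrastive learning?"])
bge_inputs = prepare_queries(["what is contrastive learning?"], BGE_QUERY_INSTRUCTION)
```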

Q: What are the recommended use cases?

The model is well suited to semantic search, document similarity matching, and classification, particularly in applications that need good text embeddings without the overhead of instruction engineering.
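For classification, embeddings are typically used as frozen features for a lightweight classifier; MTEB's classification tasks, for instance, fit a logistic regression by default. The sketch below swaps in an even simpler technique, a nearest-centroid classifier over cosine similarity, to show the idea without extra dependencies. The 2-d vectors and labels are toy stand-ins for real model embeddings, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def fit_centroids(vectors: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> Dict[str, List[float]]:
    """Sum each label's unit-normalized embeddings and renormalize (class-mean direction)."""
    sums: Dict[str, List[float]] = {}
    for vec, label in zip(vectors, labels):
        unit = normalize(vec)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for k, x in enumerate(unit):
            acc[k] += x
    return {label: normalize(acc) for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid has the highest cosine similarity to vec."""
    unit = normalize(vec)
    return max(centroids, key=lambda label: sum(a * b for a, b in zip(centroids[label], unit)))


# Toy "embeddings"; real ones would come from the model's encoder.
train_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["sports", "sports", "finance", "finance"]
centroids = fit_centroids(train_vecs, train_labels)
print(predict(centroids, [0.8, 0.2]))
```

Nearest centroid needs no training loop, but a fitted linear classifier such as logistic regression usually separates classes better when labeled data is plentiful.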
