bge-large-en-v1.5-quant

Maintained By
neuralmagic


  • Author: neuralmagic
  • Model Type: Quantized Embeddings Model
  • Framework: ONNX + DeepSparse
  • Model URL: Hugging Face

What is bge-large-en-v1.5-quant?

bge-large-en-v1.5-quant is a quantized version of the bge-large-en-v1.5 embeddings model, optimized for performance using INT8 quantization and DeepSparse acceleration. Compared with the unquantized baseline, it achieves up to 4.8X faster inference on 10-core laptops and up to 3.5X faster inference on 16-core AWS instances.

Implementation Details

The model utilizes Sparsify for quantization and DeepSparseSentenceTransformers for inference, making it particularly efficient for production environments. Implementation requires the deepsparse-nightly[sentence_transformers] package and can be easily integrated into existing Python workflows.

  • INT8 quantization for reduced memory footprint
  • Optimized for DeepSparse acceleration
  • Simple Python API for embedding generation
  • Efficient batch processing of sentences

Core Capabilities

  • Fast text embedding generation
  • Efficient processing of multiple sentences
  • Optimized for CPU-based deployment
  • Maintains embedding quality while improving speed
  • Seamless integration with DeepSparse ecosystem

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its significant performance optimizations through INT8 quantization and DeepSparse acceleration, making it particularly suitable for production environments where speed and efficiency are crucial.

Q: What are the recommended use cases?

The model is ideal for applications requiring fast text embedding generation, particularly in production environments running on CPU infrastructure. It's especially effective for scenarios requiring batch processing of sentences and where computational efficiency is a priority.
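In retrieval-style use cases like those above, the generated embeddings are typically compared with cosine similarity. The sketch below is illustrative only and uses small toy vectors in place of the model's real 1024-dimensional output:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings from the model
query = np.array([0.1, 0.3, 0.5])
match = np.array([0.1, 0.3, 0.5])      # identical direction
other = np.array([0.9, -0.2, 0.1])     # different direction

print(cosine_similarity(query, match))  # close to 1.0
print(cosine_similarity(query, other))  # noticeably lower
```

Because cosine similarity only depends on vector direction, it is robust to differences in embedding magnitude across sentences.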
