# bge-large-en-v1.5-quant
| Property | Value |
|---|---|
| Author | neuralmagic |
| Model Type | Quantized Embeddings Model |
| Framework | ONNX + DeepSparse |
| Model URL | Hugging Face |
## What is bge-large-en-v1.5-quant?
bge-large-en-v1.5-quant is a quantized version of the bge-large-en-v1.5 embeddings model, optimized with INT8 quantization and accelerated by DeepSparse. The result is substantially cheaper embedding generation: Neural Magic reports up to 4.8X faster inference on 10-core laptops and up to 3.5X on 16-core AWS instances.
## Implementation Details
The model was quantized with Sparsify and is served through DeepSparseSentenceTransformers for inference, making it well suited to production environments. Using it requires the deepsparse-nightly[sentence_transformers] package, and it integrates into existing Python workflows with only a few lines of code (see the sketch after the list below).
- INT8 quantization for reduced memory footprint
- Optimized for DeepSparse acceleration
- Simple Python API for embedding generation
- Efficient batch processing of sentences
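A minimal sketch of that workflow follows, based on the DeepSparse sentence-transformers integration described above. The exact import path, class name (`DeepSparseSentenceTransformer`), model stub, and the `export=False` flag are assumptions drawn from the package's documented usage pattern; check the Hugging Face model card for the authoritative snippet.

```python
# Install first: pip install "deepsparse-nightly[sentence_transformers]"
from deepsparse.sentence_transformers import DeepSparseSentenceTransformer

# Load the quantized ONNX model and run it on the DeepSparse engine.
# Model stub assumed; substitute the actual Hugging Face repo id.
model = DeepSparseSentenceTransformer("neuralmagic/bge-large-en-v1.5-quant", export=False)

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "DeepSparse accelerates quantized models on CPUs.",
]

# encode() batches the inputs and returns one embedding vector per sentence
embeddings = model.encode(sentences)
for sentence, embedding in zip(sentences, embeddings):
    print(sentence, "->", embedding.shape)
```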
## Core Capabilities
- Fast text embedding generation
- Efficient processing of multiple sentences
- Optimized for CPU-based deployment
- Maintains embedding quality while improving speed
- Seamless integration with DeepSparse ecosystem
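As a sketch of how these embeddings are typically consumed, the snippet below ranks candidate sentences against a query by cosine similarity. It reuses the `model` object from the previous example; the query and candidate strings are illustrative only.

```python
import numpy as np

query = "How do I speed up inference on a CPU?"
candidates = [
    "DeepSparse accelerates quantized models on CPUs.",
    "The weather is sunny today.",
]

# Encode the query and all candidates (candidates in one batch for efficiency)
query_emb = model.encode([query])[0]
cand_embs = model.encode(candidates)

# Cosine similarity: dot product of L2-normalized vectors
def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for text, emb in zip(candidates, cand_embs):
    print(f"{cosine(query_emb, emb):.3f}  {text}")
```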
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out due to its significant performance optimizations through INT8 quantization and DeepSparse acceleration, making it particularly suitable for production environments where speed and efficiency are crucial.
**Q: What are the recommended use cases?**
The model is ideal for applications requiring fast text embedding generation, particularly in production environments running on CPU infrastructure. It is especially effective for batch processing of sentences and for scenarios where computational efficiency is a priority.
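To illustrate the batch-processing use case, the sketch below encodes a small synthetic corpus in one call and measures throughput. It reuses the `model` from the earlier examples; the `batch_size` argument is assumed to be supported, mirroring the standard sentence-transformers `encode()` signature.

```python
import time

# Synthetic corpus for illustration only
corpus = [f"Document number {i} about CPU-efficient embeddings." for i in range(256)]

start = time.perf_counter()
corpus_embs = model.encode(corpus, batch_size=32)  # batch_size assumed, per sentence-transformers convention
elapsed = time.perf_counter() - start

print(f"Encoded {len(corpus)} docs in {elapsed:.2f}s "
      f"({len(corpus) / elapsed:.1f} docs/sec)")
```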