bge-large-en-v1.5-quant

neuralmagic

Quantized (INT8) ONNX variant of the bge-large-en-v1.5 embeddings model, reported to run up to 4.8X faster on 10-core systems with DeepSparse acceleration

Property    | Value
Author      | neuralmagic
Model Type  | Quantized Embeddings Model
Framework   | ONNX + DeepSparse
Model URL   | Hugging Face

What is bge-large-en-v1.5-quant?

bge-large-en-v1.5-quant is a quantized version of the bge-large-en-v1.5 embeddings model. It combines INT8 quantization with DeepSparse acceleration to generate embeddings more efficiently on CPUs, with reported speedups of up to 4.8X on 10-core laptops and 3.5X on 16-core AWS instances.
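
To make the idea of INT8 quantization concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic textbook scheme chosen for illustration, not necessarily the procedure Sparsify applies to this model; the function names and sample weights are hypothetical.

```python
# Illustrative only: symmetric per-tensor INT8 quantization.
# Assumption: this is a generic scheme, not the exact method used for this model.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical sample weights, not taken from the real model.
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding moves each value by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each code fits in one byte, compared with four bytes for a 32-bit float, which is where the reduced memory footprint comes from.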

Implementation Details

The model is quantized with Sparsify and uses DeepSparseSentenceTransformers for inference, which makes it well suited to production environments. It requires the deepsparse-nightly[sentence_transformers] package and plugs into existing Python workflows.

  • INT8 quantization for reduced memory footprint
  • Optimized for DeepSparse acceleration
  • Simple Python API for embedding generation
  • Efficient batch processing of sentences
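
The points above can be sketched as follows. This is a minimal sketch under stated assumptions: the import path `deepsparse.sentence_transformers`, the class name `DeepSparseSentenceTransformer`, the `export=False` argument, and the model ID `neuralmagic/bge-large-en-v1.5-quant` are not confirmed by this page and should be checked against the model's Hugging Face card. The helpers `load_model` and `encode_in_batches` are hypothetical names introduced for illustration.

```python
# Sketch only. Assumes the package was installed with:
#   pip install -U "deepsparse-nightly[sentence_transformers]"

def load_model(model_id="neuralmagic/bge-large-en-v1.5-quant"):
    """Load the quantized model (import path and arguments are assumptions)."""
    # Imported lazily so the batching helper works without DeepSparse installed.
    from deepsparse.sentence_transformers import DeepSparseSentenceTransformer
    return DeepSparseSentenceTransformer(model_id, export=False)


def encode_in_batches(model, sentences, batch_size=32):
    """Encode sentences in fixed-size chunks.

    `model` can be any object with an encode(list_of_str) method that
    returns one embedding per input sentence.
    """
    embeddings = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        embeddings.extend(model.encode(batch))
    return embeddings


# Example usage (downloads the model, so it is left commented out):
# model = load_model()
# vectors = encode_in_batches(model, ["First sentence.", "Second sentence."])
```

Keeping the batching helper independent of the model class means it can be exercised with a stub encoder before the real model is wired in.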

Core Capabilities

  • Fast text embedding generation
  • Efficient processing of multiple sentences
  • Optimized for CPU-based deployment
  • Maintains embedding quality while improving speed
  • Seamless integration with DeepSparse ecosystem
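
One way to check the claim that embedding quality is maintained is to embed the same sentences with the dense and the quantized model and compare the results by cosine similarity. The sketch below shows only the comparison step; the function names are hypothetical and the vectors are toy placeholders, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_agreement(dense_embeddings, quant_embeddings):
    """Average cosine similarity between paired dense and quantized embeddings."""
    sims = [cosine_similarity(d, q)
            for d, q in zip(dense_embeddings, quant_embeddings)]
    return sum(sims) / len(sims)


# Toy placeholder vectors standing in for real embeddings.
dense = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
quant = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1]]
print(mean_agreement(dense, quant))
```

A mean close to 1.0 would suggest the quantized embeddings point in nearly the same directions as the dense ones. It does not replace evaluation on a downstream benchmark.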

Frequently Asked Questions

Q: What makes this model unique?

The model combines INT8 quantization with DeepSparse acceleration, so it generates embeddings faster than the original bge-large-en-v1.5 model. This makes it suitable for production environments where speed and efficiency are crucial.

Q: What are the recommended use cases?

The model is suited to applications that need fast text embedding generation, particularly production environments running on CPU infrastructure. It is especially effective for batch processing of sentences and for workloads where computational efficiency is a priority.
