Efficient SPLADE VI-BT Large Query Model
Property | Value
---|---
License | CC-BY-NC-SA-4.0
Paper | View Paper
Performance (MRR@10) | 38.0 on MS MARCO dev
Inference Latency | 0.7 ms
What is efficient-splade-VI-BT-large-query?
This is a specialized query encoder that forms one half of an efficient dual-encoder system for passage retrieval. It is the query component of the SPLADE (SParse Lexical AnD Expansion) architecture, designed for high-quality information retrieval at minimal query-encoding latency.
Implementation Details
The model encodes queries into sparse lexical representations for passage retrieval. It operates in conjunction with a separate document encoder (efficient-splade-VI-BT-large-doc) and is trained with knowledge distillation to balance efficiency and effectiveness. It reaches an MRR@10 of 38.0 on the MS MARCO dev set while keeping inference latency at 0.7 ms per query.
- Utilizes BERT-based architecture with specialized modifications
- Implements L1 regularization for queries
- Features FLOPS-regularized middle-training
- Employs bag-of-words representation
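The bag-of-words representation above can be sketched numerically. SPLADE collapses a transformer's per-token MLM logits into a single vocabulary-sized sparse vector via a log-saturated ReLU followed by max pooling over token positions. The snippet below is a minimal NumPy sketch of that aggregation step only, using a toy logits matrix rather than the real model's output:

```python
import numpy as np

def splade_aggregate(logits: np.ndarray) -> np.ndarray:
    """Collapse per-token MLM logits (seq_len x vocab_size) into one
    sparse vocabulary vector: w_j = max_i log(1 + relu(logits[i, j]))."""
    return np.log1p(np.maximum(logits, 0.0)).max(axis=0)

# Toy example: 3 token positions over a 6-term vocabulary.
logits = np.array([
    [2.0, -1.0, 0.0, 0.5, -3.0, 0.0],
    [0.0,  1.0, 0.0, 4.0, -0.5, 0.0],
    [1.0, -2.0, 0.0, 0.0, -1.0, 0.0],
])
weights = splade_aggregate(logits)
print(weights)                     # non-zero only where some logit > 0
print(np.count_nonzero(weights))  # 3 of 6 vocabulary terms survive
```

Because the ReLU zeroes out negative logits before the log, most vocabulary entries end up exactly zero, which is what makes the representation sparse and indexable.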
Core Capabilities
- Fast query processing with 0.7ms inference time
- Achieves 97.8% R@1000 on MS MARCO dev set
- Optimized for production deployment
- Query latency competitive with traditional BM25 systems
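The retrieval metrics above come from ranking passages by the inner product of the sparse query vector and each sparse document vector. With sparse vectors, that inner product only involves the terms the two sides share; the sketch below illustrates this with made-up term weights (not actual model output):

```python
def sparse_dot(query: dict[str, float], doc: dict[str, float]) -> float:
    """Inner product of two sparse term-weight vectors; only terms
    present with non-zero weight on both sides contribute."""
    # Iterate over the smaller vector for efficiency.
    small, large = (query, doc) if len(query) <= len(doc) else (doc, query)
    return sum(w * large.get(term, 0.0) for term, w in small.items())

# Hypothetical expanded query and two candidate passages.
q = {"splade": 1.6, "retrieval": 1.1, "sparse": 0.7}
d1 = {"splade": 1.2, "sparse": 0.9, "bert": 0.4}
d2 = {"retrieval": 1.3, "latency": 0.8}
print(sparse_dot(q, d1))  # 1.6*1.2 + 0.7*0.9 ≈ 2.55
print(sparse_dot(q, d2))  # 1.1*1.3 ≈ 1.43, so d1 ranks first
```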
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its exceptional efficiency-performance trade-off, achieving near BM25 latency while maintaining competitive retrieval quality. The separation of query and document encoders allows for optimized inference in production environments.
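Part of why sparse representations serve production well is that they drop into a standard inverted index: at query time only the posting lists of the query's non-zero terms are touched. The toy index below is purely an illustration of that idea, not the engine used for the reported latency numbers:

```python
from collections import defaultdict

def build_index(docs: dict[str, dict[str, float]]) -> dict[str, list[tuple[str, float]]]:
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def search(index: dict[str, list[tuple[str, float]]],
           query: dict[str, float]) -> list[tuple[str, float]]:
    """Accumulate dot-product scores by walking only the query's postings."""
    scores: dict[str, float] = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sparse document vectors.
docs = {
    "d1": {"splade": 1.2, "sparse": 0.9},
    "d2": {"retrieval": 1.3, "latency": 0.8},
    "d3": {"splade": 0.5, "retrieval": 0.6},
}
index = build_index(docs)
results = search(index, {"splade": 1.6, "retrieval": 1.1})
print(results)  # d1 scores highest (1.6 * 1.2)
```

Documents sharing no terms with the query are never scored at all, which is where the near-BM25 latency comes from.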
Q: What are the recommended use cases?
The model is ideal for large-scale information retrieval systems where query latency is critical. It's particularly well-suited for applications requiring real-time search capabilities while maintaining high-quality results.