LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
| Property | Value |
|---|---|
| License | MIT |
| Paper | Research Paper |
| Base Model | Llama 3 8B |
| Primary Use | Text Embedding & Sentence Similarity |
What is LLM2Vec-Meta-Llama-3-8B-Instruct-mntp?
LLM2Vec is an approach that transforms decoder-only language models into powerful text encoders. Built on Meta's Llama 3 architecture, this model applies a three-step recipe: enabling bidirectional attention, training with masked next token prediction (MNTP), and applying unsupervised contrastive learning. It is designed for generating high-quality text embeddings and performing semantic similarity tasks.
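The first step, replacing the causal attention mask with a full one, can be sketched conceptually. This is a plain-Python illustration of the masking idea only; in the actual model the change happens inside the transformer's attention layers:

```python
seq_len = 4

# Decoder-only default: a causal mask, where position i may only
# attend to positions 0..i (True = attention allowed).
causal = [[j <= i for j in range(seq_len)] for i in range(seq_len)]

# LLM2Vec's first step removes that restriction: every position
# attends to every other position, as in a bidirectional encoder.
bidirectional = [[True] * seq_len for _ in range(seq_len)]

# The last token is visible to the first only in the bidirectional case.
print(causal[0][seq_len - 1], bidirectional[0][seq_len - 1])  # False True
```

MNTP training then teaches the model to use this newly visible right-hand context, since a pretrained decoder has never attended to future tokens.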
Implementation Details
The model uses a custom implementation that enables bidirectional attention in a decoder-only LLM, fundamentally changing how the model processes text. It employs the PEFT (Parameter-Efficient Fine-Tuning) framework and supports both CPU and CUDA execution. The model accepts a maximum sequence length of 512 tokens and uses bfloat16 precision for efficient computation.
- Bidirectional attention mechanism for enhanced context understanding
- Masked Next Token Prediction (MNTP) for robust feature learning
- Support for instruction-based encoding
- Flexible pooling strategies with mean pooling as default
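The default mean-pooling strategy averages the token vectors of real (non-padded) tokens into one fixed-size sentence embedding. A minimal dependency-free sketch of that operation, using toy 2-dimensional vectors:

```python
def mean_pool(token_embeddings, attention_mask):
    """Mean pooling: average the vectors of real tokens, skipping padding.

    token_embeddings: one vector (list of floats) per token
    attention_mask:   1 for real tokens, 0 for padding
    """
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Two real tokens followed by one padded position:
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking before averaging matters: including padded positions would drag every embedding toward zero by an amount that depends on sequence length.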
Core Capabilities
- Text embedding generation for information retrieval
- Semantic similarity computation between texts
- Support for both query and document encoding
- Fine-tuned performance for BEIR benchmark tasks
- Efficient processing of both instructed and non-instructed inputs
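Once texts are embedded, semantic similarity and retrieval reduce to comparing vectors, typically by cosine similarity. A self-contained sketch with made-up 3-dimensional vectors (the real model outputs 4096-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for one query and two documents (illustrative values only).
query = [0.2, 0.9, 0.1]
docs = {"doc_a": [0.1, 0.8, 0.2], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

The same ranking loop works for BEIR-style retrieval: embed the query with its instruction prefix, embed documents without one, and score each pair.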
Frequently Asked Questions
Q: What makes this model unique?
A: The model's uniqueness lies in its ability to convert decoder-only LLMs into effective text encoders through a simple yet powerful three-step process, making it particularly effective for semantic similarity and information retrieval tasks.
Q: What are the recommended use cases?
A: The model excels at text embedding generation, semantic similarity computation, information retrieval, and text classification. It is particularly well suited to applications that require high-quality text representations, such as search systems and content recommendation.