language-perceiver

deepmind

Perceiver IO language model that processes raw UTF-8 bytes using cross-attention with a fixed set of latent vectors, achieving a GLUE score of 81.8. Combines efficient processing with flexible output generation.

Developer: DeepMind
Architecture: Perceiver IO
Training Data: English Wikipedia (30%) + C4 (70%)
Paper: Perceiver IO: A General Architecture for Structured Inputs & Outputs
GLUE Score: 81.8

What is language-perceiver?

Language Perceiver is a transformer-based model that applies the Perceiver IO architecture to language. Unlike traditional transformers, it processes inputs through cross-attention with a fixed number of latent vectors, so the cost of the deep self-attention stack is independent of input length. The model works directly on raw UTF-8 bytes rather than tokenized text, eliminating the need for a pre-trained tokenizer or a fixed vocabulary.
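Byte-level input preparation can be sketched in a few lines. This is a minimal illustration, not the model's actual preprocessing: the special-token IDs and the offset of 6 below are hypothetical (the real checkpoint's tokenizer reserves its own IDs, which may differ).

```python
SPECIAL_TOKENS = 6   # hypothetical: slots for [PAD], [BOS], [EOS], [MASK], [CLS], [SEP]
PAD_ID = 0           # hypothetical padding ID
MAX_LEN = 2048       # maximum sequence length in bytes

def encode(text: str, max_len: int = MAX_LEN) -> list[int]:
    """Map text to integer IDs: raw UTF-8 bytes shifted past the special tokens.

    No vocabulary lookup or subword merging is needed -- any Unicode
    string maps deterministically to a byte sequence.
    """
    ids = [b + SPECIAL_TOKENS for b in text.encode("utf-8")]
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))  # right-pad to fixed length

ids = encode("Perceiver IO")
```

Because the mapping is just UTF-8 plus an offset, the same scheme handles any language or script without a tokenizer retraining step.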

Implementation Details

The architecture performs self-attention only on a small set of latent vectors (256 or 512); the inputs participate only in cross-attention. This keeps the cost of the deep self-attention stack constant while allowing inputs of arbitrary length. Decoder queries provide flexible output generation, including per-position predictions for masked language modeling.

  • Direct processing of UTF-8 bytes without tokenization
  • Cross-attention mechanism with latent vectors
  • Flexible decoder queries for output generation
  • Maximum sequence length of 2048 bytes
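The latent bottleneck described above can be sketched with plain scaled dot-product attention. This is an illustrative NumPy toy (single head, no projections, arbitrary sizes), not the model's implementation; it only shows where each cost term comes from.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention with a numerically stable softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
M, N, D = 2048, 256, 64          # input bytes, latents, channels (illustrative)
inputs = rng.normal(size=(M, D))
latents = rng.normal(size=(N, D))

# Encoder: latents cross-attend to the inputs -- cost O(M*N), linear in M.
latents = attention(latents, inputs, inputs)

# Processor: self-attention among latents only -- cost O(N^2), independent of M.
latents = attention(latents, latents, latents)
```

Doubling the input length M doubles only the cross-attention cost; the repeated self-attention blocks, which dominate depth, are untouched.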

Core Capabilities

  • Masked Language Modeling (MLM)
  • Feature extraction for downstream tasks
  • Flexible input processing across modalities
  • Efficient handling of long sequences
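The MLM objective over bytes can be illustrated as span masking. The mask ID and offset below are hypothetical placeholders (the real checkpoint defines its own special tokens); the sketch only shows the shape of the task: replace a contiguous byte span and train the decoder to reconstruct it.

```python
MASK_ID = 3          # hypothetical mask-token ID
OFFSET = 6           # hypothetical shift past special tokens

def mask_span(ids: list[int], start: int, length: int):
    """Replace a contiguous span of byte IDs with the mask token.

    Returns the masked inputs and the original IDs at the masked
    positions, which serve as reconstruction targets.
    """
    masked = list(ids)
    masked[start:start + length] = [MASK_ID] * length
    return masked, ids[start:start + length]

ids = [b + OFFSET for b in b"Perceiver processes raw bytes."]
inputs, targets = mask_span(ids, 10, 9)   # mask the word "processes"
```

Because targets are bytes rather than subwords, the model must learn spelling as well as semantics, which is what makes tokenizer-free pre-training feasible.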

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to process raw bytes instead of tokens, combined with compute costs that grow only linearly with input size (the self-attention stack's cost is fixed), makes it highly efficient and flexible. It can handle multiple modalities and does not require a pre-trained tokenizer.

Q: What are the recommended use cases?

While the base model excels at masked language modeling, it's primarily designed for fine-tuning on specific downstream tasks. It's particularly useful for applications requiring efficient processing of long sequences or working with multiple modalities.
