Cable

Maintained By: axiomlaborg

Author: axiomlaborg
Model Variants: GPT-Medium (334M params), GPT-Tiny (44M params)
Maximum Sequence Length: Up to 8192 tokens (extrapolated)
Repository: HuggingFace

What is Cable?

Cable introduces context-aware biases for length extrapolation in transformer models. It handles sequences longer than those seen during training, outperforming traditional sinusoidal position embeddings while adding minimal runtime and memory overhead.
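
The exact formulation is specified in the Cable paper and repository; the sketch below is only a rough, hypothetical illustration of the general idea of biasing attention with content-dependent terms instead of relying on a fixed sinusoidal position table. The module name, the bias_proj layer, and the way the bias enters the attention logits are assumptions made for illustration, not Cable's actual implementation.

```python
# Hypothetical sketch of context-aware attention biasing (NOT Cable's exact formulation).
# Instead of adding fixed sinusoidal position embeddings to the token embeddings,
# a per-head bias derived from the token content is added to the attention logits.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareBiasedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Assumed: a small projection mapping each token to one scalar bias per head.
        self.bias_proj = nn.Linear(d_model, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)  # (B, H, T, d)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)    # (B, H, T, T)
        # Content-dependent bias for each key position, broadcast over query positions.
        bias = self.bias_proj(x).permute(0, 2, 1).unsqueeze(2)       # (B, H, 1, T)
        scores = scores + bias

        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        y = (attn @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.out(y)
```

Because the bias in this sketch is computed from the tokens themselves rather than looked up in a table indexed by absolute position, nothing in the module is tied to the sequence length used during training, which is the property that matters for extrapolation.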

Implementation Details

Cable implements a novel architecture that enables models trained on shorter sequences (e.g., T=1024) to effectively extrapolate to much longer sequences (up to T=8192) while maintaining performance. The implementation is based on the NanoGPT architecture and includes support for both single and multi-GPU training scenarios.
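
As a rough sketch of what the 1024-to-8192 extrapolation setup looks like in practice (the load_checkpoint helper and the model's forward signature below are placeholders, not the repository's actual API), a checkpoint trained with a 1024-token context is simply scored on longer evaluation windows:

```python
# Hypothetical sketch: measuring perplexity at an evaluation length longer than the
# training length. `load_checkpoint` and the forward signature stand in for whatever
# the Cable/NanoGPT repository actually provides; they are assumptions for illustration.
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, token_ids: torch.Tensor, eval_len: int = 8192) -> float:
    """Compute perplexity over non-overlapping windows of `eval_len` tokens."""
    model.eval()
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, token_ids.size(0) - 1, eval_len):
        chunk = token_ids[start : start + eval_len + 1]
        if chunk.size(0) < 2:
            break
        x, y = chunk[:-1].unsqueeze(0), chunk[1:].unsqueeze(0)
        logits = model(x)  # assumed to return (1, T, vocab_size) logits
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1), reduction="sum"
        )
        nll_sum += loss.item()
        n_tokens += y.numel()
    return math.exp(nll_sum / n_tokens)

# Usage (names hypothetical): a model trained at T=1024 scored on 8192-token windows.
# model = load_checkpoint("cable-gpt-medium.pt")
# ppl = perplexity(model, wikitext103_test_ids, eval_len=8192)
```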

  • Achieves PPL=22.22 on 8192-length sequences when trained on 1024-length sequences
  • Supports multiple model sizes (Tiny: 44M params, Medium: 334M params)
  • Compatible with various datasets including Fineweb-Edu and WikiText-103
  • Minimal runtime and memory overhead compared to vanilla transformers

Core Capabilities

  • Efficient length extrapolation without performance degradation
  • Comparable or better performance than models directly trained on longer sequences
  • Flexible deployment across different sequence lengths
  • Support for various downstream tasks including Hellaswag benchmark

Frequently Asked Questions

Q: What makes this model unique?

Cable's primary innovation lies in its ability to effectively handle sequence length extrapolation while maintaining performance comparable to or better than models specifically trained on longer sequences, all while adding negligible computational overhead.

Q: What are the recommended use cases?

Cable is particularly well-suited for applications requiring processing of long sequences, especially when training resources are limited. It's effective for tasks like language modeling on WikiText-103 and Fineweb-Edu datasets, and can be evaluated on benchmarks like Hellaswag.
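
For reference, the WikiText-103 test split can be pulled from the Hugging Face Hub with the datasets library; how the text is tokenized and fed to a Cable checkpoint depends on the repository's own tooling and is not shown here.

```python
# Minimal sketch of fetching an evaluation corpus for language modeling.
from datasets import load_dataset

# WikiText-103 (raw variant) test split from the Hugging Face Hub.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
text = "\n".join(row["text"] for row in wikitext)
print(f"Test split: {len(wikitext)} lines, {len(text):,} characters")
```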
