yolos-small

by hustvl
YOLOS-small: a 30.7M-parameter Vision Transformer for object detection, achieving 36.1 AP on COCO. Built by hustvl and released under the Apache 2.0 license.

Parameter Count: 30.7M
License: Apache 2.0
Paper: You Only Look at One Sequence (arXiv:2106.00666)
Performance: 36.1 AP on COCO

What is yolos-small?

YOLOS-small is a compact Vision Transformer (ViT) model designed specifically for object detection. Developed by hustvl, it takes a deliberately simple approach to transformer-based detection, using a plain ViT encoder with no task-specific necks or region proposals, and reaches 36.1 AP on COCO with a parameter count of only 30.7M.
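A minimal inference sketch using the Hugging Face Transformers API. The checkpoint id hustvl/yolos-small is the Hub name; the image path and the choice of Auto classes are illustrative assumptions, not part of the model card:

```python
# Minimal inference sketch, assuming recent versions of transformers,
# torch, and Pillow are installed.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

processor = AutoImageProcessor.from_pretrained("hustvl/yolos-small")
model = AutoModelForObjectDetection.from_pretrained("hustvl/yolos-small")

image = Image.open("street.jpg")  # placeholder path; any RGB image works

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One class-logit vector and one box in normalized (cx, cy, w, h)
# format per object query.
print(outputs.logits.shape)      # (batch, num_queries, num_labels + 1)
print(outputs.pred_boxes.shape)  # (batch, num_queries, 4)
```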

Implementation Details

The model employs a bipartite matching loss and processes images through a plain transformer architecture. It handles 100 object queries simultaneously and uses the Hungarian matching algorithm to assign each ground-truth object to exactly one query during training (a minimal sketch of this matching step follows the list below). The model was pre-trained on ImageNet-1k for 200 epochs and fine-tuned on COCO 2017 for 150 epochs.

  • Utilizes PyTorch framework for implementation
  • Supports F32 tensor operations
  • Implements DETR-style loss function
  • Combines L1 and generalized IoU loss for bounding boxes
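To make the matching step concrete, here is an illustrative sketch of DETR-style bipartite matching, not the exact YOLOS training code. The cost weights (1.0 for classification, 5.0 for the L1 box term) follow the DETR defaults, and the generalized-IoU term from the list above is omitted for brevity:

```python
# Illustrative DETR-style bipartite matching between the model's 100
# query predictions and the ground-truth boxes of one image.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries(class_probs, pred_boxes, gt_labels, gt_boxes,
                  w_class=1.0, w_l1=5.0):
    """Assign each ground-truth box to exactly one query.

    class_probs: (num_queries, num_classes) softmax probabilities
    pred_boxes:  (num_queries, 4) predicted (cx, cy, w, h), normalized
    gt_labels:   (num_targets,) ground-truth class indices
    gt_boxes:    (num_targets, 4) ground-truth (cx, cy, w, h), normalized
    """
    # Classification cost: negative probability of the true class.
    cost_class = -class_probs[:, gt_labels]                         # (Q, T)
    # Box cost: L1 distance between predicted and true boxes.
    cost_l1 = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)  # (Q, T)
    cost = w_class * cost_class + w_l1 * cost_l1
    # Hungarian algorithm: globally optimal one-to-one assignment.
    query_idx, target_idx = linear_sum_assignment(cost)
    return list(zip(query_idx, target_idx))
```

Queries left unmatched by this assignment are supervised toward a "no object" class, which is how the model learns to suppress its unused queries at inference time.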

Core Capabilities

  • Object detection with competitive accuracy (36.1 AP on COCO validation)
  • Simultaneous processing of 100 object queries per image
  • Efficient feature extraction from images
  • Bounding box prediction in normalized (cx, cy, w, h) format
  • Classification over the COCO object classes
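Continuing from the quickstart above, a hedged sketch of turning the raw query outputs into thresholded, human-readable COCO detections; the 0.9 threshold is an arbitrary illustrative choice:

```python
# Continues from the quickstart: `processor`, `model`, `outputs`, and
# `image` are assumed to be in scope.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]

for score, label, box in zip(results["scores"], results["labels"],
                             results["boxes"]):
    name = model.config.id2label[label.item()]  # COCO class name
    print(f"{name}: {score:.2f} at {[round(v, 1) for v in box.tolist()]}")
```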

Frequently Asked Questions

Q: What makes this model unique?

YOLOS-small stands out for its simplicity: it achieves 36.1 AP on COCO validation with a pure transformer-based architecture, dispensing with the region-proposal machinery of two-stage detectors such as Faster R-CNN.

Q: What are the recommended use cases?

The model is well suited to object detection tasks involving the COCO object classes, and especially to applications that need a good balance between model size and detection accuracy.
