arXiv

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Title: Ultralytics YOLO26: A Unified Framework for Real-Time End-to-End Vision Tasks

Abstract:

The YOLO family has been widely adopted to meet the growing need for real-time vision solutions that are accurate, efficient, and easy to deploy across diverse hardware platforms. However, conventional YOLO detectors face four significant bottlenecks: they depend on non-maximum suppression (NMS) during inference, use heavy detection heads built around Distribution Focal Loss (DFL), require long training schedules, and frequently fail to assign positive labels to the smallest objects. In this work, we introduce Ultralytics YOLO26, a family of real-time vision models designed to overcome these constraints through coordinated architectural and training innovations.
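For context, the NMS post-processing step that conventional detectors rely on can be sketched as a greedy loop over score-sorted boxes. This is a minimal, generic illustration rather than the Ultralytics implementation; the names `iou` and `greedy_nms`, the default threshold of 0.5, and the sample boxes are assumptions chosen for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of kept boxes, ordered from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

The loop is sequential and its cost depends on how many candidates survive, which is one reason an NMS-free, end-to-end head is attractive for deployment.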

YOLO26 employs a dual-head architecture that enables native end-to-end inference without NMS. Removing DFL entirely yields a lighter detection head with an unconstrained regression range. Training is strengthened by three components: MuSGD, a hybrid of Muon and SGD that adapts optimization techniques from large language model training; Progressive Loss, which gradually shifts supervision toward the head used at inference; and STAL, a label assignment mechanism that ensures small objects always receive positive label coverage.
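To illustrate why removing DFL frees the regression range, the sketch below contrasts a DFL-style decode (the expected bin index under a softmax over a fixed number of bins) with a direct scalar decode. It is a simplified illustration and not the YOLO26 head: `REG_MAX = 16`, the function names, and the non-negativity clamp in `direct_decode` are all assumptions made for the example.

```python
import math
from typing import List

REG_MAX = 16  # number of bins per box side; an assumed, commonly used value


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_decode(logits: List[float]) -> float:
    """DFL-style decode: expected bin index, bounded by len(logits) - 1."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def direct_decode(raw: float) -> float:
    """Direct regression (assumed form): one scalar per side, no bin ceiling."""
    return max(0.0, raw)  # clamp to a non-negative distance


# Even with all probability mass on the last bin, the DFL-style distance
# cannot exceed REG_MAX - 1 (in stride units).
peaked = [0.0] * (REG_MAX - 1) + [50.0]
dfl_distance = dfl_decode(peaked)

# A direct scalar output can represent a larger distance with one value.
direct_distance = direct_decode(40.0)
```

The DFL-style head also needs `REG_MAX` outputs per box side instead of one, which is where the lighter head comes from.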

Beyond standard detection, YOLO26 provides task-specific head and loss designs for instance segmentation, pose estimation, and oriented detection, delivering consistent improvements across tasks and scales. The family comprises five model sizes (n, s, m, l, x) and supports detection, instance segmentation, pose estimation, classification, and oriented detection in a single pipeline. It also includes YOLOE-26, an open-vocabulary extension that supports text-prompted, visual-prompted, and prompt-free inference.

Benchmark results show that YOLO26 advances the accuracy-latency Pareto frontier over previous real-time detectors. Across the five scales, the models reach 40.9 to 57.5 mAP on COCO, with T4 TensorRT latencies of 1.7 to 11.8 ms. The largest open-vocabulary variant, YOLOE-26x, reaches 40.6 AP on LVIS minival with text prompting. Source code and pre-trained models are publicly available at https://github.com/ultralytics/ultralytics.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
