arXiv

AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

June 2, 2026 · Junru Zhang, Lang Feng, Haoran Shi, Xu Guo, Han Yu, Yabo Dong, Duanqing Xu

Title: AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

Abstract:

While the application of multimodal large language models (MLLMs) to time-series anomaly detection (TSAD) is gaining traction, a significant hurdle persists: MLLMs typically depend on broad, heuristic-based time-series analysis and falter when tasked with the intricate, multi-dimensional reasoning required to interpret complex temporal data. To overcome this limitation, we introduce AnomSeer, a framework designed to anchor the model’s reasoning process in the precise, structural nuances of time-series data, thereby integrating anomaly classification, localization, and explanation into a unified workflow. Central to this approach is the generation of an expert chain-of-thought trace, which offers a verifiable, fine-grained reasoning path grounded in classical analytical methods, such as statistical metrics and frequency transformations.
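As a rough illustration of the kind of classical evidence such a trace could cite, the sketch below computes two simple quantities: z-score outliers (a statistical metric) and the dominant bin of a discrete Fourier transform (a frequency transformation). The function names, the threshold of 3.0, and the toy series are all assumptions made for illustration; the abstract does not specify which metrics or transforms AnomSeer actually uses.

```python
import cmath
import math
import statistics


def zscore_outliers(series, threshold=3.0):
    """Return indices whose absolute z-score exceeds `threshold` (hypothetical helper)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) / std > threshold]


def dominant_frequency(series):
    """Return the strongest non-DC DFT bin, in cycles per window (hypothetical helper)."""
    n = len(series)
    mean = statistics.fmean(series)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT coefficient for bin k; fine for a short toy series.
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


# Toy series: a sine wave with 4 cycles per window and one injected spike.
signal = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
signal[20] += 8.0

print("point-anomaly candidates:", zscore_outliers(signal))
print("dominant frequency bin:", dominant_frequency(signal))
```

Quantities like these are easy to verify independently, which is what makes a reasoning trace built on them checkable step by step.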

We further propose a novel strategy termed time-series grounded policy optimization (TimerPO). This method extends standard reinforcement learning by integrating two key mechanisms: a time-series grounded advantage calculated via optimal transport, and an orthogonal projection technique. The latter ensures that the supplementary granular signal does not disrupt the primary objective of anomaly detection. Experimental results across a variety of anomaly scenarios demonstrate that AnomSeer, utilizing Qwen2.5-VL-3B/7B-Instruct, surpasses larger commercial models like GPT-4o in both classification and localization accuracy, with notable improvements in handling point-driven and frequency-driven anomalies. Additionally, the model generates coherent time-series reasoning traces that substantiate its final determinations.
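One way to picture the orthogonal projection idea is the minimal sketch below. It assumes the primary (detection) advantages and the auxiliary (time-series grounded) advantages are plain vectors over a group of sampled responses, and it removes the component of the auxiliary signal that lies along the primary one before mixing them. The vector formulation, the `weight` parameter, and all function names are assumptions for illustration; the abstract does not state how TimerPO defines the projection, and the optimal-transport computation that would produce the auxiliary values is taken as given here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project_orthogonal(aux, primary, eps=1e-12):
    """Remove from `aux` its component parallel to `primary` (hypothetical helper)."""
    denom = dot(primary, primary)
    if denom < eps:
        # Primary signal is (near) zero, so there is nothing to project against.
        return list(aux)
    scale = dot(aux, primary) / denom
    return [g - scale * a for g, a in zip(aux, primary)]


def combine(primary, aux, weight=0.5):
    """Add the projected auxiliary signal to the primary one (weight is an assumption)."""
    aux_perp = project_orthogonal(aux, primary)
    return [a + weight * g for a, g in zip(primary, aux_perp)]


# Made-up advantages for four sampled responses.
primary_adv = [1.0, -1.0, 0.5, -0.5]
aux_adv = [0.2, 0.4, -0.1, 0.3]

combined = combine(primary_adv, aux_adv)
print("combined advantages:", combined)
print("component along primary:", dot(combined, primary_adv) / dot(primary_adv, primary_adv))
```

Under these assumptions, the combined vector keeps exactly the same component along the primary advantage as the primary advantage itself, so the auxiliary signal only adds information in directions the detection objective does not already use.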

Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC