Mean Flow Policy Optimization
Title: Optimizing Policies via Mean Flow
Abstract: Diffusion models have recently gained traction as expressive policy representations in online reinforcement learning (RL), but their iterative generation process incurs significant computational cost during both training and inference. To reduce this cost, we propose representing policies with MeanFlow models, a class of few-step, flow-based generative models, with the aim of improving both training and inference efficiency over existing diffusion-based RL methods. Within the maximum entropy RL framework, we optimize MeanFlow policies with soft policy iteration, which encourages exploration. Our method addresses two challenges specific to MeanFlow policies: evaluating action likelihoods and performing soft policy improvement. Experiments on the MuJoCo, DeepMind Control Suite, and HumanoidBench benchmarks show that the proposed method, Mean Flow Policy Optimization (MFPO), matches or outperforms current diffusion-based baselines while substantially reducing training and inference time. The source code is publicly available at https://github.com/dongxiaoyi-xyz/MFPO.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
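The abstract describes MeanFlow policies only at a high level. The sketch below illustrates two mechanics it mentions, few-step action sampling and action-likelihood evaluation, under explicitly stated assumptions. It follows our understanding of the MeanFlow formulation: an average velocity u(z_t, r, t) over the interval [r, t] and the update z_r = z_t - (t - r) u(z_t, r, t). The learned network is replaced by the exact average velocity of a toy 1-D Gaussian "action" distribution. MU, SIGMA, flow_map, average_velocity, sample_action, and action_log_prob are all hypothetical names. The log-density uses a generic change-of-variables formula with a finite-difference Jacobian. This is not a description of how MFPO itself evaluates likelihoods or performs soft policy improvement.

```python
import math
import random

# Toy 1-D setting (an assumption for illustration, not taken from the paper):
# "actions" x ~ N(MU, SIGMA^2), noise eps ~ N(0, 1), and the linear path
# z_t = (1 - t) * x + t * eps, so t = 0 is data and t = 1 is noise.
MU, SIGMA = 0.5, 0.2


def marginal_mean_std(t):
    """Mean and standard deviation of z_t under the toy Gaussian path."""
    return (1.0 - t) * MU, math.sqrt(((1.0 - t) * SIGMA) ** 2 + t ** 2)


def flow_map(z_t, t, r):
    """Exact probability-flow map z_t -> z_r for the toy path.

    In 1-D the ODE flow is monotone, so between two Gaussian marginals it is
    the affine map that matches their means and standard deviations.
    """
    m_t, s_t = marginal_mean_std(t)
    m_r, s_r = marginal_mean_std(r)
    return m_r + (s_r / s_t) * (z_t - m_t)


def average_velocity(z_t, r, t):
    """u(z_t, r, t) = (1 / (t - r)) * integral_r^t v(z_tau, tau) d tau.

    A MeanFlow network would learn this quantity; here the closed form for
    the toy path is a hypothetical stand-in for the trained model.
    """
    return (z_t - flow_map(z_t, t, r)) / (t - r)


def sample_action(eps, num_steps=1):
    """Few-step sampling from t = 1 to t = 0 via z_r = z_t - (t - r) * u."""
    z, t = eps, 1.0
    for k in range(num_steps):
        r = 1.0 - (k + 1) / num_steps  # evenly spaced times, ending at r = 0
        z = z - (t - r) * average_velocity(z, r, t)
        t = r
    return z


def log_normal(x, mean=0.0, std=1.0):
    """Log-density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def action_log_prob(eps, h=1e-5):
    """Change-of-variables log-density of a = g(eps) for the one-step map g.

    log p(a) = log N(eps; 0, 1) - log |dg/d eps|, with the 1-D Jacobian
    approximated by a central finite difference (a generic technique).
    """
    a = sample_action(eps, num_steps=1)
    jac = (sample_action(eps + h) - sample_action(eps - h)) / (2 * h)
    return a, log_normal(eps) - math.log(abs(jac))


if __name__ == "__main__":
    rng = random.Random(0)
    eps = rng.gauss(0.0, 1.0)
    one_step = sample_action(eps, num_steps=1)
    four_step = sample_action(eps, num_steps=4)
    a, logp = action_log_prob(eps)
    print(one_step, four_step, logp, log_normal(a, MU, SIGMA))
```

Because the toy average velocity is exact, one step and four steps give the same action up to rounding, and the change-of-variables log-density matches the Gaussian log-density of the sampled action. A learned average velocity is only approximate, so in practice the number of sampling steps trades accuracy against compute.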





