Make a Video Call with LLM: A Measurement Campaign over Six Mainstream Apps
Title: Evaluating LLM Video Calls: A Measurement Study of Six Leading Applications
Abstract: In 2025, providers of Large Language Model (LLM) services introduced a new feature: AI video chat. It lets users interact with AI agents through real-time video communication (RTC), much as they would with a human counterpart. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To fill this gap, we present a comprehensive benchmark covering four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we evaluate six mainstream AI video chatbots against this benchmark, and we build an online platform to support user studies. Our measurements yield several findings that can guide future optimization. For instance, network latency matters less in AI video chat than in human-to-human video calls, and the capabilities of the AI agent itself are the dominant factor in user experience. The results also raise several open research questions on how to optimize AI video chatbots. The online evaluation platform, along with our open-sourced dataset and testbed, is available at: https://callarena.net/.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




