Global News Digest

arXiv

Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages

Title: Evaluating Multimodal LLMs on Code Generation for Complex Interactive Webpages

Abstract:

Rapid progress in multimodal large language models (MLLMs) has brought significant advances in multimodal reasoning and code synthesis, with direct consequences for front-end engineering. Notably, these models can convert visual layouts directly into functional code, improving both the speed and flexibility of web development workflows. However, contemporary web applications are dynamic and involve intricate user-page interactions, a complexity that current evaluation frameworks fail to capture. Most existing benchmarks focus on static page generation and neglect the interactive behaviors found in real-world applications. Furthermore, standard evaluation metrics are typically restricted to visual accuracy and code architecture, disregarding interaction consistency between generated outputs and reference designs.

To bridge these gaps, we present WebIGBench, the first benchmark specifically designed to assess code generation for interactive webpages with complex user interactions. By combining manually curated interaction paths with UI automation, we compiled a dataset of 103 complex webpages sourced from live websites. The benchmark covers five common categories of interactive actions, such as clicking and inputting, totaling 871 distinct interactive events. We also introduce an evaluation pipeline for the automated assessment of these interactive behaviors. Comprehensive experiments with several representative MLLMs reveal the current performance limits of these models in generating code for interactive webpages. The WebIGBench benchmark is publicly available at https://github.com/anoa12159-hue/WebIGBench_eval.
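
To make the idea of interaction consistency concrete, here is a minimal illustrative sketch in Python. It is not the WebIGBench pipeline, which the abstract does not describe in detail. The `InteractionEvent` fields, the `interaction_consistency` function, the element selectors, and the outcome labels are all hypothetical, and exact-match scoring is an assumption chosen for simplicity. Only the two action names (click, input) and the dataset counts (103 pages, 871 events) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionEvent:
    action: str   # e.g. "click" or "input"; the abstract names only these two
    target: str   # hypothetical element selector
    outcome: str  # hypothetical label for the page state after the action


def interaction_consistency(reference, generated):
    """Fraction of reference events whose outcome the generated page reproduces.

    Assumption: an event counts as consistent only when the generated page
    yields exactly the same outcome label for the same (action, target) pair.
    """
    if not reference:
        return 0.0
    observed = {(e.action, e.target): e.outcome for e in generated}
    matched = sum(
        1 for e in reference if observed.get((e.action, e.target)) == e.outcome
    )
    return matched / len(reference)


# Dataset-level figure derived from the counts in the abstract.
AVG_EVENTS_PER_PAGE = 871 / 103

# Hypothetical replay of one interaction path on a reference and a generated page.
reference = [
    InteractionEvent("click", "#menu", "menu-open"),
    InteractionEvent("input", "#search", "suggestions-shown"),
]
generated = [
    InteractionEvent("click", "#menu", "menu-open"),
    InteractionEvent("input", "#search", "no-change"),
]
score = interaction_consistency(reference, generated)
print(score)
```

In this toy case, one of the two reference events is reproduced, so the score is 0.5. The counts in the abstract work out to roughly 8.5 interactive events per page on average.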


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
