arXiv

VGGSounder: Audio-Visual Evaluations for Foundation Models

June 4, 2026 · Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke

Title: VGGSounder: Audio-Visual Evaluations for Foundation Models

Original: arXiv:2508.08237v4 (announce type: replace-cross)

Abstract: As audio-visual foundation models continue to emerge, the need for robust assessment of their multi-modal comprehension has become increasingly critical. While the VGGSound dataset serves as a standard benchmark for audio-visual classification, our investigation highlights significant shortcomings within it, such as incomplete labeling, partially overlapping classes, and misaligned modalities. These issues result in skewed evaluations of both auditory and visual competencies. To overcome these challenges, we present VGGSounder, a newly re-annotated, multi-label test set that builds upon VGGSound and is tailored specifically for assessing audio-visual foundation models. VGGSounder incorporates detailed modality annotations, allowing for more accurate analysis of performance across individual modalities. Additionally, we uncover model limitations by examining performance declines when an additional input modality is introduced, utilizing our novel modality confusion metric.
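The abstract describes the modality confusion metric only at a high level, as a way to examine performance declines when an additional input modality is introduced. The sketch below is one possible reading, not the paper's definition: it assumes each model emits a single top-1 prediction per sample, that a prediction counts as correct when it appears in the sample's multi-label ground-truth set, and that "confusion" is the share of samples solved with one modality alone but missed once both are supplied. The function name `modality_confusion`, its arguments, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Set


def modality_confusion(
    labels: List[Set[str]],
    pred_audio: List[str],
    pred_visual: List[str],
    pred_av: List[str],
) -> Dict[str, float]:
    """Share of samples solved by one modality alone but missed with both inputs.

    Illustrative assumption only; the abstract does not give the exact formula.
    """
    if not (len(labels) == len(pred_audio) == len(pred_visual) == len(pred_av)):
        raise ValueError("all inputs must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one sample")

    audio_lost = 0   # correct from audio alone, wrong with audio + video
    visual_lost = 0  # correct from video alone, wrong with audio + video
    for gold, a, v, av in zip(labels, pred_audio, pred_visual, pred_av):
        av_correct = av in gold  # multi-label: any ground-truth class counts
        if a in gold and not av_correct:
            audio_lost += 1
        if v in gold and not av_correct:
            visual_lost += 1
    return {"audio": audio_lost / n, "visual": visual_lost / n}


# Toy data: four clips, the second one carrying two valid labels.
labels = [{"dog barking"}, {"playing guitar", "singing"}, {"car engine"}, {"rain"}]
pred_audio = ["dog barking", "singing", "rain", "rain"]
pred_visual = ["cat meowing", "playing guitar", "car engine", "rain"]
pred_av = ["cat meowing", "playing guitar", "car engine", "wind"]

scores = modality_confusion(labels, pred_audio, pred_visual, pred_av)
print(scores)
```

Under these assumptions, a nonzero score means that adding the second modality flipped some previously correct predictions to incorrect ones, which is the kind of decline the abstract refers to.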

Source: arXiv

Generated at: 2026-06-04 00:00:00 UTC

Bloomberg

Exelon CEO Sees Daily Cybersecurity Threats

June 4, 2026

Exelon’s CEO warns of daily cybersecurity threats, highlighting persistent risks to the energy giant.

TechCrunch

Ramp raises $750M at $44B valuation as investors hunger for fintechs with an AI story

June 4, 2026

Ramp secured $750M at a $44B valuation, driven by AI integration and $1.5B+ revenue. The fintech firm now serves 70,000 ...

TechCrunch

Is Silicon Valley ready to put robots in people’s homes? Hello Robot is.

June 4, 2026

Hello Robot’s Stretch avoids Silicon Valley hype, focusing on practical home deployment to gather essential real-world d...

Bloomberg

Canada to Provide Funding, Buy Equity Stakes in AI Startups

June 4, 2026

Canada will fund and buy equity stakes in AI startups to boost the sector. This investment aims to strengthen the nation...

TechCrunch

Chinese spies are using LinkedIn to lure Westerners into sharing sensitive information

June 4, 2026

A joint Western security alert warns that Chinese spies use LinkedIn to impersonate recruiters and extract sensitive dat...

Bloomberg

Peter Thiel’s Family Office Pays Record Rent for Top Miami Tower

June 4, 2026

Peter Thiel’s family office set a record rent for a Miami tower lease. This deal establishes a new benchmark for the cit...
