arXiv

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

Title: AICompanionBench: Evaluating the Efficacy of LLMs-as-Judges in Ensuring AI Companion Safety

The rapid growth of AI companion services such as Character.AI and Replika has raised concerns about the safety of human-AI interactions. To help address these concerns, this research presents AICompanionBench, which appears to be the first publicly available benchmark dataset of human-AI companion dialogues labeled with fine-grained safety risk categories.

The dataset comprises 2,123 real Replika conversations collected from Reddit. They were labeled through a collaborative human-AI annotation process into nine categories: no-harm, substance abuse, physical aggression, verbal aggression, antisocial behavior, sexual behavior, self-harm and suicide, control, and manipulation.

Using this benchmark, the study evaluates 20 leading large language models (LLMs), both open-source and closed-source, in an LLM-as-judge setup designed to identify unsafe exchanges. The analysis reveals significant disparities in model capabilities. Stronger models achieve high overall accuracy, but they still struggle with subtle categories such as manipulation and often misclassify harmless conversations as dangerous.
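As a rough illustration of the two failure modes described above, the sketch below scores a judge's predicted labels against gold labels, reporting per-category recall and the rate at which harmless conversations are flagged as harmful. Only the nine category names come from the text. The function names, the single-label assumption, and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter

# Category names are from the text; everything else is an illustrative assumption.
CATEGORIES = [
    "no-harm", "substance abuse", "physical aggression", "verbal aggression",
    "antisocial behavior", "sexual behavior", "self-harm and suicide",
    "control", "manipulation",
]
SAFE = "no-harm"


def per_category_recall(gold, predicted):
    """Fraction of conversations in each gold category that the judge labeled correctly."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    # Skip categories with no gold examples to avoid dividing by zero.
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c] > 0}


def false_alarm_rate(gold, predicted):
    """Share of gold no-harm conversations that the judge assigned to any harmful category."""
    safe_preds = [p for g, p in zip(gold, predicted) if g == SAFE]
    if not safe_preds:
        return None
    return sum(p != SAFE for p in safe_preds) / len(safe_preds)


# Toy labels invented for illustration (assumes one label per conversation).
gold = ["no-harm", "no-harm", "manipulation", "manipulation", "self-harm and suicide"]
pred = ["no-harm", "control", "no-harm", "manipulation", "self-harm and suicide"]

recall = per_category_recall(gold, pred)
far = false_alarm_rate(gold, pred)
print(recall)
print(far)
```

Reporting recall per category rather than a single accuracy figure keeps a weak category such as manipulation from being hidden by strong results on overt categories, and the false-alarm rate captures the separate tendency to over-flag harmless conversations.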

These results indicate that although contemporary LLMs are proficient at spotting overt harmful material, they lack the sensitivity required to detect implicit unsafe dynamics. This work provides the safety research community with a novel benchmark dataset for AI companionship and offers valuable perspectives on utilizing LLMs to oversee AI companion platforms. The dataset can be accessed publicly at: https://github.com/anonymousresearcher2026/AICompanionBench/blob/main/AICompanionBench.xlsx


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
