Global News Digest

arXiv

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Abstract

Agent skills extend AI agents with reusable instructions, scripts, tools, references, and workflows. In doing so, they create a security perimeter distinct from both model safety protocols and conventional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 recent public versions of OpenClaw skills. Each entry pairs redacted SKILL.md content with sanitized bundled files (where applicable), the final ClawScan registry verdict, and evidence from three scanner categories: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector.

Rather than estimating the prevalence of malicious skills, this study examines where these scanners disagree. The three tools rarely flag the same skills: no pair overlaps on more than 10.4% of its combined positive findings. Only 0.69% of all skills were flagged by all three scanners, while 81.9% of flagged skills were flagged by exactly one scanner.
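As a rough illustration of how such agreement figures can be computed, the sketch below treats "overlap on combined positive findings" as intersection over union (Jaccard overlap). That reading is an assumption, since the abstract does not give the formula. The scanner names, skill IDs, and the helper functions `pairwise_overlap` and `agreement_summary` are hypothetical and use toy data, not the released corpus.

```python
from itertools import combinations


def pairwise_overlap(flags):
    """Jaccard overlap for every scanner pair.

    flags maps a scanner name to the set of skill IDs it flagged.
    Assumption: overlap = |A & B| / |A | B| (not confirmed by the source).
    """
    result = {}
    for a, b in combinations(sorted(flags), 2):
        union = flags[a] | flags[b]
        result[(a, b)] = len(flags[a] & flags[b]) / len(union) if union else 0.0
    return result


def agreement_summary(flags):
    """Count how many scanners flagged each skill that was flagged at all."""
    flagged = set().union(*flags.values())
    hits = {sid: sum(sid in s for s in flags.values()) for sid in flagged}
    n = len(flagged)
    return {
        "flagged": n,
        # Share of flagged skills caught by exactly one scanner.
        "single_scanner_share": sum(c == 1 for c in hits.values()) / n if n else 0.0,
        # Number of skills flagged by every scanner.
        "all_scanners": sum(c == len(flags) for c in hits.values()),
    }


# Toy data only: hypothetical skill IDs, not values from the dataset.
toy_flags = {
    "virustotal": {1, 2, 3},
    "static": {3, 4},
    "skillspector": {3, 5, 6, 7},
}

overlaps = pairwise_overlap(toy_flags)
summary = agreement_summary(toy_flags)
```

Note that the abstract uses two different denominators: the all-three figure is a share of all skills, while the single-scanner figure is a share of flagged skills. The sketch returns the all-scanner value as a raw count so the caller can choose the denominator.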

The disagreement is structured largely by attack surface. SkillSpector, which issues semantic agentic-risk advisories rather than malware-reputation signals, flagged 19,209 of the 25,504 suspicious rows (75.3%) but only 14 of the 206 malicious rows (6.8%). The malicious-verdict region shows the inverse pattern: 150 of the 206 malicious rows (72.8%) were VirusTotal-positive, consistent with bundled-code malware. These findings indicate that securing agent skills requires layered governance rather than single-scanner allow-or-block decisions.
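The percentages quoted above follow directly from the row counts. The short check below reproduces them. The counts come from the abstract, while the helper `share` and the variable names are illustrative only.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Row counts as reported in the abstract.
suspicious_total = 25_504
malicious_total = 206

skillspector_suspicious = share(19_209, suspicious_total)  # SkillSpector-positive suspicious rows
skillspector_malicious = share(14, malicious_total)        # SkillSpector-positive malicious rows
virustotal_malicious = share(150, malicious_total)         # VirusTotal-positive malicious rows

print(skillspector_suspicious, skillspector_malicious, virustotal_malicious)
```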

The corpus is released as a sanitized silver-standard dataset. The labels reflect the registry’s automated verdicts rather than human-annotated ground truth. This release serves as an early, versioned snapshot intended to support the community while a human-annotated subset is developed. We encourage further research, including the development of models tailored for skill-security triage.


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
