Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?
Title: Deconstructing Molecular Toxicity: Is the Industry Prepared for Structure-Level Molecular Detoxification Using MLLMs?
Abstract:
Early-stage drug development is frequently derailed by toxicity issues. Molecular design and property prediction have advanced considerably, but molecular toxicity repair, defined as generating structurally valid molecular variants with reduced toxicity, has lacked both a systematic definition and a standardized benchmark. To fill this gap, we present ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) that targets molecular toxicity repair.
We construct a standardized dataset that covers 11 core tasks and 660 representative toxic molecules, spanning diverse toxicity mechanisms and levels of granularity. We also design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose ToxiEval, an automated evaluation framework that assesses repair success through a high-throughput chain combining toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity.
We systematically evaluate 43 mainstream general-purpose MLLMs and run extensive ablation studies on key factors such as evaluation metrics, candidate diversity, and failure attribution. The results show that current MLLMs still face substantial challenges on this task, yet they are beginning to show promising capabilities in understanding toxicity, adhering to semantic constraints, and performing structure-aware edits.
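The evaluation chain named in the abstract (toxicity endpoint prediction, synthetic accessibility, drug-likeness, structural similarity) can be illustrated with a minimal sketch. It is not the ToxiEval implementation: the class and function names, the score ranges, the threshold values, and the rule that a molecule counts as repaired when any one candidate passes every check are all assumptions made for illustration. The scores are assumed to be precomputed by external predictors, and the Tanimoto function operates on plain sets of fingerprint bit indices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateScores:
    """Precomputed scores for one proposed repair (hypothetical schema)."""
    toxicity_prob: float     # predicted probability of the toxic endpoint, 0..1
    synthetic_access: float  # synthetic accessibility score, lower is easier
    drug_likeness: float     # drug-likeness score, 0..1, higher is better
    similarity: float        # structural similarity to the original, 0..1


@dataclass(frozen=True)
class Thresholds:
    """Illustrative cut-offs: placeholders, not values taken from the paper."""
    max_toxicity_prob: float = 0.5
    max_synthetic_access: float = 6.0
    min_drug_likeness: float = 0.5
    min_similarity: float = 0.4


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprint bit sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def failed_criteria(s: CandidateScores, t: Thresholds = Thresholds()) -> list[str]:
    """Return the names of the checks that the candidate fails."""
    failures = []
    if s.toxicity_prob >= t.max_toxicity_prob:
        failures.append("toxicity")
    if s.synthetic_access > t.max_synthetic_access:
        failures.append("synthetic_accessibility")
    if s.drug_likeness < t.min_drug_likeness:
        failures.append("drug_likeness")
    if s.similarity < t.min_similarity:
        failures.append("similarity")
    return failures


def repair_succeeds(candidates: list[CandidateScores],
                    t: Thresholds = Thresholds()) -> bool:
    """Assumed rule: success if at least one candidate passes every check."""
    return any(not failed_criteria(c, t) for c in candidates)


if __name__ == "__main__":
    good = CandidateScores(0.1, 3.0, 0.7, 0.6)
    bad = CandidateScores(0.8, 3.0, 0.7, 0.2)
    print(failed_criteria(bad))          # names of the failed checks
    print(repair_succeeds([bad, good]))  # whether any candidate passes all checks
```

Returning the list of failed checks, rather than a single boolean, keeps the sketch usable for the kind of failure attribution the abstract mentions: each rejected candidate records which criterion it missed.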
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC



