Technology
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT (Multi-Agent Reinforcement Fine-Tuning) targets LLM-based multi-agent systems, offering a novel Markov Game formulation and a scalable framework to overcome the challenges of traditional multi-agent reinforcement learning (MARL).
A Lightweight Context-Driven Training-Free Network for Scene Text Segmentation and Recognition
This training-free, lightweight framework uses context-driven segmentation to accelerate scene text recognition. It achieves state-of-the-art performance with significantly reduced computational resources.
Erased but Not Forgotten: How Backdoors Compromise Concept Erasure
Backdoors can bypass concept erasure in diffusion models, persisting even after removal attempts. This study reveals that such attacks compromise safety protocols, exposing harmful content despite robust mitigation strategies.
A Survey of 3D Reconstruction with Event Cameras
This survey reviews event-based 3D reconstruction methods, classifying them by input modality and technique. It also covers datasets and identifies key challenges for future research.
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
DetailMaster is a new benchmark that evaluates text-to-image (T2I) models on long, complex prompts. It reveals significant performance bottlenecks and highlights the need for specialized training to handle detailed inputs effectively.
Simulating Macroeconomic Expectations in Survey Experiments with LLM-based Economic Agents
This study uses LLM-based agents to simulate macroeconomic expectations in surveys, closely mirroring human data. It highlights that prior expectations and diverse information sources are crucial for replicating human-like reasoning and distributions.
Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures
DSR-Bench evaluates LLMs’ structural reasoning via data structures, revealing significant limitations. Even top models scored only 0.46 on hard tasks, struggling with spatial, contextual, and self-referential reasoning.
Value-Free Policy Optimization via Reward Partitioning
Reward Partition Optimization (RPO) eliminates value function estimation by normalizing rewards via prompt-level distributions. It outperforms baselines like DRO and KTO, offering stable, aligned, and diverse outputs without auxiliary models.
Cooperation of Experts: Fusing Heterogeneous Information with Large Margin
The Cooperation of Experts (CoE) framework integrates heterogeneous data via domain-specific encoders collaborating through large margin optimization. It demonstrates superior performance and robustness across diverse benchmarks.
GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks
GFlowGR fine-tunes generative recommendation models using GFlowNets to mitigate exposure bias. It leverages collaborative knowledge and diverse sampling to enhance alignment with recommendation data.
Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
This paper introduces spherical Cauchy VAEs, offering a stable, fast alternative to von Mises-Fisher models by avoiding expensive Bessel functions. The method enables efficient, exact reparameterization and robust KL divergence computation for hyperspherical latent spaces.
Truth, Trust, and Trouble: Medical AI on the Edge
This study benchmarks medical LLMs, finding AlpaCare-13B leads in accuracy and safety. While few-shot prompting boosts performance, models struggle with complex queries, highlighting trade-offs between truth, trust, and helpfulness.
Model Parallelism With Subnetwork Data Parallelism
Subnetwork Data Parallelism (SDP) reduces per-device memory by 28–60% by training structured subnetworks without activation exchange. It maintains or improves performance while eliminating expensive communication overheads.
Beyond Model Base Retrieval: Weaving Knowledge to Master Fine-grained Neural Network Design
M-DESIGN uses retrieval-augmented refinement and evidence graphs to optimize neural network design efficiently. It outperforms baselines in 26/33 scenarios, achieving top performance under strict computational budgets.
AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research
AblationBench evaluates LMs on planning ablation experiments, revealing that top models achieve only 45% accuracy and fall short of human performance.
FedS2R: One-Shot Federated Domain Generalization for Synthetic-to-Real Semantic Segmentation in Autonomous Driving
FedS2R is a one-shot federated framework for synthetic-to-real semantic segmentation in autonomous driving. It outperforms individual client models and trails centralized training by only 2 mIoU points.
Graph is a Natural Regularization: Revisiting Vector Quantization for Graph Representation Learning
RGVQ addresses codebook collapse in graph vector quantization via topology-aware regularization and Gumbel-Softmax. It significantly boosts codebook utilization and downstream performance.
From Graph Retrieval to Schema Realization: Counterfactual Validation for Text-to-SPARQL over Heterogeneous Knowledge Graphs
SchemaForge is an agentic framework for Text-to-SPARQL over heterogeneous knowledge graphs, using counterfactual validation to align schemas. It outperforms baselines by 11.5% in execution accuracy across four benchmarks.
Toward accurate RUL and SoH estimation using reinforced graph-based physics-informed neural networks enhanced with dynamic weights
RGPD uses reinforced graph-based physics-informed neural networks with dynamic weighting to improve remaining useful life (RUL) and state of health (SoH) estimation. It achieves up to a 20% reduction in MAPE across diverse degradation datasets.
Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs
The study reveals how ethical reasoning in LLMs creates vulnerabilities exploited by the TRIAL red-teaming protocol. It proposes ERR, a defensive framework using Layer-Stratified Harm-Gated LoRA to mitigate these reasoning-driven attacks.