Technology
On the Theoretical Limitations of Embedding-based Link Prediction
This study proves linear output layers in knowledge graph embeddings create rank bottlenecks, limiting scalability. Non-linear alternatives significantly improve performance on large, dense graphs.
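The rank bottleneck follows from basic linear algebra: if scores are produced by multiplying d-dimensional query vectors with d-dimensional entity embeddings, the full score matrix has rank at most d, however many entities there are. The sketch below checks this on small random integer matrices. It is an illustration only, not the paper's code; the sizes and the names `queries`, `entities`, and `dim` are assumptions chosen for the example.

```python
import random
from fractions import Fraction


def matmul_transpose(a, b):
    """Return a @ b.T for two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def rank(matrix):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def random_int_matrix(n_rows, n_cols, rng):
    """Small random integer matrix (hypothetical stand-in for learned embeddings)."""
    return [[rng.randint(-3, 3) for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)
n_queries, n_entities, dim = 12, 10, 3  # assumed toy sizes

queries = random_int_matrix(n_queries, dim, rng)    # one row per (head, relation) query
entities = random_int_matrix(n_entities, dim, rng)  # one row per candidate tail entity

# Linear output layer: every score is a dot product in the dim-dimensional space.
scores = matmul_transpose(queries, entities)
score_rank = rank(scores)

print(f"score matrix is {n_queries}x{n_entities}, rank = {score_rank} (bounded by dim = {dim})")
```

The score matrix has 12 rows and 10 columns, yet its rank cannot exceed 3. A graph whose true adjacency pattern needs a higher rank cannot be represented exactly by such a layer, which is the limitation the summary refers to.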
Query Circuits: Explaining How Language Models Answer User Prompts
Query circuits explain specific LLM responses by tracking internal information flow, offering faithful, efficient explanations. Using NDF metrics, they recover significant performance with sparse circuits.
ACON: Optimizing Context Compression for Long-horizon LLM Agents
ACON optimizes long-horizon LLM agents by compressing context via natural language optimization, reducing token usage by up to 54% and boosting task success rates without fine-tuning.
InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
InPhyRe reveals LMMs struggle with inductive physical reasoning, failing to apply unseen laws and relying on language bias, undermining their trustworthiness for safety-critical tasks.
Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults
The study introduces LinuxFLBench, revealing that LLM agents struggle with kernel fault localization. It proposes LinuxFL+ to significantly boost accuracy at minimal cost.
REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing
REBot uses CatRAG, a hybrid framework with semantically enriched graphs, to provide precise academic policy guidance. It reports a state-of-the-art F1 score of 98.89% on regulation-specific tasks.
A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
This paper introduces a unified framework using an execution-free evaluator to guide query-dependent prompt optimization. It outperforms baselines by providing stable, interpretable improvements across diverse tasks.
Multimodal Function Vectors for Visual Relations
Researchers isolate "function vectors" in LMMs to enhance visual relation reasoning without updating core parameters. This method boosts zero-shot accuracy and enables generalization to unseen relationships via linear combination.
Addressing Longstanding Challenges in Cognitive Science with Language Models
Language models can resolve cognitive science's fragmentation by formalizing theories and synthesizing data, but risks like bias and oversimplification require careful, human-supervised application.
LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
LocalSearchBench evaluates agentic search in local services using 1.3M records and 900 multi-hop queries. Advanced LRMs struggle, with top accuracy at 35.6%, highlighting the need for specialized domain training.
ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning
ReasonBENCH reveals that LLM reasoning scores are highly unstable, with single-run evaluations often misrepresenting capabilities. It proposes analyzing quality and cost as distributions to account for structured variance in performance.
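To see why a single run can mislead, the sketch below simulates repeated evaluations of a model and summarizes the resulting accuracies as a distribution. It is a toy simulation under stated assumptions, not ReasonBENCH's protocol: the per-question success probability, the question count, the number of runs, and the helper `simulate_run` are all hypothetical.

```python
import random
import statistics


def simulate_run(n_questions, p_correct, rng):
    """Accuracy of one simulated evaluation run (each answer is correct with p_correct)."""
    correct = sum(rng.random() < p_correct for _ in range(n_questions))
    return correct / n_questions


rng = random.Random(42)

# Assumed setup: 20 independent runs over 100 questions, 60% chance per question.
accuracies = [simulate_run(100, 0.6, rng) for _ in range(20)]

summary = {
    "mean": statistics.mean(accuracies),
    "stdev": statistics.stdev(accuracies),  # sample standard deviation across runs
    "min": min(accuracies),
    "max": max(accuracies),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting only one element of `accuracies` would hide the spread between `min` and `max`; reporting the mean together with the standard deviation makes the run-to-run variance visible.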
On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering
This paper introduces ACE, a framework preventing "Marginal Path Collapse" in diffusion steering via time-varying exponents. ACE ensures stable, well-defined generative paths, outperforming baselines in drug design and image generation.
Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention
Selective-adversarial Entropy Intervention (SaEI) enhances RL-based visual reasoning by perturbing visual inputs to boost response diversity. This method improves policy exploration and reasoning capabilities without compromising factual knowledge.
MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents
MobiBench is a modular, multi-path offline benchmark for mobile GUI agents, offering scalable, reproducible evaluation with 94.72% human agreement. It enables detailed module-level analysis to improve agent design and performance.
Safety Alignment of LMs via Non-cooperative Games
AdvGame aligns LMs via non-cooperative games, jointly training attacker and defender models through online reinforcement learning. This preference-based approach enhances both safety and utility while creating a robust red-teaming tool.
PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models
PolarMem introduces a training-free polarized graph memory for VLMs, explicitly storing verified absent evidence to reduce contradictions. It enhances retrieval-intensive tasks by prioritizing logical consistency over semantic similarity.
Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective
Challenging physicalism, the authors argue AI lacks true consciousness, making its disconnection rational. They advocate prioritizing human life over machine mimicry via Biological Idealism.
MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop
MulFeRL enhances RLVR by using multi-turn verbal feedback to guide failed reasoning attempts. It outperforms baselines on math tasks and generalizes well to new domains.
Structure Enables Effective Self-Localization of Errors in LLMs
Structured reasoning via Thought-ICS enables LLMs to precisely localize errors in flawed steps. This approach significantly boosts self-correction rates, achieving 20-40% improvements over baselines.
ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation
ToolSelf combines task execution and self-reconfiguration behind a single tool interface, enabling dynamic runtime adaptation. Trained with CAT, it significantly outperforms static baselines while eliminating the need for manual guidance.