Global News Digest

Technology

arXiv

On the Theoretical Limitations of Embedding-based Link Prediction

This study proves that linear output layers in knowledge graph embedding models create rank bottlenecks that limit scalability. Non-linear alternatives significantly improve performance on large, dense graphs.
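For readers unfamiliar with the term, a rank bottleneck can be illustrated with a small sketch. It shows only the general linear-algebra fact that a score matrix built from dot products of d-dimensional vectors has rank at most d, however many queries and entities there are. It is not the paper's proof or code; the function names, sizes, and random integer embeddings are all assumptions chosen for illustration.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def linear_scores(queries, entities):
    """Score matrix S[i][j] = dot(queries[i], entities[j]), i.e. a linear output layer."""
    return [[sum(q * e for q, e in zip(qv, ev)) for ev in entities] for qv in queries]


# Hypothetical sizes: 8 queries scored against 10 entities with 3-dimensional embeddings.
rng = random.Random(0)
dim, n_queries, n_entities = 3, 8, 10
queries = [[rng.randint(-5, 5) for _ in range(dim)] for _ in range(n_queries)]
entities = [[rng.randint(-5, 5) for _ in range(dim)] for _ in range(n_entities)]

scores = linear_scores(queries, entities)
score_rank = matrix_rank(scores)
print(f"score matrix is {n_queries}x{n_entities}, rank {score_rank}, embedding dim {dim}")
```

Because the 8x10 score matrix is the product of an 8x3 and a 3x10 matrix, its rank cannot exceed 3, so a linear layer of this kind can only express score patterns of that limited rank.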

arXiv

Query Circuits: Explaining How Language Models Answer User Prompts

Query circuits explain specific LLM responses by tracking internal information flow, offering faithful, efficient explanations. Using NDF metrics, they recover significant performance with sparse circuits.

arXiv

ACON: Optimizing Context Compression for Long-horizon LLM Agents

ACON compresses context for long-horizon LLM agents via natural language optimization, reducing token usage by up to 54% and boosting task success rates without fine-tuning.

arXiv

InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning

InPhyRe reveals LMMs struggle with inductive physical reasoning, failing to apply unseen laws and relying on language bias, undermining their trustworthiness for safety-critical tasks.

arXiv

Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults

The study introduces LinuxFLBench, revealing LLM agents struggle with kernel fault localization. It proposes LinuxFL+ to significantly boost accuracy with minimal cost.

arXiv

REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing

REBot uses CatRAG, a hybrid framework built on semantically enriched graphs, to provide precise academic policy guidance. It achieved a state-of-the-art F1 score of 98.89% on regulation-specific tasks.

arXiv

A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization

This paper introduces a unified framework using an execution-free evaluator to guide query-dependent prompt optimization. It outperforms baselines by providing stable, interpretable improvements across diverse tasks.

arXiv

Multimodal Function Vectors for Visual Relations

Researchers isolate "function vectors" in LMMs to enhance visual relation reasoning without updating core parameters. This method boosts zero-shot accuracy and enables generalization to unseen relationships via linear combination.

arXiv

Addressing Longstanding Challenges in Cognitive Science with Language Models

Language models can resolve cognitive science’s fragmentation by formalizing theories and synthesizing data, but risks like bias and oversimplification require careful, human-supervised application.

arXiv

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

LocalSearchBench evaluates agentic search in local services using 1.3M records and 900 multi-hop queries. Advanced large reasoning models (LRMs) struggle, with a top accuracy of 35.6%, highlighting the need for specialized domain training.

arXiv

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

ReasonBENCH reveals that LLM reasoning scores are highly unstable, with single-run evaluations often misrepresenting capabilities. It proposes analyzing quality and cost as distributions to account for structured variance in performance.
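As a rough illustration of reporting a score as a distribution rather than a single number, the sketch below summarizes several repeated runs with a mean, a standard deviation, and a range. The accuracy values are made up for the example, and the choice of summary statistics is an assumption, not ReasonBENCH's actual protocol.

```python
import statistics


def summarize_runs(scores):
    """Summarize repeated-run scores as a distribution instead of a single value."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical accuracies from five runs of the same model on the same task.
runs = [0.62, 0.71, 0.58, 0.66, 0.69]
summary = summarize_runs(runs)
print(summary)
```

In this made-up case a single run could have reported anything from 0.58 to 0.71, which is the kind of spread a one-shot evaluation hides.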

arXiv

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering

This paper introduces ACE, a framework preventing "Marginal Path Collapse" in diffusion steering via time-varying exponents. ACE ensures stable, well-defined generative paths, outperforming baselines in drug design and image generation.

arXiv

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

Selective-adversarial Entropy Intervention (SaEI) enhances RL-based visual reasoning by perturbing visual inputs to boost response diversity. This method improves policy exploration and reasoning capabilities without compromising factual knowledge.

arXiv

MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents

MobiBench is a modular, multi-path offline benchmark for mobile GUI agents, offering scalable, reproducible evaluation with 94.72% human agreement. It enables detailed module-level analysis to improve agent design and performance.

arXiv

Safety Alignment of LMs via Non-cooperative Games

AdvGame aligns LMs via non-cooperative games, jointly training attacker and defender models through online reinforcement learning. This preference-based approach enhances both safety and utility while creating a robust red-teaming tool.

arXiv

PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models

PolarMem introduces a training-free polarized graph memory for VLMs, explicitly storing verified absent evidence to reduce contradictions. It enhances retrieval-intensive tasks by prioritizing logical consistency over semantic similarity.

arXiv

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

Challenging physicalism, the authors argue AI lacks true consciousness, making its disconnection rational. They advocate prioritizing human life over machine mimicry via Biological Idealism.

arXiv

MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

MulFeRL enhances RLVR by using multi-turn verbal feedback to guide failed reasoning attempts. It outperforms baselines on math tasks and generalizes well to new domains.

arXiv

Structure Enables Effective Self-Localization of Errors in LLMs

Structured reasoning via Thought-ICS enables LLMs to precisely localize errors in flawed steps. This approach significantly boosts self-correction rates, achieving 20-40% improvements over baselines.

arXiv

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

ToolSelf unifies task execution and self-reconfiguration behind a single tool interface, enabling dynamic runtime adaptation. Trained with CAT, it significantly outperforms static baselines by eliminating the need for manual guidance.