Global News Digest

Technology

arXiv

Characterizing Web Search in The Age of Generative AI

This study compares traditional and generative web search, revealing differences in knowledge sources, citation diversity, and output stability. It highlights the need for new evaluation metrics to assess retrieval behavior and synthesis quality in generative AI systems.

arXiv

Learning-To-Measure: In-Context Active Feature Acquisition

Learning-to-Measure (L2M) is a meta-active feature acquisition framework that learns in-context policies across diverse tasks. It outperforms baselines in high-missingness, low-label scenarios without per-task retraining.

arXiv

CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects

CaptionFormer unifies segmentation, tracking, and captioning using VLM-generated synthetic data. It achieves state-of-the-art results on DVOC benchmarks, with code and datasets available.

arXiv

CARES: Context-Aware Resolution Selector for VLMs

CARES is a lightweight module that selects minimal sufficient image resolutions for VLMs, maintaining accuracy while reducing computational costs by up to 80%.

arXiv

Video Reasoning without Training

V-Reason enables video reasoning without training by using an entropy-guided controller to optimize inference. It matches RL-based accuracy while reducing token usage by 58.6%.

arXiv

Generative AI and Sales Productivity: Field Experiments in Online Retail

Large-scale experiments show GenAI boosts online retail sales by up to 16.3% via improved conversion rates, without harming satisfaction.

arXiv

Symbolic Neural Generation with Applications to Lead Discovery in Drug Design

This study introduces Symbolic Neural Generators (SNGs), hybrid neurosymbolic systems combining ILP with LLMs for drug design. SNGs generate valid molecular inhibitors, matching state-of-the-art performance in known targets and showing promise for novel ones.

arXiv

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

First-party AI reports cover social impacts with little depth, while third-party evaluations offer rigorous scrutiny but lack access to internal data. This gap highlights an urgent need for enforced transparency and independent frameworks.

arXiv

NILC: Discovering New Intents with LLM-assisted Clustering

NILC is an LLM-assisted clustering framework for New Intent Discovery that iteratively refines centroids and rewrites ambiguous utterances. It significantly outperforms baselines in both unsupervised and semi-supervised settings.

arXiv

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

This paper proves grokking stems from gradient descent minimizing weight norms on the zero-loss manifold. Experiments confirm this framework reproduces delayed generalization and representation learning.

arXiv

Optimizing Diversity and Quality through Base-Aligned Model Collaboration

BACo dynamically merges base and aligned LLMs at inference to boost diversity and quality without retraining. It outperforms baselines, achieving a 21.3% joint improvement across open-ended generation tasks.

arXiv

RoboBenchMart: Benchmarking Robots in Retail Environment

RoboBenchMart is an open-source simulated benchmark evaluating VLA robot performance in complex retail "dark-store" environments. It reveals that current models struggle with generalization, and its full suite is released to advance research.

arXiv

Latent Reasoning in TRMs is Secretly a Policy Improvement Operator

This study shows that latent reasoning in TRMs acts as a policy improvement operator. Novel training methods built on this insight reduce computational cost 18x without accuracy loss.

arXiv

Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching Activities

Transformer networks outperformed BLSTMs in predicting 3D posture during load-reaching, achieving 58% greater accuracy. A novel cost function further reduced errors by enforcing constant body segment lengths.
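The idea of a cost function that enforces constant body segment lengths can be illustrated with a minimal sketch. This is not the paper's formulation: the toy skeleton in SEGMENTS, the function names, the squared-error form, and the weight parameter are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy skeleton: each pair is (parent joint, child joint)
# forming one rigid body segment. A real skeleton would have more joints.
SEGMENTS = [(0, 1), (1, 2), (2, 3)]


def segment_lengths(pose, segments=SEGMENTS):
    """Return segment lengths for a pose array of shape (..., joints, 3)."""
    pose = np.asarray(pose, dtype=float)
    starts = pose[..., [a for a, _ in segments], :]
    ends = pose[..., [b for _, b in segments], :]
    return np.linalg.norm(ends - starts, axis=-1)


def posture_loss(pred, target, ref_lengths, weight=1.0, segments=SEGMENTS):
    """Joint-position error plus a penalty on segment-length drift.

    ref_lengths holds the subject's fixed segment lengths; the penalty
    grows when predicted segments stretch or shrink relative to them.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    position_term = np.mean(np.sum((pred - target) ** 2, axis=-1))
    drift = segment_lengths(pred, segments) - np.asarray(ref_lengths, dtype=float)
    length_term = np.mean(drift ** 2)
    return position_term + weight * length_term
```

In this sketch, a prediction that matches the target and preserves the reference lengths has zero loss, while a prediction that stretches a segment is penalized twice: once for the joint-position error and once for the length drift.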

arXiv

Latent Collaboration in Multi-Agent Systems

LatentMAS enables direct, lossless collaboration among LLM agents in continuous latent space, bypassing text mediation. It achieves up to 14.6% higher accuracy and 4x faster inference than text-based baselines.

arXiv

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

SpeedAug accelerates robotic policies via tempo-enriched priors and RL fine-tuning. It boosts throughput 1.8x while maintaining high success rates.

arXiv

From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model

This paper introduces the TAD benchmark and two training-free methods, Scene-CoT and TCogMap, to enhance temporal reasoning in autonomous driving VLMs, significantly boosting performance over current SoTA models.

arXiv

Understanding the Effects of Distractors on Reasoning Vision-Language Models

This study finds visual distractors reduce VLM accuracy without extending reasoning, unlike textual distractors. It introduces the Idis dataset and a prompting technique to mitigate these effects.

arXiv

ShelfAware: Real-Time Semantic Localization in Quasi-Static Environments with Low-Cost Sensors

ShelfAware enables robust real-time localization in quasi-static indoor spaces by using semantic particle filters. It achieves high accuracy on low-cost hardware by treating scene semantics as statistical evidence.

arXiv

InFerActive: Interactive Tree-Based Exploration of LLM Sampling for Safety Evaluation

InFerActive is an interactive tree-based platform for LLM safety evaluation that cuts the number of samples required by a factor of five. It improves efficiency and coverage compared to traditional spreadsheet methods.