Technology
Characterizing Web Search in The Age of Generative AI
This study compares traditional and generative web search, revealing differences in knowledge sources, citation diversity, and output stability. It highlights the need for new evaluation metrics to assess retrieval behavior and synthesis quality in generative AI systems.
Learning-To-Measure: In-Context Active Feature Acquisition
Learning-to-Measure (L2M) is a meta-active feature acquisition framework that learns in-context policies across diverse tasks. It outperforms baselines in high-missingness, low-label scenarios without per-task retraining.
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
CaptionFormer unifies segmentation, tracking, and captioning using VLM-generated synthetic data. It achieves state-of-the-art results on DVOC benchmarks, with code and datasets available.
CARES: Context-Aware Resolution Selector for VLMs
CARES is a lightweight module that selects minimal sufficient image resolutions for VLMs, maintaining accuracy while reducing computational costs by up to 80%.
Video Reasoning without Training
V-Reason enables video reasoning without training by using an entropy-guided controller to optimize inference. It matches RL-based accuracy while reducing token usage by 58.6%.
Generative AI and Sales Productivity: Field Experiments in Online Retail
Large-scale experiments show GenAI boosts online retail sales by up to 16.3% via improved conversion rates, without harming satisfaction.
Symbolic Neural Generation with Applications to Lead Discovery in Drug Design
This study introduces Symbolic Neural Generators (SNGs), hybrid neurosymbolic systems that combine ILP with LLMs for drug design. SNGs generate valid molecular inhibitors, matching state-of-the-art performance on known targets and showing promise for novel ones.
Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations
First-party AI reports cover social impacts only shallowly, while third-party evaluations are more rigorous but lack access to internal data. This gap points to an urgent need for enforced transparency and independent evaluation frameworks.
NILC: Discovering New Intents with LLM-assisted Clustering
NILC is an LLM-assisted clustering framework for New Intent Discovery that iteratively refines centroids and rewrites ambiguous utterances. It significantly outperforms baselines in both unsupervised and semi-supervised settings.
The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold
This paper proves grokking stems from gradient descent minimizing weight norms on the zero-loss manifold. Experiments confirm this framework reproduces delayed generalization and representation learning.
Optimizing Diversity and Quality through Base-Aligned Model Collaboration
BACo dynamically merges base and aligned LLMs at inference to boost diversity and quality without retraining. It outperforms baselines, achieving a 21.3% joint improvement across open-ended generation tasks.
RoboBenchMart: Benchmarking Robots in Retail Environment
RoboBenchMart is an open-source simulated benchmark evaluating VLA robot performance in complex retail "dark-store" environments. It reveals that current models struggle with generalization, and the full suite is released to advance research.
Latent Reasoning in TRMs is Secretly a Policy Improvement Operator
This study reveals latent reasoning in TRMs as a policy improvement operator, enabling an 18x reduction in computational cost without accuracy loss through novel training methods.
Evaluating the Performance of Deep Learning Models in Whole-body Dynamic 3D Posture Prediction During Load-reaching Activities
Transformer networks outperformed BLSTMs in predicting 3D posture during load-reaching, achieving 58% greater accuracy. A novel cost function further reduced errors by enforcing constant body segment lengths.
Latent Collaboration in Multi-Agent Systems
LatentMAS enables direct, lossless collaboration among LLM agents in continuous latent space, bypassing text mediation. It achieves up to 14.6% higher accuracy and 4x faster inference than text-based baselines.
SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning
SpeedAug accelerates robotic policies via tempo-enriched priors and RL fine-tuning. It boosts throughput 1.8x while maintaining high success rates.
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
This paper introduces the TAD benchmark and two training-free methods, Scene-CoT and TCogMap, to enhance temporal reasoning in autonomous driving VLMs, significantly boosting performance over current state-of-the-art models.
Understanding the Effects of Distractors on Reasoning Vision-Language Models
This study finds visual distractors reduce VLM accuracy without extending reasoning, unlike textual distractors. It introduces the Idis dataset and a prompting technique to mitigate these effects.
ShelfAware: Real-Time Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
ShelfAware enables robust real-time localization in quasi-static indoor spaces by using semantic particle filters. It achieves high accuracy on low-cost hardware by treating scene semantics as statistical evidence.
InFerActive: Interactive Tree-Based Exploration of LLM Sampling for Safety Evaluation
InFerActive is an interactive tree-based platform that optimizes LLM safety evaluation by reducing sample requirements by 5x. It improves efficiency and coverage compared to traditional spreadsheet methods.