Technology
RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models
RA-LWLM is a training-free framework that uses retrieval-augmented in-context learning for accurate cross-scene wireless localization. It combines a frozen foundation model with a mixture-of-experts transformer, maintaining high accuracy across diverse environments without retraining.
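The retrieval step behind retrieval-augmented in-context learning can be sketched as a nearest-neighbor lookup: embed the query measurement, fetch the most similar labeled examples, and place them before the query as demonstrations. This is a minimal sketch under assumptions the summary does not state: cosine similarity, a small in-memory database of (embedding, location) pairs, and the hypothetical helper names `retrieve_context` and `build_prompt`. The frozen foundation model and the mixture-of-experts transformer that would consume the sequence are not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_context(query_emb, database, k=3):
    """Return the k (embedding, location) pairs most similar to the query.

    `database` is a hypothetical list of (embedding, (x, y)) pairs whose
    embeddings are assumed to come from a frozen foundation model.
    """
    ranked = sorted(database, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    return ranked[:k]

def build_prompt(query_emb, database, k=3):
    """Place retrieved demonstrations before the query, whose location is unknown."""
    return retrieve_context(query_emb, database, k) + [(query_emb, None)]
```

For example, with a database of three labeled embeddings, `build_prompt([1.0, 0.05], db, k=2)` keeps the two embeddings closest in angle to the query and appends the query with an unknown (`None`) location.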
Collaborative Space Object Detection with Multi-Satellite Viewpoints in LEO Constellations
This study demonstrates that fusing observations from multiple satellite viewpoints in LEO constellations substantially improves space object detection accuracy. Using YOLOv9-m, three-view inputs improved mAP50 by up to 36.3% over single-view baselines, strengthening space situational awareness.
Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection
This study finds that a two-stage training schedule using generative hand data improves safety-critical hand detection. It raises in-distribution performance and narrows the performance gap in out-of-distribution scenarios involving gloved hands.
The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue
The Image Reconstruction Game benchmark shows that the Describer's capabilities drive reconstruction fidelity, while the Generator determines whether iteration helps. Automated judges align poorly with human preferences, highlighting the need for calibration.
KliniskVestBERT: BERT Model Specialised to Norwegian Clinical Texts
KliniskVestBERT, a set of BERT models specialized for Norwegian clinical text, outperforms baseline models on medical benchmarks. Pre-trained on clinical records from Helse Vest, the models demonstrate the value of domain-specific NLP for healthcare.
Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space
Echo unifies speaker diarization, speech recognition, and source separation in a single ViT encoder without fine-tuning. Although it does not reach state-of-the-art results, it shows that all three tasks can coexist in one model; its vector-quantization (VQ) bottlenecks, however, limit ASR performance.
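A VQ bottleneck replaces each continuous encoder output with the nearest entry of a finite codebook, so distinct inputs can collapse to the same code. The sketch below is a generic nearest-codeword quantizer with a made-up two-entry codebook, not Echo's actual bottleneck; it only illustrates the information loss that is one plausible reason such bottlenecks can hurt fine-grained tasks like ASR. The function names are hypothetical.

```python
def quantize(vector, codebook):
    """Map a vector to the index and value of its nearest codeword (squared L2)."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )
    return best_index, codebook[best_index]

def quantization_error(vector, codeword):
    """Squared L2 distance discarded by the quantizer."""
    return sum((v - c) ** 2 for v, c in zip(vector, codeword))
```

With the codebook `[[0.0, 0.0], [1.0, 1.0]]`, the different inputs `[0.4, 0.4]` and `[0.1, 0.0]` both map to code 0, so anything downstream of the bottleneck can no longer tell them apart.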
Rank-Constrained Deep Matrix Completion for Group Recommendation
Group RC-DMC combines low-rank constraints with attention-based modeling to address data sparsity in group recommendation. It outperforms baselines in both accuracy and efficiency on the MovieLens and Goodbooks datasets.
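A rank constraint lets a model fill in missing ratings by assuming the rating matrix is (approximately) low-rank. The sketch below is a classical rank-1 alternating least squares (ALS) baseline, swapped in for illustration; it is not Group RC-DMC, which adds deep, attention-based components. The function name `als_rank1` and the toy matrix are hypothetical, and every row and column is assumed to have at least one observed entry.

```python
def als_rank1(R, iters=50):
    """Rank-1 matrix completion by alternating least squares.

    R is a list of lists; None marks a missing entry. Returns factors p, q
    such that p[i] * q[j] approximates every observed R[i][j].
    """
    n_rows, n_cols = len(R), len(R[0])
    q = [1.0] * n_cols  # simple all-ones initialization
    p = [0.0] * n_rows
    for _ in range(iters):
        # Fix q; solve each p[i] in closed form over that row's observed entries.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if R[i][j] is not None]
            p[i] = sum(q[j] * R[i][j] for j in obs) / sum(q[j] ** 2 for j in obs)
        # Fix p; solve each q[j] in closed form over that column's observed entries.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if R[i][j] is not None]
            q[j] = sum(p[i] * R[i][j] for i in obs) / sum(p[i] ** 2 for i in obs)
    return p, q
```

On the toy matrix `[[1, 2, 1], [2, 4, 2], [3, 6, None]]`, which is exactly rank-1, `p[2] * q[2]` converges to the missing value 3. Each half-step solves its least-squares subproblem exactly, so the training loss never increases.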
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?
MMG2Skill uses a closed-loop framework to turn unstructured web guides into self-improving agent skills. By structuring the guides and refining them with trajectory feedback, it outperforms baselines, improving performance by 12.8–25.3%.
Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks
This study evaluates parameter-efficient fine-tuning (PEFT) techniques such as adapters and LoRA for instance segmentation, achieving competitive results while fine-tuning only 1–6% of the parameters. It highlights the trade-off between efficiency and performance across different architectures and datasets.
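LoRA, one of the PEFT techniques named above, freezes a pretrained weight matrix W and learns a low-rank update, so the layer computes y = Wx + (α/r)·BAx with B of shape d_out × r and A of shape r × d_in. The sketch below, with hypothetical helper names and plain Python lists, shows the forward pass and the parameter arithmetic; it does not reproduce the study's setup.

```python
def lora_param_fraction(d_out, d_in, rank):
    """Trainable LoRA parameters as a fraction of a frozen d_out x d_in weight."""
    # LoRA trains B (d_out x r) and A (r x d_in) while W stays frozen.
    return rank * (d_out + d_in) / (d_out * d_in)

def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x) with nested lists."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]
```

For a single 1024 × 1024 projection with r = 8, LoRA trains 8 × (1024 + 1024) = 16,384 parameters, about 1.56% of the layer's 1,048,576 weights. The fraction reported for a whole model also depends on which layers are adapted, so this figure only illustrates the order of magnitude.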
A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision
The TGAD benchmark shows that current text-guided anomaly detection models rely on language only superficially and often ignore the prompt. This suggests that their performance gains stem from visual features rather than genuine text conditioning.
Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization
This paper introduces a render-free video diffusion framework using 3D mesh tokens for human motion control. It enhances spatial reasoning and reduces artifacts by conditioning generation directly on compressed 3D geometric data.
Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association
The study reveals a mismatch between ranking metrics and assignment objectives in multi-view object association. Applying Sinkhorn normalization shows that optimizing ranking metrics does not guarantee accurate one-to-one object matching.
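The gap between ranking and assignment can be illustrated with a toy 2 × 2 score matrix (rows are query detections, columns are candidate objects; the scores are made up). Taking each row's best-ranked object sends both queries to object 0, which is not a valid one-to-one matching. Sinkhorn normalization exponentiates the scores with a temperature τ and alternately normalizes rows and columns, so objects compete for queries. This is a generic sketch, not the paper's pipeline; `tau` and `iters` are arbitrary choices.

```python
import math

def sinkhorn(scores, tau=0.5, iters=100):
    """Turn a square score matrix into an (approximately) doubly stochastic one."""
    K = [[math.exp(s / tau) for s in row] for row in scores]
    n = len(K)
    for _ in range(iters):
        # Normalize each row to sum to 1.
        row_sums = [sum(row) for row in K]
        K = [[v / rs for v in row] for row, rs in zip(K, row_sums)]
        # Normalize each column to sum to 1.
        col_sums = [sum(K[i][j] for i in range(n)) for j in range(n)]
        K = [[K[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return K

def row_argmax(M):
    """Per-row best column, i.e. the ranking view of the matrix."""
    return [max(range(len(row)), key=row.__getitem__) for row in M]
```

With `scores = [[0.9, 0.8], [0.95, 0.1]]`, the Sinkhorn output assigns query 0 to object 1 and query 1 to object 0, which is also the maximum-score matching here (0.8 + 0.95 = 1.75 versus 0.9 + 0.1 = 1.0).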
PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing
PlanarBench evaluates LLM spatial reasoning by asking models to draw planar graphs as ASCII art from edge lists. It finds that edge count, not node count, is the primary predictor of task difficulty.
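A textbook consequence of Euler's formula ties edge count to planarity: a simple planar graph with V ≥ 3 vertices has at most 3V − 6 edges. The check below applies that bound to an edge list. It is necessary but not sufficient (K3,3 passes yet is nonplanar), and it is offered only as background intuition for why edge count constrains a drawing, not as PlanarBench's method; `passes_euler_bound` is a hypothetical name.

```python
def passes_euler_bound(edges):
    """Necessary (not sufficient) planarity check for a simple graph: E <= 3V - 6."""
    vertices = {v for edge in edges for v in edge}
    n_vertices = len(vertices)
    n_edges = len(set(map(frozenset, edges)))  # ignore duplicate or reversed edges
    if n_vertices < 3:
        return True
    return n_edges <= 3 * n_vertices - 6
```

K4 (6 edges on 4 vertices) sits exactly at the bound, while K5 (10 edges on 5 vertices) exceeds it and is rejected.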
Why Do Time Series Models Need Long Context Windows?
Long context windows help time series models not only by capturing long-range dependencies but also by reducing uncertainty about the underlying data-generating process. Separating process identification from forecasting reduces error and improves scalability.
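The identification argument can be illustrated with a toy Bayesian model-selection example: two candidate processes, A and B, generate i.i.d. Gaussian values with unit variance and means 0 and 1, with equal prior odds. Each observation adds its log-likelihood ratio to the posterior log-odds, so a longer history concentrates the posterior on one process. The setup and the name `posterior_process_b` are assumptions for illustration, not the paper's model.

```python
import math

def log_gaussian(x, mean, sigma=1.0):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def posterior_process_b(history, mean_a=0.0, mean_b=1.0):
    """Posterior probability that `history` came from process B rather than A.

    Toy assumption: both processes are i.i.d. unit-variance Gaussians and
    have equal prior probability, so the prior log-odds are zero.
    """
    log_odds = sum(log_gaussian(x, mean_b) - log_gaussian(x, mean_a) for x in history)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With every observation equal to 0.8, each point contributes a log-likelihood ratio of 0.3, so the posterior for B is σ(0.3) ≈ 0.57 after one point, σ(3) ≈ 0.95 after ten, and above 0.9999 after fifty.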
Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift
This study combines EfficientNetB5 with the CBAM (Convolutional Block Attention Module) attention mechanism, achieving 93.3% accuracy in peach leaf damage classification. Transfer learning further improves robustness to domain shift across diverse orchard environments.
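CBAM refines a feature map with channel attention followed by spatial attention. The sketch below covers only the channel branch, following the standard CBAM formulation: average-pooled and max-pooled channel descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid into per-channel weights. The tiny hand-set weights are hypothetical, and how the study attaches CBAM to EfficientNetB5 is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def shared_mlp(v, W1, W2):
    """Two-layer MLP with ReLU: C -> C/r -> C (weights are nested lists)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

def channel_attention(feature, W1, W2):
    """CBAM-style channel attention on a feature map given as C lists of spatial values."""
    avg_pool = [sum(ch) / len(ch) for ch in feature]
    max_pool = [max(ch) for ch in feature]
    # The same MLP weights are applied to both pooled descriptors.
    logits = [a + m for a, m in zip(shared_mlp(avg_pool, W1, W2), shared_mlp(max_pool, W1, W2))]
    weights = [sigmoid(z) for z in logits]
    # Rescale every spatial value of each channel by that channel's weight.
    return [[w * x for x in ch] for w, ch in zip(weights, feature)], weights
```

With two channels, reduction ratio 2, and weights that favor the first channel, the active channel keeps almost all of its magnitude while the empty one is suppressed toward zero.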
Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image
This paper introduces a fast, lightweight novel view synthesis method based on differentiable multiplane images (MPIs). Compared with NeRF and 3DGS, it is 30.7% faster and uses only 14.8% of the model size.
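A multiplane image represents a scene as a stack of fronto-parallel RGBA planes at fixed depths; a view is rendered by warping each plane into the target camera and alpha-compositing the planes back to front. The sketch below shows only the compositing step for one grayscale pixel (warping is omitted) and is not the paper's renderer. Because the "over" operation uses only products and sums, gradients flow to every plane's color and alpha.

```python
def composite_mpi(planes):
    """Back-to-front "over" compositing of one pixel through an MPI.

    `planes` is a list of (color, alpha) pairs ordered from farthest to nearest.
    """
    out = 0.0
    for color, alpha in planes:
        # The nearer plane covers what is behind it in proportion to its alpha.
        out = color * alpha + out * (1.0 - alpha)
    return out
```

A half-transparent black plane in front of an opaque white one yields 0.5, and a fully opaque front plane hides everything behind it.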
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
OpenWebRL introduces an open framework for training visual web agents with online multi-turn RL on live websites. Its 4B-parameter model achieves state-of-the-art results among open-source models, outperforming prior methods and competing with proprietary systems.
Agentic-J: An AI Agent for Biological Microscopy Image Analysis
Agentic-J is an AI agent for ImageJ/Fiji that translates natural language into reproducible microscopy analysis scripts. It simplifies complex tasks like cell tracking and segmentation for biologists without coding expertise.
Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters
ND-MARL achieves zero-shot, scalable consensus control of up to 250 quadcopters using a 2-Neighbor communication topology. By integrating the communication graph directly into distributed decision-making, it outperforms centralized MARL.
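In a 2-neighbor ring topology, each agent exchanges state with only two neighbors, yet information still spreads through the whole group. The classic discrete-time average-consensus protocol, swapped in here for illustration, shows this for one scalar state per quadcopter; it is not the learned ND-MARL policy, and `ring_consensus`, `eps`, and `steps` are hypothetical choices.

```python
def ring_consensus(values, eps=0.25, steps=200):
    """Discrete-time average consensus where each agent sees its 2 ring neighbors.

    Update: x_i <- x_i + eps * sum over j in {i-1, i+1} of (x_j - x_i).
    For 0 < eps < 1/2 (below 1 / max degree) the states converge to the average.
    """
    x = list(values)
    n = len(x)
    for _ in range(steps):
        # Synchronous update: every agent uses the previous step's states.
        x = [
            x[i] + eps * ((x[(i - 1) % n] - x[i]) + (x[(i + 1) % n] - x[i]))
            for i in range(n)
        ]
    return x
```

Starting from `[0, 1, 2, 3, 4, 5]`, all six agents approach the average 2.5 even though none of them observes more than two others.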
The Role of Ambiguity in Error Prediction via Uncertainty Quantification
This study refines LLM error prediction by separating input ambiguity from uncertainty quantification. Integrating ambiguity labels via Gated Experts significantly boosts prediction performance across diverse datasets and metrics.