Global News Digest

Technology

arXiv

RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

RA-LWLM is a training-free framework using retrieval-augmented in-context learning to enable accurate, cross-scene wireless localization without retraining. It leverages a frozen foundation model and a mixture-of-experts transformer to maintain high accuracy across diverse environments.

arXiv

Collaborative Space Object Detection with Multi-Satellite Viewpoints in LEO Constellations

This study demonstrates that multi-view fusion in LEO constellations significantly boosts Space Object Detection accuracy. Using YOLOv9-m, three-view inputs improved mAP50 by up to 36.3% over single-view baselines, enhancing space situational awareness.

arXiv

Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

This study finds that a two-stage training approach using generative hand data improves safety-critical detection. It enhances standard performance and reduces the gap for out-of-distribution glove scenarios.

arXiv

The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue

The Image Reconstruction Game benchmark reveals that Describer capabilities drive fidelity, while Generators determine whether iteration helps. Automated judges align poorly with human preferences, highlighting the need for calibration.

arXiv

KliniskVestBERT: BERT Model Specialised to Norwegian Clinical Texts

The KliniskVestBERT models, BERT models specialized for Norwegian clinical text, outperform baselines on medical benchmarks. Pre-trained on Helse Vest records, they demonstrate the value of domain-specific NLP for healthcare.

arXiv

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Echo unifies speaker diarization, speech recognition, and source separation in a single ViT encoder without fine-tuning. While not state of the art, it shows that the three tasks can coexist in one model, though VQ bottlenecks limit ASR performance.

arXiv

Rank-Constrained Deep Matrix Completion for Group Recommendation

Group RC-DMC combines low-rank constraints with attention-based modeling to address data sparsity in group recommendations. It outperforms baselines in accuracy and efficiency on MovieLens and Goodbooks datasets.

arXiv

MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

MMG2Skill transforms unstructured web guides into self-improving agent skills via a closed-loop framework. It outperforms baselines by structuring guides and refining them through trajectory feedback, boosting performance by 12.8–25.3%.

arXiv

Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks

This study evaluates PEFT techniques like adapters and LoRA for instance segmentation, achieving competitive results by fine-tuning only 1–6% of parameters. It highlights the balance between efficiency and performance across different architectures and datasets.

arXiv

A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision

The TGAD benchmark reveals that current text-guided anomaly detection models rely superficially on language, often ignoring prompts. This suggests performance gains stem from visual features rather than genuine text conditioning.

arXiv

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

This paper introduces a render-free video diffusion framework using 3D mesh tokens for human motion control. It enhances spatial reasoning and reduces artifacts by conditioning generation directly on compressed 3D geometric data.

arXiv

Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association

The study reveals a mismatch between ranking metrics and assignment objectives in multi-view object association. Sinkhorn normalization exposes that optimizing ranking scores does not guarantee accurate object matching.

arXiv

PlanarBench: Evaluating LLM Spatial Reasoning via Planar Graph Drawing

PlanarBench evaluates LLM spatial reasoning by having models draw planar graphs in ASCII from edge lists. It reveals that edge count, not node count, is the primary predictor of task difficulty.

arXiv

Why Do Time Series Models Need Long Context Windows?

Long context windows do more than capture dependencies: they reduce uncertainty in identifying the underlying data-generating process. Separating process identification from forecasting minimizes error and improves scalability.

arXiv

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

This study combines EfficientNetB5 with CBAM attention, achieving 93.3% accuracy for peach leaf damage classification. Transfer learning further ensures robustness against domain shifts in diverse orchard environments.

arXiv

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

This paper introduces a fast, lightweight novel view synthesis method using differentiable Multiplane Images. It is 30.7% faster than NeRF/3DGS and uses only 14.8% of the model size.

arXiv

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

OpenWebRL introduces an open framework for training visual web agents via online multi-turn RL on live sites. Its 4B model achieves state-of-the-art open-source results, outperforming prior methods and competing with proprietary systems.

arXiv

Agentic-J: An AI Agent for Biological Microscopy Image Analysis

Agentic-J is an AI agent for ImageJ/Fiji that translates natural language into reproducible microscopy analysis scripts. It simplifies complex tasks like cell tracking and segmentation for biologists without coding expertise.

arXiv

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

ND-MARL enables zero-shot scalable consensus control for up to 250 quadcopters using a 2-Neighbor topology. It outperforms centralized MARL by integrating communication graphs directly into distributed decision-making.

arXiv

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

This study refines LLM error prediction by separating input ambiguity from uncertainty quantification. Integrating ambiguity labels via Gated Experts significantly boosts prediction performance across diverse datasets and metrics.