Global News Digest

Technology

arXiv

AutoEval Done Right: Using Synthetic Data for Model Evaluation

This paper introduces statistically rigorous algorithms that combine a small set of human labels with synthetic (AI-generated) labels for model evaluation. In experiments with GPT-4, the methods increased the effective human-labeled sample size by up to 50% while keeping estimates unbiased.
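A minimal sketch of how a few human labels can correct many synthetic ones, assuming a prediction-powered mean estimator. The summary does not name the paper's exact estimator, and the function name and data below are hypothetical. The judge's average score on a large unlabeled pool is shifted by the judge's average error on the small human-labeled set, which removes the judge's systematic bias in expectation.

```python
from statistics import mean

def ppi_mean(synthetic_unlabeled, synthetic_labeled, human_labeled):
    """Prediction-powered estimate of a mean metric such as accuracy (hypothetical helper).

    synthetic_unlabeled: judge scores on a large pool with no human labels
    synthetic_labeled:   judge scores on the small human-labeled set
    human_labeled:       human scores on that same small set, in the same order
    """
    if len(synthetic_labeled) != len(human_labeled):
        raise ValueError("labeled lists must be aligned")
    # Judge-only estimate computed from the large synthetic pool.
    judge_estimate = mean(synthetic_unlabeled)
    # Rectifier: the judge's average error where human labels exist.
    rectifier = mean(h - s for h, s in zip(human_labeled, synthetic_labeled))
    return judge_estimate + rectifier

# Hypothetical data: on the labeled subset the judge over-scores by 0.1 on average.
unlabeled_scores = [0.9, 0.8, 1.0, 0.7, 0.9, 0.8, 1.0, 0.9]
labeled_scores = [1.0, 0.9, 0.8, 1.0]
human_scores = [1.0, 0.7, 0.7, 0.9]
print(ppi_mean(unlabeled_scores, labeled_scores, human_scores))
```

Here the judge alone would report 0.875; the correction lowers it by the measured over-scoring of 0.1.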

arXiv

Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation

This survey reviews recent advances in multi-modal 3D intelligence, proposing a new classification framework and a benchmark analysis. It addresses current gaps by highlighting challenges, evaluating methods, and outlining future research directions.

arXiv

Perturbation Effects on Accuracy and Fairness among Similar Individuals

The authors propose RIFair, a framework detecting Robust Individual Fairness violations by generating semantic-preserving perturbations. This approach reveals latent vulnerabilities in deep neural networks that separate robustness and fairness metrics often miss.
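The basic check behind this kind of audit can be sketched as follows: apply perturbations that should not change the outcome and flag any that do. This is a simplified illustration, not RIFair's actual perturbation generator; the toy `biased_model`, the string-replacement perturbations, and the helper names are all hypothetical.

```python
from typing import Callable, List, Sequence

def find_violations(predict: Callable[[str], int], text: str,
                    perturbations: Sequence[Callable[[str], str]]) -> List[str]:
    """Return perturbed variants whose prediction differs from the original input's."""
    base = predict(text)
    violations = []
    for perturb in perturbations:
        variant = perturb(text)
        # Only count perturbations that actually changed the input.
        if variant != text and predict(variant) != base:
            violations.append(variant)
    return violations

def biased_model(text: str) -> int:
    """Hypothetical classifier that is robust to synonyms but penalizes 'She'."""
    positive = "excellent" in text or "outstanding" in text
    return int(positive and not text.startswith("She "))

# Hypothetical perturbations that should leave the prediction unchanged.
perturbations = [
    lambda t: t.replace("He ", "She ", 1),             # protected-attribute swap
    lambda t: t.replace("excellent", "outstanding"),   # synonym swap
]
print(find_violations(biased_model, "He is an excellent engineer."))  if False else None
print(find_violations(biased_model, "He is an excellent engineer.", perturbations))
```

The synonym swap passes, so a plain robustness test would see nothing wrong; only the attribute swap exposes the unfair behavior, which is the kind of gap the summary describes.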

arXiv

Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance

This study analyzes Tree-Structured Parzen Estimator (TPE) control parameters via ablation studies to clarify their roles. It provides optimized configurations that significantly enhance TPE's empirical performance in hyperparameter tuning.
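To make the control parameters concrete, here is a minimal one-dimensional TPE loop written from the standard description of the algorithm, not from the paper's code; the function names, the fixed bandwidth, and the default values are assumptions. Observations are split into a "good" fraction `gamma` and the rest, a kernel density is fitted to each group, and the candidate with the highest ratio l(x)/g(x) is evaluated next.

```python
import math
import random

def gaussian_kde(points, bandwidth):
    """Return a density function: the average of Gaussians centred on the points."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return density

def tpe_suggest(history, gamma=0.25, bandwidth=0.5, n_candidates=24, rng=random):
    """Suggest the next x from (x, loss) pairs, minimizing loss (assumed defaults)."""
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))   # gamma: fraction treated as good
    good = [x for x, _ in ordered[:n_good]]
    bad = [x for x, _ in ordered[n_good:]] or good
    l, g = gaussian_kde(good, bandwidth), gaussian_kde(bad, bandwidth)
    # Sample candidates from l(x): pick a good point and add Gaussian noise.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    # Evaluate the candidate with the largest density ratio l(x) / g(x).
    return max(candidates, key=lambda x: l(x) / max(g(x), 1e-300))

rng = random.Random(0)
objective = lambda x: (x - 2.0) ** 2
history = [(x, objective(x)) for x in (rng.uniform(-5, 5) for _ in range(10))]
for _ in range(30):
    x = tpe_suggest(history, rng=rng)
    history.append((x, objective(x)))
best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Changing `gamma`, `bandwidth`, or `n_candidates` shifts the balance between exploitation and exploration, which is the kind of effect the ablation studies examine.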

arXiv

Implicit Regularization for Multi-label Feature Selection

This study introduces a novel multi-label feature selection estimator that uses implicit regularization through a Hadamard product parameterization combined with latent semantic label embedding. Experiments show that, compared with conventional explicitly regularized sparse methods, it reduces the extra bias those methods introduce and can exhibit benign overfitting.
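The implicit-regularization effect of a Hadamard product parameterization can be seen on a toy problem. This sketch uses single-target linear regression with no label embedding, so it illustrates the general mechanism rather than the paper's estimator; the function name and settings are hypothetical. Writing w = u ⊙ v and running plain gradient descent from a small initialization tends to find a sparse solution, whereas gradient descent on w from zero would find the minimum-L2-norm solution. For the single equation 3w1 + 2w2 + w3 = 3, that minimum-norm solution is (9/14, 6/14, 3/14) ≈ (0.64, 0.43, 0.21), while the sketch ends up close to (1, 0, 0).

```python
def hadamard_gd(X, y, alpha=1e-4, lr=0.01, steps=2000):
    """Fit y ≈ X w with w = u * v (element-wise) by gradient descent on 0.5*||Xw - y||^2.

    Hypothetical helper; alpha is the small initialization scale.
    """
    d = len(X[0])
    u = [alpha] * d   # small initialization drives the implicit sparsity bias
    v = [0.0] * d
    for _ in range(steps):
        w = [ui * vi for ui, vi in zip(u, v)]
        residuals = [sum(xij * wj for xij, wj in zip(row, w)) - yi for row, yi in zip(X, y)]
        # Gradient of the loss with respect to w.
        grad_w = [sum(row[j] * r for row, r in zip(X, residuals)) for j in range(d)]
        # Chain rule through w = u * v; both factors are updated from their old values.
        u, v = ([ui - lr * g * vi for ui, vi, g in zip(u, v, grad_w)],
                [vi - lr * g * ui for ui, vi, g in zip(u, v, grad_w)])
    return [ui * vi for ui, vi in zip(u, v)]

# One equation, three unknowns: infinitely many exact solutions exist.
w = hadamard_gd([[3.0, 2.0, 1.0]], [3.0])
print([round(wi, 4) for wi in w])
```

The feature with the largest coefficient grows fastest from the small initialization and absorbs almost all of the fit before the others leave zero, which is why the result is nearly sparse without any explicit penalty.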

arXiv

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

DAG-Plan uses LLMs to generate Directed Acyclic Graphs for dual-arm robots, enabling parallel execution and dynamic adaptation. It outperforms linear and iterative methods, boosting success rates by 48% and efficiency by 84.1%.
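Why a dependency graph enables parallel execution can be shown by grouping its nodes into stages with Kahn's topological sort: every task in a stage has all prerequisites done, so tasks assigned to different arms can run at once. This is a generic scheduling sketch, not DAG-Plan's own procedure; the task names, arm labels, and helper function are hypothetical, and an LLM would produce the graph in the real system.

```python
from collections import defaultdict

def parallel_stages(tasks, deps):
    """Group DAG nodes into stages whose members have no pending dependencies.

    tasks: dict task -> arm label (hypothetical: "left", "right" or "both")
    deps:  list of (before, after) edges
    """
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for before, after in deps:
        children[before].append(after)
        indegree[after] += 1
    ready = sorted(t for t, n in indegree.items() if n == 0)
    stages = []
    while ready:
        stages.append(ready)
        next_ready = []
        for t in ready:
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    next_ready.append(c)
        ready = sorted(next_ready)
    if sum(len(s) for s in stages) != len(tasks):
        raise ValueError("dependency graph has a cycle")
    return stages

tasks = {"grasp_cup": "left", "grasp_kettle": "right", "pour": "both",
         "place_cup": "left", "place_kettle": "right"}
deps = [("grasp_cup", "pour"), ("grasp_kettle", "pour"),
        ("pour", "place_cup"), ("pour", "place_kettle")]
for i, stage in enumerate(parallel_stages(tasks, deps)):
    print(i, [(t, tasks[t]) for t in stage])
```

A linear plan would run these five steps one after another; the staged view needs only three steps because the grasps and the placements each run in parallel. A real planner would also have to split any stage in which two tasks need the same arm.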

arXiv

Agricultural Landscape Understanding At Country-Scale

This study introduces the first national-scale framework mapping smallholder agricultural entities like fields, trees, and water bodies. High-resolution maps are publicly available via API to support precision farming and policy.

arXiv

A Foundation Model for Wearable Movement Data in Mental Health Research

The Pretrained Actigraphy Transformer (PAT) is an open-source foundation model for wearable movement data that outperforms baseline models on mental health prediction tasks involving depression, sleep, and medication use, while also providing interpretable insights.

arXiv

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors

This study introduces a self-supervised framework for endoscopic depth and pose estimation using a Generative Latent Bank and VAE. It outperforms existing methods on SimCol and EndoSLAM datasets by leveraging latent priors for robustness.

arXiv

Introduction to Graph Neural Networks for Machine Learning Engineers

This survey explains GNN mechanics via an encoder-decoder paradigm, addressing challenges like oversmoothing. It offers practical insights for ML engineers through theoretical foundations and empirical evaluations on homogeneous graphs.
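A single message-passing step, one common choice for the encoder, fits in a few lines, and stacking it shows oversmoothing directly. This is a minimal sketch with hypothetical helper names, using plain mean aggregation with self-loops and no learned weights: as layers are added, all node features converge toward the same vector, so nodes become indistinguishable.

```python
def mean_aggregate(features, adjacency):
    """One message-passing step: each node averages its own and its neighbours' features."""
    out = {}
    for node, feats in features.items():
        group = [node] + adjacency[node]
        out[node] = [sum(features[n][k] for n in group) / len(group) for k in range(len(feats))]
    return out

def spread(features):
    """Largest per-dimension gap between any two nodes, a crude smoothness measure."""
    values = list(features.values())
    dims = len(values[0])
    return max(max(v[k] for v in values) - min(v[k] for v in values) for k in range(dims))

# A path graph a - b - c - d with two-dimensional node features.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.0, 0.0]}
for layer in range(1, 21):
    h = mean_aggregate(h, adjacency)
    if layer in (1, 5, 20):
        print(f"after {layer:2d} layers, feature spread = {spread(h):.4f}")
```

In practice each layer also applies a learned transformation and a nonlinearity; this sketch leaves them out so the averaging effect is visible on its own.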

arXiv

HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

HiFi-KPI is a dataset of 1.65M paragraphs for hierarchical KPI extraction from earnings filings, supporting both classification and extraction tasks. It includes a lite subset for benchmarking, and experiments reveal that encoder models outperform LLMs in structured extraction.

arXiv

ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models

ShapeLib uses LLMs to automatically generate programmatic 3D shape abstraction libraries from seeds or text. It outperforms prior methods in usability and generalization, enabling advanced shape editing and generation.

arXiv

Efficient Weighted Sampling via Score-based Generative Models

This paper proposes a training-free weighted sampling framework using pretrained score-based models. It achieves 1.2x-4.7x speedups by avoiding costly derivatives and resampling.

arXiv

EuroBERT: Scaling Multilingual Encoders for European Languages

EuroBERT scales multilingual encoders for European languages and outperforms current models on multilingual, math, and code tasks. It supports sequences of up to 8,192 tokens, and both the models and the training framework are released as open source.

arXiv

Efficient LLM Moderation with Multi-Layer Latent Prototypes

MLPM is a lightweight, adaptable input-moderation tool that uses multi-layer latent prototypes to improve safety with minimal overhead. It improves moderation performance across benchmarks and integrates easily into existing LLM workflows.
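The general idea of prototype-based moderation can be sketched as follows: average the hidden states of known safe and unsafe prompts at several layers to form prototypes, then score a new prompt by how much closer it sits to the unsafe prototypes. How MLPM actually builds and combines its prototypes is not described in the summary; class means, Euclidean distance, and the layer numbers below are assumptions, and the tiny vectors are hypothetical stand-ins for an LLM's hidden states.

```python
import math

def prototype(vectors):
    """Class prototype as the element-wise mean of hidden-state vectors (assumption)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_prototypes(examples_by_layer):
    """examples_by_layer[layer][label] -> list of hidden-state vectors."""
    return {layer: {label: prototype(vecs) for label, vecs in by_label.items()}
            for layer, by_label in examples_by_layer.items()}

def unsafe_score(hidden_by_layer, prototypes):
    """Mean over layers of (distance to safe prototype - distance to unsafe prototype).

    Positive means the input sits closer to the unsafe prototypes.
    """
    total = 0.0
    for layer, h in hidden_by_layer.items():
        d_safe = math.dist(h, prototypes[layer]["safe"])
        d_unsafe = math.dist(h, prototypes[layer]["unsafe"])
        total += d_safe - d_unsafe
    return total / len(hidden_by_layer)

# Hypothetical 2-D hidden states from two layers of a model.
examples = {
    4: {"safe": [[0.0, 0.0], [0.2, 0.0]], "unsafe": [[1.0, 1.0], [0.8, 1.0]]},
    8: {"safe": [[0.0, 1.0], [0.0, 0.8]], "unsafe": [[1.0, 0.0], [1.0, 0.2]]},
}
protos = build_prototypes(examples)
print(unsafe_score({4: [0.9, 0.9], 8: [0.9, 0.1]}, protos))   # resembles unsafe examples
print(unsafe_score({4: [0.1, 0.1], 8: [0.1, 0.9]}, protos))   # resembles safe examples
```

Because scoring needs only hidden states the model already computes plus a few distance calculations, an approach of this shape adds little overhead, which is consistent with the summary's claim.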

arXiv

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

Skill-MoE uses inferred skills for adaptive expert routing, boosting performance by 8.15% while clustering instances to fit 16 models on one GPU.

arXiv

Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals

This paper introduces Efficient Layer Attention (ELA), which uses KL divergence and EBQM to identify and prune redundant retrievals across layers. This approach cuts training time by 30% while improving performance in image classification and object detection.
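As a generic illustration of using KL divergence to detect redundancy, the sketch below normalizes each layer's output into a distribution with softmax and drops a layer when it barely differs from the last layer kept. This is not ELA's actual criterion, and EBQM is not modeled; the helper names, the threshold, and the toy outputs are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def prune_redundant(layer_outputs, threshold=0.01):
    """Keep layer i only if it differs enough (in KL) from the last kept layer."""
    kept = [0]
    for i in range(1, len(layer_outputs)):
        p = softmax(layer_outputs[i])
        q = softmax(layer_outputs[kept[-1]])
        if kl(p, q) >= threshold:
            kept.append(i)
    return kept

# Hypothetical outputs: layers 1 and 3 nearly duplicate layers 0 and 2.
outputs = [[1.0, 0.0, 0.0], [1.01, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.02, 0.01]]
print(prune_redundant(outputs))
```

Comparing each layer against the last kept layer, rather than its immediate predecessor, prevents a slow drift across many near-duplicate layers from escaping detection.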

arXiv

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

The paper argues that ideas from inference-time scaling can benefit the design of generative pre-training algorithms. It proposes designing the inference procedure before the training objective, with the aim of using compute more efficiently in both sequence expansion and iterative state refinement.

arXiv

T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models

T1 improves test-time compute scaling in small language models by delegating memorization-heavy verification steps to external tools, so the model itself does not have to perform them reliably. This approach allows a 1B model to outperform an 8B model on the MATH benchmark.
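Tool-based verification can be sketched as follows: each sampled solution lists the arithmetic claims it relies on, a tool recomputes them, and only candidates whose claims check out take part in the final vote. The candidate format, the tiny arithmetic evaluator standing in for a real tool, and the helper names are assumptions; T1's exact verification and aggregation steps are not described in the summary.

```python
import ast
import operator
from collections import Counter

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr):
    """Tiny arithmetic 'tool': evaluates + - * / and negation without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

def verified_vote(candidates):
    """candidates: list of ([(expression, claimed_value), ...], final_answer)."""
    passing = [answer for steps, answer in candidates
               if all(abs(evaluate(e) - claimed) < 1e-9 for e, claimed in steps)]
    if not passing:
        return None
    return Counter(passing).most_common(1)[0][0]

# Hypothetical samples from a small model; three share the same arithmetic slip.
right = ([("17 * 24", 408), ("408 + 12", 420)], 420)
wrong = ([("17 * 24", 418), ("418 + 12", 430)], 430)
candidates = [wrong, right, wrong, wrong]
print("plain majority:", Counter(a for _, a in candidates).most_common(1)[0][0])
print("tool-verified: ", verified_vote(candidates))
```

A plain majority vote picks the frequent wrong answer, while the tool check discards every candidate that contains the faulty multiplication.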

arXiv

GRANITE: a Byzantine-Resilient Dynamic Gossip Learning Framework

GRANITE is a Byzantine-resilient gossip learning framework that dynamically adjusts aggregation thresholds to counter poisoned models. It achieves near-optimal accuracy with 30% malicious nodes while accelerating convergence and reducing communication costs by up to ninefold.
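The role of a data-dependent aggregation threshold can be sketched as follows: a node measures how far each received model is from its own, accepts only models within a threshold derived from those distances, and averages the rest. GRANITE's actual threshold rule is not described in the summary; the median-based rule, the `factor` parameter, and the toy two-parameter models are hypothetical. A median-based rule like this one breaks down once half or more of a node's neighbours are malicious.

```python
import math
from statistics import median

def robust_aggregate(own, received, factor=1.5):
    """Average own model with received models closer than factor * median distance.

    Hypothetical rule; returns (aggregated model, number of accepted models).
    """
    if not received:
        return list(own), 0
    distances = [math.dist(own, m) for m in received]
    threshold = factor * median(distances)   # threshold adapts to what was received
    accepted = [m for m, d in zip(received, distances) if d <= threshold]
    group = [own] + accepted
    return [sum(vals) / len(group) for vals in zip(*group)], len(accepted)

own_model = [1.0, 1.0]
received = [[1.1, 0.9], [0.9, 1.0], [1.0, 1.2], [10.0, -10.0]]   # last one is poisoned
model, n_accepted = robust_aggregate(own_model, received)
print(model, n_accepted)
```

A fixed threshold would have to be tuned for each training stage; deriving it from the current distances lets it tighten as honest models converge, which is one plausible reading of "dynamically adjusts".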