Technology
AutoEval Done Right: Using Synthetic Data for Model Evaluation
This paper introduces statistically rigorous algorithms that use synthetic data to improve autoevaluation. In experiments with GPT-4, the methods increased the effective human-labeled sample size by up to 50% without introducing bias.
Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation
This survey reviews recent multi-modal 3D intelligence advances, proposing a new classification framework and benchmark analysis. It addresses current gaps by highlighting challenges, evaluating methods, and outlining future research directions.
Perturbation Effects on Accuracy and Fairness among Similar Individuals
The authors propose RIFair, a framework detecting Robust Individual Fairness violations by generating semantic-preserving perturbations. This approach reveals latent vulnerabilities in deep neural networks that separate robustness and fairness metrics often miss.
Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance
This study analyzes Tree-Structured Parzen Estimator (TPE) control parameters via ablation studies to clarify their roles. It provides optimized configurations that significantly enhance TPE's empirical performance in hyperparameter tuning.
Implicit Regularization for Multi-label Feature Selection
This study introduces a novel multi-label feature selection estimator using implicit regularization via Hadamard product parameterization and latent semantic label embedding. Experiments show it reduces extra bias and enables benign overfitting compared to conventional sparse methods.
DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning
DAG-Plan uses LLMs to generate Directed Acyclic Graphs for dual-arm robots, enabling parallel execution and dynamic adaptation. It outperforms linear and iterative methods, boosting success rates by 48% and efficiency by 84.1%.
Agricultural Landscape Understanding At Country-Scale
This study introduces the first national-scale framework mapping smallholder agricultural entities like fields, trees, and water bodies. High-resolution maps are publicly available via API to support precision farming and policy.
A Foundation Model for Wearable Movement Data in Mental Health Research
The Pretrained Actigraphy Transformer (PAT) is an open-source foundation model for wearable movement data that outperforms baselines in mental health predictions. It offers superior accuracy and interpretable insights for tracking depression, sleep, and medication use.
Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors
This study introduces a self-supervised framework for endoscopic depth and pose estimation using a Generative Latent Bank and VAE. It outperforms existing methods on SimCol and EndoSLAM datasets by leveraging latent priors for robustness.
Introduction to Graph Neural Networks for Machine Learning Engineers
This survey explains GNN mechanics via an encoder-decoder paradigm, addressing challenges like oversmoothing. It offers practical insights for ML engineers through theoretical foundations and empirical evaluations on homogeneous graphs.
HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
HiFi-KPI is a dataset of 1.65M paragraphs for hierarchical KPI extraction from earnings filings, supporting classification and extraction tasks. It includes a lite subset for benchmarking, which reveals that encoder models outperform LLMs in structured extraction.
ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models
ShapeLib uses LLMs to automatically generate programmatic 3D shape abstraction libraries from seeds or text. It outperforms prior methods in usability and generalization, enabling advanced shape editing and generation.
Efficient Weighted Sampling via Score-based Generative Models
This paper proposes a training-free weighted sampling framework using pretrained score-based models. It achieves 1.2x-4.7x speedups by avoiding costly derivatives and resampling.
EuroBERT: Scaling Multilingual Encoders for European Languages
EuroBERT scales multilingual encoders for European languages, outperforming current models in proficiency, math, and code. It supports 8,192 tokens and provides open-source models and training frameworks.
Efficient LLM Moderation with Multi-Layer Latent Prototypes
MLPM is a lightweight, adaptable input moderation tool using multi-layer latent prototypes to enhance safety with minimal overhead. It boosts performance across benchmarks and integrates seamlessly into existing LLM workflows.
Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills
Skill-MoE routes each input to experts based on inferred skills, boosting performance by 8.15%. It also clusters instances so that 16 models fit on one GPU.
Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
This paper introduces Efficient Layer Attention (ELA), which prunes redundant layers using KL divergence and Enhanced Beta Quantile Mapping (EBQM). This approach cuts training time by 30% while boosting performance in image classification and object detection.
Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms
The paper argues that inference-time scaling strategies, not just pre-training, enhance generative models. It proposes designing inference procedures before training objectives to improve efficiency in sequence expansion and state refinement.
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
T1 enhances small language models by using external tools for memory-intensive verification, reducing cognitive load. This approach allows a 1B model to outperform an 8B model on the MATH benchmark.
GRANITE: A Byzantine-Resilient Dynamic Gossip Learning Framework
GRANITE is a Byzantine-resilient gossip learning framework that dynamically adjusts aggregation thresholds to counter poisoned models. It achieves near-optimal accuracy with 30% malicious nodes while accelerating convergence and reducing communication costs by up to ninefold.