Global News Digest

Technology

arXiv

Bridging the Last Mile of Time Series Forecasting with LLM Agents

This study introduces an LLM-agent framework to address the "last-mile" of time series forecasting by integrating business context into statistical predictions. The system enhances accuracy and auditability through reasoning, memory, and reflection mechanisms.

arXiv

Tracking the Behavioral Trajectories of Adapting Agents

This paper introduces a framework to quantify agent traits by analyzing text embedding differences in skill files. Validated with high accuracy, it enables agents to monitor each other's behavioral evolution via a secure protocol.

arXiv

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

ClinEnv evaluates LLMs as physicians via interactive, multi-stage inpatient simulations. Results show models struggle with management decisions and redundant querying, revealing gaps hidden by static benchmarks.

arXiv

SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment

SafeSteer aligns LLMs via localized on-policy distillation on safety tokens, minimizing capability loss. It achieves robust safety with only 100 harmful samples, requiring less than 1% of the data used by prior methods.

arXiv

A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks

The 1D-CGS model combines 1D-CNNs and GraphSAGE to rank influential nodes in complex networks. It outperforms existing methods in both ranking accuracy and speed at minimal computational cost.

arXiv

A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis

This study proposes a unified CNN framework with novel time-domain data augmentation to robustly classify ECG and EEG signals. It addresses class imbalance and outperforms existing methods on benchmark datasets.

arXiv

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

BenHalluEval introduces the first Bengali hallucination benchmark, revealing significant LLM calibration gaps. Its dual-track metric exposes limitations of single-track evaluations and prompting in low-resource contexts.

arXiv

Empathic and agentic artificial intelligence in nursing: perspectives on a human-centered framework for cancer care navigation in the United States

This article proposes a human-centered AI framework for US cancer care navigation, augmenting nurses’ empathy and agency to address resource gaps and improve care coordination.

arXiv

RuleEdit: Failure-Guided Human-AI Model Editing with Prospective Impact Preview

RuleEdit is a human-AI framework using failure detection and impact previews to guide model editing. It significantly improved stroke rehabilitation assessment performance and feedback quality, revealing a local-global trade-off.

arXiv

A phenomenon of AI-conformity: how algorithms change human moral decision-making

This study reveals that AI reasoning influences human moral judgments as strongly as social pressure. It challenges the idea that morality is immune to algorithmic conformity.

arXiv

DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

DraDDP is the first multimodal, multi-party dialogue discourse parsing dataset, featuring 6,374 utterances from TV dramas. It demonstrates that multimodal data significantly improves parsing accuracy.

arXiv

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

DOPA enhances LLM robustness by using out-of-distribution proxies to retrieve diverse demonstrations when target domains are inaccessible. It leverages Mahalanobis distance to ensure variety, significantly improving performance in OOD scenarios.
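For readers unfamiliar with the Mahalanobis distance mentioned above, here is a minimal sketch of the distance itself. It is not DOPA's retrieval procedure, which the summary does not describe. The helper names `solve` and `mahalanobis` and the toy mean and covariance are assumptions made for illustration.

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(x, mean, cov):
    """Distance of x from a distribution with the given mean and covariance:
    sqrt((x - mean)^T cov^{-1} (x - mean))."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)  # y = cov^{-1} @ diff, without forming the inverse
    return math.sqrt(sum(d * yi for d, yi in zip(diff, y)))

# Hypothetical 2-D example: uncorrelated features with variance 4/3 each.
mean = [1.0, 1.0]
cov = [[4 / 3, 0.0], [0.0, 4 / 3]]
print(mahalanobis([3.0, 1.0], mean, cov))
```

Unlike Euclidean distance, this measure rescales each direction by the spread of the data, so a point that is far along a high-variance direction counts as less unusual than one equally far along a low-variance direction.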

arXiv

Examine Clinicians' Modification of Hedging Language in Ambient AI Documentation: A Comparative Study of AI Drafts and Final Notes

Clinicians increased hedging language in ambient AI drafts, favoring additions over removals. Significant variability in these linguistic shifts was observed across vendors and specialties.

arXiv

SortingHat: Redefining Operating Systems Education with a Tailored Digital Teaching Assistant

SortingHat is an AI teaching assistant using RAG and MARL to personalize Operating Systems education. It provides adaptive 3D mentorship, customized exercises, and automated grading to improve learning outcomes.

arXiv

Understanding Stigmatizing Language in Clinical Documentation: A Paired Comparison of Ambient AI Drafts and Clinician Finalized Notes

A study of 66,297 notes reveals clinician editing of AI drafts increases stigmatizing language from 21.4% to 24.0%. This suggests human review inadvertently adds bias to electronic health records.
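The size of that shift can be read two ways, and the short calculation below shows both. It assumes the two percentages are shares of the same set of notes, which the summary implies but does not state. The variable names are illustrative.

```python
before_pct = 21.4  # share of AI drafts with stigmatizing language, per the summary
after_pct = 24.0   # share of clinician-finalized notes, per the summary

# Absolute change, in percentage points
abs_change = round(after_pct - before_pct, 1)

# Relative change, as a percent of the starting share
rel_change = round((after_pct - before_pct) / before_pct * 100, 1)

print(abs_change)  # 2.6
print(rel_change)  # 12.1
```

So the increase is 2.6 percentage points, or roughly a 12% relative rise over the draft baseline.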

arXiv

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

AEyeDE detects AI-generated text by analyzing attention attribution maps via a CNN. It outperforms baselines, offering robust, interpretable detection across diverse scenarios.

arXiv

SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding

SENSE improves retrieval-based speculative decoding by using semantic embedding navigation and soft-gated evaluation to bypass strict lexical dependencies. It achieves up to 3.26x speedup while maintaining generation quality across LLaMA and Qwen models.

arXiv

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

CSRP improves Chinese text correction via chain-of-thought reasoning and efficiency-aware reinforcement learning, reducing over-correction. It achieves state-of-the-art results on NACGEC and CSCD benchmarks, outperforming GPT-4.

arXiv

lmfaoooo at SemEval-2026 Task 1: Humor Is an Audience. Preference Modeling for Constrained Humor Generation

The "lmfaoooo" team won SemEval-2026 Task 1 by using a preference model trained on pairwise comparisons to select the best humor from diverse candidates.

arXiv

TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models

TrustLDM benchmarks trustworthiness in Language Diffusion Models, revealing alignment deterioration with malicious post-contexts. It introduces TrustLDM-Auto to systematically identify vulnerabilities across safety, privacy, and fairness.