Technology
Bridging the Last Mile of Time Series Forecasting with LLM Agents
This study introduces an LLM-agent framework that tackles the "last mile" of time series forecasting by integrating business context into statistical predictions. Its reasoning, memory, and reflection mechanisms improve both accuracy and auditability.
Tracking the Behavioral Trajectories of Adapting Agents
This paper introduces a framework to quantify agent traits by analyzing text embedding differences in skill files. Validated with high accuracy, it enables agents to monitor each other's behavioral evolution via a secure protocol.
ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents
ClinEnv evaluates LLMs as physicians via interactive, multi-stage inpatient simulations. Results show models struggle with management decisions and redundant querying, revealing gaps hidden by static benchmarks.
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
SafeSteer aligns LLMs via localized on-policy distillation on safety tokens, minimizing capability loss. It achieves robust safety with only 100 harmful samples, requiring less than 1% of the data used by prior methods.
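To make the idea of a "localized" distillation loss concrete, here is a minimal Python sketch. It assumes per-token probability distributions from a teacher and a student over a sequence the student itself generated (the on-policy part), plus a boolean mask marking safety-relevant positions. The function names, the choice of reverse KL (student relative to teacher), and how the mask is produced are illustrative assumptions, not SafeSteer's published recipe.

```python
import math
from typing import List, Sequence

Distribution = Sequence[float]

def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def localized_distillation_loss(
    teacher_probs: List[Distribution],
    student_probs: List[Distribution],
    safety_mask: List[bool],
) -> float:
    """Mean per-token reverse KL(student || teacher) over masked positions only.

    Positions where safety_mask is False contribute nothing, so tokens outside
    the (hypothetical) safety-relevant span receive no distillation signal.
    """
    per_token = [
        kl_divergence(s, t)
        for t, s, keep in zip(teacher_probs, student_probs, safety_mask)
        if keep
    ]
    return sum(per_token) / len(per_token) if per_token else 0.0

# Toy example: two positions over a vocabulary of size 2.
teacher = [[0.9, 0.1], [0.5, 0.5]]
student = [[0.5, 0.5], [0.5, 0.5]]
loss_second_only = localized_distillation_loss(teacher, student, [False, True])
loss_first_only = localized_distillation_loss(teacher, student, [True, False])
```

Restricting the loss to a mask is what keeps updates local: capability-bearing tokens outside the mask are never pulled toward the teacher.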
A Lightweight Deep Learning-based Model for Ranking Influential Nodes in Complex Networks
The 1D-CGS model combines 1D-CNNs and GraphSAGE to efficiently rank influential nodes in complex networks. It outperforms existing methods in both ranking accuracy and speed while keeping computational cost low.
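The GraphSAGE half of such a model aggregates each node's neighborhood. Below is a minimal pure-Python sketch of one mean-aggregation layer, assuming a single scalar feature per node (its degree) and a hand-picked weight matrix. In practice the weights are learned, and how 1D-CGS combines this step with its 1D-CNN branch is not described here, so that part is omitted.

```python
from typing import Dict, List

Vector = List[float]

def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def matvec(W: List[Vector], x: Vector) -> Vector:
    """Multiply matrix W (given as a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(
    features: Dict[int, Vector],
    adjacency: Dict[int, List[int]],
    W: List[Vector],
) -> Dict[int, Vector]:
    """One GraphSAGE layer with the mean aggregator:
    h_v' = ReLU(W @ concat(h_v, mean over neighbors u of h_u))."""
    out: Dict[int, Vector] = {}
    for v, h_v in features.items():
        neighbors = [features[u] for u in adjacency.get(v, [])]
        h_neigh = mean_vector(neighbors) if neighbors else [0.0] * len(h_v)
        z = matvec(W, h_v + h_neigh)          # list + list concatenates
        out[v] = [max(0.0, zi) for zi in z]   # ReLU
    return out

# Star graph: hub 0 linked to leaves 1, 2, 3; the feature is node degree.
features = {0: [3.0], 1: [1.0], 2: [1.0], 3: [1.0]}
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
W = [[1.0, 0.5]]  # hypothetical weights: self term, neighbor term
scores = sage_mean_layer(features, adjacency, W)
ranking = sorted(scores, key=lambda v: scores[v][0], reverse=True)
```

Here the single output dimension doubles as an influence score, so sorting nodes by it yields a ranking with the hub first.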
A Novel Data Augmentation Strategy for Robust Deep Learning Classification of Biomedical Time-Series Data: Application to ECG and EEG Analysis
This study proposes a unified CNN framework with a novel time-domain data augmentation strategy to robustly classify ECG and EEG signals. It effectively addresses class imbalance and outperforms existing methods on benchmark datasets.
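Common time-domain augmentations for physiological signals include additive jitter, amplitude scaling, and time shifting. The sketch below combines them to oversample minority classes. It is a generic illustration with assumed parameter values (noise level, scaling range, shift size), not the study's novel strategy, whose details are not given here.

```python
import random
from typing import Dict, List, Tuple

Signal = List[float]

def jitter(x: Signal, sigma: float, rng: random.Random) -> Signal:
    """Add zero-mean Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, sigma) for v in x]

def scale(x: Signal, low: float, high: float, rng: random.Random) -> Signal:
    """Multiply the whole signal by a single random amplitude factor."""
    factor = rng.uniform(low, high)
    return [v * factor for v in x]

def circular_shift(x: Signal, max_shift: int, rng: random.Random) -> Signal:
    """Rotate the signal in time by up to max_shift samples in either direction."""
    k = rng.randint(-max_shift, max_shift) % len(x)
    return x[-k:] + x[:-k] if k else list(x)

def augment(x: Signal, rng: random.Random) -> Signal:
    # Placeholder parameters, not tuned for ECG or EEG.
    return circular_shift(scale(jitter(x, 0.01, rng), 0.9, 1.1, rng), 5, rng)

def oversample_minority(
    samples: List[Signal], labels: List[int], rng: random.Random
) -> Tuple[List[Signal], List[int]]:
    """Append augmented copies of minority-class signals until every class
    has as many samples as the largest class."""
    by_class: Dict[int, List[Signal]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = list(samples), list(labels)
    for y, xs in by_class.items():
        for i in range(target - len(xs)):
            out_x.append(augment(xs[i % len(xs)], rng))
            out_y.append(y)
    return out_x, out_y

rng = random.Random(0)
signals = [[float(t % 7) for t in range(20)] for _ in range(4)]
labels = [0, 0, 0, 1]  # class 1 is the minority
aug_x, aug_y = oversample_minority(signals, labels, rng)
```

Because every transform preserves signal length, augmented copies can be fed to the same fixed-input CNN as the originals.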
BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali
BenHalluEval introduces the first Bengali hallucination benchmark, revealing significant LLM calibration gaps. Its dual-track metric exposes limitations of single-track evaluations and prompting in low-resource contexts.
Empathic and agentic artificial intelligence in nursing: perspectives on a human-centered framework for cancer care navigation in the United States
This article proposes a human-centered AI framework for US cancer care navigation, augmenting nurses’ empathy and agency to address resource gaps and improve care coordination.
RuleEdit: Failure-Guided Human-AI Model Editing with Prospective Impact Preview
RuleEdit is a human-AI framework that uses failure detection and impact previews to guide model editing. In stroke rehabilitation assessment, it significantly improved model performance and feedback quality while revealing a local-global trade-off.
A phenomenon of AI-conformity: how algorithms change human moral decision-making
This study reveals that AI reasoning influences human moral judgments as strongly as social pressure. It challenges the idea that morality is immune to algorithmic conformity.
DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset
DraDDP is the first multimodal, multi-party dialogue discourse parsing dataset, featuring 6,374 utterances from TV dramas. It demonstrates that multimodal data significantly improves parsing accuracy.
Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval
DOPA enhances LLM robustness by using out-of-distribution proxies to retrieve diverse demonstrations when target domains are inaccessible. It leverages Mahalanobis distance to ensure variety, significantly improving performance in OOD scenarios.
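One way to use Mahalanobis distance for variety is greedy max-min selection: start from one candidate, then repeatedly add the proxy example whose smallest distance to the already chosen set is largest. The pure-Python sketch below assumes demonstrations are already embedded as small vectors, estimates the covariance from the proxy pool with a small ridge term (an assumption for numerical stability), and starts from an arbitrary seed. DOPA's actual scoring, including how it balances relevance against diversity, may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def covariance(X: List[Vector], ridge: float = 1e-3) -> Matrix:
    """Sample covariance of the rows of X, plus ridge * I for invertibility."""
    n, d = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    cov = [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / max(n - 1, 1) for j in range(d)]
        for i in range(d)
    ]
    for i in range(d):
        cov[i][i] += ridge
    return cov

def invert(A: Matrix) -> Matrix:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def mahalanobis(x: Vector, y: Vector, inv_cov: Matrix) -> float:
    """sqrt((x - y)^T S^-1 (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    q = sum(diff[i] * inv_cov[i][j] * diff[j]
            for i in range(len(diff)) for j in range(len(diff)))
    return max(q, 0.0) ** 0.5  # guard against tiny negative rounding error

def select_diverse(pool: List[Vector], k: int, seed_index: int = 0) -> List[int]:
    """Greedy max-min selection of k indices from pool under Mahalanobis distance."""
    inv_cov = invert(covariance(pool))
    chosen = [seed_index]
    while len(chosen) < min(k, len(pool)):
        remaining = [i for i in range(len(pool)) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: min(mahalanobis(pool[i], pool[c], inv_cov) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Index 1 is a near-duplicate of index 0, so a diverse set should skip it.
proxy_pool = [[0.0, 0.0], [0.1, 0.1], [10.0, 0.0], [0.0, 10.0]]
picked = select_diverse(proxy_pool, k=3)
```

Unlike Euclidean distance, the Mahalanobis form discounts directions in which the proxy pool already varies a lot, so "diverse" is measured relative to the pool's own spread.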
Examine Clinicians' Modification of Hedging Language in Ambient AI Documentation: A Comparative Study of AI Drafts and Final Notes
When editing ambient AI drafts into final notes, clinicians increased hedging language, adding hedges more often than removing them. These changes varied significantly across vendors and specialties.
SortingHat: Redefining Operating Systems Education with a Tailored Digital Teaching Assistant
SortingHat is an AI teaching assistant that uses retrieval-augmented generation (RAG) and multi-agent reinforcement learning (MARL) to personalize Operating Systems education. It provides adaptive 3D mentorship, customized exercises, and automated grading to improve learning outcomes.
Understanding Stigmatizing Language in Clinical Documentation: A Paired Comparison of Ambient AI Drafts and Clinician Finalized Notes
A study of 66,297 notes reveals that clinician editing of AI drafts increased the rate of stigmatizing language from 21.4% to 24.0%. This suggests that human review can inadvertently introduce bias into electronic health records.
AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
AEyeDE detects AI-generated text by analyzing attention attribution maps via a CNN. It outperforms baselines, offering robust, interpretable detection across diverse scenarios.
SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding
SENSE improves retrieval-based speculative decoding by using semantic embedding navigation and soft-gated evaluation to bypass strict lexical dependencies. It achieves up to 3.26x speedup while maintaining generation quality across LLaMA and Qwen models.
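For context, conventional retrieval-based drafting, which SENSE improves on, matches the latest context tokens lexically against a datastore and proposes whatever followed; the target model then verifies the draft. The sketch below shows that lexical baseline with greedy verification, using a hypothetical `toy_target` function in place of a real model. SENSE's semantic embedding navigation and soft-gated evaluation are not reproduced, and a real system would score all draft positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence

Token = str

def lexical_draft(context: Sequence[Token], datastore: Sequence[Token],
                  n: int = 1, max_draft: int = 4) -> List[Token]:
    """Find the most recent exact match of the last n context tokens in the
    datastore and propose up to max_draft tokens that followed it."""
    suffix = list(context[-n:])
    for start in range(len(datastore) - n, -1, -1):
        if list(datastore[start:start + n]) == suffix:
            return list(datastore[start + n:start + n + max_draft])
    return []

def verify_greedy(context: Sequence[Token], draft: Sequence[Token],
                  target_next_token: Callable[[List[Token]], Token]) -> List[Token]:
    """Keep the longest draft prefix the target would also produce greedily,
    then append one token chosen by the target itself."""
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next_token(list(context) + accepted)
        if tok != expected:
            accepted.append(expected)  # the target's correction ends this step
            return accepted
        accepted.append(tok)
    accepted.append(target_next_token(list(context) + accepted))  # bonus token
    return accepted

# Toy "target model" that always continues one fixed sentence.
SENTENCE = "the cat sat on the mat".split()

def toy_target(prefix: List[Token]) -> Token:
    return SENTENCE[len(prefix)] if len(prefix) < len(SENTENCE) else "<eos>"

datastore = "a cat sat on the sofa".split()
context = ["the", "cat"]
draft = lexical_draft(context, datastore)         # proposes: sat on the sofa
step = verify_greedy(context, draft, toy_target)  # accepts 3 tokens, corrects "sofa" to "mat"
```

The exact-match lookup is the "strict lexical dependency" the summary mentions: if the context suffix never appears verbatim in the datastore, no draft is produced at all.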
CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards
CSRP improves Chinese text correction via chain-of-thought reasoning and efficiency-aware reinforcement learning, reducing over-correction. It achieves state-of-the-art results on NACGEC and CSCD benchmarks, outperforming GPT-4.
lmfaoooo at SemEval-2026 Task 1: Humor Is an Audience. Preference Modeling for Constrained Humor Generation
The "lmfaoooo" team won SemEval-2026 Task 1 by using a preference model trained on pairwise comparisons to select the best humor from diverse candidates.
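A preference model trained on pairwise comparisons is often formulated as a Bradley-Terry model, where P(a preferred to b) = sigmoid(s(a) - s(b)) for a learned score s. The sketch below fits a linear score over hypothetical hand-made feature vectors by stochastic gradient ascent on the log-likelihood, then picks the top-scoring candidate. The team's actual model, features, and candidate generator are not described here, so all of these are stand-ins.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w: List[float], x: Features) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_bradley_terry(pairs: List[Tuple[Features, Features]], dim: int,
                      lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Each pair is (preferred, rejected). Maximizes
    sum of log sigmoid(score(preferred) - score(rejected)) by SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in pairs:
            margin = score(w, win) - score(w, lose)
            grad_scale = sigmoid(-margin)  # derivative of log sigmoid(margin)
            for i in range(dim):
                w[i] += lr * grad_scale * (win[i] - lose[i])
    return w

def pick_best(candidates: List[Features], w: List[float]) -> int:
    """Index of the candidate with the highest learned score."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))

# Hypothetical 2-d features; only the first one separates winners from losers.
pairs = [([1.0, 0.0], [0.0, 0.0]), ([1.0, 1.0], [0.0, 1.0]), ([2.0, 0.0], [1.0, 0.0])]
w = fit_bradley_terry(pairs, dim=2)
best = pick_best([[0.0, 5.0], [3.0, 0.0], [1.0, 1.0]], w)
```

Training only on relative judgments sidesteps the need for absolute funniness labels, which is why pairwise preference data suits a subjective target like humor.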
TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models
TrustLDM benchmarks trustworthiness in Language Diffusion Models, revealing that their alignment deteriorates when malicious post-contexts are present. It also introduces TrustLDM-Auto to systematically identify vulnerabilities across safety, privacy, and fairness.