Global News Digest

arXiv

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Abstract

Agentic language models typically alternate between two fundamentally different operating modes: structured tool calls, which are short, deterministic, and low in perplexity, and open-ended planning or reasoning phases, which are long, complex, and high in perplexity. Existing inference frameworks nevertheless spend the same compute on every step, ignoring this structural heterogeneity. To address this inefficiency, the authors present LayerRoute, a lightweight adapter that selectively bypasses transformer blocks on a per-input basis.

LayerRoute is integrated into all 24 transformer blocks of the Qwen2.5-0.5B-Instruct model. Each block gains two components: a router of 897 parameters (a Linear(896,1) layer, i.e. 896 weights plus one bias) that produces a hard binary gate trained with a straight-through estimator, and rank-8 LoRA adapters on the Q/K/V/O attention projections. Summed over all blocks, the LoRA adapters add roughly 1.08 million parameters. The backbone weights remain frozen throughout.
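The gating mechanism described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the class name `HardGateRouter`, the 0.5 threshold, the sigmoid surrogate used in the backward pass, the pooled input vector, and the convention that a gate of 1 means "run the block" are all assumptions, since the abstract only states that the router is a Linear(896,1) layer producing a hard binary gate via a straight-through estimator.

```python
import math
import random

HIDDEN = 896  # router input width, taken from Linear(896,1) in the text


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class HardGateRouter:
    """Hypothetical per-block router: one linear unit followed by a hard 0/1 gate."""

    def __init__(self, hidden: int = HIDDEN, seed: int = 0):
        rng = random.Random(seed)
        self.weight = [rng.gauss(0.0, 0.02) for _ in range(hidden)]
        self.bias = 0.0

    def num_parameters(self) -> int:
        # 896 weights + 1 bias
        return len(self.weight) + 1

    def forward(self, pooled):
        """Return (hard gate, soft probability) for one pooled input vector."""
        logit = sum(w * x for w, x in zip(self.weight, pooled)) + self.bias
        prob = sigmoid(logit)
        gate = 1.0 if prob > 0.5 else 0.0  # assumed threshold
        return gate, prob

    def backward(self, pooled, prob, grad_gate):
        """Straight-through backward pass.

        The hard threshold has zero gradient almost everywhere, so the
        estimator treats d(gate)/d(prob) as 1 and differentiates the
        sigmoid instead (an assumed choice of surrogate).
        """
        grad_logit = grad_gate * prob * (1.0 - prob)
        grad_weight = [grad_logit * x for x in pooled]
        grad_bias = grad_logit
        return grad_weight, grad_bias


def gated_block(hidden_state, block_fn, gate):
    """Run block_fn when gate is 1; pass the input through unchanged when 0."""
    if gate == 0.0:
        return hidden_state  # block skipped, no compute spent on it
    return block_fn(hidden_state)


router = HardGateRouter()
pooled = [0.1] * HIDDEN
gate, prob = router.forward(pooled)
grad_w, grad_b = router.backward(pooled, prob, grad_gate=1.0)
print(router.num_parameters(), gate, round(prob, 4))
```

The forward pass makes a discrete skip-or-run decision, while the backward pass still delivers a usable gradient to the router weights, which is what allows a hard gate to be trained end to end.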

A single end-to-end training pass on agentic datasets (Hermes, Glaive, GSM8K, and Turing), combined with a gate regularization term, teaches the system which blocks can be skipped for which input types. After 3,000 training steps, which took 6.4 minutes on an A100 40GB GPU, LayerRoute reaches a 12.91% skip differential: the gap between a 15.25% reduction in FLOPs on tool calls and a mere 2.34% reduction on planning steps. This is achieved with 1.10 million trainable parameters, or 0.22% of the 494 million parameters in the backbone. Quality also exceeds that of the base model thanks to the LoRA adaptation, with perplexity deltas of -1.29 for tool calls and -1.30 for planning tasks.
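The parameter and skip figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the K and V projections map 896 to 128 dimensions (the grouped-query attention layout in the published Qwen2.5-0.5B configuration) and that each LoRA adapter adds rank × (d_in + d_out) parameters; neither detail is stated in the abstract, and all variable names are illustrative.

```python
HIDDEN = 896              # from Linear(896,1) in the text
KV_WIDTH = 128            # assumption: K/V projection width under grouped-query attention
NUM_BLOCKS = 24           # transformer blocks, from the text
RANK = 8                  # LoRA rank, from the text
BACKBONE = 494_000_000    # backbone parameters, from the text


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A (d_in x rank) down-projection plus a (rank x d_out) up-projection."""
    return rank * (d_in + d_out)


lora_per_block = (
    lora_params(HIDDEN, HIDDEN, RANK)      # Q
    + lora_params(HIDDEN, KV_WIDTH, RANK)  # K
    + lora_params(HIDDEN, KV_WIDTH, RANK)  # V
    + lora_params(HIDDEN, HIDDEN, RANK)    # O
)
router_per_block = HIDDEN + 1              # 896 weights + 1 bias

lora_total = NUM_BLOCKS * lora_per_block
router_total = NUM_BLOCKS * router_per_block
trainable_total = lora_total + router_total
trainable_pct = 100 * trainable_total / BACKBONE

# Skip differential: FLOPs reduction on tool calls minus that on planning steps.
skip_differential = 15.25 - 2.34

print(f"LoRA:      {lora_total:,}")        # 1,081,344
print(f"Routers:   {router_total:,}")      # 21,528
print(f"Trainable: {trainable_total:,}")   # 1,102,872
print(f"Share of backbone: {trainable_pct:.2f}%")         # 0.22%
print(f"Skip differential: {skip_differential:.2f}%")     # 12.91%
```

Under these assumptions the totals round to the 1.08 million LoRA parameters, 1.10 million trainable parameters, and 0.22% share reported in the abstract.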


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
