Echelon: Auditable Aggregate-Only Language-Model Adaptation Across Privacy Boundaries
Title: Echelon: Enabling Auditable, Aggregate-Only Language Model Adaptation Across Privacy Boundaries
Abstract
Cross-organizational language model adaptation is increasingly constrained by strict governance policies. In numerous deployment scenarios, critical components such as device-level model parameters, activations, optimizer states, and per-device updates are prohibited from leaving their administrative boundaries. Current distributed and federated frameworks generally presuppose the ability to exchange models across sites, subsequently attempting to layer privacy mechanisms on top. This approach complicates regulatory compliance and results in fragile auditing processes.
To address these challenges, we introduce Echelon, a training architecture designed with privacy boundaries as a foundational element. Echelon treats the non-export of device-level model state as a strict systems invariant. Within each boundary, devices perform local training; the sole data transmitted across boundaries consists of securely aggregated boundary-level deltas and minimal coordination metadata (O(1) size), which are exposed via a tangible audit surface.
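The aggregate-only exchange described above can be illustrated with a toy pairwise-masking scheme. This is a minimal sketch, not Echelon's protocol: the abstract does not specify the secure-aggregation construction, so the modulus, the function names (`pairwise_masks`, `mask_update`, `aggregate`), and the single seeded generator standing in for per-pair key agreement are all assumptions made for illustration. Updates are assumed to be already quantized to non-negative integers.

```python
import random

MODULUS = 2**32  # assumed ring size for masked integer arithmetic


def pairwise_masks(num_devices, dim, seed=0):
    """Build one mask per device so that all masks sum to zero mod MODULUS.

    A real protocol would derive each pair's shared mask from a key
    agreement between the two devices; one seeded RNG is a stand-in here.
    """
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_devices)]
    for i in range(num_devices):
        for j in range(i + 1, num_devices):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # Device i adds the shared mask, device j subtracts it.
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Hide a device's quantized update behind its mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum masked updates; the pairwise masks cancel in the total."""
    dim = len(masked_updates[0])
    total = [0] * dim
    for vec in masked_updates:
        for k in range(dim):
            total[k] = (total[k] + vec[k]) % MODULUS
    return total


# Three hypothetical devices inside one boundary.
device_updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(device_updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(device_updates, masks)]
boundary_delta = aggregate(masked)  # only this sum would cross the boundary
```

In this sketch the aggregator sees only masked vectors, yet `boundary_delta` equals the element-wise sum of the device updates. A production scheme would also need dropout recovery and authenticated key exchange, which are omitted here.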
Limiting inter-boundary exchange to aggregated updates changes the optimization problem. The system must remain stable under wide-area network (WAN) delays, heterogeneous participation rates, user churn, and data that are not independent and identically distributed (non-IID), all while the global plane never observes individual device updates. Echelon achieves this by combining buffered semi-asynchronous secure aggregation, staleness-aware weighting, defined participation windows, proximal local objectives, and a drift-aware outer synchronization controller.
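Two of the mechanisms listed above, staleness-aware weighting and proximal local objectives, can be sketched as follows. The abstract does not give Echelon's formulas, so everything here is an assumption: the polynomial discount `(1 + staleness) ** (-alpha)`, the squared-distance proximal penalty with coefficient `mu`, and all function and parameter names are hypothetical choices borrowed from common practice in asynchronous and federated optimization.

```python
def staleness_weight(staleness, alpha=0.5):
    """Assumed polynomial discount: older deltas receive smaller weight."""
    return (1.0 + staleness) ** (-alpha)


def merge_buffer(global_params, buffer, current_round, outer_lr=1.0):
    """Apply buffered boundary-level deltas to the global parameters.

    buffer is a list of (delta, round_computed) pairs. Weights are
    normalized so they sum to one before the outer step is taken.
    """
    weights = [staleness_weight(current_round - r) for _, r in buffer]
    total = sum(weights)
    merged = list(global_params)
    for (delta, _), w in zip(buffer, weights):
        for k, d in enumerate(delta):
            merged[k] += outer_lr * (w / total) * d
    return merged


def proximal_loss(task_loss, local_params, anchor_params, mu=0.01):
    """Add a penalty that keeps local parameters near the last synced anchor."""
    penalty = sum((p - a) ** 2 for p, a in zip(local_params, anchor_params))
    return task_loss + 0.5 * mu * penalty


# A fresh delta (staleness 0) and a delta that is three rounds old.
buffer = [([1.0, 1.0], 5), ([3.0, 3.0], 2)]
merged = merge_buffer([0.0, 0.0], buffer, current_round=5)
local_objective = proximal_loss(2.0, [1.0, 2.0], [0.0, 0.0], mu=0.1)
```

With `alpha=0.5`, the fresh delta gets raw weight 1.0 and the three-round-old delta gets 0.5, so the stale contribution is down-weighted rather than discarded. The proximal term limits how far local training can drift between synchronizations, which matters when the global plane cannot inspect individual updates.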
In experiments on 1B-parameter LoRA adaptation across M=2 boundaries, we evaluated Echelon in a budget-matched comparison over three random seeds with 24.88M tokens. Echelon reached a validation loss of 3.887 ± 0.010, best or tied for best among tuned low-communication baselines under fixed-token, fixed-byte, fixed-wall-clock, and fixed-synchronization-count budgets.
In stress tests on OpenWebText, Echelon sustained throughput between 2,139 and 2,176 tokens per second across a range of WAN and non-IID conditions. Compared with a privacy-parity DiLoCo+SA baseline, Echelon-DA reduced time-to-target under WAN latency. Quality degradation stayed at or below 2.2%, even with 200 ms of emulated latency or severe non-IID data partitioning.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



