arXiv

Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)

Title: Aligned Training: A Parameter-Free Approach to Enhancing the Quality and Stability of Sparse Autoencoder Features

Abstract:

Sparse autoencoders (SAEs) serve as a primary tool for interpreting the internal mechanisms of deep neural networks (DNNs) by decomposing activations into high-dimensional features. Nevertheless, they suffer from significant limitations, most notably the presence of numerous inactive "dead" features and inherent instability. While existing SAE variants seek to address these problems, they typically necessitate extra data, resampling procedures, or additional training phases. In this work, we introduce aligned training, a parameter-free reparameterization technique that simultaneously boosts reconstruction accuracy, eradicates dead features, and markedly increases stability across different training seeds.

Our method is grounded in a previously unnoticed phenomenon: the quality of SAE features, quantified by the inner product between encoder and decoder directions (termed the alignment score), exhibits a bimodal distribution across contemporary architectures. Aligned training imposes a geometric constraint that forces the inner product between encoder and decoder weights to equal one for each feature. This mechanism eliminates a specific source of degeneracy in SAE training without introducing any new hyperparameters.
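To make the constraint concrete, the sketch below computes the alignment score for each feature and then adjusts the encoder so that every score equals one. The abstract does not say how the reparameterization is implemented. The projection used here (shifting each encoder direction along its decoder direction) is an illustrative assumption, and the names `alignment_scores` and `align_encoder` are hypothetical. Plain Python lists stand in for weight matrices, with one row per feature.

```python
def alignment_scores(encoder, decoder):
    """Return the inner product <e_i, d_i> for each feature i."""
    return [
        sum(e * d for e, d in zip(e_i, d_i))
        for e_i, d_i in zip(encoder, decoder)
    ]


def align_encoder(encoder, decoder):
    """Shift each encoder direction along its decoder direction so <e_i, d_i> = 1.

    Assumption for illustration only: this is the smallest-norm change to e_i
    that satisfies the constraint, not necessarily the paper's construction.
    It requires every decoder direction to be non-zero.
    """
    aligned = []
    for e_i, d_i in zip(encoder, decoder):
        score = sum(e * d for e, d in zip(e_i, d_i))
        d_norm_sq = sum(d * d for d in d_i)
        step = (1.0 - score) / d_norm_sq
        aligned.append([e + step * d for e, d in zip(e_i, d_i)])
    return aligned


# Toy weights: 3 features in a 3-dimensional activation space.
encoder = [[0.5, 0.0, 0.0], [0.0, 1.0, 1.0], [0.2, -0.3, 0.1]]
decoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]

scores_before = alignment_scores(encoder, decoder)
aligned = align_encoder(encoder, decoder)
scores_after = alignment_scores(aligned, decoder)
```

In this toy example the first and third features start with alignment scores below one, and after the adjustment all three scores equal one up to floating-point error. In an actual training loop the constraint would presumably be built into the parameterization itself rather than applied as a separate correction step.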

Experiments across various models, dictionary sizes, and sparsity levels demonstrate that aligned training achieves Pareto improvements on SAEBench. Beyond addressing dead features, instability, and reconstruction error, the method is compatible with mechanistic interpretability techniques such as TopK/BatchTopK architectures and p-Annealing. Aligned training thus significantly improves the quality and stability of SAE features without additional computational cost.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
