A Negative Result on Cross-Model Activation Transfer in a Pythia Multi-Hop Setting
Title: Cross-Model Activation Transfer Fails to Enhance Performance in a Pythia Multi-Hop Scenario
Abstract: While recent studies indicate that language models can convey behavioral characteristics via hidden signals embedded in generated data during the training phase, this research investigates whether a more direct and rigorous communication channel is feasible. Specifically, we explore the possibility of one language model conveying intermediate reasoning states to another at inference time by translating and injecting hidden activations, bypassing the need to transmit natural-language text. To evaluate this hypothesis, we conducted a controlled experiment involving a multi-hop reasoning task between a Pythia-160M model (sender) and a Pythia-410M model (receiver).
Our analysis reveals that a linear translation layer successfully establishes a robust map between the hidden states of the sender and receiver in normalized space, achieving a normalized cosine similarity of approximately 0.97 across different random seeds. Despite this strong representational alignment, injecting these translated activations into the receiver during inference yields no improvement in downstream answering accuracy. Additive injection at low strengths performs no better than the no-injection baseline: the confidence intervals for the accuracy difference include zero. Furthermore, replacement-style injection consistently degrades performance, and rescaling the translated vectors to match the receiver’s hidden-state norm fails to restore efficacy. Consequently, this study presents a scoped negative result: within this specific context, offline representational alignment does not facilitate effective causal communication within the receiving model.
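The operations named in the abstract (a linear translator fitted in normalized space, additive and replacement injection, and norm rescaling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The paired hidden states below are synthetic stand-ins, the least-squares fit is an assumed way of obtaining a linear map, and all function names are hypothetical. Only the hidden sizes (768 for Pythia-160M, 1024 for Pythia-410M) are taken from the public model configurations.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SENDER, D_RECEIVER = 768, 1024  # hidden sizes of Pythia-160M / Pythia-410M
N_TRAIN, N_TEST = 1536, 512

# Synthetic paired hidden states (assumption: a real experiment would record
# these from both models on the same inputs; here they are random stand-ins).
H_s = rng.standard_normal((N_TRAIN + N_TEST, D_SENDER))
W_true = rng.standard_normal((D_SENDER, D_RECEIVER)) / np.sqrt(D_SENDER)
H_r = H_s @ W_true + 0.1 * rng.standard_normal((N_TRAIN + N_TEST, D_RECEIVER))

def unit_rows(x, eps=1e-8):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

# Fit a linear map between normalized sender and receiver states.
# Least squares is an assumed fitting method, chosen for brevity.
W, *_ = np.linalg.lstsq(
    unit_rows(H_s[:N_TRAIN]), unit_rows(H_r[:N_TRAIN]), rcond=None
)

def translate(h_sender):
    """Map sender hidden states into the receiver's (normalized) space."""
    return unit_rows(h_sender) @ W

def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two batches of vectors."""
    return float(np.mean(np.sum(unit_rows(a) * unit_rows(b), axis=-1)))

def rescale_to_norm(t, h_receiver):
    """Rescale translated vectors to the receiver's per-row hidden-state norm."""
    return unit_rows(t) * np.linalg.norm(h_receiver, axis=-1, keepdims=True)

def inject_additive(h_receiver, t, alpha):
    """Additive injection: add the translated vector at strength alpha."""
    return h_receiver + alpha * t

def inject_replace(h_receiver, t):
    """Replacement injection: overwrite the receiver state entirely."""
    return t

# Offline alignment on held-out synthetic pairs.
held_out_cos = mean_cosine(translate(H_s[N_TRAIN:]), H_r[N_TRAIN:])
print(f"held-out cosine similarity (synthetic data): {held_out_cos:.3f}")
```

In this sketch, a high held-out cosine similarity only measures offline alignment. Whether the injected vectors change the receiver's answers has to be measured separately by running the receiver with `inject_additive` or `inject_replace` applied at a chosen layer, which is the step where the abstract reports no benefit.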
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



