Letting One AI Share Its Mind With Another — Faster, More Accurate Collaboration

The Big Picture

Dense alignment lets one agent send its internal memory (key-value attention caches) directly to a different agent, so the receiver can solve tasks without re-reading the prompt. Compared with passing decoded text, this cuts compute by about 2–3× in context-aware settings while matching or improving accuracy.

Key Findings

Latent communication plays two roles. When the receiver already sees the input, a small subset of internal signals is enough to steer its reasoning; when the receiver must recover the input itself, a much larger, denser portion of the sender’s internal memory has to be transmitted. A lightweight learned adapter that aligns key-value memory across different model architectures preserves both reasoning and context, outperforming prior sparse methods. In context-aware settings the dense adapter uses 2–3× less compute than sending decoded text, and it remains accurate when the receiver has no access to the original prompt, a scenario where previous methods fail.

Data Highlights

1. Context-unaware transfer needs most of the cache: in a 288-group test, accuracy stays near chance until more than 150 groups are kept and approaches the ceiling only when 250 groups (about 87% of the cache) are preserved.

2. Dense alignment uses about 2–3× less compute than text-based message passing in context-aware tasks while matching or exceeding text accuracy.

3. The method was validated across six sender→receiver directions spanning models from ~4 billion to ~14 billion parameters, consistently beating sparse heterogeneous baselines.
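The cache fractions behind the first highlight are simple ratios of groups kept to the 288 groups in the test. A minimal sketch of that arithmetic follows; the function name `cache_fraction` is a hypothetical helper introduced here for illustration only.

```python
TOTAL_KV_GROUPS = 288  # size of the test reported in the first highlight


def cache_fraction(groups_kept: int, total: int = TOTAL_KV_GROUPS) -> float:
    """Share of the KV cache transmitted when `groups_kept` groups are kept."""
    if not 0 <= groups_kept <= total:
        raise ValueError("groups_kept must be between 0 and total")
    return groups_kept / total


for k in (150, 250):
    # 150/288 is roughly 52.1%, 250/288 is roughly 86.8%
    print(f"K={k}: {cache_fraction(k):.1%} of the cache")
```

Read this way, the 87% figure describes how much of the cache must be sent before context-unaware accuracy approaches the ceiling: more than half the cache is needed before accuracy leaves chance at all.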

What This Means

Engineers building multi-agent systems that need efficient handoffs (planners, retrievers, executors, verifiers) can use dense alignment to avoid costly text decoding and speed up inference. Technical leads weighing compute against fidelity will find it useful as well: it reduces redundant work while preserving the full contextual content when needed.

Key Figures

Figure 1: See what I see, know what I think. We study real latent mind reading across heterogeneous agents, in which one agent can read both what another agent sees and what it thinks. Guided by our latent communication information structure analysis, we learn dense alignment between agents and evaluate context-aware and context-unaware settings. Dense alignment is accurate and efficient in both regimes, surpassing sparse-steering heterogeneous baselines (cache-to-cache) while using less compute than text communication.

Figure 2: Sparse vs. dense heterogeneous alignment. Prior sparse methods partially align reasoning (mainly in context-aware transfer) and do not preserve dense context. Our dense alignment maps sender caches into receiver-compatible caches to support both robust reasoning and dense context transfer across context-aware and context-unaware regimes.

Figure 4: Compressed-sensing head selection: random ablation masks estimate sender-head importance, which is aggregated to KV-group scores and used to keep the top-K groups for communication.
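The selection pipeline in this caption (random masks, per-head importance, group aggregation, top-K) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' procedure: a plain difference-of-means estimator stands in for the compressed-sensing recovery step, group scores are formed by summing head importances, and `toy_score`, `TRUE_WEIGHTS`, and `heads_per_group` are hypothetical stand-ins for a real task metric and model layout.

```python
import random
from typing import Callable, List, Sequence


def estimate_head_importance(
    score_fn: Callable[[Sequence[bool]], float],
    num_heads: int,
    num_masks: int = 1000,
    keep_prob: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Mean score when a head is kept minus mean score when it is ablated."""
    rng = random.Random(seed)
    kept_sum = [0.0] * num_heads
    kept_n = [0] * num_heads
    drop_sum = [0.0] * num_heads
    drop_n = [0] * num_heads
    for _ in range(num_masks):
        # True means the head is kept, False means it is ablated.
        mask = [rng.random() < keep_prob for _ in range(num_heads)]
        score = score_fn(mask)
        for h, kept in enumerate(mask):
            if kept:
                kept_sum[h] += score
                kept_n[h] += 1
            else:
                drop_sum[h] += score
                drop_n[h] += 1
    return [
        kept_sum[h] / max(kept_n[h], 1) - drop_sum[h] / max(drop_n[h], 1)
        for h in range(num_heads)
    ]


def aggregate_to_groups(head_importance: Sequence[float], heads_per_group: int) -> List[float]:
    """Sum head importances within each KV group (summation is an assumption)."""
    if len(head_importance) % heads_per_group != 0:
        raise ValueError("number of heads must be divisible by heads_per_group")
    return [
        sum(head_importance[i:i + heads_per_group])
        for i in range(0, len(head_importance), heads_per_group)
    ]


def top_k_groups(group_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring groups, returned in ascending order."""
    ranked = sorted(range(len(group_scores)), key=lambda g: group_scores[g], reverse=True)
    return sorted(ranked[:k])


# Toy stand-in for a task metric: only heads 2 and 5 contribute to the score.
TRUE_WEIGHTS = [0.0, 0.0, 1.0, 0.0, 0.0, 0.8, 0.0, 0.0]


def toy_score(mask: Sequence[bool]) -> float:
    return sum(w for w, kept in zip(TRUE_WEIGHTS, mask) if kept)


importance = estimate_head_importance(toy_score, num_heads=8)
scores = aggregate_to_groups(importance, heads_per_group=2)
selected = top_k_groups(scores, k=2)
print(selected)
```

On this toy problem the two groups containing heads 2 and 5 should rank highest. In a real setting `score_fn` would run the receiver with the masked sender cache and return task accuracy, which is far more expensive per mask.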

Figure 5: Sparse reasoning signal vs. dense context signal (Qwen3-4B self-communication). Accuracy is plotted against the number of KV groups kept (K of 288). Solid blue: context-aware CS filtering, where the receiver still sees the input; K = 0 is the single-agent receiver baseline. Open blue squares: random KV-group selection. Solid red: context-unaware CS filtering, where the receiver relies only on transmitted KV caches. Context-aware communication reaches near-ceiling accuracy with few KV groups, suggesting a sparse reasoning signal. In contrast, context-unaware communication requires dense context transfer, staying near chance until K > 150 and approaching the ceiling only at K = 250 (87% of the cache).

Considerations

The adapter is currently trained per sender–receiver pair, so scaling to many agents or open-set pairings will need shared or transfer-learned adapters. Experiments focus on math and reasoning benchmarks and a limited range of model sizes, so behavior on broader tasks and very large models is untested. Sending dense internal memory can expose prompt content, raising privacy and leakage concerns that need policy or technical controls.
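To see why per-pair training becomes a scaling problem, count the directed sender→receiver pairs: with n agents, each ordered pair needs its own adapter, giving n(n − 1) of them. The sketch below illustrates that count; the agent names are hypothetical, and treating every ordered pair as needing a separate adapter is an assumption based on the per-pair training described above.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def directed_pairs(agents: Sequence[str]) -> List[Tuple[str, str]]:
    """All ordered (sender, receiver) pairs of distinct agents."""
    return list(permutations(agents, 2))


def adapters_needed(num_agents: int) -> int:
    """Adapters required if each directed pair is trained separately."""
    return num_agents * (num_agents - 1)


agents = ["planner", "retriever", "verifier"]  # hypothetical agent roles
print(len(directed_pairs(agents)))  # 3 agents give 6 directed pairs
print(adapters_needed(10))          # 10 agents give 90 directed pairs
```

The quadratic growth is what makes shared or transfer-learned adapters attractive once the number of agents grows beyond a handful.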

Deep Dive

Latent communication swaps text messages for internal model states (key-value attention caches), which avoids autoregressive decoding and re-encoding overhead. A compressed-sensing analysis shows a clear split: if the receiver already has the input, only a sparse set of attention groups is needed to steer reasoning; if the receiver has no input, the transmitted cache must preserve dense contextual knowledge across most groups. That insight explains why prior work focused on context-aware settings could rely on sparse signals but would fail when the receiver must depend solely on the sender’s memory.

The solution is a learned dense alignment: a small adapter that maps a sender’s cache into a receiver-compatible cache using positional disentanglement, per-head transformations and gating, and a two-stage training schedule (first reconstruct the receiver’s native cache, then fine-tune for generation). This preserves both “what it saw” and “how it thinks.” Experiments show the adapter beats previous sparse heterogeneous baselines, matches or exceeds text-based handoffs while using 2–3× less compute in context-aware setups, and remains accurate in the tougher context-unaware regime where baselines collapse. Practical next steps are shared adapters for many-to-many agent networks and safeguards against leaking prompt content.
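The per-head transformation and gating can be pictured with a small sketch. It is a toy illustration under loud assumptions rather than the paper's architecture: each sender head maps one-to-one onto a receiver head, the transformation is a single linear map, the gate is a scalar sigmoid per head, and positional disentanglement and the generation fine-tuning stage are omitted. All function names are hypothetical. The `reconstruction_mse` helper only mirrors the idea of the first training stage, which compares the mapped cache with the receiver's native cache.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_recv x d_send) matrix by a d_send vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def align_head(x: Vector, w: Matrix, gate_logit: float) -> Vector:
    """Map one sender-head vector into receiver space, scaled by a gate."""
    gate = sigmoid(gate_logit)  # scalar gate in (0, 1); an assumed design
    return [gate * v for v in matvec(w, x)]


def align_cache(
    sender_cache: List[List[Vector]],
    weights: List[Matrix],
    gate_logits: List[float],
) -> List[List[Vector]]:
    """sender_cache[h][t] is head h's key or value vector at position t."""
    return [
        [align_head(vec, weights[h], gate_logits[h]) for vec in head]
        for h, head in enumerate(sender_cache)
    ]


def reconstruction_mse(mapped: List[List[Vector]], target: List[List[Vector]]) -> float:
    """Mean squared error between mapped and native receiver caches."""
    sq = [
        (a - b) ** 2
        for m_head, t_head in zip(mapped, target)
        for m_vec, t_vec in zip(m_head, t_head)
        for a, b in zip(m_vec, t_vec)
    ]
    return sum(sq) / len(sq)


# One head, one position: sender head dim 2 mapped to receiver head dim 3.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mapped = align_cache([[[2.0, 4.0]]], weights=[W], gate_logits=[0.0])
print(mapped)  # a gate of 0.5 halves [2, 4, 6], giving [[[1.0, 2.0, 3.0]]]
print(reconstruction_mse(mapped, [[[1.0, 2.0, 3.0]]]))  # 0.0
```

In practice the weights and gate logits would be learned tensors, and senders and receivers with different head counts or layer depths would need an explicit mapping between heads and layers that this sketch does not attempt.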

Credibility Assessment:

Mixed evidence: some recognizable researchers (e.g., Valts Blukis h≈13, Rene Vidal/Stan Birchfield are known) but most authors have modest h-indices and it's an arXiv preprint with no affiliations listed.
