The Big Picture
Training agents to exchange continuous internal representations instead of plain text makes their teamwork both more accurate and more stable. The gains emerge with modest supervised fine-tuning and a short, focused communication trace.
The Evidence
Using internal model states as a continuous communication channel lets gradients flow across agents, so the communication protocol itself can be learned together with reasoning. Supervised fine-tuning of these latent messages produced large accuracy gains on hard reasoning benchmarks and reduced unstable decoding behavior compared with text-based or untrained latent channels. A small number of learned communication steps gives big wins, while very long latent traces add noise and hurt performance.
Data Highlights
1. +26.7% absolute improvement on AIME24 compared to baseline setups.
2. Accuracy rose from 50.0% (no latent steps) to 76.7% with 10 learned latent steps, then fell to 63.3% at 40 steps, showing an optimal short trace length.
3. Mean decoding perplexity dropped from 1.31 (untrained latent channel) to 1.24 with learned latent communication, indicating more stable generation.
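
The arithmetic behind these highlights can be recomputed in a few lines. The sketch below uses only the accuracy figures quoted above and the conventional definition of perplexity (the exponential of the mean negative log-likelihood per token). How the paper aggregates its "mean decoding perplexity" is not stated on this page, so `perplexity` is an assumption about the metric, and the example log-probabilities are invented for illustration.

```python
import math

# Accuracy (%) by number of learned latent steps, as quoted in the highlights.
reported_accuracy = {0: 50.0, 10: 76.7, 40: 63.3}

def absolute_gain(acc_by_steps, steps, baseline_steps=0):
    """Change in percentage points relative to the no-latent-step setting."""
    return round(acc_by_steps[steps] - acc_by_steps[baseline_steps], 1)

def perplexity(token_logprobs):
    """Conventional perplexity: exp of the mean negative log-likelihood."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

gain_10 = absolute_gain(reported_accuracy, 10)  # 10 latent steps vs. none
gain_40 = absolute_gain(reported_accuracy, 40)  # 40 latent steps vs. none

# Hypothetical per-token log-probabilities, not taken from the paper.
example_logprobs = [math.log(0.8), math.log(0.9), math.log(0.7)]
example_ppl = perplexity(example_logprobs)
```

A perplexity close to 1 means the model assigns high probability to each token it emits, which is why a drop from 1.31 to 1.24 is read as more stable generation.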
What This Means
Engineers building multi-agent AI systems and orchestration layers will care because learned latent communication can raise hard-task accuracy and reduce unstable outputs with only modest fine-tuning. Technical leaders evaluating agent reliability should consider designs that expose and train internal state passing, rather than relying only on text messages or separate untrained channels.
Key Figures

Fig 1: In Stage I, agents 1 to K–1 sequentially construct a shared KV trace by prefilling the existing cache and appending newly generated KV segments without gradient updates. The accumulated KV trace serves as a latent communication medium across agents. In Stage II, the final agent performs autoregressive decoding on the prefilled KV cache. Cross-attention over the KV trace produces hidden states, which are projected through the LM head to generate tokens. Supervised fine-tuning is applied using cross-entropy loss, and gradients are backpropagated to update only the LoRA parameters of the final agent while keeping the backbone model frozen.
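
The two stages in the caption can be illustrated with a toy data-flow sketch. This is not the paper's implementation: `make_kv_segment`, `build_trace`, and `attend` are hypothetical names, the key/value vectors are random stand-ins rather than real model states, and a real agent would compute its segment by prefilling on the existing cache. The sketch only shows the order of appends and one attention read over the accumulated trace.

```python
import math
import random

def make_kv_segment(agent_id, num_steps, dim, rng):
    """Stand-in for one agent's newly generated key/value pairs (random)."""
    return [
        {
            "agent": agent_id,
            "key": [rng.gauss(0, 1) for _ in range(dim)],
            "value": [rng.gauss(0, 1) for _ in range(dim)],
        }
        for _ in range(num_steps)
    ]

def build_trace(num_agents, steps_per_agent, dim, seed=0):
    """Stage I: agents 1..K-1 append their segments to a shared trace, in order."""
    rng = random.Random(seed)
    trace = []
    for agent_id in range(1, num_agents):  # the final agent K only reads
        trace.extend(make_kv_segment(agent_id, steps_per_agent, dim, rng))
    return trace

def attend(query, trace):
    """Stage II building block: scaled dot-product attention over the trace."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, entry["key"])) / math.sqrt(dim)
        for entry in trace
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * entry["value"][i] for w, entry in zip(weights, trace))
        for i in range(dim)
    ]
    return weights, context

trace = build_trace(num_agents=4, steps_per_agent=3, dim=8)
weights, context = attend([1.0] * 8, trace)
```

In a full system the `context` vector would feed the final agent's hidden state and LM head; here it only demonstrates that the final agent conditions on every upstream segment at once.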
Considerations
The approach requires access to the model's internal key–value states, so it is not directly usable with closed APIs that hide those internals. Experiments used parameter-efficient adapters and small supervised datasets, so real-world gains may depend on available labeled interactions and model compatibility. Too many communication steps hurt performance, so protocol length must be tuned per task to avoid noisy traces.
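
Tuning the protocol length amounts to a small validation sweep. The sketch below assumes a caller-supplied `evaluate_accuracy(steps)` callback that returns held-out accuracy for a given number of latent steps; both that callback and `select_trace_length` are hypothetical names. The demonstration plugs in the three accuracy figures reported on this page as a stand-in for a real evaluation run.

```python
def select_trace_length(candidate_steps, evaluate_accuracy):
    """Evaluate each candidate step count and return (best_steps, results).

    Ties are broken in favor of the shorter trace, since longer traces
    cost more compute and, per the findings above, can add noise.
    """
    results = {steps: evaluate_accuracy(steps) for steps in candidate_steps}
    best_steps = min(results, key=lambda s: (-results[s], s))
    return best_steps, results

# Stand-in evaluator built from the accuracy figures quoted on this page.
reported = {0: 50.0, 10: 76.7, 40: 63.3}
best_steps, results = select_trace_length(sorted(reported), reported.__getitem__)
```

With only three measured points the true optimum could lie anywhere between the sampled step counts, so a real sweep would use a finer grid per task.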
Methodology & More
Treating inter-agent messages as continuous internal states (key–value cache segments) rather than serialized text lets the whole multi-agent workflow be optimized end-to-end. Upstream agents sequentially build a shared latent trace by appending internal KV segments; a final agent decodes conditioned on that trace. Supervised fine-tuning (using lightweight adapters) trains the system over full multi-agent trajectories so communication and reasoning are learned together.
Across math, commonsense, and code tasks and several open-source models, learned latent communication improved accuracy and decoding stability versus single-model inference, text-based multi-agent setups, and untrained latent channels. Gains are largest with a compact number of communication steps (for example, a jump from 50.0% to 76.7% at 10 steps), and learned traces yield lower perplexity and fewer decoding outliers. Practically, the method shows that richer, trainable agent-to-agent signals can make multi-agent orchestration more reliable, but it requires architectural access to internal states and careful tuning of trace length and training data.
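
The adapter-only training described above can be illustrated on a single linear layer. This is a minimal sketch of the general LoRA idea, not the paper's training code: the logits are computed as W h + B (A h), cross-entropy gradients are applied to the low-rank matrices A and B, and the backbone weight W is never written. All names, dimensions, the learning rate, and the stand-in hidden state `h` are assumptions chosen for illustration.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def lora_logits(W, A, B, h):
    """Frozen path W h plus low-rank adapter path B (A h)."""
    base = matvec(W, h)
    low_rank = matvec(B, matvec(A, h))
    return [b + l for b, l in zip(base, low_rank)]

def train_adapter(W, A, B, h, target, lr=0.05, steps=5):
    """Gradient descent on cross-entropy; updates A and B in place, never W.

    Returns the loss before each update plus the final loss.
    """
    history = []
    for _ in range(steps):
        probs = softmax(lora_logits(W, A, B, h))
        history.append(-math.log(probs[target]))
        # Gradient of cross-entropy with respect to the logits.
        g = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
        Ah = matvec(A, h)
        # B^T g, computed before B is modified.
        Btg = [sum(B[i][k] * g[i] for i in range(len(B))) for k in range(len(A))]
        for i in range(len(B)):
            for k in range(len(A)):
                B[i][k] -= lr * g[i] * Ah[k]   # dL/dB = g (A h)^T
        for k in range(len(A)):
            for j in range(len(h)):
                A[k][j] -= lr * Btg[k] * h[j]  # dL/dA = (B^T g) h^T
    history.append(-math.log(softmax(lora_logits(W, A, B, h))[target]))
    return history

rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(rank)]  # random init
B = [[0.0] * rank for _ in range(d_out)]                             # zero init
h = [1.0, -0.5, 0.25, 2.0]  # stand-in for a hidden state read from the trace
W_before = [row[:] for row in W]
losses = train_adapter(W, A, B, h, target=1)
```

Because B starts at zero, the adapter initially leaves the frozen model's output unchanged, and the first update moves only B. This keeps the number of trained parameters small, which is the sense in which the fine-tuning is described as modest.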
Credibility Assessment:
All authors have low h-indices, no affiliations specified, and it’s an arXiv preprint with no citations — signals point to emerging or limited-info researchers (2 stars).