Agents That Remember: How AI Learns Over Time and Gives New Agents a Head Start

Key Takeaway

Agents that persistently record and reuse interaction experience get substantially better over time, and that stored experience can be handed to a fresh agent to give it an immediate performance boost.

The Evidence

A runtime that treats collaboration, identity, and lifelong learning as first-class features lets agents accumulate usable experience that improves their task success. On engineering and diagnostics benchmarks, agents with Synergy’s experience store climb steadily in accuracy across repeated epochs, with most gains realized early. Transferring accumulated experience to a fresh agent produces immediate, broad improvements across domains rather than narrow, one-off wins. The architecture also provides session-native collaboration surfaces and typed long-term memory to support persistent identity and accountable delegation.

Data Highlights

1. On the SWE-bench Verified tasks, Qwen 3.5 rose from 63.0% to 82.6% accuracy (+19.6 percentage points, +31.1% relative gain).

2. On OpenRCA diagnostics, Qwen 3.5 improved from 11.94% to 29.6% accuracy (+17.7 percentage points, +148.1% relative gain).

3. Transferred experience boosted OneMillion benchmark domains by +22.1 to +32.7 percentage points (law to healthcare), showing broad domain gains from experience injection.
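
The two kinds of gain quoted above are easy to conflate, so a small worked example may help. The sketch below recomputes the absolute gain (in percentage points) and the relative gain (as a percent of the starting accuracy) from the rounded figures in the list. The function name `gains` is illustrative and not taken from the paper. Recomputing from the rounded OpenRCA values gives roughly +147.9% rather than the reported +148.1%, which presumably reflects unrounded accuracies in the original results.

```python
def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before              # difference between the two accuracies
    relative = absolute / before * 100.0   # gain as a share of the starting accuracy
    return absolute, relative


# Accuracies as reported (rounded) in the Data Highlights list.
swe_abs, swe_rel = gains(63.0, 82.6)
rca_abs, rca_rel = gains(11.94, 29.6)

print(f"SWE-bench Verified: +{swe_abs:.1f} pp, +{swe_rel:.1f}% relative")
print(f"OpenRCA:            +{rca_abs:.1f} pp, +{rca_rel:.1f}% relative")
```

A low starting accuracy inflates the relative figure: OpenRCA gains fewer percentage points than SWE-bench Verified but shows a much larger relative gain.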

What This Means

Engineers building long-running or collaborative agents should care because persistent experience stores can materially raise success rates and reduce the need to retrain from scratch. Product and platform leads should care because agent identity, delegation, and cross-agent experience transfer change how you think about cost, governance, and onboarding new agent instances.

Key Figures

Figure 1: Overall architecture of Synergy.

Figure 2: Collaboration and execution lifecycle in Synergy. A complex task begins in a primary session, branches through Cortex-managed child sessions, moves across mailbox-mediated asynchronous delivery and repository-backed collaborative surfaces, and returns to the originating session as traceable outputs. The figure emphasizes that Synergy’s collaboration model is not only message passing, but bounded execution that can branch, delegate, re-incorporate results, and extend into shared workspaces and remote environments without losing accountability.

Figure 3: Experience learning loop in Synergy. Past experiences are actively retrieved and injected into the current task context, after which the resulting trajectory is evaluated using either explicit benchmark feedback or dialogue-derived reward from subsequent interaction. The resulting multi-dimensional reward is then used to update the reused experiences through delayed credit assignment, so that future recall becomes increasingly value-aware and the accumulated experience store becomes a reusable, partially transferable capability asset.
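
The loop in Figure 3 (retrieve, inject, evaluate, then credit the reused experiences) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Synergy's implementation: the class and function names, the word-overlap similarity, the averaging of reward dimensions, and the running-mean update are all hypothetical stand-ins for mechanisms the summary does not specify.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    task: str           # short description of the past task
    lesson: str         # what was learned from that trajectory
    value: float = 0.0  # running estimate of how useful reuse has been
    uses: int = 0       # how many times this experience has been credited


def scalarize(reward: dict[str, float]) -> float:
    """Collapse a multi-dimensional reward into one number (assumed: plain mean)."""
    return sum(reward.values()) / len(reward)


class ExperienceStore:
    def __init__(self) -> None:
        self.items: list[Experience] = []

    def add(self, task: str, lesson: str) -> None:
        self.items.append(Experience(task, lesson))

    def retrieve(self, query: str, k: int = 2, value_weight: float = 0.5) -> list[Experience]:
        """Rank by word overlap with the query plus a bonus for past usefulness."""
        q = set(query.lower().split())

        def score(e: Experience) -> float:
            t = set(e.task.lower().split())
            overlap = len(q & t) / len(q | t) if (q | t) else 0.0
            return overlap + value_weight * e.value  # value-aware recall

        return sorted(self.items, key=score, reverse=True)[:k]

    def credit(self, used: list[Experience], reward: float) -> None:
        """Delayed credit: once the trajectory is scored, update each reused item."""
        for e in used:
            e.uses += 1
            e.value += (reward - e.value) / e.uses  # incremental running mean


store = ExperienceStore()
store.add("fix failing unit test in parser", "reproduce the failure before editing")
store.add("diagnose latency spike in service", "check recent deploys first")

# Retrieve for a new task, then credit the reused experience after evaluation.
used = store.retrieve("fix failing test in tokenizer", k=1)
store.credit(used, scalarize({"correctness": 1.0, "efficiency": 0.5}))
print(used[0].task, used[0].value)
```

Because `retrieve` adds a value term to the similarity score, experiences that have been credited with high reward rise in later rankings, which is one simple way to read "value-aware" recall.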

Figure 4: Capability growth under experience accumulation. Panels (a) and (b) show full performance trajectories on SWE-bench Verified and OpenRCA, making visible both the steady upward movement over epochs and the concentration of gains in the early stages of accumulation. Panel (c) summarizes final gains over the starting point of each accumulated-experience run, highlighting that the resulting improvements are substantial in both absolute and relative terms.

Yes, But...

Results come from structured benchmarks and controlled transfer experiments, not from open-ended real-world deployments with non-stationary tasks. Collaboration and identity features are described and architected but not validated with longitudinal user studies measuring perceived continuity or attachment. Security, governance, and cross-domain transfer remain open problems: experience transfer was shown within benchmarks, not across arbitrary domains, and the risk of credential theft or agent spoofing requires host-level defenses.

The Details

Synergy is an agent runtime designed so agents behave like persistent participants on the web rather than disposable tools. Key building blocks include a scope-attached server model that binds runtimes to concrete workspaces, session-native execution capsules that store prompts, plans, and intermediate state, mailbox-mediated asynchronous delivery for accountable delegation, and repository-backed shared surfaces for persistent collaboration. Identity is implemented as typed long-term memory (profile, notes, agenda, skills) rather than a single memory buffer, and lifelong evolution is operationalized through an experience store that encodes interaction trajectories, scripts, inferred intent, and multi-dimensional rewards.

Experiments addressed two main questions: whether agents improve as they accumulate experience, and whether that experience can be transferred to give fresh agents a head start. On SWE-bench Verified and OpenRCA, accuracy climbed monotonically across epochs, with 70% or more of the gains appearing within the first five epochs, which indicates fast, front-loaded learning. Experience transfer on the OneMillion benchmark produced immediate, domain-wide improvements (gains of +22.1 to +32.7 percentage points across domains). The broader implication is that treating collaboration, continuity, and adaptation as architecture-level problems makes persistent, socially legible agents feasible, but it also raises new governance, resource, and safety questions about how agents are authenticated, budgeted, and constrained in open networks.
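
To make the distinction between typed identity memory and the transferable experience store concrete, here is a minimal sketch. The four identity fields (profile, notes, agenda, skills) and the experience contents (trajectory, intent, reward) come from the description above; every class name, function name, and field type is an assumption for illustration. In particular, the choice to copy only experiences and leave identity untouched is an assumption here, since the summary does not say how Synergy scopes a transfer.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class IdentityMemory:
    """Typed long-term memory: separate slots instead of one undifferentiated buffer."""
    profile: dict[str, str] = field(default_factory=dict)
    notes: list[str] = field(default_factory=list)
    agenda: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    identity: IdentityMemory = field(default_factory=IdentityMemory)
    experiences: list[dict] = field(default_factory=list)  # hypothetical record format


def transfer_experience(source: Agent, target: Agent) -> int:
    """Copy experience records to another agent; identity is deliberately not copied."""
    copied = copy.deepcopy(source.experiences)  # deep copy so later edits stay isolated
    target.experiences.extend(copied)
    return len(copied)


veteran = Agent(name="veteran")
veteran.identity.notes.append("prefers small, reviewable patches")
veteran.experiences.append({
    "trajectory": ["read failing test", "patch parser", "rerun tests"],
    "intent": "fix regression",
    "reward": {"correctness": 1.0, "efficiency": 0.5},
})

fresh = Agent(name="fresh")
n = transfer_experience(veteran, fresh)
print(f"{n} experience(s) transferred; fresh notes: {fresh.identity.notes}")
```

Keeping identity and experience in separate structures is what lets a fresh agent inherit task knowledge while still starting with its own empty profile and notes.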

Credibility Assessment:

All authors have very low reported h-indices, affiliations are not specified, and the venue is arXiv with no citations; this limited information suggests emerging credibility.
