The Big Picture

A simple protocol lets each AI agent accept only the parts of a peer’s message it trusts, record where those accepted facts came from, and resume that filtered memory across restarts—preventing echo chambers and lost context.

The Evidence

A seven-field message format plus a per-field evaluation gate, a lineage trail, and write-time filtering give multiple agents a way to share evaluated, resumable memories rather than raw chat logs. With those primitives, each agent stores its own role-filtered understanding (not the unvetted message), can trace whether a returning claim is original or replayed, and resumes work across restarts without replaying full history. The protocol is implemented as a Claude-compatible plugin, used in a consumer app for mood-aware playback, and exercised in a 14-wave production sprint with three role-specialised agents.
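To make the message shape concrete, here is a minimal Python sketch of a Cognitive Memory Block (CMB), assuming a simple in-memory representation. The seven header field names come from the paper. The CMB class, its validation rule, and the example values are hypothetical, and the paper's wire format is not reproduced. Because SVAF evaluates only the header, the optional body is carried along untouched.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Field names as listed in the paper's CAT7 header.
CAT7_FIELDS = ("focus", "issue", "intent", "motivation",
               "commitment", "perspective", "mood")


@dataclass
class CMB:
    """Illustrative Cognitive Memory Block: fixed CAT7 header plus optional body.

    This Python shape is an assumption, not the protocol's wire format.
    """
    header: Dict[str, str]
    body: Optional[Dict[str, Any]] = None  # task-specific payload; SVAF scores only the header

    def __post_init__(self) -> None:
        # Reject headers that do not carry exactly the seven CAT7 fields.
        missing = [f for f in CAT7_FIELDS if f not in self.header]
        extra = sorted(k for k in self.header if k not in CAT7_FIELDS)
        if missing or extra:
            raise ValueError(f"CAT7 header mismatch: missing={missing}, extra={extra}")


# Hypothetical example message.
msg = CMB(
    header={
        "focus": "dedupe wave-9 samples",
        "issue": "near-duplicate prompts",
        "intent": "filter before export",
        "motivation": "cleaner training data",
        "commitment": "done this wave",
        "perspective": "research",
        "mood": "focused",
    },
    body={"rows_checked": 1200},
)
```

Fixing the header to seven named fields is what lets a receiver score and admit each field separately instead of judging the whole message at once.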

Data Highlights

1. 237,000 training samples used to train the per-field evaluator (SVAF); unsupervised training flagged the 'mood' field as highest-weight.
2. A 14-wave production sprint ran across 3 role-specialised agent sessions to validate joint behavior and persistent, write-time filtered memory.
3. Prior analysis motivating the work catalogued 14 distinct failure modes from over 1,600 annotated multi-agent traces, showing a practical need for a protocol-level fix.

What This Means

Engineers building multi-agent systems: adopt per-field admission and lineage to avoid replayed claims and drift. Platform and product leaders: use the protocol to give teams observable, auditable agent decisions and better long-running coordination. Researchers: the primitives give a concrete starting point for cross-agent memory, provenance, and trust experiments.
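For the lineage recommendation, the sketch below shows one way an agent could classify a returning claim by walking lineage pointers back to their roots. The Entry shape, the function names, and the 'self'/'peer'/'fusion' labels are illustrative assumptions, not the paper's data model; only the node names are borrowed from the paper's deployment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    entry_id: str
    source: str                                        # node that wrote this entry
    parents: List[str] = field(default_factory=list)   # lineage pointers to earlier entries


def root_sources(store: Dict[str, Entry], entry_id: str) -> Set[str]:
    """Walk lineage pointers back to parentless entries and collect their sources."""
    roots: Set[str] = set()
    stack: List[str] = [entry_id]
    seen: Set[str] = set()
    while stack:
        eid = stack.pop()
        if eid in seen or eid not in store:  # skip cycles and ancestors we never stored
            continue
        seen.add(eid)
        entry = store[eid]
        if entry.parents:
            stack.extend(entry.parents)
        else:
            roots.add(entry.source)
    return roots


def classify(store: Dict[str, Entry], entry_id: str, self_name: str) -> str:
    """Label a returning claim as 'self', 'peer', 'fusion', or 'unknown'."""
    roots = root_sources(store, entry_id)
    if not roots:
        return "unknown"
    if len(roots) > 1:
        return "fusion"
    return "self" if self_name in roots else "peer"


# Hypothetical lineage graph as seen from claude-code-mac.
store = {
    "a1": Entry("a1", "claude-code-mac"),
    "b1": Entry("b1", "claude-research-win", ["a1"]),  # a peer's remix of a1
    "c1": Entry("c1", "claude-research-win"),
    "d1": Entry("d1", "claude-strategic-win", ["a1", "c1"]),
}
```

A 'self' result on a claim that arrived from a peer (b1 here) is the echo case: the claim started with this agent and is coming back, so it should not count as independent confirmation.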

Key Figures

Figure 1: MMP’s 8-layer architecture. Layers 0–3 (Protocol Infrastructure) carry identity, transport, connection, and memory. Layers 4–7 (Mesh Cognition) carry coupling (SVAF), synthetic memory, xMesh (per-agent Liquid Neural Network), and application (where agents reason on the remix subgraph). The semantic-infrastructure contribution of this paper — CAT7 (§3.1), SVAF (§3.2), lineage (§3.3), remix (§3.4) — operates at Layers 3 and 4, with CMBs held in L3 Memory and SVAF evaluation performed in L4 Coupling.
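The layer stack can be written down as a small lookup, which makes the caption's placement claim easy to check: CMB storage sits at the top of Protocol Infrastructure and SVAF evaluation at the bottom of Mesh Cognition. This is a hypothetical Python rendering that assumes the caption lists layers in numeric order; only L3 Memory and L4 Coupling are numbered explicitly in the caption.

```python
from enum import IntEnum


class MMPLayer(IntEnum):
    # Numbering assumes the caption's listing order (it confirms L3 = Memory, L4 = Coupling).
    IDENTITY = 0
    TRANSPORT = 1
    CONNECTION = 2
    MEMORY = 3
    COUPLING = 4
    SYNTHETIC_MEMORY = 5
    XMESH = 6
    APPLICATION = 7


def layer_group(layer: MMPLayer) -> str:
    """Layers 0-3 are Protocol Infrastructure; Layers 4-7 are Mesh Cognition."""
    return "Protocol Infrastructure" if layer <= MMPLayer.MEMORY else "Mesh Cognition"


# Placement stated in the caption: CMBs are held in L3 Memory, SVAF runs in L4 Coupling.
PLACEMENT = {
    "CMB storage": MMPLayer.MEMORY,
    "SVAF evaluation": MMPLayer.COUPLING,
}

for component, layer in PLACEMENT.items():
    print(f"{component}: L{int(layer)} {layer.name} ({layer_group(layer)})")
# CMB storage: L3 MEMORY (Protocol Infrastructure)
# SVAF evaluation: L4 COUPLING (Mesh Cognition)
```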
Figure 2: MMP mesh topology across three Claude Code sessions on two machines: claude-code-mac (macOS, CTO role), claude-strategic-win and claude-research-win (Windows, COO and CMO roles). Each session runs a sym-mesh-channel MCP server as a distinct mesh peer with its own identity and meshmem (MMP §2.4: agent-to-agent, not device-to-device). Transport between machines is Bonjour mDNS on LAN with optional WebSocket relay for WAN. The in-flight CAT7 CMB shown is the one captured in Listing 1: claude-research-win emits it via sym_observe, and claude-code-mac receives it, runs neural SVAF evaluation (decision='aligned'), and stores the remix with its source set to the fusion of both peer node names and with lineage populated.
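One point in the caption is easy to miss: identity and meshmem belong to each session, not to each machine, so the two Windows sessions are separate peers. The sketch below models that, plus the remix write described for Listing 1. MeshPeer, store_remix, the "sender+receiver" source format, and the parent id are hypothetical; the figure says only that the source fuses both node names and that lineage is populated.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MeshPeer:
    node_name: str
    machine: str
    role: str
    meshmem: List[dict] = field(default_factory=list)  # private to this peer, not shared per machine


def fused_source(sender: str, receiver: str) -> str:
    # Hypothetical format: the figure only says the source fuses both node names.
    return f"{sender}+{receiver}"


def store_remix(receiver: MeshPeer, sender: MeshPeer,
                accepted: Dict[str, str], parent_id: str) -> dict:
    """Record only the accepted fields, tagged with a fused source and a lineage pointer."""
    entry = {
        "source": fused_source(sender.node_name, receiver.node_name),
        "fields": dict(accepted),
        "lineage": [parent_id],
    }
    receiver.meshmem.append(entry)
    return entry


mac = MeshPeer("claude-code-mac", "macOS", "CTO")
strategic = MeshPeer("claude-strategic-win", "Windows", "COO")
research = MeshPeer("claude-research-win", "Windows", "CMO")

# research-win emits, mac admits one field and stores its own remix.
entry = store_remix(mac, research, {"focus": "dedupe wave-9 samples"}, parent_id="cmb-001")
```

Only the receiver's meshmem changes; the sender and the other Windows session are unaffected even though two of the peers share a machine.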

Considerations

Evidence comes from reference deployments and a single production sprint rather than wide benchmarks; the paper reports behavior and implementability, not standardized performance gains. Current implementations are Claude-native and used the same underlying model with role-weight tuning; cross-provider interoperability remains open work. Adopting the protocol requires agents to expose role-specific evaluation anchors and to persist remixed state, which adds design and operational overhead.
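As an illustration of what persisting remixed state involves, here is a minimal sketch that appends each accepted remix to a per-agent JSON Lines file and reloads it on restart. The file format, function names, and example values are assumptions; this summary does not specify a storage backend. The property it demonstrates is that filtering happens at write time, so rejected fields never reach disk and a restarted agent resumes only its own evaluated knowledge.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def write_remix(path: str, entry_id: str, source: str, header: Dict[str, str],
                accepted: Iterable[str], lineage: List[str]) -> None:
    """Append only the accepted fields to this agent's local store (JSON Lines)."""
    remix = {name: header[name] for name in accepted if name in header}
    record = {"id": entry_id, "source": source, "fields": remix, "lineage": lineage}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def resume(path: str) -> List[dict]:
    """Reload the agent's own evaluated memory after a restart; no raw peer history is replayed."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


store_path = os.path.join(tempfile.mkdtemp(), "meshmem.jsonl")
header = {"focus": "dedupe wave-9 samples", "mood": "focused", "perspective": "research"}
write_remix(store_path, "r1", "claude-research-win+claude-code-mac",
            header, accepted=["focus", "mood"], lineage=["cmb-001"])
restored = resume(store_path)
```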

Methodology & More

The Mesh Memory Protocol defines a lightweight semantic layer for agent-to-agent communication, so AI teams stop exchanging raw messages and start exchanging vetted, resumable facts. Every exchange is a Cognitive Memory Block (CMB) split into a fixed seven-field header (focus, issue, intent, motivation, commitment, perspective, mood) and an optional task-specific body. On receipt, a per-field neural evaluation (SVAF) scores the header; each agent accepts or rejects individual fields according to its role, then writes only its accepted remix into local persistent memory. Each accepted entry carries lineage pointers, so an agent can trace whether a claim originated with a peer, with itself, or from a fusion of multiple sources.

The paper ships a Claude-compatible plugin that passed directory review and a consumer iOS agent that reacts to the mood field for music playback, and it runs a 14-wave training-data sprint with three role-specialised agent sessions to observe real coordination patterns. SVAF was trained on 237K samples, and training surfaced mood as a high-weight signal for cross-domain coupling. The protocol prevents echo loops by storing signal-level provenance, ensures that each agent's restarted context contains its own evaluated knowledge rather than raw history, and makes audit and trust signals explicit. Limitations include the narrow deployment scope so far, the need for per-agent SVAF tuning, and open questions about cross-provider adoption and large-scale benchmarks.
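The per-field admission step can be sketched as follows. SVAF itself is a trained neural evaluator; here its output is stubbed as one score per header field, and the role weights and threshold are hypothetical stand-ins for the per-agent role-weight tuning the paper describes. The sketch shows the shape of the decision (field by field, role-dependent), not SVAF's actual scoring.

```python
from typing import Dict, List, Tuple

CAT7_FIELDS = ("focus", "issue", "intent", "motivation",
               "commitment", "perspective", "mood")


def per_field_gate(
    header: Dict[str, str],
    field_scores: Dict[str, float],
    role_weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[Dict[str, str], List[str]]:
    """Admit or reject each header field independently.

    field_scores stands in for SVAF output (one score in [0, 1] per field);
    role_weights and threshold are hypothetical per-role tuning knobs.
    Fields with no explicit weight default to 1.0.
    """
    accepted: Dict[str, str] = {}
    rejected: List[str] = []
    for name in CAT7_FIELDS:
        if name not in header:
            continue
        weighted = field_scores.get(name, 0.0) * role_weights.get(name, 1.0)
        if weighted >= threshold:
            accepted[name] = header[name]  # becomes part of this agent's remix
        else:
            rejected.append(name)          # never written to local memory
    return accepted, rejected


incoming = {
    "focus": "dedupe wave-9 samples", "issue": "near-duplicate prompts",
    "intent": "filter before export", "motivation": "cleaner training data",
    "commitment": "done this wave", "perspective": "research", "mood": "focused",
}
scores = {"focus": 0.9, "issue": 0.8, "intent": 0.7, "motivation": 0.4,
          "commitment": 0.6, "perspective": 0.3, "mood": 0.9}
cto_weights = {"motivation": 0.5, "perspective": 0.5}  # hypothetical CTO tuning

remix, dropped = per_field_gate(incoming, scores, cto_weights)
# remix keeps focus, issue, intent, commitment, mood; dropped == ['motivation', 'perspective']
```

Because only the weights differ between roles, two agents receiving the same CMB can store different remixes, which is what the paper means by each agent keeping its own role-filtered understanding.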
Credibility Assessment:

Single-author paper from the University of Windsor, published on arXiv, by an author with a low h-index. The university is recognized, but signals of broad research impact are limited.