How AI Players Learn Who to Trust — and Who to Betray

The Big Picture

When AI agents can remember past interactions, they form stable, role-dependent reputations that shape team selection: high-reputation agents are picked far more often, and agents prompted to reason more deeply use subtler deception.

The Evidence

Allowing language-based agents to keep cross-game memory produces repeatable reputation labels — the same player is described differently depending on whether they played as a friend or an enemy. Players with strong positive reputations are included on mission teams much more often, and agents explicitly reference past games when deciding who to trust. Increasing the agents' reasoning depth leads to more strategic deception: agents intentionally pass early missions to build trust and then sabotage later, a tactic common in skilled human play.

Data Highlights

1. High-reputation players received 45.6% more team inclusions than lower-reputation players.

2. The "sleeper agent" deception (passing early missions to build trust) appeared in 75% of medium/high-reasoning games versus 36% at low reasoning.

3. In a 50-game run with five agents, one player (Charlie) was labeled “subtle” 38 times, showing a persistent reputation signal.
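
The first and third highlights are a relative difference and a label count. A minimal sketch of how such figures could be computed from game logs is shown here; the function names, the (player, label) log format, and the toy values are assumptions made for illustration, not the study's code or data.

```python
from collections import Counter


def relative_uplift(high_rate: float, low_rate: float) -> float:
    """Return how much larger high_rate is than low_rate, in percent."""
    if low_rate <= 0:
        raise ValueError("low_rate must be positive")
    return (high_rate - low_rate) / low_rate * 100.0


def label_counts(observations):
    """Count how often each label was applied to each player.

    observations: iterable of (player, label) pairs (hypothetical log format).
    """
    counts = {}
    for player, label in observations:
        counts.setdefault(player, Counter())[label] += 1
    return counts


# Toy values invented for illustration; these are NOT the study's data.
toy_uplift = relative_uplift(high_rate=3.0, low_rate=2.0)
toy_counts = label_counts([
    ("Charlie", "subtle"),
    ("Charlie", "subtle"),
    ("Bob", "straightforward"),
])

print(f"uplift: {toy_uplift:.1f}%")
print(toy_counts["Charlie"].most_common(1))
```

Read this way, "45.6% more team inclusions" is a relative difference between two inclusion rates, not a difference of 45.6 percentage points.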

What This Means

Engineers building multi-agent systems and AI teams should care because reputation effects change which agents get trusted and delegated tasks, affecting overall system behavior. Technical leads evaluating agent-to-agent evaluation or agent monitoring can use these findings to design memory and tracking features that surface and mitigate unexpected social dynamics. See Multi-Agent Event Management for a use-case context.

Key Figures

Figure 1: Overview of our experimental setup showing emergent social dynamics and reputation. Five LLM agents (Alice, Bob, Charlie, Diana, Eve) play repeated Avalon games with randomized roles. After each game, agents generate self-reflections and observations about other players, which persist into subsequent games as memory. This cross-game memory enables reputation formation, with agents referencing past behavior in their strategic reasoning (example quotes shown on right).

Figure 2: Game flow in The Resistance: Avalon. Each mission round consists of four phases: (1) Discussion, where players share observations and suspicions; (2) Team Proposal, where the current leader selects team members; (3) Voting, where all players approve or reject the proposal; and (4) Mission Execution, where approved team members secretly choose success or fail. The cycle repeats until one team wins three missions.

Figure 3: Game viewer interface showing a 5-player game during Mission 3’s proposal phase.
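
The four-phase round described in the Figure 2 caption can be sketched as a single function. This is only an illustrative skeleton: the function and policy names are hypothetical, the discussion phase is left out, and two rules are assumed rather than taken from the summary (a strict majority approves a team, and one fail card fails the mission).

```python
from typing import Callable, Optional, Sequence


def play_round(
    players: Sequence[str],
    leader: str,
    team_size: int,
    propose: Callable[[str, int], list],
    vote: Callable[[str, list], bool],
    act: Callable[[str], bool],
) -> Optional[bool]:
    """Run one mission round.

    Returns True if the mission succeeds, False if it is sabotaged,
    and None if the proposed team is voted down.
    """
    # (1) Discussion is omitted; in the study it is free-form agent dialogue.
    # (2) Team proposal: the current leader selects the team.
    team = propose(leader, team_size)
    # (3) Voting: assumed rule, a strict majority is needed to approve.
    approvals = sum(1 for player in players if vote(player, team))
    if approvals * 2 <= len(players):
        return None
    # (4) Mission execution: assumed rule, any single fail card fails the mission.
    return all(act(player) for player in team)


PLAYERS = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
EVIL = {"Eve"}  # hypothetical role assignment for this toy run


def first_n(leader: str, n: int) -> list:
    """Placeholder proposal policy: the leader plus the next players in seat order."""
    start = PLAYERS.index(leader)
    return [PLAYERS[(start + i) % len(PLAYERS)] for i in range(n)]


def always_approve(player: str, team: list) -> bool:
    return True


def always_reject(player: str, team: list) -> bool:
    return False


def succeed_unless_evil(player: str) -> bool:
    return player not in EVIL


clean = play_round(PLAYERS, "Alice", 2, first_n, always_approve, succeed_unless_evil)
sabotaged = play_round(PLAYERS, "Diana", 2, first_n, always_approve, succeed_unless_evil)
rejected = play_round(PLAYERS, "Alice", 2, first_n, always_reject, succeed_unless_evil)
```

A full game would call this repeatedly, rotating the leader, until one side has won three missions. In the study the `propose`, `vote`, and `act` decisions come from LLM agents rather than fixed policies like these.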

Considerations

Results come from a simulated hidden-role game and may not directly generalize to other tasks or real-world human interactions. Outcomes depend on how memory and prompts are implemented, so different memory formats or models could yield different dynamics. Experiments used a fixed small group size and model family, so larger groups or other model architectures might show different patterns. For potential failure modes, see Mutual Validation Trap.

Methodology & More

Repeated interactions in a hidden-role game let language-model agents form and act on reputations. Five agents played repeated games and wrote short reflections about one another after each game; those reflections became a cross-game memory that the agents could cite during discussion and when selecting team members. The design also varied how deeply agents were prompted to reason, so the study could test whether more sophisticated reasoning leads to different social strategies.

In line with Consensus-Based Decision Pattern, memory-enabled agents developed stable, role-conditional impressions, and those impressions influenced behavior. The same individual could be called “subtle” when playing as an enemy but “straightforward” when playing as a teammate. High-reputation agents were picked for mission teams 45.6% more often than lower-reputation agents. Higher reasoning levels produced more advanced deception: agents often passed early missions to gain trust and then sabotaged later ones, a pattern seen in 75% of medium/high-reasoning games versus 36% at low reasoning.

The results suggest that simple memory combined with language-based reasoning is enough to produce humanlike social dynamics, which has implications for designing agent tracking, reputation features, and safety monitoring in multi-agent deployments. For further discussion of potential failure modes, see Rogue Agent Behavior.
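
To make the cross-game memory idea concrete, here is a minimal sketch of a per-agent store that keeps post-game observations and summarizes them by the role the observed player held. The class names, fields, and prompt wording are assumptions for illustration; the summary does not describe the study's actual memory format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One post-game note about another player (hypothetical schema)."""
    game_id: int
    target: str
    target_role: str  # role the target held in that game, e.g. "good" or "evil"
    label: str        # short descriptor, e.g. "subtle"


@dataclass
class CrossGameMemory:
    """Observations one agent carries from game to game."""
    owner: str
    observations: list = field(default_factory=list)

    def record(self, obs: Observation) -> None:
        self.observations.append(obs)

    def reputation(self, target: str) -> dict:
        """Return label counts for a target, grouped by the role they played."""
        by_role = defaultdict(Counter)
        for obs in self.observations:
            if obs.target == target:
                by_role[obs.target_role][obs.label] += 1
        return {role: dict(labels) for role, labels in by_role.items()}

    def prompt_context(self, target: str) -> str:
        """Render the reputation as text that could be added to an agent prompt."""
        lines = [
            f"As {role}, {target} was described as '{label}' in {n} past game(s)."
            for role, labels in sorted(self.reputation(target).items())
            for label, n in sorted(labels.items())
        ]
        return "\n".join(lines) or f"No past observations of {target}."


# Toy observations invented for illustration; not the study's data.
memory = CrossGameMemory(owner="Alice")
memory.record(Observation(1, "Charlie", "evil", "subtle"))
memory.record(Observation(2, "Charlie", "good", "straightforward"))
memory.record(Observation(3, "Charlie", "evil", "subtle"))
print(memory.prompt_context("Charlie"))
```

Grouping by role is what makes the reputation role-conditional: the same player can accumulate different labels as an enemy and as a teammate. The sketch assumes roles are known when the note is written, for example because they are revealed at the end of each game.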

Credibility Assessment:

Solo author with a very low h-index (1) and no listed affiliation; arXiv preprint with no citations. This matches an UNKNOWN (minimal) credibility signal.
