Make Language Models Pick Strategies That Can’t Be Easily Exploited

Key Takeaway

Iteratively having multiple language-model agents respond to the average of others’ past choices produces strategies that are stronger on average and more robust to adaptive attackers than single-shot or naive multi-round methods.

Key Findings

Decomposing each stakeholder into its own language-model agent and running an iterative loop where each agent best-responds to the empirical mix of others’ past strategies untangles mutually dependent viewpoints and leads to better decisions. Tested across 13 strategic scenarios (games and negotiations), the method improved both average payoff (tournament strength) and worst-case payoff against adaptive attackers (robustness). The benefits are largest in settings with hidden information, randomness, or a need for mixed strategies, which are precisely the real-world conditions where single-shot approaches fail.

Data Highlights

1. Evaluated on 13 diverse strategic scenarios spanning competitive games and negotiations.

2. Each matchup used 16 paired matches with seat exchange, and results were averaged over 8 random seeds for reliability.

3. MAFP (multi-agent fictitious play) achieved the top scores on both tournament strength and robustness among compared baselines, with the largest gains in imperfect-information and stochastic scenarios.
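
The matchup protocol in item 2 can be sketched as follows. This is a minimal illustration, not the paper's evaluation harness: `toy_match`, `evaluate_matchup`, the numeric "skill" policies, and the reading of "paired" as one game per seat assignment are all assumptions made for the example.

```python
import random
from statistics import mean

def toy_match(policy_first, policy_second, rng):
    """Hypothetical stand-in for one zero-sum game.
    Policies are plain skill numbers; the first seat gets a +1 bonus
    to mimic a seat advantage. Returns (payoff_first, payoff_second)."""
    noise = rng.uniform(-0.5, 0.5)
    payoff_first = (policy_first - policy_second) + 1.0 + noise
    return payoff_first, -payoff_first

def evaluate_matchup(policy_a, policy_b, n_pairs=16, n_seeds=8, match=toy_match):
    """Average payoff of policy_a against policy_b.
    Each pair plays the game twice with seats exchanged, so any
    seat advantage cancels; results are then averaged over seeds."""
    per_seed = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        pair_scores = []
        for _ in range(n_pairs):
            a_as_first, _ = match(policy_a, policy_b, rng)
            _, a_as_second = match(policy_b, policy_a, rng)
            pair_scores.append((a_as_first + a_as_second) / 2)
        per_seed.append(mean(pair_scores))
    return mean(per_seed)

if __name__ == "__main__":
    print(evaluate_matchup(1.0, 0.0))  # close to the true skill gap of 1
    print(evaluate_matchup(0.0, 0.0))  # close to 0: the seat bonus cancels
```

Exchanging seats inside every pair is what keeps a first-mover advantage from leaking into the score; averaging over seeds then reduces the remaining noise.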

What This Means

Engineers building multi-agent systems and decision agents get a practical way to produce strategies that resist exploitation. Technical leaders evaluating agent reliability and robustness get a method that offers a measurable lift in both average and worst-case performance. Researchers in agent evaluation and negotiation get a link between game-theoretic equilibrium methods and language-based agent design.

Key Figures

Figure 1: Existing MAS address execution complexity, as in software engineering or research (left), by dividing a task into subtasks across cooperative agents. In contrast, MAFP targets stance entanglement, as in competitive market or strategic games (right), where stakeholders’ decisions are mutually dependent: it decomposes these entangled stances into agents and derives decisions through fictitious play.

Figure 2: Illustration of MAFP algorithm. Fictitious play in game theory finds equilibrium through an iteratively convergent process in which each player best responds to the empirical average of others’ past actions, here converging to the Nash equilibrium of rock–paper–scissors. Inspired by this, multi-agent fictitious play (MAFP) decomposes stances into agents and finds policies through multi-agent co-evolution: at each round, agents update decisions by best-responding to the empirical mixture of others’ past decisions.
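
The classical process described in this caption can be sketched in a few lines. This is textbook fictitious play on the rock–paper–scissors payoff matrix, not the language-model variant; the starting actions, the round count, and the function name `fictitious_play` are arbitrary choices for the illustration.

```python
import numpy as np

# Payoff to a player choosing the row action against the column action.
# Order: rock, paper, scissors. The game is symmetric and zero-sum.
PAYOFF = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
], dtype=float)

def fictitious_play(rounds=20000):
    """Both players repeatedly best-respond to the empirical
    average of the opponent's past actions."""
    # Action counts, seeded with one arbitrary action per player.
    counts = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    for _ in range(rounds):
        mix0 = counts[0] / counts[0].sum()
        mix1 = counts[1] / counts[1].sum()
        # Expected payoff of each pure action against the opponent's mixture.
        br0 = int(np.argmax(PAYOFF @ mix1))
        br1 = int(np.argmax(PAYOFF @ mix0))
        counts[0][br0] += 1
        counts[1][br1] += 1
    return counts[0] / counts[0].sum(), counts[1] / counts[1].sum()

if __name__ == "__main__":
    mix0, mix1 = fictitious_play()
    print(mix0, mix1)  # both empirical mixtures approach (1/3, 1/3, 1/3)
```

The individual best responses keep cycling through rock, paper, and scissors; it is the empirical average of play that settles near the uniform Nash equilibrium.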

Figure 3: Per-iteration quality of policies produced by each iterative method. For each method, we run an internal tournament among its four iterations and report each iteration’s average utility against the other three. The shaded band shows the standard error of the mean.
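
One way to compute the quantity plotted in Figure 3 is sketched below. The input layout (one utility matrix per run, with entry [i][j] holding the utility of iteration i's policy against iteration j's) and the choice to take the standard error across runs are assumptions; the summary does not say over what the error is computed.

```python
import numpy as np

def per_iteration_quality(utils):
    """utils: array of shape (n_runs, k, k), where utils[r, i, j] is the
    utility of iteration i's policy against iteration j's policy in run r.
    Returns (mean, sem), each of shape (k,)."""
    utils = np.asarray(utils, dtype=float)
    k = utils.shape[1]
    off_diagonal = ~np.eye(k, dtype=bool)
    # Drop self-play entries, then average over the k - 1 opponents.
    masked = np.where(off_diagonal, utils, 0.0)
    per_run = masked.sum(axis=2) / (k - 1)
    mean = per_run.mean(axis=0)
    # Standard error of the mean across runs (sample standard deviation).
    sem = per_run.std(axis=0, ddof=1) / np.sqrt(per_run.shape[0])
    return mean, sem

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_utils = rng.normal(size=(8, 4, 4))  # synthetic data, 4 iterations
    mean, sem = per_iteration_quality(fake_utils)
    print(mean, sem)
```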

Figure 4: Target-profile utility under adversarial evolution during robustness evaluation. Each curve shows a method’s per-iteration utility against an evolving attacker, averaged across scenarios. The star marks each method’s worst-case round. The shaded band shows the standard error of the mean.
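
The idea behind this robustness curve can be illustrated on a toy matrix game. The sketch below is not the paper's attacker: it assumes a fixed target mixture in rock–paper–scissors, an attacker that starts uniform and drifts toward its best response each round, and four rounds; `robustness_curve` is a hypothetical name.

```python
import numpy as np

# Target's payoff: rows are the target's action, columns the attacker's.
# Order: rock, paper, scissors.
PAYOFF = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
], dtype=float)

def robustness_curve(target_mix, rounds=4):
    """Utility of a fixed target mixture against an adapting attacker.
    Returns (curve, worst_round, worst_value)."""
    target_mix = np.asarray(target_mix, dtype=float)
    attacker_mix = np.full(3, 1 / 3)  # assumption: attacker starts uniform
    curve = []
    for t in range(rounds):
        curve.append(float(target_mix @ PAYOFF @ attacker_mix))
        # The attacker's best response minimizes the target's expected payoff.
        best = int(np.argmin(target_mix @ PAYOFF))
        # Running average that moves the attacker toward that response.
        attacker_mix = (attacker_mix * (t + 1) + np.eye(3)[best]) / (t + 2)
    worst_round = int(np.argmin(curve))
    return curve, worst_round, curve[worst_round]

if __name__ == "__main__":
    print(robustness_curve([1, 0, 0]))           # pure rock gets exploited
    print(robustness_curve([1 / 3, 1 / 3, 1 / 3]))  # uniform mix stays at 0
```

In this toy setting a pure strategy loses more each round as the attacker adapts, while the uniform mixture cannot be pushed below zero, which is the kind of gap a worst-case metric is meant to expose.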

Yes, But...

Experiments were limited to 13 scenarios and a single action model, so results may shift on much larger, real-world markets or with different underlying language models. The method needs repeated multi-agent rollouts, increasing compute compared with single-shot approaches. Theoretical questions remain about when and how the iterative process converges in natural-language strategy space and which equilibrium it selects when multiple equilibria exist.

Full Analysis

MAFP turns a hard decision problem, one where each stakeholder’s best choice depends on everyone else’s, into an iterative simulation. Each stakeholder is represented by a language-model agent with a described role, goals, and payoffs. At each round, every agent forms an empirical mixture (a summary of the others’ past strategies) and then generates a best-response strategy against that mixture. The loop repeats for several rounds, and the empirical mixture is output as the final decision profile. Two operators implement this in natural language: aggregation (to summarize past choices) and best-response (to generate a strategy that maximizes payoff against that summary).

The method was evaluated on 13 scenarios covering strategic games and natural-language negotiations. Each pairing was run as 16 paired matches with seat swaps, and results were averaged across 8 random seeds, using a single action model to execute strategies. MAFP outperformed single-round and naive multi-round baselines on two complementary metrics: tournament strength (average payoff against a field of opponents) and robustness (worst-case payoff against an adaptive adversary). Gains were most pronounced in settings with hidden information, stochastic transitions, or optimal solutions that require randomized (mixed) strategies.

Practically, MAFP offers a training-free, simulation-style way to produce more resilient and less exploitable language-based strategies, useful for agent evaluation, negotiation automation, and multi-agent decision systems.
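
The loop described above can be sketched with the two operators passed in as plain functions. This is a structural illustration under stated assumptions, not the authors’ implementation: in the real method both operators would be language-model calls, whereas `toy_aggregate` and `toy_best_response` here are hypothetical rule-based stand-ins for rock–paper–scissors, and the simultaneous update and four-round default are choices made for the example.

```python
from collections import Counter
from typing import Callable, Dict, List

Aggregate = Callable[[List[str]], str]      # past strategies -> summary text
BestResponse = Callable[[str, str], str]    # (role, summary) -> new strategy

def mafp(roles: List[str], initial: Dict[str, str],
         aggregate: Aggregate, best_response: BestResponse,
         rounds: int = 4) -> Dict[str, List[str]]:
    """Returns each role's strategy history, i.e. its empirical mixture."""
    history = {role: [initial[role]] for role in roles}
    for _ in range(rounds - 1):
        new_strategies = {}
        for role in roles:
            # Pool every past strategy played by the other roles.
            others_past = [s for other in roles if other != role
                           for s in history[other]]
            summary = aggregate(others_past)
            new_strategies[role] = best_response(role, summary)
        # Simultaneous update: all roles saw the same past.
        for role in roles:
            history[role].append(new_strategies[role])
    return history

# Hypothetical stand-ins for the language-model operators.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def toy_aggregate(past: List[str]) -> str:
    counts = Counter(past)
    return ", ".join(f"{action}:{counts[action]}" for action in sorted(counts))

def toy_best_response(role: str, summary: str) -> str:
    counts = {k: int(v) for k, v in (item.split(":") for item in summary.split(", "))}
    most_common = max(sorted(counts), key=counts.get)  # ties: alphabetical
    return BEATS[most_common]

if __name__ == "__main__":
    profile = mafp(["A", "B"], {"A": "rock", "B": "paper"},
                   toy_aggregate, toy_best_response)
    print(profile)
```

Keeping the operators as injected callables mirrors the split in the text: the loop itself only manages histories, while everything scenario-specific lives in aggregation and best-response.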

Credibility Assessment:

ArXiv paper; one author with low h-index (3) and no strong affiliation signals.
