Key Takeaway

An adaptive, rule-based concession strategy (anchor-and-resume) guarantees no offer retractions under live price updates while matching or exceeding a large language model on broker savings, using deterministic, low-cost computation.

What They Found

Mapping concession aggressiveness to the live price spread automatically sets the right posture per deal: concede quickly when margins are thin, hold firm when they are wide. When the pricing model changes mid-negotiation, an anchor-and-resume step shifts the concession curve without ever lowering a prior offer, so carriers never see retractions. Running the strategy as a deterministic engine, with a language model used only to format messages, keeps outcomes reproducible, inexpensive, and auditable, yet yields savings comparable to those of a 20-billion-parameter language model in experiments.
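The spread-to-posture idea can be illustrated with a standard time-dependent concession curve. The sketch below is not the paper's formula: the inverse mapping `beta = c / spread`, the function names, and the units are all assumptions chosen only so that a narrow spread gives a Conceder-style curve (beta above 1) and a wide spread gives a Boulware-style curve (beta below 1).

```python
def beta_from_spread(spread: float, c: float = 0.3) -> float:
    """Hypothetical mapping from price spread to curve exponent.

    Assumption: beta is inversely proportional to the spread, with c in the
    same units as the spread. A smaller c pushes every deal toward Boulware.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    return c / spread


def offer(tau: float, r_min: float, r_max: float, beta: float) -> float:
    """Broker offer at normalized time tau in [0, 1].

    beta > 1 concedes early (Conceder); beta < 1 holds firm (Boulware).
    The offer starts at r_min and reaches r_max at the deadline.
    """
    tau = min(max(tau, 0.0), 1.0)
    return r_min + (r_max - r_min) * tau ** (1.0 / beta)


if __name__ == "__main__":
    # Compare the midpoint offer for a narrow and a wide spread.
    for spread in (0.1, 1.0):
        beta = beta_from_spread(spread)
        print(spread, beta, offer(0.5, 100.0, 200.0, beta))
```

Under these assumptions, the narrow-spread deal has already conceded most of the range by the midpoint, while the wide-spread deal has barely moved.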

Data Highlights

1. Zero retractions across 115,125 negotiations under dynamic pricing (three experiments covering rule-based, LLM-broker, and LLM-carrier conditions).
2. Broker savings: two-index strategy 0.690 vs. unconstrained 20B LLM 0.642 (p < 0.001).
3. 75.8% agreement when the two-index broker faced LLM-powered carriers (6,750 negotiations), with savings ≈ 0.707 and zero retractions.

Why It Matters

Engineers and product leads building negotiation agents or marketplaces get a deterministic strategy that adapts to live price updates, so they can scale without paying for expensive per-round model calls. Platform operators and compliance teams get audit-ready pricing decisions and the ability to swap the language model used for message formatting, which improves reliability and reduces vendor lock-in.

Key Figures

Figure 1. Carrier archetype concession curves. Dashed lines mark $r_{\min}$ and $r_{\max}$. Annotations indicate walk-away zones for Hardliner (round 8) and Anchoring (round 9).
Figure 2. Rule-based evaluation: 105,000 negotiations (21,000 per strategy). (a) By carrier archetype, with an Overall column. (b) By spread regime. Black borders highlight the Two-Index row. N/A indicates zero agreements. In the rounds panels, green indicates faster convergence. Fixed-$\beta$ rows are flat across regimes; the Two-Index row adapts posture from Conceder (narrow) to Boulware (wide).
Figure 3. Sensitivity of the two-index strategy to the calibration constant $c$. (a) By spread regime across 12 $S$ values. (b) By carrier persona. Lower $c$ produces more Boulware behavior (higher savings, lower agreement); higher $c$ improves deal closure against adversarial carriers.
Figure 4. Two-Index ($c=3$, green) vs. unconstrained LLM broker (purple) vs. Two-Index against LLM-powered carriers (red). (a) By carrier archetype, with an Overall column. (b) By spread regime.

Considerations

All experiments use synthetic loads informed by industry patterns, so the single calibration constant must be tuned on real-world data before deployment. The current method handles a single numeric issue (the rate); multi-issue deals require extending the approach. Unconstrained language models in production may behave differently than they did in the experimental settings, so operators should test against live counterparties.

Deep Dive

Negotiations are driven by how much room exists between a broker's minimum acceptable rate and the market-driven target (the spread). Instead of choosing one fixed concession curve, the method derives the curve shape from the spread, so narrow-margin loads adopt a fast-concede posture and wide-margin loads hold out. When the pricing model updates mid-negotiation, the strategy finds the point on the new curve that is at least as generous as the last offer (anchor) and continues conceding from there (resume). That construction guarantees non-decreasing offers even if prices shift repeatedly. The strategy runs as a deterministic formula (the strategy engine), while a language model only translates decisions into human-like messages. In three large experiments (a 105,000-negotiation rule-based sweep across spreads, a head-to-head comparison with an unconstrained 20B language model, and a robustness test against language-model carriers) the two-index method produced zero retractions, matched or beat the LLM on broker savings, and achieved high agreement rates (notably 75.8% against LLM carriers). Practical benefits include negligible per-round inference cost, reproducible pricing outputs for audit, and the flexibility to replace the message-generation model without changing negotiation outcomes. Next steps include learning the calibration constant from historical outcomes and extending the method to multi-issue bargaining.
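The anchor-and-resume step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Curve` and `AnchorResumeBroker` names, the power-law curve, the fixed per-round step, and the sample rates are all hypothetical. The point it shows is that inverting the new curve at the last offer, then clamping with `max`, keeps the offer sequence non-decreasing across price updates.

```python
from dataclasses import dataclass


@dataclass
class Curve:
    r_min: float  # opening offer under the current price model (assumed name)
    r_max: float  # most the broker will pay under the current price model
    beta: float   # curve shape: > 1 concedes early, < 1 holds firm

    def offer(self, tau: float) -> float:
        """Offer at normalized time tau in [0, 1]."""
        tau = min(max(tau, 0.0), 1.0)
        return self.r_min + (self.r_max - self.r_min) * tau ** (1.0 / self.beta)

    def tau_for(self, rate: float) -> float:
        """Inverse of offer(): the time at which this curve reaches rate."""
        if rate <= self.r_min:
            return 0.0
        if rate >= self.r_max:
            return 1.0
        return ((rate - self.r_min) / (self.r_max - self.r_min)) ** self.beta


class AnchorResumeBroker:
    def __init__(self, curve: Curve, total_rounds: int):
        if total_rounds < 2:
            raise ValueError("need at least two rounds")
        self.curve = curve
        self.step = 1.0 / (total_rounds - 1)  # assumed: equal time steps
        self.tau = 0.0
        self.last_offer = None

    def next_offer(self) -> float:
        candidate = self.curve.offer(self.tau)
        if self.last_offer is not None:
            # Guards against float rounding and against a new ceiling that
            # sits below the last offer: hold the offer, never retract it.
            candidate = max(candidate, self.last_offer)
        self.last_offer = candidate
        self.tau = min(1.0, self.tau + self.step)
        return candidate

    def reprice(self, new_curve: Curve) -> None:
        """Swap in the updated curve mid-negotiation."""
        self.curve = new_curve
        if self.last_offer is not None:
            anchor = new_curve.tau_for(self.last_offer)  # anchor
            self.tau = min(1.0, anchor + self.step)      # resume


def demo() -> list[float]:
    """Ten rounds with two price updates, using made-up rates."""
    broker = AnchorResumeBroker(Curve(1000.0, 1400.0, 0.5), total_rounds=10)
    offers = [broker.next_offer() for _ in range(3)]
    broker.reprice(Curve(900.0, 1300.0, 2.0))   # price model shifts down
    offers += [broker.next_offer() for _ in range(3)]
    broker.reprice(Curve(1100.0, 1600.0, 1.0))  # price model shifts up
    offers += [broker.next_offer() for _ in range(4)]
    return offers
```

Because the engine is a pure function of the curve parameters and the round counter, replaying the same price updates reproduces the same offers, which is the property that makes the decisions auditable.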
Credibility Assessment:

No recognizable affiliations or author reputations, arXiv preprint with zero citations — fits UNKNOWN / low credibility signal.