How to Predict What an Unknown AI Agent Will Do Next

The Big Picture

A small frozen language model used as an observer, combined with a supervised table model that adapts from a few past games, predicts an unfamiliar agent’s accept/reject choices and next offers more accurately than directly prompting a large model.

The Evidence

Model each decision as a single table row that mixes structured game variables, a dialogue embedding, and a hidden-state snapshot from a small frozen language model (the Observer). Training a supervised tabular predictor on a large source population plus K prior games of the target agent yields better transfer to new, engineered agents than asking a large model to predict directly. The Observer’s hidden states add information beyond standard dialogue embeddings, boosting accept/reject prediction and reducing offer-prediction error, even when no target-specific games are available.

Data Highlights

1. At K=16, adding Observer hidden states increased response-prediction AUC by about 4.0 percentage points in bargaining and 4.9 percentage points in negotiation versus game+text features (about 6 percentage points over direct prompting).

2. Observer features reduced offer-regression error by about 14% in bargaining (K=16) compared with the tabular baseline without Observer states.

3. Observer gains appear even at K=0: frozen LLM hidden states improve AUC without any prior target-specific games, showing value in zero-shot transfer.
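
To make the AUC figures above concrete, here is a minimal sketch of how a gain in percentage points can be computed for accept/reject prediction. The function name `roc_auc`, the toy labels, and both score lists are invented for illustration; they are not data from the study. The sketch uses the pairwise definition of AUC: the probability that a randomly chosen accepted offer receives a higher score than a randomly chosen rejected one.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = accept, 0 = reject.
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.9, 0.6, 0.4, 0.5, 0.7, 0.2]   # e.g. game+text features only
observer_scores = [0.9, 0.6, 0.55, 0.5, 0.7, 0.2]  # e.g. with Observer states added

baseline_auc = roc_auc(labels, baseline_scores)
observer_auc = roc_auc(labels, observer_scores)
gain_pp = (observer_auc - baseline_auc) * 100  # gain in percentage points
print(f"baseline={baseline_auc:.3f} observer={observer_auc:.3f} gain={gain_pp:.1f} pp")
```

The quadratic loop is only suitable for small examples; a rank-based implementation from a standard library would be the usual choice at scale.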

What This Means

Engineers building or monitoring AI agents will get a practical recipe for predicting a new counterpart’s next move using only a few prior interactions. Product and reliability leads can use this to build pre-deployment tests, agent reputation signals, and continuous monitoring that combine population-level behavior with a target’s short track record.

Key Figures

Figure 1: Alice (seller) and Bob (buyer) negotiate via free-text offers. Following Bob’s $5,000 round-4 offer, Alice’s next move is the prediction target. (a) Response prediction (classification): will she accept? (b) Proposal prediction (regression): if she rejects, what will she propose?

Figure 2: Three approaches for predicting decisions of a target agent. (A) LLM-as-Predictor receives the decision-time state, dialogue, and K observed target games, and directly outputs the decision. (B) Textual-tabular prediction represents each decision point as a row of game features and dialogue. (C) Our method augments this row with Observer hidden-state representations from a frozen LLM.

Figure 3: Observer gain over the game+text features baseline by relative depth. Observer gains are stable across mid-to-late layers (relative depth 0.6–0.9). Left: Response, ΔAUC; Right: Proposal, ΔR². Rows: bargaining (top), negotiation (bottom); columns: K-shot examples.

Figure 4: Schematic of the multimodal tabular row at a single decision point. The row concatenates the three feature modalities of Section 4: game-state features (red), the dialogue representation produced by the sentence encoder (blue), and the Observer hidden-state representation of the current decision-time state (purple). Game-state features are divided into configuration-level situation features (e.g., game horizon, product valuation) and per-round entries summarizing the last few rounds and the current offer; the dialogue representation contributes per-round textual entries. Cell counts are illustrative; actual modality dimensions and game-feature columns differ by game family (bargaining vs. negotiation).
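
The row layout in the Figure 4 caption can be sketched as a simple concatenation of three named feature groups. This is an illustrative sketch only: `build_row`, the column prefixes (`game.`, `text.`, `observer.`), the feature names, and all values are hypothetical, and real embeddings and hidden states have far more dimensions than shown here.

```python
from typing import Dict, Sequence


def build_row(
    game_features: Dict[str, float],
    dialogue_embedding: Sequence[float],
    observer_state: Sequence[float],
) -> Dict[str, float]:
    """Concatenate the three modalities into one flat row with named columns."""
    row: Dict[str, float] = {}
    # Structured game-state features keep their own names.
    for name, value in game_features.items():
        row[f"game.{name}"] = float(value)
    # Dialogue embedding from a sentence encoder, one column per dimension.
    for i, value in enumerate(dialogue_embedding):
        row[f"text.{i}"] = float(value)
    # Hidden-state vector from the frozen Observer, one column per dimension.
    for i, value in enumerate(observer_state):
        row[f"observer.{i}"] = float(value)
    return row


# Hypothetical decision point: round 4 of a 6-round game, current offer $5,000.
row = build_row(
    game_features={"round": 4, "horizon": 6, "current_offer": 5000.0},
    dialogue_embedding=[0.12, -0.40, 0.08],
    observer_state=[1.5, -0.3, 0.9, 0.0],
)
print(sorted(row))
```

Prefixing the columns by modality makes it easy to drop one group at a time, which is how an ablation such as "game+text features" versus "game+text+Observer" could be set up.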

Yes, But...

Results come from controlled bargaining and negotiation games, not live marketplaces, so real-world performance may vary. The approach relies on having a relevant labeled source population to learn from; without it, transfer will be weaker. The Observer helps more for accept/reject prediction and for bargaining-style tasks; in some negotiation settings the structured game state alone already predicts proposals well.

Methodology & More

Represent each decision point as a single text-tabular row that combines (1) structured game-state features (round, current offer, public configuration), (2) a generic dialogue embedding for recent messages, and (3) a decision-oriented hidden-state vector from a small frozen language model (the Observer). Train a supervised tabular foundation model on thousands of such rows drawn from a broad source population, and adapt to a new target by including K prior games from that target in the same training set. At test time, only public state and dialogue are available; the predictor has no access to the target’s prompt or internal logic. This setup transfers better across agent populations than asking a large model to directly predict moves from examples.

In experiments transferring from a 13-agent controlled tournament to 91 hackathon-built agents, the Observer hidden states consistently improved accept/reject prediction AUC by about 4–5 percentage points and cut offer-prediction (regression) error by about 14% in bargaining. The hidden-state vectors provide richer situational signal than the Observer’s own generated answers, and once they are included, generic sentence embeddings add little extra value.

Practically, the method offers a scalable way to build agent-to-agent evaluation, short-track reputation, and pre-production checks by separating representation (small language models encode the dialogue) from adaptation and decision (supervised tabular learning).
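
The K-shot adaptation step described above (pooling source-population rows with the rows of K prior target games) can be sketched as follows. This is a minimal illustration under assumptions: `pool_k_shot`, the `id` field, and the toy rows are hypothetical, and evaluating on the target's remaining games is an assumption of this sketch rather than a documented detail of the study's protocol. No model is trained here; the pooled rows would be handed to whatever supervised tabular learner is in use.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]   # one decision point (features plus label)
Game = List[Row]          # all decision points from a single game


def pool_k_shot(
    source_rows: Sequence[Row],
    target_games: Sequence[Game],
    k: int,
) -> Tuple[List[Row], List[Row]]:
    """Return (train_rows, test_rows) for a K-shot setup.

    Training rows: every source row plus all rows from the first k target games.
    Test rows: rows from the remaining target games (an assumption of this sketch).
    """
    if k < 0 or k > len(target_games):
        raise ValueError("k must be between 0 and the number of target games")
    train_rows: List[Row] = list(source_rows)
    for game in target_games[:k]:
        train_rows.extend(game)
    test_rows = [row for game in target_games[k:] for row in game]
    return train_rows, test_rows


# Toy data, invented for illustration.
source_rows = [{"id": f"s{i}"} for i in range(5)]
target_games = [
    [{"id": "t0a"}, {"id": "t0b"}],
    [{"id": "t1a"}],
    [{"id": "t2a"}, {"id": "t2b"}],
]

train_k2, test_k2 = pool_k_shot(source_rows, target_games, k=2)
train_k0, test_k0 = pool_k_shot(source_rows, target_games, k=0)  # zero-shot case
print(len(train_k2), len(test_k2), len(train_k0), len(test_k0))
```

With `k=0` the training set contains source rows only, which corresponds to the zero-shot transfer setting mentioned in the Data Highlights.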

Credibility Assessment:

All authors have low h-indices and no noted affiliations or strong venue (arXiv), indicating limited established reputation.
