Key Takeaway
A three-agent loop that prunes and compresses game observations lets large language models make real-time strategy decisions much faster and more reliably—cutting input size by about 70% and halving response time while boosting win rates.
What They Found
Structural pruning of raw game observations, using a graph-based entropy measure, removes redundant information and dramatically reduces the LLM input load. A closed loop of decision, evaluation, and policy agents, backed by memory and post-game reflection, suppresses the language model's random behavior and enables ongoing strategy improvement. On StarCraft II maps, this combination keeps average decision time under roughly one second and yields much higher win rates than prior LLM-based approaches.
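The graph-based pruning idea can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: it uses one-dimensional structural entropy (each node's weighted-degree contribution to the graph's degree-distribution entropy) to rank observation elements and keep only the most informative ones. All names and the toy degree values are hypothetical.

```python
import math

def degree_entropy(degrees):
    """One-dimensional structural entropy of a weighted graph,
    computed from each node's weighted degree."""
    total = sum(degrees.values())  # sum of weighted degrees (2m)
    return -sum((d / total) * math.log2(d / total)
                for d in degrees.values() if d > 0)

def prune_observations(degrees, keep_ratio=0.3):
    """Keep the observation elements (nodes) whose entropy
    contribution is highest; drop the redundant rest."""
    total = sum(degrees.values())
    contrib = {n: -(d / total) * math.log2(d / total)
               for n, d in degrees.items() if d > 0}
    ranked = sorted(contrib, key=contrib.get, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return set(ranked[:k])

# Toy game-state graph: units and terrain as nodes, with weighted
# degrees derived from interaction edges (hypothetical values).
degrees = {"marine_1": 9.0, "marine_2": 8.5, "enemy_zealot": 12.0,
           "rock_a": 0.5, "rock_b": 0.4, "base": 6.0}
core = prune_observations(degrees, keep_ratio=0.5)
```

Here the low-degree terrain nodes are pruned while combat-relevant units survive; only the surviving nodes would be serialized into the LLM prompt, which is where the token savings come from.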
Data Highlights
1. Observation pruning reduces input tokens by roughly 70%.
2. Decision response time is cut by more than half compared with a leading LLM baseline.
3. Win rates reach up to 100% across the evaluated StarCraft II maps and difficulty levels.
What This Means
Engineers building real-time AI agents and technical leads evaluating LLM-driven decision systems will find this useful: it shows how to meet tight time budgets with large language models without heavy retraining. Researchers exploring multi-agent coordination or memory-enhanced decision loops can reuse the three-agent structure and the structural-pruning idea in other high-dimensional environments.
Key Figures

Fig 1: The SEMA framework addresses two pivotal challenges for LLMs in RTS environments. First, the massive observational data leads to excessive input sequences, escalating reasoning latency and hindering real-time response. Second, the inherent stochastic bias of LLMs induces inconsistent decision logic: in the exact same scenario, two distinct decisions can yield diametrically opposed outcomes, such as a shift from victory to defeat. This volatility severely undermines the robustness of agents in complex adversarial settings.

Fig 2: Overview of SEMA. First, structural modeling and dynamic pruning are employed to extract core observations, reducing reasoning latency. Second, decision and evaluation agents perform closed-loop calibration via history retrieval to suppress stochastic bias. Finally, the policy agent analyzes episode performance and updates experience, driving the continuous self-evolution of strategic logic.

Fig 3: (a) 3m

Fig 4: (a) 3m
Yes, But...
Results were produced with one specific large foundation model and were evaluated against the built-in StarCraft II AI rather than human players, so performance may vary in other settings. The pruning step trades reduced input size against the risk of discarding rare but critical signals, and edge cases were not deeply analyzed. The framework also adds system complexity (topological modeling, memory pools, agent orchestration) that requires engineering effort and runtime compute.
Deep Dive
SEMA tackles the two core obstacles to using large language models for fast game decisions: overwhelming high-dimensional observations and stochastic, inconsistent outputs. It first models the game state as a weighted graph and uses structural entropy to rank and prune observations, keeping core semantic elements while cutting redundant tokens. At decision time, three collaborating agents run: a decision agent proposes actions, an evaluation agent retrieves past trajectories and corrects likely errors, and a policy (experience) agent records outcomes and summarizes episodes. A nested feedback loop applies step-level corrections during play and episode-level reflection afterward, so strategies evolve without fine-tuning the base model.

In experiments across eight StarCraft II maps (50 randomized trials per map against the built-in AI), the approach reduced LLM input tokens by about 70% and cut decision latency by more than half, keeping average move time under roughly one second. The system also reported strong competitive results, with up to 100% win rates on some maps.

Practically, SEMA shows a path to deploying LLM-based reasoning in tight real-time settings by combining targeted data compression with multi-agent consistency checks and a memory-driven learning loop. Limitations include reliance on a large foundation model, evaluation against scripted opponents, and potential loss of rare signals during pruning, all areas to probe in follow-up work.
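The nested feedback loop described above can be sketched as a small control structure. This is a hedged illustration of the pattern, not the paper's implementation: the three agents are stand-in lambdas where LLM calls would go, and the retrieval, correction, and reflection logic are hypothetical simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    trajectories: list = field(default_factory=list)  # step-level records
    experience: list = field(default_factory=list)    # episode summaries

    def similar(self, obs):
        # Hypothetical retrieval: exact-match lookup stands in for
        # the history-retrieval step in the real system.
        return [t for t in self.trajectories if t["obs"] == obs]

def run_step(obs, decide, evaluate, memory):
    """One closed-loop step: the decision agent proposes an action,
    the evaluation agent corrects it using retrieved history."""
    action = decide(obs, memory.experience)
    history = memory.similar(obs)
    action = evaluate(obs, action, history)  # step-level correction
    memory.trajectories.append({"obs": obs, "action": action})
    return action

def end_episode(outcome, reflect, memory):
    """Policy agent: episode-level reflection updates experience."""
    memory.experience.append(reflect(memory.trajectories, outcome))
    memory.trajectories.clear()

# Stub agents standing in for LLM calls:
decide = lambda obs, exp: "attack" if "enemy" in obs else "scout"
evaluate = lambda obs, act, hist: (hist[-1]["action"] if hist else act)
reflect = lambda traj, outcome: f"{outcome} after {len(traj)} steps"

mem = Memory()
first_action = run_step("enemy near base", decide, evaluate, mem)
end_episode("win", reflect, mem)
```

The key design point mirrored here is the two timescales: `run_step` applies corrections within an episode, while `end_episode` distills the whole trajectory into reusable experience, so the base model itself is never fine-tuned.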
Credibility Assessment:
All authors have very low h-indices, no notable affiliations listed, and it's only an arXiv preprint with no citations — fits an emerging/limited-info profile.