Evaluation · Production Ready
on-policy
by marlbenchmark
Official MAPPO implementation for benchmarking cooperative multi-agent policies
Python
Updated Jul 18, 2024
Summary
Implements Multi-Agent PPO (MAPPO) for training and benchmarking cooperative multi-agent policies. Provides the official algorithm implementation with training loops, environment wrappers, and evaluation scripts for common MARL testbeds like SMAC and Hanabi. Includes reproducible configs and checkpoints to compare MAPPO performance across environments and research baselines.
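To make the approach concrete, here is a minimal sketch of the MAPPO update step: a PPO-style clipped policy loss for a parameter-shared actor acting on local observations, plus a centralized critic trained on the global state. This is an illustrative sketch only, not the repository's code; the agent count, network sizes, and every name here (actor, critic, mappo_update) are hypothetical.

```python
# Illustrative MAPPO update sketch (hypothetical names and sizes, not the
# repo's code): parameter-shared actor over local observations, centralized
# critic over the global state, PPO clipped surrogate objective.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, STATE_DIM, N_ACTIONS = 3, 16, 48, 5  # assumed dimensions
CLIP_EPS = 0.2  # PPO clipping range

actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=5e-4)

def mappo_update(obs, state, actions, old_log_probs, returns, advantages):
    """One gradient step on a batch of rollout data.

    obs:        (B, N_AGENTS, OBS_DIM)  per-agent local observations
    state:      (B, STATE_DIM)          global state seen only by the critic
    actions:    (B, N_AGENTS)           sampled discrete actions
    old_log_probs, advantages: (B, N_AGENTS);  returns: (B,)
    """
    dist = torch.distributions.Categorical(logits=actor(obs))
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)             # importance ratio
    clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (critic(state).squeeze(-1) - returns).pow(2).mean()
    loss = policy_loss + 0.5 * value_loss - 0.01 * dist.entropy().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Smoke test with random data (stands in for a real rollout buffer).
B = 32
mappo_update(
    obs=torch.randn(B, N_AGENTS, OBS_DIM),
    state=torch.randn(B, STATE_DIM),
    actions=torch.randint(0, N_ACTIONS, (B, N_AGENTS)),
    old_log_probs=torch.zeros(B, N_AGENTS),
    returns=torch.randn(B),
    advantages=torch.randn(B, N_AGENTS),
)
```

The split here, where actors see only local observations while the critic conditions on the full state, is the centralized-training, decentralized-execution pattern that MAPPO follows.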
The Value Proposition
As multi-agent systems proliferate, consistent evaluation is essential for judging coordination, robustness, and failure modes. MAPPO offers a standardized policy-gradient baseline for comparing cooperative behaviors and emergent failures across environments. For agent-to-agent evaluation and for building an agent's track record, reliable MARL benchmarks like this one let teams quantify how policy changes affect interaction quality and reliability.
Ideal For
Researchers and engineers benchmarking cooperative multi-agent algorithms or validating agent policies on SMAC (StarCraft II), Hanabi, and MPE scenarios.
Use Cases
- Benchmark MAPPO on SMAC or Hanabi to compare cooperative policies
- Validate multi-agent coordination and failure modes before deployment
- Generate reproducible training runs and checkpoints for agent-to-agent evaluation pipelines (see the evaluation sketch below)
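Along those lines, here is a minimal sketch of a seed-swept evaluation harness: load one checkpoint, roll it out under several seeds, and report the mean and standard deviation of the episode return. The load_policy and run_episode helpers are hypothetical placeholders, not this repository's API; a real harness would restore actor weights and step SMAC or Hanabi episodes instead.

```python
# Hedged sketch of a seed-swept evaluation harness. load_policy and
# run_episode are hypothetical placeholders, not this repo's API.
import random
import statistics

def load_policy(checkpoint_path):
    # Placeholder: a real harness would restore actor weights here.
    return lambda obs, rng: rng.randrange(5)  # random discrete action

def run_episode(policy, seed):
    # Placeholder rollout: a real harness would step SMAC/Hanabi episodes.
    rng = random.Random(seed)
    return float(sum(policy(None, rng) for _ in range(25)))

def evaluate(checkpoint_path, seeds=(0, 1, 2, 3, 4)):
    policy = load_policy(checkpoint_path)
    returns = [run_episode(policy, s) for s in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

mean_ret, std_ret = evaluate("mappo_smac_3m.pt")  # hypothetical checkpoint
print(f"episode return: {mean_ret:.1f} +/- {std_ret:.1f} over 5 seeds")
```

Reporting mean and spread across seeds, rather than a single run, is what makes checkpoint comparisons across MAPPO configs meaningful.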
Works With
smac · hanabi · starcraftii · pytorch · python
Topics
algorithms · hanabi · mappo · mpe · multi-agent · ppo · smac · starcraftii
Similar Tools
pymarl · smac
Keywords
multi-agent · mappo · multi-agent evaluation · marl