The Big Picture

Most systems called “agents” today are engineered pipelines; true agency requires a model that holds persistent goals, an evolving self-model, internal planning, and a learned decision regulator — summed up in the GIC architecture.

The Evidence

Current “agents” rely heavily on external scaffolding (prompts, tools, fixed workflows) and therefore struggle with long-term, autonomous tasks. Genuine agency needs five core abilities: persistent hierarchical goals, an evolving identity, separate simulation of the world, self-regulation about when to deliberate, and self-directed learning. The GIC (Goal–Identity–Configurator) design stitches those pieces together: hierarchical goal decomposition, an internal identity that updates over time, simulative planning via a separate world model, and a learned configurator that decides when to plan versus act. Keeping the agent model separate from the world model prevents conflating action choice with state prediction and makes training and failure analysis clearer.
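A toy sketch can show what it means to separate an agent model from a world model. This is not the authors' implementation. The 1-D environment, the linear world model, and the names `WorldModel`, `AgentModel`, `true_env`, and `demo` are all hypothetical. The world model is updated only from prediction error. The agent model only chooses actions by querying it, so a bad outcome can be traced to one model or the other.

```python
import random


class WorldModel:
    """Hypothetical world model: predicts the next state, trained only on prediction error."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0  # learned effect of one unit of action on the state
        self.lr = lr

    def predict(self, state: float, action: float) -> float:
        return state + self.w * action

    def update(self, state: float, action: float, next_state: float) -> float:
        # One gradient step on the squared prediction error; only world-model
        # parameters change here, never the agent's action-selection logic.
        error = self.predict(state, action) - next_state
        self.w -= self.lr * error * action
        return error ** 2


class AgentModel:
    """Hypothetical agent model: picks actions toward a goal by consulting the world model."""

    def __init__(self, actions: list[float]) -> None:
        self.actions = actions

    def act(self, state: float, goal: float, world: WorldModel) -> float:
        # Choose the action whose predicted next state lands closest to the goal.
        return min(self.actions, key=lambda a: abs(world.predict(state, a) - goal))


def true_env(state: float, action: float) -> float:
    """Hidden dynamics (assumed for the toy): the state moves by exactly the action."""
    return state + action


def demo(seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    world = WorldModel()
    # Phase 1: fit the world model from random exploration (prediction signal only).
    for _ in range(500):
        s, a = rng.uniform(-5, 5), rng.uniform(-1, 1)
        world.update(s, a, true_env(s, a))
    # Phase 2: the agent model acts using the learned world model.
    agent = AgentModel([-1.0, -0.5, 0.0, 0.5, 1.0])
    state, goal = 0.0, 2.0
    for _ in range(5):
        state = true_env(state, agent.act(state, goal, world))
    return world.w, state


if __name__ == "__main__":
    print(demo())
```

In this toy setup, a poor final state combined with a low prediction error would point at the agent model. A high prediction error would point at the world model instead.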

Data Highlights

5 core agency elements identified: persistent goals, evolving identity, decision-making, self-regulation, and self-directed learning.
3 decision layers in the GIC design: System I (reactive actions), System II (simulative planning with a world model), and System III (a learned configurator that decides when to invoke planning).
2 distinct learned models recommended: an agent model (decides actions) kept separate from a world model (predicts consequences) to avoid conflicting objectives during training.
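The three decision layers can be sketched as a dispatch loop. This is an illustration under stated assumptions, not the paper's design:

- System I is a fixed reactive rule that also reports a confidence.
- System II enumerates short action sequences and scores them with a world model.
- System III is a hand-written threshold rule standing in for the learned configurator.

All function names, the confidence threshold, and the planning budget are hypothetical.

```python
import itertools


def system1_reactive(state: int, goal: int) -> tuple[int, float]:
    """System I (toy): step one cell toward the goal; confident only when the goal is near."""
    action = (goal > state) - (goal < state)  # -1, 0, or +1
    confidence = 1.0 if abs(goal - state) <= 2 else 0.3
    return action, confidence


def world_model(state: int, action: int) -> int:
    """Stand-in world model used by System II to simulate consequences (assumed exact)."""
    return state + action


def system2_plan(state: int, goal: int, horizon: int = 3,
                 actions: tuple[int, ...] = (-2, -1, 0, 1, 2)) -> int:
    """System II (toy): simulate every action sequence and return the best first action."""
    def rollout_cost(plan: tuple[int, ...]) -> int:
        s, cost = state, 0
        for a in plan:
            s = world_model(s, a)
            cost += abs(s - goal)  # penalize distance at every simulated step
        return cost

    best_plan = min(itertools.product(actions, repeat=horizon), key=rollout_cost)
    return best_plan[0]


def system3_configurator(confidence: float, budget: int, threshold: float = 0.5) -> bool:
    """System III stand-in: a fixed rule (not learned) that invokes planning when System I is unsure."""
    return confidence < threshold and budget > 0


def run_episode(start: int, goal: int, budget: int,
                max_steps: int = 20) -> tuple[int, list[str], int]:
    state, trace = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        action, confidence = system1_reactive(state, goal)
        if system3_configurator(confidence, budget):
            action = system2_plan(state, goal)  # expensive deliberation
            budget -= 1
            trace.append("II")
        else:
            trace.append("I")  # cheap reactive step
        state = world_model(state, action)  # toy assumption: the environment matches the model
    return state, trace, budget


if __name__ == "__main__":
    print(run_episode(start=0, goal=7, budget=10))
```

With a budget of 10, System II handles the steps where the goal is far away and System I takes the final step. A budget of 0 forces a purely reactive episode, which trades planning cost for more steps.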

What This Means

Engineers building AI agents who want systems to operate across long time horizons and changing situations will find the GIC blueprint practical for design choices. Product and safety leads can use the distinction between agent model and world model to better diagnose failures and design evaluation pipelines. Researchers interested in agent autonomy and continual learning can use the five agency axes as a roadmap for experiments and benchmarks. Overall, the blueprint gives a concrete reference for how these components could be integrated into existing agent architectures.

Key Figures

Figure 1: Humans exhibit multiple layers of intelligence: linguistic and symbolic reasoning, physical and spatial competence, social understanding, and higher-level “philosophical” capacities.
Figure 2: Illustration of an agent acting in an environment to achieve a goal.
Figure 3: Comparison of step-by-step subgoals to hierarchical decomposition of an overall goal. (Left) Contemporary agentic systems are supplied a short-horizon goal $g_t$ at every step, and the objective disappears once the interaction ends. (Right) The alternative hierarchical approach instructs the system once with a long-term/overall goal $g$; a learned decomposition module $\delta$ breaks it into a sequence of subgoals $(g_1, g_2, \dots)$, selected based on outcomes predicted by a hierarchical world model $f$ and revised as the state $s_t$ evolves, each pursued by short-horizon capabilities that are easier to learn and supervise.
Figure 4: An agent that revises its self-model $i_t$ at each step (fast-slow, solid) expects to accumulate less regret than one with a fixed identity $i_0$ (slow-only, dashed), as per Theorem 1. The slow-only curve grows linearly within each round, and its slope drops only at round boundaries when the slow update happens (▼); the fast-slow curve is concave within each round because identity evolution continuously reduces per-step regret.
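A toy simulation can reproduce the qualitative shape of the two curves in Figure 4. It is neither Theorem 1 nor the paper's experiment. Per-step regret is assumed to equal the gap between the identity estimate and a fixed target, and the update rates are arbitrary illustrative values. Setting the fast rate to zero gives the slow-only agent.

```python
def cumulative_regret(rounds: int, steps_per_round: int, fast_rate: float,
                      slow_rate: float, target: float = 1.0) -> list[float]:
    """Cumulative regret after every step for a toy identity-update scheme.

    Per-step regret is |target - identity| (a hypothetical proxy).
    fast_rate == 0 reproduces the slow-only agent, whose identity is frozen within a round.
    """
    identity, total, curve = 0.0, 0.0, []
    for _ in range(rounds):
        for _ in range(steps_per_round):
            total += abs(target - identity)
            curve.append(total)
            identity += fast_rate * (target - identity)  # fast, per-step revision of i_t
        identity += slow_rate * (target - identity)  # slow update at the round boundary
    return curve


if __name__ == "__main__":
    slow_only = cumulative_regret(rounds=3, steps_per_round=5, fast_rate=0.0, slow_rate=0.5)
    fast_slow = cumulative_regret(rounds=3, steps_per_round=5, fast_rate=0.2, slow_rate=0.5)
    for step, (s, f) in enumerate(zip(slow_only, fast_slow), start=1):
        print(f"step {step:2d}  slow-only {s:.3f}  fast-slow {f:.3f}")
```

Within a round, the slow-only increments stay constant, which gives a straight line. The fast-slow increments shrink step by step, which gives a concave curve. Because both agents share the same slow update, the fast-slow total never exceeds the slow-only total in this toy.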

Yes, But...

The proposal is primarily conceptual and theoretical; concrete, large-scale empirical results are deferred to companion work, so practical gains are not yet proven at production scale. Safety and ethical implications of systems that can change their own identity or goals require careful governance, oversight, and robust evaluation before deployment. Scaling to many interacting agents, highly open-ended environments, or real-world robotics raises additional engineering and evaluation challenges that remain open.

Methodology & More

Modern systems labeled as agents often execute tasks by orchestrating external tools, fixed prompts, or engineered workflows. Those approaches work well for short, well-scoped tasks but falter when a system must pursue long-term objectives, adapt its own capabilities, or decide how deeply to think about a problem. To clarify what genuine agency requires, the authors formalize the decision problem into two objects: an agent model that chooses actions and a world model that predicts consequences. They then place latent structures (goals and identity) inside the agent factor rather than scattering them across external scaffolding.

From that foundation comes GIC (Goal–Identity–Configurator). The design centers on four components:

- hierarchical goal decomposition, so a single long-term goal is split into manageable subgoals;
- an evolving identity that updates with experience, so the agent can revise self-assessments or roles without retraining;
- a separate simulative world model used for planning;
- a learned configurator that decides when to invoke expensive planning versus acting reactively.

The architecture argues for three interacting decision modes (reactive, simulative, and regulated choice). It also emphasizes separating the training signals for action selection from those for environment prediction.

Implications include clearer failure diagnosis, more scalable handling of long-horizon tasks, and a structured path toward multi-agent and continual learning research, but empirical validation and safety governance are necessary next steps.
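Hierarchical goal decomposition (Figure 3, right) can be sketched as a loop that re-decomposes from the current state after every subgoal. This toy is not the paper's learned module:

- the decomposition δ simply proposes nearby positions on a line;
- the world model f is a lookup of cells it believes are slippery;
- the environment has one extra slippery cell the model does not know about, so the course must be revised along the way.

All names, cell sets, and the reach limit are hypothetical.

```python
REACH = 3                       # how far a short-horizon skill can go in one subgoal (assumed)
BELIEVED_SLIPPERY = {6, 7}      # world model's belief: landing here slides the agent back one cell
TRUE_SLIPPERY = {6, 7, 8}       # actual environment: cell 8 is slippery too, unknown to the model


def decompose(state: int) -> list[int]:
    """Toy δ: propose every position within reach of the current state as a candidate subgoal."""
    return [sg for sg in range(state - REACH, state + REACH + 1) if sg != state]


def world_model(subgoal: int) -> int:
    """Toy f: predicted position after a short-horizon skill pursues `subgoal`."""
    return subgoal - 1 if subgoal in BELIEVED_SLIPPERY else subgoal


def environment(subgoal: int) -> int:
    """Actual outcome of pursuing `subgoal` (the skill reaches it, then slippage may apply)."""
    return subgoal - 1 if subgoal in TRUE_SLIPPERY else subgoal


def pursue(start: int, goal: int, max_subgoals: int = 10) -> tuple[list[int], list[int]]:
    """Pursue one long-term goal by repeatedly decomposing it from the current state."""
    state, subgoals, states = start, [], []
    while state != goal and len(subgoals) < max_subgoals:
        # Select the candidate whose predicted outcome is closest to the overall goal.
        subgoal = min(decompose(state), key=lambda sg: abs(world_model(sg) - goal))
        state = environment(subgoal)  # the real state may differ from the prediction
        subgoals.append(subgoal)
        states.append(state)
    return subgoals, states


if __name__ == "__main__":
    print(pursue(start=0, goal=10))
```

From cell 3, the model sees no benefit in aiming at 6, which it predicts will slide back to 5. Later, an unmodelled slip at cell 8 lands the agent on 7. The next decomposition then starts from the actual state (7) rather than the predicted one, and the goal is still reached.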
Credibility Assessment:

Includes Eric Xing, a top AI researcher, among the authors; credibility is high despite the arXiv venue.