Human-in-the-Loop Pattern

High-stakes decisions requiring human oversight and approval

Overview

The Challenge

Fully autonomous agents make mistakes, take irreversible actions, or handle sensitive decisions without appropriate oversight.

The Solution

Integrate human review at critical decision points, allowing approval, modification, or rejection of agent actions before execution.

When to Use

Financial transactions above thresholds
Healthcare recommendations
Legal document generation
Any irreversible or high-impact actions

When NOT to Use

High-volume, low-stakes operations
Real-time systems where latency is critical
Tasks where human review adds no value

Trade-offs

Advantages

+ Prevents costly mistakes
+ Builds user trust
+ Satisfies regulatory requirements
+ Captures edge cases for improvement

Considerations

− Adds latency to workflows
− Creates bottlenecks at human review
− Requires human availability
− Can cause decision fatigue

Deep Dive

Overview

Human-in-the-Loop (HITL) is an architectural pattern that embeds human judgment at chosen points in an agent workflow. Rather than granting the agent full autonomy, HITL keeps humans in control of high-stakes decisions.

Why HITL Matters in 2025

Even the most capable agents fail frequently:

In one benchmark of simulated real-world office tasks, Google's Gemini 2.5 Pro reportedly failed to complete about 70% of the tasks
Agents get stuck in loops, misread instructions, or take context-inappropriate actions
A Taco Bell AI drive-through accepted a customer's order for 18,000 waters

HITL prevents small mistakes from becoming major incidents.

Decision Framework

When to Require HITL

Risk Level	Reversibility	Recommendation
Low	Reversible	Full autonomy
Medium	Reversible	Async HITL
High	Irreversible	Sync HITL required
Critical	Any	Always HITL
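
The table can be read as a small routing function. The sketch below is one possible encoding, not a definitive policy: the `Risk` enum, the `oversight_mode` name, and the returned labels are illustrative. The table does not cover every combination (for example, low risk but irreversible), so the code assumes any irreversible action falls back to synchronous review.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def oversight_mode(risk: Risk, reversible: bool) -> str:
    """Map a risk level and reversibility to an oversight mode."""
    if risk is Risk.CRITICAL:
        # Critical actions always go to a human, reversible or not.
        return "always_hitl"
    if risk is Risk.HIGH or not reversible:
        # Assumption: combinations the table leaves open are treated
        # conservatively and block on approval.
        return "sync_hitl"
    if risk is Risk.MEDIUM:
        return "async_hitl"
    return "full_autonomy"
```

For instance, `oversight_mode(Risk.MEDIUM, reversible=True)` selects async review, while the same risk level with `reversible=False` blocks until a human approves.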

Examples by Category

Full Autonomy:

Answering factual questions
Formatting documents
Internal calculations

Async Review:

Email drafts (review before send)
Code suggestions (review before commit)
Report generation

Sync HITL Required:

Financial transactions
Database modifications
External API calls with side effects
Healthcare recommendations

Implementation Patterns

Interrupt-Based (LangGraph)

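from langgraph.checkpoint.memory import MemorySaver

# Assumes `graph` is an existing StateGraph that already has the nodes
# "agent", "human_review", and "execute_action", and that `should_interrupt`
# returns either "interrupt" or "continue".
graph.add_conditional_edges(
    "agent",
    should_interrupt,
    {
        "interrupt": "human_review",
        "continue": "execute_action",
    },
)

# Compile with a checkpointer so state survives the pause before human_review
app = graph.compile(checkpointer=MemorySaver(), interrupt_before=["human_review"])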
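
The routing function `should_interrupt` is referenced but not defined in the snippet above. A minimal sketch of what it might look like follows, in plain Python with no LangGraph dependency. The state fields, the set of sensitive action names, and the amount threshold are all hypothetical and would come from your own policy.

```python
from typing import TypedDict


class AgentState(TypedDict):
    proposed_action: str
    amount: float


# Hypothetical policy values, for illustration only.
SENSITIVE_ACTIONS = {"delete_database", "transfer_funds"}
AMOUNT_THRESHOLD = 10_000.0


def should_interrupt(state: AgentState) -> str:
    """Return the edge label used in the mapping: 'interrupt' or 'continue'."""
    if state["proposed_action"] in SENSITIVE_ACTIONS:
        return "interrupt"
    if state["amount"] > AMOUNT_THRESHOLD:
        return "interrupt"
    return "continue"
```

The returned strings must match the keys of the edge mapping exactly, otherwise the graph has no route for the result.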

AG-UI Integration

{
  "type": "INTERRUPT",
  "action": "delete_database",
  "details": {...},
  "options": ["approve", "deny", "modify"]
}
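
As a rough illustration, the message above could be produced and its reply validated as follows. The field names come from the example; the reply shape (`{"choice": ...}`) and both function names are assumptions made for this sketch, not part of any AG-UI specification.

```python
import json

ALLOWED_OPTIONS = ["approve", "deny", "modify"]


def build_interrupt(action: str, details: dict) -> str:
    """Serialize an interrupt request using the field names shown above."""
    return json.dumps({
        "type": "INTERRUPT",
        "action": action,
        "details": details,
        "options": ALLOWED_OPTIONS,
    })


def parse_reply(raw: str) -> dict:
    """Validate a reviewer reply; the {'choice': ...} shape is an assumption."""
    reply = json.loads(raw)
    if reply.get("choice") not in ALLOWED_OPTIONS:
        raise ValueError(f"unsupported choice: {reply.get('choice')!r}")
    return reply
```

Rejecting unknown choices at the boundary keeps a malformed reply from being interpreted as approval.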

Async Channels

For non-blocking flows, route to review channels:

Slack notifications
Email approvals
Dashboard queues
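
Whatever the channel, the non-blocking shape is the same: submit a request, get back an ID immediately, and poll for the decision later. The sketch below uses an in-memory queue as a stand-in for Slack, email, or a dashboard. The class and method names are hypothetical, and a real deployment would need durable storage.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReviewRequest:
    action: str
    payload: dict
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ReviewQueue:
    """In-memory stand-in for a Slack channel, mailbox, or dashboard queue."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[ReviewRequest]" = queue.Queue()
        self._decisions: Dict[str, str] = {}

    def submit(self, request: ReviewRequest) -> str:
        """Enqueue and return immediately so the agent is not blocked."""
        self._pending.put(request)
        return request.request_id

    def next_pending(self) -> Optional[ReviewRequest]:
        """Reviewer side: fetch the next request, or None if nothing waits."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def record(self, request_id: str, decision: str) -> None:
        """Reviewer side: store the decision for a request."""
        self._decisions[request_id] = decision

    def decision_for(self, request_id: str) -> Optional[str]:
        """Agent side: poll for a decision; None means still awaiting review."""
        return self._decisions.get(request_id)
```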

HITL Response Options

Approve: Execute action as proposed
Modify: Edit parameters before execution
Reject: Cancel with feedback
Escalate: Route to higher authority
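
The four options can be handled in one dispatch function. This is a minimal sketch under assumptions: `execute` is a hypothetical callback that performs the action, and escalation routing is left out, so the function only reports that escalation was requested.

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"
    ESCALATE = "escalate"


def apply_decision(
    decision: Decision,
    params: dict,
    execute: Callable[[dict], str],
    edits: Optional[dict] = None,
    feedback: str = "",
) -> str:
    """Apply a reviewer's decision to a proposed action."""
    if decision is Decision.APPROVE:
        return execute(params)
    if decision is Decision.MODIFY:
        # Reviewer edits override the agent's proposed parameters.
        return execute({**params, **(edits or {})})
    if decision is Decision.REJECT:
        return f"cancelled: {feedback}"
    # ESCALATE: hand off to a higher authority (routing not shown here).
    return "escalated"
```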

Best Practices

Right-Size Interrupts

Too many interrupts create fatigue; too few allow errors. Tune thresholds based on:

Historical error rates
Cost of mistakes
User tolerance for friction
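
One simple way to combine these three factors is an expected-cost comparison: interrupt only when the expected loss from acting alone exceeds the cost of asking a human. The rule below is an assumption made for illustration, not a prescribed formula; `review_cost` stands in for user friction expressed in the same units as `mistake_cost`.

```python
def needs_review(error_rate: float, mistake_cost: float, review_cost: float) -> bool:
    """Interrupt when expected loss from acting alone exceeds the review cost."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    expected_loss = error_rate * mistake_cost
    return expected_loss > review_cost
```

With a 2% historical error rate and a review cost of 50, an action whose mistakes cost 10,000 is routed to a human (expected loss of about 200), while one whose mistakes cost 100 is not (expected loss of about 2).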

Context Preservation

When resuming after HITL, ensure full context is restored. Use persistent checkpointing.
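
A minimal sketch of persistent checkpointing with the standard library follows. It assumes the agent state is JSON-serializable and stored as one file per run; the function names are illustrative. Frameworks such as LangGraph provide their own checkpointers, which would normally be preferable.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(state, tmp)
    # os.replace swaps the finished file into place in a single step.
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Restore the state saved before the workflow paused for review."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Saving the pending action alongside the conversation history means the reviewer's decision can be applied to exactly what the agent proposed, even if the process restarted in between.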

Feedback Loop

Capture human decisions to improve future routing:

if human_approved and agent_was_confident:
    # Maybe reduce HITL for similar cases
    pass
elif human_rejected and agent_was_confident:
    # Increase caution for similar cases
    pass
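
To turn that branching into something that accumulates evidence, the decisions can be counted per action category. The sketch below is one possible approach; the class name, the `min_samples` and `max_override_rate` parameters, and their defaults are assumptions chosen for illustration.

```python
from collections import defaultdict


class RoutingFeedback:
    """Track, per category, how often reviewers overrule a confident agent."""

    def __init__(self, min_samples: int = 20, max_override_rate: float = 0.05) -> None:
        self.min_samples = min_samples
        self.max_override_rate = max_override_rate
        self._confident_total = defaultdict(int)
        self._confident_rejected = defaultdict(int)

    def record(self, category: str, agent_was_confident: bool, human_approved: bool) -> None:
        """Log one review outcome."""
        if not agent_was_confident:
            return  # low-confidence cases stay under review regardless
        self._confident_total[category] += 1
        if not human_approved:
            self._confident_rejected[category] += 1

    def can_relax(self, category: str) -> bool:
        """True once enough confident proposals were approved to reduce HITL."""
        total = self._confident_total[category]
        if total < self.min_samples:
            return False
        return self._confident_rejected[category] / total <= self.max_override_rate
```

Requiring a minimum sample count keeps a short streak of approvals from relaxing oversight prematurely.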

Regulatory Compliance

EU AI Act, NIST AI RMF, and ISO/IEC 42001 all emphasize human oversight for high-risk AI systems. HITL patterns help meet compliance requirements.

The Balance

HITL isn't a temporary workaround; it's a long-term pattern for building trustworthy AI. The goal is finding the right balance: machine speed for routine filtering, human expertise where it counts.

Example Scenarios

Trading Bot Approval

A trading agent proposes transactions above $10,000. Each proposal goes to a human trader who reviews market context and approves or rejects within a time window.

Outcome: Prevented three potentially costly trades in the first month while maintaining a 98% approval rate for good decisions.
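
The approval gate in this scenario could be sketched as follows. The $10,000 threshold comes from the scenario; everything else is an assumption, including the `poll_decision` callback (which returns `True`, `False`, or `None` while the trader has not yet answered), the default window length, and the choice to treat an expired window as a rejection.

```python
import time
from typing import Callable, Optional

APPROVAL_THRESHOLD_USD = 10_000


def review_trade(
    amount_usd: float,
    poll_decision: Callable[[], Optional[bool]],
    window_seconds: float = 60.0,
    poll_interval: float = 0.01,
) -> bool:
    """Return True if the trade may execute."""
    if amount_usd <= APPROVAL_THRESHOLD_USD:
        return True  # at or below the threshold: no review needed
    deadline = time.monotonic() + window_seconds
    while time.monotonic() < deadline:
        decision = poll_decision()
        if decision is not None:
            return decision
        time.sleep(poll_interval)
    # Assumption: an expired review window counts as a rejection.
    return False
```

Failing closed on timeout means an unavailable reviewer delays a trade rather than letting it through unreviewed.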

Considerations

Balance HITL frequency against user friction. Too many interrupts cause fatigue; too few allow errors.

Dimension Scores

Safety: 5/5
Accuracy: 5/5
Cost: 2/5
Speed: 1/5

Implementation

Complexity: medium

Implementation Checklist

Checkpoint system

Review queue UI

State persistence

