evaluation · medium · common

Human-in-the-Loop Pattern

High-stakes decisions requiring human oversight and approval

Overview

The Challenge

Fully autonomous agents can make mistakes, take irreversible actions, or handle sensitive decisions without appropriate oversight.

The Solution

Integrate human review at critical decision points, allowing approval, modification, or rejection of agent actions before execution.

When to Use
  • Financial transactions above thresholds
  • Healthcare recommendations
  • Legal document generation
  • Any irreversible or high-impact actions
When NOT to Use
  • High-volume, low-stakes operations
  • Real-time systems where latency is critical
  • Tasks where human review adds no value

Trade-offs

Advantages
  • Prevents costly mistakes
  • Builds user trust
  • Satisfies regulatory requirements
  • Captures edge cases for improvement
Considerations
  • Adds latency to workflows
  • Creates bottlenecks at human review
  • Requires human availability
  • Can cause decision fatigue

Deep Dive

Overview

Human-in-the-Loop (HITL) is an architectural pattern where human judgment is strategically embedded in agent workflows. Rather than full autonomy, HITL ensures humans supervise high-stakes decisions.

Why HITL Matters in 2025

Even the most capable agents fail frequently:

  • Google's Gemini 2.5 Pro fails to complete real-world office tasks 70% of the time
  • Agents get stuck in loops, misread instructions, or take context-inappropriate actions
  • A Taco Bell customer ordered 18,000 waters through an AI drive-through

HITL prevents small mistakes from becoming major incidents.

Decision Framework

When to Require HITL

Risk Level | Reversibility | Recommendation
Low        | Reversible    | Full autonomy
Medium     | Reversible    | Async HITL
High       | Irreversible  | Sync HITL required
Critical   | Any           | Always HITL
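
The table above can be read as a small lookup function. The following is a minimal sketch, not a definitive policy: the names `ReviewMode` and `required_review` are hypothetical, and it assumes that "Always HITL" means a synchronous gate and that combinations the table does not list (for example, low risk but irreversible) fall back to the stricter option.

```python
from enum import Enum


class ReviewMode(Enum):
    FULL_AUTONOMY = "full autonomy"
    ASYNC_HITL = "async HITL"
    SYNC_HITL = "sync HITL"


def required_review(risk: str, reversible: bool) -> ReviewMode:
    """Map a risk level and reversibility flag to a review mode."""
    risk = risk.lower()
    if risk not in {"low", "medium", "high", "critical"}:
        raise ValueError(f"unknown risk level: {risk!r}")
    # Critical actions always get a human gate (assumed synchronous here).
    if risk == "critical":
        return ReviewMode.SYNC_HITL
    # Assumption: any irreversible action is treated as needing sync review,
    # even for rows the table does not spell out.
    if not reversible:
        return ReviewMode.SYNC_HITL
    if risk == "low":
        return ReviewMode.FULL_AUTONOMY
    if risk == "medium":
        return ReviewMode.ASYNC_HITL
    # High risk but reversible is not in the table; assume the stricter mode.
    return ReviewMode.SYNC_HITL
```

Failing toward the stricter mode for unlisted combinations is a design choice; a team with well-tested rollback procedures might reasonably relax it.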

Examples by Category

Full Autonomy:

  • Answering factual questions
  • Formatting documents
  • Internal calculations

Async Review:

  • Email drafts (review before send)
  • Code suggestions (review before commit)
  • Report generation

Sync HITL Required:

  • Financial transactions
  • Database modifications
  • External API calls with side effects
  • Healthcare recommendations

Implementation Patterns

Interrupt-Based (LangGraph)

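from langgraph.checkpoint.memory import MemorySaver

# Assumes `graph` is a StateGraph with "agent", "human_review", and
# "execute_action" nodes already added, and that `should_interrupt`
# returns either "interrupt" or "continue".

# Route to human review before a sensitive action
graph.add_conditional_edges(
    "agent",
    should_interrupt,
    {
        "interrupt": "human_review",
        "continue": "execute_action"
    }
)

# Pause before the review node; the checkpointer holds state while waiting
app = graph.compile(checkpointer=MemorySaver(), interrupt_before=["human_review"])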
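
The routing snippet relies on a `should_interrupt` function that the source does not show. One possible shape is sketched below in plain Python, without any LangGraph dependency. The state fields (`proposed_action`, `amount`), the `SENSITIVE_ACTIONS` set, and the `AMOUNT_THRESHOLD` value are all assumptions made for illustration.

```python
from typing import Literal, TypedDict


class AgentState(TypedDict):
    # Hypothetical state fields; a real graph state will differ.
    proposed_action: str
    amount: float


# Hypothetical policy values for illustration only.
SENSITIVE_ACTIONS = {"delete_database", "transfer_funds"}
AMOUNT_THRESHOLD = 10_000.0


def should_interrupt(state: AgentState) -> Literal["interrupt", "continue"]:
    """Return the routing key used in the conditional-edge mapping."""
    if state["proposed_action"] in SENSITIVE_ACTIONS:
        return "interrupt"
    if state["amount"] > AMOUNT_THRESHOLD:
        return "interrupt"
    return "continue"
```

The returned strings match the keys of the mapping passed to `add_conditional_edges`, so the router stays decoupled from the node names.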

AG-UI Integration

{
  "type": "INTERRUPT",
  "action": "delete_database",
  "details": {...},
  "options": ["approve", "deny", "modify"]
}

Async Channels

For non-blocking flows, route to review channels:

  • Slack notifications
  • Email approvals
  • Dashboard queues
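
Whatever the channel, the non-blocking flow has the same shape: the agent files a request, keeps working, and checks back later. A minimal in-memory sketch follows. The `ReviewQueue` and `Ticket` names are hypothetical, and a real deployment would replace the dictionary with a Slack, email, or dashboard integration.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ticket:
    ticket_id: int
    action: str
    status: str = "pending"  # "pending", "approved", or "rejected"


class ReviewQueue:
    """In-memory stand-in for a Slack, email, or dashboard review channel."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tickets: Dict[int, Ticket] = {}

    def submit(self, action: str) -> int:
        """File a review request and return immediately with a ticket id."""
        ticket = Ticket(next(self._ids), action)
        self._tickets[ticket.ticket_id] = ticket
        return ticket.ticket_id

    def resolve(self, ticket_id: int, approved: bool) -> None:
        """Record the reviewer's decision for a ticket."""
        self._tickets[ticket_id].status = "approved" if approved else "rejected"

    def status(self, ticket_id: int) -> str:
        """Let the agent poll for the outcome without blocking."""
        return self._tickets[ticket_id].status

    def pending(self) -> List[Ticket]:
        """List tickets still waiting for a reviewer."""
        return [t for t in self._tickets.values() if t.status == "pending"]
```

A typical use is `ticket = reviews.submit("send email draft")`, after which the agent continues with other work and polls `reviews.status(ticket)` before sending.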

HITL Response Options

  1. Approve: Execute action as proposed
  2. Modify: Edit parameters before execution
  3. Reject: Cancel with feedback
  4. Escalate: Route to higher authority
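
The four response options can be handled by a single dispatcher. This is a sketch under stated assumptions: `Decision`, `Review`, and `apply_review` are hypothetical names, and the `execute` and `escalate` callables stand in for whatever the host application provides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Decision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass
class Review:
    decision: Decision
    new_params: Dict[str, Any] = field(default_factory=dict)  # used by MODIFY
    feedback: str = ""  # used by REJECT


def apply_review(
    review: Review,
    params: Dict[str, Any],
    execute: Callable[[Dict[str, Any]], Any],
    escalate: Callable[[Dict[str, Any]], Any],
) -> Tuple[str, Any]:
    """Act on a reviewer's decision and return (status, payload)."""
    if review.decision is Decision.APPROVE:
        # Execute the action exactly as proposed.
        return "executed", execute(params)
    if review.decision is Decision.MODIFY:
        # Reviewer edits override the agent's proposed parameters.
        return "executed", execute({**params, **review.new_params})
    if review.decision is Decision.ESCALATE:
        # Hand the unchanged proposal to a higher authority.
        return "escalated", escalate(params)
    # REJECT: nothing runs; the feedback goes back to the agent.
    return "rejected", review.feedback
```

Returning the reviewer's feedback on rejection gives the agent something to act on in its next attempt.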

Best Practices

Right-Size Interrupts

Too many interrupts create fatigue; too few allow errors. Tune thresholds based on:

  • Historical error rates
  • Cost of mistakes
  • User tolerance for friction
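
One way to combine these three inputs is an expected-cost comparison: interrupt only when the expected loss from acting alone exceeds the cost of asking a human. The rule below is an illustrative assumption, not something the pattern prescribes. `needs_review` is a hypothetical name, and `review_cost` is a stand-in for user friction expressed in the same units as `mistake_cost`.

```python
def needs_review(error_rate: float, mistake_cost: float, review_cost: float) -> bool:
    """Interrupt when the expected loss from acting alone exceeds the review cost.

    error_rate   -- historical fraction of similar actions that went wrong (0..1)
    mistake_cost -- estimated cost of one mistake
    review_cost  -- estimated cost of one human review, in the same units
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    expected_loss = error_rate * mistake_cost
    return expected_loss > review_cost
```

For example, a 2% error rate on a 10,000-unit mistake gives an expected loss of 200 units, which justifies a 50-unit review; the same error rate on a 100-unit mistake does not.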

Context Preservation

When resuming after HITL, ensure full context is restored. Use persistent checkpointing.
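
As a framework-independent illustration of persistent checkpointing, the sketch below writes the agent's state to a JSON file before pausing and reads it back on resume. The function names and the state layout are assumptions, and a production system would more likely use a database-backed checkpointer.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically so a crash cannot leave a half-written file."""
    # Write to a temporary file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Restore the state saved before the human review started."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The state must be JSON-serializable for this to work, so objects such as open connections need to be rebuilt on resume rather than stored.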

Feedback Loop

Capture human decisions to improve future routing:

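# `human_approved`, `human_rejected`, and `agent_was_confident` are
# booleans recorded for each reviewed action.
if human_approved and agent_was_confident:
    # Maybe reduce HITL for similar cases
    pass
elif human_rejected and agent_was_confident:
    # Increase caution for similar cases
    pass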
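
One way to fill in those branches is to nudge a confidence threshold after each review. The sketch below is an assumption-laden illustration: `ReviewPolicy`, its default values, and the choice to raise the threshold five times faster than it is lowered are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewPolicy:
    # Agent confidence at or above this value skips human review.
    confidence_threshold: float = 0.90
    step: float = 0.01
    floor: float = 0.50
    ceiling: float = 0.99

    def record(self, human_approved: bool, agent_was_confident: bool) -> None:
        """Adjust the threshold after one human decision."""
        if not agent_was_confident:
            # Low-confidence cases were going to review anyway; nothing to learn.
            return
        if human_approved:
            # Confident and correct: relax slightly for similar cases.
            self.confidence_threshold = max(
                self.floor, self.confidence_threshold - self.step
            )
        else:
            # Confident but rejected: tighten faster than we relax.
            self.confidence_threshold = min(
                self.ceiling, self.confidence_threshold + 5 * self.step
            )

    def needs_review(self, agent_confidence: float) -> bool:
        return agent_confidence < self.confidence_threshold
```

The asymmetric step sizes reflect the idea that one confident mistake should outweigh several confident successes; the exact ratio would need tuning against real review data.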

Regulatory Compliance

The EU AI Act, the NIST AI RMF, and ISO/IEC 42001 all emphasize human oversight for high-risk AI systems. HITL patterns help meet those requirements.

The Balance

HITL isn't a temporary workaround; it's a long-term pattern for building trustworthy AI. The goal is finding the right balance: machine speed for filtering, human expertise where it counts.

Example Scenarios

Trading Bot Approval

A trading agent proposes transactions above $10,000. Each proposal goes to a human trader who reviews market context and approves or rejects within a time window.

Outcome: Prevented three potentially costly trades in the first month while maintaining a 98% approval rate for good decisions.
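
The approval flow in this scenario can be sketched with a queue that the human trader writes to and a timeout for the review window. The $10,000 threshold comes from the scenario. The function names, the use of `queue.Queue` as the channel, and the choice to cancel the trade when the window expires are assumptions.

```python
import queue

APPROVAL_THRESHOLD_USD = 10_000  # threshold taken from the scenario


def await_decision(decisions: "queue.Queue[str]", window_seconds: float) -> str:
    """Wait for the trader's decision, up to the end of the review window."""
    try:
        return decisions.get(timeout=window_seconds)
    except queue.Empty:
        # Assumption: fail closed, so silence is treated as a rejection.
        return "reject"


def handle_trade(
    amount_usd: float, decisions: "queue.Queue[str]", window_seconds: float
) -> str:
    """Return "execute" or "cancel" for a proposed trade."""
    if amount_usd <= APPROVAL_THRESHOLD_USD:
        # At or below the threshold the agent acts on its own.
        return "execute"
    decision = await_decision(decisions, window_seconds)
    return "execute" if decision == "approve" else "cancel"
```

Failing closed on timeout keeps an unattended queue from turning into implicit approval, at the cost of missing trades when no reviewer is available.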
Considerations

Balance HITL frequency against user friction. Too many interrupts cause fatigue; too few allow errors.

Dimension Scores
  • Safety: 5/5
  • Accuracy: 5/5
  • Cost: 2/5
  • Speed: 1/5
Implementation
Complexity: medium
Implementation Checklist
  • Checkpoint system
  • Review queue UI
  • State persistence
Tags
evaluation, safety, oversight, approval, governance
