Coordination Deadlock

Multiple agents enter a state where each is waiting for another to act, causing the entire system to stall.

Overview

How to Detect

Tasks hang indefinitely. Agents repeatedly check status without progress. System throughput drops to zero. Timeout errors cascade across the system.

Root Causes

Circular dependencies between agents. Missing timeout configurations. Ambiguous handoff protocols. Resource contention without arbitration.

Deep Dive

Overview

Deadlock occurs when agents form circular dependencies, each waiting for resources or actions that can only come from another waiting agent.

Classic Deadlock Pattern

Agent A: Holds Resource 1, waiting for Resource 2
Agent B: Holds Resource 2, waiting for Resource 1

Result: Both wait forever.

Agent-Specific Deadlock Scenarios

Handoff Deadlock

Agent A: "Task requires specialized knowledge. Handing to Agent B."
Agent B: "I need additional context. Handing back to Agent A."
Agent A: "Waiting for Agent B's response..."
[Infinite loop]
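One way to stop a handoff from bouncing between the same agents is to carry a trail of previous holders with the task and refuse any handoff that revisits one. The sketch below is illustrative only: `Task`, `hand_off`, `HandoffLoopError`, and the `MAX_HANDOFFS` cap are hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass, field

MAX_HANDOFFS = 3  # hypothetical cap; tune per workflow


class HandoffLoopError(RuntimeError):
    """Raised when a handoff would revisit an agent or exceed the cap."""


@dataclass
class Task:
    description: str
    # Agents that have already held this task, oldest first.
    handoff_trail: list = field(default_factory=list)


def hand_off(task: Task, from_agent: str, to_agent: str) -> Task:
    """Record a handoff, refusing ones that revisit an agent or exceed MAX_HANDOFFS."""
    trail = task.handoff_trail + [from_agent]
    if to_agent in trail:
        raise HandoffLoopError(f"{to_agent} already handled this task: {trail}")
    if len(trail) > MAX_HANDOFFS:
        raise HandoffLoopError(f"more than {MAX_HANDOFFS} handoffs: {trail}")
    # Only commit the new trail once the handoff has been accepted.
    task.handoff_trail = trail
    return task
```

When `hand_off` raises, the caller would typically escalate to a coordinator or a human instead of retrying the same route.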

Approval Deadlock

Agent A: "Action requires approval from Agent B."
Agent B: "I need Agent A to verify credentials first."
[Neither can proceed]

Resource Contention Deadlock

Multiple agents compete for exclusive access to the same tools or data sources, and each holds one resource while waiting for another that a peer already holds.

Consensus Deadlock

In voting systems, agents may wait for a quorum that can never be reached, for example because too many voters have failed or have already voted against the proposal.
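A vote can be abandoned early once the quorum is mathematically out of reach. The sketch below assumes a simple yes/no vote in which only agents currently considered live can still vote. The function name and its parameters are hypothetical.

```python
def quorum_still_reachable(votes: dict, live_agents: set, quorum: int) -> bool:
    """Return True if the remaining live voters could still produce `quorum` yes votes.

    votes       -- maps agent name to True (yes) or False (no) for votes already cast
    live_agents -- agents currently believed to be responsive
    quorum      -- number of yes votes required
    """
    yes_votes = sum(1 for vote in votes.values() if vote)
    # Live agents that have not voted yet are the only possible source of new yes votes.
    pending = live_agents - set(votes)
    return yes_votes + len(pending) >= quorum
```

If this returns False, waiting longer cannot help, so the round can be failed or restarted immediately.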

Detection Patterns

Timeout-Based

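import asyncio
import logging

log = logging.getLogger(__name__)

async def execute_with_timeout(task, timeout=30):
    try:
        return await asyncio.wait_for(task, timeout)
    except asyncio.TimeoutError:
        log.warning("Potential deadlock detected")
        # wait_for has already cancelled `task`; break_deadlock is the system's own recovery hook, defined elsewhere
        return await break_deadlock(task)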

Cycle Detection

Monitor agent state graphs for circular wait patterns:

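def has_cycle(agent, wait_graph, visited, rec_stack):
    if agent in rec_stack:  # back edge: agent is already on the current DFS path
        return True
    if agent in visited:
        return False
    visited.add(agent)
    rec_stack.add(agent)
    found = any(has_cycle(n, wait_graph, visited, rec_stack) for n in wait_graph.get(agent, ()))
    rec_stack.discard(agent)
    return found

def detect_cycles(wait_graph):
    # DFS for cycle detection; wait_graph is a dict mapping each agent to the agents it waits on
    visited = set()
    rec_stack = set()
    return any(has_cycle(agent, wait_graph, visited, rec_stack) for agent in wait_graph)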

Prevention Strategies

Ordered Resource Acquisition

Always acquire resources in a consistent global order.
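The ordering rule can be enforced in one place so that individual agents cannot get it wrong. The sketch below assumes resources are guarded by in-process `threading.Lock` objects in a hypothetical `RESOURCE_LOCKS` registry, with sorting by name as the global order. A distributed system would need an equivalent ordering on its own lock service.

```python
import threading
from contextlib import ExitStack, contextmanager

# Hypothetical registry of shared resources, keyed by name.
RESOURCE_LOCKS = {
    "resource_1": threading.Lock(),
    "resource_2": threading.Lock(),
}


@contextmanager
def acquire_in_order(*names):
    """Acquire the named locks in sorted order, whatever order the caller lists them in."""
    with ExitStack() as stack:
        for name in sorted(set(names)):
            stack.enter_context(RESOURCE_LOCKS[name])
        # All locks are held here; ExitStack releases them in reverse order on exit.
        yield
```

With this helper, an agent that asks for `("resource_2", "resource_1")` and one that asks for `("resource_1", "resource_2")` both lock `resource_1` first, so the circular wait from the classic pattern cannot form.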

Timeouts with Fallbacks

Never wait indefinitely; always have a fallback path.

Preemption

Allow the system to forcibly release resources held by stalled agents.
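A common way to make preemption safe is to grant resources as time-limited leases, so a stalled holder loses its claim automatically. The following is a minimal single-process sketch. `LeasedResource` and its methods are hypothetical, and the optional `now` argument exists only to make the behaviour easy to test.

```python
import time


class LeasedResource:
    """A resource that is held under a lease and can be taken over once the lease expires."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, agent: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        held_by_other = self.holder is not None and self.holder != agent
        if held_by_other and now < self.expires_at:
            return False  # lease still valid; caller must wait or fall back
        # Either the resource is free, the caller is renewing, or the old lease expired (preemption).
        self.holder = agent
        self.expires_at = now + self.lease_seconds
        return True

    def release(self, agent: str) -> None:
        if self.holder == agent:
            self.holder = None
```

A healthy agent renews by calling `try_acquire` again before its lease runs out. An agent that has stalled simply stops renewing, and the next caller takes over.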

Lock-Free Designs

Use optimistic concurrency or message passing instead of locks.
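As an illustration of the message-passing option, a single owner task can hold the shared state and apply updates one at a time from a queue, so no agent ever takes a lock. This sketch uses `asyncio.Queue` from the standard library. The `resource_owner`, `update`, and `main` functions and the message shape are hypothetical.

```python
import asyncio


async def resource_owner(inbox: asyncio.Queue, state: dict) -> None:
    """Sole writer of `state`: applies queued updates one at a time."""
    while True:
        message = await inbox.get()
        if message is None:  # sentinel: shut down
            break
        key, value, reply = message
        state[key] = value
        reply.set_result(True)  # acknowledge to the sender


async def update(inbox: asyncio.Queue, key: str, value: str) -> bool:
    """Send an update request to the owner and wait for its acknowledgement."""
    reply = asyncio.get_running_loop().create_future()
    await inbox.put((key, value, reply))
    return await reply


async def main() -> dict:
    inbox = asyncio.Queue()
    state = {}
    owner = asyncio.create_task(resource_owner(inbox, state))
    # Two agents update the same key concurrently; the owner serializes them.
    await asyncio.gather(
        update(inbox, "ticket", "agent_a"),
        update(inbox, "ticket", "agent_b"),
    )
    await inbox.put(None)
    await owner
    return state


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because senders never hold anything while they wait, there is no hold-and-wait condition for a deadlock to build on. Senders should still bound their wait on the reply with a timeout.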

How to Prevent

Timeout Policies: Set maximum wait times for all inter-agent operations.

Deadlock Detection: Monitor wait graphs for cycles.

Resource Ordering: Acquire shared resources in consistent global order.

Preemption Rights: Allow coordinators to break deadlocks by forcing agent actions.

Heartbeat Monitoring: Detect stalled agents through health checks.
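Heartbeat monitoring from the list above can be sketched as a small tracker that records when each agent last reported in and flags those that have gone quiet. `HeartbeatMonitor` and the `stale_after` threshold are hypothetical. The optional `now` argument is there only for testing.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per agent and reports agents that have gone silent."""

    def __init__(self, stale_after: float):
        self.stale_after = stale_after  # seconds of silence before an agent counts as stalled
        self.last_seen = {}

    def beat(self, agent: str, now: float = None) -> None:
        self.last_seen[agent] = time.monotonic() if now is None else now

    def stalled(self, now: float = None) -> list:
        now = time.monotonic() if now is None else now
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.stale_after
        )
```

A coordinator could poll `stalled()` periodically and apply its preemption or fallback policy to the agents it returns.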

Real-World Examples

A customer service multi-agent system experienced deadlock when the routing agent waited for the specialist agent to accept a task, while the specialist waited for the routing agent to provide required context.
