Overview
Race conditions occur when multiple agents access and modify shared resources concurrently without proper synchronization. The outcome depends on the unpredictable timing of agent operations, leading to inconsistent and often incorrect results.
Classic Race Condition Pattern
Time  Agent A              Shared State     Agent B
────  ───────────────────  ───────────────  ───────────────────
T1    Read balance: $100   balance = $100
T2                         balance = $100   Read balance: $100
T3    Deduct $30
T4    Write: $70           balance = $70
T5                         balance = $70    Deduct $50
T6                         balance = $50    Write: $50
Expected: $100 - $30 - $50 = $20
Actual: $50 (Agent A's update lost)
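The timeline above can be replayed step by step without real threads. This is a minimal sketch; the `shared`, `a_seen`, and `b_seen` names are illustrative and not part of any particular agent framework.

```python
# Deterministic replay of the timeline above; no real threads are involved.
shared = {"balance": 100}

# T1-T2: both agents read the same starting balance.
a_seen = shared["balance"]
b_seen = shared["balance"]

# T3-T4: Agent A deducts $30 from its private copy and writes the result back.
shared["balance"] = a_seen - 30

# T5-T6: Agent B deducts $50 from its stale copy and overwrites A's write.
shared["balance"] = b_seen - 50

expected = 100 - 30 - 50
actual = shared["balance"]
print(f"expected={expected} actual={actual}")  # prints: expected=20 actual=50
```

Agent B never sees A's write because its read happened before A's write, so the final value reflects only B's deduction.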
Multi-Agent Race Scenarios
Document Editing Race
Agent A: Editing paragraph 3
Agent B: Also editing paragraph 3
Agent C: Restructuring document
All three save simultaneously:
- A's changes overwritten by B
- C's restructure discards both A's and B's work
Task Assignment Race
Task Queue: [Task 1]
Agent A: Checks queue, sees Task 1, starts processing
Agent B: Checks queue, sees Task 1, starts processing
Result: Task 1 processed twice, potentially with conflicting outcomes
State Machine Race
Order Status: PENDING
Agent A: Transitions PENDING → PROCESSING
Agent B: Transitions PENDING → CANCELLED
Both succeed (no locking):
Database shows: CANCELLED
Agent A continues: Processes cancelled order
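One way to stop the second transition from silently succeeding is to make each transition conditional on the status the agent expects. The sketch below assumes a hypothetical in-memory `Order` class guarded by a lock; a real system would typically push the same check into the database (for example, a conditional update on the status column).

```python
import threading


class Order:
    """Hypothetical in-memory order whose status changes are guarded."""

    def __init__(self):
        self.status = "PENDING"
        self._lock = threading.Lock()

    def transition(self, expected, new):
        """Move to `new` only if the status is still `expected`."""
        with self._lock:
            if self.status != expected:
                return False  # another agent already changed the status
            self.status = new
            return True


order = Order()
a_ok = order.transition("PENDING", "PROCESSING")  # Agent A
b_ok = order.transition("PENDING", "CANCELLED")   # Agent B

print(a_ok, b_ok, order.status)  # prints: True False PROCESSING
```

Whichever agent arrives second gets `False` back and can react, instead of both agents believing their transition applied.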
Memory/Context Race
Shared Context: {customer: "Alice", issue: "billing"}
Agent A: Updates context with resolution details
Agent B: Updates context with escalation details
Depending on timing:
- Resolution details lost, or
- Escalation details lost, or
- Corrupted merge of both
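If each agent writes only the keys it owns, and does so under a lock, neither update is lost. This is a minimal sketch: the `merge_into_context` helper and the `resolution` and `escalation` values are made up for illustration.

```python
import threading

context = {"customer": "Alice", "issue": "billing"}
context_lock = threading.Lock()


def merge_into_context(updates):
    """Apply only the given keys while holding the lock; never replace the whole dict."""
    with context_lock:
        context.update(updates)


threads = [
    threading.Thread(target=merge_into_context, args=({"resolution": "refund issued"},)),
    threading.Thread(target=merge_into_context, args=({"escalation": "tier 2"},)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(context))
```

This only helps when the agents write disjoint keys; two agents writing the same key still need a versioning or ordering rule.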
Detection Challenges
Non-Deterministic
Race conditions don't occur on every run; they depend on precise timing:
Run 1: Works fine
Run 2: Works fine
Run 3: Data corrupted
Run 4: Works fine
Hard to Reproduce
Timing in a test environment often differs from timing in production:
Test environment: Single-threaded, no races
Production: Multi-agent, races occur
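A test can force the problematic interleaving rather than wait for it to happen by chance. The sketch below uses `threading.Barrier` from the Python standard library so that both threads finish reading before either one writes; the `deduct` function and the `balance` dict are illustrative.

```python
import threading

balance = {"value": 100}
both_have_read = threading.Barrier(2)


def deduct(amount):
    seen = balance["value"]           # read
    both_have_read.wait()             # block until the other agent has also read
    balance["value"] = seen - amount  # write based on a now-stale read


threads = [
    threading.Thread(target=deduct, args=(30,)),
    threading.Thread(target=deduct, args=(50,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One deduction is always lost: the result is 70 or 50, never 20.
print(balance["value"])
```

Because the barrier guarantees both reads see $100, the lost update occurs on every run, which makes the bug testable.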
Silent Corruption
Many race conditions don't cause errors; they produce wrong data:
No error thrown
No exception logged
Just incorrect results
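Since nothing fails loudly, one option is to check an invariant explicitly. The sketch below assumes each deduction is also recorded somewhere the audit can read; `audit_balance` is a hypothetical helper, and the numbers come from the balance example earlier in this document.

```python
def audit_balance(initial, deductions, observed):
    """Return (is_consistent, expected) by recomputing the balance from logged deductions."""
    expected = initial - sum(deductions)
    return observed == expected, expected


ok, expected = audit_balance(initial=100, deductions=[30, 50], observed=50)
if not ok:
    print(f"invariant violated: expected {expected}, observed 50")
```

A check like this turns silent corruption into a visible signal, though it detects the problem after the fact rather than preventing it.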
Prevention Patterns
Optimistic Locking
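def update_with_optimistic_lock(resource_id, update_fn):
    # read() and write_if_version_matches() stand for the storage layer's primitives.
    while True:
        resource = read(resource_id)
        version = resource.version  # version observed at read time
        new_value = update_fn(resource)
        # The write succeeds only if nobody changed the resource since the read.
        success = write_if_version_matches(
            resource_id, new_value, version
        )
        if success:
            return new_value
        # else: retry with fresh read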
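To make the pattern above runnable end to end, the sketch below supplies a hypothetical in-memory `VersionedStore` whose `write_if_version_matches` does its compare-and-write under a lock. The class and its method signatures are assumptions for illustration; a real store would provide its own conditional-write primitive.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: int
    version: int


class VersionedStore:
    """Hypothetical in-memory store; the lock makes compare-and-write atomic."""

    def __init__(self, value):
        self._current = Versioned(value, 0)
        self._lock = threading.Lock()

    def read(self):
        return self._current

    def write_if_version_matches(self, new_value, expected_version):
        with self._lock:
            if self._current.version != expected_version:
                return False  # someone else wrote first
            self._current = Versioned(new_value, expected_version + 1)
            return True


def update_with_optimistic_lock(store, update_fn):
    while True:
        snapshot = store.read()
        new_value = update_fn(snapshot.value)
        if store.write_if_version_matches(new_value, snapshot.version):
            return new_value
        # Version changed underneath us: loop and retry with a fresh read.


store = VersionedStore(100)
threads = [
    threading.Thread(target=update_with_optimistic_lock, args=(store, lambda b: b - 30)),
    threading.Thread(target=update_with_optimistic_lock, args=(store, lambda b: b - 50)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.read())  # prints: Versioned(value=20, version=2)
```

Whichever agent loses the version check retries against the other's result, so both deductions are applied regardless of timing.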
Task Claiming
def claim_task(agent_id, task_id):
    # The filter matches only while the task is unclaimed, so at most one agent's update applies.
    result = atomic_update(
        tasks,
        {"_id": task_id, "claimed_by": None},
        {"$set": {"claimed_by": agent_id, "claimed_at": now()}}
    )
    return result.modified_count == 1
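The same check-and-claim step can be shown with an in-memory stand-in. The `TaskBoard` class below is hypothetical; it replaces the database's atomic conditional update with a lock so that the example runs on its own.

```python
import threading


class TaskBoard:
    """Hypothetical in-memory task table; claim() checks and sets in one atomic step."""

    def __init__(self, task_ids):
        self._claimed_by = {task_id: None for task_id in task_ids}
        self._lock = threading.Lock()

    def claim(self, agent_id, task_id):
        with self._lock:
            if task_id not in self._claimed_by or self._claimed_by[task_id] is not None:
                return False  # unknown task, or already claimed
            self._claimed_by[task_id] = agent_id
            return True

    def owner(self, task_id):
        with self._lock:
            return self._claimed_by.get(task_id)


board = TaskBoard(["task-1"])
results = {}


def try_claim(agent_id):
    results[agent_id] = board.claim(agent_id, "task-1")


threads = [threading.Thread(target=try_claim, args=(agent,)) for agent in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, board.owner("task-1"))
```

Exactly one of the two agents gets `True`; the other sees `False` and should move on to a different task instead of processing this one.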
Event Sourcing
Instead of updating state, append events:
Event 1: {type: "deduct", amount: 30, agent: "A"}
Event 2: {type: "deduct", amount: 50, agent: "B"}
Current state = replay all events in order
Because events are only appended and never rewritten, one agent's update cannot overwrite another's
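A minimal sketch of the append-and-replay idea, using the two events above. The `EventLog` class and `replay` function are illustrative names, and the $100 starting balance is taken from the earlier example; the log's lock stands in for whatever atomic append the real event store provides.

```python
import threading


class EventLog:
    """Hypothetical append-only log; appends are serialized by a lock."""

    def __init__(self):
        self._events = []
        self._lock = threading.Lock()

    def append(self, event):
        with self._lock:
            self._events.append(event)

    def events(self):
        with self._lock:
            return list(self._events)  # copy, so callers cannot mutate the log


def replay(initial_balance, events):
    """Derive the current balance by applying every event in order."""
    balance = initial_balance
    for event in events:
        if event["type"] == "deduct":
            balance -= event["amount"]
    return balance


log = EventLog()
log.append({"type": "deduct", "amount": 30, "agent": "A"})
log.append({"type": "deduct", "amount": 50, "agent": "B"})

balance = replay(100, log.events())
print(balance)  # prints: 20
```

Both deductions survive regardless of which agent appends first, because neither agent writes the balance directly.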