Reflection Pattern

Improving output quality through iterative self-critique

Overview

The Challenge

Initial agent outputs often contain errors, inconsistencies, or quality issues that could be caught with review.

The Solution

Add a self-evaluation layer where the agent critiques its own output, identifies problems, and iteratively refines until quality thresholds are met.

When to Use
  • High-stakes outputs where errors are costly
  • Creative tasks benefiting from refinement
  • Tasks with clear quality criteria
  • Code generation and review
When NOT to Use
  • Latency-critical applications
  • Simple factual lookups
  • When "good enough" is acceptable

Trade-offs

Advantages
  • Catches errors before delivery
  • Improves output quality significantly
  • Self-documenting critique process
  • No additional infrastructure needed
Considerations
  • Multiplies LLM calls and latency
  • Can over-refine and make output worse
  • May never reach satisfaction threshold
  • Higher cost per request

Deep Dive

Overview

The Reflection pattern adds self-critique to agent workflows. After generating an initial response, the agent switches into "critic mode" to evaluate its work, then revises if needed.

Core Mechanism

Generate Initial Output
        ↓
Self-Critique (Critic Mode)
        ↓
Issues Found? → Yes → Revise Output → Loop
        ↓
        No
        ↓
Return Final Output

Critique Dimensions

Accuracy Check

  • Are facts correct?
  • Are sources properly cited?
  • Are calculations accurate?

Completeness Check

  • Are all requirements addressed?
  • Is anything missing?
  • Are edge cases considered?

Consistency Check

  • Does the output contradict itself?
  • Is the logic sound?
  • Are assumptions explicit?

Quality Check

  • Is it well-written?
  • Is the structure clear?
  • Does it follow guidelines?
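
One way to make these dimensions concrete is a structured critique object the critic must fill in, which also gives the revision step specific feedback to act on. A minimal sketch; the Critique class and its field names are illustrative, chosen to match the critique.is_satisfactory and critique.feedback attributes used in the implementation below:

from dataclasses import dataclass, field

@dataclass
class Critique:
    """Structured self-critique covering the four dimensions above."""
    accuracy_issues: list[str] = field(default_factory=list)
    completeness_issues: list[str] = field(default_factory=list)
    consistency_issues: list[str] = field(default_factory=list)
    quality_issues: list[str] = field(default_factory=list)

    @property
    def is_satisfactory(self) -> bool:
        # Satisfactory only when no dimension reported an issue.
        return not (self.accuracy_issues or self.completeness_issues
                    or self.consistency_issues or self.quality_issues)

    @property
    def feedback(self) -> str:
        # Flatten all issues into actionable bullets for the reviser.
        issues = (self.accuracy_issues + self.completeness_issues
                  + self.consistency_issues + self.quality_issues)
        return "\n".join(f"- {issue}" for issue in issues)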

Implementation

async def generate_with_reflection(prompt, max_iterations=3):
    """Generate a draft, then critique and revise until satisfactory."""
    output = await agent.generate(prompt)

    for _ in range(max_iterations):
        critique = await agent.critique(output)

        # Stop as soon as the critic finds no remaining issues.
        if critique.is_satisfactory:
            return output

        # Otherwise revise against the critic's specific feedback.
        output = await agent.revise(output, critique.feedback)

    return output  # Return best effort after max_iterations
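
The agent.critique call is typically just a second LLM call made with a critic-mode prompt. A sketch of one such prompt; the wording and the SATISFACTORY sentinel are illustrative, not a fixed template:

CRITIC_PROMPT = """You are reviewing a draft you just wrote.
Evaluate it on four dimensions: accuracy, completeness,
consistency, and quality. List each issue as a specific,
actionable observation; "looks good" is not useful feedback.
If there are no issues, respond with exactly: SATISFACTORY.

Draft:
{output}
"""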

When to Use Reflection

Good fit:

  • High-stakes outputs where errors are costly
  • Creative tasks benefiting from refinement
  • Tasks with clear quality criteria
  • Educational/explanatory content

Poor fit:

  • Latency-critical applications
  • Simple factual lookups
  • Tasks where "good enough" is acceptable

Combining with Other Patterns

Reflection pairs well with:

  • ReAct: Reflect on action outcomes
  • Multi-Agent: Separate critic agent
  • LLM-as-Judge: Formalize critique scoring (see the sketch below)
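
For instance, a separate critic agent scoring drafts LLM-as-Judge style might look like the following sketch. The generator and critic objects, the judge method, and the 0-10 threshold are assumptions for illustration, not part of the pattern:

# Separate critic agent scores the draft; the generator revises until
# the score clears a threshold or the iteration budget runs out.
async def reflect_with_judge(prompt, threshold=8, max_iterations=3):
    output = await generator.generate(prompt)
    for _ in range(max_iterations):
        score, feedback = await critic.judge(output)  # LLM-as-Judge scoring
        if score >= threshold:
            break
        output = await generator.revise(output, feedback)
    return output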

Anti-Patterns

Infinite Refinement

Agent never satisfied with output. Set maximum iterations.

Shallow Critique

"Looks good" isn't useful feedback. Require specific observations.

Over-Revision

The agent makes the output worse through excessive changes. Track a quality metric across iterations and keep the best version, as shown below.
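
A simple guard is to score every iteration and return the best version seen, so revision can never regress the final output. A sketch, assuming the agent interface from the implementation above and a score_output quality metric (e.g. an LLM-as-Judge call); both are illustrative:

async def reflect_keep_best(prompt, max_iterations=3):
    output = await agent.generate(prompt)
    best_output, best_score = output, await score_output(output)

    for _ in range(max_iterations):
        critique = await agent.critique(output)
        if critique.is_satisfactory:
            return output
        output = await agent.revise(output, critique.feedback)

        # Keep the best-scoring version so over-revision can't regress us.
        score = await score_output(output)
        if score > best_score:
            best_output, best_score = output, score

    return best_output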

Considerations

Reflection multiplies LLM calls and latency. With max_iterations=3, for example, the worst case is one generation plus three critique/revise rounds: seven LLM calls in place of one. Use it when the cost of errors exceeds the cost of the additional processing.

Dimension Scores

  • Safety: 3/5
  • Accuracy: 5/5
  • Cost: 2/5
  • Speed: 2/5
  • Implementation complexity: simple
Implementation Checklist

  • Critique prompts
  • Quality thresholds
  • Iteration limits
Tags: evaluation, self-critique, quality, iterative, refinement
