Reflection Pattern

Improving output quality through iterative self-critique

Overview

The Challenge

Initial agent outputs often contain errors, inconsistencies, or quality issues that could be caught with review.

The Solution

Add a self-evaluation layer where the agent critiques its own output, identifies problems, and iteratively refines until quality thresholds are met.

When to Use
  • High-stakes outputs where errors are costly
  • Creative tasks benefiting from refinement
  • Tasks with clear quality criteria
  • Code generation and review
When NOT to Use
  • Latency-critical applications
  • Simple factual lookups
  • When "good enough" is acceptable

Trade-offs

Advantages
  • Catches errors before delivery
  • Improves output quality significantly
  • Self-documenting critique process
  • No additional infrastructure needed
Considerations
  • Multiplies LLM calls and latency
  • Can over-refine and make output worse
  • May never reach satisfaction threshold
  • Higher cost per request

Deep Dive

Overview

The Reflection pattern adds self-critique to agent workflows. After generating an initial response, the agent switches into "critic mode" to evaluate its work, then revises if needed.

Core Mechanism

Generate Initial Output
        ↓
Self-Critique (Critic Mode)
        ↓
Issues Found? → Yes → Revise Output → Loop
        ↓
        No
        ↓
Return Final Output

Critique Dimensions

Accuracy Check

  • Are facts correct?
  • Are sources properly cited?
  • Are calculations accurate?

Completeness Check

  • Are all requirements addressed?
  • Is anything missing?
  • Are edge cases considered?

Consistency Check

  • Does the output contradict itself?
  • Is the logic sound?
  • Are assumptions explicit?

Quality Check

  • Is it well-written?
  • Is the structure clear?
  • Does it follow guidelines?
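
One way to make these dimensions concrete is a structured critique object the critic must fill in, which also gives the revision step specific feedback to act on. A minimal sketch; the Critique class and its field names are illustrative, chosen to match the critique.is_satisfactory and critique.feedback attributes used in the implementation below:

from dataclasses import dataclass, field

@dataclass
class Critique:
    """Structured self-critique covering the four dimensions above."""
    accuracy_issues: list[str] = field(default_factory=list)
    completeness_issues: list[str] = field(default_factory=list)
    consistency_issues: list[str] = field(default_factory=list)
    quality_issues: list[str] = field(default_factory=list)

    @property
    def is_satisfactory(self) -> bool:
        # Satisfactory only when no dimension reported an issue.
        return not (self.accuracy_issues or self.completeness_issues
                    or self.consistency_issues or self.quality_issues)

    @property
    def feedback(self) -> str:
        # Flatten all issues into actionable bullets for the reviser.
        issues = (self.accuracy_issues + self.completeness_issues
                  + self.consistency_issues + self.quality_issues)
        return "\n".join(f"- {issue}" for issue in issues)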

Implementation

async def generate_with_reflection(prompt, max_iterations=3):
    """Generate a draft, then critique and revise until satisfactory."""
    output = await agent.generate(prompt)

    for _ in range(max_iterations):
        critique = await agent.critique(output)

        # Stop as soon as the critic finds no remaining issues.
        if critique.is_satisfactory:
            return output

        # Otherwise revise against the critic's specific feedback.
        output = await agent.revise(output, critique.feedback)

    return output  # Return best effort after max_iterations
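
The agent.critique call is typically just a second LLM call made with a critic-mode prompt. A sketch of one such prompt; the wording and the SATISFACTORY sentinel are illustrative, not a fixed template:

CRITIC_PROMPT = """You are reviewing a draft you just wrote.
Evaluate it on four dimensions: accuracy, completeness,
consistency, and quality. List each issue as a specific,
actionable observation; "looks good" is not useful feedback.
If there are no issues, respond with exactly: SATISFACTORY.

Draft:
{output}
"""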

When to Use Reflection

Good fit:

  • High-stakes outputs where errors are costly
  • Creative tasks benefiting from refinement
  • Tasks with clear quality criteria
  • Educational/explanatory content

Poor fit:

  • Latency-critical applications
  • Simple factual lookups
  • Tasks where "good enough" is acceptable

Combining with Other Patterns

Reflection pairs well with:

  • ReAct: Reflect on action outcomes
  • Multi-Agent: Separate critic agent
  • LLM-as-Judge: Formalize critique scoring (see the sketch below)
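
For instance, a separate critic agent scoring drafts LLM-as-Judge style might look like the following sketch. The generator and critic objects, the judge method, and the 0-10 threshold are assumptions for illustration, not part of the pattern:

# Separate critic agent scores the draft; the generator revises until
# the score clears a threshold or the iteration budget runs out.
async def reflect_with_judge(prompt, threshold=8, max_iterations=3):
    output = await generator.generate(prompt)
    for _ in range(max_iterations):
        score, feedback = await critic.judge(output)  # LLM-as-Judge scoring
        if score >= threshold:
            break
        output = await generator.revise(output, feedback)
    return output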

Anti-Patterns

Infinite Refinement

Agent never satisfied with output. Set maximum iterations.

Shallow Critique

"Looks good" isn't useful feedback. Require specific observations.

Over-Revision

The agent makes the output worse through excessive changes. Track a quality metric across iterations and keep the best version, as shown below.
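
A simple guard is to score every iteration and return the best version seen, so revision can never regress the final output. A sketch, assuming the agent interface from the implementation above and a score_output quality metric (e.g. an LLM-as-Judge call); both are illustrative:

async def reflect_keep_best(prompt, max_iterations=3):
    output = await agent.generate(prompt)
    best_output, best_score = output, await score_output(output)

    for _ in range(max_iterations):
        critique = await agent.critique(output)
        if critique.is_satisfactory:
            return output
        output = await agent.revise(output, critique.feedback)

        # Keep the best-scoring version so over-revision can't regress us.
        score = await score_output(output)
        if score > best_score:
            best_output, best_score = output, score

    return best_output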

Considerations

Reflection multiplies LLM calls and latency. With max_iterations=3, for example, the worst case is one generation plus three critique/revise rounds: seven LLM calls in place of one. Use it when the cost of errors exceeds the cost of the additional processing.

Dimension Scores

  • Safety: 3/5
  • Accuracy: 5/5
  • Cost: 2/5
  • Speed: 2/5
  • Implementation complexity: simple
Implementation Checklist

  • Critique prompts
  • Quality thresholds
  • Iteration limits
Tags: evaluation, self-critique, quality, iterative, refinement
