Operations · Production Ready

langfuse

by langfuse

LLM observability, evals, and prompt management for production systems

TypeScript
Updated Feb 11, 2026
Stars: 21.8k
Forks: 2.1k
Commits/Week: 28
Commits/Month: 152

View on GitHub

What It Does

Langfuse collects and visualizes LLM telemetry, prompts, metrics, and evals to give engineering teams observability into model behavior. It ingests events from its SDKs and from OpenTelemetry, stores traces and prompts, and provides dashboards, a playground, and evaluation tooling. Distinctive features include integrated evals, prompt management, and exportable traces for debugging agent interactions, including access through the Model Context Protocol (MCP).
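A minimal sketch of what trace ingestion typically looks like with the Langfuse TypeScript SDK (the langfuse npm package); the keys, trace name, model name, and the callModel helper are placeholders, not part of Langfuse itself:

```typescript
import { Langfuse } from "langfuse";

// Placeholder keys and host; real values come from your Langfuse project settings.
const langfuse = new Langfuse({
  publicKey: "pk-lf-...",
  secretKey: "sk-lf-...",
  baseUrl: "https://cloud.langfuse.com",
});

async function answerQuestion(question: string): Promise<string> {
  // One trace per user request; generations and spans nest under it.
  const trace = langfuse.trace({ name: "qa-agent-run", input: question });

  const generation = trace.generation({
    name: "llm-call",
    model: "gpt-4o-mini", // whatever model your app actually calls
    input: [{ role: "user", content: question }],
  });

  const answer = await callModel(question); // your existing LLM call

  generation.end({ output: answer });
  trace.update({ output: answer });

  await langfuse.flushAsync(); // ensure queued events are sent
  return answer;
}

// Hypothetical stand-in for a real model call.
async function callModel(question: string): Promise<string> {
  return `Echo: ${question}`;
}
```

The trace then shows up in the Langfuse dashboard with the prompt, output, latency, and any nested spans attached.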

Key Benefits

As agent systems scale, you need granular logs and evaluations to understand failures and build trust, surfacing which models, prompts, or agents caused issues. Langfuse lets teams correlate prompts, model outputs, metrics, and evaluations, so you can move from anecdote to measurable agent reliability. That visibility is essential for building agent track records, continuous evaluation, and production-grade governance.
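As an illustration of correlating evaluations with traces, here is a hedged sketch using the Langfuse JS SDK's score API; the score name, 0/1 value scale, and comments are illustrative choices, not Langfuse defaults:

```typescript
import { Langfuse } from "langfuse";

// Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_BASEURL from the environment.
const langfuse = new Langfuse();

// Attach an evaluation result to an existing trace so dashboards can
// correlate it with the prompt and model output recorded earlier.
async function recordEval(traceId: string, passed: boolean): Promise<void> {
  langfuse.score({
    traceId,
    name: "answer-correctness", // hypothetical score name
    value: passed ? 1 : 0,
    comment: passed ? "matched reference answer" : "hallucinated citation",
  });
  await langfuse.flushAsync();
}
```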

When to Use

Teams running production LLMs or multi-agent workflows that need centralized logging, prompt-level traces, and continuous evaluation for reliability and governance.

How It's Used

  • When you need centralized tracing of prompts, model responses, and metrics to diagnose agent failures
  • When you want to run continuous evals and track model or agent performance over time
  • When you require prompt versioning, playground testing, and trace export for audits (see the prompt-management sketch below)
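For the prompt-versioning workflow in particular, a sketch assuming Langfuse's prompt management API in the JS SDK; the prompt name, the "production" label, and the template variable are illustrative:

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse();

async function buildSystemPrompt(productName: string): Promise<string> {
  // Fetch the prompt version currently tagged "production" in Langfuse.
  const prompt = await langfuse.getPrompt("support-system-prompt", undefined, {
    label: "production",
  });

  // Fill in template variables defined when the prompt was created in the UI.
  return prompt.compile({ product: productName });
}
```

Because prompts are versioned server-side, rolling back a bad prompt is a label change in Langfuse rather than a code deploy.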
Works With
langchain, openai, opentelemetry, liteLLM, llama-index, autogen
Topics
analytics, autogen, evaluation, langchain, large-language-models, llama-index, llm, llm-evaluation, llm-observability, llmops, +9 more
Similar Tools
promptlayer, wandb
Keywords
llm-observability, agent-evaluation, prompt-management, production agent monitoring