The "LangGraph vs LangChain" framing is half of why teams get this decision wrong. They aren't competing libraries: they're a stack maintained by the same authors, and the dividing line keeps moving as production AI patterns settle. We work with both in production every week across customer deployments. Here's what the boundary looks like in 2026, where each one earns its keep, and the failure modes you should be ready for.
What each one actually is
LangChain — the integration & primitives layer
LangChain is the set of building blocks: model abstractions, prompt templates, output parsers, document loaders, vector store clients, retrievers, and the glue around RAG. It's where the third-party integration work lives — every database, every model provider, every tool wrapper.
LangGraph — the orchestration layer
LangGraph is a directed-graph state machine for agentic workflows. Nodes are functions, edges are transitions, and the state object flows through the graph. It's where you express "if the model wants to call a tool, route to the tool node; if it returns a final answer, route to terminate."
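To make the node/edge/state vocabulary concrete, here is a minimal framework-free sketch in plain Python. It does not use LangGraph; in LangGraph itself the same shape is declared on a `StateGraph` with `add_node` and `add_conditional_edges`. The names `AgentState`, `model_node`, `tool_node`, and `route_after_model` are hypothetical, and the "model" is a stub rather than a real LLM call.

```python
from typing import Callable, Dict, TypedDict


class AgentState(TypedDict):
    question: str
    tool_result: str
    answer: str


def model_node(state: AgentState) -> AgentState:
    # Stub for an LLM call: answer only once a tool result is available.
    if state["tool_result"]:
        return {**state, "answer": f"Based on {state['tool_result']}: done"}
    return state


def tool_node(state: AgentState) -> AgentState:
    # Stub for a real tool call.
    return {**state, "tool_result": f"lookup({state['question']})"}


def route_after_model(state: AgentState) -> str:
    # Edge function: choose the next node from the current state.
    return "end" if state["answer"] else "tool"


NODES: Dict[str, Callable[[AgentState], AgentState]] = {
    "model": model_node,
    "tool": tool_node,
}


def run(state: AgentState, max_steps: int = 10) -> AgentState:
    current = "model"
    for _ in range(max_steps):
        state = NODES[current](state)
        if current == "model":
            current = route_after_model(state)  # conditional edge
            if current == "end":
                return state
        else:
            current = "model"  # fixed edge: tool -> model
    raise RuntimeError("step limit reached without a final answer")


final = run({"question": "capital of France", "tool_result": "", "answer": ""})
```

Because the routing decision lives in an ordinary function, it can be unit-tested without running a model at all.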
LangGraph imports from LangChain. They're meant to be used together, and the strongest production deployments we've shipped use both.
Where each one is the right tool
Use LangChain alone for:
- Single-shot prompt → response patterns (summarisation, extraction, classification).
- RAG over a fixed corpus where you don't need multi-step reasoning or tool use.
- Quick prototypes — the highest-velocity way to get from "I have an idea" to "I'm calling an LLM with my data."
- Components you'll reuse from a non-LangGraph orchestrator (e.g. calling a retriever from your own backend code).
Use LangGraph for:
- Anything where the model decides what to do next based on intermediate state — agents, multi-step reasoning, tool routing.
- Workflows with cycles (retry on failure, refine and re-evaluate, human approval gates).
- Multi-agent setups (planner → researcher → writer → critic).
- Production systems where you need observable, replayable, time-travel-debuggable execution.
What changed when we moved from LangChain agents to LangGraph
The LangChain v0.x AgentExecutor abstraction was, charitably, a leak magnet. It hid a state machine inside a callback chain, error handling was implicit, and the same prompt could behave radically differently between runs because the loop was opaque.
LangGraph fixed three things that mattered for production:
- State is explicit. You define a TypedDict (Python) or interface (TypeScript) for the agent's working memory. Every node reads and writes it. There's no hidden mutation.
- Edges are inspectable. Routing logic is a function you can test, log, and instrument. You can see exactly why the graph went left at each decision point.
- Checkpointing is built in. Graph state is persisted through a checkpointer (we use Postgres), so you can pause, resume, replay, or fork a run, or hand a long-running agent to a human for approval mid-flight.
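The checkpointing idea in the list above can be illustrated with a small stdlib-only sketch. This is not LangGraph's checkpointer interface: `InMemoryCheckpointer`, `work_node`, and `run_steps` are hypothetical names, and the in-memory dict stands in for a durable store such as a Postgres table. It shows a snapshot per step, a resume after a simulated restart, and a fork from an earlier snapshot.

```python
import json
from typing import Dict, List, TypedDict


class RunState(TypedDict):
    step: int
    notes: List[str]


class InMemoryCheckpointer:
    """Hypothetical stand-in for a durable store such as a Postgres table."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = {}

    def save(self, thread_id: str, state: RunState) -> None:
        # One JSON snapshot per step, so any step can be replayed or forked.
        self._rows.setdefault(thread_id, []).append(json.dumps(state))

    def load(self, thread_id: str, index: int = -1) -> RunState:
        return json.loads(self._rows[thread_id][index])


def work_node(state: RunState) -> RunState:
    step = state["step"] + 1
    return {"step": step, "notes": state["notes"] + [f"did step {step}"]}


def run_steps(cp: InMemoryCheckpointer, thread_id: str,
              state: RunState, steps: int) -> RunState:
    for _ in range(steps):
        state = work_node(state)
        cp.save(thread_id, state)
    return state


cp = InMemoryCheckpointer()
run_steps(cp, "thread-1", {"step": 0, "notes": []}, steps=2)

# Simulate a process restart: reload the latest snapshot and continue.
resumed = run_steps(cp, "thread-1", cp.load("thread-1"), steps=1)

# Fork from the first snapshot into a separate thread.
forked = run_steps(cp, "thread-1-fork", cp.load("thread-1", index=0), steps=1)
```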
That third point is what made LangGraph viable in production for us. Long-running agents that survive process restarts, integrate with message queues, and emit observable events were genuinely hard to build on top of LangChain agents. With LangGraph, they're the default.
Concrete patterns we ship
1. The supervisor + worker pattern
A supervisor node decides which worker (researcher, retriever, writer, validator) gets the next turn. Workers do their job and return to the supervisor. The supervisor decides termination. This is our default for any non-trivial agent.
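A minimal sketch of the supervisor loop described above, in plain Python rather than LangGraph. The worker functions are stubs and every name here (`TeamState`, `supervisor`, `run_team`) is hypothetical. A real supervisor would usually ask a model to pick the next worker, whereas this one uses fixed rules so the control flow is easy to follow.

```python
from typing import Callable, Dict, TypedDict


class TeamState(TypedDict):
    task: str
    research: str
    draft: str
    approved: bool
    turns: int


def researcher(state: TeamState) -> TeamState:
    return {**state, "research": f"facts about {state['task']}"}


def writer(state: TeamState) -> TeamState:
    return {**state, "draft": f"Report using {state['research']}"}


def validator(state: TeamState) -> TeamState:
    return {**state, "approved": bool(state["draft"])}


WORKERS: Dict[str, Callable[[TeamState], TeamState]] = {
    "researcher": researcher,
    "writer": writer,
    "validator": validator,
}


def supervisor(state: TeamState) -> str:
    # Decide who gets the next turn, or that the run is finished.
    if not state["research"]:
        return "researcher"
    if not state["draft"]:
        return "writer"
    if not state["approved"]:
        return "validator"
    return "finish"


def run_team(task: str, max_turns: int = 8) -> TeamState:
    state: TeamState = {"task": task, "research": "", "draft": "",
                        "approved": False, "turns": 0}
    while state["turns"] < max_turns:
        choice = supervisor(state)
        if choice == "finish":
            return state
        state = WORKERS[choice](state)  # worker runs, then control returns
        state["turns"] += 1
    raise RuntimeError("supervisor did not terminate within max_turns")


result = run_team("Q3 churn")
```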
2. The reflection loop
Generate → critique → refine, with a hard turn limit. Useful for anything where the first draft is consistently bad in the same ways — code generation, structured output, anything with strict validation.
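As a sketch of generate, critique, refine with a hard turn limit, the example below uses strict JSON validity as the critique step. `generate` is a stub standing in for a model call (its first draft is deliberately invalid), and all function names are hypothetical.

```python
import json
from typing import Optional, Tuple


def generate(prompt: str, feedback: Optional[str]) -> str:
    # Stub for an LLM call: the first draft uses single quotes (invalid JSON);
    # once feedback is supplied, the draft is corrected.
    if feedback is None:
        return "{'name': 'Ada'}"
    return '{"name": "Ada"}'


def critique(draft: str) -> Optional[str]:
    # Returns None when the draft passes validation, else a feedback message.
    try:
        json.loads(draft)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    return None


def reflect(prompt: str, max_turns: int = 3) -> Tuple[str, int]:
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        draft = generate(prompt, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft, turn
    raise ValueError(f"no valid draft after {max_turns} turns: {feedback}")


draft, turns_used = reflect("Return the user as JSON")
```

The turn limit raises instead of returning the last draft, so a caller cannot mistake an unvalidated output for a good one.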
3. The human-in-the-loop gate
For high-stakes actions (sending email, modifying production data, approving a refund), the graph pauses at a node, persists state, and waits for an external API call to resume. The "user" can be a Slack approval, a queue worker, or a UI button.
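The pause, persist, resume shape can be sketched without any framework. Everything here is an assumption for illustration: `PENDING` is an in-memory dict standing in for a durable store, and `propose_refund` and `resume` are hypothetical function names. In a real deployment, `resume` would be invoked by the external approval call (a Slack action, a queue worker, a UI button).

```python
import json
from typing import Dict

# Hypothetical store; production code would use a durable database.
PENDING: Dict[str, str] = {}


def propose_refund(run_id: str, amount: float) -> dict:
    state = {"run_id": run_id, "action": "refund",
             "amount": amount, "status": "awaiting_approval"}
    PENDING[run_id] = json.dumps(state)  # persist before pausing
    return state


def resume(run_id: str, approved: bool) -> dict:
    # Called by the external approver; picks up exactly where the run paused.
    state = json.loads(PENDING.pop(run_id))
    state["status"] = "executed" if approved else "rejected"
    return state


paused = propose_refund("run-42", 19.99)
finished = resume("run-42", approved=True)
```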
4. The retry-with-different-tool pattern
If a tool call fails or returns low-quality output, route to a different tool with different parameters. Keep a counter in state to prevent infinite loops.
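A minimal sketch of the fallback routing with an attempt counter kept in state. The two tools are stubs (the primary one always fails so the fallback path is exercised), and the names `primary_search`, `fallback_search`, and `call_with_fallback` are hypothetical.

```python
from typing import Callable, List


def primary_search(query: str) -> str:
    # Stub that always fails, to exercise the fallback path.
    raise TimeoutError("primary search timed out")


def fallback_search(query: str) -> str:
    return f"cached result for {query}"


TOOLS: List[Callable[[str], str]] = [primary_search, fallback_search]


def call_with_fallback(query: str, max_attempts: int = 3) -> dict:
    state = {"attempts": 0, "errors": [], "result": None}
    for tool in TOOLS:
        if state["attempts"] >= max_attempts:
            break  # the counter in state is what prevents endless retries
        state["attempts"] += 1
        try:
            state["result"] = tool(query)
            return state
        except Exception as exc:
            state["errors"].append(f"{tool.__name__}: {exc}")
    raise RuntimeError(f"all tools failed: {state['errors']}")


outcome = call_with_fallback("pricing")
```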
Production gotchas — what actually breaks
Cost explosions in cycles
A graph with cycles can — and will — burn through tokens silently. Every production deployment needs a max-iteration cap, a cumulative-token cap, and an alert on either. We default to 12 iterations and a 100k-token cap per run. Anything beyond that is almost always a bug.
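One way to enforce both caps is a small budget object that every loop iteration must charge. This is a sketch, not a LangGraph feature: `RunBudget` and `BudgetExceeded` are hypothetical names, the defaults mirror the 12-iteration and 100k-token figures above, and the 9,000 tokens per iteration is an arbitrary demo value. Alerting would hang off the exception handler.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class RunBudget:
    max_iterations: int = 12
    max_tokens: int = 100_000
    iterations: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        # Call once per graph iteration, with that iteration's token usage.
        self.iterations += 1
        self.tokens += tokens_used
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(
                f"iteration cap exceeded: {self.iterations} > {self.max_iterations}")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(
                f"token cap exceeded: {self.tokens} > {self.max_tokens}")


budget = RunBudget()
stopped_reason = None
try:
    while True:  # stands in for a graph cycle that never terminates
        budget.charge(tokens_used=9_000)
except BudgetExceeded as exc:
    stopped_reason = str(exc)  # this is where an alert would fire
```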
State bloat
The state object can grow unbounded across iterations — every retrieved chunk, every tool result, every reflection. Pass through only what the next node needs. We routinely cut state size by 70% with summarisation nodes between hops.
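A sketch of a compaction step between hops: keep the most recent chunks verbatim and fold the older ones into a running summary. The `summarise` function here just joins and truncates text as a placeholder for a model-written summary, and the state keys (`summary`, `chunks`) and the `keep_last` parameter are assumptions for illustration.

```python
from typing import List


def summarise(chunks: List[str], max_chars: int = 80) -> str:
    # Placeholder for an LLM summary: join and truncate.
    return " | ".join(chunks)[:max_chars]


def compact(state: dict, keep_last: int = 2) -> dict:
    chunks = state["chunks"]
    if len(chunks) <= keep_last:
        return state
    older, recent = chunks[:-keep_last], chunks[-keep_last:]
    # Fold any existing summary together with the chunks being dropped.
    to_fold = ([state["summary"]] if state["summary"] else []) + older
    return {**state, "summary": summarise(to_fold), "chunks": recent}


state = {"summary": "", "chunks": ["c1", "c2", "c3", "c4", "c5"]}
compacted = compact(state)
```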
Tool-calling non-determinism
Different model versions emit tool calls differently — argument formatting, escape characters, missing fields. Your graph should validate every tool call against a schema and route invalid calls to a recovery node, not crash.
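A minimal sketch of validating a tool call before execution and routing failures to a recovery node instead of raising. The schema format (field name mapped to a Python type), the `get_weather` tool, and the node names `execute` and `recover` are all hypothetical. A production system would more likely use JSON Schema or a validation library.

```python
import json
from typing import List

# Hypothetical schema registry: tool name -> required fields and their types.
TOOL_SCHEMAS = {"get_weather": {"city": str, "days": int}}


def validate_tool_call(name: str, raw_args: str) -> List[str]:
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return ["arguments are not valid JSON"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = []
    for field, expected in schema.items():
        if field not in args:
            problems.append(f"missing field: {field}")
        elif not isinstance(args[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


def route_tool_call(name: str, raw_args: str) -> str:
    # Edge function: invalid calls go to recovery rather than crashing the run.
    return "recover" if validate_tool_call(name, raw_args) else "execute"
```

The list of problems returned by `validate_tool_call` is also a natural thing to feed back to the model in the recovery node.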
Streaming the right thing
Users want to see tokens stream during the final answer node, not during the supervisor's internal deliberation. LangGraph's streamed output can be attributed to the node that produced it, so pick the nodes whose output you want surfaced and suppress the rest.
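The filtering itself is simple once each streamed item carries the name of the node that produced it. The sketch below assumes events arrive as `(node_name, token)` pairs; that shape, the `final_answer` node name, and the `user_visible` helper are assumptions for illustration, not LangGraph's actual event format.

```python
from typing import Iterable, Iterator, Tuple

# Nodes whose output should reach the user; everything else stays internal.
SURFACED_NODES = {"final_answer"}


def user_visible(events: Iterable[Tuple[str, str]]) -> Iterator[str]:
    for node, token in events:
        if node in SURFACED_NODES:
            yield token


events = [
    ("supervisor", "route to writer"),  # internal deliberation, suppressed
    ("final_answer", "Hello"),
    ("final_answer", " world"),
]
shown = "".join(user_visible(events))
```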
Observability in three layers
Production agents need three observability layers:
- LangSmith / Helicone for trace-level inspection.
- Structured logs per node with correlation IDs so your APM tool can see graph runs as transactions.
- An eval suite that runs before deployment and on a sample of production traffic.
Skipping any of these is fine in a prototype and fatal in production.
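For the second layer, structured per-node logs with a correlation ID, a wrapper around each node function is usually enough. This sketch uses only the standard library. The log field names, the `run_id` key in state, and the helpers `node_log_line` and `with_logging` are assumptions, and no log handler is configured here, so wiring the logger to an APM tool is left out.

```python
import json
import logging
import time
import uuid
from typing import Callable

logger = logging.getLogger("agent.graph")


def node_log_line(run_id: str, node: str, duration_ms: float, ok: bool) -> str:
    # One JSON object per node execution, keyed by the run's correlation ID.
    return json.dumps({"run_id": run_id, "node": node,
                       "duration_ms": duration_ms, "ok": ok})


def with_logging(name: str, fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    def wrapper(state: dict) -> dict:
        started = time.perf_counter()
        ok = False
        try:
            result = fn(state)
            ok = True
            return result
        finally:
            # Logged on success and on failure, so failed nodes stay visible.
            elapsed = round((time.perf_counter() - started) * 1000, 2)
            logger.info(node_log_line(state["run_id"], name, elapsed, ok))
    return wrapper


def retrieve(state: dict) -> dict:
    # Stub node.
    return {**state, "docs": ["doc-1"]}


state = {"run_id": str(uuid.uuid4()), "question": "refund policy"}
state = with_logging("retrieve", retrieve)(state)
```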
Migrating from LangChain agents to LangGraph
We've done this migration four times now. The pattern that works:
- List the existing agent's possible states and transitions on a whiteboard. You'll find more than you expected.
- Convert each tool wrapper into a graph node — same input, same output, exposed as a function. No logic change yet.
- Replace the implicit AgentExecutor loop with explicit edges. Start with a single supervisor node that routes between tools.
- Add checkpointing on day one. Don't ship without it.
- Run both old and new in shadow mode for a week against the same inputs. Eval the differences.
- Cut over. Remove the old code in a separate PR.
Allow 1–3 weeks per non-trivial agent. The migration doesn't add capability so much as it adds inspectability — the value compounds with every subsequent change.
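The shadow-mode step in the list above amounts to running both implementations on the same inputs and collecting the disagreements for evaluation. A minimal sketch, with two stub agents and hypothetical names throughout. Exact string comparison is used here for simplicity, where a real comparison of LLM outputs would normally need a looser, eval-based notion of equivalence.

```python
from typing import Callable, Dict, List


def old_agent(question: str) -> str:
    # Stub for the existing AgentExecutor-based agent.
    return question.strip().lower()


def new_agent(question: str) -> str:
    # Stub for the migrated graph; differs slightly on purpose.
    return question.strip().lower().rstrip("?")


def shadow_compare(inputs: List[str],
                   old: Callable[[str], str],
                   new: Callable[[str], str]) -> List[Dict[str, str]]:
    diffs = []
    for item in inputs:
        old_out, new_out = old(item), new(item)
        if old_out != new_out:
            diffs.append({"input": item, "old": old_out, "new": new_out})
    return diffs


diffs = shadow_compare(["Reset password", "Where is my order?"],
                       old_agent, new_agent)
```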
What we use today
For new agentic workloads: LangGraph for orchestration, LangChain for the integration primitives (retrievers, document loaders, tool wrappers), with LangSmith for observability and a custom eval harness for regression testing on every PR.
For non-agentic LLM features (single-shot extraction, classification, translation): we often drop LangChain entirely and call the model provider directly. The abstraction tax stops paying for itself when the workflow is a one-step prompt.
The headline
LangChain and LangGraph aren't a versus. They're a stack: LangChain for the building blocks, LangGraph for the control flow. Use them together for production agents, use LangChain alone for single-shot patterns, and don't ship either to production without observability and an eval suite.