LangChain LCEL vs. LlamaIndex: Picking a Framework in Late 2023
November 28, 2023 · 7 min read · by Muhammad Amal ai

TL;DR
  • LangChain (with LCEL) is best when your app is an orchestration of LLM calls, tools, and branching logic.
  • LlamaIndex is best when your app is fundamentally about retrieval and indexing over a body of documents.
  • Many production systems benefit from using both: LlamaIndex for the retrieval layer, LangChain for the application graph around it.

Every engineering team I talk to lately asks some version of the same question: “Should we be using LangChain or LlamaIndex?” The framing is off, because the two overlap but aren’t substitutes. The question is still fair, though: the docs for both have grown faster than the conceptual clarity around what each one is for.

I’ve shipped production systems on both, sometimes on the same codebase. Here’s the working model I use for picking between them in late 2023.

What Each One Is Actually For

LlamaIndex started as GPT Index, and its center of gravity is still retrieval over documents. Indexing, chunking, querying, retrievers, response synthesizers: these are the abstractions where LlamaIndex feels native. When you’re building “ask questions over a corpus,” LlamaIndex is the more direct fit. Its documentation reflects this; the conceptual model is documents in, answers (with their sources) out.

LangChain started as a chain abstraction for composing LLM calls and has grown into a general framework: agents, tools, memory, callbacks, retrievers (which it has, but they’re not its center). LCEL (LangChain Expression Language) is the recent addition that pulled the chain composition into a pipe-style operator API. If your application is “LLM does X, then if condition Y, call tool Z, then loop until done” — that’s LangChain territory.

The overlap is wide. Both can build a basic RAG pipeline. Both have retrievers, both have chat memory, both have document loaders. The question is which one’s center of gravity is closer to your application’s shape.

What LCEL Looks Like Now

LangChain 0.0.340 (current as of this week) has LCEL as the recommended composition style. The pipe operator chains components:

# langchain==0.0.340
import os

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.vectorstores import PGVector

# Connection string for a Postgres database with the pgvector extension
PG_CONN = os.environ["PG_CONN"]

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vector_store = PGVector(
    connection_string=PG_CONN,
    embedding_function=embeddings,
    collection_name="wiki",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 6})

llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context. Cite sources."),
    ("user", "Context:\n{context}\n\nQuestion: {question}"),
])

def format_docs(docs):
    # Render retrieved Documents as plain text, keeping each source so the model can cite it
    return "\n\n".join(
        f"[{d.metadata.get('source', 'unknown')}]\n{d.page_content}" for d in docs
    )

chain = (
    # The dict becomes a parallel step: both keys receive the raw question string
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("What's the on-call rotation for payments?")

This is a real improvement over the old LLMChain / RetrievalQA style. The pipe composition makes the data flow explicit. Streaming works through the chain (chain.stream()) without special handling. Async works (chain.ainvoke()). Batching works (chain.batch()).

The pieces are also more composable. You can swap the retriever, the prompt, or the LLM without restructuring. Branching is a RunnableBranch. Parallel calls are RunnableParallel. The expression language is starting to look like an orchestration DSL.
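To make the composition model concrete, here is a toy, stdlib-only re-implementation of the three ideas in that paragraph: piping with `|`, fanning one input out to several steps, and routing on a condition. `Step`, `parallel`, and `branch` are hypothetical stand-ins, not LangChain classes. Real LCEL does this (plus streaming and async) through `Runnable`, `RunnableParallel`, and `RunnableBranch`, and also lets a plain dict start a chain, which this sketch does not support.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for an LCEL Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def batch(self, values: list) -> list:
        # Runs inputs one by one; real LCEL can run a batch concurrently
        return [self.invoke(v) for v in values]

    def __or__(self, other: Any) -> "Step":
        # Plain callables are wrapped, similar to how LCEL coerces functions
        nxt = other if isinstance(other, Step) else Step(other)
        # The output of this step becomes the input of the next one
        return Step(lambda value: nxt.invoke(self.invoke(value)))


def parallel(**steps: Step) -> Step:
    # Every branch sees the same input; results are collected into a dict
    return Step(lambda value: {name: s.invoke(value) for name, s in steps.items()})


def branch(*cases: tuple, default: Step) -> Step:
    # The first (condition, step) pair whose condition holds handles the input
    def run(value: Any) -> Any:
        for condition, step in cases:
            if condition(value):
                return step.invoke(value)
        return default.invoke(value)
    return Step(run)


# Usage with fake components (no LLM or vector store involved)
fake_retriever = Step(lambda q: [f"doc about {q}"])
passthrough = Step(lambda q: q)
fake_prompt = Step(
    lambda d: "Context: " + "; ".join(d["context"]) + "\nQuestion: " + d["question"]
)

chain = parallel(context=fake_retriever, question=passthrough) | fake_prompt | str.upper
print(chain.invoke("payments"))

router = branch(
    (lambda q: "refund" in q, Step(lambda q: "billing flow")),
    default=Step(lambda q: "general flow"),
)
print(router.invoke("refund my order"))  # billing flow
```

The point of the toy is that every piece exposes the same `invoke` interface, which is why swapping a retriever or prompt does not force a restructure.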

That said: LCEL still has rough edges. Type errors are unhelpful when a step’s output doesn’t match the next step’s input. Debugging a long chain requires LangSmith or a lot of print(chain.invoke(...)) at intermediate points. The abstraction leaks more than I’d like, but it’s a meaningful step forward from where LangChain was in the spring.
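One low-tech alternative to scattered print calls is to have the runner record every intermediate output and name the step that failed. Below is a minimal stdlib sketch that models a chain as a list of named functions; `run_traced` is a hypothetical helper, not part of LangChain. Inside a real LCEL chain, a pass-through function piped between steps (it is wrapped as a `RunnableLambda`) or LangSmith tracing gives similar visibility.

```python
from typing import Any, Callable

NamedStep = tuple[str, Callable[[Any], Any]]


def run_traced(steps: list[NamedStep], value: Any) -> tuple[Any, list[tuple[str, Any]]]:
    """Run named steps in order; return the final output and each step's output."""
    trace: list[tuple[str, Any]] = []
    for name, fn in steps:
        try:
            value = fn(value)
        except Exception as exc:
            # Name the failing step and show the exact input it received
            raise RuntimeError(f"step {name!r} failed on input {value!r}") from exc
        trace.append((name, value))
    return value, trace


# Hypothetical three-step chain standing in for retriever | formatter | LLM
steps: list[NamedStep] = [
    ("retrieve", lambda q: [f"doc about {q}"]),
    ("format", lambda docs: "\n".join(docs)),
    ("answer", lambda ctx: f"Based on: {ctx}"),
]
result, trace = run_traced(steps, "payments")
for name, output in trace:
    print(f"{name}: {output!r}")
```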

What LlamaIndex Does Better

LlamaIndex’s strength is the retrieval stack. The node parsers, the postprocessors (rerankers, similarity filters), the response synthesizers — these have more depth than LangChain’s equivalents. If you’re tuning retrieval (and you should be — see my eval post), the LlamaIndex abstractions give you more handles.
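To illustrate what those extra handles look like, here is a stdlib sketch of two postprocessing stages: a similarity cutoff that drops weak matches, then a rerank that keeps the top n by a second score. The `ScoredNode` type, the scorer, and the thresholds are assumptions for illustration; LlamaIndex’s own postprocessor classes have their own names and signatures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity reported by the vector search


def similarity_cutoff(nodes: list[ScoredNode], cutoff: float) -> list[ScoredNode]:
    # Drop weak matches before spending anything on reranking
    return [n for n in nodes if n.score >= cutoff]


def rerank(nodes: list[ScoredNode],
           rerank_score: Callable[[str], float],
           top_n: int) -> list[ScoredNode]:
    # Re-order by a second scorer (in practice a cross-encoder or LLM) and keep top_n
    return sorted(nodes, key=lambda n: rerank_score(n.text), reverse=True)[:top_n]


# Usage: a crude term-overlap scorer stands in for a real reranker
query_terms = {"payments", "on-call"}


def overlap(text: str) -> float:
    return len(query_terms & set(text.split()))


nodes = [
    ScoredNode("payments on-call rotation", 0.82),
    ScoredNode("lunch menu", 0.41),
    ScoredNode("payments runbook", 0.77),
    ScoredNode("on-call policy", 0.74),
]
kept = rerank(similarity_cutoff(nodes, cutoff=0.7), overlap, top_n=2)
print([n.text for n in kept])
```

Each stage is a separate knob (the cutoff, the scorer, `top_n`), which is what makes retrieval tunable against an eval set.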

The query engine concept is also more honest about what retrieval-augmented generation actually does. A BaseQueryEngine takes a query and returns a response with source nodes. The composition inside (retrieve, rerank, synthesize) is open for substitution. LangChain’s RAG path is a chain of pieces stitched together; it works, but the seams show.
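The shape described here (a query goes in, a response carrying its source nodes comes out, and each internal stage can be swapped) can be sketched without any framework. `ToyQueryEngine` is hypothetical and is not LlamaIndex’s `BaseQueryEngine`. The keyword retriever and the "first hit" synthesizer are deliberately trivial placeholders for an embedding search and an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable

Retriever = Callable[[str], list[str]]
Postprocessor = Callable[[str, list[str]], list[str]]
Synthesizer = Callable[[str, list[str]], str]


@dataclass
class Response:
    text: str
    source_nodes: list[str]


@dataclass
class ToyQueryEngine:
    retriever: Retriever
    synthesizer: Synthesizer
    postprocessors: list[Postprocessor] = field(default_factory=list)

    def query(self, question: str) -> Response:
        nodes = self.retriever(question)
        for post in self.postprocessors:
            nodes = post(question, nodes)
        # Keep the nodes that actually fed the answer so callers can cite them
        return Response(self.synthesizer(question, nodes), nodes)


CORPUS = [
    "payments on-call rotates weekly",
    "search on-call rotates daily",
    "office closes at six",
]


def keyword_retriever(question: str) -> list[str]:
    # Placeholder for vector search: any chunk sharing a word with the question
    words = set(question.lower().split())
    return [c for c in CORPUS if words & set(c.split())]


def first_hit_synthesizer(question: str, nodes: list[str]) -> str:
    # Placeholder for an LLM call: answer with the top node, or admit ignorance
    return nodes[0] if nodes else "I don't know."


def best_overlap_only(question: str, nodes: list[str]) -> list[str]:
    # Swappable stage: keep only the nodes with the highest word overlap
    words = set(question.lower().split())
    best = max((len(words & set(n.split())) for n in nodes), default=0)
    return [n for n in nodes if len(words & set(n.split())) == best]


engine = ToyQueryEngine(keyword_retriever, first_hit_synthesizer)
resp = engine.query("who is on-call for payments")
print(resp.text, resp.source_nodes)

# Same engine shape, one extra stage plugged in without touching the others
tuned = ToyQueryEngine(keyword_retriever, first_hit_synthesizer, [best_overlap_only])
print(tuned.query("who is on-call for payments").source_nodes)
```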

For ingest, LlamaIndex’s document loaders (LlamaHub) cover more sources and the metadata propagation through chunking is cleaner. If your pipeline is “ingest these 20 different document types, embed, store, query,” LlamaIndex saves more code.
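Clean metadata propagation means every chunk still knows where it came from, which is what makes citations and metadata filters possible downstream. Here is a stdlib sketch of the idea using fixed-size word windows with overlap. The window sizes and the `parent_id` / `chunk_index` fields are illustrative choices, not LlamaIndex’s node parser API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def chunk_document(doc_id: str, text: str, metadata: dict,
                   size: int = 5, overlap: int = 2) -> list[Chunk]:
    """Split text into overlapping word windows, copying document metadata onto each."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    # Stop before a window that would contain only already-covered overlap words
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        window = words[start:start + size]
        # Build a fresh dict per chunk so chunks never share (and mutate) one object
        meta = {**metadata, "parent_id": doc_id, "chunk_index": i}
        chunks.append(Chunk(" ".join(window), meta))
    return chunks


doc_meta = {"source": "wiki/payments.md", "team": "payments"}
chunks = chunk_document("doc-1", "one two three four five six seven eight", doc_meta)
for c in chunks:
    print(c.metadata["chunk_index"], c.text, c.metadata["source"])
```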

Where I’d Pick Each

Pick LlamaIndex when:

  • The product is fundamentally document QA over a defined corpus
  • You need fine control over chunking, retrieval, and rerank stages
  • Your team is small and you want the shortest path from documents to a working query engine
  • You’re tuning retrieval metrics seriously

Pick LangChain when:

  • The application has substantial orchestration beyond RAG (multi-step agents, tools, conditional branches)
  • You’re building chat applications where memory and conversation state matter
  • You need composability across many LLM calls in a single user request
  • You want the agent abstractions, even with their flaws

Use both when:

  • The retrieval is non-trivial (you’d want LlamaIndex for it) and the app around it is non-trivial (you’d want LangChain for that). This is more common than you’d think.

The interop is fine. LlamaIndex retrievers can be wrapped to expose a LangChain BaseRetriever interface. LangChain tools can call LlamaIndex query engines.

# Wrap LlamaIndex query engine as a LangChain tool
from langchain.tools import Tool
from llama_index.query_engine import RetrieverQueryEngine

# li_retriever: a LlamaIndex retriever built elsewhere, e.g. index.as_retriever()
li_query_engine = RetrieverQueryEngine.from_args(li_retriever, ...)

def query_internal_docs(question: str) -> str:
    response = li_query_engine.query(question)
    return str(response)

internal_docs_tool = Tool(
    name="internal_docs_search",
    func=query_internal_docs,
    description="Search internal company documentation",
)

This pattern lets the LangChain agent treat the LlamaIndex query engine as one tool among many. The agent decides when to call it; the query engine handles the retrieval depth.

The 0.9 Migration

LlamaIndex 0.9 is days away as I write this. The breaking changes I’m tracking:

  • ServiceContext is being deprecated in favor of a more explicit per-component configuration
  • The package layout is reshuffling (the LlamaIndex 0.9 migration notes will be the source of truth — verify before you upgrade)
  • Some retriever and node parser APIs are getting cleaner signatures

I’d hold off upgrading production stacks for two to four weeks after 0.9 ships. Let the community find the rough edges first. For new projects starting after 0.9, start there.

LangChain doesn’t have a clean version cadence — it ships breaking changes inside 0.0.x releases regularly. Pin your version aggressively. Read the changelog before you upgrade.
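Pinning in requirements.txt protects fresh installs; a startup check also catches an environment that drifted anyway (a stray `pip install -U`, a rebuilt base image). Here is a minimal sketch built on the standard library’s `importlib.metadata`. The pin value comes from this post, and failing hard on drift is one design choice among several.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

# Example pin taken from this post; keep it identical to your lockfile
PINNED = {"langchain": "0.0.340"}


def check_pins(pins: dict[str, str],
               lookup: Callable[[str], str] = version) -> list[str]:
    """Describe every pinned package that is missing or at a different version."""
    problems = []
    for package, expected in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            problems.append(f"{package} not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}=={installed} installed, pinned to {expected}")
    return problems
```

Calling `check_pins(PINNED)` at service startup and refusing to boot on a non-empty result turns silent drift into an early, loud failure. The injectable `lookup` parameter lets the check itself be tested without touching the real environment.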

Common Pitfalls

Treating either framework as a stable platform. Both libraries are evolving fast. Pin versions. Test on every upgrade. Don’t build deeply against internal modules — only the documented surface.

Choosing based on the demo, not the workload. LangChain agent demos are compelling. Most production apps don’t need a full agent loop; they need a deterministic pipeline. LlamaIndex query engine demos are compelling. Some production apps need conversational orchestration that LlamaIndex doesn’t model as cleanly. Match to your actual shape.

Adopting LCEL without buying in fully. Mixing the old LLMChain style with LCEL in the same codebase is painful. Pick one and migrate completely.

Skipping the eval. Frameworks don’t fix retrieval quality. You still need the eval pipeline regardless of which framework you choose.
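Whichever framework you pick, the core of a retrieval eval is small. Here is a minimal sketch of two standard metrics, hit rate at k and mean reciprocal rank, over a labeled question set. The data shapes (ranked chunk IDs per question, a set of relevant IDs per question) are assumptions. In practice the ranked IDs would come from your LangChain retriever or LlamaIndex query engine.

```python
def hit_rate_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of questions with at least one relevant chunk ID in the top k."""
    if not retrieved:
        raise ValueError("empty eval set")
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold & set(ids[:k]))
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved: list[list[str]], relevant: list[set[str]]) -> float:
    """Mean of 1/rank of the first relevant chunk ID (0 when none is retrieved)."""
    if not retrieved:
        raise ValueError("empty eval set")
    total = 0.0
    for ids, gold in zip(retrieved, relevant):
        for rank, doc_id in enumerate(ids, start=1):
            if doc_id in gold:
                total += 1 / rank
                break
    return total / len(retrieved)


# Toy eval set: ranked chunk IDs per question, plus the IDs a human marked relevant
retrieved = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"a"}, {"f"}, {"z"}]
print(hit_rate_at_k(retrieved, relevant, k=3), mean_reciprocal_rank(retrieved, relevant))
```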

Over-engineering with agents. “Just have the LLM decide what to do” is a tempting design and a debugging nightmare in production. Deterministic chains for deterministic workflows. Save agent loops for genuinely open-ended tasks.
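In practice, "deterministic chain" often just means the routing decision lives in code you can unit-test rather than in a model’s judgment. Here is a sketch with hypothetical handler names and keyword rules. An LLM can still run inside each handler; it just doesn’t choose the path.

```python
from typing import Callable


# Hypothetical handlers; each could wrap its own LCEL chain or query engine
def answer_from_docs(q: str) -> str:
    return f"docs:{q}"


def lookup_ticket(q: str) -> str:
    return f"ticket:{q}"


def fallback(q: str) -> str:
    return f"general:{q}"


# Ordered rules: the first match wins, so the same input always takes the same path
ROUTES: list[tuple[Callable[[str], bool], Callable[[str], str]]] = [
    (lambda q: "ticket" in q.lower() or q.upper().startswith("INC-"), lookup_ticket),
    (lambda q: q.rstrip().endswith("?"), answer_from_docs),
]


def route(question: str) -> str:
    for matches, handler in ROUTES:
        if matches(question):
            return handler(question)
    return fallback(question)


print(route("What's the on-call rotation for payments?"))
```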

What’s Next

I expect the framework landscape will keep churning. There’s room for a leaner framework that picks one or two opinions and holds them — and we may see it ship in 2024. Until then, the working answer is: use what fits your shape, pin your versions, and don’t fall in love with any one abstraction.

Wrapping Up

LangChain and LlamaIndex aren’t competitors so much as adjacent libraries with overlapping middles. Pick by what’s at the center of your application. Use both when each one’s strength applies to a different part of your system. Pin versions, ship the eval, and don’t let the framework dictate your architecture.

The frameworks are tools. The architecture is yours.