From Curious Developer to Building AI Agents — What I Learnt
A personal account of exploring LangChain, RAG systems, and the building blocks of modern agentic AI
A few months ago, I had no idea what a "chain" was in the context of AI. I'd heard the word "LangChain" thrown around in Twitter threads and YouTube thumbnails, and every time someone mentioned "RAG", I quietly assumed they were talking about something else entirely. I was a developer who could write Python, understood APIs, and had played around with ChatGPT — but building real AI systems? That felt like someone else's territory.
This post is my honest attempt to document what changed — what I built, what confused me, where things finally clicked, and why I believe every developer today should have at least a working mental model of how these systems are assembled.
I'm a Machine Learning Student so I had a knowledge on how do I train a model from scratch. But nothing about GenAI, So, I did put together a GitHub repository called GenAI that covers the full stack of LangChain components — and along the way, I learnt more than I expected about how the "AI magic" actually works under the hood.
"The best way to understand a system is to build it piece by piece, and be uncomfortable for long enough that the discomfort becomes familiarity."
Why I Started: The Honest Answer
I won't pretend I had a grand vision. Generative AI was everywhere, and I didn't want to be the developer who could only use these tools as a consumer. I wanted to understand what was happening behind the chat interface. What does the model actually receive when I type a question? How does a chatbot "remember" my previous message? How does an AI assistant search a PDF I uploaded five seconds ago?
These questions nagged at me enough that I decided to stop watching tutorials passively and start building. I picked LangChain because it's the most widely adopted framework for composing AI applications with Python, and because its modular design meant I could learn one piece at a time without needing to understand everything at once.
The first week was humbling. There are a lot of concepts, and the documentation assumes you already know half of them. But here is the thing about LangChain — once you understand its core philosophy, everything else starts to snap into place.
The core philosophy: Every component in a GenAI pipeline is a "Runnable" — something that takes an input, does something with it, and passes an output to the next step. If you understand that, you understand LangChain's entire architecture.
The Pipeline Mental Model That Changed Everything
Before I started building, I thought of AI applications as black boxes. You send text in, text comes out. Once I started working with LangChain, I realised that what looks like magic is actually a series of very deliberate, inspectable steps. Here's the mental model I eventually formed:
Raw Data → Load Documents → Split into Chunks → Embed Vectors → Store in VectorDB → Retrieve Context → Prompt + LLM → Parse Output
Each box in that pipeline is a separate module in my repository. Learning them in order — not all at once — is what I'd recommend to anyone starting out. Let me walk you through what I learnt at each stage.
Stage 1 — Loading Your Data into the System
The first thing any RAG application needs to do is get your data into a format the AI can work with. This sounds trivially simple. It isn't.
PDFs are notoriously messy. Web pages have navigation menus, footers, and ads mixed into the actual content. Word documents have invisible formatting characters. This is where Document Loaders come in — they're LangChain's way of ingesting content from virtually any source and converting it into a clean, standardised Document object.
from langchain_community.document_loaders import PyPDFLoader
# Load a PDF — each page becomes a Document object
loader = PyPDFLoader("my_report.pdf")
documents = loader.load()
# Each document has content + metadata (page number, source, etc.)
print(documents[0].page_content)
print(documents[0].metadata)
What surprised me was how much variety the LangChain community has built here — loaders for CSV files, web URLs, YouTube transcripts, Notion pages, GitHub repos, and more. The abstraction is clean: no matter the source, you always end up with the same Document format.
My honest learning: Don't skip inspecting your loaded documents. I wasted two days wondering why my AI gave bad answers, only to discover my PDF loader was pulling garbled text from a scanned, non-searchable document. Always print(documents[0].page_content) before moving on.
Stage 2 — Splitting: The Step Everyone Underestimates
Language models can only process a limited amount of text at a time (their "context window"). Even with today's very large context windows, embedding an entire 200-page report is impractical and expensive. More importantly, if you give an AI too much irrelevant context alongside the relevant bits, the answer quality drops.
Text splitters break your documents into smaller, overlapping chunks — small enough to embed and retrieve efficiently, but large enough to retain meaning.
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
chunk_size=500, # max characters per chunk
chunk_overlap=50 # overlap so context isn't lost at boundaries
)
chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
The chunk_overlap parameter is the one that took me the longest to appreciate. Without overlap, a sentence that spans the end of one chunk and the start of another gets cut in half — and neither chunk retains the full idea. With overlap, both chunks contain that sentence, so retrieval is more robust.
Chunk size is a dial, not a switch. Too small and you lose context. Too large and retrieval becomes imprecise. Most tutorials say 500–1000 tokens is a good starting point, but your ideal number depends entirely on your data.
Stage 3 — Embeddings and Vector Stores: Where It Gets Fascinating
This is the part that genuinely fascinated me and had me falling down rabbit holes for days. An embedding is a way of converting text into a list of numbers — a vector — where texts that are semantically similar end up as vectors that are numerically close to each other.
Think about what that means: "How do I fix a car engine?" and "My automobile won't start — what's wrong?" might not share a single word, but their embeddings will be close together in vector space because they mean similar things. This is what enables semantic search — the kind that finds relevant documents even when the user's exact words don't appear in the document.
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
# Convert chunks into vectors and store them
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)
# Save to disk so you don't re-embed every run
vectorstore.save_local("my_index")
I used Chroma for most of my experiments because it runs entirely locally — no cloud required, no API costs for the storage step. For more persistent needs, FAISS (Facebook AI Similarity Search) is another excellent open-source option.
The moment I ran my first similarity search and watched it pull back the exact paragraph relevant to my query from a 50-page document — without any keyword matching — was the moment this whole field stopped feeling like magic and started feeling like engineering.
Stage 4 — Retrievers: The Bridge Between Data and Intelligence
A retriever wraps a vector store and exposes a simple interface: give it a question, get back the most relevant chunks. But there's more nuance here than I initially expected.
- Similarity Search: Returns the top-k chunks most similar to the query. Fast and straightforward. Good starting point.
- MMR (Max Marginal Relevance): Balances relevance and diversity. Avoids returning five near-identical chunks when your document has repeated phrasing.
- MultiQuery Retriever: Uses an LLM to generate multiple variants of your query before searching — improves recall when user phrasing is ambiguous.
- Contextual Compression: After retrieval, an LLM trims each chunk to only the sentence or two actually relevant to the question.
I spent a lot of time in the Retrievers/ section of my repo because this is often where RAG applications live or die. A well-tuned retriever with a mediocre LLM outperforms a brilliant LLM fed irrelevant chunks every single time.
Stage 5 — Prompt Templates: The Overlooked Craft
If embeddings and retrieval are the science, prompt engineering is the craft. A prompt template is how you package the user's question, the retrieved context, and any instructions to the model into a coherent message.
Early on I would write prompts as raw strings and wonder why results were inconsistent. Once I started using ChatPromptTemplate properly — with distinct system and human message roles — the quality and predictability of responses improved noticeably.
from langchain.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant.
Answer the question using ONLY the context below.
If the answer isn't in the context, say 'I don't know'.
Context: {context}"""),
("human", "{question}")
])
The "I don't know" instruction in the system prompt matters more than it looks. Without it, LLMs will confidently hallucinate an answer rather than admit ignorance. Grounding the model explicitly in the retrieved context is what separates a useful RAG system from a confident liar.
Stage 6 — Chains and Runnables: The Modern Way to Compose
This is where LangChain's architecture really shines — and also where I initially got lost, because the framework has evolved significantly. The old way used explicit LLMChain classes. The modern way uses LangChain Expression Language (LCEL) with the pipe operator (|).
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# The full RAG chain in 5 lines
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()} | prompt| llm| StrOutputParser()
)
answer = rag_chain.invoke("What does the document say about climate change?")
Read that chain left to right: the user's question goes in, the retriever fetches context, both are fed into the prompt template, the formatted prompt goes to the LLM, and the output is parsed into a clean string. Elegant. Composable. And every step is individually testable.
The | pipe operator was inspired by Unix pipes. If you've ever written cat file.txt | grep "error" | sort in a terminal, you already intuitively understand LCEL. The paradigm is the same: chain small, focused operations together.
Stage 7 — Output Parsers and Structured Outputs
One of the practical frustrations of working with LLMs is that they return text — and your application usually needs data. Output parsers bridge that gap.
The two approaches I found most useful:
- Pydantic-based structured output — define a Python class with typed fields, and the model is instructed to return something that exactly matches that schema.
- JSON output parser — simpler, but requires the model to reliably produce valid JSON (which it usually does with the right instructions).
from pydantic import BaseModel
class ArticleSummary(BaseModel):
title: str
key_points: list[str]
sentiment: str # "positive", "negative", or "neutral"
confidence: float
# Now the LLM's output is a typed Python object, not a string
structured_llm = llm.with_structured_output(ArticleSummary)
result = structured_llm.invoke("Summarise this news article: ...")
print(result.key_points) # ['Point 1', 'Point 2', ...]
Once I started using structured outputs, a whole class of post-processing code disappeared from my projects. No more string slicing, no more fragile regex patterns, no more "what if the model formats it differently today?" anxiety.
Stage 8 — Tools and Agents: When the AI Takes Action
Everything up to this point has been about answering questions. Tools and agents are about taking action. This is what makes a system genuinely "agentic."
A Tool is any function the AI can decide to call. You describe what it does in plain English, and the model determines when and how to use it based on the user's request. The simplest example:
from langchain.tools import tool
@tool
def search_wikipedia(query: str) -> str:
"""Search Wikipedia and return a summary for the given query."""
# ... implementation
return summary
# The agent decides WHEN to call this based on the user's question
agent = create_tool_calling_agent(llm, [search_wikipedia], prompt)
What blew my mind was the ReAct loop — the pattern that most agents follow internally: Reason about what to do next, Act by calling a tool, Observe the result, then repeat. It's a surprisingly simple loop that enables surprisingly sophisticated behaviour.
Watching an agent decide "I need to search the web for this, then use the result to answer the question" — and then do exactly that — was the first time building this stuff genuinely felt exciting rather than educational.
What I'd Tell Someone Starting Today
If you're at the beginning of this journey, here's the condensed version of everything I learnt the hard way:
- Start with the pipeline, not the components. Build one tiny end-to-end RAG system first — even if it's bad. Seeing the whole thing work, however crudely, gives every subsequent concept a place to live in your mental model.
- Always inspect your data between steps. Print your loaded documents. Print your chunks. Print your retrieved documents before they hit the LLM. Most bugs in RAG systems aren't model bugs — they're data quality bugs upstream.
- Use local models for experimentation. Ollama lets you run models like Llama 3 entirely on your machine. This removes API cost anxiety and lets you iterate freely without worrying about every token.
- Embeddings are not all equal. A free HuggingFace embedding model will get you 80% of the way. OpenAI's embeddings get you further. For most projects, start free and upgrade when you hit a wall.
- LCEL pipes are your friend. Once you're comfortable writing chains with the
|operator, you'll look back at the older chain syntax and wonder how you ever tolerated it. Invest time here early. - Build something you actually want to use. My portfolio project was a document Q&A tool for my own notes. Because I cared about the output, I was motivated to fix the subtle bugs I would have otherwise ignored.
Where I'm Going Next
This repository covers the foundational stack. What it doesn't yet cover — but what I'm actively exploring — is the next layer of complexity: LangGraph for building stateful, multi-agent workflows; memory architectures that let an agent remember conversations across sessions; and evaluation frameworks for measuring whether a RAG system is actually performing well, not just looking like it is.
I'm also fascinated by the economics of inference — how to choose between a fast, cheap model and a slow, capable one depending on the subtask. Agent systems that route simple questions to Llama-3.2-3B and complex reasoning to GPT-4o dynamically are architecturally elegant and practically cost-effective.
But that's for future posts. For now, the repository stands as my working notebook — every directory a chapter, every file a lesson I had to learn by breaking things first.
✦ ✦ ✦
If you're building in this space too, or just starting to explore it — I'd genuinely love to hear from you. What clicked for you? What's still confusing? Drop it in the comments. The best learning I've had in this field has been from conversations with people at all different stages of the same journey.
The code for everything I've described here is open on GitHub. Go break it, extend it, and make it yours.
0 Comments