The 7 Levels of Claude Code & RAG

Chase AI
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00Let's solve the problem of Claude Code and memory getting AI systems to reliably and accurately
00:00:06answer questions about past conversations or giant troves of documents is a problem we have been
00:00:13trying to solve for years and the typical response has been rag retrieval augmented generation and
00:00:20while this video is titled the seven levels of Claude Code and RAG what this video is really about
00:00:26is deconstructing that problem of Claude Code and really AI systems in general and memory and even
00:00:33more importantly this video is about giving you a roadmap that shows you where you stand in this
00:00:37fight between AI systems and memory and what you can do to get to the next level. So as we journey
00:00:43through these seven levels of Claude Code and RAG we are going to hit on a number of topics but we
00:00:48are not going to start here in GraphRAG or anything complicated we're going to start at the beginning
00:00:53which is just the basic memory systems that are native to Claude Code because sad as it is to say
00:00:59this is where most people not only begin but it's where they stay from auto memory and things like
00:01:04CLAUDE.md we're going to move to outside tools things like Obsidian before we eventually find ourselves
00:01:10with the big boys with the true RAG systems at these levels we'll talk about what RAG actually is
00:01:16how it works the different types of RAG naive RAG versus GraphRAG versus agentic RAG things like
00:01:21re-rankers and everything in between and at each level we're going to break it down in the same
00:01:25manner we're going to talk about what to expect at that level the skills you need to master the
00:01:29traps you need to avoid and what you need to do to move on to the follow-on level what this video
00:01:34will not be is a super in-depth technical explanation of how to necessarily set up these
00:01:40specific systems because i've already done this in many instances when we talk about GraphRAG and
00:01:45LightRAG for example or even more advanced topics like RAG-Anything and these different sorts of
00:01:50embedding systems i've done videos where i break down from the very beginning to the very end how
00:01:55to set that up yourself so when we get to those sections i will link those videos and this is for
00:02:00both our sakes so this video isn't five hours long but for those levels we're still going to talk
00:02:04about what that actually means what each system buys you and when you should be using it but before
00:02:09we start with level one a quick word from today's sponsor me so just last month i released the Claude
00:02:15Code masterclass and it is the number one way to go from zero to AI dev especially if you don't come
00:02:21from a technical background and this masterclass is a little bit different because we focus on a
00:02:25number of different use cases to learn how to use Claude Code one of those is something like production
00:02:31level RAG how to build the RAG systems you're going to see in this video in a real life scenario and
00:02:37actually use it as a member of a team or sell it to a client that's the kind of stuff we focus on so
00:02:42if you want to get access you can find it inside of Chase AI Plus there's a link to that in the pinned
00:02:47comment and we'd love to have you there so now let's start with level one and that's auto memory
00:02:51these are the systems that Claude Code automatically uses to create some sort of memory apparatus to
00:02:58actually remember things that you've talked about and you know you're here if you've never set
00:03:02anything up intentionally to help Claude Code remember context in general about previous conversations
00:03:09or just stuff that's going on in your code base and when we talk about auto memory that is quite
00:03:13literally what it is called the auto memory system which is automatically enabled when you use Claude
00:03:18Code essentially allows Claude Code to create markdown files on its own that sort of list out
00:03:26things it thinks are important about you in that particular project and this is purely based off
00:03:32of its own intuition based on your conversations and i can see these memory files it's created again
00:03:37it does this on its own if you go into your .claude folder you go into projects you will see a
00:03:42folder there that is called memory and inside that folder you will see a number of markdown files here
00:03:47there are four of them and they're like Claude Code's version of post-it notes saying oh yeah he mentioned
00:03:51this one time about his youtube project growth goals let's write that down and inside of everyone's
00:03:59memory folder there will be a memory.md file so you see in this memory file it has a little note about
00:04:04one of my skills and then it has you know essentially an index of all these sub memory files saying
00:04:09hey there's a youtube growth one in here a revenue one or references one and here's what's inside of
00:04:13it so if i'm just talking to Claude Code in my vault file and i mention something about youtube and sort
00:04:19of my goals with growth whatever it's going to reference this and say oh yeah chase is trying to
00:04:23get you know x amount of subscribers by the end of 2026. it's cute but ultimately it's not that useful
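Based on the description above, the auto-memory layout looks roughly like this. A minimal sketch: the folder names follow what is described in the video, but the individual note files are illustrative examples, since Claude Code names them on its own.

```
.claude/
  projects/
    <your-project>/
      memory/
        memory.md           # index: a short note plus a list of the sub-memory files
        youtube-growth.md   # topic notes Claude Code wrote by itself (names illustrative)
        revenue.md
        references.md
```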
00:04:30it's kind of like when you're inside of ChatGPT and it will bring up random stuff about
00:04:35previous conversations and it almost like shoehorns it in it's like okay i get it you remembered this
00:04:40but i don't really care and honestly it's a little weird to keep bringing that up i prefer if you
00:04:44didn't and unfortunately this is where most people stay in their memory journey and it's built upon a
00:04:49somewhat almost abusive past that we all have when it comes to using these chatbots
00:04:54because these chatbots don't have any sort of real memory from conversation to conversation and so
00:05:00we're always scared to death of having to exit out of a chat window or exit out of a terminal session
00:05:06because you think oh my gosh it's not going to remember my conversation and this is actually a
00:05:10real problem because what is everybody's answer to the chat window not being able to remember anything
00:05:17well the answer is you just keep that conversation going forever because you don't want to get to a
00:05:22scenario where you have to exit out and it forgets everything this is a fear that is born here inside
00:05:26of these chat windows beginning with ChatGPT and same thing with Claude's web app and honestly it used
00:05:31to be infinitely worse with Claude's web app because i think we all remember before the days of the 1
00:05:35million context window where you would have like 30 minutes to talk with Claude and be like well see
00:05:39you in four hours the issue is people have brought that sort of psychotic neurotic behavior to the
00:05:45terminal and what they do in large part because you now can get away with it with a 1 million context
00:05:50window is they never clear they just keep talking and talking and talking with Claude Code because
00:05:55they never want it to forget what they're talking about because of these memory problems and the
00:06:00issue with that is your efficiency goes way down over time the more you talk with Claude Code inside
00:06:05of the same session and this is the fundamental idea of context rot if you don't know what context rot
00:06:10is it's the phenomenon that the more i use an AI system within its same session within its same chat
00:06:16and i fill up that context window the worse it gets you can see that right here Claude Code 1 million
00:06:23context window at 256k tokens aka i've only filled up about a quarter of its context window we're at
00:06:3092 by the end i'm at 78 so the more you use it in the same chat the worse it gets and that's one of
00:06:36the primary issues people have with AI systems and memory i have Claude Code it has a million context
00:06:42now and yet i do not want it to forget about the conversation i'm having so i just never exit the
00:06:47window i just fill it up and fill it up and fill it up and two things happen one effectiveness goes
00:06:51down like you just saw two your usage fills up a ton because the amount of tokens that are used at
00:06:591 million that 800,000-token context is way more than an 80,000 context so this isn't the only issue
00:07:08but kind of off topic we're in a current ecosystem where everyone complains about Claude Code being
00:07:12nerfed and my usage just gets run up automatically there's a number of reasons for that but one of
00:07:18them undoubtedly is the fact that since 1 million context got introduced people have no clue how to
00:07:24manage their own context window and they aren't nearly as aggressive with
00:07:29clearing and resetting the conversation as they should be but that's kind of off topic
00:07:34the point of that whole discussion is that when it comes to memory in this discussion about RAG and
00:07:39Claude Code we have to keep context rot in the back of our mind because we're constantly trying to deal
00:07:44with this tension of okay i want to ingest context so Claude Code can answer questions about a number
00:07:50of things yet at the same time i don't want the context to get too large because then it's worse
00:07:55so we just that always needs to be something we're thinking about in this conversation about memory
00:08:02but to bring this back to the actual video and level one what are people doing at level one the
00:08:06answer is they're not really doing anything and because they're not doing anything they just rely
00:08:10on a bloated context window to remember things so you know you're here when you've never edited
00:08:15a CLAUDE.md file and you've never created any sort of artifact or any sort of file that allows Claude
00:08:23Code to realize what the heck is going on what it's actually done in the past and what it needs
00:08:27to do in the future so what do we need to master at this level well really all you need to
00:08:31master despite everything i wrote here is you just need to understand that auto memory isn't enough
00:08:35and we need to take an active role when it comes to Claude Code and memory because a trap at this level
00:08:40if you don't take an active role is you have no control and we need to control what Claude Code
00:08:44considers when it answers our questions and so to unlock level one and move on to level two
00:08:50we need memory that's explicit and we need to figure out how to actually do that what files do
00:08:57you need to edit and understand that they even exist in order to take an active role in this
00:09:01relationship now level two is all about one specific file and that is the CLAUDE.md file when you learn
00:09:06about this thing it feels like a godsend finally there is a single place where i can tell Claude
00:09:12Code some rules and conventions that i always wanted it to follow and it's going to do it and in
00:09:16fact i can include things that i wanted to remember and it always will and it definitely feels like
00:09:20progress at first so here's a template of a standard CLAUDE.md file for a personal assistant project now
00:09:29Claude Code is going to automatically create a CLAUDE.md file but you have the ability to
00:09:33edit this or even update it on demand by using a command like /init and the idea of
00:09:38this thing is that it is again like the holy grail of instructions for Claude Code for that particular
00:09:43project for all intents and purposes Claude Code is going to take a look at this before any task it
00:09:50executes so if you want it to remember specific things what are you going to do you're going to
00:09:54put it in the CLAUDE.md theoretically it's a bit smaller scale than something like RAG you know we
00:10:00aren't putting you know complete documents in here but it's things you want Claude Code to
00:10:05always remember and conventions you want it to follow so for this one we have an about me section
00:10:09we have a breakdown of the structure of the file system and how we want it to actually operate when
00:10:14we give it commands and like i said because this is referenced on essentially every prompt Claude Code
00:10:18is really good at following this so the idea of like hey i wanted to remember specific things this
00:10:22seems like a great place to put it but we got to be careful because we can overdo it when we look at
00:10:28studies like this one evaluating AGENTS.md and you can swap AGENTS.md for CLAUDE.md
00:10:33they found in the study that these sorts of files can actually reduce the effectiveness of large
00:10:40language models at large and why is that well it's because the thing that makes it so good the fact
00:10:45that it's injected into basically every prompt is what also can make it so bad are we actually
00:10:51injecting the correct context have we pushed through the noise and are we actually giving it a proper
00:10:57signal or are we just throwing in things that we think are good because if it isn't relevant to
00:11:02virtually every single prompt you're going to give in your project should it be here in the CLAUDE.md
00:11:08is this a good way to let Claude Code remember things i would argue no not really and that goes
00:11:15contrary to what a lot of people say about CLAUDE.md and how you should structure it based on studies
00:11:20like that and based on personal experience less is more context pollution is real context rot is real
00:11:26so if something is inside of CLAUDE.md and it doesn't make sense for again virtually every
00:11:32single prompt you give it should it be in there the answer is no but most people don't realize that and
00:11:37instead they fall into this trap of a bloated rulebook instead the skills we should be mastering
00:11:42are how do we create project context that is high signal how do i make sure what i'm actually putting
00:11:48inside this thing makes sense and with that comes the idea of context-rot awareness like we talked
00:11:53about in the last level and you take all that together and level two feels like you've been
00:11:57moving forward like hey i'm taking an active role in memory i have this CLAUDE.md file but then you realize
00:12:02it's not really enough and when we talk about level three and what we can do to move forward there
00:12:08we want to think about not a static rulebook but something that can evolve
00:12:14something that can include CLAUDE.md instead of relying on CLAUDE.md to do everything what if we
00:12:18use CLAUDE.md as sort of like an index file that points Claude Code in the right direction instead
00:12:24so what did i mean about CLAUDE.md acting as sort of an index and pointing towards other files
00:12:30well i'm talking about an architecture within your code base that doesn't just have one markdown file
00:12:37trying to deal with all the sort of memory issues in the form of CLAUDE.md i'm talking about having
00:12:41multiple files for specific tasks i think a great example of this in action is sort of what GSD the
00:12:47Get Shit Done orchestration tool does it doesn't just create one file that says hey this is what
00:12:53we're going to build and these are the requirements and this is what we've done and where we're going
00:12:56instead it creates multiple you can see over here on the left we have a project.md a requirements.md
00:13:02a roadmap and a state file so the requirements exist so Claude Code always knows and has memory of
00:13:08what it's supposed to be building the roadmap breaks down what exactly we are going to be
00:13:12creating not just now but what we've done in the past and in the future and the project gives it
00:13:16memory gives it context of what we are doing at a high level overview what is our north star and by
00:13:22breaking up memory and context and conventions in this sort of system we're fighting against the idea
00:13:29of context rot and the idea brought up in that study which is injecting these files into every
00:13:34prompt all the time like we do in CLAUDE.md it's actually counterproductive it doesn't help us get
00:13:39better outputs furthermore breaking it down into these chunks and having a clear path for Claude
00:13:44Code to go down that says like hey i want to figure out where this information is oh i go to CLAUDE.md
00:13:49oh CLAUDE.md says these are my five options okay here's that one let me go and find it
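As a sketch of that index idea: the file names below follow the GSD-style project/requirements/roadmap/state split described above, but the exact names and paths are illustrative, not a fixed convention.

```markdown
# CLAUDE.md — a thin index, not a rulebook

## Conventions (true for virtually every prompt)
- TypeScript, strict mode; run tests with `npm test`

## Where to look (read only what the task needs)
- PROJECT.md      - north star: what we are building and why
- REQUIREMENTS.md - what must be true when we are done
- ROADMAP.md      - phases: done, in progress, up next
- STATE.md        - updated every session: where we left off
```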
00:13:54that sort of structure is what you're going to see 100% in the follow-on level when we talk about
00:13:58Obsidian and really it is sort of like a crude reimagining of the chunking system and the
00:14:04vector similarity search that we see in true RAG systems but obviously this is kind of small scale
00:14:10at this level we're talking about four markdown files here we're not talking about a system that
00:14:14can handle thousands and thousands and thousands of documents but like you're going to hear me talk
00:14:20about a lot what does that mean for you do you need a system that we're going to talk about levels four
00:14:26five six seven that can handle this many documents the answer is maybe not and so part of this RAG
00:14:32journey is understanding not just where you stand but like where do you actually need to go do you
00:14:36always need to be at level seven and know how to do an agentic RAG system inside of Claude Code it's
00:14:41probably good to know how to do it but it's also just as good to know when you don't need to
00:14:46implement that sometimes what we see in these systems like this is enough for a lot of people
00:14:52so it's just as important to know how to do it and to know like do you need to should you do it
00:14:58when we talk about level three and we talk about state files how do we know we're here
00:15:00well we know we're here when we're still strictly inside the Claude Code ecosystem we haven't
00:15:04integrated outside tools or applications and really we're just at the place where we're just creating
00:15:09multiple markdown files to create our own homemade sort of like memory chunking system
00:15:14but this still is really important we're still mastering some true skills here the idea of like
00:15:18actually structuring docs having some sort of system in place that updates state at every
00:15:23session because this can be a problem with RAG too like how do you make sure everything is up to
00:15:28date and chances are you're also starting to lean into orchestration layers at this point things like
00:15:33GSD and Superpowers that do things like this multi markdown file architecture on their own but
00:15:40there is a real trap here what we create in this project is very much just for that project it's
00:15:46kind of clunky to then take those markdown files and shift them over to another project so level
00:15:51four is where we bring in Obsidian and this is a tool that has been getting a ton of hype
00:15:56and for good reason when you have people like Andrej Karpathy talking about these
00:16:00LLM knowledge bases they've created which are built for all intents and purposes on an Obsidian
00:16:06foundation it's getting almost 20 million views we should probably listen and see how this is actually
00:16:11operating now for context i've done a full deep dive on this Obsidian Andrej Karpathy LLM knowledge
00:16:18base i'll link that above so if you want to focus on that how to build that make sure you check that
00:16:22out above and what i also want to mention to most people is that this obsidian thing we're going to
00:16:27talk about right here in level four this is honestly the level most people should strive
00:16:32for because this is enough for most people in most use cases when we talk about levels five six and
00:16:37seven we're going to talk about true RAG structures and to be honest it's overkill for most people this
00:16:43is overkill for most people like we love talking about RAG like it's great i understand that but
00:16:50Obsidian is that 80% solution that in reality is like a 99% solution for most people because it's free
00:16:56there's basically no overhead and it does the job for the solo operator and when i say it does the job
00:17:02for the solo operator i mean it solves the problem of having Claude Code connected to a bunch of
00:17:07different documents a bunch of different markdown files and being able to get accurate timely
00:17:13information from it and having insight to those documents as the human being because when i click
00:17:19on these documents it's very clear what is going on inside here and it's very clear what documents
00:17:24are related to it when i click these links i'm brought to more documents when i click these links
00:17:30i'm brought to more documents and so for me as the human being having this insight is important
00:17:36because to be totally honest the Obsidian based insight into the documents i would argue trumps
00:17:42a lot of the insight you get from the RAG systems when we talk about thousands and thousands of
00:17:47documents being embedded in something like a GraphRAG system like this looks great visually
00:17:52looks very stunning do you actually know what's going on inside here maybe you do to be honest
00:17:58you're kind of just relying on the answers you get and the links it will show but it's a
00:18:03bit hard to piece through the embeddings for sure all that to say is you should pay special
00:18:08attention to Obsidian and Claude Code because when we talk about this RAG journey i always suggest
00:18:13to everybody clients included like let's just start with Obsidian and see how far we can scale this and
00:18:20eventually if we do hit a wall you can always transition to more robust RAG systems so why not
00:18:26try the simple option if it works great it's free costs me no money versus like let's try to knock out
00:18:31this RAG system which can be kind of difficult to put into production depending on what you're trying
00:18:35to do like always start with the simple stuff it's never too hard to transition to something more
00:18:40complicated so what are we really talking about here in level four what we're talking about taking
00:18:45sort of that structure we began to build in level 3 you know with an index file pointing at different
00:18:50markdown files and just scaling that up and then bringing in this outside tool Obsidian to make it
00:18:56easy for you the human being to actually see these connections and the platonic ideal of this version
00:19:00is pretty much what Andrej Karpathy laid out in building an LLM knowledge base on top of Obsidian
00:19:05and powered by Claude Code and what that looks like is a structure like this so when you use Obsidian
00:19:11and you download it (it's completely free) again reference that video i posted earlier you set a
00:19:16certain folder as the vault think of the vault as sort of like the RAG system this quasi-RAG
00:19:23system you've created and inside of the vault we then architect it we structure it just with
00:19:30files so we have the overarching folder called the vault and inside that vault we create multiple
00:19:36subfolders in Andrej Karpathy's case he talks about three different subfolders the reality is they
00:19:41could be any subfolders they just sort of need to match the theme we're going to talk about in one
00:19:47folder we have the raw data this is everything we are ingesting and eventually want to structure so
00:19:52that Claude Code can reference it later think of you know you have Claude Code do competitive analysis on
00:19:5850 of your competitors and it pulls 50 sites for each right we're talking about a large amount of
00:20:03information it's probably 2500 different things all that will get dumped into some sort of raw folder
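A sketch of how such a vault might be laid out on disk, following the raw/wiki split being described here: folder and file names are illustrative, not a fixed convention.

```
vault/                    # the Obsidian vault Claude Code works inside
  CLAUDE.md               # conventions: how raw data becomes wiki articles
  raw/                    # staging area: scraped pages, transcripts, dumps
    competitor-sites/
  wiki/
    index.md              # master index: every topic that exists in the wiki
    ai-agents/
      index.md            # per-article index
      ai-agents.md        # the structured wikipedia-style article
```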
00:20:08this is like the staging area for the data we then have the wiki folder the wiki folder is where the
00:20:14structured data goes so we then have Claude Code take this raw data and structure it into essentially
00:20:20different like wikipedia type articles inside of the wiki folder each article gets its own folder so
00:20:28the idea being when you then ask Claude Code information about you know let's say we had it
00:20:33search for stuff about AI agents and i say hey Claude Code talk to me about AI agents the same way
00:20:38you would query a RAG system well Claude Code is going to go to the vault from the vault it's going
00:20:45to go to the wiki the wiki has a master index markdown file think of sort of what we
00:20:50talked about doing with CLAUDE.md before right you see how these sort of themes transition
00:20:56throughout the different levels it takes a look at that master index the master index tells it what
00:21:00exists in the Obsidian RAG system oh AI agents exist cool guess what's going on down here it also
00:21:08has an index file which talks about the individual articles that exist what am i saying here i am
00:21:14saying there is a clear hierarchy for Claude Code to reference when it wants to find information about
00:21:21files vault wiki index article etc so because it is so clear how to find information and also so
00:21:31clear how to first ingest information and turn it into wiki articles we can create a system that has a lot of
00:21:37documents without RAG hundreds or thousands if you do this properly because if the system is clear hey i
00:21:44check the vault and i check the index and that has a clear delineation of like where everything is well
00:21:50then it's not too hard for Claude Code to figure out where to find stuff and so you can get away with a
00:21:54non-RAG structure for thousands of documents and it's been really hard to do that in the past and
00:21:58that's because most people don't structure anything with any sort of structure they just have a billion
00:22:02documents sitting in one folder it's the equivalent of having 10 million files strewn across the factory
00:22:08floor i mean like well Claude Code find it like no you actually just need a filing cabinet Claude
00:22:13Code is actually pretty smart and you can see that architecture in action right here so right now we're
00:22:17looking at a CLAUDE.md file that is in an Obsidian vault and what does it say well it breaks down the
00:22:24vault structure the wiki system you know the overall structure of the subfolders and how to
00:22:30essentially work it right so again we're using CLAUDE.md as a conventions type file over here on the left
00:22:36you can see the wiki folder inside the wiki folder is a master index and it lists what is inside of
00:22:43there in this case there's just one article it's on Claude managed agents inside that folder we see
00:22:49Claude managed agents it has its own wiki folder breaking down the articles inside until you get
00:22:55to the actual article itself so very clear the steps it needs to take and so when i tell Claude Code
00:23:01talk to me about the managed agents we have a wiki on it it's very easy for it to search for it via
00:23:06its built-in grep tool it links me the actual markdown file and then breaks down everything
00:23:12that's happening now the question at level four really becomes a question of scale how many documents
00:23:16can we get away with where this sort of system continues to work is there a point at which Andrej
00:23:22Karpathy's system begins to fall apart where hey like i get it it's a very clear path that Claude
00:23:26Code needs to follow it goes to the indexes yada yada yada does that sustain itself at
00:23:312,000 documents 2,500 3,000 is there a clear number the answer is we don't really know and there isn't
00:23:37a universal number because all your documents are also different and in terms of hitting a wall it isn't
00:23:43just as simple as well Claude Code's giving us the wrong answers it has too many files in the
00:23:47Obsidian system how much is it costing you in terms of tokens now that we've added so many files and how
00:23:52quickly is it doing it because RAG can actually be infinitely faster and cheaper in certain situations
00:23:59what we're looking at here is a comparison between textual LLMs right in the giant bars and textual
00:24:06RAG in terms of the amount of tokens it took to get the correct answer and the amount of time it
00:24:11took to get that answer what do we see here we see that between textual RAG and textual LLMs there's a
00:24:18massive difference to the tune of like 1,200 times i'm saying RAG is 1,200 times cheaper and 1,200
00:24:25times faster than textual LLMs in these studies now for context this was done in 2025 this is not done with
00:24:33Claude Code these models have changed significantly since then these are just straight up LLMs this
00:24:37isn't a coding harness etc etc etc however we were talking a 1,200x difference so when we're evaluating
00:24:48hey is Obsidian what i should be doing versus should i be doing a RAG system it isn't as simple as
00:24:54just well is it giving the right answer or not because you could have a scenario
00:24:59where you get the right answer with Obsidian yet if you went to RAG it's a thousand times cheaper
00:25:04and faster right so it's this very fuzzy line between when is Obsidian good enough and these
00:25:10sort of like just-markdown-file architectures good enough versus when we need to use RAG
00:25:15there's not a great answer i don't have a great answer for you the answer is you have to experiment
00:25:18and you need to try both and see what works because this is frankly out of date totally like 2025 older
00:25:25models the difference between RAG and textual LLMs is probably not 1,200 times anymore but how much has that gap shrunk
00:25:32because that is an insane gap that isn't like 10x it's 1,200x so there's a lot you have to know and
00:25:39again you won't know the answer ahead of time you just won't you can watch every video you want
00:25:45no one's going to tell you where that line in the sand is you literally just need to experiment
00:25:49and see what works for you as you increase the amount of documents you're asking Claude Code
00:25:54to answer questions about so on that note let's move on to level five which is where we finally
00:25:59begin to talk about real RAG systems and talk about some of the RAG fundamentals like embeddings
00:26:04vector databases and how data actually flows through a system when it becomes part of our
00:26:10RAG knowledge base so let's begin by talking about naive RAG which is the most basic type of RAG out
00:26:16there but it provides the foundation for everything else we do now you can kind of think of RAG systems
00:26:21being broken out into three parts on the left hand side we have the embedding stage we then
00:26:27have the vector database and then we have the actual retrieval going on with the large language
00:26:33model so one two and three and to best illustrate this model let's start with sort of the journey of
00:26:40a document that is going to be part of our knowledge base remember in a large rag system we could be
00:26:45talking about thousands of documents and in each document could be thousands of pages but in this
00:26:50example we have a one-page document that we're talking about now if we want to add this document
00:26:56to our database the way it's going to work is it's not going to be ingested as a whole unit instead we
00:27:03are going to take this document and we are going to chunk it up into pieces so this one pager
00:27:08essentially becomes three different chunks these three chunks are then sent to an embedding model
00:27:15and the job of the embedding model is to take these three chunks and turn it into a vector
00:27:21in a vector database now a vector database is just a different variation of your standard database
00:27:27when we talk about a standard database think of something like an excel document right you have
00:27:32columns and you have rows well in a vector database it's not two-dimensional columns and rows it's
00:27:37actually hundreds if not thousands of dimensions but for the purposes of today just think of a
00:27:43three-dimensional graph like you see here and the vectors are just points in that graph and each
00:27:50point is represented by a series of numbers so you can see here we have bananas and bananas is
00:27:57represented by 0.52 5.12 and then 9.31 you see that up here now that continues for hundreds of numbers
00:28:06now where each vector gets placed in this giant multi-dimensional graph depends on its semantic
meaning what do the words actually mean so you can see over here this is like the fruit
00:28:19section we have bananas we have apples we have pears over here we have ships and we have boats
00:28:24so going back to our document let's imagine that this document is about world war ii ships
00:28:31so each of these chunks is going to get turned into a series of numbers and those series of numbers
00:28:37will be represented as a dot in this graph where do you think it's going to go well they'll probably
00:28:42go around this area right so that would be one two and three so that's how documents get placed every
00:28:49document is going to get chunked each chunk goes through the embedding model and the embedding model
00:28:54inserts them into the vector database repeat repeat repeat for every single document and in the end
00:28:58after we do that several thousand times we get a vector database which represents our knowledge
graph so to speak our knowledge base and that moves us on to step three which is the retrieval
part so where do you play into this well normally let's depict you we'll give you a
different color you get to be pink so this is you all right you normally just talk to
claude code and you ask claude code questions about world war ii battleships well in your standard
00:29:29non-rag setup what's going to happen well the large language model opus 4.6 is going to take a
00:29:34look at its training data and then it's going to give you an answer based on its training data
00:29:39information about world war ii battleships but with a rag system it's going to do more it's going to
00:29:44retrieve the appropriate vectors it's going to use those vectors to augment the answer it generates
00:29:51for you hence retrieval augmented generation that's the power of rag it allows our large language
00:29:56models to pull in information that is not a part of its training data to augment its answer in this
00:30:02example world war ii battleships yes i understand the large language model already knows that but
00:30:06replace this with any sort of proprietary company data that isn't just available for the web and do
it at scale that's the sell for rag now in our example when we ask claude code for
00:30:21information about world war ii battleships and it's in a rag setup what it's going to do is it's going
00:30:25to take our question and it's going to turn our question into a series of numbers similar to the
00:30:32vectors over here it is then going to take a look at what the number is for our question and the numbers
00:30:39of the vectors and it's going to see which of these vectors most closely matches the questions vector
00:30:46right how similar are the vectors to the question pretty much and then it's going to pull a certain
00:30:51amount of vectors whether that's one two three four or five or ten or twenty and it's going to pull
00:30:56those vectors and their information into the large language model so now the large language model has
00:31:02its training data answer plus say 10 vectors worth of information right that was the retrieval part
00:31:09and then it augments and generates an answer with that additional information and that is how rag
00:31:13works that is how naive rag works now this is not particularly effective for a number of reasons this
00:31:19very basic structure kind of falls apart at the beginning when we begin to think about okay how
00:31:25are we chunking up these documents is it random is it just off a pure number of tokens do we have
00:31:31a certain number of overlap are the documents themselves set up in a way where it even makes
00:31:36sense to chunk them because what if you know chunk number three is referencing something in chunk
00:31:42number one and then our vector situation when we pull the chunks what if it doesn't get the right
one what if it doesn't get that other chunk that's required as context to even make sense of what number
00:31:53three says you get what i'm saying like very often the entire document itself is needed to answer
00:31:59questions about said documents so this idea of getting these piecemeal answers doesn't really
00:32:05work in practice yet this is how rag was set up for a long long time other issues that can come into
00:32:10play are things like what if i have questions about the relationships between different vectors because
00:32:17right now i kind of just pull vectors in a silo but what if i wanted to know how boats related to
00:32:22bananas sounds random but what if i did you know this standard sort of vector database naive rag
00:32:31approach everything's kind of in a silo it's hard to connect information and a lot of it just depends
00:32:36on how well those original documents are even structured are they structured in a manner that
00:32:41makes sense for ragging now over the years we've come up with some ways to alleviate these issues
00:32:46things like re-rankers or ranking systems that take a look at all the vectors we grab and essentially
00:32:51then do another pass on them with a large language model to rank them in terms of their relevance but
00:32:56by and large this naive rag system has kind of fallen out of vogue yet it's still important to
00:33:03understand how this works at a foundational level so it can inform your decisions if you go for a
00:33:07more robust rag approach because if you don't understand how chunking or embeddings even work
00:33:13how can you make decisions about how you should structure your documents when we talk about
something like graphrag or we talk about more complicated embedding systems like the brand
00:33:22new one from google which can actually ingest not just text but videos and if you don't understand
00:33:27this sort of foundation it's hard for you to actually understand this trap and the trap is that
00:33:31we've kind of just created a crappy search engine because with these naive rag systems where all we
00:33:36do is grab chunks and we can't really understand the relationships between them how is that different
from basically just having an overcomplicated control f system the answer is there's really not
much of a difference which is why with these simplistic kind of outdated rag
00:33:54structures that actually are still all over the place if you see someone who's like oh here's my
pinecone rag system or here's my supabase rag and they don't mention anything about graphrag
00:34:03or they don't mention anything about like hey here's how we have like the sophisticated re-ranker
system these are gonna suck to the tune of like oh the actual effectiveness of this is like
25% of the time you get something right like you're almost better off guessing so if you don't know that
00:34:18going in you can definitely be sort of hoodwinked or confused or in some cases like basically scammed
00:34:23into buying these rag systems that do not make sense and so level five isn't about implementing
00:34:28these sort of naive rag systems it's about understanding how they work so that you when it
00:34:34comes time to implement something more sophisticated you actually understand what's going on because
00:34:38that five-minute explanation of rag is sadly not something most people understand when they say i
00:34:43need a rag system well do you because you also have to ask yourself what kind of questions are you
actually asking about your system if you're just you know essentially treating your knowledge
00:34:54base as a giant rule book and you just need specific things from that knowledge system
00:34:59brought up well then obsidian is probably enough or you could probably even get away with a naive
00:35:02rag system but if we need to know about relationships if we need to know about how x interacts with y and
00:35:09they're two separate documents they never even really mention each other and it's not something
00:35:13i can just stick inside the context directly because i have thousands of said documents well that is
where you're going to need rag and that's when you're going to need something more sophisticated
00:35:23than basic vector rag that is when we need to start talking about graphrag so when we talk about level
six of claude code and rag we're talking about graphrag and we're talking about this and in my
00:35:34opinion if you are going to use rag this is sort of the lowest level of infrastructure you need to
00:35:39create this is using light rag which is a completely open source tool i'll put a link above where i
00:35:44break down exactly how to use it and how to build it but the idea of graphrag is pretty obvious it's
00:35:50the idea that everything is connected this isn't a vector database with a bunch of vectors in a silo
00:35:55this is a bunch of things connected to one another right i click on this document i can see over here
00:36:00on the right and i'll move this over you know the description of the vector the name the type the
00:36:05file the chunk and then more importantly the different relationships and this relationship
00:36:10based approach results in more effective outcomes here is a chart from light rags github this is
00:36:15about i would say six to eight months old and also of note light rag is the lightest weight graphrag
00:36:23system out there that i know of there's some very robust versions including graph rag itself from
microsoft it's literally called graphrag but when we compare naive rag to light rag
00:36:35across the board we get jumps of oftentimes more than 100 percent right 31.6 versus 68.4
24 versus 76 24 versus 75 on and on and on and that being said according to light rag it
actually holds its own and beats out graphrag itself but hey these are light rag's numbers so
take them with a grain of salt now when we look at this knowledge graph system right away your mind
00:36:58probably goes to obsidian because this looks very similar however what we're looking at here in
00:37:04obsidian is way more rudimentary than what's going on inside of light rag or any graph rag system
00:37:10because this series of connections we see here this is all manual and somewhat arbitrary it's only
connected because we set related documents or where claude code set related documents when it generated
00:37:22this particular document for example just added a couple brackets boom that document's connected
00:37:27so in theory i could connect a bunch of random documents that in reality have nothing to do with
one another now because claude code isn't stupid it's not going to do that but that's a lot different
00:37:35than what went on here like this went through an actual embedding system it looked at the actual
content it set a relationship it set an entity there's a lot more work going on here inside of
00:37:46light rag in terms of defining the relationships than obsidian now does that difference actually
equate to some wild gap in terms of the performance at a low scale probably not though at a huge scale maybe again
00:38:02we're in sort of that gray area kind of depends on your scale and what we're actually talking about
00:38:07and nobody can answer that question except you and some personal experience but understand these two
00:38:13things are not the same we are not the same brother two totally different systems one is pretty
00:38:20sophisticated one's pretty rudimentary understand that and so to wrap up level six in graph rag
we're really here when we've decided hey stuff like obsidian isn't working we can't use
something like naive rag because it just doesn't work and we need something that can extract entities
00:38:36and relationships and really leverage the sort of hybrid vector plus graph query system design
00:38:43but there are some traps there are some serious roadblocks even here at level six when we talk
00:38:48about light rag this is just text what if i have scannable pdfs what if i have videos what if i have
00:38:55images we don't live in a world where all your documents are just going to be google docs and
00:39:01so what do we do in those instances so multimodal retrieval is a huge thing and on top of that what
00:39:06about bringing some more agentic qualities to these systems give it a little more ai power some sort of
00:39:11boost in that department well if we're talking about things that are multimodal then we can finally move
00:39:17to sort of like the bleeding edge of rag in today's day and age as of april 2026 that's what level 7 is
00:39:24all about now when we talk about level 7 in agentic rag the big thing we kind of want to index on here
00:39:31is things that have to do with multimodal ingestion now we've done videos on these things things like
00:39:36rag anything which allow us to import images and non-text documents again think scannable pdfs
00:39:44into structures like the light rag knowledge graph you saw here we also have new releases like gemini
embedding too which just came out in march which allows us to actually embed videos themselves
into our vector database and this is frankly where the space is going it's not enough to just do text
00:40:01documents how much information how much knowledge is trapped on the internet especially on places
like youtube which are just purely video and we want more than just a transcript as well a transcript
00:40:10doesn't do enough so this sort of multimodal problem is real and again this is stuff that
00:40:16just came out weeks ago and level 7 is also where we need to start paying attention to our
00:40:20architecture and pipelines when it comes to the data going in and out of our rag system it's not
00:40:25enough to just get data in here like this is great you know okay we have all these connections and
stuff how is the data getting there in the context of a team how
00:40:35is data getting out of there like what if some of the information here has changed in a particular
00:40:40document what if somebody edits it how does it get updated what if we add duplicates who can actually
00:40:46put these things in there when it comes to production level stuff these are all questions
00:40:50you need to begin to ask yourself and so when we look at an agentic rag system like this one from
00:40:54n8n you can see the vast majority of the infrastructure everything outlined here is all about
data ingestion and data syncing there's only a very small part that has anything to do with rag which is
00:41:06right there because we need systems that clean up the data that are able to look at okay we just
ingested this document in fact this was version 2 replacing version 1 can we now go back and clean that old data
00:41:17here's something like a data ingestion pipeline where documents don't get directly put into the
00:41:21system or in light rag we instead put it inside of like a google drive and from there it gets ingested
00:41:27into the graph rag system and logged these are the sort of things that will actually make or break
00:41:31your rag system when you're using it for real and when we talk about agentic rag you can see here and
i know this is rather blurry but if we have an ai agent running this whole program so imagine
you set up some sort of chatbot for your team does it always need to hit this database the answer is
00:41:49probably not chances are in a team setting in a business setting you're going to have information
00:41:54that's in a database like this like text or something but you probably also have another set
00:41:58of databases like just standard postgres databases with a bunch of information you want to query
00:42:03with sql as well so when we talk about an agentic rag system we need something that has all of that
00:42:08the ability to intelligently decide oh am i going to be hitting the graph rag database represented
00:42:15here or am i just going to be doing some sort of sql queries in postgres these things can get
00:42:20complicated right and all of this is use case dependent which is why it's kind of hard to
00:42:23sometimes make these videos and try to hit every single edge case the point here at level 7 is not
that there's necessarily some super rag system you've never heard of it's that the devil's in
the details here and that's really mostly the data ingestion piece and keeping it up
00:42:39to date but also like how do you actually access this thing easy to do in a demo right here oh we
00:42:46just go to the light rag thing and i go to retrieval and i ask it questions different scenario when
00:42:50we're talking about it with a team and everyone's approaching it from different angles and you
00:42:55probably don't want everyone to have access to actually uploading it to light rag itself on a
00:43:01web app that being said for the solo operator who is trying to create some sort of sophisticated rag
00:43:07system that is able to do multimodal stuff i would suggest the rag anything plus light rag combination
i've done a video on that and if i haven't linked that already i'll link it above i suggest that for a few
00:43:19reasons one it's open source and it's lightweight so it's not like you're spending a bunch of money
00:43:26or time to spin something like this up to make sure it actually makes sense for your use case
again the thing we want is we don't want to get stuck in systems where there's no way out and
00:43:37we spent a bunch of money to get there which is why i do love obsidian and i always recommend things
00:43:42like light rag and rag anything because hey if you try this out it doesn't work for you it doesn't
00:43:45make sense okay whatever you wasted a handful of hours you know it's not like you are spending a
bunch of money on microsoft's graphrag which in no way is cheap and so when do you know
00:43:56you're in level 7 really multimodal stuff like you need to index images tables and videos and you're
00:44:02integrating some sort of agent system where it can intelligently decide like which path it goes down
00:44:06to answer information because at level 7 you're probably integrating all this stuff you probably
have a claude.md file with some permanent information you probably have it in a code base with some
markdown files that sort of make sense for easy retrieval perhaps you're also including obsidian
in some sort of vault plus you probably have some section of documents that are in a graph rag
database and you have a top of the funnel ai system that can decide if they ask this question i go down
00:44:33this route that's a mature sort of memory architecture that i would suggest but what's the trap here the
00:44:40trap honestly is trying to force yourself into this level and this sort of sophistication when it's
just not needed to be honest after all this most of you are fine with obsidian it's more than enough
00:44:52you don't need graph rag you really don't need rag in general and if it's not obvious that you
00:44:57need level 7 and certainly if you haven't already tried the obsidian route you don't need to be here
it's probably a waste of your time but the whole point of this video to the best of my ability
was to expose you to what i see as the different levels of rag and memory and claude code and what
00:45:12this problem is what some of the tensions are what the trade-offs are and where you should probably be
00:45:18for your use case and again the biggest thing is just experiment you don't have to know the answer
00:45:24before you get into this just try them out and i would try in ascending order if you can get away
with just markdown files in a claude code setup and it's basically just claude.md on steroids sweet go ahead
00:45:34and then try obsidian if obsidian is not enough try light rag and so on and so forth so that is
00:45:39where i'm going to leave you guys for today if you want to learn more especially about the production
00:45:43side of rag like how to spin this up for a team or package it for a client we have a whole module
00:45:47on that inside of chase ai plus so check that out other than that let me know what you thought
00:45:52i know this was a long one and i will see you around

Key Takeaway

Progressing through seven levels of memory architecture—from native auto-memory to agentic multimodal RAG—mitigates context rot and reduces costs, though a structured Obsidian vault remains the most efficient 80% solution for solo operators.

Highlights

RAG systems can be 1200 times cheaper and faster than standard large language models by retrieving only relevant vectors instead of processing massive context windows.

Context rot significantly degrades performance once 256k tokens are reached in a 1 million context window, dropping accuracy from 92% to 78%.

LightRAG improves performance by over 100% compared to naive RAG by mapping entity relationships rather than treating text chunks as isolated silos.

A hierarchical folder structure in Obsidian using a 'Wiki' index allows Claude Code to navigate thousands of documents effectively without complex vector databases.

Gemini 1.5 Pro and 2.0 models now support multimodal embedding, allowing AI systems to ingest and query raw video files alongside text-based documentation.

Timeline

Native Auto-Memory and the Context Rot Problem

  • Claude Code automatically creates markdown files in a hidden memory folder to track user goals and project states.
  • Relying on a 1 million token context window leads to context rot where AI effectiveness drops as the session grows.
  • Users often avoid clearing terminal sessions due to a fear of the AI forgetting previous conversation context.

Native memory systems function like digital post-it notes but lack the sophistication needed for complex engineering tasks. Accuracy benchmarks show a 14% performance decline as the context window fills up. Effective workflows require an active role in managing context rather than letting the session bloat indefinitely.

Optimizing Project Instructions with claude.md

  • The claude.md file serves as a global instruction set that the AI references before every executed task.
  • Injecting too much irrelevant information into claude.md creates noise that reduces the model's focus on specific prompts.
  • High-signal project context focuses only on conventions and rules that apply to virtually every single interaction.

Studies on agents.md files reveal that bloated rulebooks can actually hinder performance. A successful configuration uses claude.md as a lean index rather than a repository for every project detail. This level transitions the user from passive memory reliance to intentional context control.
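As an illustration of that "lean index" idea, a minimal claude.md might look like the following (the contents and file names are hypothetical, not taken from the video):

```markdown
# claude.md — lean global instructions (hypothetical example)

## Conventions that apply to every task
- Python 3.11, type hints required, run tests before committing

## Index — where detail lives (do not inline it here)
- Product goals: project.md
- Feature specs: requirements.md
- Upcoming work: roadmap.md
```

The point is that the file points at detail rather than containing it, keeping every prompt high-signal.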

State Management and Multi-File Architectures

  • Dividing memory into specialized files like project.md, requirements.md, and roadmap.md prevents context pollution.
  • Orchestration tools like GSD automate the updating of state files at the end of every coding session.
  • A multi-file architecture mimics a crude version of the chunking systems used in professional RAG setups.

Separating high-level 'north star' goals from granular technical requirements helps the AI maintain focus. This modular approach ensures the most relevant data is available without flooding the prompt with unnecessary detail. It solves the issue of project-specific memory but remains difficult to scale across multiple disparate projects.

The Obsidian Vault as an LLM Knowledge Base

  • An Obsidian-based vault provides a 99% solution for solo operators by connecting the AI to thousands of structured markdown files.
  • A hierarchical Wiki structure utilizes master index files to guide the AI through a clear path from raw data to structured articles.
  • RAG-based retrieval is up to 1200 times faster than reading full documents within a standard LLM context window.

Obsidian serves as a free, low-overhead alternative to expensive vector databases. By using a 'Raw Data' staging folder and a 'Wiki' folder for structured insights, users can maintain human-readable documentation that the AI can easily query via grep tools. This architecture supports thousands of documents while keeping token costs significantly lower than long-context prompts.
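The grep-style lookup described above can be sketched in a few lines. This is a minimal stand-in (the function name and vault layout are invented for illustration), not how Claude Code's actual tools are implemented:

```python
from pathlib import Path

def grep_vault(vault_dir: str, keyword: str, context_chars: int = 80) -> list[tuple[str, str]]:
    """Scan every markdown file under the vault for a keyword and return
    (file path, surrounding snippet) pairs -- a plain-text stand-in for
    grep-based navigation over an Obsidian vault."""
    hits = []
    for md_file in sorted(Path(vault_dir).rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        idx = text.lower().find(keyword.lower())
        if idx != -1:
            # keep a window of text around the match as the retrieved snippet
            start = max(0, idx - context_chars)
            hits.append((str(md_file), text[start:idx + len(keyword) + context_chars]))
    return hits
```

Because the vault is just markdown on disk, a lookup like this costs no tokens until the matching snippets are actually handed to the model.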

Naive RAG Fundamentals and Vector Databases

  • Naive RAG works by chunking documents, turning them into numerical vectors, and storing them in multi-dimensional space.
  • The system retrieves relevant text chunks based on semantic similarity to the user's query vector.
  • Basic vector search often fails to capture the relationships between different documents or sections.

The transition to true RAG involves embedding models that map the semantic meaning of text into a vector database. While this allows for proprietary data ingestion at scale, naive systems act as glorified 'Control+F' tools because they lack relational awareness. Understanding these foundations is necessary before implementing more advanced graph-based systems.
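To make the chunk → embed → retrieve loop concrete, here is a toy sketch. The bag-of-words `embed` is a stand-in for a real embedding model (which would return hundreds of dense dimensions), so only the shape of the pipeline, not its quality, is representative:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks -- naive on purpose:
    no overlap, no respect for sentence boundaries (the weakness noted above)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words vector standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query vector and return the top k --
    the 'retrieval' step that augments the model's generation."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A query about battleships lands nearest the battleship chunk even though the words are not identical, which is the whole trick, and also why a chunk that depends on another chunk for context can silently be left behind.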

GraphRAG and Relational Deep Linking

  • GraphRAG connects data points through explicit relationships and entities rather than just spatial similarity.
  • LightRAG provides a lightweight, open-source framework that outperforms standard naive RAG by over 100% in accuracy metrics.
  • Relational approaches are essential for answering questions about how different, seemingly unrelated documents interact.

Unlike Obsidian's manual linking, GraphRAG automatically extracts entities and maps their connections during the embedding process. This allows the AI to traverse a network of information to provide more holistic answers. It represents the minimum viable infrastructure for high-scale enterprise knowledge management.

Level 7: Agentic and Multimodal Retrieval

  • Modern RAG systems utilize multimodal embeddings to ingest scannable PDFs, images, and raw video content.
  • Agentic RAG architectures use a 'top-of-funnel' AI to decide whether to query a graph database or a SQL database.
  • The primary challenge at the highest level is maintaining data ingestion pipelines and ensuring data sync across a team.

The bleeding edge of memory involves AI agents that intelligently choose the retrieval path based on the query type. Tools like 'Rag Anything' combined with LightRAG allow for the processing of non-textual data trapped in videos or images. For most users, this level is overkill, but it is necessary for production environments requiring high data integrity and complex multimodal insights.
