The 7 Levels of Claude Code & RAG

Chase AI
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00Let's solve the problem of Claude Code and memory getting AI systems to reliably and accurately
00:00:06answer questions about past conversations or giant troves of documents is a problem we have been
00:00:13trying to solve for years and the typical response has been rag retrieval augmented generation and
00:00:20while this video is titled the seven levels of Claude Code and RAG what this video is really about
00:00:26is deconstructing that problem of Claude Code and really AI systems in general and memory and even
00:00:33more importantly this video is about giving you a roadmap that shows you where you stand in this
00:00:37fight between AI systems and memory and what you can do to get to the next level. So as we journey
00:00:43through these seven levels of Claude Code and RAG we are going to hit on a number of topics but we
00:00:48are not going to start here in GraphRAG or anything complicated we're going to start at the beginning
00:00:53which is just the basic memory systems that are native to Claude Code because sad as it is to say
00:00:59this is where most people not only begin but it's where they stay from auto memory and things like
00:01:04CLAUDE.md we're going to move to outside tools things like Obsidian before we eventually find ourselves
00:01:10with the big boys with the true RAG systems at these levels we'll talk about what RAG actually is
00:01:16how it works the different types of RAG naive RAG versus GraphRAG versus agentic RAG things like
00:01:21re-rankers and everything in between and at each level we're going to break it down in the same
00:01:25manner we're going to talk about what to expect at that level the skills you need to master the
00:01:29traps you need to avoid and what you need to do to move on to the follow-on level what this video
00:01:34will not be is a super in-depth technical explanation of how to necessarily set up these
00:01:40specific systems because i've already done this in many instances when we talk about GraphRAG and
00:01:45LightRAG for example or even more advanced topics like RAG-Anything and these different sorts of
00:01:50embedding systems i've done videos where i break down from the very beginning to the very end how
00:01:55to set that up yourself so when we get to those sections i will link those videos and this is for
00:02:00both our sakes so this video isn't five hours long but for those levels we're still going to talk
00:02:04about what that actually means what each system buys you and when you should be using it but before
00:02:09we start with level one a quick word from today's sponsor me so just last month i released the Claude
00:02:15Code masterclass and it is the number one way to go from zero to AI dev especially if you don't come
00:02:21from a technical background and this masterclass is a little bit different because we focus on a
00:02:25number of different use cases to learn how to use Claude Code one of those is something like production
00:02:31level RAG how to build the RAG systems you're going to see in this video in a real life scenario and
00:02:37actually use it as a member of a team or sell it to a client that's the kind of stuff we focus on so
00:02:42if you want to get access you can find it inside of Chase AI Plus there's a link to that in the pinned
00:02:47comment and we'd love to have you there so now let's start with level one and that's auto memory
00:02:51these are the systems that Claude Code automatically uses to create some sort of memory apparatus to
00:02:58actually remember things that you've talked about and you know you're here if you've never set
00:03:02anything up intentionally to help Claude Code remember context in general about previous conversations
00:03:09or just stuff that's going on in your code base and when we talk about auto memory that is quite
00:03:13literally what it is called the auto memory system which is automatically enabled when you use Claude
00:03:18Code essentially allows Claude Code to create markdown files on its own that sort of list out
00:03:26things it thinks are important about you in that particular project and this is purely based off
00:03:32of its own intuition based on your conversations and i can see these memory files it's created again
00:03:37it does this on its own if you go into your .claude folder you go into projects you will see a
00:03:42folder there that is called memory and inside that folder you will see a number of markdown files here
00:03:47there are four of them and they're like Claude Code's version of post-it notes saying oh yeah he mentioned
00:03:51this one time about his youtube project growth goals let's write that down and inside of everyone's
00:03:59memory folder there will be a memory.md file so you see in this memory file it has a little note about
00:04:04one of my skills and then it has you know essentially an index of all these sub memory files saying
00:04:09hey there's a youtube growth one in here a revenue one or references one and here's what's inside of
00:04:13it so if i'm just talking to Claude Code in my vault file and i mention something about youtube and sort
00:04:19of my goals with growth whatever it's going to reference this and say oh yeah chase is trying to
00:04:23get you know x amount of subscribers by the end of 2026. it's cute but ultimately it's not that useful
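Based on the description above, the auto-memory layout looks roughly like this. A minimal sketch: the folder names follow what is described in the video, but the individual note files are illustrative examples, since Claude Code names them on its own.

```
.claude/
  projects/
    <your-project>/
      memory/
        memory.md           # index: a short note plus a list of the sub-memory files
        youtube-growth.md   # topic notes Claude Code wrote by itself (names illustrative)
        revenue.md
        references.md
```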
00:04:30it's kind of like when you're inside of ChatGPT and it will bring up random stuff about
00:04:35previous conversations and it almost like shoehorns it in it's like okay i get it you remembered this
00:04:40but i don't really care and honestly it's a little weird to keep bringing that up i prefer if you
00:04:44didn't and unfortunately this is where most people stay in their memory journey and it's built upon a
00:04:49somewhat almost abusive past that we all have when it comes to using these chatbots
00:04:54because these chatbots don't have any sort of real memory from conversation to conversation and so
00:05:00we're always scared to death of having to exit out of a chat window or exit out of a terminal session
00:05:06because you think oh my gosh it's not going to remember my conversation and this is actually a
00:05:10real problem because what is everybody's answer to the chat window not being able to remember anything
00:05:17well the answer is you just keep that conversation going forever because you don't want to get to a
00:05:22scenario where you have to exit out and it forgets everything this is a fear that is born here inside
00:05:26of these chat windows beginning with ChatGPT and same thing with Claude's web app and honestly it used
00:05:31to be infinitely worse with Claude's web app because i think we all remember before the days of the 1
00:05:35million context window where you would have like 30 minutes to talk with Claude and be like well see
00:05:39you in four hours the issue is people have brought that sort of psychotic neurotic behavior to the
00:05:45terminal and what they do in large part because you now can get away with it with a 1 million context
00:05:50window is they never clear they just keep talking and talking and talking with Claude Code because
00:05:55they never want it to forget what they're talking about because of these memory problems and the
00:06:00issue with that is your efficiency goes way down over time the more you talk with Claude Code inside
00:06:05of the same session and this is the fundamental idea of context rot if you don't know what context rot
00:06:10is it's the phenomenon that the more i use an AI system within its same session within its same chat
00:06:16and i fill up that context window the worse it gets you can see that right here Claude Code 1 million
00:06:23context window at 256k tokens aka i've only filled up about a quarter of its context window we're at
00:06:3092 by the end i'm at 78 so the more you use it in the same chat the worse it gets and that's one of
00:06:36the primary issues people have with AI systems and memory i have Claude Code it has a million context
00:06:42now and yet i do not want it to forget about the conversation i'm having so i just never exit the
00:06:47window i just fill it up and fill it up and fill it up and two things happen one effectiveness goes
00:06:51down like you just saw two your usage fills up a ton because the amount of tokens that are used at
00:06:591 million that 800,000-token context is way more than an 80,000 context so this isn't the only issue
00:07:08but kind of off topic we're in a current ecosystem where everyone complains about Claude Code being
00:07:12nerfed and my usage just gets run up automatically there's a number of reasons for that but one of
00:07:18them undoubtedly is the fact that since 1 million context got introduced people have no clue how to
00:07:24manage their own context window and they aren't nearly as aggressive with
00:07:29clearing and resetting the conversation as they should be but that's kind of off topic
00:07:34the point of that whole discussion is that when it comes to memory in this discussion about RAG and
00:07:39Claude Code we have to keep context rot in the back of our mind because we're constantly trying to deal
00:07:44with this tension of okay i want to ingest context so Claude Code can answer questions about a number
00:07:50of things yet at the same time i don't want the context to get too large because then it's worse
00:07:55so we just that always needs to be something we're thinking about in this conversation about memory
00:08:02but to bring this back to the actual video and level one what are people doing at level one the
00:08:06answer is they're not really doing anything and because they're not doing anything they just rely
00:08:10on a bloated context window to remember things so you know you're here when you've never edited
00:08:15a CLAUDE.md file and you've never created any sort of artifact or any sort of file that allows Claude
00:08:23Code to realize what the heck is going on what it's actually done in the past and what it needs
00:08:27to do in the future so what do we need to master at this level well really all you need to
00:08:31master despite everything i wrote here is you just need to understand that auto memory isn't enough
00:08:35and we need to take an active role when it comes to Claude Code and memory because a trap at this level
00:08:40if you don't take an active role is you have no control and we need to control what Claude Code
00:08:44considers when it answers our questions and so to unlock level one and move on to level two
00:08:50we need memory that's explicit and we need to figure out how to actually do that what files do
00:08:57you need to edit and understand that they even exist in order to take an active role in this
00:09:01relationship now level two is all about one specific file and that is the CLAUDE.md file when you learn
00:09:06about this thing it feels like a godsend finally there is a single place where i can tell Claude
00:09:12Code some rules and conventions that i always wanted it to follow and it's going to do it and in
00:09:16fact i can include things that i wanted to remember and it always will and it definitely feels like
00:09:20progress at first so here's a template of a standard CLAUDE.md file for a personal assistant project now
00:09:29Claude Code is going to automatically create a CLAUDE.md file but you have the ability to
00:09:33edit this or even update it on demand by using a command like /init and the idea of
00:09:38this thing is that it is again like the holy grail of instructions for Claude Code for that particular
00:09:43project for all intents and purposes Claude Code is going to take a look at this before any task it
00:09:50executes so if you want it to remember specific things what are you going to do you're going to
00:09:54put it in the CLAUDE.md theoretically it's a bit smaller scale than something like RAG you know we
00:10:00aren't putting you know complete documents in here but it's things you want Claude Code to
00:10:05always remember and conventions you want it to follow so for this one we have an about me section
00:10:09we have a breakdown of the structure of the file system and how we want it to actually operate when
00:10:14we give it commands and like i said because this is referenced on essentially every prompt Claude Code
00:10:18is really good at following this so the idea of like hey i wanted to remember specific things this
00:10:22seems like a great place to put it but we got to be careful because we can overdo it when we look at
00:10:28studies like this one evaluating AGENTS.md and you can swap AGENTS.md for CLAUDE.md
00:10:33they found in the study that these sorts of files can actually reduce the effectiveness of large
00:10:40language models at large and why is that well it's because the thing that makes it so good the fact
00:10:45that it's injected into basically every prompt is what also can make it so bad are we actually
00:10:51injecting the correct context have we pushed through the noise and are we actually giving it a proper
00:10:57signal or are we just throwing in things that we think are good because if it isn't relevant to
00:11:02virtually every single prompt you're going to give in your project should it be here in the CLAUDE.md
00:11:08is this a good way to let Claude Code remember things i would argue no not really and that goes
00:11:15contrary to what a lot of people say about CLAUDE.md and how you should structure it based on studies
00:11:20like that and based on personal experience less is more context pollution is real context rot is real
00:11:26so if something is inside of CLAUDE.md and it doesn't make sense for again virtually every
00:11:32single prompt you give it should it be in there the answer is no but most people don't realize that and
00:11:37instead they fall into this trap of a bloated rulebook instead the skills we should be mastering
00:11:42are how do we create project context that is high signal how do i make sure what i'm actually putting
00:11:48inside this thing makes sense and with that comes the idea of context-rot awareness like we talked
00:11:53about in the last level and you take all that together and level two feels like you've been
00:11:57moving forward like hey i'm taking an active role in memory i have this CLAUDE.md file but then you realize
00:12:02it's not really enough and when we talk about level three and what we can do to move forward there
00:12:08we want to think about not a static rulebook but something that can evolve
00:12:14something that can include CLAUDE.md instead of relying on CLAUDE.md to do everything what if we
00:12:18use CLAUDE.md as sort of like an index file that points Claude Code in the right direction instead
00:12:24so what did i mean about CLAUDE.md acting as sort of an index and pointing towards other files
00:12:30well i'm talking about an architecture within your code base that doesn't just have one markdown file
00:12:37trying to deal with all the sort of memory issues in the form of CLAUDE.md i'm talking about having
00:12:41multiple files for specific tasks i think a great example of this in action is sort of what GSD the
00:12:47Get Shit Done orchestration tool does it doesn't just create one file that says hey this is what
00:12:53we're going to build and these are the requirements and this is what we've done and where we're going
00:12:56instead it creates multiple you can see over here on the left we have a project.md a requirements.md
00:13:02a roadmap and a state file so the requirements exist so Claude Code always knows and has memory of
00:13:08what it's supposed to be building the roadmap breaks down what exactly we are going to be
00:13:12creating not just now but what we've done in the past and in the future and the project gives it
00:13:16memory gives it context of what we are doing at a high level overview what is our north star and by
00:13:22breaking up memory and context and conventions in this sort of system we're fighting against the idea
00:13:29of context rot and the idea brought up in that study which is injecting these files into every
00:13:34prompt all the time like we do in CLAUDE.md it's actually counterproductive it doesn't help us get
00:13:39better outputs furthermore breaking it down into these chunks and having a clear path for Claude
00:13:44Code to go down that says like hey i want to figure out where this information is oh i go to CLAUDE.md
00:13:49oh CLAUDE.md says these are my five options okay here's that one let me go and find it
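As a sketch of that index idea: the file names below follow the GSD-style project/requirements/roadmap/state split described above, but the exact names and paths are illustrative, not a fixed convention.

```markdown
# CLAUDE.md — a thin index, not a rulebook

## Conventions (true for virtually every prompt)
- TypeScript, strict mode; run tests with `npm test`

## Where to look (read only what the task needs)
- PROJECT.md      - north star: what we are building and why
- REQUIREMENTS.md - what must be true when we are done
- ROADMAP.md      - phases: done, in progress, up next
- STATE.md        - updated every session: where we left off
```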
00:13:54that sort of structure is what you're going to see 100% in the follow-on level when we talk about
00:13:58Obsidian and really it is sort of like a crude reimagining of the chunking system and the
00:14:04vector similarity search that we see in true RAG systems but obviously this is kind of small scale
00:14:10at this level we're talking about four markdown files here we're not talking about a system that
00:14:14can handle thousands and thousands and thousands of documents but like you're going to hear me talk
00:14:20about a lot what does that mean for you do you need a system that we're going to talk about levels four
00:14:26five six seven that can handle this many documents the answer is maybe not and so part of this RAG
00:14:32journey is understanding not just where you stand but like where do you actually need to go do you
00:14:36always need to be at level seven and know how to do an agentic RAG system inside of Claude Code it's
00:14:41probably good to know how to do it but it's also just as good to know when you don't need to
00:14:46implement that sometimes what we see in these systems like this is enough for a lot of people
00:14:52so it's just as important to know how to do it and to know like do you need to should you do it
00:14:58when we talk about level three and we talk about state files how do we know we're here
00:15:00well we know we're here when we're still strictly inside the Claude Code ecosystem we haven't
00:15:04integrated outside tools or applications and really we're just at the place where we're just creating
00:15:09multiple markdown files to create our own homemade sort of like memory chunking system
00:15:14but this still is really important we're still mastering some true skills here the idea of like
00:15:18actually structuring docs having some sort of system in place that updates state at every
00:15:23session because this can be a problem with RAG too like how do you make sure everything is up to
00:15:28date and chances are you're also starting to lean into orchestration layers at this point things like
00:15:33GSD and Superpowers that do things like this multi markdown file architecture on their own but
00:15:40there is a real trap here what we create in this project is very much just for that project it's
00:15:46kind of clunky to then take those markdown files and shift them over to another project so level
00:15:51four is where we bring in Obsidian and this is a tool that has been getting a ton of hype
00:15:56and for good reason when you have people like Andrej Karpathy talking about these
00:16:00LLM knowledge bases they've created which are built for all intents and purposes on an Obsidian
00:16:06foundation it's getting almost 20 million views we should probably listen and see how this is actually
00:16:11operating now for context i've done a full deep dive on this Obsidian Andrej Karpathy LLM knowledge
00:16:18base i'll link that above so if you want to focus on that how to build that make sure you check that
00:16:22out above and what i also want to mention to most people is that this obsidian thing we're going to
00:16:27talk about right here in level four this is honestly the level most people should strive
00:16:32for because this is enough for most people in most use cases when we talk about levels five six and
00:16:37seven we're going to talk about true RAG structures and to be honest it's overkill for most people this
00:16:43is overkill for most people like we love talking about RAG like it's great i understand that but
00:16:50Obsidian is that 80% solution that in reality is like a 99% solution for most people because it's free
00:16:56there's basically no overhead and it does the job for the solo operator and when i say it does the job
00:17:02for the solo operator i mean it solves the problem of having Claude Code connected to a bunch of
00:17:07different documents a bunch of different markdown files and being able to get accurate timely
00:17:13information from it and having insight to those documents as the human being because when i click
00:17:19on these documents it's very clear what is going on inside here and it's very clear what documents
00:17:24are related to it when i click these links i'm brought to more documents when i click these links
00:17:30i'm brought to more documents and so for me as the human being having this insight is important
00:17:36because to be totally honest the Obsidian based insight into the documents i would argue trumps
00:17:42a lot of the insight you get from the RAG systems when we talk about thousands and thousands of
00:17:47documents being embedded in something like a GraphRAG system like this looks great visually
00:17:52looks very stunning do you actually know what's going on inside here maybe you do to be honest
00:17:58you're kind of just relying on the answers you get and the links it will show but it's a
00:18:03bit hard to piece through the embeddings for sure all that to say is you should pay special
00:18:08attention to Obsidian and Claude Code because when we talk about this RAG journey i always suggest
00:18:13to everybody clients included like let's just start with Obsidian and see how far we can scale this and
00:18:20eventually if we do hit a wall you can always transition to more robust RAG systems so why not
00:18:26try the simple option if it works great it's free costs me no money versus like let's try to knock out
00:18:31this RAG system which can be kind of difficult to put into production depending on what you're trying
00:18:35to do like always start with the simple stuff it's never too hard to transition to something more
00:18:40complicated so what are we really talking about here in level four what we're talking about taking
00:18:45sort of that structure we began to build in level 3 you know with an index file pointing at different
00:18:50markdown files and just scaling that up and then bringing in this outside tool Obsidian to make it
00:18:56easy for you the human being to actually see these connections and the platonic ideal of this version
00:19:00is pretty much what Andrej Karpathy laid out in building an LLM knowledge base on top of Obsidian
00:19:05and powered by Claude Code and what that looks like is a structure like this so when you use Obsidian
00:19:11and you download it (it's completely free) again reference that video i posted earlier you set a
00:19:16certain folder as the vault think of the vault as sort of like the RAG system this quasi-RAG
00:19:23system you've created and inside of the vault we then architect it we structure it just with
00:19:30files so we have the overarching folder called the vault and inside that vault we create multiple
00:19:36subfolders in Andrej Karpathy's case he talks about three different subfolders the reality is they
00:19:41could be any subfolders they just sort of need to match the theme we're going to talk about in one
00:19:47folder we have the raw data this is everything we are ingesting and eventually want to structure so
00:19:52that Claude Code can reference it later think of you know you have Claude Code do competitive analysis on
00:19:5850 of your competitors and it pulls 50 sites for each right we're talking about a large amount of
00:20:03information it's probably 2500 different things all that will get dumped into some sort of raw folder
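A sketch of how such a vault might be laid out on disk, following the raw/wiki split being described here: folder and file names are illustrative, not a fixed convention.

```
vault/                    # the Obsidian vault Claude Code works inside
  CLAUDE.md               # conventions: how raw data becomes wiki articles
  raw/                    # staging area: scraped pages, transcripts, dumps
    competitor-sites/
  wiki/
    index.md              # master index: every topic that exists in the wiki
    ai-agents/
      index.md            # per-article index
      ai-agents.md        # the structured wikipedia-style article
```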
00:20:08this is like the staging area for the data we then have the wiki folder the wiki folder is where the
00:20:14structured data goes so we then have Claude Code take this raw data and structure it into essentially
00:20:20different like wikipedia type articles inside of the wiki folder each article gets its own folder so
00:20:28the idea being when you then ask Claude Code information about you know let's say we had it
00:20:33search for stuff about AI agents and i say hey Claude Code talk to me about AI agents the same way
00:20:38you would query a RAG system well Claude Code is going to go to the vault from the vault it's going
00:20:45to go to the wiki the wiki has a master index markdown file think of sort of what we
00:20:50talked about doing with CLAUDE.md before right you see how these sort of themes transition
00:20:56throughout the different levels it takes a look at that master index the master index tells it what
00:21:00exists in the Obsidian RAG system oh AI agents exist cool guess what's going on down here it also
00:21:08has an index file which talks about the individual articles that exist what am i saying here i am
00:21:14saying there is a clear hierarchy for Claude Code to reference when it wants to find information about
00:21:21files vault wiki index article etc so because it is so clear how to find information and also so
00:21:31clear how to first ingest information and turn it into wiki articles we can create a system that has a lot of
00:21:37documents without RAG hundreds or thousands if you do this properly because if the system is clear hey i
00:21:44check the vault and i check the index and that has a clear delineation of like where everything is well
00:21:50then it's not too hard for Claude Code to figure out where to find stuff and so you can get away with a
00:21:54non-RAG structure for thousands of documents and it's been really hard to do that in the past and
00:21:58that's because most people don't structure anything with any sort of structure they just have a billion
00:22:02documents sitting in one folder it's the equivalent of having 10 million files strewn across the factory
00:22:08floor i mean like well Claude Code find it like no you actually just need a filing cabinet Claude
00:22:13Code is actually pretty smart and you can see that architecture in action right here so right now we're
00:22:17looking at a CLAUDE.md file that is in an Obsidian vault and what does it say well it breaks down the
00:22:24vault structure the wiki system you know the overall structure of the subfolders and how to
00:22:30essentially work it right so again we're using CLAUDE.md as a conventions type file over here on the left
00:22:36you can see the wiki folder inside the wiki folder is a master index and it lists what is inside of
00:22:43there in this case there's just one article it's on Claude managed agents inside that folder we see
00:22:49Claude managed agents it has its own wiki folder breaking down the articles inside until you get
00:22:55to the actual article itself so very clear the steps it needs to take and so when i tell Claude Code
00:23:01talk to me about the managed agents we have a wiki on it it's very easy for it to search for it via
00:23:06its built-in grep tool it links me the actual markdown file and then breaks down everything
00:23:12that's happening now the question at level four really becomes a question of scale how many documents
00:23:16can we get away with where this sort of system continues to work is there a point at which Andrej
00:23:22Karpathy's system begins to fall apart where hey like i get it it's a very clear path that Claude
00:23:26Code needs to follow it goes to the indexes yada yada yada does that sustain itself at
00:23:312,000 documents 2,500 3,000 is there a clear number the answer is we don't really know and there isn't
00:23:37a universal number because all your documents are also different and in terms of hitting a wall it isn't
00:23:43just as simple as well Claude Code's giving us the wrong answers it has too many files in the
00:23:47Obsidian system how much is it costing you in terms of tokens now that we've added so many files and how
00:23:52quickly is it doing it because RAG can actually be infinitely faster and cheaper in certain situations
00:23:59what we're looking at here is a comparison between textual LLMs right in the giant bars and textual
00:24:06RAG in terms of the amount of tokens it took to get the correct answer and the amount of time it
00:24:11took to get that answer what do we see here we see that between textual RAG and textual LLMs there's a
00:24:18massive difference to the tune of like 1,200 times i'm saying RAG is 1,200 times cheaper and 1,200
00:24:25times faster than textual LLMs in these studies now for context this was done in 2025 this is not done with
00:24:33Claude Code these models have changed significantly since then these are just straight up LLMs this
00:24:37isn't a coding harness etc etc etc however we were talking a 1,200x difference so when we're evaluating
00:24:48hey is Obsidian what i should be doing versus should i be doing a RAG system it isn't as simple as
00:24:54just well is it giving the right answer or not because you could have a scenario
00:24:59where you get the right answer with Obsidian yet if you went to RAG it's a thousand times cheaper
00:25:04and faster right so it's this very fuzzy line between when is Obsidian good enough and these
00:25:10sort of like just-markdown-file architectures good enough versus when we need to use RAG
00:25:15there's not a great answer i don't have a great answer for you the answer is you have to experiment
00:25:18and you need to try both and see what works because this is frankly out of date totally like 2025 older
00:25:25models the difference between RAG and textual LLMs is probably not 1,200 times anymore but how much has that gap shrunk
00:25:32because that is an insane gap that isn't like 10x it's 1,200x so there's a lot you have to know and
00:25:39again you won't know the answer ahead of time you just won't you can watch every video you want
00:25:45no one's going to tell you where that line in the sand is you literally just need to experiment
00:25:49and see what works for you as you increase the amount of documents you're asking Claude Code
00:25:54to answer questions about so on that note let's move on to level five which is where we finally
00:25:59begin to talk about real RAG systems and talk about some of the RAG fundamentals like embeddings
00:26:04vector databases and how data actually flows through a system when it becomes part of our
00:26:10RAG knowledge base so let's begin by talking about naive RAG which is the most basic type of RAG out
00:26:16there but it provides the foundation for everything else we do now you can kind of think of RAG systems
00:26:21being broken out into three parts on the left hand side we have the embedding stage we then
00:26:27have the vector database and then we have the actual retrieval going on with the large language
00:26:33model so one two and three and to best illustrate this model let's start with sort of the journey of
00:26:40a document that is going to be part of our knowledge base remember in a large rag system we could be
00:26:45talking about thousands of documents and in each document could be thousands of pages but in this
00:26:50example we have a one-page document that we're talking about now if we want to add this document
00:26:56to our database the way it's going to work is it's not going to be ingested as a whole unit instead we
00:27:03are going to take this document and we are going to chunk it up into pieces so this one pager
00:27:08essentially becomes three different chunks these three chunks are then sent to an embedding model
00:27:15and the job of the embedding model is to take these three chunks and turn it into a vector
00:27:21in a vector database now a vector database is just a different variation of your standard database
00:27:27when we talk about a standard database think of something like an excel document right you have
00:27:32columns and you have rows well in a vector database it's not two-dimensional columns and rows it's
00:27:37actually hundreds if not thousands of dimensions but for the purposes of today just think of a
00:27:43three-dimensional graph like you see here and the vectors are just points in that graph and each
00:27:50point is represented by a series of numbers so you can see here we have bananas and bananas is
00:27:57represented by 0.52 5.12 and then 9.31 you see that up here now that continues for hundreds of numbers
00:28:06now where each vector gets placed in this giant multi-dimensional graph depends on its semantic
meaning what do the words actually mean so you can see over here this is like the fruit
00:28:19section we have bananas we have apples we have pears over here we have ships and we have boats
00:28:24so going back to our document let's imagine that this document is about world war ii ships
00:28:31so each of these chunks is going to get turned into a series of numbers and those series of numbers
00:28:37will be represented as a dot in this graph where do you think it's going to go well they'll probably
00:28:42go around this area right so that would be one two and three so that's how documents get placed every
00:28:49document is going to get chunked each chunk goes through the embedding model and the embedding model
00:28:54inserts them into the vector database repeat repeat repeat for every single document and in the end
00:28:58after we do that several thousand times we get a vector database which represents our knowledge
graph so to speak our knowledge base and that moves us on to step three which is the retrieval
part so where do you play into this well normally let's depict you we'll give you a
different color you get to be pink so this is you all right you normally just talk to
claude code and you ask claude code questions about world war ii battleships well in your standard
00:29:29non-rag setup what's going to happen well the large language model opus 4.6 is going to take a
00:29:34look at its training data and then it's going to give you an answer based on its training data
00:29:39information about world war ii battleships but with a rag system it's going to do more it's going to
00:29:44retrieve the appropriate vectors it's going to use those vectors to augment the answer it generates
00:29:51for you hence retrieval augmented generation that's the power of rag it allows our large language
00:29:56models to pull in information that is not a part of its training data to augment its answer in this
00:30:02example world war ii battleships yes i understand the large language model already knows that but
00:30:06replace this with any sort of proprietary company data that isn't just available for the web and do
it at scale that's the sell for rag now in our example when we ask claude code for
00:30:21information about world war ii battleships and it's in a rag setup what it's going to do is it's going
00:30:25to take our question and it's going to turn our question into a series of numbers similar to the
00:30:32vectors over here it is then going to take a look at what the number is for our question and the numbers
00:30:39of the vectors and it's going to see which of these vectors most closely matches the questions vector
00:30:46right how similar are the vectors to the question pretty much and then it's going to pull a certain
00:30:51amount of vectors whether that's one two three four or five or ten or twenty and it's going to pull
00:30:56those vectors and their information into the large language model so now the large language model has
00:31:02its training data answer plus say 10 vectors worth of information right that was the retrieval part
00:31:09and then it augments and generates an answer with that additional information and that is how rag
00:31:13works that is how naive rag works now this is not particularly effective for a number of reasons this
00:31:19very basic structure kind of falls apart at the beginning when we begin to think about okay how
00:31:25are we chunking up these documents is it random is it just off a pure number of tokens do we have
00:31:31a certain number of overlap are the documents themselves set up in a way where it even makes
00:31:36sense to chunk them because what if you know chunk number three is referencing something in chunk
00:31:42number one and then our vector situation when we pull the chunks what if it doesn't get the right
one what if it doesn't get that other chunk that's required as context to even make sense of what number
00:31:53three says you get what i'm saying like very often the entire document itself is needed to answer
00:31:59questions about said documents so this idea of getting these piecemeal answers doesn't really
00:32:05work in practice yet this is how rag was set up for a long long time other issues that can come into
00:32:10play are things like what if i have questions about the relationships between different vectors because
00:32:17right now i kind of just pull vectors in a silo but what if i wanted to know how boats related to
00:32:22bananas sounds random but what if i did you know this standard sort of vector database naive rag
00:32:31approach everything's kind of in a silo it's hard to connect information and a lot of it just depends
00:32:36on how well those original documents are even structured are they structured in a manner that
00:32:41makes sense for ragging now over the years we've come up with some ways to alleviate these issues
00:32:46things like re-rankers or ranking systems that take a look at all the vectors we grab and essentially
00:32:51then do another pass on them with a large language model to rank them in terms of their relevance but
00:32:56by and large this naive rag system has kind of fallen out of vogue yet it's still important to
00:33:03understand how this works at a foundational level so it can inform your decisions if you go for a
00:33:07more robust rag approach because if you don't understand how chunking or embeddings even work
00:33:13how can you make decisions about how you should structure your documents when we talk about
something like graphrag or we talk about more complicated embedding systems like the brand
00:33:22new one from google which can actually ingest not just text but videos and if you don't understand
00:33:27this sort of foundation it's hard for you to actually understand this trap and the trap is that
00:33:31we've kind of just created a crappy search engine because with these naive rag systems where all we
00:33:36do is grab chunks and we can't really understand the relationships between them how is that different
from basically just having an overcomplicated control f system the answer is there's really not
much of a difference which is why with these simplistic kind of outdated rag
00:33:54structures that actually are still all over the place if you see someone who's like oh here's my
pinecone rag system or here's my supabase rag and they don't mention anything about graphrag
00:34:03or they don't mention anything about like hey here's how we have like the sophisticated re-ranker
system these are gonna suck to the tune of like oh the actual effectiveness of this is like
25% of the time you get something right like you're almost better off guessing so if you don't know that
00:34:18going in you can definitely be sort of hoodwinked or confused or in some cases like basically scammed
00:34:23into buying these rag systems that do not make sense and so level five isn't about implementing
00:34:28these sort of naive rag systems it's about understanding how they work so that you when it
00:34:34comes time to implement something more sophisticated you actually understand what's going on because
00:34:38that five-minute explanation of rag is sadly not something most people understand when they say i
00:34:43need a rag system well do you because you also have to ask yourself what kind of questions are you
actually asking about your system if you're just you know essentially treating your knowledge
00:34:54base as a giant rule book and you just need specific things from that knowledge system
00:34:59brought up well then obsidian is probably enough or you could probably even get away with a naive
00:35:02rag system but if we need to know about relationships if we need to know about how x interacts with y and
00:35:09they're two separate documents they never even really mention each other and it's not something
00:35:13i can just stick inside the context directly because i have thousands of said documents well that is
where you're going to need rag and that's when you're going to need something more sophisticated
00:35:23than basic vector rag that is when we need to start talking about graphrag so when we talk about level
six of claude code and rag we're talking about graphrag and we're talking about this and in my
00:35:34opinion if you are going to use rag this is sort of the lowest level of infrastructure you need to
00:35:39create this is using light rag which is a completely open source tool i'll put a link above where i
00:35:44break down exactly how to use it and how to build it but the idea of graphrag is pretty obvious it's
00:35:50the idea that everything is connected this isn't a vector database with a bunch of vectors in a silo
00:35:55this is a bunch of things connected to one another right i click on this document i can see over here
00:36:00on the right and i'll move this over you know the description of the vector the name the type the
00:36:05file the chunk and then more importantly the different relationships and this relationship
00:36:10based approach results in more effective outcomes here is a chart from light rags github this is
00:36:15about i would say six to eight months old and also of note light rag is the lightest weight graphrag
00:36:23system out there that i know of there's some very robust versions including graph rag itself from
microsoft it's literally called graphrag but when we compare naive rag to light rag
00:36:35across the board we get jumps of oftentimes more than 100 percent right 31.6 versus 68.4
24 versus 76 24 versus 75 on and on and on and that being said according to light rag it
actually holds its own and beats out graphrag itself but hey these are light rag's numbers so
take them with a grain of salt now when we look at this knowledge graph system right away your mind
00:36:58probably goes to obsidian because this looks very similar however what we're looking at here in
00:37:04obsidian is way more rudimentary than what's going on inside of light rag or any graph rag system
00:37:10because this series of connections we see here this is all manual and somewhat arbitrary it's only
connected because we set related documents or where claude code set related documents when it generated
00:37:22this particular document for example just added a couple brackets boom that document's connected
00:37:27so in theory i could connect a bunch of random documents that in reality have nothing to do with
one another now because claude code isn't stupid it's not going to do that but that's a lot different
00:37:35than what went on here like this went through an actual embedding system it looked at the actual
content it set a relationship it set an entity there's a lot more work going on here inside of
00:37:46light rag in terms of defining the relationships than obsidian now does that difference actually
equate to some wild gap in terms of the performance at a low scale probably not though at a huge scale maybe again
00:38:02we're in sort of that gray area kind of depends on your scale and what we're actually talking about
00:38:07and nobody can answer that question except you and some personal experience but understand these two
00:38:13things are not the same we are not the same brother two totally different systems one is pretty
00:38:20sophisticated one's pretty rudimentary understand that and so to wrap up level six in graph rag
we're really here when we've decided hey stuff like obsidian isn't working we can't use
something like naive rag because it just doesn't work and we need something that can extract entities
00:38:36and relationships and really leverage the sort of hybrid vector plus graph query system design
00:38:43but there are some traps there are some serious roadblocks even here at level six when we talk
00:38:48about light rag this is just text what if i have scannable pdfs what if i have videos what if i have
00:38:55images we don't live in a world where all your documents are just going to be google docs and
00:39:01so what do we do in those instances so multimodal retrieval is a huge thing and on top of that what
00:39:06about bringing some more agentic qualities to these systems give it a little more ai power some sort of
00:39:11boost in that department well if we're talking about things that are multimodal then we can finally move
00:39:17to sort of like the bleeding edge of rag in today's day and age as of april 2026 that's what level 7 is
00:39:24all about now when we talk about level 7 in agentic rag the big thing we kind of want to index on here
00:39:31is things that have to do with multimodal ingestion now we've done videos on these things things like
00:39:36rag anything which allow us to import images and non-text documents again think scannable pdfs
00:39:44into structures like the light rag knowledge graph you saw here we also have new releases like gemini
embedding too which just came out in march which allows us to actually embed videos themselves
into our vector database and this is frankly where the space is going it's not enough to just do text
00:40:01documents how much information how much knowledge is trapped on the internet especially on places
like youtube which are just purely video and we want more than just a transcript as well a transcript
00:40:10doesn't do enough so this sort of multimodal problem is real and again this is stuff that
00:40:16just came out weeks ago and level 7 is also where we need to start paying attention to our
00:40:20architecture and pipelines when it comes to the data going in and out of our rag system it's not
00:40:25enough to just get data in here like this is great you know okay we have all these connections and
stuff how is the data getting there in the context of a team how
00:40:35is data getting out of there like what if some of the information here has changed in a particular
00:40:40document what if somebody edits it how does it get updated what if we add duplicates who can actually
00:40:46put these things in there when it comes to production level stuff these are all questions
00:40:50you need to begin to ask yourself and so when we look at an agentic rag system like this one from
00:40:54n8n you can see the vast majority of the infrastructure everything outlined here is all about
data ingestion and data syncing there's only a very small part that has anything to do with rag which is
00:41:06right there because we need systems that clean up the data that are able to look at okay we just
ingested this document in fact this was version 2 replacing version 1 can we now go back and clean that old data
00:41:17here's something like a data ingestion pipeline where documents don't get directly put into the
00:41:21system or in light rag we instead put it inside of like a google drive and from there it gets ingested
00:41:27into the graph rag system and logged these are the sort of things that will actually make or break
00:41:31your rag system when you're using it for real and when we talk about agentic rag you can see here and
i know this is rather blurry but if we have an ai agent running this whole program so imagine
you set up some sort of chatbot for your team does it always need to hit this database the answer is
00:41:49probably not chances are in a team setting in a business setting you're going to have information
00:41:54that's in a database like this like text or something but you probably also have another set
00:41:58of databases like just standard postgres databases with a bunch of information you want to query
00:42:03with sql as well so when we talk about an agentic rag system we need something that has all of that
00:42:08the ability to intelligently decide oh am i going to be hitting the graph rag database represented
00:42:15here or am i just going to be doing some sort of sql queries in postgres these things can get
00:42:20complicated right and all of this is use case dependent which is why it's kind of hard to
00:42:23sometimes make these videos and try to hit every single edge case the point here at level 7 is not
that there's necessarily some super rag system you've never heard of it's that the devil's in
the details here and that's really mostly the data ingestion piece and keeping it up
00:42:39to date but also like how do you actually access this thing easy to do in a demo right here oh we
00:42:46just go to the light rag thing and i go to retrieval and i ask it questions different scenario when
00:42:50we're talking about it with a team and everyone's approaching it from different angles and you
00:42:55probably don't want everyone to have access to actually uploading it to light rag itself on a
00:43:01web app that being said for the solo operator who is trying to create some sort of sophisticated rag
00:43:07system that is able to do multimodal stuff i would suggest the rag anything plus light rag combination
i've done a video on that and if i haven't linked that already i'll link it above i suggest that for a few
00:43:19reasons one it's open source and it's lightweight so it's not like you're spending a bunch of money
00:43:26or time to spin something like this up to make sure it actually makes sense for your use case
again the thing we want is we don't want to get stuck in systems where there's no way out and
00:43:37we spent a bunch of money to get there which is why i do love obsidian and i always recommend things
00:43:42like light rag and rag anything because hey if you try this out it doesn't work for you it doesn't
00:43:45make sense okay whatever you wasted a handful of hours you know it's not like you are spending a
bunch of money on microsoft's graphrag which in no way is cheap and so when do you know
00:43:56you're in level 7 really multimodal stuff like you need to index images tables and videos and you're
00:44:02integrating some sort of agent system where it can intelligently decide like which path it goes down
00:44:06to answer information because at level 7 you're probably integrating all this stuff you probably
have a claude.md file with some permanent information you probably have it in a code base with some
markdown files that sort of make sense for easy retrieval perhaps you're also including obsidian
in some sort of vault plus you probably have some section of documents that are in a graph rag
database and you have a top of the funnel ai system that can decide if they ask this question i go down
00:44:33this route that's a mature sort of memory architecture that i would suggest but what's the trap here the
00:44:40trap honestly is trying to force yourself into this level and this sort of sophistication when it's
just not needed to be honest after all this most of you are fine with obsidian it's more than enough
00:44:52you don't need graph rag you really don't need rag in general and if it's not obvious that you
00:44:57need level 7 and certainly if you haven't already tried the obsidian route you don't need to be here
it's probably a waste of your time but the whole point of this video to the best of my ability
was to expose you to what i see as the different levels of rag and memory and claude code and what
00:45:12this problem is what some of the tensions are what the trade-offs are and where you should probably be
00:45:18for your use case and again the biggest thing is just experiment you don't have to know the answer
00:45:24before you get into this just try them out and i would try in ascending order if you can get away
with just markdown files in a claude code setup and it's basically just claude.md on steroids sweet go ahead
00:45:34and then try obsidian if obsidian is not enough try light rag and so on and so forth so that is
00:45:39where i'm going to leave you guys for today if you want to learn more especially about the production
00:45:43side of rag like how to spin this up for a team or package it for a client we have a whole module
00:45:47on that inside of chase ai plus so check that out other than that let me know what you thought
00:45:52i know this was a long one and i will see you around

Key Takeaway

Progressing through seven levels of memory architecture—from native auto-memory to agentic multimodal RAG—mitigates context rot and reduces costs, though a structured Obsidian vault remains the most efficient 80% solution for solo operators.

Highlights

RAG systems can be 1200 times cheaper and faster than standard large language models by retrieving only relevant vectors instead of processing massive context windows.

Context rot significantly degrades performance once 256k tokens are reached in a 1 million context window, dropping accuracy from 92% to 78%.

LightRAG improves performance by over 100% compared to naive RAG by mapping entity relationships rather than treating text chunks as isolated silos.

A hierarchical folder structure in Obsidian using a 'Wiki' index allows Claude Code to navigate thousands of documents effectively without complex vector databases.

Gemini 1.5 Pro and 2.0 models now support multimodal embedding, allowing AI systems to ingest and query raw video files alongside text-based documentation.

Timeline

Native Auto-Memory and the Context Rot Problem

  • Claude Code automatically creates markdown files in a hidden memory folder to track user goals and project states.
  • Relying on a 1 million token context window leads to context rot where AI effectiveness drops as the session grows.
  • Users often avoid clearing terminal sessions due to a fear of the AI forgetting previous conversation context.

Native memory systems function like digital post-it notes but lack the sophistication needed for complex engineering tasks. Accuracy benchmarks show a 14% performance decline as the context window fills up. Effective workflows require an active role in managing context rather than letting the session bloat indefinitely.

Optimizing Project Instructions with claude.md

  • The claude.md file serves as a global instruction set that the AI references before every executed task.
  • Injecting too much irrelevant information into claude.md creates noise that reduces the model's focus on specific prompts.
  • High-signal project context focuses only on conventions and rules that apply to virtually every single interaction.

Studies on agents.md files reveal that bloated rulebooks can actually hinder performance. A successful configuration uses claude.md as a lean index rather than a repository for every project detail. This level transitions the user from passive memory reliance to intentional context control.
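As an illustration of that "lean index" idea, a minimal claude.md might look like the following (the contents and file names are hypothetical, not taken from the video):

```markdown
# claude.md — lean global instructions (hypothetical example)

## Conventions that apply to every task
- Python 3.11, type hints required, run tests before committing

## Index — where detail lives (do not inline it here)
- Product goals: project.md
- Feature specs: requirements.md
- Upcoming work: roadmap.md
```

The point is that the file points at detail rather than containing it, keeping every prompt high-signal.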

State Management and Multi-File Architectures

  • Dividing memory into specialized files like project.md, requirements.md, and roadmap.md prevents context pollution.
  • Orchestration tools like GSD automate the updating of state files at the end of every coding session.
  • A multi-file architecture mimics a crude version of the chunking systems used in professional RAG setups.

Separating high-level 'north star' goals from granular technical requirements helps the AI maintain focus. This modular approach ensures the most relevant data is available without flooding the prompt with unnecessary detail. It solves the issue of project-specific memory but remains difficult to scale across multiple disparate projects.

The Obsidian Vault as an LLM Knowledge Base

  • An Obsidian-based vault provides a 99% solution for solo operators by connecting the AI to thousands of structured markdown files.
  • A hierarchical Wiki structure utilizes master index files to guide the AI through a clear path from raw data to structured articles.
  • RAG-based retrieval is up to 1200 times faster than reading full documents within a standard LLM context window.

Obsidian serves as a free, low-overhead alternative to expensive vector databases. By using a 'Raw Data' staging folder and a 'Wiki' folder for structured insights, users can maintain human-readable documentation that the AI can easily query via grep tools. This architecture supports thousands of documents while keeping token costs significantly lower than long-context prompts.
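The grep-style lookup described above can be sketched in a few lines. This is a minimal stand-in (the function name and vault layout are invented for illustration), not how Claude Code's actual tools are implemented:

```python
from pathlib import Path

def grep_vault(vault_dir: str, keyword: str, context_chars: int = 80) -> list[tuple[str, str]]:
    """Scan every markdown file under the vault for a keyword and return
    (file path, surrounding snippet) pairs -- a plain-text stand-in for
    grep-based navigation over an Obsidian vault."""
    hits = []
    for md_file in sorted(Path(vault_dir).rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        idx = text.lower().find(keyword.lower())
        if idx != -1:
            # keep a window of text around the match as the retrieved snippet
            start = max(0, idx - context_chars)
            hits.append((str(md_file), text[start:idx + len(keyword) + context_chars]))
    return hits
```

Because the vault is just markdown on disk, a lookup like this costs no tokens until the matching snippets are actually handed to the model.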

Naive RAG Fundamentals and Vector Databases

  • Naive RAG works by chunking documents, turning them into numerical vectors, and storing them in multi-dimensional space.
  • The system retrieves relevant text chunks based on semantic similarity to the user's query vector.
  • Basic vector search often fails to capture the relationships between different documents or sections.

The transition to true RAG involves embedding models that map the semantic meaning of text into a vector database. While this allows for proprietary data ingestion at scale, naive systems act as glorified 'Control+F' tools because they lack relational awareness. Understanding these foundations is necessary before implementing more advanced graph-based systems.
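To make the chunk → embed → retrieve loop concrete, here is a toy sketch. The bag-of-words `embed` is a stand-in for a real embedding model (which would return hundreds of dense dimensions), so only the shape of the pipeline, not its quality, is representative:

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks -- naive on purpose:
    no overlap, no respect for sentence boundaries (the weakness noted above)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a sparse bag-of-words vector standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query vector and return the top k --
    the 'retrieval' step that augments the model's generation."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A query about battleships lands nearest the battleship chunk even though the words are not identical, which is the whole trick, and also why a chunk that depends on another chunk for context can silently be left behind.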

GraphRAG and Relational Deep Linking

  • GraphRAG connects data points through explicit relationships and entities rather than just spatial similarity.
  • LightRAG provides a lightweight, open-source framework that outperforms standard naive RAG by over 100% in accuracy metrics.
  • Relational approaches are essential for answering questions about how different, seemingly unrelated documents interact.

Unlike Obsidian's manual linking, GraphRAG automatically extracts entities and maps their connections during the embedding process. This allows the AI to traverse a network of information to provide more holistic answers. It represents the minimum viable infrastructure for high-scale enterprise knowledge management.

Level 7: Agentic and Multimodal Retrieval

  • Modern RAG systems utilize multimodal embeddings to ingest scannable PDFs, images, and raw video content.
  • Agentic RAG architectures use a 'top-of-funnel' AI to decide whether to query a graph database or a SQL database.
  • The primary challenge at the highest level is maintaining data ingestion pipelines and ensuring data sync across a team.

The bleeding edge of memory involves AI agents that intelligently choose the retrieval path based on the query type. Tools like 'Rag Anything' combined with LightRAG allow for the processing of non-textual data trapped in videos or images. For most users, this level is overkill, but it is necessary for production environments requiring high data integrity and complex multimodal insights.
