00:00:00 Let's solve the problem of Claude Code and memory. Getting AI systems to reliably and accurately answer questions about past conversations, or about giant troves of documents, is a problem we have been trying to solve for years, and the typical answer has been RAG: retrieval-augmented generation. And while this video is titled "The Seven Levels of Claude Code and RAG," what it's really about is deconstructing that problem of memory in Claude Code, and really in AI systems in general. Even more importantly, this video gives you a roadmap that shows you where you stand in this fight between AI systems and memory, and what you can do to get to the next level.

So as we journey through these seven levels of Claude Code and RAG, we're going to hit on a number of topics, but we're not going to start with GraphRAG or anything complicated. We're going to start at the beginning, with the basic memory systems that are native to Claude Code, because, sad as it is to say, this is where most people not only begin but where they stay. From auto memory and things like CLAUDE.md we'll move to outside tools like Obsidian, before we eventually find ourselves with the big boys: the true RAG systems. At those levels we'll talk about what RAG actually is, how it works, and the different types of RAG (naive RAG versus GraphRAG versus agentic RAG), plus things like re-rankers and everything in between. And at each level we're going to break it down the same way: what to expect at that level, the skills you need to master, the traps you need to avoid, and what you need to do to move on to the next level.

What this video will not be is a super in-depth technical walkthrough of how to set up each of these specific systems, because I've already done that in many cases. For GraphRAG and LightRAG, for example, and even more advanced topics like RAG-Anything and these different embedding systems, I've made videos where I break down, from the very beginning to the very end, how to set them up yourself. When we get to those sections I will link those videos, for both our sakes, so this video isn't five hours long. But for those levels we're still going to talk about what each system actually means, what it buys you, and when you should be using it.
00:02:09 But before we start with Level 1, a quick word from today's sponsor: me. Just last month I released the Claude Code Masterclass, and it's the number one way to go from zero to AI dev, especially if you don't come from a technical background. This masterclass is a little different because we focus on a number of different use cases to learn how to use Claude Code. One of those is production-level RAG: how to build the RAG systems you're going to see in this video in a real-life scenario and actually use them as a member of a team, or sell them to a client. That's the kind of stuff we focus on. If you want access, you can find it inside Chase AI Plus; there's a link in the pinned comment, and we'd love to have you there.

So now let's start with Level 1, and that's auto memory.
00:02:51 These are the systems Claude Code automatically uses to create some sort of memory apparatus, to actually remember things you've talked about. You know you're here if you've never set anything up intentionally to help Claude Code remember context about previous conversations, or just about what's going on in your codebase. And when we talk about auto memory, that is quite literally what it's called: the auto-memory system, which is automatically enabled when you use Claude Code, essentially lets Claude Code create markdown files on its own that list out things it thinks are important about you and that particular project, purely based on its own intuition from your conversations. I can see the memory files it's created (again, it does this on its own): if you go into your .claude folder, then into projects, you'll see a folder called memory, and inside it a number of markdown files. Here there are four of them, and they're like Claude Code's version of Post-it notes: "oh yeah, he mentioned his YouTube project growth goals one time, let's write that down." Inside everyone's memory folder there will be a MEMORY.md file. In this memory file it has a little note about one of my skills, and then essentially an index of all the sub-memory files: there's a YouTube-growth one in here, a revenue one, a references one, and here's what's inside each. So if I'm just talking to Claude Code in my vault and I mention something about YouTube and my growth goals, it's going to reference this and say, oh yeah, Chase is trying to get X subscribers by the end of 2026. It's cute, but ultimately it's not that useful.
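To make the layout concrete, here's roughly what that folder looks like on my machine. Treat the sub-file names as illustrative, since Claude Code invents them per project:

```text
~/.claude/projects/<project>/memory/
├── MEMORY.md            # index file: a few notes plus pointers to the files below
├── youtube-growth.md    # e.g. subscriber goals it overheard in conversation
├── revenue.md
└── references.md
```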
00:04:30 This behavior is kind of like when you're inside ChatGPT and it brings up random stuff from previous conversations and almost shoehorns it in. It's like, okay, I get it, you remembered this, but I don't really care, and honestly it's a little weird to keep bringing that up; I'd prefer if you didn't. Unfortunately, this is where most people stay in their memory journey, and it's built on the somewhat abusive past we all share with these chatbots. Because these chatbots don't have any real memory from conversation to conversation, we're all scared to death of having to exit a chat window or a terminal session, because we think, oh my gosh, it's not going to remember my conversation. And this is a real problem, because what is everybody's answer to the chat window not remembering anything? You just keep that conversation going forever, because you never want to reach the scenario where you exit and it forgets everything. That fear was born in these chat windows, beginning with ChatGPT, and the same with Claude's web app. Honestly, it used to be infinitely worse with Claude's web app: I think we all remember, before the days of the 1-million-token context window, when you'd get like 30 minutes to talk with Claude and then, well, see you in four hours. The issue is that people have brought that psychotic, neurotic behavior to the terminal, and what they do, in large part because you can now get away with it with a 1-million context window, is never clear. They just keep talking and talking with Claude Code, because they never want it to forget what they're talking about, because of these memory problems.
00:06:00 The issue with that is your efficiency goes way down the longer you talk with Claude Code inside the same session. This is the fundamental idea of context rot. If you don't know what context rot is, it's the phenomenon that the more I use an AI system within the same session, the same chat, filling up its context window, the worse it gets. You can see that right here: Claude Code with a 1-million context window, at 256K tokens, meaning I've only filled up about a quarter of its context window, we're at 92; by the end, I'm at 78. The more you use it in the same chat, the worse it gets, and that's one of the primary issues people have with AI systems and memory: I have Claude Code, it has a million tokens of context now, and yet I don't want it to forget the conversation I'm having, so I just never exit the window. I fill it up and fill it up and fill it up, and two things happen. One, effectiveness goes down, like you just saw. Two, your usage fills up a ton, because the number of tokens consumed at an 800,000-token context is way more than at an 80,000-token context.

This isn't the only issue, but, slightly off topic, we're in an ecosystem where everyone complains that Claude Code has been nerfed and that their usage gets run up automatically. There are a number of reasons for that, but one of them, undoubtedly, is that since the 1-million context window got introduced, people have no clue how to manage their own context window, and they aren't nearly as aggressive with clearing and resetting the conversation as they used to be. But that's kind of off topic. The point of that whole discussion is that when it comes to memory, in this conversation about RAG and Claude Code, we have to keep context rot in the back of our minds, because we're constantly dealing with this tension: I want to ingest context so Claude Code can answer questions about a number of things, yet at the same time I don't want the context to get too large, because then it gets worse. That always needs to be something we're thinking about in this conversation about memory.
00:08:02 But to bring this back to the actual video and Level 1: what are people doing at Level 1? The answer is they're not really doing anything, and because they're not doing anything, they just rely on a bloated context window to remember things. You know you're here if you've never edited a CLAUDE.md file and never created any sort of artifact or file that lets Claude Code realize what the heck is going on, what it's actually done in the past, and what it needs to do in the future. So what do we need to master at this level? Really, despite everything I wrote here, all you need to master is the understanding that auto memory isn't enough, and that we need to take an active role when it comes to Claude Code and memory. Because the trap at this level is that if you don't take an active role, you have no control, and we need to control what Claude Code considers when it answers our questions. So to move beyond Level 1 and unlock Level 2, we need memory that's explicit, and we need to figure out how to actually do that: what files you need to edit, and the fact that they even exist, in order to take an active role in this relationship.

Level 2 is all about one specific file, and that is the CLAUDE.md file.
00:09:06 When you learn about this file, it feels like a godsend: finally, there's a single place where I can give Claude Code rules and conventions I always want it to follow, and it will. In fact, I can include things I want it to remember, and it always will. It definitely feels like progress at first. Here's a template of a standard CLAUDE.md file for a personal-assistant project. Claude Code will create a CLAUDE.md automatically, but you have the ability to edit it, or even update it on demand with a command like /init. The idea is that it's the holy grail of instructions for Claude Code in that particular project: for all intents and purposes, Claude Code looks at this file before any task it executes. So if you want it to remember specific things, what are you going to do? You're going to put them in the CLAUDE.md. Theoretically it's a bit smaller scale than something like RAG; we aren't putting complete documents in here, but rather things you want Claude Code to always remember and conventions you want it to follow. In this one we have an about-me section, a breakdown of the file-system structure, and how we want it to operate when we give it commands. And like I said, because this is referenced on essentially every prompt, Claude Code is really good at following it. So for "hey, I want it to remember specific things," this seems like a great place. But we have to be careful, because we can overdo it.
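For reference, here's a trimmed sketch of the kind of CLAUDE.md described above. The sections and wording are illustrative, reconstructed from what's described rather than copied from the on-screen template:

```markdown
# CLAUDE.md

## About me
Solo creator, non-technical background. Prefer plain-English explanations.

## Project structure
- `inbox/`   - raw notes land here before processing
- `notes/`   - processed, permanent notes
- `scripts/` - automation; never edit without asking

## Conventions
- Always confirm before deleting or moving files.
- Keep responses short; link files instead of pasting their contents.
```

Every line in a file like this gets injected into essentially every prompt, which is exactly why overdoing it backfires.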
00:10:28 When we look at studies like this one evaluating AGENTS.md (and you can swap AGENTS.md for CLAUDE.md), they found that these sorts of files can actually reduce the effectiveness of large language models at large. Why is that? Well, the thing that makes the file so good, the fact that it's injected into basically every prompt, is also what can make it so bad. Are we actually injecting the correct context? Have we pushed through the noise and given it a proper signal, or are we just throwing in things we think are good? Because if something isn't relevant to virtually every single prompt you'll run in your project, should it be here in the CLAUDE.md? Is this a good way to let Claude Code remember things? I would argue no, not really, and that goes contrary to what a lot of people say about CLAUDE.md and how you should structure it. Based on studies like that, and based on personal experience: less is more. Context pollution is real; context rot is real. So if something inside CLAUDE.md doesn't make sense for virtually every single prompt you give, should it be in there? The answer is no. But most people don't realize that, and instead they fall into the trap of a bloated rulebook.
00:11:42 Instead, the skills we should be mastering are these: how do we create project context that is high-signal? How do I make sure what I'm putting inside this file actually makes sense? And with that comes context-rot awareness, like we talked about at the last level. Put all that together and Level 2 feels like forward motion ("hey, I'm taking an active role in memory, I have this CLAUDE.md file"), until you realize it's not really enough. When we talk about Level 3 and what we can do to move forward, we want to think not in terms of a static rulebook but of something that can evolve, and that can include CLAUDE.md. Instead of relying on CLAUDE.md to do everything, what if we used it as a sort of index file that points Claude Code in the right direction instead?
00:12:24 So what did I mean about CLAUDE.md acting as an index and pointing toward other files? I'm talking about an architecture within your codebase that doesn't have just one markdown file, CLAUDE.md, trying to handle every memory issue; I'm talking about multiple files for specific tasks. A great example of this in action is what GSD, the "Get Shit Done" orchestration tool, does. It doesn't create one file that says "this is what we're going to build, these are the requirements, this is what we've done and where we're going." Instead it creates several: you can see over here on the left we have a project.md, a requirements.md, a roadmap, and a state file. The requirements exist so Claude Code always knows, always has memory of, what it's supposed to be building. The roadmap breaks down exactly what we're creating, not just now but what we've done in the past and will do in the future. And the project file gives it memory and context of what we're doing at a high level: what's our north star? By breaking up memory, context, and conventions into this kind of system, we're fighting against context rot and the finding from that study, which is that injecting these files into every prompt all the time, like we do with CLAUDE.md, is actually counterproductive; it doesn't get us better outputs. Furthermore, breaking it into chunks gives Claude Code a clear path to walk: "I want to figure out where this information is. Oh, I go to CLAUDE.md. Oh, CLAUDE.md says these are my five options. Okay, here's the one, let me go find it."
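A hypothetical index-style CLAUDE.md in this architecture might look like the following. The file names follow the GSD-style layout described above; the one-line descriptions are mine:

```markdown
# CLAUDE.md (index)

Read only the file you need for the current task:

- `project.md`      - north star: what this project is and why it exists
- `requirements.md` - what we are supposed to be building
- `roadmap.md`      - what's done, what's in progress, what's next
- `state.md`        - session state; update at the end of every session
```

CLAUDE.md itself stays tiny and always-relevant; everything else is loaded only when a task actually needs it.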
00:13:54 That sort of structure is what you're going to see, 100%, at the next level when we talk about Obsidian, and it really is a crude reimagining of the chunking and vector-similarity search we see in true RAG systems. But obviously this is small-scale: at this level we're talking about four markdown files, not a system that can handle thousands and thousands of documents. And, like you're going to hear me say a lot: what does that mean for you? Do you need a system like the ones we'll talk about at Levels 4, 5, 6, and 7 that can handle that many documents? The answer is: maybe not. Part of this RAG journey is understanding not just where you stand but where you actually need to go. Do you always need to be at Level 7, running an agentic RAG system inside Claude Code? It's probably good to know how to do it, but it's just as good to know when you don't need to implement it. Sometimes what we see in systems like this is enough for a lot of people. So it's just as important to know how to do it as to know whether you need to, and whether you should.
00:14:58 When we talk about Level 3 and state files, how do we know we're here? We know we're here when we're still strictly inside the Claude Code ecosystem, we haven't integrated outside tools or applications, and really we're just creating multiple markdown files to build our own homemade memory-chunking system. But this is still really important; we're mastering some true skills here: actually structuring docs, and having a system in place that updates state at every session, because keeping everything up to date can be a problem with RAG too. Chances are you're also starting to lean on orchestration layers at this point, things like GSD and Superpowers, that build this multi-markdown-file architecture on their own. But there is a real trap here: what we create in a project is very much just for that project, and it's clunky to take those markdown files and shift them over to another project.
00:15:51 Level 4 is where we bring in Obsidian, a tool that has been getting a ton of hype, and for good reason. When you have people like Andrej Karpathy talking about the LLM knowledge bases they've created, built for all intents and purposes on an Obsidian foundation, and the post is getting almost 20 million views, we should probably listen and see how it actually operates. For context, I've done a full deep dive on this Obsidian Andrej Karpathy LLM knowledge base; I'll link it above, so if you want to focus on that and how to build it, make sure you check it out. What I also want to say is that this Obsidian setup we're going to talk about here at Level 4 is honestly the level most people should strive for, because it's enough for most people in most use cases. At Levels 5, 6, and 7 we're going to talk about true RAG structures, and to be honest, they're overkill for most people. We love talking about RAG, it's great, I understand, but Obsidian is the 80% solution that in reality is more like a 99% solution for most people, because it's free, there's basically no overhead, and it does the job for the solo operator. And when I say it does the job for the solo operator, I mean it solves the problem of having Claude Code connected to a bunch of different documents, a bunch of different markdown files, being able to get accurate, timely information from them, and giving you, the human being, insight into those documents. Because when I click on these documents, it's very clear what's going on inside and which documents are related: when I click these links I'm brought to more documents, and when I click those links, more documents again. For me as the human being, having that insight is important, because, to be totally honest, I would argue the Obsidian-based insight into your documents trumps a lot of the insight you get from RAG systems. When we talk about thousands and thousands of documents embedded in something like a GraphRAG system, sure, it looks great visually, very stunning, but do you actually know what's going on in there? Maybe you do, but honestly you're mostly relying on the answers you get, plus the links it shows; it's a bit hard to piece through the embeddings. All that to say: pay special attention to Obsidian and Claude Code, because on this RAG journey I always suggest to everybody, clients included, let's just start with Obsidian and see how far we can scale it. If we eventually hit a wall, you can always transition to a more robust RAG system. So why not try the simple option first? If it works, great, it costs no money, versus trying to knock out a RAG system, which can be genuinely difficult to put into production depending on what you're doing. Always start with the simple stuff; it's never too hard to transition to something more complicated later.

So what are we really talking about here at Level 4?
00:18:45 We're talking about taking the structure we began building at Level 3, an index file pointing at different markdown files, scaling it up, and bringing in this outside tool, Obsidian, to make it easy for you, the human being, to actually see the connections. The platonic ideal of this version is pretty much what Andrej Karpathy laid out: building an LLM knowledge base on top of Obsidian, powered by Claude Code. What that looks like is a structure like this. When you use Obsidian (you download it, it's completely free; again, reference the video I mentioned earlier), you set a certain folder as the vault. Think of the vault as the quasi-RAG system you've created, and inside the vault we architect the structure with nothing but files. We have the overarching folder, the vault, and inside it we create multiple subfolders. In Andrej Karpathy's case he talks about three different subfolders, but in reality they could be any subfolders; they just need to match the theme. In one folder we have the raw data: everything we're ingesting and eventually want to structure so Claude Code can reference it later. Think of having Claude Code do competitive analysis on 50 of your competitors, pulling 50 pages for each; we're talking about a large amount of information, probably 2,500 different items, and all of it gets dumped into some sort of raw folder. It's the staging area for the data. We then have the wiki folder, which is where the structured data goes: we have Claude Code take the raw data and structure it into what are essentially Wikipedia-style articles, and inside the wiki folder each article gets its own folder.
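Sketched as a directory tree, the vault described above looks something like this. The raw and wiki folders come straight from the description; the specific file and topic names are filled in as an illustrative example:

```text
vault/                             # the Obsidian vault: our quasi-RAG system
├── CLAUDE.md                      # conventions: how to read and update the vault
├── raw/                           # staging area for everything ingested
│   └── competitor-research/       # e.g. the 2,500 scraped competitor pages
└── wiki/
    ├── index.md                   # master index: what topics exist and where
    └── ai-agents/
        ├── index.md               # the articles inside this topic
        └── claude-managed-agents/
            └── article.md
```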
00:20:28 The idea being: when you then ask Claude Code for information, say we'd had it research AI agents and I say "hey, Claude Code, talk to me about AI agents," the same way you'd query a RAG system, Claude Code goes to the vault; from the vault it goes to the wiki; and the wiki has a master-index markdown file (think of what we talked about doing with CLAUDE.md before; you can see how these themes carry through the levels). It takes a look at that master index, and the master index tells it what exists in this Obsidian quasi-RAG system: "oh, AI agents exist, cool, guess what's going on down here." Each topic also has an index file listing the individual articles that exist. What am I saying here? I'm saying there is a clear hierarchy for Claude Code to reference when it wants to find information: vault, wiki, index, article, and so on. And because it's so clear how to find information, and equally clear how to first ingest information and turn it into wiki articles, we can create a system that holds a lot of documents without RAG: hundreds, even thousands, if you do this properly. Because if the system is clear ("I check the vault, I check the index, and the index clearly delineates where everything is"), then it's not too hard for Claude Code to figure out where to find things. So you can get away with a non-RAG structure for thousands of documents. That's been really hard to do in the past, and that's because most people don't structure anything at all; they just have a billion documents sitting in one folder. It's the equivalent of 10 million files strewn across the factory floor and saying "well, Claude Code, find it." No, you actually just need a filing cabinet; Claude Code is actually pretty smart. And you can see that architecture in action right here.
00:22:17 Right now we're looking at a CLAUDE.md file that lives in an Obsidian vault. What does it say? It breaks down the vault structure, the wiki system, the overall layout of the subfolders, and how to work with them; so again we're using CLAUDE.md as a conventions file. Over here on the left you can see the wiki folder. Inside the wiki folder is a master index, which lists what's inside; in this case there's just one article, on Claude-managed agents. Inside that folder we see the Claude-managed-agents entry, which has its own wiki folder breaking down the articles inside, until you get to the actual article itself. So the steps it needs to take are very clear, and when I tell Claude Code "talk to me about the managed agents," we have a wiki on it, and it's very easy to find via its built-in grep tool: it links me the actual markdown file and then breaks down everything that's happening. Now, the question at Level 4 really becomes one of scale.
00:23:16 How many documents can we get away with before this sort of system stops working? Is there a point at which Andrej Karpathy's system begins to fall apart? Sure, it's a very clear path for Claude Code to follow, it goes to the indexes, yada yada, but does that sustain itself at 2,000 documents? 2,500? 3,000? Is there a clear number? The answer is we don't really know, and there may never be a clean one, because everyone's documents are different. And in terms of hitting a wall, it isn't just as simple as "well, Claude Code is giving us wrong answers, there are too many files in the Obsidian system." How much is it costing you in tokens now that you've added so many files, and how quickly does it answer? Because RAG can actually be vastly faster and cheaper in certain situations.
00:23:59 What we're looking at here is a comparison between textual LLMs (the giant bars) and textual RAG, in terms of the number of tokens it took to get the correct answer and the amount of time it took to get there. What do we see? Between textual RAG and textual LLMs there's a massive difference, to the tune of something like 1,200x: RAG was roughly 1,200 times cheaper and 1,200 times faster than the textual LLM in these studies. Now, for context: this was done in 2025, it was not done with Claude Code, the models have changed significantly since then, these were straight-up LLMs, not coding agents, and so on. However, we're talking about a 1,200x difference. So when we're evaluating "should I be using Obsidian versus a RAG system," it isn't as simple as whether it gives the right answer or not, because you could have a scenario where you get the right answer with Obsidian, yet if you went to RAG it would be a thousand times cheaper and faster. So there's a very fuzzy line between when Obsidian and these plain markdown-file architectures are good enough, and when you need RAG. There's not a great answer; I don't have one for you. The answer is you have to experiment: try both and see what works, because that study is frankly out of date, 2025, older models. The gap between RAG and textual LLMs today is surely not 1,200x, but how much has it shrunk? Because that was an insane gap; not 10x, 1,200x. So there's a lot you have to know, and again, you won't know the answer ahead of time. You just won't. Watch every video you want; no one's going to tell you where that line in the sand is. You literally just need to experiment and see what works for you as you increase the number of documents you're asking Claude Code questions about.
00:25:54 So on that note, let's move on to Level 5, which is where we finally begin to talk about real RAG systems and some RAG fundamentals: embeddings, vector databases, and how data actually flows through the system when it becomes part of our RAG knowledge base. Let's begin with naive RAG, the most basic type of RAG out there, which nonetheless provides the foundation for everything else we do. You can think of RAG systems as broken into three parts: on the left-hand side we have the embedding stage, then we have the vector database, and then we have the actual retrieval with the large language model. So: one, two, and three.
00:26:40a document that is going to be part of our knowledge base remember in a large rag system we could be
00:26:45talking about thousands of documents and in each document could be thousands of pages but in this
00:26:50example we have a one-page document that we're talking about now if we want to add this document
00:26:56to our database the way it's going to work is it's not going to be ingested as a whole unit instead we
00:27:03are going to take this document and we are going to chunk it up into pieces so this one pager
00:27:08essentially becomes three different chunks these three chunks are then sent to an embedding model
00:27:15and the job of the embedding model is to take these three chunks and turn it into a vector
00:27:21in a vector database now a vector database is just a different variation of your standard database
00:27:27when we talk about a standard database think of something like an excel document right you have
00:27:32columns and you have rows well in a vector database it's not two-dimensional columns and rows it's
00:27:37actually hundreds if not thousands of dimensions but for the purposes of today just think of a
00:27:43three-dimensional graph like you see here and the vectors are just points in that graph and each
00:27:50point is represented by a series of numbers so you can see here we have bananas and bananas is
00:27:57represented by 0.52 5.12 and then 9.31 you see that up here now that continues for hundreds of numbers
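To make the chunk-and-embed flow concrete, here is a minimal Python sketch of the ingestion side. The `embed` function is a hypothetical stand-in that just hashes text into eight floats; a real embedding model returns hundreds or thousands of semantically meaningful dimensions, as described above.

```python
import hashlib

def embed(text: str, dims: int = 8) -> list[float]:
    # Stand-in for a real embedding model: derives a deterministic
    # pseudo-vector from the text. Real embeddings encode semantic
    # meaning; this toy only illustrates the shape of the data.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dims]]

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Split a document into fixed-size chunks that overlap a little,
    # so context straddling a boundary lands in both chunks.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A one-page "document" that becomes three chunks, each stored
# alongside its vector -- repeat for every document in the corpus.
document = "The Iowa-class battleships served in the Pacific theater of World War II."
vector_db = [{"text": c, "vector": embed(c)} for c in chunk(document)]
```

Choosing `size` and `overlap` is exactly the chunking decision the naive-rag critique turns on: too small and chunks lose context, no overlap and a reference that straddles a boundary gets severed.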
00:28:06now where each vector gets placed in this giant multi-dimensional graph depends on its semantic
00:28:13meaning what do the words actually mean so you can see over here this is like the fruit
00:28:19section we have bananas we have apples we have pears over here we have ships and we have boats
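The clustering idea can be shown with cosine similarity, the standard closeness measure in vector search. These 3-d vectors are hand-picked to mimic the diagram; real embeddings have hundreds of dimensions and come from a trained model, so treat this purely as an illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Angle-based similarity: near 1.0 means the vectors point the
    # same way (similar meaning), near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors mimicking the diagram: fruits cluster in
# one region of the space, watercraft in another.
vectors = {
    "bananas": [0.52, 5.12, 9.31],
    "apples":  [0.61, 5.30, 9.10],
    "ships":   [8.90, 1.20, 0.40],
    "boats":   [8.75, 1.05, 0.55],
}

fruit_sim = cosine_similarity(vectors["bananas"], vectors["apples"])
cross_sim = cosine_similarity(vectors["bananas"], vectors["ships"])
# Semantic neighbors score far higher than unrelated concepts.
```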
00:28:24so going back to our document let's imagine that this document is about world war ii ships
00:28:31so each of these chunks is going to get turned into a series of numbers and those series of numbers
00:28:37will be represented as a dot in this graph where do you think it's going to go well they'll probably
00:28:42go around this area right so that would be one two and three so that's how documents get placed every
00:28:49document is going to get chunked each chunk goes through the embedding model and the embedding model
00:28:54inserts them into the vector database repeat repeat repeat for every single document and in the end
00:28:58after we do that several thousand times we get a vector database which represents our knowledge
00:29:04graph so to speak our knowledge base and that moves us on to step three which is the retrieval
00:29:09part so where do you play into this well let's depict you we'll give you a
00:29:16different color you get to be pink so this is you all right you normally just talk to
00:29:23claude code and you ask claude code questions about world war ii battleships well in your standard
00:29:29non-rag setup what's going to happen well the large language model opus 4.6 is going to take a
00:29:34look at its training data and then it's going to give you an answer based on its training data
00:29:39information about world war ii battleships but with a rag system it's going to do more it's going to
00:29:44retrieve the appropriate vectors it's going to use those vectors to augment the answer it generates
00:29:51for you hence retrieval augmented generation that's the power of rag it allows our large language
00:29:56models to pull in information that is not a part of its training data to augment its answer in this
00:30:02example world war ii battleships yes i understand the large language model already knows that but
00:30:06replace this with any sort of proprietary company data that isn't just available for the web and do
00:30:15it at scale that's the sell for rag now in our example when we ask claude code for
00:30:21information about world war ii battleships and it's in a rag setup what it's going to do is it's going
00:30:25to take our question and it's going to turn our question into a series of numbers similar to the
00:30:32vectors over here it is then going to take a look at what the number is for our question and the numbers
00:30:39of the vectors and it's going to see which of these vectors most closely matches the question's vector
00:30:46right how similar are the vectors to the question pretty much and then it's going to pull a certain
00:30:51amount of vectors whether that's one two three four or five or ten or twenty and it's going to pull
00:30:56those vectors and their information into the large language model so now the large language model has
00:31:02its training data answer plus say 10 vectors worth of information right that was the retrieval part
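That retrieval-then-augment step can be sketched in Python. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the knowledge base strings are invented examples; only the ranking logic reflects how naive rag actually works.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count.
    # Real systems use dense vectors from a trained model, but the
    # ranking logic below works the same way.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Our "vector database": in reality these chunks live in a store
# like the ones discussed at this level, alongside their embeddings.
knowledge_base = [
    "The Yamato was the heaviest battleship ever constructed.",
    "Bananas are rich in potassium and grow in tropical climates.",
    "Iowa-class battleships escorted carriers across the Pacific.",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    # Embed the question, rank every stored chunk by similarity to
    # it, and keep the top_k closest matches.
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:top_k]

context = retrieve("Which battleship was the heaviest?")
# The retrieved chunks get pasted into the prompt so the model can
# ground its generated answer on them (the "augmented" part).
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: Which battleship was the heaviest?"
```

In production the full sort over every chunk is replaced by an approximate nearest-neighbor index, which is the core service a vector database provides.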
00:31:09and then it augments and generates an answer with that additional information and that is how rag
00:31:13works that is how naive rag works now this is not particularly effective for a number of reasons this
00:31:19very basic structure kind of falls apart at the beginning when we begin to think about okay how
00:31:25are we chunking up these documents is it random is it just off a pure number of tokens do we have
00:31:31a certain amount of overlap are the documents themselves set up in a way where it even makes
00:31:36sense to chunk them because what if you know chunk number three is referencing something in chunk
00:31:42number one and then our vector situation when we pull the chunks what if it doesn't get the right
00:31:47one what if it doesn't get that other chunk that's required as context to even make sense of what number
00:31:53three says you get what i'm saying like very often the entire document itself is needed to answer
00:31:59questions about said documents so this idea of getting these piecemeal answers doesn't really
00:32:05work in practice yet this is how rag was set up for a long long time other issues that can come into
00:32:10play are things like what if i have questions about the relationships between different vectors because
00:32:17right now i kind of just pull vectors in a silo but what if i wanted to know how boats related to
00:32:22bananas sounds random but what if i did you know with this standard sort of vector database naive rag
00:32:31approach everything's kind of in a silo it's hard to connect information and a lot of it just depends
00:32:36on how well those original documents are even structured are they structured in a manner that
00:32:41makes sense for ragging now over the years we've come up with some ways to alleviate these issues
00:32:46things like re-rankers or ranking systems that take a look at all the vectors we grab and essentially
00:32:51then do another pass on them with a large language model to rank them in terms of their relevance but
00:32:56by and large this naive rag system has kind of fallen out of vogue yet it's still important to
00:33:03understand how this works at a foundational level so it can inform your decisions if you go for a
00:33:07more robust rag approach because if you don't understand how chunking or embeddings even work
00:33:13how can you make decisions about how you should structure your documents when we talk about
00:33:17something like graphrag or we talk about more complicated embedding systems like the brand
00:33:22new one from google which can actually ingest not just text but videos and if you don't understand
00:33:27this sort of foundation it's hard for you to actually understand this trap and the trap is that
00:33:31we've kind of just created a crappy search engine because with these naive rag systems where all we
00:33:36do is grab chunks and we can't really understand the relationships between them how is that different
00:33:42from basically just having an overcomplicated control f system the answer is there's really not
00:33:48much of a difference which is why with these simplistic kind of outdated rag
00:33:54structures that are actually still all over the place if you see someone who's like oh here's my
00:33:58pinecone rag system or here's my supabase rag system and they don't mention anything about graphrag
00:34:03or anything about like hey here's how we have a sophisticated re-ranker
00:34:07system these things are gonna suck to the tune of the actual effectiveness being like
00:34:1225% of the time you get something right like you're almost better off guessing so if you don't know that
00:34:18going in you can definitely be sort of hoodwinked or confused or in some cases like basically scammed
00:34:23into buying these rag systems that do not make sense and so level five isn't about implementing
00:34:28these sort of naive rag systems it's about understanding how they work so that when it
00:34:34comes time to implement something more sophisticated you actually understand what's going on because
00:34:38that five-minute explanation of rag is sadly not something most people understand when they say i
00:34:43need a rag system well do you because you also have to ask yourself what kind of questions are you
00:34:48actually asking about your system if you're just asking you know essentially treating your knowledge
00:34:54base as a giant rule book and you just need specific things from that knowledge system
00:34:59brought up well then obsidian is probably enough or you could probably even get away with a naive
00:35:02rag system but if we need to know about relationships if we need to know about how x interacts with y and
00:35:09they're two separate documents they never even really mention each other and it's not something
00:35:13i can just stick inside the context directly because i have thousands of said documents well that is
00:35:19when you're going to need rag and that's when you're going to need something more sophisticated
00:35:23than basic vector rag that is when we need to start talking about graphrag so when we talk about level
00:35:29six of claude code and rag we're talking about graphrag and we're talking about this and in my
00:35:34opinion if you are going to use rag this is sort of the lowest level of infrastructure you need to
00:35:39create this is using light rag which is a completely open source tool i'll put a link above where i
00:35:44break down exactly how to use it and how to build it but the idea of graphrag is pretty obvious it's
00:35:50the idea that everything is connected this isn't a vector database with a bunch of vectors in a silo
00:35:55this is a bunch of things connected to one another right i click on this document i can see over here
00:36:00on the right and i'll move this over you know the description of the vector the name the type the
00:36:05file the chunk and then more importantly the different relationships and this relationship
00:36:10based approach results in more effective outcomes here is a chart from light rags github this is
00:36:15about i would say six to eight months old and also of note light rag is the lightest weight graphrag
00:36:23system out there that i know of there's some very robust versions including graph rag itself from
00:36:30microsoft it's literally called graph rag but when we compare naive rag to light rag
00:36:35across the board we get jumps of oftentimes more than 100 percent right 31.6 versus 68.4
00:36:4324 versus 76 24 versus 75 on and on and on and that being said according to light rag it
00:36:49actually holds its own and beats out graph rag itself but hey these are light rags numbers so
00:36:54taken with a grain of salt now when we look at this knowledge graph system right away your mind
00:36:58probably goes to obsidian because this looks very similar however what we're looking at here in
00:37:04obsidian is way more rudimentary than what's going on inside of light rag or any graph rag system
00:37:10because this series of connections we see here this is all manual and somewhat arbitrary it's only
00:37:16connected because we set related documents where claude code set related documents when it generated
00:37:22this particular document for example just added a couple brackets boom that document's connected
00:37:27so in theory i could connect a bunch of random documents that in reality have nothing to do with
00:37:30one another now because claude code isn't stupid it's not going to do that but that's a lot different
00:37:35than what went on here like this went through an actual embedding system it looked at the actual
00:37:41content it set a relationship it set an entity there's a lot more work going on here inside of
00:37:46light rag in terms of defining the relationships than obsidian now does that difference actually
00:37:52equate to some wild gap in terms of the performance at a low level though at a huge scale maybe again
00:38:02we're in sort of that gray area kind of depends on your scale and what we're actually talking about
00:38:07and nobody can answer that question except you and some personal experience but understand these two
00:38:13things are not the same we are not the same brother two totally different systems one is pretty
00:38:20sophisticated one's pretty rudimentary understand that and so to wrap up level six in graph rag
00:38:26we're really here when we've decided hey stuff like obsidian isn't working we can't use
00:38:31something like naive rag because it just doesn't work and we need something that can extract entities
00:38:36and relationships and really leverage the sort of hybrid vector plus graph query system design
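As a sketch of what that hybrid design stores, here is a toy entity-and-relationship graph in Python. In a real graphrag-style system (light rag included) an LLM extracts these entities and relations from the chunks automatically and the schema is much richer; every name and chunk id below is invented purely for illustration.

```python
# Toy graph store: named entities plus typed relationships, the kind
# of structure an LLM extraction pass builds in graphrag-style systems.
# Entity names and chunk ids here are invented for illustration.
entities = {
    "Yamato": {"type": "battleship", "source_chunk": "chunk-07"},
    "Imperial Japanese Navy": {"type": "organization", "source_chunk": "chunk-02"},
    "Battle of Leyte Gulf": {"type": "event", "source_chunk": "chunk-11"},
}
relations = [
    ("Yamato", "operated_by", "Imperial Japanese Navy"),
    ("Yamato", "participated_in", "Battle of Leyte Gulf"),
]

def neighbors(entity: str) -> list[tuple[str, str]]:
    # Follow edges in both directions -- the relationship query a
    # vector-only store can't answer, since its chunks sit in silos.
    outgoing = [(rel, dst) for src, rel, dst in relations if src == entity]
    incoming = [(rel, src) for src, rel, dst in relations if dst == entity]
    return outgoing + incoming

hits = neighbors("Yamato")
```

The hybrid part is that each entity still points back to a source chunk, so a query can walk the graph for relationships and then pull the linked chunks through vector search for detail.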
00:38:43but there are some traps there are some serious roadblocks even here at level six when we talk
00:38:48about light rag this is just text what if i have scannable pdfs what if i have videos what if i have
00:38:55images we don't live in a world where all your documents are just going to be google docs and
00:39:01so what do we do in those instances so multimodal retrieval is a huge thing and on top of that what
00:39:06about bringing some more agentic qualities to these systems give it a little more ai power some sort of
00:39:11boost in that department well if we're talking about things that are multimodal then we can finally move
00:39:17to sort of like the bleeding edge of rag in today's day and age as of april 2026 that's what level 7 is
00:39:24all about now when we talk about level 7 in agentic rag the big thing we kind of want to index on here
00:39:31is things that have to do with multimodal ingestion now we've done videos on these things things like
00:39:36rag anything which allow us to import images and non-text documents again think scannable pdfs
00:39:44into structures like the light rag knowledge graph you saw here we also have new releases like gemini
00:39:49embedding too which just came out in march which allows us to actually embed videos into our vector
00:39:56database the videos themselves and this is frankly where the space is going it's not enough to just do text
00:40:01documents how much information how much knowledge is trapped on the internet especially on places
00:40:06like youtube that are just purely video and we want more than just a transcript because a transcript
00:40:10doesn't do enough so this sort of multimodal problem is real and again this is stuff that
00:40:16just came out weeks ago and level 7 is also where we need to start paying attention to our
00:40:20architecture and pipelines when it comes to the data going in and out of our rag system it's not
00:40:25enough to just get data in here like this is great you know okay we have all these connections and
00:40:30stuff how is the data getting there how is it getting there in the context of a team how
00:40:35is data getting out of there like what if some of the information here has changed in a particular
00:40:40document what if somebody edits it how does it get updated what if we add duplicates who can actually
00:40:46put these things in there when it comes to production level stuff these are all questions
00:40:50you need to begin to ask yourself and so when we look at an agentic rag system like this one from
00:40:54n8n you can see the vast majority of the infrastructure everything outlined here is all about
00:41:01data ingestion and data syncing there's only a very small part that has anything to do with rag which is
00:41:06right there because we need systems that clean up the data that are able to look at okay we just
00:41:11ingested this document in fact this was version 2 of version 1 can we now go back and clean that data
00:41:17here's something like a data ingestion pipeline where documents don't get directly put into the
00:41:21system or into light rag we instead put them inside of like a google drive and from there it gets ingested
00:41:27into the graph rag system and logged these are the sort of things that will actually make or break
00:41:31your rag system when you're using it for real and when we talk about agentic rag you can see here and
00:41:37i know this is rather blurry but if we have an ai agent running this whole program imagine you set
00:41:42up some sort of chatbot for your team does it always need to hit this database the answer is
00:41:49probably not chances are in a team setting in a business setting you're going to have information
00:41:54that's in a database like this like text or something but you probably also have another set
00:41:58of databases like just standard postgres databases with a bunch of information you want to query
00:42:03with sql as well so when we talk about an agentic rag system we need something that has all of that
00:42:08the ability to intelligently decide oh am i going to be hitting the graph rag database represented
00:42:15here or am i just going to be doing some sort of sql queries in postgres these things can get
00:42:20complicated right and all of this is use case dependent which is why it's kind of hard to
00:42:23sometimes make these videos and try to hit every single edge case the point here at level 7 is not
00:42:30that there's necessarily some super rag system you've never heard of it's that
00:42:34the devil's in the details here and that's really mostly the data ingestion piece and keeping it up
00:42:39to date but also like how do you actually access this thing easy to do in a demo right here oh we
00:42:46just go to the light rag thing and i go to retrieval and i ask it questions it's a different scenario when
00:42:50we're talking about it with a team and everyone's approaching it from different angles and you
00:42:55probably don't want everyone to have access to actually uploading it to light rag itself on a
00:43:01web app that being said for the solo operator who is trying to create some sort of sophisticated rag
00:43:07system that is able to do multimodal stuff i would suggest the rag anything plus light rag combination
00:43:14i've done a video on that and if i don't link that already i'll link it above i suggest that for a few
00:43:19reasons one it's open source and it's lightweight so it's not like you're spending a bunch of money
00:43:26or time to spin something like this up to make sure it actually makes sense for your use case
00:43:31again the thing we want is we don't want to get stuck in systems where there's no way out and
00:43:37we spent a bunch of money to get there which is why i do love obsidian and i always recommend things
00:43:42like light rag and rag anything because hey if you try this out and it doesn't work for you or doesn't
00:43:45make sense okay whatever you wasted a handful of hours you know it's not like you are spending a
00:43:50bunch of money on microsoft's graph rag which in no way is cheap and so when do you know
00:43:56you're in level 7 really multimodal stuff like you need to index images tables and videos and you're
00:44:02integrating some sort of agent system where it can intelligently decide like which path it goes down
00:44:06to answer information because at level 7 you're probably integrating all this stuff you probably
00:44:12have a claude.md file with some permanent information you probably have it in a code base with some
00:44:16markdown files that sort of make sense for easy retrieval perhaps you're also including obsidian
00:44:20in some sort of vault plus you probably have some section of documents that are in a graph rag
00:44:25database and you have a top of the funnel ai system that can decide when they ask this question i go down
00:44:33this route that's a mature sort of memory architecture that i would suggest but what's the trap here the
00:44:40trap honestly is trying to force yourself into this level and this sort of sophistication when it's
00:44:47just not needed to be honest after all this most of you are fine with obsidian it's more than enough
00:44:52you don't need graph rag you really don't need rag in general and if it's not obvious that you
00:44:57need level 7 and certainly if you haven't already tried the obsidian route you don't need to be here
00:45:01it's probably a waste of your time but the whole point of this video was to the best of my ability
00:45:07to expose you to what i see as the different levels of rag and memory and claude code and what
00:45:12this problem is what some of the tensions are what the trade-offs are and where you should probably be
00:45:18for your use case and again the biggest thing is just experiment you don't have to know the answer
00:45:24before you get into this just try them out and i would try in ascending order if you can get away
00:45:28with just markdown files in a claude system and it's basically just claude.md on steroids sweet go ahead
00:45:34and then try obsidian if obsidian is not enough try light rag and so on and so forth so that is
00:45:39where i'm going to leave you guys for today if you want to learn more especially about the production
00:45:43side of rag like how to spin this up for a team or package it for a client we have a whole module
00:45:47on that inside of chase ai plus so check that out other than that let me know what you thought
00:45:52i know this was a long one and i will see you around