00:00:00The death of RAG has been greatly exaggerated.
00:00:03Yes, I know large language models like Opus 4.6
00:00:05have gotten way better lately at handling large contexts.
00:00:09But if you think that means you will never need RAG,
00:00:12you are going to hit a wall
00:00:14that you can't just prompt your way out of.
00:00:16So today I'm going to explain when you need RAG,
00:00:19what sort of RAG actually works in 2026
00:00:22because the landscape has changed a ton over the last year,
00:00:25and I'm gonna show you how to connect Claude Code
00:00:28to your RAG system,
00:00:30as well as give you some skills you can take home with you.
00:00:32So today's goal is to give you this,
00:00:35a graph RAG system built on the back of LightRAG
00:00:38that we can use with Claude Code.
00:00:40And more importantly, this is gonna give us a system
00:00:43that we can use when we need to use AI
00:00:45with giant corpora of documents, right?
00:00:49Not just five documents, not just 10 documents
00:00:51like you'll see in the demo,
00:00:52but 500 documents, 1,000 documents,
00:00:55because it's not enough just to rely
00:00:57on the context window Claude Code comes with,
00:00:59or any other LLM.
00:01:01Because when you start to get huge scale,
00:01:03which you do see in a lot of enterprises
00:01:05or even just smaller businesses,
00:01:06having a RAG system like this is actually cheaper and faster
00:01:10than your standard agentic grep.
00:01:12So with that in mind,
00:01:13having the skill of being able to create
00:01:14these sorts of RAG systems is very important,
00:01:16but luckily it's pretty simple.
00:01:18And like I just alluded to,
00:01:19we will be using LightRAG today.
00:01:21This is an open source repo that I absolutely love.
00:01:25It's been around for a while,
00:01:26and it's something that's been updated over and over again.
00:01:28It's able to compete with more sophisticated
00:01:30graph RAG systems like Microsoft's GraphRAG
00:01:32at literally a small fraction of the cost.
00:01:35So it's the perfect place to actually sort of test out
00:01:37these graph RAG concepts if you've never used it before.
00:01:40But in order for us to get the most out of LightRAG,
00:01:43we need to understand how RAG actually works at a base level,
00:01:46because the landscape for RAG has changed.
00:01:48What we were doing at the end of 2024 and early 2025
00:01:51was what is called naive RAG, the most base level of RAG.
00:01:54Remember all those n8n automations where it was like,
00:01:56hey, let's go to Pinecone and let's go to Supabase.
00:01:58That was naive RAG.
00:02:00That doesn't work anymore.
00:02:02That does not cut it.
00:02:03We have to use more sophisticated versions of RAG,
00:02:06but we need to understand the fundamentals first.
00:02:08So let's do a quick refresher of what RAG is
00:02:12and how it works before we dive into the LightRAG setup.
00:02:14So RAG, retrieval-augmented generation.
00:02:18The way it works is I first start
00:02:20with some sort of document, right?
00:02:22And I'm going to have thousands of these
00:02:25in a pretty robust RAG system.
00:02:27But what happens is I have this document
00:02:29that I want to go inside of my RAG system,
00:02:31inside of a vector database.
00:02:34Well, what happens is the document
00:02:38doesn't just get thrown into this database, right?
00:02:40Like it's some sort of Google Drive system.
00:02:41What happens is the document goes through an embedding model
00:02:44and then it gets turned into a vector.
00:02:46But even more so than that,
00:02:47the document doesn't go as one giant piece.
00:02:50It gets chunked up.
00:02:51So imagine we have this one page document
00:02:54and it gets pushed into chunk one, chunk two, and chunk three.
00:02:59Each of these chunks then becomes a vector,
00:03:03which is just a point on a graph,
00:03:05a point in a vector database.
00:03:06Now, a text splitter is what does this chunking for us.
00:03:09The embedding model is in charge of taking each chunk,
00:03:11figuring out what it's all about,
00:03:13and then turning it into a point on this graph.
00:03:16So the document gets chunked up,
00:03:18it goes through the embedding model,
00:03:20and then our document becomes a vector on this graph.
00:03:24Now, this is a three-dimensional graph.
00:03:27In reality, it is thousands of dimensions,
00:03:30but just think of it as a three-dimensional graph for now.
00:03:33Now, imagine this document was about warships.
00:03:36Okay, and each chunk got turned into some sort of vector
00:03:39about warships.
00:03:40Well, where's it gonna go?
00:03:41Well, it's gonna go over here next to boats and ships,
00:03:43obviously, and it's gonna become its own little vector.
00:03:45And by vector, I mean,
00:03:46it's just given a series of numbers that represent it.
00:03:50You can see that over here with bananas.
00:03:53So banana is 0.52, 5.12, and 9.31, on and on and on.
00:03:57This goes for thousands of numbers.
00:04:00So our little boat guy over here is like one, two, three,
00:04:05dot, dot, dot, dot, dot, forever and ever.
00:04:07Easy enough.
00:04:08Obviously, it's not gonna be next to bananas and apples,
00:04:10but that is the document to embedding process,
00:04:14as well as the chunking.
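The chunk-then-embed pipeline described above can be sketched in a few lines of Python. This is a toy illustration: the fixed-size splitter and the fake three-number embedding are stand-ins for a real text splitter and a real embedding model (something like OpenAI's text-embedding-3-large), which would return thousands of dimensions per chunk.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Naively split a document into overlapping character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def fake_embed(chunk):
    """Stand-in for a real embedding model: maps text to a short
    list of numbers. A real model returns thousands of dimensions."""
    return [
        sum(ord(c) for c in chunk) % 100 / 100,
        len(chunk) / 1000,
        chunk.count(" ") / 100,
    ]

document = "Warships are large armed naval vessels. " * 20
vectors = [fake_embed(c) for c in chunk_text(document)]
print(len(vectors), "chunks embedded, each a vector of", len(vectors[0]), "numbers")
```

Real splitters usually break on sentence or paragraph boundaries instead of raw character counts, but the shape of the process is the same: one document in, many chunk vectors out.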
00:04:15Now, let's say you're over here, okay?
00:04:18You're our happy little guy over here,
00:04:20and you asked the large language model
00:04:21a question about warships.
00:04:24Well, that question in this rag system scenario
00:04:27is also going to be turned into a vector.
00:04:30So your question, you know, the embedding model takes a look at it,
00:04:34and it assigns it a series of numbers
00:04:35that also correspond to some sort of vector
00:04:38in this database, okay?
00:04:41And so what it's going to do is it's gonna compare
00:04:43what your question vector is
00:04:45to the other vectors in the graph.
00:04:49It's looking at what's called cosine similarity,
00:04:51but all it's really doing is it's saying,
00:04:53hey, the question was about this.
00:04:55We're assigning these numbers.
00:04:56What vectors are closest to it?
00:04:58What numbers are closest to that question?
00:05:00Well, it's gonna be this one about warships
00:05:02and probably boats and ships.
00:05:04So it is now going to retrieve all those vectors
00:05:08with all their information,
00:05:10and it's going to augment the answer it generates for you,
00:05:13hence retrieval augmented generation.
00:05:16So instead of the large language model
00:05:17relying purely on its training data,
00:05:19it is able to go inside the vector database,
00:05:22grab the relevant vectors,
00:05:24bring them back and give you your answer about warships.
00:05:27That's how RAG works, right?
00:05:29Document ingestion, chunks turned into vectors.
00:05:32The vectors then get compared against the question being asked,
00:05:35which brings back the closest ones, ta-da, RAG.
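That retrieval step, comparing the question vector to every stored vector and returning the closest, is just cosine similarity. Here's a stdlib-only sketch, with toy three-dimensional vectors standing in for real thousand-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Angle-based closeness of two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vector database: chunk label -> pretend embedding.
db = {
    "warships": [0.9, 0.8, 0.1],
    "boats":    [0.8, 0.9, 0.2],
    "bananas":  [0.1, 0.2, 0.9],
}

question_vector = [0.85, 0.8, 0.15]  # pretend embedding of "tell me about warships"
ranked = sorted(db, key=lambda k: cosine_similarity(question_vector, db[k]), reverse=True)
print(ranked)  # ['warships', 'boats', 'bananas']
```

The warship-like chunks come back first and bananas come back last, which is exactly the "grab the nearest vectors" behavior described above.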
00:05:39And that is naive RAG,
00:05:40and that actually really doesn't work very well at all.
00:05:44So smarter people than you and I
00:05:46have come up with better ways to do this,
00:05:49namely hybrid search and graph RAG and agentic RAG.
00:05:53What we're gonna focus on today is graph RAG.
00:05:55Now graph RAG goes through the same process.
00:05:57You're still gonna have that document.
00:05:58It's still going to get chunked.
00:05:59It's still going to be put in this flat vector database,
00:06:03but it's going to do one other thing.
00:06:05It's gonna create this knowledge graph as well.
00:06:07It's gonna create this crazy thing.
00:06:08So what is all this?
00:06:09What are all these vectors and lines?
00:06:11What does this actually mean?
00:06:12Well, all these vectors, these little circles,
00:06:14these are what is known as entities.
00:06:17And the lines that connect two entities
00:06:21are an edge or a relationship.
00:06:23So going back to our document example,
00:06:25imagine this document is about Anthropic and Claude Code.
00:06:28And the entire chunk that got pulled out said,
00:06:31Anthropic created Claude Code.
00:06:35It's gonna take that and it's gonna break it out
00:06:36into entities and relationships.
00:06:38What are the two entities?
00:06:39The entities are going to be
00:06:41Anthropic and Claude Code.
00:06:44And the relationship is Anthropic created Claude Code.
00:06:48So you have Anthropic right here
00:06:51and you have Claude Code over here.
00:06:54And you can see this is an entity, this is an entity,
00:06:58and they have a relationship.
00:06:59On the visual graph, it's just a line,
00:07:03but under the hood coding wise,
00:07:05that line between these two entities
00:07:08has a bunch of text associated with it
00:07:10explaining its relationship.
00:07:11And so in a graph rag system,
00:07:13it does that for each and every document you add to it.
00:07:16Imagine this times a thousand documents.
00:07:19This is with 10 documents,
00:07:21all of these relationships and all of these entities.
00:07:24And you can imagine how much more sophisticated that is
00:07:26than a bunch of random vectors
00:07:28just sort of siloed in a vector database.
00:07:30And so with a system like LightRAG,
00:07:33we get the creation of a knowledge graph
00:07:35as well as your standard vector database.
00:07:38It does both of these things in parallel.
00:07:40And so when you now ask a question
00:07:43about whatever it is to the large language model,
00:07:45it not only pulls that specific vector
00:07:47that it finds that's closest,
00:07:49it will also go down here and take a look at an entity.
00:07:54So let's say you asked about anthropic.
00:07:56Well, now it's going to traverse the relationships,
00:07:59the edges, and find everything that it thinks is relevant.
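That traversal can be sketched as a tiny graph in Python. The entities and edge descriptions here are made up for illustration; LightRAG builds and stores this kind of structure for you during ingestion:

```python
# Toy knowledge graph: each edge carries text explaining the relationship.
edges = [
    ("Anthropic", "Claude Code", "Anthropic created Claude Code"),
    ("Claude Code", "Skills", "Claude Code can be extended with skills"),
    ("Anthropic", "Claude", "Anthropic trains the Claude model family"),
]

def neighbors(entity):
    """Find every entity directly related to the one asked about,
    plus the text describing each relationship."""
    found = []
    for a, b, description in edges:
        if a == entity:
            found.append((b, description))
        elif b == entity:
            found.append((a, description))
    return found

# Asking about "Anthropic" traverses its edges and pulls related context.
for other, description in neighbors("Anthropic"):
    print(f"Anthropic -- {other}: {description}")
```

A real system also walks multiple hops and ranks which edges are worth pulling, but the core move is the same: start at the entity the question mentions, then follow its relationships.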
00:08:03So what this means for you, the user,
00:08:06with a graph rag system,
00:08:08I can now ask much deeper questions,
00:08:11not just like about a document
00:08:13and essentially just doing Ctrl+F
00:08:15for all intents and purposes.
00:08:17I can now ask how different documents and different theories
00:08:19and different ideas relate to one another
00:08:21because those relationships are mapped, right?
00:08:24This is what it's all about.
00:08:25It's about taking disparate information and connecting them.
00:08:30That is the power of graph rag.
00:08:32That is the power of LightRAG.
00:08:33And that's what we're going to learn today.
00:08:35So installing and using LightRAG
00:08:37is as easy as you want it to be.
00:08:40I'm going to show you the easiest way
00:08:42where we are just going to take Claude Code.
00:08:44We are going to give it the URL of LightRAG,
00:08:48and we're going to say, "Hey, set this up for us."
00:08:50And it's going to do essentially everything.
00:08:52In that scenario, we're just going to need a few things.
00:08:55Like you saw in sort of the breakdown of how rag works,
00:08:58we need an embedding model.
00:08:59So that is going to require an API key.
00:09:02I suggest using OpenAI.
00:09:04They have a very effective embedding model.
00:09:07So you will need an OpenAI key.
00:09:09You do have the ability with LightRAG
00:09:11to make this an entirely local thing.
00:09:14So you could have a local model via Ollama
00:09:17that's doing all like the breakdowns with the embeddings,
00:09:20as well as the question and answer stuff.
00:09:21So understand that's an option too, going fully local.
00:09:24We're going to kind of do half and half.
00:09:25So we're going to set up an OpenAI embedding model
00:09:28as well as the model that's actually doing all the work.
00:09:31And then we also need Docker.
00:09:34So if you've never used Docker before,
00:09:35it's pretty easy to set up.
00:09:36You're just going to need Docker Desktop,
00:09:39just download it, install it and have it running
00:09:41when you run LightRAG,
00:09:42because it is going to need a container.
00:09:45So what you're going to do now
00:09:46is you're going to open up Claude Code
00:09:47and you're going to say, clone the LightRAG repo,
00:09:50write the .env file configured for OpenAI
00:09:53with GPT-5-mini and text-embedding-3-large,
00:09:56use all default local storage
00:09:58and start it with Docker Compose,
00:10:00and then give it the link to LightRAG.
00:10:02If you do that, it's going to do everything for you.
00:10:06I will put this prompt inside of the free school community,
00:10:10link to that in the description.
00:10:12Also, what's going to be there
00:10:13is I'll show you in a little bit,
00:10:15some skills related to Claude Code and LightRAG
00:10:17to make it easier to sort of control it from Claude Code.
00:10:19So you'll be able to find that there as well.
00:10:22And you knew it was coming.
00:10:22Speaking of my school,
00:10:24quick plug for the Claude Code masterclass,
00:10:25which is the number one way to go from zero to AI dev,
00:10:28especially if you don't come from a technical background,
00:10:31the link to it is in the pinned comment.
00:10:33I update this quite literally every single week
00:10:35in the last two weeks,
00:10:36I've already added like an hour and a half
00:10:38of additional content.
00:10:39So definitely check it out
00:10:40if you're serious about mastering Claude Code
00:10:42and AI in general.
00:10:44But again, if you're new, this is all a little too much,
00:10:46definitely check out the free school
00:10:47with tons of great resources for you
00:10:49if you're just starting out.
00:10:50And before you run this,
00:10:51just make sure you have Docker Desktop running
00:10:53and have that OpenAI key ready
00:10:55and let Claude Code go to work.
00:10:56Now once Claude Code finishes installing it
00:10:58and you add your OpenAI key to the .env file,
00:11:01you should see something like this.
00:11:02First of all, on your Docker Desktop,
00:11:04you should see a container called LightRAG up and running.
00:11:07And then Claude Code should also give you a link
00:11:11to your localhost, port 9621.
00:11:13And it'll take you to a page that looks like this.
00:11:15This is the web UI for LightRAG.
00:11:18And it's here where we can upload documents,
00:11:21we can look at the knowledge graph, we can retrieve things,
00:11:24and we can also take a look
00:11:25at all the different API endpoints,
00:11:28which will come in handy later.
00:11:30And what you see here are the documents
00:11:31I've uploaded for this video.
00:11:33To upload documents is very, very simple.
00:11:35We're just gonna come over here to the right
00:11:36where it says Upload, and then you're going to drop 'em in.
00:11:39Now understand there's only certain types of documents
00:11:42we can put in here, right?
00:11:43Text documents, PDFs, essentially,
00:11:46you're limited to text documents.
00:11:49Now there's a way to get around this,
00:11:51namely with things like images and charts and tables
00:11:56and that sort of thing.
00:11:57And we'll talk about that at the end
00:11:59because it's a little outside the scope,
00:12:00but we will learn about it.
00:12:02So drop whatever documents you want into here,
00:12:04and then you will be able to see their status
00:12:07as they're uploaded.
00:12:08It'll take a little bit because again,
00:12:10it's building the knowledge graph as it does this.
00:12:12So this can take a while.
00:12:14And if for whatever reason you're on the knowledge graph page
00:12:16'cause this can kinda happen and it says like,
00:12:18"Hey, it didn't load," or whatever,
00:12:19you just reset it by hitting this button
00:12:21over here on the top left.
00:12:23If you come over to the Retrieval tab,
00:12:25that's where you can ask questions
00:12:27about your knowledge graph to the large language model,
00:12:30which in this case is probably OpenAI
00:12:31if you use the same key for embedding.
00:12:33And over here on the right, we have some parameters.
00:12:36Honestly, off the bat, there aren't many you need to change.
00:12:39And in a second, I'll show you how Claude Code can do it.
00:12:42But as you ask your questions, like for example,
00:12:44I had a bunch of AI and RAG documents in there.
00:12:47I said, "Hey, what's the full cost picture
00:12:48of running RAG in 2026?"
00:12:50It gives me a pretty sophisticated response.
00:12:53And on top of that, it also gives you the references
00:12:56for everything it's doing, right?
00:12:57See four, three here, two,
00:13:00'cause at the bottom of the page,
00:13:01it will actually give you the references
00:13:03for the documents it grabbed.
00:13:05And obviously inside of our knowledge graph, right,
00:13:07we explain entities and relationships.
00:13:09If I click on one of these entities like OpenAI, for example,
00:13:12I can see some of the properties.
00:13:14So it does more than just pull relationships and entities
00:13:17in the embedding process with LightRAG.
00:13:19It actually goes a little deeper and it was like,
00:13:20"All right, what kind of type of entity is it, right?
00:13:22Is it an organization or a person?"
00:13:25It has the specific files it grabbed
00:13:27as well as like chunking IDs.
00:13:29And then you can see the actual relationships
00:13:31down at the bottom right.
00:13:32I'll move this for a second.
00:13:33So down here on the bottom right,
00:13:35if you can't visually see it,
00:13:36'cause it can get kind of like clumped up on the graph,
00:13:40you can actually just like click here
00:13:41and it will take you to them as well.
00:13:43So this server API is what we're going to be using
00:13:46to actually connect this thing to Claude Code.
00:13:48Because as great as this is,
00:13:50like I'm not really going to be sitting here
00:13:51every single time I wanna ask a question
00:13:53about my knowledge graph via the retrieval tab.
00:13:56That's too much of a pain in the butt.
00:13:57So instead, we're just going to use these APIs.
00:14:00Now, every single one of these APIs, right,
00:14:03has a description, you can see the parameters and stuff,
00:14:05every one of these APIs can be turned into a skill, right?
00:14:08And that's what I'm about to do and show you here today.
00:14:11That way, when you want Claude Code to use LightRAG,
00:14:15well, we just go inside of Claude Code, wherever we are,
00:14:17and say, "Hey, I wanna use the LightRAG query skill
00:14:19and ask question, blah, blah, blah, blah, blah."
00:14:22It's the same thing as if you were here
00:14:23in the retrieval tab and asked your question.
00:14:26And better yet, Claude Code will kind of take the response
00:14:28it gives you and summarize it
00:14:30because these responses can be pretty in depth
00:14:32off the rip when it comes to LightRAG.
00:14:34But if you just want the raw answer,
00:14:36you can set that up as well.
00:14:37Point is, even though this has a web UI,
00:14:40you never really have to interact with it
00:14:41if you don't want to.
00:14:42And it's really easy to bring it
00:14:44into our Claude Code ecosystem.
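Under the hood, a query skill like that just POSTs to the LightRAG server's query endpoint. Here's a minimal stdlib sketch; the /query path, the mode value, and port 9621 match what the LightRAG web UI's API page documents, but verify them against your own running instance before relying on this:

```python
import json
import urllib.request

def build_query_request(question, mode="hybrid", base_url="http://localhost:9621"):
    """Build a POST request for the LightRAG server's query endpoint."""
    payload = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What's the full cost picture of running RAG in 2026?")
print(req.full_url)  # http://localhost:9621/query

# Actually sending it requires the Docker container to be up:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

A skill file is essentially just instructions telling Claude Code to make this same call and summarize whatever JSON comes back.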
00:14:46So the four big skills I think you'll use the most
00:14:48are query, upload, explore, and status.
00:14:51All four of these will be inside the free school as well.
00:14:55But what are you gonna be doing mostly?
00:14:56You're gonna be adding new documents
00:14:58and you're gonna be asking questions about those documents.
00:15:01And you'll probably wanna know,
00:15:02"Hey, what did I actually put in there?"
00:15:04'Cause after you have a ton of documents,
00:15:05you kind of wanna avoid putting in the same ones
00:15:07over and over and over again.
00:15:08And so if I ask the same question inside of Claude Code,
00:15:12I've just invoked the LightRAG query skill,
00:15:14it's setting that request off to LightRAG,
00:15:18which again, is hosted on our computer,
00:15:21it's running inside that Docker container,
00:15:22and it's gonna bring the response back.
00:15:24Now you aren't limited to this semi-local system.
00:15:28If you are someone who's scaling really, really hard
00:15:30with LightRAG, you can host this
00:15:33on a standard Postgres server.
00:15:36You have a lot of options, you could use something like Neon.
00:15:38So it kind of goes the full gamut.
00:15:40You can go fully local or you can push all this off
00:15:43to the cloud if you want to as well.
00:15:44LightRAG is very, very customizable.
00:15:46And here's the response Claude Code came back with,
00:15:48which again, is a summary of the raw response
00:15:52that LightRAG gave us, and it also quotes its sources.
00:15:55I also asked it for the raw response
00:15:57because you can get that as well,
00:15:58because it just brings it back to Claude Code
00:16:00in a JSON response.
00:16:02So that's all this is.
00:16:04And then again, it also has the references if you want them.
00:16:07So like you just saw, super easy to install LightRAG
00:16:10and very simple to integrate it into your Claude Code workflow.
00:16:14Now the question becomes, okay, Chase, sounds great.
00:16:18I get conceptually that if I have a ton of documents,
00:16:20I should maybe be using this.
00:16:22Well, where's the line in the sand?
00:16:23When should I start integrating LightRAG?
00:16:26Well, there's not an exact number to this.
00:16:28Gray area is, I would say somewhere between like 500
00:16:33and 2,000 pages' worth of documents.
00:16:36I don't want to just say documents
00:16:37'cause who knows how large those are gonna be,
00:16:39but like 500 to 2000 text pages.
00:16:42At that point at 2000, you're starting to get
00:16:44into like a million tokens.
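That back-of-the-envelope math assumes roughly 500 tokens per dense text page, which is a common rule of thumb rather than a LightRAG-specific number, with a 200k-token context window used for comparison:

```python
# Rough token budget: when does a corpus outgrow a context window?
TOKENS_PER_PAGE = 500       # rough rule of thumb for a dense text page
pages = 2_000
total_tokens = pages * TOKENS_PER_PAGE
print(f"{pages:,} pages ~ {total_tokens:,} tokens")  # 2,000 pages ~ 1,000,000 tokens

# A 200k-token context window holds only a fifth of that corpus at once.
context_window = 200_000
print(f"Fits {context_window / total_tokens:.0%} of the corpus per prompt")  # Fits 20%
```

Past that point you're either paying to re-send huge amounts of text on every question or retrieving only the relevant slices, which is the whole argument for RAG at scale.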
00:16:47Beyond that, it probably makes sense for sure
00:16:50to start integrating LightRAG,
00:16:52because the thing is the way rag is set up,
00:16:54it's gonna be cheaper and faster to do that
00:16:57than just relying on standard grep from Claude Code.
00:17:00Agentic grep, the way Claude Code searches files,
00:17:03is already great.
00:17:04Like there is a reason Claude Code chose to do that.
00:17:07However, it wasn't under the assumption you had 2000 pages
00:17:12of documents or 4000 or 5000, right?
00:17:14There is an upper limit.
00:17:16The nice thing is you don't have to necessarily have
00:17:19that decision like set in stone, as you saw,
00:17:22it is very easy to implement this.
00:17:24So just experiment.
00:17:26If you feel like you have a ton of documents and it's like,
00:17:28hey, should we be using rag at this point?
00:17:30Well, I don't know, try it out.
00:17:32It doesn't take long to do.
00:17:34The most painful part is the embedding process.
00:17:36That can take a minute for sure, but it's not debilitating.
00:17:40And the cost isn't insane, especially with LightRAG.
00:17:43If you compare this again to other graph RAG systems
00:17:45like Microsoft's GraphRAG, this is a small,
00:17:48small percentage of the cost.
00:17:49And at the very large document sizes,
00:17:52the cost with RAG versus the cost with something like grep
00:17:56is to the tune of a thousand times cheaper.
00:17:58There was a study done last summer
00:18:04that found it was 1,250 times cheaper to use RAG
00:18:07in those sorts of situations.
00:18:08You can see that right here with textual rag
00:18:10versus textual LLM, as well as the actual response time.
00:18:14Now, full disclosure, this was from July of last year.
00:18:19So the models have changed.
00:18:20I highly doubt it's as insane of a difference
00:18:23when we compare RAG versus your standard long-context setup.
00:18:26And this was also with Gemini 2.0.
00:18:28We weren't talking about a harness.
00:18:29So a lot of things have changed,
00:18:31but has it changed enough to close that 1,250x gap?
00:18:36Maybe, maybe not.
00:18:39I don't think so.
00:18:40Either way, just try it out.
00:18:42I don't think there's much to lose.
00:18:44The other thing with LightRAG is the idea that,
00:18:46hey, if I want to upload documents,
00:18:48we talked about this a little bit earlier.
00:18:49What do we do if we again have like tables, graphs,
00:18:53stuff that isn't text?
00:18:54Can LightRAG handle this?
00:18:57Not exactly, but we can fix that.
00:18:59And the answer is RAG-Anything
00:19:02from the same exact makers as LightRAG.
00:19:04And this is something that can essentially be multimodal.
00:19:07And it's something we can pretty much plug
00:19:09right on top of LightRAG.
00:19:10Now, I hate to disappoint you,
00:19:13but that is gonna be outside
00:19:15the scope of today's video.
00:19:17However, tomorrow's video,
00:19:18what do you think we're gonna do?
00:19:19Tomorrow, we're gonna be going through RAG-Anything
00:19:22and showing essentially how you can integrate it
00:19:25into what we built with LightRAG.
00:19:27So it'd be kind of a great one-two punch.
00:19:28So if that's something you're interested in,
00:19:31like and subscribe,
00:19:32because we're gonna be going over it tomorrow.
00:19:34And on that note,
00:19:35this is where we're going to kind of wrap up.
00:19:39Hope you enjoyed it.
00:19:41This is my first video too, with this new camera set up.
00:19:43The lighting, I can already tell is not,
00:19:46not exactly where I wanted it to be.
00:19:48So apologize for all that.
00:19:49Still working out the kinks,
00:19:50just glad it was working at all
00:19:52and the camera didn't overheat in the middle of this thing.
00:19:55But yeah, all the skills are inside of the free school.
00:19:58The RAG stuff is super interesting, especially LightRAG.
00:20:01It's been a great product.
00:20:02I've been using it for quite a while.
00:20:03So 100%, 100%, check this thing out.
00:20:06And it's so easy to integrate
00:20:07inside of Claude Code like you saw.
00:20:08So check out the free school for the skills,
00:20:12as well as the prompt if you need it.
00:20:14To be totally honest,
00:20:15if you just point Claude Code at LightRAG,
00:20:16it will set it up just fine on its own.
00:20:19But other than that,
00:20:20make sure to check out Chase AI Plus
00:20:21if you wanna get your hands on that masterclass.
00:20:24And I'll see you around.