Claude Code + RAG-Anything = LIMITLESS

Chase AI

Transcript

00:00:00Almost every RAG system suffers from the exact same problem.
00:00:04They can only handle text documents.
00:00:06So if you try to give it images, charts, graphs, whatever,
00:00:10most RAG systems just can't handle it.
00:00:12And when I showed you LightRAG yesterday,
00:00:13it suffered from the exact same problem.
00:00:16But today I'm gonna show you the fix.
00:00:19And that fix is RAG-Anything.
00:00:20RAG-Anything solves this document problem for us.
00:00:23It can handle images.
00:00:24It can handle charts.
00:00:25It can handle graphs.
00:00:25And it allows us to create a RAG system
00:00:28that actually deals with the documents you use.
00:00:31RAG-Anything is from the same team that built LightRAG.
00:00:34It plugs directly into the LightRAG system
00:00:36we already built yesterday.
00:00:37So it's really easy to introduce this into our stack.
00:00:40And so today I'm gonna show you exactly how to set it up
00:00:43and how it works under the hood,
00:00:44so you can begin using one of the most powerful
00:00:46RAG systems out there.
00:00:48So in case it wasn't obvious enough from the opener,
00:00:50I'm going to assume you've already watched
00:00:52yesterday's LightRAG video.
00:00:54I'll put a link above if you haven't done that already,
00:00:56because today I'm going to assume you've already set up
00:00:58your LightRAG server,
00:00:59you understand how RAG works, and you understand
00:01:02this whole knowledge graph thing.
00:01:03Because RAG-Anything is essentially going to be a wrapper
00:01:06around LightRAG.
00:01:07We're still gonna have the same LightRAG web UI,
00:01:10with some differences,
00:01:11but everything that gets pushed into RAG-Anything,
00:01:13you know, these non-text documents,
00:01:15eventually finds its way to the same knowledge graph.
00:01:17We're gonna be asking it the same questions.
00:01:19We're gonna be using the same API to query it
00:01:22through Claude Code that we did yesterday.
00:01:24And the functionality we are going to be adding today
00:01:26is significant.
00:01:28It's not enough to build a RAG system that is purely text.
00:01:30We don't operate in a world that's purely text.
00:01:32How many of you have been given a PDF document
00:01:34that isn't even technically text, it's just scanned?
00:01:36LightRAG can't really handle that; RAG-Anything can.
00:01:39Now, we will get a little technical today.
00:01:40We'll get under the hood and I'll explain exactly
00:01:43how this whole system works.
00:01:44But big picture, what is it doing?
00:01:46RAG-Anything is just looking at the documents
00:01:49that aren't text.
00:01:50It's basically doing exactly what LightRAG does,
00:01:52except to these non-text documents.
00:01:55And after it creates its own knowledge graph
00:01:56and its own vector database,
00:01:58it merges it with the LightRAG one,
00:02:00which is why everything ends up being in one nice,
00:02:04neat little place for us to ask questions about.
00:02:06Now, the only downside to RAG-Anything
00:02:08is that it's a bit heavier.
00:02:09We have to download some models that live on our computer
00:02:12that help parse some of these non-text documents.
00:02:14And when it comes to actually ingesting non-text documents,
00:02:18we can't really do it through the LightRAG UI.
00:02:22We have to use a script.
00:02:23Luckily, this is where Claude Code comes in.
00:02:25So for you, the user, after you've set all this up,
00:02:28all you have to do to ingest non-text documents
00:02:31is tell Claude Code, hey, go ahead,
00:02:33use the RAG-Anything skill and ingest this document.
00:02:36It's that simple.
00:02:37And you ask the questions the same way you did before.
00:02:39So really not too bad.
00:02:40And again, you get all this functionality just by doing that.
00:02:43Now, before we go into how RAG-Anything actually works,
00:02:46I just want to give a quick plug for my Claude Code masterclass.
00:02:49It just came out a couple of weeks ago,
00:02:50and it's the number one place to go from zero to AI dev,
00:02:53especially if you don't come from a technical background.
00:02:55I update this literally every week.
00:02:57There's a new update coming tomorrow.
00:02:59So if you're someone who is really trying to master
00:03:01Claude Code and has no idea where to start,
00:03:03well, this is for you.
00:03:05There's a link to that in the comments.
00:03:07It's inside Chase AI+.
00:03:09I also have the free Chase AI community
00:03:11if this is just too much for you,
00:03:12if you're just getting started.
00:03:14Link to that is in the description.
00:03:15That is also where you will find the prompts and the skills
00:03:19that I'm going to talk about today.
00:03:20So make sure you check that out regardless.
00:03:22Now let's talk about RAG-Anything
00:03:23and how this thing actually works.
00:03:25To be honest, it's pretty simple, pretty self-explanatory.
00:03:28So not to waste your time,
00:03:29I'm just going to keep this image up for like 10 seconds,
00:03:32and then we'll move on to the next thing.
00:03:34All right, pretty good.
00:03:39All right, let's move on.
00:03:41I'm just kidding.
00:03:42There's actually a bit going on.
00:03:44This image makes it more confusing than it actually is.
00:03:46If you understand what we did the other day with LightRAG,
00:03:50remember all this conversation, you're going to be good.
00:03:52RAG-Anything kind of operates in a similar fashion,
00:03:55just with a few extra steps.
00:03:56And I want to go through it,
00:03:57'cause I think it's important to understand
00:03:58how these things work.
00:04:00I think in AI in general,
00:04:01it's easy to become super practical-focused,
00:04:04like, I just want to know how to install it, Chase,
00:04:05and then how to use it.
00:04:06That's fine, you can skip ahead if that's you.
00:04:08But I think if you want to become a more mature AI dev
00:04:11and you kind of want to separate yourself
00:04:13from the monkey I could replace you with,
00:04:15that just hits accept, accept, accept, and copies
00:04:17prompts and skills,
00:04:18then I think it's important to have some, you know,
00:04:21understanding of architecture,
00:04:22'cause this is what's going to separate you
00:04:23from other people.
00:04:24And not just in terms of how you can use this RAG system,
00:04:27but in higher-level, bigger projects, right?
00:04:30This is how you begin to create your own skills,
00:04:34like actually become good at this stuff.
00:04:35So let's talk about it.
00:04:37So, RAG-Anything.
00:04:38Let's talk about the problem, right?
00:04:40The problem is I have a PDF that is a scanned PDF
00:04:44and it's not really text,
00:04:45and yet I need to put it into my RAG system.
00:04:46LightRAG can't handle it.
00:04:48So in comes RAG-Anything, right?
00:04:51It's got the cool llama with the shades.
00:04:53So the first thing that happens
00:04:56is I'm going to ingest this document into RAG-Anything.
00:05:00And the first thing it's going to do
00:05:02is use a program called MinerU,
00:05:05which runs on your computer completely locally for free.
00:05:08And it's going to essentially break down this document
00:05:11into its component parts.
00:05:12MinerU is an open-source project.
00:05:14Again, it's essentially a document parser
00:05:16that includes a bunch of miniature specialized models.
00:05:19All you need to know, if you're scared of this,
00:05:21is that it's open source.
00:05:22I'll put a link down below.
00:05:23And again, this is what's going to be running
00:05:25and doing most of the work for us today.
00:05:26So MinerU is looking at this document and it says,
00:05:29"Okay, this is a header."
00:05:32It creates a box around the header.
00:05:33It says, "This is text."
00:05:36It says, "This is a chart."
00:05:39It says, "This is an image of a bar graph."
00:05:41And it says, "This is an equation written in LaTeX."
00:05:44What it's done is it's looked at the document
00:05:47and broken it out into its separate parts.
00:05:50MinerU doesn't understand what's inside here.
00:05:52MinerU isn't reading the text.
00:05:53It doesn't get the text.
00:05:55It doesn't understand what the chart is about.
00:05:56It just knows: chart, text, image, okay?
00:06:01From there, it's going to send these component parts
00:06:05to individual specialized models that are part of MinerU.
00:06:10So this is all invisible to you.
00:06:12This is all happening automatically under the hood.
00:06:15So one of the models is called PaddleOCR.
00:06:20That's what's going to look at the text.
00:06:21So MinerU is sending this text block to PaddleOCR
00:06:24on your computer, and it's going to pull out the text, okay?
00:06:28So now instead of being scanned text,
00:06:30it's actual text that reads, "Company X reported strong Q3 '23
00:06:34results with revenue growth," blah, blah, blah, blah, blah.
00:06:36Right? Same for this text.
00:06:40Same for the chart, right?
00:06:41It's also going to turn it into text, right?
00:06:43Something an LLM can handle.
00:06:45Same thing with LaTeX equations.
00:06:47It has a whole model that handles that, right?
00:06:48This is now no longer LaTeX; it's actually text.
00:06:52Except for images.
00:06:54So whether this is a bar chart or just,
00:06:57it's really anything that it can't transform to text.
00:07:00What it's going to do instead
00:07:01is take a screenshot of it,
00:07:03and this is important, all right?
00:07:05So now this is a screenshot.
00:07:07It's an image, a screenshot. Love that.
00:07:11So what do we have?
00:07:13We inserted a non-text document.
00:07:16It's been broken down into its component parts,
00:07:18and we've taken those component parts
00:07:20and sorted them into two buckets, right?
00:07:22We have the text bucket and we have the image bucket.
00:07:26It's important to realize this.
00:07:28There are two paths it can go down: image or text.
00:07:31All right, you with me?
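To picture the two buckets in code, here's a minimal Python sketch. The block types and field names are simplified assumptions for illustration, not MinerU's exact output schema.

```python
# Hypothetical sketch of the two-bucket split described above.
# Block shapes are invented; MinerU's real content list differs.

def split_into_buckets(blocks):
    """Route parsed blocks: anything already textual goes to the text
    bucket; anything visual stays as an image (screenshot) reference."""
    text_bucket, image_bucket = [], []
    for block in blocks:
        if block["type"] in ("header", "text", "equation"):
            text_bucket.append(block["content"])
        else:  # charts, figures, anything that can't become text
            image_bucket.append(block["image_path"])
    return text_bucket, image_bucket

blocks = [
    {"type": "header", "content": "Q3 '23 Results"},
    {"type": "text", "content": "Company X reported strong revenue growth."},
    {"type": "chart", "image_path": "page1_chart.png"},
]
text_bucket, image_bucket = split_into_buckets(blocks)
print(text_bucket)   # the two textual blocks
print(image_bucket)  # ['page1_chart.png']
```

The point is just that the routing decision happens locally, before any paid model ever sees the document.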
00:07:32So what happens now
00:07:34is we're done using these internal models.
00:07:36Now we need to bring in the big boys.
00:07:37Now we need to bring in something like GPT-5 mini.
00:07:40Of note, that isn't necessarily the case;
00:07:42you could keep this all local if you wanted to.
00:07:44You could do something like Ollama.
00:07:45So now I take the text bucket and I push it to GPT-5 mini.
00:07:50And I include a prompt that says,
00:07:52I want you to break this text out into two things.
00:07:55I want you to take that text
00:07:57and break it out into entities and relationships.
00:08:01Remember entities and relationships?
00:08:03Remember our knowledge graph?
00:08:05Entity, entity, and sort of the relationship between them.
00:08:09Okay, and I want you to break it out
00:08:13into what will be embeddings for a vector database.
00:08:17So embeddings, embed,
00:08:21and then I'm just going to say entities plus relationships.
00:08:26Now, thinking ahead, what's going to happen there?
00:08:29Well, the embeddings are going to become embeddings
00:08:32in a vector database, and the entities and relationships
00:08:35are going to become a knowledge graph,
00:08:37just like we did with LightRAG, right?
00:08:39Same thing, same thing, except now,
00:08:42now it's from the text bucket.
00:08:47What are we going to do with these guys?
00:08:48Same thing, this is going to get pushed to 5.4 as well,
00:08:52but it's going to be as a screenshot, as an OCR.
00:08:55So we're telling GPT 5.4, take a look at this screenshot
00:08:59and break it out into two things, right?
00:09:02Embeddings and also entities plus relationships.
00:09:06Now, why do we do that?
00:09:07Why don't we just shove it all into the same exact prompt
00:09:09and have it just OCR this entire thing, right?
00:09:12Why don't we just treat this entire thing as a screenshot?
00:09:14Because it's expensive and slow.
00:09:16What RAG-anything decided to do,
00:09:17and I think it's kind of smart,
00:09:19is it kind of takes a scalpel to this on your computer
00:09:21at the local level, breaking it out into text,
00:09:24breaking it out into screenshots.
00:09:25So when we go through these two paths,
00:09:27you're saving a ton of money and time.
00:09:29Because imagine you were trying to have ChatGPT
00:09:31look at 10,000 screenshots and then break out all the text
00:09:34and from the text, break it out into embeddings
00:09:36and entities and relationships.
00:09:37It takes a lot of time and money.
00:09:38This is smarter.
00:09:40So entities and relationships from the image side,
00:09:44same exact thing.
00:09:45It also gets a vector database
00:09:49and it also gets a knowledge graph.
00:09:52So what does that mean?
00:09:53That means from one document,
00:09:55we've now created four things, right?
00:09:59We have two vector databases
00:10:02and we have two knowledge graphs
00:10:04from our single non-text document.
00:10:08You with me?
00:10:09Now, what do we have to do?
00:10:10Well, it's kind of obvious.
00:10:11We need to merge these.
00:10:12So it's going to take these four things
00:10:15and just push them together, right?
00:10:18They're gonna pretty much overlay on top of one another.
00:10:19It's going to match them based on entities, essentially.
00:10:22And you're just gonna get at the end,
00:10:27one vector database and one knowledge graph.
00:10:31Pretty much the exact same thing
00:10:32we did up here with LightRAG.
00:10:34Simple enough.
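The merge step can be sketched as a simple overlay of two graphs keyed by entity name. Representing a graph as a dict of entity-to-neighbors is a simplification of what LightRAG actually stores, but it shows the matching-on-entities idea:

```python
# Minimal sketch of merging the text-bucket graph with the image-bucket
# graph. Entity names are the join key; shared entities get the union
# of their neighbors.

def merge_graphs(a, b):
    merged = {k: set(v) for k, v in a.items()}
    for entity, neighbors in b.items():
        merged.setdefault(entity, set()).update(neighbors)
    return merged

text_graph  = {"Company X": {"Q3 revenue"}, "Q3 revenue": set()}
image_graph = {"Company X": {"revenue chart"}, "revenue chart": set()}
merged = merge_graphs(text_graph, image_graph)
print(merged["Company X"])
# Company X now links to both the text-derived and image-derived entities
```

The vector-database merge is even simpler in spirit: the two sets of embeddings just end up in one shared index.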
00:10:35If we were just using RAG-Anything,
00:10:38that would kind of be the extent of it.
00:10:40However, remember we're trying to layer RAG-Anything
00:10:44on top of LightRAG.
00:10:46I want all the power of LightRAG
00:10:48and I want all the power of RAG-Anything.
00:10:50So what happens now?
00:10:52Well, what happens is just a repeat of what you just saw.
00:10:54So let's kind of bring this guy down.
00:10:55So now we have our RAG-Anything set
00:11:00with a vector database and a knowledge graph,
00:11:05and we have our LightRAG set.
00:11:06So what do we do?
00:11:07We just merge those together.
00:11:09So then what happens is we get the RAG-Anything
00:11:13and the LightRAG data combined,
00:11:15which gives us, finally, one vector database
00:11:20and one knowledge graph.
00:11:21And from there, it's just like it was before
00:11:24with LightRAG on its own, right?
00:11:27You ask a question about whatever,
00:11:31that question gets turned into a vector up here.
00:11:33It pulls the relevant vectors
00:11:35and then it also goes down here,
00:11:37finds the correct entity
00:11:39and then takes a look at what's nearby, okay?
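That query flow can be illustrated with a toy example: embed the question, find the closest stored vector, then pull the matched entity's neighborhood from the graph. The 2-D "embeddings" here are fake numbers purely for demonstration, not real model output.

```python
# Toy illustration of the hybrid query flow: vector similarity picks the
# best-matching entity, and the knowledge graph supplies its neighbors
# as extra context. All vectors here are made up.
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

vectors = {"Company X": (1.0, 0.1), "weather": (0.0, 1.0)}
graph = {"Company X": {"Q3 revenue", "revenue chart"}}

query_vec = (0.9, 0.2)  # pretend embedding of "How did Company X do?"
best = max(vectors, key=lambda k: cosine(vectors[k], query_vec))
context = graph.get(best, set())
print(best, context)
```

In the real system, LightRAG does both lookups against the merged stores and hands the retrieved context to the LLM to write the answer.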
00:11:43Maybe that was a little confusing.
00:11:44I hope I explained that okay.
00:11:46Here's a quick recap, to confuse you even more.
00:11:51What happens when I add a document that isn't text?
00:11:54It goes into RAG-Anything.
00:11:56RAG-Anything breaks out what text it can,
00:11:58and then breaks out what images it can as well.
00:12:00It sends both of those to ChatGPT
00:12:02or whatever AI system you want.
00:12:05It breaks those out into embeddings,
00:12:07entities, and relationships.
00:12:09Those get turned into knowledge graphs and vector databases.
00:12:13We then merge those together.
00:12:15We now have one vector database
00:12:17and one knowledge graph for RAG-Anything.
00:12:19And since we've already been running this in LightRAG,
00:12:22or if you've added any more documents on top of that,
00:12:24you have an existing vector database
00:12:27and an existing knowledge graph.
00:12:29To solve that, we simply merge them.
00:12:32And in the end, you didn't notice a dang thing.
00:12:35Again, as the user, all of this is invisible to you, okay?
00:12:39None of this really matters to you.
00:12:41The only thing that might matter to you
00:12:42is what's happening over here with GPT-5,
00:12:45'cause it's gonna cost you some money.
00:12:47But for educational purposes,
00:12:50that is how the RAG-Anything system
00:12:53integrates with the LightRAG system.
00:12:55And at the end of the day,
00:12:57it just means that you have a RAG system
00:12:58that can handle non-text documents.
00:13:00And if you're still around after all that,
00:13:03now we can go into how you actually install this thing
00:13:07and use it.
00:13:08So now let's talk about the install,
00:13:09how to actually use it,
00:13:10and a couple of things you need to watch out for.
00:13:11So I created a one-shot prompt that you can give Claude Code
00:13:14that will install everything for you
00:13:17and update the proper models and all of that.
00:13:19All you need to do is just make sure
00:13:20you're in your LightRAG directory when you run this.
00:13:23So there are really three things it's going to be doing.
00:13:25First of all, it's going to make sure
00:13:27we update the correct storage path,
00:13:29since you already have a Docker LightRAG instance running.
00:13:32Two, we want to update the model,
00:13:33because based on the GitHub,
00:13:34it was created a little while ago originally,
00:13:37so all the example scripts and all that
00:13:39use things like GPT-4o mini.
00:13:41So I have it on GPT-5 nano.
00:13:43Understand you can change that if you want to.
00:13:45But I had it use GPT-5 nano, as well as keep
00:13:48text-embedding-3-large, so that we can just use OpenAI
00:13:51for everything.
00:13:51It just keeps it simple; play with it as you wish.
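For reference, the model swap boils down to a couple of settings. The variable names below are illustrative, not necessarily the exact keys your LightRAG/RAG-Anything setup uses; the one-shot prompt handles this for you:

```shell
# Illustrative model settings (check your own .env for the exact keys)
LLM_BINDING=openai
LLM_MODEL=gpt-5-nano                    # instead of the repo's older gpt-4o-mini examples
EMBEDDING_MODEL=text-embedding-3-large  # keeps everything on OpenAI
```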
00:13:54Lastly, since we're using RAG-Anything
00:13:55as essentially a wrapper on top of LightRAG,
00:13:58some of the example scripts given in the GitHub repo
00:14:02are kind of wrong.
00:14:03So there's this embedding double-wrap bug,
00:14:05which, again, we just tell Claude Code to fix
00:14:08and it will fix it.
00:14:09So you're just going to use this prompt.
00:14:12Again, it is inside the free community.
00:14:14Link is in the description.
00:14:15Just look up RAG-Anything and you will find it there.
00:14:18And once you run that prompt,
00:14:19it will begin downloading everything.
00:14:21Understand it's a little heavier,
00:14:22'cause it needs to download MinerU
00:14:23and all those dependencies as well.
00:14:25Now let's talk about ingesting documents,
00:14:26'cause this is kind of annoying and a pain in the butt.
00:14:28In a perfect world, the LightRAG-plus-RAG-Anything situation
00:14:33would be very streamlined, and I could dump
00:14:35whatever I wanted into LightRAG slash RAG-Anything
00:14:40through a single interface.
00:14:41I could come into the UI, I could go to upload,
00:14:44and I could do that.
00:14:45You really can't with RAG-Anything and LightRAG.
00:14:48You can still do this for text documents.
00:14:50So you can still do the normal workflow
00:14:52that I showed in the previous video, where you go to the UI
00:14:54or you use the LightRAG skill to upload documents.
00:14:59You can't do that with RAG-Anything.
00:15:01It has to go down essentially a different tunnel,
00:15:04a different pathway.
00:15:05And that different pathway with RAG-Anything
00:15:07is a Python script.
00:15:09There's no UI, there's no button to press.
00:15:11It's literally a script.
00:15:12It's code you have to run.
00:15:14Now, luckily, this is where Claude Code comes in
00:15:16and makes it very simple, because we're just going to turn
00:15:19that script inside of the repo into a skill.
00:15:23So for you, once that skill is created,
00:15:25all you have to do is say, Claude Code,
00:15:28use the RAG-Anything skill to upload all these documents,
00:15:32all these non-text documents.
00:15:33And when it does that,
00:15:34it will go through the MinerU process.
00:15:36It will take some time, because it has to do all these,
00:15:39you know, things to it, like we explained
00:15:41in the technical section,
00:15:43but it will upload it to LightRAG,
00:15:45and it will show up inside of your documents
00:15:47and inside of your knowledge graph.
00:15:49Okay, that's the only weird part you need to know.
00:15:51The other weird part, to be honest, is once you do that,
00:15:54it also requires you to restart the Docker container,
00:15:58but as part of the skill, that happens automatically.
00:16:00So again, from your point of view as the user,
00:16:03the only difference is you just need to invoke the skill.
00:16:06Now this skill, the RAG-Anything upload skill,
00:16:08is also inside the free community.
00:16:10So just download it and then put it in your .claude folder
00:16:13and then it will work just fine.
00:16:19when you download it, it's going to run on your CPU.
00:16:22If you want it to run on your GPU,
00:16:24you have to have a different version of PyTorch.
00:16:27If that all went over your head,
00:16:29just if it's too slow for you, just tell Claude code,
00:16:32hey, can we run PyTorch?
00:16:34Can we run minor you on our GPU?
00:16:36And it will walk you through it.
00:16:37Or in fact, it'll just do it all on its own.
00:16:39But by default, it's just gonna run on your CPU.
00:16:41So just know that.
00:16:42So let's see an example of this in action.
00:16:44So one of the documents we ingested was
00:16:48this PDF of Novatech, right?
00:16:50SAS revenue analysis.
00:16:51It's totally fake.
00:16:52But the point is we ingested something
00:16:55that has this sort of bar chart, right?
00:16:57So this is something that obviously would have been pulled out
00:16:59as an image sent to chat GPT, yada, yada, yada.
00:17:01Normally light rag wouldn't be able to handle this
00:17:03because it's just an image.
00:17:05It's charts, it's hard for it to sort of break that out.
00:17:07But since we ran this through rag anything,
00:17:10we can now ask a question via Claude code about this.
00:17:13So I asked Claude code,
00:17:14can we query our light rag database
00:17:15about the monthly revenue trend for Novatech Inc
00:17:18for January through September, 2025?
00:17:20You can see here, it actually didn't even use the skill.
00:17:22It just straight up did the API request,
00:17:24which is fine as well with the query.
00:17:26What was the monthly revenue trend for Novatech Inc
00:17:29from blah, blah, blah, blah, blah.
00:17:30Now it gave a full response.
00:17:32So I can take a look at the raw response if I wanted to.
00:17:35But what did it do?
00:17:36It came back with the full monthly breakdowns.
00:17:39We see January 4.6, 4.6, February 4.9, 4.9,
00:17:43March 5.4, 5.4, on and on and on.
00:17:46So in terms of asking questions about these new documents,
00:17:48same thing as before.
00:17:49The only difference is the upload.
00:17:51All you need to do is to invoke that skill
00:17:53that I'm giving you and then tell Claude code
00:17:55what you wanna put in there.
00:17:56You could point it at a whole folder.
00:17:58You can point it at a specific download.
00:18:00It's just as easy.
00:18:01This is the only really weird thing you've gotta get used to
00:18:04is these two upload paths.
00:18:05But the actual question and answer,
00:18:07it's just plain language.
00:18:09Plain language, even if you have the skills as well,
00:18:11which I also gave in the last video,
00:18:13but Claude code's also smart enough
00:18:14to understand the API structure of this whole thing.
00:18:17'Cause it's local, it's on your computer.
00:18:19So that's really it when it comes to RAG-Anything.
00:18:21I know the majority of this video
00:18:22was focused on the technical aspects,
00:18:24but as you see, once we built that LightRAG foundation,
00:18:28actually adding RAG-Anything on top of it isn't too hard,
00:18:32especially if we just use that one-shot prompt I gave you.
00:18:35There are some things you can tweak around the edges,
00:18:37like anything, when it comes to querying it,
00:18:39but really, with Claude Code,
00:18:41it's kind of in charge of all the weights
00:18:43that you can tune inside of LightRAG.
00:18:45And for that, I'm talking about,
00:18:45if we go to the retrieval section,
00:18:47all the parameters here on the right.
00:18:49Again, Claude Code knows which ones tend to be best for you.
00:18:52So overall, I hope this explained
00:18:56how easy it is to set up RAG-Anything,
00:18:58and also how easy it is to add this level of functionality
00:19:02to your RAG systems,
00:19:03which in many RAG systems just isn't possible
00:19:05or is very expensive.
00:19:06And this is relatively cheap,
00:19:08especially with that whole MinerU local parsing system
00:19:11we're able to set up.
00:19:12So as always, let me know what you thought.
00:19:14Make sure to check out Chase AI+
00:19:16if you wanna get your hands on that Claude Code masterclass,
00:19:18and I'll see you around.

Key Takeaway

RAG-Anything overcomes the text-only limitation of standard RAG systems by using local vision-parsing models to transform charts and scanned images into a unified knowledge graph and vector database searchable through Claude Code.

Highlights

RAG-Anything enables LightRAG to process non-text documents including scanned PDFs, charts, graphs, and LaTeX equations.

The system uses MinerU, an open-source local document parser, to decompose documents into headers, text blocks, and screenshots of visual elements.

Processing costs are reduced by using local models like PaddleOCR for text extraction, reserving cloud LLMs for entity and relationship extraction.

Non-text documents follow a dual-path pipeline that creates separate vector databases and knowledge graphs for text and images before merging them into a single searchable index.

Integration with Claude Code allows users to ingest complex documents via a single terminal command using a custom 'rag-anything' skill.

A Novatech SaaS revenue analysis test demonstrated the system's ability to accurately extract and query monthly revenue trends from a visual bar chart.

Timeline

Limitations of Text-Centric RAG Systems

  • Standard Retrieval-Augmented Generation (RAG) systems fail when encountering non-textual data like charts and graphs.
  • Scanned PDF documents are often unreadable to text-based RAG architectures like LightRAG.
  • RAG-Anything acts as a specialized wrapper to bridge the gap between visual documents and text-based knowledge graphs.

Most existing RAG solutions are restricted to digital text formats, making them ineffective for real-world business documents that rely on visual data representation. LightRAG specifically struggles with these formats despite its advanced knowledge graph capabilities. RAG-Anything addresses this by plugging directly into the existing LightRAG stack to expand its ingestion capabilities to include any document format.

Local Document Parsing with MinerU

  • MinerU runs locally on the user's computer to identify and box different document components.
  • The parser distinguishes between headers, standard text, charts, bar graphs, and mathematical equations.
  • Understanding document architecture is a prerequisite for creating custom AI skills and advanced developer projects.

The big-picture goal of this system is to look at documents that are not natively text and break them down into their fundamental parts. MinerU is an open-source project that utilizes specialized miniature models to categorize every element on a page. This local step ensures that subsequent processing is handled by the most efficient model for each specific data type.

The Dual-Path Ingestion Pipeline

  • Textual components are processed via PaddleOCR to convert scanned pixels into readable strings.
  • Visual elements that cannot be converted to text are captured as screenshots for vision-capable LLMs.
  • A scalpel-like approach to data extraction prevents the high costs and slow speeds associated with full-page OCR via expensive APIs.

The system splits data into two buckets: text and images. The text bucket is sent to an LLM to extract entities and relationships for a knowledge graph and generate embeddings for a vector database. Simultaneously, the image bucket is sent to a vision model to perform the same extraction on screenshots. This parallel processing ensures that information hidden in charts is treated with the same structural importance as written prose.

Merging Knowledge Graphs and Vector Databases

  • Four distinct data objects are created from a single non-text document: two vector databases and two knowledge graphs.
  • Merging these objects results in a single, unified searchable index where text and visual data overlay.
  • The final merged database allows for natural language queries that draw from both textual descriptions and visual chart data.

After the LLM processes both the text and image buckets, it generates two sets of embeddings and relationship maps. These are merged based on shared entities to create a comprehensive representation of the document. For users already running a LightRAG server, this new RAG-Anything data is further merged into the existing global knowledge graph, making the entire ingestion process invisible during the query phase.

Installation and Claude Code Integration

  • A one-shot prompt in Claude Code automates the installation of MinerU and its heavy dependencies.
  • The system defaults to running on the CPU but can be transitioned to the GPU by updating PyTorch versions.
  • Standard LightRAG UIs do not support non-text uploads, requiring a Python script or custom Claude Code skill.

Setting up the environment involves updating storage paths and fixing common 'embedding double wrap' bugs found in the original GitHub repository. Because RAG-Anything operates as a wrapper, it requires a separate pathway for ingestion compared to standard text files. By converting the ingestion script into a Claude Code skill, users can upload complex documents simply by pointing the AI to a specific file or folder in the terminal.

Practical Application and Revenue Analysis Test

  • A query regarding a fake company's SaaS revenue successfully extracted data points directly from a bar chart.
  • The system returned specific monthly figures, such as $4.6M for January and $5.4M for March, which were only present in image form.
  • Claude Code automatically manages retrieval parameters and API structures for local RAG instances.

The system was tested using a Novatech SaaS revenue PDF containing visual charts that a standard RAG would ignore. By using a plain language query, the system accessed the merged knowledge graph to provide a detailed monthly breakdown of revenue trends from January to September 2025. This demonstrates that once the initial architectural hurdle of image parsing is cleared, the user experience remains identical to standard text-based AI interactions.
