Ralph Wiggum is absolutely blowing up. We made a video about it last year, and since then it's all anyone is talking about on Twitter. Matt Pocock has made loads of videos on it, Ryan Carson has written a very popular article on it, and Razmike has built on it with his Ralphie bash script. But is everyone doing it wrong? The creator has already said that some implementations are incorrect. So what's the correct way to do it? And why is Ralph currently the best way to build software with AI? Hit subscribe and let's get into it.
The Ralph loop was created by Geoffrey Huntley, who wrote about it way back in June last year. It's essentially a bash loop that gives an AI agent the exact same prompt over and over again. But it's genius on so many levels, because it lets the agent work in its smartest mode: the mode where it has as little context as possible. Take a look at this. Imagine this is the total context window for an agent. From 0 to about 30% is what we'll call the smart zone, where the agent performs best. From about 30 to 60%, it still performs really well. From 60% onwards, so 60, 70, 80, 90, performance starts to degrade; we'll call that the dumb zone. Now, these numbers aren't set in stone and can differ per model. The smart zone for a certain model might stretch to 40 or 50%, but once you're past about 80% of the context window, that's usually when the dumbness sets in.
For Claude Sonnet or Opus, the context window is typically 200,000 tokens. So you can say the first 60k tokens are the smart zone, the next 60k are still okay but not as good as the first 60k, and over the last 80k it doesn't seem to perform as well. Now, this is my personal experience with these models; you might have had other experiences. The reason for this is that the model is autoregressive, meaning it has to look at the previous tokens to predict the next one. And if you have loads and loads of tokens, it has to sift through a lot of them to find the bits that are actually relevant to the task at hand. Now let's focus on that first 30%. Even before you write your first prompt,
there are some things that get added to the context window automatically. First is the system prompt, then the system tools; on a typical Claude model these take 8.3% and 1.4% of the context respectively, so almost 10 of those 30 percentage points. Then, if you have skills, those can get added, along with any custom MCP tools. Finally, if you have an AGENTS.md file, that gets added too.
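To put rough numbers on that overhead (using the 200k window and the 8.3% / 1.4% figures quoted above; treat them as ballpark, not guarantees), here's a quick bit of bash arithmetic:

```shell
# Back-of-the-envelope context budget for a 200k-token window.
TOTAL=200000
SMART_ZONE=$((TOTAL * 30 / 100))       # first 30%  -> 60000 tokens
SYSTEM_PROMPT=$((TOTAL * 83 / 1000))   # ~8.3%      -> 16600 tokens
SYSTEM_TOOLS=$((TOTAL * 14 / 1000))    # ~1.4%      ->  2800 tokens
LEFT=$((SMART_ZONE - SYSTEM_PROMPT - SYSTEM_TOOLS))
echo "smart zone: $SMART_ZONE, left after baseline overhead: $LEFT"
# -> smart zone: 60000, left after baseline overhead: 40600
```

So before you've added a single skill, MCP tool, or line of AGENTS.md, roughly a third of the smart zone is already spoken for.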
And of course, the larger any of these things is (the bigger the AGENTS.md file, for instance), the more tokens it takes up, and this is all before you've even added your own prompt. So in general it's best to keep this section as small as possible: fewer tools, fewer skills, and less in your AGENTS.md file, so that the model is working at its most optimal context. To get an idea of exactly how much 60k is: the whole script of Star Wars: A New Hope comes out to about 54,000 tokens with GPT-5's tokenizer. So roughly this amount.
Now, you may be wondering: what about compaction? Can that help with this whole process? We'll talk about that a bit later. For now, let's look at exactly how Ralph helps. The benefit of Ralph is that you focus on one goal per context window: the whole 200k context window gets dedicated to a single goal or task. And the way we do that is with a prompt that first inspects the plan.md file, which contains the tasks to be done, something like "create the frontend", "create the backend", "set up the database", and so on. That's a very high-level example; in practice you'd go far more detailed and granular, but we'll stick with it for now. So this prompt
will tell the agent to pick the most important task and make those changes. After making the changes, it runs the tests, then commits and even pushes them. Once the tests have passed, it ticks the task off as done in the plan.md file, and the whole thing runs again. So the agent keeps picking the most important remaining task until it has completed all of them. Actually, let me take that back, because you could keep the Ralph loop running over and over even after it has completed every task. The benefit of that is it may find things to fix, or features to add, that don't exist in the plan.md file. And if it does go off the rails, the benefit of Ralph is that you can stop the whole process whenever you want, adjust the prompt.md file, and run it all again. And Ralph makes this so simple, because this
whole process is executed in one single bash while loop. Here it just cats the prompt.md file, printing it to the agent, and then runs Claude in YOLO mode. Of course, the flag isn't actually called YOLO; it's --dangerously-skip-permissions, but for the sake of space I've kept it short.
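For reference, here's a minimal sketch of that loop written out as a script. This is my own reconstruction, not Geoffrey's exact file, and the PROMPT.md filename is an assumption:

```shell
# Save a minimal Ralph loop to ralph.sh. Every iteration feeds the agent
# the exact same PROMPT.md with a brand-new context window; only Ctrl+C
# (or killing the process) stops it.
cat > ralph.sh <<'EOF'
#!/usr/bin/env bash
while :; do
  cat PROMPT.md | claude -p --dangerously-skip-permissions
done
EOF
chmod +x ralph.sh
```

The heredoc just writes the loop to ralph.sh so you can run it with ./ralph.sh; each pass launches Claude non-interactively (-p) with a completely fresh context.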
And what makes Ralph special is that it sits outside of the model's control: the model can't decide when to stop Ralph, it will just keep going. That way, you can ensure that when a new task runs, when a new prompt is triggered, the context window is back to where it is when you first open the agent. It's completely fresh: no compaction, nothing carried over. So each new task gets the maximum amount of context and uses the model in its smartest, most optimal context-window state. Compaction, by the way, is where the agent looks at all the tokens already written into the context window and picks out the most important bits for the next prompt. But it only picks what it thinks is most important, not what actually is, so compaction can lose critical information and make your project not work as expected. Anyway,
now that we've seen the canonical Ralph loop implementation from the creator, we can see why other implementations differ. Take the Anthropic implementation, which uses a slash command to run Ralph inside Claude Code, with a max-iterations limit and a completion promise. The problem with this particular Ralph Wiggum plugin is that it compacts the context when moving on to the next task. So when it finishes one task and reruns the prompt, instead of completely resetting the context window, it compacts what was previously done, and could therefore lose some vital information. There's also the slight issue of the max iterations and the completion promise, because sometimes it's nice to just let Ralph keep going: it can find very interesting things to fix that you wouldn't have thought of before. And if you watch it, being a human on the loop, you may spot patterns, good or bad, from a specific model that you can tweak and enhance in your original prompt. If we take
a look at Ryan Carson's approach to the Ralph loop, it's not quite canonical either, simply because on each loop it has the possibility of adjusting or adding to the agents.md file. Now, depending on the system prompt or any user prompts you've added, models can, in my experience, be very wordy by default. So if on each iteration you're appending to the agents.md file, which gets added to the context at the start of every user prompt, you're just pushing more tokens into the context window, nudging the model towards the zone where it could give you dumb results. But the fact that people are making their own scripts from the basic Ralph loop bash script is a testament to how simple and easy it is to understand. And although there is a canonical way of doing Ralph, I think it's okay for developers, teams and companies to tweak it for their specific use case. For example, I love that Razmike's Ralphie script has a way to run parallel Ralphs, and that you can use the agent browser tool from Vercel to do browser testing. I also love that in Matt Pocock's version of Ralph, he adds tasks as GitHub issues, and the loop picks the most important one, works on it, and marks it done when it's complete before moving to the next, which I think is really clever. I think the power and simplicity of Ralph means that it's going
to stick around for a very long time, and you may well see a lot of iterations and improvements on it. I really like where Geoffrey is taking this with his Loom and Weaver project, where he wants to create a way to make software autonomously and correctly. But with all these Ralphs autonomously creating new software, you need a way to search for errors and make sure they get fixed. This is where Better Stack comes in, because not only can it ingest logs and filter the errors out of them, it can also handle error tracking on the frontend. So with its MCP server, you can ask an agent to pick out errors from the frontend or backend specifically, instead of reading through the whole log, which in turn reduces context-window usage. So go and check out Better Stack, and let me know what you think in the comments.