I Can't Believe Anthropic Messed Up The Ralph Wiggum

Better Stack

Transcript

00:00:00Ralph Wiggum is absolutely blowing up. We made a video about it last year and since
00:00:04then it's all anyone is talking about on Twitter. Matt Pocock has made loads of videos
00:00:09on it, Ryan Carson has written a very popular article on it and Razmike has built on it with
00:00:13his Ralphie Bash script. But is everyone doing it wrong? The creator has already said that
00:00:19some implementations are incorrect.
00:00:21So what's the correct way to do it? And why is Ralph currently the best way to build software
00:00:26with AI? Hit subscribe and let's get into it.
00:00:30The Ralph loop was created by Geoffrey Huntley and written about way back in June last year.
00:00:35It is essentially a bash loop that gives an AI agent the exact same prompt over and over
00:00:40again. But it's genius on so many levels because it lets the AI agent work in its smartest
00:00:46mode, which is the mode where it has as little context as possible. Take a look at this.
00:00:51So let's imagine this is the total context window for an agent. From 0 to about 30% is
00:00:57what we'll call the smart zone, which is where the agent performs the best. From about
00:01:0130 to 60%, it still performs really well. And from 60% onwards, so 60, 70, 80, 90, that's
00:01:08when it starts to degrade. We'll call it the dumb zone. Now, these numbers aren't set
00:01:12in stone and can differ per model. So the smart zone for a certain model could extend
00:01:16to 40 or 50%, but usually past 80% of the context window is when the dumbness begins.
00:01:21So for Claude Sonnet or Opus, the typical context window is 200,000 tokens. So you can
00:01:28say the first 60k is the smart zone. The next 60k is still okay, but not as good as the first
00:01:3360k tokens. And then the last 80k doesn't seem to perform as well. Now, this is my personal
00:01:38experience with this model. You might have had other experiences. And the reason for this
00:01:43is because the model itself is what we call autoregressive, meaning it has to look at the
00:01:47previous tokens to predict the next one. And if you have loads and loads of tokens, it has
00:01:52to go through a lot of them to find out the important bits that are relevant to the next
00:01:56task at hand. Now let's focus on the first 30%. Even before you write your first prompt,
00:02:01there's some things that get added to the context window automatically. First is the system prompt,
00:02:06and then the system tools. These on a typical Claude model take 8.3% and 1.4% of the context.
00:02:12So almost 10% of this 30. And then if you have skills, that can get added. And also if
00:02:16you have custom MCP tools. Finally, if you have an agent MD file, that gets added too.
00:02:21And of course, the larger any of these are, so the larger the MD file, the more
00:02:25tokens it will take up. And this is all even before you've added your own prompt. So in
00:02:30general, it's best to keep this section as small as possible. So have fewer tools, fewer
00:02:35skills, and less in your agent MD file so that the model is working in its most
00:02:40optimal context. And to get an idea of exactly how much 60k is, if we were to take the whole
00:02:44script of Star Wars A New Hope, that is about 54,000 tokens in GPT-5. So roughly this amount.
00:02:51Now you may be wondering, what about compaction? Can that help with this whole process? And
00:02:56we'll talk about that a bit later. But now let's move on to exactly how Ralph can help
00:03:00with this. So the benefit of Ralph is that you focus on one goal per context window. So
00:03:05the whole 200K context window, we can dedicate that to one goal or one task. And the way
00:03:10we do that is we write a prompt that will firstly inspect the plan MD file. This contains
00:03:15the tasks to be done. So something like create the front end, create the backend, do the database
00:03:19and so on. That is a very high level example. Of course, you'd go way more detailed if you
00:03:23were doing Ralph and more granular, but we'll stick with that example for now. So this prompt
00:03:28will tell the agent to pick the most important task, then make those changes. After making
00:03:33those changes, run the tests, then commit and even push those changes. And
00:03:38once the tests have passed, tick the task as done in
00:03:42the plan MD file and do that again. So the agent will keep looking for the most important
00:03:46task to do until it's completed all the tasks. Now, actually, let me take that back because
00:03:52you could keep having the Ralph loop go over and over again, even if it has completed all
00:03:57the tasks. And the benefits of that is that it may even find things to fix or find features
00:04:02to add that don't exist in the plan MD file. But if it is going off the rails, the benefits
00:04:08of having Ralph is that you can stop the whole process whenever you want, adjust the prompt
00:04:12MD file and then run the whole process again. And Ralph makes this so simple because this
00:04:16whole process is executed in one single bash while loop. So here it just cats the prompt
00:04:22MD file, so prints it to the agent, and then runs Claude in YOLO mode. Of course, the flag
00:04:26isn't literally YOLO; it's --dangerously-skip-permissions, but for the sake of space, I've kept it short.
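As a sketch, assuming the Claude Code CLI is installed as `claude` and the fixed prompt lives in PROMPT.md (the file name and the AGENT_CMD indirection are illustrative), the whole canonical loop fits in a few lines:

```shell
#!/usr/bin/env bash
# Minimal sketch of the canonical Ralph loop. PROMPT.md and AGENT_CMD are
# illustrative; AGENT_CMD lets the agent command be swapped out.
AGENT_CMD="${AGENT_CMD:-claude -p --dangerously-skip-permissions}"

# One iteration: pipe the same fixed prompt into a brand-new agent process,
# so every run starts with a completely fresh context window.
ralph_once() {
  cat PROMPT.md | $AGENT_CMD
}

# The loop lives outside the model's control: the agent can't stop it.
# Stop it yourself with Ctrl+C, tweak PROMPT.md, and run it again.
ralph() {
  while :; do ralph_once; done
}
```

Because each iteration is a fresh process, there is nothing to compact between tasks.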
00:04:31And what makes Ralph special is that it's outside of the model's control. So the model
00:04:36can't control when to stop Ralph. It will just keep going. And that way you can ensure that
00:04:41when a new task runs or when a new prompt is triggered, the context is pretty
00:04:46much where it is when you first open the agent. So this is fresh: it has no compaction,
00:04:50nothing added. So each new task gets the maximum amount of context and uses
00:04:55the model in its smartest, most optimal context window state. Basically, compaction
00:05:01is where the agent will look at all the tokens that have been written in the context window
00:05:05and pick out the most important bits for the next prompt. So it will pick what it thinks
00:05:11is most important, but it doesn't know what is actually most important. Therefore compaction
00:05:16might lose some critical information and make your project not work as expected. Anyway,
00:05:21now that we've seen the canonical Ralph loop implementation from the creator, this helps
00:05:27us see why other implementations are different. Let's take a look at the Anthropic implementation,
00:05:33which uses a slash command to run Ralph inside of Claude Code, has max iterations, and a
00:05:38completion promise. So the problem with this specific Ralph Wiggum plugin is the fact that
00:05:43it compacts the information when it's moving on to the next task. So if it finishes one
00:05:48task and reruns the prompt, instead of completely resetting the context window, it compacts what
00:05:54was previously done, therefore could lose some vital information. There's also the slight
00:05:59issue of having max iterations and a completion promise because sometimes it's nice to just
00:06:04let Ralph keep going. It can find very interesting things to fix that you wouldn't have thought
00:06:08of before. And if you watch it, so be a human on the loop, you may see patterns good or bad
00:06:14from a specific model that you can tweak and enhance in your original prompt. If we take
00:06:19a look at Ryan Carson's approach to the Ralph loop, we can see here that it's not quite
00:06:24canonical simply because on each loop, it has the possibility of adjusting or adding
00:06:29to the agents.md file. Now, depending on the system prompt or any user prompts you've
00:06:33added to the model, in my experience, by default, models can be very wordy. And so if on each
00:06:39iteration, you're adding to the agents.md file, which gets added to the context at the
00:06:44beginning of each user prompt, then you're just adding more tokens into the context window,
00:06:48pushing the model into a place where it could potentially give you dumb results. But the
00:06:53fact that people are making their own scripts from the basic Ralph loop bash script is a
00:06:57testament to how simple and easy it is to understand. And although there is a canonical way of doing
00:07:03Ralph, I think it's okay for developers, teams and companies to tweak it to their specific
00:07:08use case. For example, I love the fact that in Razmike's Ralphie script, there's a way
00:07:13to run parallel Ralphs, and also the fact that you can use the agent browser tool from
00:07:18Vercel to do browser testing. I also love the fact that in Matt Pocock's version of Ralph,
00:07:23he adds tasks or things to do as GitHub issues and the Ralph loop will pick the most important
00:07:28one, work on it and mark it as done when it's complete before working on the next one, which
00:07:32I think is really clever. I think the power and simplicity of Ralph means that it's going
00:07:37to stick around for a very long time. And you also may see a lot of iterations and improvements
00:07:42from it. I really like the way Geoffrey is taking this with his Loom and Weaver project where
00:07:47he wants to create a way to make software autonomously and correctly. But with all these
00:07:52Ralphs autonomously creating new software, you need a way to search for errors and make
00:07:56sure they get fixed. This is where Better Stack comes in because not only can it ingest logs
00:08:01and filter out errors from them, but it can also handle error tracking on the front end.
00:08:06So with this MCP server, you can ask an agent to specifically pick out errors from the front
00:08:11end or back end instead of reading through the whole log, which in turn reduces the context
00:08:16window.
00:08:17So go and check out Better Stack, and let me know what you think in the comments.

Key Takeaway

The Ralph loop is a powerful software development strategy that maintains AI model intelligence by resetting context windows and keeping agents within their optimal processing range.

Highlights

The Ralph loop is a bash-based automation technique that repeatedly feeds an AI agent the same prompt to maximize performance.

AI models perform best in a "smart zone" covering roughly the first 30% of their total context window, and still perform well up to about 60%.

Context window performance degrades into the "dumb zone" from roughly 60% onward, and especially past 80% capacity, due to autoregressive processing limitations.

The canonical Ralph implementation by Geoffrey Huntley resets the context window for every task to avoid the data loss caused by AI "compaction."

Anthropic's implementation is criticized for compacting context between tasks instead of fully resetting it, which risks losing vital information.

Timeline

Introduction to the Ralph Phenomenon

The speaker introduces the "Ralph Wiggum" concept which has recently gained massive popularity on platforms like Twitter. Mention is made of key community contributors such as Matt Pocock, Ryan Carson, and Razmike who have built their own scripts and articles around it. However, the speaker raises a critical question regarding whether the current widespread implementations are actually being done correctly. The creator of the loop has expressed concerns that many developers are missing the original intent. This section sets the stage for a deeper dive into the "correct" way to build software using AI agents.

The Mechanics of the Context Window

This segment explains that the Ralph loop, created by Jeff Huntley, is essentially a bash script that provides an AI agent with the same prompt repeatedly. The speaker details the "smart zone" and "dumb zone" of context windows, noting that performance degrades significantly after the first 60% of tokens are used. For a model like Claude Sonnet with 200,000 tokens, the first 60,000 are the most reliable for complex reasoning. The speaker emphasizes that system prompts, tools, and custom files like "agent.md" automatically consume a portion of this optimal space. Keeping these auxiliary files small is essential for ensuring the model operates at peak efficiency without getting lost in its own autoregressive history.

Executing the Canonical Ralph Loop

The speaker describes how a dedicated context window is used for a single specific goal or task to maintain clarity. In the Ralph loop, an agent inspects a "plan.md" file, executes the most important task, runs tests, and commits changes before resetting. This cycle is performed in a simple bash while loop that uses a "dangerously skip permissions" mode to keep the process moving autonomously. The core benefit is that each task starts with a fresh context window, avoiding the pitfalls of AI "compaction" where models summarize and lose vital data. By staying outside the model's control, the bash script ensures the AI doesn't prematurely decide to stop or forget its instructions. This section highlights the power of simplicity in maintaining high-quality code generation.
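As an illustration of the file driving this cycle (the exact format is up to you; Ralph just needs the agent to be able to read and tick off tasks, so the entries below are hypothetical), a plan.md might look like:

```
# plan.md -- tasks for Ralph, most important first (illustrative example)
- [x] Set up the database schema
- [ ] Create the backend API
- [ ] Create the front end
- [ ] Wire the front end to the API
```

Each iteration picks the topmost unchecked task, and flips `[ ]` to `[x]` only once the tests pass.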

Analyzing Flawed and Custom Implementations

The analysis shifts to why the Anthropic implementation of the Ralph loop is viewed as problematic by the speaker. Anthropic's version uses a slash command and information compaction, which risks losing critical project details as it moves between iterations. The speaker also critiques Ryan Carson's approach because it frequently updates the "agent.md" file, which can inadvertently push the model into the "dumb zone" by bloating the starting context. Despite these critiques, the speaker praises Matt Pocock's GitHub issue integration and Razmike's parallel processing capabilities as brilliant evolutions of the concept. These diverse iterations demonstrate the flexibility of the Ralph loop for different developer workflows and specific project needs. Ultimately, the simplicity of the original script allows for these creative tweaks while still providing a robust framework for autonomous work.
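The parallel mode praised above could be sketched, hypothetically, as independent loops in separate working directories. The directory layout, PROMPT.md name, and AGENT_CMD indirection are assumptions for illustration, not the Ralphie script's actual implementation:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parallel Ralphs: one independent loop per working
# directory, each with its own PROMPT.md, plan, and git checkout.
AGENT_CMD="${AGENT_CMD:-claude -p --dangerously-skip-permissions}"

# One iteration of one Ralph, run inside its own directory.
ralph_once_in() {
  ( cd "$1" && cat PROMPT.md | $AGENT_CMD )
}

# Launch an endless loop per directory in the background, then wait on all:
#   ralph_all ralph-frontend ralph-backend
ralph_all() {
  for dir in "$@"; do
    ( while :; do ralph_once_in "$dir"; done ) &
  done
  wait
}
```

Keeping each Ralph in its own checkout avoids the loops stepping on each other's uncommitted changes.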

Optimizing Error Tracking and Future Outlook

In the concluding section, the speaker looks toward the future of autonomous software creation with projects like Loom and Weaver. To manage the high volume of code generated by these agents, a robust error-tracking system is necessary to filter through logs. Better Stack is introduced as a solution that offers an MCP server to help agents identify specific front-end or back-end errors quickly. This integration further saves context window space by preventing the agent from having to read through entire, messy log files. The video ends by encouraging viewers to check out these tools and share their thoughts on the evolving Ralph ecosystem. This final segment ties the technical loop mechanics back to practical, scalable software maintenance.
