Loop Engineering Totally 10x Hermes agents

AI LABS
Computing/Software, Internet Technology

Transcript

00:00:00There's a new term going around and you might have already heard it. It's called loop engineering
00:00:04and just like every other hype term everyone is talking about it like it's something new. It's not
00:00:09but when you combine it with an always running agent like Hermes it stops being hype. Most people
00:00:13who are trying to set these up are getting the loop right and missing the thing that actually
00:00:17makes it work and if you already know there are two types of loops there's a specific setup inside
00:00:22one of them that almost nobody is doing. Once you see it the way you think about building with agents
00:00:27changes completely. By the end of this video you'll understand exactly what it is and you'll have it
00:00:31running on Hermes and even Claude Code without you having to step in at all. With loop engineering
00:00:36the core idea is simple. You stop being the person who writes the prompt that drives the agent and
00:00:41instead you let the agent drive itself but to see why it's a shift in the first place you've got to
00:00:46compare it to what came before. The skill that used to matter was prompt engineering where all our focus
00:00:51went into writing the right series of instructions to drive the coding agent properly but loop engineering
00:00:56flips that around. Instead of writing the prompt yourself you design the system that does the
00:01:01prompt engineering for you and drives the agent on its own so the focus moves away from crafting
00:01:05instructions and toward designing systems that run themselves. All of this started when the creator
00:01:10of OpenClaw said you shouldn't be prompting your coding agents anymore and that you should focus
00:01:15on designing loops that prompt the agent for you, and he's not the only one. Boris, who is the creator of
00:01:20Claude Code, also made the same claim at Anthropic's annual developer conference, where he said he
00:01:25doesn't prompt Claude anymore. He's got loops running that prompt Claude and it figures out for itself
00:01:30what needs to be done. So the question is how do you get started with them? All of it comes down to
00:01:34how well you can set up the systems where you don't have to worry about prompting the agent at all.
00:01:39You define what you need and the agent does the rest. That's exactly where AI powered development is
00:01:45heading. Before we get into how to actually build them you need to be clear on what a loop is. A loop is
00:01:50basically a process where you define the end goal and the agent figures out the steps to reach it on its
00:01:56own. It corrects itself along the way and works around problems until it reaches the goal you set.
00:02:01A few months ago before models got capable enough to sustain long tasks this wasn't possible. If you
00:02:06needed to build an app you'd prompt the agent, monitor what it was doing, check the output yourself,
00:02:11find the issues and re-prompt to fix them. You were the loop. You were the part doing the error
00:02:16checking and course correcting between every step. That's what development still looks like for most
00:02:20people and that's exactly what loop engineering is about to take off your plate. Now this might
00:02:25sound like a brand new concept but loops have actually been around for a while. Cron jobs are
00:02:30a good example of a loop you've probably already seen. They're just tasks scheduled to run repeatedly
00:02:35and automatically without you having to trigger them each time. The only real difference is that a
00:02:39cron job runs at a fixed time. So with loops in place the work stops being about writing the prompt.
00:02:44Your agent's performance on a task comes down to how well you define the end goal. To some of you this
00:02:49process will sound a lot like reinforcement learning. If you haven't come across it, reinforcement learning
00:02:54is basically a way of training a model where you don't show it the right answers. Instead you just tell
00:02:59it when it did well and when it didn't and it gradually figures out how to get better on its own.
00:03:04The model finds the right path by trying different things. It gets a positive signal when it moves in
00:03:09the right direction and a negative one when it doesn't. The same idea applies here except the model itself
00:03:14isn't what's being trained. Instead the agent is working toward completing the task you want done,
00:03:19iterating on it in the same way a model would improve during training. If it fails the loop you've
00:03:23put on the agent doesn't mark the task as done. It tries again, keeps going and corrects itself until
00:03:28it reaches the goal you set. Now after hearing all this you might wonder what's actually left for you
00:03:33to do if everything is becoming autonomous. But your role doesn't shrink, it gets more important.
00:03:38Because it's your domain knowledge and experience that define the end goal in the first place and
00:03:43that ends up showing in everything you build and ship. This is exactly why the push toward autonomous
00:03:48loops is only speeding up and it's showing in every new feature that drops right now. Fable 5 is the
00:03:54clearest example yet. Anthropic dropped it even though they'd been calling for a slow down in AI
00:03:59development because the models are getting capable at a pace that's hard to keep up with. And after
00:04:03releasing it for some time they even pulled it. They built it for long and complex tasks and it
00:04:08performs better the longer and more complex the task gets which is basically the opposite of how models
00:04:13used to work. This shift really started with Opus 4.5. Once that dropped, long running tasks got
00:04:19dramatically better. And you didn't need to set agents up with carefully guided harnesses anymore,
00:04:23basically structured setups that walk the agent through each step. The focus moved instead toward
00:04:28preparing the project to run over the long term because the models are now capable enough to
00:04:33handle things on their own without much step-by-step handling. But the loop isn't the only thing that
00:04:38matters. You also need to structure your project in a way that lets the agent work on its own for a
00:04:43long time without you having to step in. So a lot of people have been building and open sourcing systems
00:04:48for exactly this kind of setup. The Ralph loop was one of the first. It worked by setting the end goal
00:04:53and making sure the agent couldn't drift away from it. It did this through hooks, which are basically
00:04:57scripts that run automatically when something specific happens. So this script strictly prevented the agent from marking
00:05:03a task as done unless it had actually met the condition. But hooks are rigid, so Claude introduced its own goal
00:05:09command, which did the same thing but with more flexibility. Instead of a hard coded check, it lets
00:05:14another model decide whether the task is actually finished. We covered Goal Buddy too, which built on
00:05:19that by having the agent track its progress in local files and define exactly what done looks like
00:05:24before it even starts, so it always knows what it's working toward. The Hermes agent and OpenClaw were both
00:05:29built on the same philosophy. They take you out of the picture entirely and let the agent handle everything
00:05:35on its own. Now, if you want to build these loops, we've got a simple five-step system for you and since
00:05:40there are two types of loops, some of those steps work a little differently but we'll get into both types
00:05:45later on. For now, we'll start in Claude Code and later in the video, we'll look at how to do the same
00:05:49thing in the Hermes agent. The first step is checking what state the project is in. From that, the model
00:05:54decides what the next action should be. Then it acts on that decision and this is where the actual work
00:05:59happens. The agent calls tools, writes to files and runs commands to get the task done. Once that's
00:06:04finished, it gathers feedback to see what actually happened and based on that, it decides whether the
00:06:09task is done or not. This is also where the difference between prompt engineering and loop engineering becomes
00:06:14obvious. With prompt engineering, you're only ever controlling the decision step while loop engineering
00:06:19handles all five together. Building a loop that works well means getting a handful of things right and
00:06:24each one is there because of a specific problem it solves. The first is context management. You pay
00:06:29attention to what goes into the context on every turn because that's what determines what the agent
00:06:34actually knows at any given point. You can't rely on the chat context alone, even with context windows
00:06:39as large as a million tokens, basically how much the agent can hold in memory at once, because as the
00:06:44conversation grows, your system prompt and instructions get buried under recent tool outputs. The agent's
00:06:50attention naturally pulls toward whatever is most recent, so the important stuff gets lost. That's why
00:06:55managing context matters so much. The next thing to get right is feedback quality. Feedback is what tells
00:07:00the agent how it did and it's one of the most important signals in the whole system. It can take a lot
00:07:05of forms like the output of a test run or a screenshot of the UI it just built and whatever form it takes,
00:07:11that's what the agent reads to figure out its next move. Verification gates are what turn that feedback
00:07:16into a clear verdict. They're the checkpoints that tell the agent whether a task is actually done or
00:07:21not. You also need a termination condition, basically a rule that tells the loop when to stop and this one
00:07:26has to be set explicitly, otherwise the agent either quits too early or keeps going without making real
00:07:31progress. The thing people most often overlook is error handling. You have to spell out what the model
00:07:36should do when a tool call fails, so the system handles it cleanly instead of leaving things in
00:07:41a broken state that just creates more problems. And finally, you need to manage state across turns,
00:07:46basically keep track of where the task is as the conversation grows. The context window can't hold
00:07:51everything forever, so you lean on external files that track information for the agent and let it keep
00:07:57working without losing the thread. One thing to keep in mind though, since you're handing the job of
00:08:01figuring out the path over to the model instead of doing it yourself, loops get expensive in tokens,
00:08:06so you need to be deliberate about when you actually use them. The more tokens a loop can
00:08:11work with, the better it tends to handle the task. But before we move forward, let's have a word from
00:08:15our sponsor, Scrimba. Most Python courses are just someone talking over slides. Scrimba is different,
00:08:21their video player is the code editor, so you can pause anytime, edit the instructor's code directly,
00:08:26and see what happens. No tab switching, no copy pasting, just hands-on coding from the start.
00:08:31Their new Learn Python course caught my attention because instead of random exercises, you actually
00:08:37build something real. From day one, you're building PayUp, a fully functional expense-splitting app,
00:08:42and every concept gets applied immediately. You start from absolute zero, no prior Python knowledge needed,
00:08:47and work through variables, strings, capturing user input, arithmetic operators, type conversion,
00:08:53data cleaning, and number formatting, all by building features for the app. By the end,
00:08:57you've built a working project from scratch that proves you actually know Python. This is just part
00:09:02one of several that will become available over the coming weeks, and currently, it's totally free to
00:09:07access. Get started today with their free courses, and our users will get an extra 20% off on their pro
00:09:12plans. So click the link in the pinned comment, or scan the QR code, and start building today.
00:09:18As we mentioned, there are two types of loops. The first one is called the deterministic loop. You use it
00:09:23for tasks that have a clear definition of what done actually looks like, that could be tests passing,
00:09:28code compiling successfully, or anything like that. These loops are fairly straightforward to work
00:09:33toward, because the end goal is clear, so the model knows exactly what it needs to do before it can call
00:09:38the task done. Since Hermes is always running, it's a really good agent to implement this loop on. We've
00:09:43created multiple workflows on it before, and showed in our previous video how it handles a lot of our work
00:09:49on its own. The core of a deterministic loop is the clear definition of the end goal, and for the apps
00:09:54you've hosted, that definition is your tests. So you can point the Hermes agent at any app you've
00:09:59deployed with test cases and have it monitor it for you. Now if a change or a commit ends up breaking
00:10:04production, you can set up an automation on Hermes to catch it. The reason it works best here is that it
00:10:09comes with the self-evolving skills feature, so it automatically creates and evolves skills based on the
00:10:14workflow which keeps the health of the app in check. Once you've set up that monitoring automation, you
00:10:18can ask it to launch Claude Code in non-interactive mode, basically running it on its own without you
00:10:23having to drive it, and have it fix issues in a loop until all the test cases pass. What it does from
00:10:28there is set up the automation workflow and load skills like the sub-agent driven development skill
00:10:34and the GitHub PR workflow skill which tell it how to manage the app on GitHub. It first identifies the
00:10:39issues that were breaking production, then launches Claude Code in non-interactive mode, which takes
00:10:44the tests and commits the changes once all of them pass. After it has run every test and fixed whatever
00:10:50was causing production to fail, it uses the GitHub CLI to commit the changes. The app ends up running
00:10:55without any failures because it has confirmed that all the checks for a successful deployment are in place.
00:11:00If you like these breakdowns, subscribe to the channel, click the notification bell and hit the hype
00:11:05button too. On the channel, we post content that helps you learn new ways to optimize different
00:11:10processes in different businesses with AI. Your support, whether it's subscribing, the notification
00:11:15bell or the hype button, helps us create more content like this and reach more people. It means a lot to us.
00:11:21Now the second type is the non-deterministic loop and these are tasks where you can't just set a clear
00:11:26rule to check whether the job is done the way you can with deterministic loops. Because of that,
00:11:31there's no clean way to verify the outcome. These are the kinds of things that we as humans can look
00:11:36at and judge for ourselves like building a UI or implementing a feature that needs a judgment call.
00:11:41So when you're working with a non-deterministic loop, the workflow is different. If you're applying
00:11:46AI to UI, you already know that it tends to fall back to the same patterns all the time. That's why we
00:11:51created a skill called AI Slop Detector which holds all the instructions on how to avoid AI slop and lists
00:11:57the patterns that actually give it away. And the reason we're using Hermes again is the self-evolving
00:12:02skills. If we still find AI slop in the UI after running the skill, the skill can update itself to
00:12:07incorporate that feedback directly and that's exactly why we set this workflow up on Hermes. So we asked
00:12:13Hermes to use the skill and check whether the UI has any of those patterns. If it does, it fixes them
00:12:18and launches Claude Code in non-interactive mode to run the skill and keep fixing what it finds until
00:12:23there's nothing left to fix. Another benefit we get out of Hermes is that the model reviewing the work is
00:12:28different from the one building it. We were using the GPT models which are known to be among the best for
00:12:33code review, so the Claude models become the builder and the other agent becomes the verifier. That's what
00:12:38completes the adversarial loop where the two check each other's work. Once that loop ran, it generated a
00:12:43much better UI than the generic output the Opus models are putting out nowadays. And if you still spot any sign of AI
00:12:49slop in the UI after the agent loop has ended, you can just mention it and it will update the skill for
00:12:54you, strengthening the verifier you already have. We've enhanced this skill to match multiple AI slop
00:12:59patterns that we and Hermes identified collectively. If you want to use this skill, you can get it from our
00:13:04community AI Labs Pro. The link's going to be in the description. That brings us to the end of this video.
00:13:09If you'd like to support the channel and help us keep making videos like this, you can do so by using the
00:13:14super thanks button below. As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Loop engineering creates autonomous development workflows by designing systems that handle task iteration, error correction, and verification, allowing agents like Hermes to drive coding tasks from start to finish without human prompts.

Highlights

  • Loop engineering shifts the focus from manual prompt engineering to designing self-driving systems where agents manage their own tasks and course corrections.

  • Deterministic loops use clear verification criteria like test passing or compilation success to confirm task completion.

  • Non-deterministic loops utilize adversarial setups where a builder model creates content and a separate verifier model critiques it against quality standards.

  • Effective loops require specific components: robust context management, clear feedback mechanisms, verification gates, explicit termination conditions, and error handling strategies.

  • Always-running agents like Hermes can launch Claude Code in non-interactive mode to execute long-running tasks without human intervention.

Timeline

From Prompt Engineering to Loop Engineering

  • Loop engineering replaces manual, step-by-step prompt crafting with system design where agents manage their own workflows.
  • Human involvement in error checking and course correction between steps is replaced by autonomous processes.
  • Advanced models like Opus 4.5 enable the shift toward long-running tasks that previously required human intervention.

The transition in AI development moves away from manually writing instructions to designing self-correcting systems. In this framework, the human defines the end goal while the agent determines the necessary steps, handles errors, and iterates until the objective is achieved. This shift mirrors reinforcement learning concepts, where agents use signals to improve performance over time rather than relying on static prompts.
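
The five steps named in the transcript (check state, decide, act, gather feedback, verify) can be sketched as a small driver function. This is a minimal illustration, not how Hermes or Claude Code implement their loops: `run_loop`, `LoopResult`, and all five callbacks are hypothetical names, and `max_iterations` stands in for an explicit termination condition.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    done: bool
    iterations: int
    history: list[str] = field(default_factory=list)


def run_loop(
    check_state: Callable[[], dict],
    decide: Callable[[dict], str],
    act: Callable[[str], None],
    gather_feedback: Callable[[], str],
    is_done: Callable[[str], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Drive the five steps until the goal is met or the budget runs out."""
    history: list[str] = []
    for i in range(1, max_iterations + 1):
        state = check_state()         # 1. inspect the project
        action = decide(state)        # 2. choose the next action
        act(action)                   # 3. do the work (tools, files, commands)
        feedback = gather_feedback()  # 4. observe what actually happened
        history.append(feedback)
        if is_done(feedback):         # 5. verify against the end goal
            return LoopResult(True, i, history)
    # Budget exhausted: report failure instead of claiming success.
    return LoopResult(False, max_iterations, history)
```

In this framing, prompt engineering only shapes what `decide` sees, while loop engineering designs all five callbacks plus the stopping rule.
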

Essential Components for Building Autonomous Loops

  • System performance in loops relies on clearly defined end goals that guide agent decision-making.
  • Successful implementation requires getting six things right: context management, feedback quality, verification gates, termination conditions, error handling, and state management across turns.
  • State must be maintained across turns using external files to prevent context window limitations from disrupting progress.

Constructing effective loops requires careful architecture. Developers must manage context beyond standard chat windows, implement high-quality feedback loops, and establish explicit verification gates to determine if a task is actually finished. Error handling is critical to ensure tool call failures do not leave the system in a broken, non-functional state.
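
Two of these components, external state and error handling, can be illustrated together. The sketch below assumes a JSON progress file. The file layout and the `load_state`, `save_state`, and `run_step` helpers are hypothetical and not taken from Hermes or Claude Code.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read progress from disk so a fresh context can resume the task."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "errors": []}


def save_state(path: Path, state: dict) -> None:
    """Persist progress after every step, not only at the end."""
    path.write_text(json.dumps(state, indent=2))


def run_step(path: Path, name: str, tool) -> bool:
    """Run one tool call and record the outcome instead of crashing the loop."""
    state = load_state(path)
    try:
        tool()
    except Exception as exc:  # deliberately broad: any tool failure is recorded
        state["errors"].append({"step": name, "error": str(exc)})
        save_state(path, state)
        return False
    state["completed"].append(name)
    save_state(path, state)
    return True
```

Because every outcome is written to the file, the next turn can read what succeeded and what failed even after earlier tool output has dropped out of the context window.
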

Deterministic and Non-Deterministic Loop Architectures

  • Deterministic loops are ideal for tasks with clear success criteria, such as passing unit tests or successful code compilation.
  • Non-deterministic loops address subjective tasks like UI design by utilizing an adversarial approach with separate builder and verifier models.
  • Hermes agents leverage self-evolving skills to automatically update workflows based on feedback and identified patterns.

Deterministic loops rely on binary outcomes to judge success, making them ideal for automated maintenance like monitoring production environments via test suites. Non-deterministic loops handle more complex, subjective tasks. By utilizing different models for building and verifying, such as using Claude for construction and GPT for review, agents can improve output quality and detect patterns like 'AI slop' to iteratively refine the results.
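
The builder and verifier arrangement can be sketched as two callbacks that alternate until the verifier has nothing left to flag. This is an assumption-laden toy: `adversarial_loop`, `toy_build`, and `toy_review` are hypothetical, and the `BANNED` patterns are placeholders, not the contents of the AI Slop Detector skill. In a real setup each callback would call a different model.

```python
from typing import Callable


def adversarial_loop(
    build: Callable[[str, list[str]], str],
    review: Callable[[str], list[str]],
    goal: str,
    max_rounds: int = 4,
) -> tuple[str, list[str]]:
    """Alternate builder and verifier until the verifier reports no issues."""
    issues: list[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = build(goal, issues)  # builder sees the previous critique
        issues = review(draft)       # verifier is a separate model or rule set
        if not issues:
            break
    return draft, issues


BANNED = ["purple gradient", "lorem ipsum"]  # placeholder patterns, illustrative only


def toy_review(draft: str) -> list[str]:
    """Stand-in verifier: flag any banned pattern found in the draft."""
    return [pattern for pattern in BANNED if pattern in draft]


def toy_build(goal: str, issues: list[str]) -> str:
    """Stand-in builder: produce a generic draft, then remove flagged patterns."""
    draft = f"{goal}: purple gradient hero, lorem ipsum copy"
    for pattern in issues:
        draft = draft.replace(pattern, "custom")
    return draft
```

Calling `adversarial_loop(toy_build, toy_review, "landing page")` takes two rounds here: the first draft is flagged and the second passes review. Leftover issues are returned rather than hidden, so a human can fold them back into the verifier's pattern list.
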
