Loop Engineering Totally 10x Hermes agents

Englishالعربية Deutsch Español Français हिन्दी Bahasa Indonesia 日本語 한국어 Português Русский 中文

Computing/SoftwareInternet Technology

Transcript

00:00:00There's a new term going around and you might have already heard it. It's called loop engineering

00:00:04and just like every other hype term everyone is talking about it like it's something new. It's not

00:00:09but when you combine it with an always running agent like Hermes it stops being hype. Most people

00:00:13who are trying to set these up are getting the loop right and missing the thing that actually

00:00:17makes it work and if you already know there are two types of loops there's a specific setup inside

00:00:22one of them that almost nobody is doing. Once you see it the way you think about building with agents

00:00:27changes completely. By the end of this video you'll understand exactly what it is and you'll have it

00:00:31running on Hermes and even Claude Code without you having to step in at all. With loop engineering

00:00:36the core idea is simple. You stop being the person who writes the prompt that drives the agent and

00:00:41instead you let the agent drive itself but to see why it's a shift in the first place you've got to

00:00:46compare it to what came before. The skill that used to matter was prompt engineering where all our focus

00:00:51went into writing the right series of instructions to drive the coding agent properly but loop engineering

00:00:56flips that around. Instead of writing the prompt yourself you design the system that does the

00:01:01prompt engineering for you and drives the agent on its own so the focus moves away from crafting

00:01:05instructions and toward designing systems that run themselves. All of this started when the creator

00:01:10of OpenClaw said you shouldn't be prompting your coding agents anymore and that you should focus

00:01:15on designing loops that prompt the agent for you and he's not the only one. Boris who is the creator of

00:01:20Claude Code also made the same claim at the Anthropics annual developer conference where he said he

00:01:25doesn't prompt Claude anymore. He's got loops running that prompt Claude and it figures out for itself

00:01:30what needs to be done. So the question is how do you get started with them? All of it comes down to

00:01:34how well you can set up the systems where you don't have to worry about prompting the agent at all.

00:01:39You define what you need and the agent does the rest. That's exactly where AI powered development is

00:01:45heading. Before we get into how to actually build them you need to be clear on what a loop is. A loop is

00:01:50basically a process where you define the end goal and the agent figures out the steps to reach it on its

00:01:56own. It corrects itself along the way and works around problems until it reaches the goal you set.

00:02:01A few months ago before models got capable enough to sustain long tasks this wasn't possible. If you

00:02:06needed to build an app you'd prompt the agent, monitor what it was doing, check the output yourself,

00:02:11find the issues and re-prompt to fix them. You were the loop. You were the part doing the error

00:02:16checking and course correcting between every step. That's what development still looks like for most

00:02:20people and that's exactly what loop engineering is about to take off your plate. Now this might

00:02:25sound like a brand new concept but loops have actually been around for a while. Cron jobs are

00:02:30a good example of a loop you've probably already seen. They're just tasks scheduled to run repeatedly

00:02:35and automatically without you having to trigger them each time. The only real difference is that a

00:02:39cron job runs at a fixed time. So with loops in place the work stops being about writing the prompt.

00:02:44Your agent's performance on a task comes down to how well you define the end goal. To some of you this

00:02:49process will sound a lot like reinforcement learning. If you haven't come across it, reinforcement learning

00:02:54is basically a way of training a model where you don't show it the right answers. Instead you just tell

00:02:59it when it did well and when it didn't and it gradually figures out how to get better on its own.

00:03:04The model finds the right path by trying different things. It gets a positive signal when it moves in

00:03:09the right direction and a negative one when it doesn't. The same idea applies here except the model itself

00:03:14isn't what's being trained. Instead the agent is working toward completing the task you want done,

00:03:19iterating on it in the same way a model would improve during training. If it fails the loop you've

00:03:23put on the agent doesn't mark the task as done. It tries again, keeps going and corrects itself until

00:03:28it reaches the goal you set. Now after hearing all this you might wonder what's actually left for you

00:03:33to do if everything is becoming autonomous. But your role doesn't shrink, it gets more important.

00:03:38Because it's your domain knowledge and experience that define the end goal in the first place and

00:03:43that ends up showing in everything you build and ship. This is exactly why the push toward autonomous

00:03:48loops is only speeding up and it's showing in every new feature that drops right now. Fable 5 is the

00:03:54clearest example yet. Anthropic dropped it even though they'd been calling for a slow down in AI

00:03:59development because the models are getting capable at a pace that's hard to keep up with. And after

00:04:03releasing it for some time they even pulled it. They built it for long and complex tasks and it

00:04:08performs better the longer and more complex the task gets which is basically the opposite of how models

00:04:13used to work. This shift really started with Opus 4.5. Once that dropped, long running tasks got

00:04:19dramatically better. And you didn't need to set agents up with carefully guided harnesses anymore,

00:04:23basically structured setups that walk the agent through each step. The focus moved instead toward

00:04:28preparing the project to run over the long term because the models are now capable enough to

00:04:33handle things on their own without much step-by-step handling. But the loop isn't the only thing that

00:04:38matters. You also need to structure your project in a way that lets the agent work on its own for a

00:04:43long time without you having to step in. So a lot of people have been building and open sourcing systems

00:04:48for exactly this kind of setup. The RALF loop was one of the first. It worked by setting the end goal

00:04:53and making sure the agent couldn't drift away from it. It did this through hooks, which are basically

00:04:57scripts that run automatically when something specific happens. So this script strictly prevents the agent from marking

00:05:03a task as done unless it had actually met the condition. But hooks are rigid, so Claude introduced its own goal

00:05:09command, which did the same thing but with more flexibility. Instead of a hard coded check, it lets

00:05:14another model decide whether the task is actually finished. We covered Goal Buddy 2, which built on

00:05:19that by having the agent track its progress in local files and define exactly what done looks like

00:05:24before it even starts, so it always knows what it's working toward. The Hermes agent and OpenClaw were both

00:05:29built on the same philosophy. They take you out of the picture entirely and let the agent handle everything

00:05:35on its own. Now, if you want to build these loops, we've got a simple five-step system for you and since

00:05:40there are two types of loops, some of those steps work a little differently but we'll get into both types

00:05:45later on. For now, we'll start in Claude code and later in the video, we'll look at how to do the same

00:05:49thing in the Hermes agent. The first step is checking what state the project is in. From that, the model

00:05:54decides what the next action should be. Then it acts on that decision and this is where the actual work

00:05:59happens. The agent calls tools, writes to files and runs commands to get the task done. Once that's

00:06:04finished, it gathers feedback to see what actually happened and based on that, it decides whether the

00:06:09task is done or not. This is also where the difference between prompt engineering and loop engineering becomes

00:06:14obvious. With prompt engineering, you're only ever controlling the decision step while loop engineering

00:06:19handles all five together. Building a loop that works well means getting a handful of things right and

00:06:24each one is there because of a specific problem it solves. The first is context management. You pay

00:06:29attention to what goes into the context on every turn because that's what determines what the agent

00:06:34actually knows at any given point. You can't rely on the chat context alone, even with context windows

00:06:39as large as a million tokens, basically how much the agent can hold in memory at once, because as the

00:06:44conversation grows, your system prompt and instructions get buried under recent tool outputs. The agent's

00:06:50attention naturally pulls toward whatever is most recent, so the important stuff gets lost. That's why

00:06:55managing context matters so much. The next thing to get right is feedback quality. Feedback is what tells

00:07:00the agent how it did and it's one of the most important signals in the whole system. It can take a lot

00:07:05of forms like the output of a test run or a screenshot of the UI it just built and whatever form it takes,

00:07:11that's what the agent reads to figure out its next move. Verification gates are what turn that feedback

00:07:16into a clear verdict. They're the checkpoints that tell the agent whether a task is actually done or

00:07:21not. You also need a termination condition, basically a rule that tells the loop when to stop and this one

00:07:26has to be set explicitly, otherwise the agent either quits too early or keeps going without making real

00:07:31progress. The thing people most often overlook is error handling. You have to spell out what the model

00:07:36should do when a tool call fails, so the system handles it cleanly instead of leaving things in

00:07:41a broken state that just creates more problems. And finally, you need to manage state across turns,

00:07:46basically keep track of where the task is as the conversation grows. The context window can't hold

00:07:51everything forever, so you lean on external files that track information for the agent and let it keep

00:07:57working without losing the thread. One thing to keep in mind though, since you're handing the job of

00:08:01figuring out the path over to the model instead of doing it yourself, loops get expensive in tokens,

00:08:06so you need to be deliberate about when you actually use them. The more tokens a loop can

00:08:11work with, the better it tends to handle the task. But before we move forward, let's have a word from

00:08:15our sponsor, Scrimba. Most python courses are just someone talking over slides. Scrimba is different,

00:08:21their video player is the code editor, so you can pause anytime, edit the instructor's code directly,

00:08:26and see what happens. No tab switching, no copy pasting, just hands-on coding from the start.

00:08:31Their new Learn Python course caught my attention because instead of random exercises, you actually

00:08:37build something real. From day one, you're building PayUp, a fully functional expense-splitting app,

00:08:42and every concept gets applied immediately. You start from absolute zero, no prior Python knowledge needed,

00:08:47and work through variables, strings, capturing user input, arithmetic operators, type conversion,

00:08:53data cleaning, and number formatting, all by building features for the app. By the end,

00:08:57you've built a working project from scratch that proves you actually know Python. This is just part

00:09:02one of several that will become available over the coming weeks, and currently, it's totally free to

00:09:07access. Get started today with their free courses, and our users will get an extra 20% off on their pro

00:09:12plans. So click the link in the pinned comment, or scan the QR code, and start building today.

00:09:18As we mentioned, there are two types of loops. The first one is called the deterministic loop. You use it

00:09:23for tasks that have a clear definition of what done actually looks like, that could be tests passing,

00:09:28code compiling successfully, or anything like that. These loops are fairly straightforward to work

00:09:33toward, because the end goal is clear, so the model knows exactly what it needs to do before it can call

00:09:38the task done. Since Hermes is always running, it's a really good agent implement this loop on. We've

00:09:43created multiple workflows on it before, and showed in our previous video how it handles a lot of our work

00:09:49on its own. The core of a deterministic loop is the clear definition of the end goal, and for the apps

00:09:54you've hosted, that definition is your tests. So you can point the Hermes agent at any app you've

00:09:59deployed with test cases and have it monitor it for you. Now if a change or a commit ends up breaking

00:10:04production, you can set up an automation on Hermes to catch it. The reason it works best here is that it

00:10:09comes with the self-evolving skills feature, so it automatically creates and evolves skills based on the

00:10:14workflow which keeps the health of the app in check. Once you've set up that monitoring automation, you

00:10:18can ask it to launch clawed code in non-interactive mode, basically running it on its own without you

00:10:23having to drive it and have it fix issues in a loop until all the test cases pass. What it does from

00:10:28there is set up the automation workflow and load skills like the sub-agent driven development skill

00:10:34and the GitHub PR workflow skill which tell it how to manage the app on GitHub. It first identifies the

00:10:39issues that were breaking production then launches clawed code in non-interactive mode which takes

00:10:44the tests and commits the changes once all of them pass. After it has run every test and fixed whatever

00:10:50was causing production to fail, it uses the GitHub CLI to commit the changes. The app ends up running

00:10:55without any failures because it has confirmed that all the checks for a successful deployment are in place.

00:11:00If you like these breakdowns, subscribe to the channel, click the notification bell and hit the hype

00:11:05button too. On the channel, we post content that helps you learn new ways to optimize different

00:11:10processes in different businesses with AI. Your support, whether it's subscribing, the notification

00:11:15bell or the hype button, helps us create more content like this and reach more people. It means a lot to us.

00:11:21Now the second type is the non-deterministic loop and these are tasks where you can't just set a clear

00:11:26rule to check whether the job is done the way you can with deterministic loops. Because of that,

00:11:31there's no clean way to verify the outcome. These are the kinds of things that we as humans can look

00:11:36at and judge for ourselves like building a UI or implementing a feature that needs a judgment call.

00:11:41So when you're working with a non-deterministic loop, the workflow is different. If you're applying

00:11:46AI to UI, you already know that it tends to fall back to the same patterns all the time. That's why we

00:11:51created a skill called AI Slop Detector which holds all the instructions on how to avoid AI slop and lists

00:11:57the patterns that actually give it away. And the reason we're using Hermes again is the self-evolving

00:12:02skills. If we still find AI slop in the UI after running the skill, the skill can update itself to

00:12:07incorporate that feedback directly and that's exactly why we set this workflow up on Hermes. So we asked

00:12:13Hermes to use the skill and check whether the UI has any of those patterns. If it does, it fixes them

00:12:18and launches Claude Code in non-interactive mode to run the skill and keep fixing what it finds until

00:12:23there's nothing left to fix. Another benefit we get out of Hermes is that the model reviewing the work is

00:12:28different from the one building it. We were using the GPT models which are known to be among the best for

00:12:33code review, so the Claude models become the builder and the other agent becomes the verifier. That's what

00:12:38completes the adversarial loop where the two check each other's work. Once that loop ran, it generated a

00:12:43much better UI than the generic output the Opus models are putting out nowadays. And if you still spot any sign of AI

00:12:49slop in the UI after the agent loop has ended, you can just mention it and it will update the skill for

00:12:54you, strengthening the verifier you already have. We've enhanced this skill to match multiple AI slop

00:12:59patterns that we and Hermes identified collectively. If you want to use this skill, you can get it from our

00:13:04community AI Labs Pro. The link's going to be in the description. That brings us to the end of this video.

00:13:09If you'd like to support the channel and help us keep making videos like this, you can do so by using the

00:13:14super thanks button below. As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Loop engineering creates autonomous development workflows by designing systems that handle task iteration, error correction, and verification, allowing agents like Hermes to drive coding tasks from start to finish without human prompts.

Highlights

Loop engineering shifts the focus from manual prompt engineering to designing self-driving systems where agents manage their own tasks and course corrections.
Deterministic loops use clear verification criteria like test passing or compilation success to confirm task completion.
Non-deterministic loops utilize adversarial setups where a builder model creates content and a separate verifier model critiques it against quality standards.
Effective loops require specific components: robust context management, clear feedback mechanisms, verification gates, explicit termination conditions, and error handling strategies.
Autonomous agents like Hermes and Claude Code now operate in non-interactive modes to execute long-term tasks without human intervention.

Timeline

From Prompt Engineering to Loop Engineering

Loop engineering replaces manual, step-by-step prompt crafting with system design where agents manage their own workflows.
Human involvement in error checking and course correction between steps is replaced by autonomous processes.
Advanced models like Claude 3.5 Opus enable the shift toward long-running tasks that previously required human intervention.

The transition in AI development moves away from manually writing instructions to designing self-correcting systems. In this framework, the human defines the end goal while the agent determines the necessary steps, handles errors, and iterates until the objective is achieved. This shift mirrors reinforcement learning concepts, where agents use signals to improve performance over time rather than relying on static prompts.

Essential Components for Building Autonomous Loops

System performance in loops relies on clearly defined end goals that guide agent decision-making.
Successful implementation requires five components: context management, feedback quality, verification gates, termination conditions, and error handling.
State must be maintained across turns using external files to prevent context window limitations from disrupting progress.

Constructing effective loops requires careful architecture. Developers must manage context beyond standard chat windows, implement high-quality feedback loops, and establish explicit verification gates to determine if a task is actually finished. Error handling is critical to ensure tool call failures do not leave the system in a broken, non-functional state.

Deterministic and Non-Deterministic Loop Architectures

Deterministic loops are ideal for tasks with clear success criteria, such as passing unit tests or successful code compilation.
Non-deterministic loops address subjective tasks like UI design by utilizing an adversarial approach with separate builder and verifier models.
Hermes agents leverage self-evolving skills to automatically update workflows based on feedback and identified patterns.

Deterministic loops rely on binary outcomes to judge success, making them ideal for automated maintenance like monitoring production environments via test suites. Non-deterministic loops handle more complex, subjective tasks. By utilizing different models for building and verifying, such as using Claude for construction and GPT for review, agents can improve output quality and detect patterns like 'AI slop' to iteratively refine the results.

Community Posts

No posts yet. Be the first to write about this video!

Write about this video