AI Agents Are Random… This Fix Makes Them Deterministic (Archon)
Better Stack
Transcript
00:00:00AI agents are getting crazy powerful, but they're still chaotic.
00:00:04We give them the exact same task, and we get wildly different code, different quality, and
00:00:09even different decisions every time.
00:00:12That's sort of the reality of working with agents.
00:00:15Turns out it doesn't have to be.
00:00:17This is Archon, and it can now run multiple agents in parallel with zero merge conflicts
00:00:22and consistent results.
00:00:24I'll show you exactly how to set it up and how it works in the next couple minutes.
00:00:30Now, using Claude Code, Cursor, or Codex, we know the first run looks great.
00:00:39The second run could be a completely different plan.
00:00:42Context can drift.
00:00:44The agent changes directions halfway through.
00:00:47Then you try to scale it.
00:00:49Two agents, maybe three agents, four agents.
00:00:51Now your repo is a complete mess.
00:00:54And here's the real problem.
00:00:55You're not really saving time anymore.
00:00:57You're rerunning prompts.
00:00:58You're fixing broken code, hoping this run doesn't just break it all.
00:01:02And if you're building anything, this really just kills your speed.
00:01:06Archon fixes this with something called harness engineering.
00:01:10Instead of hoping the agent behaves, you actually define the process.
00:01:14Planning, coding, testing, review, all in YAML.
00:01:18And agent skills: they're reusable instruction packs the agent loads automatically.
00:01:23So instead of guessing what to do, your agent follows a system.
00:01:28If you enjoy coding tools that speed up your workflow, be sure to subscribe.
00:01:32We have videos coming out all the time.
00:01:34All right, now let me show you.
00:01:36This is running locally on my M4 Pro, no cloud.
00:01:40I can enter archon serve.
00:01:43That brings up this UI.
00:01:45I'll install the Archon skill into this repo with Claude.
00:01:49Now I run a simple workflow to fix this issue.
00:01:54Watch this part now.
00:01:55The agent finds the skill on its own, loads the workflow, and executes it step by step.
00:02:02You can watch it here in the terminal or over here on the UI.
00:02:04It looks way better.
00:02:05There's no prompt tweaking here.
00:02:07Even when it does fail, you get full transparency within the UI.
00:02:11You can see exactly which step broke and fix the workflow.
00:02:15This is way better than raw Claude Code, where you just get a confusing chat history.
00:02:20This part is key.
00:02:21It also runs in its own Git worktree, so it never touches main.
00:02:26It's stepping through the prompts, and you can see here it generates the result.
00:02:29It's done, clean PR, same structure, same result.
00:02:33We can see the logs, the steps the prompts go through, and the entire output.
00:02:38This is what consistency looks like.
00:02:40So what's actually changed here?
00:02:42Well, three things have changed using Archon.
00:02:45First, the workflows.
00:02:47Archon uses YAML DAGs.
00:02:50Think of it like a checklist the agent has to follow.
00:02:53Some steps use AI, sure.
00:02:56Some steps are fixed.
00:02:58That mix is what makes it more reliable.
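To make the checklist idea concrete, here is a minimal Python sketch of a workflow expressed as a DAG. It assumes a made-up schema: the step names, the `kind` and `needs` fields, and the two stub runners are all hypothetical and are not Archon's actual YAML format or API. It only shows the shape of the idea: steps run in dependency order, some call a model and some are fixed, and a failure is pinned to a named step.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: the step names and the "kind"/"needs" fields are
# illustrative only, not Archon's real schema.
WORKFLOW = {
    "plan":   {"kind": "ai",    "needs": []},
    "code":   {"kind": "ai",    "needs": ["plan"]},
    "test":   {"kind": "fixed", "needs": ["code"]},
    "review": {"kind": "ai",    "needs": ["code", "test"]},
}


def run_ai_step(name, inputs):
    # Stand-in for a model call; a real harness would prompt an agent here.
    return f"<model output for {name}>"


def run_fixed_step(name, inputs):
    # Stand-in for a deterministic command such as a test runner.
    return f"<fixed output for {name}>"


def run_workflow(workflow):
    """Run steps in dependency order; return (log, name_of_failed_step_or_None)."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(step["needs"]) for name, step in workflow.items()}
    results, log = {}, []
    for name in TopologicalSorter(graph).static_order():
        step = workflow[name]
        inputs = {dep: results[dep] for dep in step["needs"]}
        runner = run_ai_step if step["kind"] == "ai" else run_fixed_step
        try:
            results[name] = runner(name, inputs)
        except Exception as exc:
            # Stop at the first failure and report exactly which step broke.
            log.append((name, f"failed: {exc}"))
            return log, name
        log.append((name, "ok"))
    return log, None


log, failed = run_workflow(WORKFLOW)
print([name for name, _ in log], failed)
```

Because the order comes from the dependency graph rather than from the model, the sequence of steps is the same on every run; only the content produced inside the AI steps can vary.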
00:03:00Then we have the isolation.
00:03:01Every run happens in a separate Git worktree, so agents can't overwrite each other.
00:03:06That's why there are no merge conflicts.
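The isolation described here can be sketched with the standard `git worktree add` command, which checks out a new branch into its own directory. This is a rough Python illustration, not how Archon does it internally: the helper names, the `agent/<run_id>` branch naming, and the sibling-directory layout are assumptions made up for the example.

```python
import subprocess
from pathlib import Path


def worktree_command(repo, run_id, base="main"):
    """Build the git command that gives one agent run its own checkout."""
    repo_path = Path(repo).absolute()
    # Assumed layout: a sibling directory next to the repo, one per run.
    worktree_path = repo_path.with_name(f"{repo_path.name}-{run_id}")
    # Assumed naming: one fresh branch per run, created from `base`.
    branch = f"agent/{run_id}"
    return ["git", "-C", str(repo_path), "worktree", "add",
            "-b", branch, str(worktree_path), base]


def create_worktree(repo, run_id, base="main"):
    """Actually create the worktree; raises CalledProcessError if git fails."""
    subprocess.run(worktree_command(repo, run_id, base), check=True)


# Show the commands for two parallel runs without executing them.
for run_id in ("run-1", "run-2"):
    print(" ".join(worktree_command("/tmp/demo", run_id)))
```

Each run gets its own directory and its own branch, so two agents never write to the same working files; the branches still have to be merged or turned into PRs afterwards.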
00:03:08And skills: instead of stuffing context into the prompt every time, the agent loads it automatically.
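As a rough picture of what loading context automatically could look like, the Python sketch below scans a folder for skill files and prepends the matching instructions to the task. Everything in it is an assumption for illustration: the one-folder-per-skill layout with a `SKILL.md` file, the function names, and the crude name-based matching are not taken from Archon or from any agent's real loader.

```python
import tempfile
from pathlib import Path


def discover_skills(root):
    """Map skill name -> instruction text for every <root>/<name>/SKILL.md."""
    return {
        path.parent.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(root).glob("*/SKILL.md"))
    }


def build_context(task, skills):
    """Prepend the instructions of any skill whose name appears in the task."""
    matched = [text for name, text in skills.items() if name in task.lower()]
    return "\n\n".join(matched + [task])


# Demo with a throwaway skills folder containing one hypothetical skill.
with tempfile.TemporaryDirectory() as root:
    skill_dir = Path(root) / "archon"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "Follow the workflow step by step.", encoding="utf-8"
    )
    skills = discover_skills(root)
    context = build_context("Use archon to fix this issue", skills)

print(context)
```

The point of the design is that the instructions live in files that are versioned with the repo, so nobody has to paste them into each prompt by hand.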
00:03:14So compared to raw agents, you remove all this randomness.
00:03:19Compared to tools like, let's say, LangChain:
00:03:22LangChain is great, but Archon is built for code, not general-purpose bots.
00:03:27And compared to scripts, this is reusable.
00:03:30It's versioned.
00:03:31It's discoverable.
00:03:32The agent isn't guessing anymore.
00:03:34We have this whole workflow it's going through.
00:03:36It's following this actual system.
00:03:38Now we can run multiple agents at the same time and not worry about breaking the repo.
00:03:42You can generate PRs that look the same every time.
00:03:45And the big one here, you stop losing knowledge in chat history.
00:03:49Your process lives in workflows now, which means every run gets more consistent using
00:03:55this.
00:03:56With this, you get clean PRs and more predictable results.
00:03:58It's the same input, it's the same output.
00:04:00That's the part agents were missing.
00:04:02Now this isn't perfect, right?
00:04:04But what's good?
00:04:05All right, it's open source, and it runs great locally, especially on M-series chips, right?
00:04:10Some setups use a VPS configuration.
00:04:13I don't need that here.
00:04:14YAML makes everything visible.
00:04:16That's a great win for us, and Git worktrees solve a real problem.
00:04:19But again, this also means a few things.
00:04:21You have to think upfront.
00:04:23Designing workflows is going to take a little bit of effort, and the tool is still evolving, right?
00:04:28Things are going to change.
00:04:29It's going to keep evolving, but it is growing.
00:04:31And if you're just doing quick prompts, you probably don't even need this.
00:04:34This would just be honestly a waste of time.
00:04:36Also, the model still does matter.
00:04:38So a better model is obviously going to generate better output.
00:04:42If you're tired of fixing agent mistakes, this is definitely worth a shot.
00:04:46If you want something you can actually rely on without second-guessing yourself, this is
00:04:50also well worth it.
00:04:52If you're just experimenting, I mean, yeah, I was experimenting for this.
00:04:55I kept it simple.
00:04:56It works great.
00:04:57I got to see what it's all about.
00:04:58But if you're serious about building with agents, this is one of the highest leverage tools that
00:05:02I've come across right now.
00:05:04This is what turns agents from these demos that we're using into something we can actually
00:05:08ship more reliably once we incorporate it into our workflow.
00:05:13It's pretty simple.
00:05:14Before, you hoped the agent did the right thing, right?
00:05:16It's an agent.
00:05:17Now we define how it works.
00:05:20That's what they're claiming, and that's what harness engineering is.
00:05:23If you enjoy coding tools and tips like this, be sure to subscribe to the Better Stack channel.
00:05:27We'll see you in another video.