AI Agents Are Random… This Fix Makes Them Deterministic (Archon)

Better Stack

Transcript

00:00:00AI agents are getting crazy powerful, but they're still chaotic.
00:00:04We give them the exact same task, and we get wildly different code, different quality, and
00:00:09even different decisions every time.
00:00:12That's sort of the reality of working with agents.
00:00:15Turns out it doesn't have to be.
00:00:17This is Archon, and it can now run multiple agents in parallel with zero merge conflicts
00:00:22and consistent results.
00:00:24I'll show you exactly how to set it up and how it works in the next couple minutes.
00:00:30Now, using Claude Code, Cursor, or Codex, we know the first run looks great.
00:00:39The second run could be a completely different plan.
00:00:42Context can drift.
00:00:44The agent changes directions halfway through.
00:00:47Then you try to scale it.
00:00:49Two agents, maybe three agents, four agents.
00:00:51Now your repo is a complete mess.
00:00:54And here's the real problem.
00:00:55You're not really saving time anymore.
00:00:57You're rerunning prompts.
00:00:58You're fixing broken code, hoping this run doesn't just break it all.
00:01:02And if you're building anything, this really just kills your speed.
00:01:06Archon fixes this with something called harness engineering.
00:01:10Instead of hoping the agent behaves, you actually define the process.
00:01:14Planning, coding, testing, review, all in YAML.
00:01:18And agent skills: they're reusable instruction packs the agent loads automatically.
00:01:23So instead of guessing what to do, your agent follows a system.
00:01:28If you enjoy coding tools that speed up your workflow, be sure to subscribe.
00:01:32We have videos coming out all the time.
00:01:34All right, now let me show you.
00:01:36This is running locally on my M4 Pro, no cloud.
00:01:40I can enter archonserv.
00:01:43That brings up this UI interface.
00:01:45I'll install the archon skill into this repo with Claude.
00:01:49Now I run a simple workflow to fix this issue.
00:01:54Watch this part now.
00:01:55The agent finds the skill on its own, loads the workflow and executes step by step.
00:02:02You can watch it here in the terminal or over here on the UI.
00:02:04It looks way better.
00:02:05There's no prompt tweaking here.
00:02:07Even when it does fail, you get full transparency within the UI.
00:02:11You can see exactly which step broke and fix the workflow.
00:02:15This is way better than raw Claude Code, where you just get a confusing chat history.
00:02:20This part is key.
00:02:21It also runs in its own Git worktree, so it never touches main.
00:02:26It's prompting through and you can see here it generates it.
00:02:29It's done, clean PR, same structure, same result.
00:02:33We can see logs, the process the prompts go through and the entire output.
00:02:38This is what consistency looks like.
00:02:40So what's actually changed here?
00:02:42Well, three things have changed using Archon.
00:02:45First, the workflows.
00:02:47Archon uses YAML DAGs.
00:02:50Think of it like a checklist the agent has to follow.
00:02:53Some steps use AI, sure.
00:02:56Some steps are fixed.
00:02:58That mix is what makes it more reliable.
00:03:00Then we have the isolation.
00:03:01Every run happens in a separate Git worktree, so agents can't overwrite each other.
00:03:06That's why there are no merge conflicts.
00:03:08And skills: instead of stuffing prompts every time, the agent loads context automatically.
00:03:14So compared to raw agents, you remove all this randomness.
00:03:19Compared to tools like, let's say, LangChain for this one.
00:03:22LangChain is great, but Archon, this is built for code, not general bots.
00:03:27And compared to scripts, this is reusable.
00:03:30It's versioned.
00:03:31It's discoverable.
00:03:32The agent isn't guessing anymore.
00:03:34We have this whole workflow it's going through.
00:03:36It's following this actual system.
00:03:38Now we can run multiple agents at the same time and not worry about breaking the repo.
00:03:42You can generate PRs that look the same every time.
00:03:45And the big one here, you stop losing knowledge in chat history.
00:03:49Your process lives in workflows now, which means every run gets more consistent using
00:03:55this.
00:03:56With this, clean PRs, more predictable results.
00:03:58It's the same input, it's the same output.
00:04:00That's the part agents were missing.
00:04:02Now this isn't perfect, right?
00:04:04But what's good?
00:04:05All right, it's open source, it runs great locally, especially on M chips, right?
00:04:10There are certain ones that have a VPS configuration.
00:04:13I don't need that here.
00:04:14YAML makes everything visible.
00:04:16Great win for us, and Git worktrees solve a real problem.
00:04:19But again, this also means a few things.
00:04:21You have to think upfront.
00:04:23Designing workflows is going to take a little bit of effort and it's still evolving, right?
00:04:28Things are going to change.
00:04:29They're going to evolve, but they are growing.
00:04:31And if you're just doing quick prompts, you probably don't even need this.
00:04:34This would just be honestly a waste of time.
00:04:36Also, the model still does matter.
00:04:38So a better model obviously is going to generate us a better output.
00:04:42If you're tired of fixing agent mistakes, this is definitely worth a shot.
00:04:46If you want something you can actually rely on without second guessing yourself, this is
00:04:50also pretty worth it.
00:04:52If you're just experimenting, I mean, yeah, I was experimenting for this.
00:04:55I kept it simple.
00:04:56It works great.
00:04:57I got to see what it's all about.
00:04:58But if you're serious about building with agents, this is one of the highest leverage tools that
00:05:02I've come across right now.
00:05:04This is what turns agents from these demos that we're using into something we can actually
00:05:08ship with more reliably, incorporating this into our workflow.
00:05:13It's pretty simple.
00:05:14Before, you hoped the agent did the right thing, right?
00:05:16It's an agent.
00:05:17Now we define how it works.
00:05:20That's what they're claiming or that's what this harness engineering is.
00:05:23If you enjoy coding tools and tips like this, be sure to subscribe to the Better Stack channel.
00:05:27We'll see you in another video.

Key Takeaway

Archon tackles AI agent unpredictability with harness engineering: YAML workflows and isolated Git worktrees that make code output far more consistent and repeatable from run to run.

Highlights

  • Archon uses YAML-based Directed Acyclic Graphs (DAGs) to enforce specific sequences for planning, coding, and testing.

  • The system runs every agent task in a separate Git worktree to eliminate merge conflicts and protect the main branch.

  • Reusable instruction packs called agent skills load automatically to provide consistent context without manual prompt engineering.

  • The software runs locally on hardware like the M4 Pro chip, removing the requirement for cloud-based execution.

  • Harness engineering replaces traditional chat history with a visible UI that tracks every workflow step and identifies specific points of failure.

Timeline

The inherent instability of autonomous agents

  • Standard AI agents produce inconsistent logic and code quality even when given identical prompts.
  • Scaling to multiple agents working in one repository often leaves it in a mess and compounds context drift.
  • Rerunning prompts and fixing broken AI code negates the time-saving benefits of automation.

The fundamental problem with current tools like Claude Code or Cursor is their lack of a fixed process. Users often find the first run successful while subsequent runs deviate into entirely different plans. This randomness forces developers to spend more time monitoring and correcting the agent than building the actual product.

Systematizing agents with harness engineering

  • Harness engineering defines rigid processes for planning, coding, and review within YAML files.
  • Agents automatically discover and load relevant skills rather than relying on user prompt tweaks.
  • A dedicated UI provides full transparency into the execution logs and specific step failures.

Archon introduces a structured system where the agent follows a checklist instead of guessing the next step. Running locally on an M4 Pro chip, the tool uses the 'archonserv' command to launch a management interface. When a task is executed, the agent finds the necessary skill, follows the defined workflow, and generates a clean pull request with a predictable structure.
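The checklist idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Archon's actual schema or engine: the step names, the `needs` and `kind` fields, and the handler functions are all hypothetical, and Archon itself defines its workflows in YAML. The sketch shows steps ordered by their dependencies, a mix of AI-driven and fixed steps, and a log that records which step broke.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical workflow definition (not Archon's real format).
# Each step lists the steps it depends on and whether it is
# AI-driven ("ai") or ordinary deterministic code ("fixed").
WORKFLOW: dict[str, dict] = {
    "plan":   {"needs": [],       "kind": "ai"},
    "code":   {"needs": ["plan"], "kind": "ai"},
    "test":   {"needs": ["code"], "kind": "fixed"},
    "review": {"needs": ["test"], "kind": "ai"},
}


def execution_order(workflow: dict[str, dict]) -> list[str]:
    """Return step names so that every step comes after its dependencies."""
    graph = {name: set(step["needs"]) for name, step in workflow.items()}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


def run_workflow(
    workflow: dict[str, dict],
    handlers: dict[str, Callable[[str], bool]],
) -> list[tuple[str, str, bool]]:
    """Run steps in dependency order and stop at the first failing step.

    The returned log shows exactly which step broke.
    """
    log = []
    for name in execution_order(workflow):
        kind = workflow[name]["kind"]
        ok = handlers[kind](name)
        log.append((name, kind, ok))
        if not ok:
            break
    return log


if __name__ == "__main__":
    # Stub handlers: a real "ai" handler would call a model,
    # a real "fixed" handler would run a test suite or a linter.
    stub_handlers = {"ai": lambda step: True, "fixed": lambda step: True}
    for entry in run_workflow(WORKFLOW, stub_handlers):
        print(entry)
```

Because the order comes from the graph rather than from the model, two runs of the same workflow visit the same steps in the same sequence, even though the output of each AI step can still vary.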

Structural differences in the Archon architecture

  • YAML DAGs combine fixed algorithmic steps with flexible AI steps for higher reliability.
  • Isolation via Git worktrees ensures that parallel agents never overwrite each other's work.
  • Institutional knowledge moves from volatile chat histories into versioned, reusable workflows.

Three primary architectural changes differentiate Archon from raw agents, general-purpose frameworks, and one-off scripts. First, Directed Acyclic Graphs fix the order of operations. Second, Git worktrees give each run its own working directory and branch, so parallel agents do not edit the same files on disk. Finally, the shift from ad hoc prompts to versioned skills makes the agent's behavior discoverable and consistent across runs.
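The isolation step can be sketched with the standard `git worktree add` command. This is a minimal sketch, not Archon's implementation: the `agent/<run_id>` branch naming, the sibling `<repo>-worktrees` directory, and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def worktree_plan(repo: Path, run_id: str, base: str = "main"):
    """Return (path, branch, command) for one isolated agent run.

    The directory layout and branch naming are hypothetical.
    """
    branch = f"agent/{run_id}"
    path = repo.parent / f"{repo.name}-worktrees" / run_id
    # `git worktree add -b <branch> <path> <base>` creates a new branch
    # from <base> and checks it out into its own directory.
    cmd = ["git", "-C", str(repo), "worktree", "add",
           "-b", branch, str(path), base]
    return path, branch, cmd


def create_worktree(repo: Path, run_id: str, base: str = "main") -> Path:
    """Create the worktree on disk; requires git and an existing repo."""
    path, _branch, cmd = worktree_plan(repo, run_id, base)
    path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return path


if __name__ == "__main__":
    # Show the commands two parallel runs would issue, without running them.
    for run_id in ("run-1", "run-2"):
        _path, _branch, cmd = worktree_plan(Path("my-repo"), run_id)
        print(" ".join(cmd))
```

Each run gets a distinct directory and branch, so one agent's edits never land in another agent's checkout or on main. Overlapping changes can still need reconciling when the resulting branches are merged, which is where review of the generated PRs comes in.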

Implementation trade-offs and leverage

  • Effective use of Archon requires upfront effort in designing and testing workflows.
  • The underlying LLM still dictates the ultimate quality of the code output.
  • Direct prompting remains more efficient for simple, one-off tasks without long-term utility.

While the tool is open-source and high-leverage for building serious applications, it is not a universal replacement for simple prompts. Success depends on the developer's willingness to define the 'how' of a task before the agent starts. This transition from hoping for a result to defining a system is what allows AI agents to move from experimental demos to reliable production tools.
