00:00:00So, okay.
00:00:02What's the best AI model right now?
00:00:04Claude, GPT, Gemini.
00:00:07And honestly, I think that's the wrong question.
00:00:11Like, completely the wrong question.
00:00:14Just real quick, I'm Daniel.
00:00:16I've been deep in iOS dev for over eight years now.
00:00:20Started out freelancing, designing UIs,
00:00:24bouncing from client to client,
00:00:25shipping other people's ideas
00:00:27while trying to figure out my own.
00:00:28And then after WWDC 25, I just went all in, solo.
00:00:33No more clients, no safety net.
00:00:36Since then, I've crafted over 15 of my own apps,
00:00:39all SwiftUI, all built in public.
00:00:41And right now, honestly, every ounce of energy I've got
00:00:44goes into making this solo studio
00:00:46into something that actually lasts.
00:00:49Not another round of quick MVPs or AI-generated slop,
00:00:52but real apps that hold up at scale.
00:00:55And yeah, all of that process,
00:00:57the whole messy journey, lives on Crafters Lab.
00:01:00It's at crafterslab.dev,
00:01:01and it's not some tutorial graveyard or AI clone factory.
00:01:06It's genuinely my home base,
00:01:08built for solo devs who use AI like a real teammate.
00:01:12Not like a vending machine you poke when you're stuck
00:01:14and hope for the best.
00:01:16If you care about the craft,
00:01:18if you're serious about leveling up
00:01:20and building things that actually last,
00:01:23yeah, you'd feel right at home.
00:01:24And hey, if you're still on Patreon,
00:01:26huge thanks for that, but heads up.
00:01:29Everything's moved over to crafterslab.dev.
00:01:32That's where the whole crew is now.
00:01:33Come build with us.
00:01:35So here's what got me thinking about all this.
00:01:38There was a study that came out recently.
00:01:41Researchers published this benchmark called Epic's Agent.
00:01:45And what makes it different from every other benchmark
00:01:49you see people arguing about online
00:01:51is that it tests agents on real professional work,
00:01:55not coding puzzles, not multiple choice.
00:01:58We're talking actual tasks that consultants, lawyers,
00:02:03analysts do on a daily basis.
00:02:05Each one takes a human about one to two hours to complete.
00:02:08So they ran every major frontier model through it.
00:02:11The best one completed those tasks
00:02:13about 24% of the time, one in four.
00:02:17And after eight attempts with the same model,
00:02:20it only climbed to around 40%.
00:02:23Now keep in mind, these are the same models
00:02:26scoring above 90% on the benchmarks
00:02:29everyone loses their minds over.
00:02:32So either those benchmarks are off
00:02:33or we're measuring the wrong thing.
00:02:36And I think it's the second one, right?
00:02:37But okay, so here's where it gets real for us.
00:02:41The researchers actually dug into why the agents failed.
00:02:46And the answer wasn't that the models are dumb.
00:02:49They had all the knowledge they needed.
00:02:51They could reason through the problems just fine.
00:02:54The failures were almost entirely
00:02:56about execution and orchestration.
00:03:00The agents would get lost after too many steps.
00:03:02They'd loop back to approaches that already failed.
00:03:05They just lose track of what they were even supposed
00:03:09to be doing in the first place.
00:03:11And if you're a solo dev using Claude Code
00:03:14or Cursor every day, yeah, you've been there.
00:03:18You've watched the agent spiral, retry the same
00:03:21broken thing three times,
00:03:23completely forget the context from 20 steps ago.
00:03:26And you're sitting there like,
00:03:28maybe I should switch to Opus.
00:03:30Maybe I need a different provider,
00:03:32but the data is saying that's not it.
00:03:34The model isn't the bottleneck.
00:03:36It's everything wrapped around it.
00:03:38And there's a word for that.
00:03:40And I think it's gonna define 2026
00:03:43the way agents define 2025.
00:03:46The word is harness.
00:03:47An agent harness is all the infrastructure
00:03:50around the model: what it can see,
00:03:52what tools it has access to,
00:03:54how it recovers when things go sideways,
00:03:56how it keeps track of what it's doing over a long session.
00:03:59OpenAI literally published a blog post
00:04:02called Harness Engineering.
00:04:04Anthropic put out a whole guide on building effective
00:04:07harnesses for long running agents.
00:04:09Manus, the AI company Meta just acquired,
00:04:13they published their context engineering lessons
00:04:16after rebuilding their entire agent framework
00:04:19five times in six months, five times.
00:04:22And they're all saying the exact same thing.
00:04:24The harness is where the real engineering work lives,
00:04:27not the model.
00:04:28Okay, so, and this is the part that honestly surprised me
00:04:32because it runs completely counter
00:04:34to how most of us think about building with these tools.
00:04:38So there's this story from Vercel.
00:04:41They had a text to SQL agent.
00:04:43You ask a question, it writes a SQL query,
00:04:46and they built it the way most people build agents, right?
00:04:49Gave it a bunch of specialized tools,
00:04:51one for understanding the database schema,
00:04:54one for writing queries, one for validating results.
00:04:58All this error handling wrapped around it,
00:05:01and it worked about 80% of the time.
00:05:04Then they tried something kind of radical.
00:05:06They removed 80% of the tools, just ripped them out,
00:05:11gave the agent basic stuff, run bash commands, read files,
00:05:15standard command line tools like grep and cat,
00:05:18the kind of stuff you or I would actually use.
00:05:20And accuracy went from 80% to 100%.
00:05:25It used 40% fewer tokens,
00:05:28and it was three and a half times faster.
00:05:31Not gonna lie, that's kind of wild, right?
00:05:33And the engineer who built it said something
00:05:36that really stuck with me.
00:05:38Models are getting smarter.
00:05:40Context windows are getting larger.
00:05:42So maybe the best agent architecture
00:05:44is almost no architecture at all.
00:05:46And that just flips everything, you know what I mean?
00:05:50Because the instinct, especially when you're solo
00:05:54and you're trying to make this thing reliable,
00:05:57is to keep adding more tools, more guardrails,
00:06:01more routing logic.
00:06:02You think more structure is gonna help,
00:06:04but those tools weren't helping the model.
00:06:06They were getting in the way.
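To make the "almost no architecture" idea concrete, here's a rough sketch of a stripped-down tool surface (my own illustration, not Vercel's actual code): just file access and a shell, where everything else, grep, cat, build tools, comes free through bash.

```python
# A minimal tool surface in the "almost no architecture" spirit.
# Names and signatures are illustrative, not any product's internals.
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def edit_file(path: str, old: str, new: str) -> str:
    text = Path(path).read_text()
    if old not in text:
        return "edit failed: old text not found"  # let the model recover
    Path(path).write_text(text.replace(old, new, 1))
    return "ok"

def run_bash(command: str) -> str:
    # grep, cat, builds, tests -- all reachable through this one tool.
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=30)
    return proc.stdout + proc.stderr

TOOLS = {"read_file": read_file, "write_file": write_file,
         "edit_file": edit_file, "run_bash": run_bash}
```

An agent loop then just dispatches whichever tool call the model picks into `TOOLS`; the intelligence stays in the model, not in a routing pipeline.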
00:06:08And this isn't an isolated thing either.
00:06:10Manus went through the exact same realization.
00:06:13They rebuilt their entire agent framework
00:06:16five times in six months,
00:06:19and their biggest performance gains
00:06:21didn't come from adding features.
00:06:23They came from removing them.
00:06:25They ripped out complex document retrieval,
00:06:28killed the fancy routing logic,
00:06:29replaced management agents with simple structured handoffs.
00:06:34Every iteration, the thing got simpler and it got better.
00:06:37And here's the part I think every solo dev
00:06:40running long Claude Code sessions needs to hear.
00:06:42Manus found that their agent averaged
00:06:45around 50 tool calls per task.
00:06:49That's a lot of steps.
00:06:50And even with models that technically support
00:06:53huge context windows,
00:06:54performance just degrades past a certain point.
00:06:58The model doesn't suddenly forget everything.
00:07:01It's more like the signal gets buried under noise.
00:07:04Your important instructions from the start of the session
00:07:07get lost under hundreds of intermediate results.
00:07:10So their fix was dead simple.
00:07:12They started treating the file system
00:07:14as the model's external memory.
00:07:17Instead of cramming everything into the context window,
00:07:20the agent writes key info to a file
00:07:23and reads it back when needed.
00:07:25And yeah, if you use Claude Code,
00:07:27you've literally seen this.
00:07:29The CLAUDE.md files, the to-do lists, the progress tracking,
00:07:34that's this exact pattern playing out
00:07:36in your terminal every day.
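A minimal sketch of that external-memory move (the pattern, not Manus's implementation; all names are invented): big intermediate results go to disk, and only a short stub stays in the model's context.

```python
# Sketch of "file system as external memory": large tool results are
# written to disk; only a path and a short preview stay in context.
from pathlib import Path

MEMORY_DIR = Path("agent_memory")
MEMORY_DIR.mkdir(exist_ok=True)

def offload(step: int, result: str, max_inline: int = 200) -> str:
    """Store a big intermediate result on disk; return a compact stub
    that goes into the model's context instead of the full text."""
    if len(result) <= max_inline:
        return result  # small enough to keep inline
    path = MEMORY_DIR / f"step_{step:03d}.txt"
    path.write_text(result)
    # The stub the model sees: where the data lives plus a preview.
    return f"[saved to {path}] preview: {result[:max_inline]}..."

def recall(stub: str) -> str:
    """Read the full result back only when the agent needs it."""
    if stub.startswith("[saved to "):
        path = Path(stub[len("[saved to "):].split("]")[0])
        return path.read_text()
    return stub
```

The context window only ever carries the stub, so a 50-tool-call session doesn't bury the original instructions under hundreds of full intermediate results.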
00:07:37All right, so remember what I said
00:07:40about everyone converging on the same idea?
00:07:44Because when you look
00:07:45at the three most successful agent systems right now,
00:07:49they all arrived at the same place
00:07:51from completely different directions.
00:07:53Codex from OpenAI, it's got this layered approach.
00:07:57An orchestrator that plans,
00:07:59an executor that handles individual tasks,
00:08:02and a recovery layer that catches failures.
00:08:06It's robust.
00:08:07You can hand it stuff and walk away.
00:08:09That's one philosophy.
00:08:10Claude Code, and I use this every single day.
00:08:14The core is literally just four tools.
00:08:16Read a file, write a file, edit a file,
00:08:19run a bash command, that's it.
00:08:21Most of the intelligence lives in the model itself.
00:08:23The harness stays minimal.
00:08:25And when you need more, extensibility comes through MCP
00:08:28and skills that the agent picks up as needed.
00:08:30And then Manus landed on what I'd call
00:08:33reduce, offload, isolate, actively shrink the context,
00:08:38use the file system for memory,
00:08:40spin up subagents for heavy tasks,
00:08:43and just bring back the summary.
00:08:45Three totally different approaches,
00:08:47all converging on the same insight.
00:08:50The harness matters more than the model.
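That "isolate" step can be sketched like this. It's a sketch of the idea, not anyone's actual code: `call_model` is a stand-in for whatever LLM client you use, and the point is that the heavy work happens in a throwaway context while only the summary re-enters the main one.

```python
# Sketch of the "isolate" pattern: run a heavy task in a subagent with
# its own fresh context, and hand only a short summary back.
from typing import Callable

def run_subagent(task: str,
                 call_model: Callable[[list], str],
                 summarize_to: int = 500) -> str:
    """Execute `task` in an isolated context; return only a summary."""
    sub_context = [{"role": "user", "content": task}]  # fresh context
    full_result = call_model(sub_context)              # heavy work here
    if len(full_result) <= summarize_to:
        return full_result  # already small enough to hand back
    # A second call compresses the result before it re-enters the main
    # agent, so the main context only grows by the summary's length.
    return call_model([{
        "role": "user",
        "content": f"Summarize in under {summarize_to} characters:\n"
                   + full_result,
    }])
```

The main agent's context never sees the subagent's intermediate steps, which is exactly the "reduce, offload, isolate" move: the mess stays local.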
00:08:52And for solo devs,
00:08:55this changes what you should actually
00:08:57be spending your time on.
00:08:59Because, you know, we don't have infinite hours.
00:09:01Every hour you spend on Reddit debating
00:09:05Claude versus GPT is an hour you're not shipping.
00:09:08And there's this idea from Richard Sutton,
00:09:11one of the creators of reinforcement learning,
00:09:14called the bitter lesson.
00:09:16The core argument is that
00:09:18approaches which scale with compute
00:09:21always end up beating approaches
00:08:23that rely on hand-engineered knowledge.
00:08:26Applied to what we're doing,
00:08:27that means something very specific.
00:09:29As models get smarter,
00:09:31your harness should get simpler,
00:09:33not more complex.
00:09:34If you're adding more hand-coded logic,
00:09:36more custom pipelines with every model upgrade,
00:09:40you're swimming against the current.
00:09:42And honestly, that over-engineering
00:09:44is probably why your agent keeps breaking.
00:09:47So here's what I'd actually try.
00:09:49First, do the Vercel experiment yourself.
00:09:52If you've got any kind of agent set up,
00:09:54strip it down, remove the specialized tools,
00:09:57give it a bash terminal and basic file access
00:10:00and just see what happens.
00:10:02The model is probably smarter
00:10:03than the tool pipeline you built around it.
00:10:06Second, add a progress file.
00:10:08Have your agent maintain a running to-do list
00:10:10that it updates after each step.
00:10:13It reads the file at the start of each action,
00:10:15writes to it at the end.
00:10:17This is exactly what Claude Code does
00:10:19with those markdown files.
00:10:20And it's the same pattern Manus landed on
00:10:22after five complete rewrites.
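One hypothetical way to wire the progress-file pattern up, with a made-up `PROGRESS.md` checklist format: the agent calls `next_task()` before each action and `mark_done()` after, so the plan survives outside the context window.

```python
# Sketch of the "progress file" pattern: a markdown to-do list the agent
# re-reads before every step and updates after. File name and format
# are my own choices, not a standard.
from pathlib import Path

PROGRESS = Path("PROGRESS.md")

def load_tasks() -> list:
    """Parse '- [ ] task' / '- [x] task' lines into (done, text) pairs."""
    if not PROGRESS.exists():
        return []
    tasks = []
    for line in PROGRESS.read_text().splitlines():
        if line.startswith("- ["):
            tasks.append((line[3] == "x", line[6:]))
    return tasks

def save_tasks(tasks: list) -> None:
    lines = [f"- [{'x' if done else ' '}] {text}" for done, text in tasks]
    PROGRESS.write_text("\n".join(lines) + "\n")

def next_task():
    """What the agent reads at the start of each action."""
    for done, text in load_tasks():
        if not done:
            return text
    return None

def mark_done(text: str) -> None:
    """What the agent writes at the end of each action."""
    save_tasks([(done or t == text, t) for done, t in load_tasks()])
```

Even if the session compacts or restarts, the checklist on disk is the source of truth, not whatever survived in the context.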
00:10:24I actually have a whole system for this
00:10:26wired up in the lab with all my agent instructions
00:10:29and .md templates, ready to go if you're curious.
00:10:33And third, start learning about MCP and skills.
00:10:37These give the model clean, standardized ways
00:10:40to work with external tools
00:10:42without you having to hard code every integration.
00:10:44That's where the extensibility lives now.
00:10:462025 was the year of agents.
00:10:50And for the most part, yeah, that happened.
00:10:53But 2026, I think 2026 is the year of harnesses.
00:10:58And the same model, the exact same model
00:11:03behaves completely differently in Claude Code
00:11:06compared to Cursor or compared to Codex.
00:11:08So choose your harness carefully,
00:11:11whether you're using a coding agent or building one.
00:11:14And so, yeah, if you're still here,
00:11:17honestly, you're a legend.
00:11:18And look, I know the model discourse is loud right now.
00:11:22Every week there's a new drop, a new benchmark,
00:11:24a new thread about which one is king.
00:11:27But the actual data, the actual engineering
00:11:30coming out of the companies building this stuff,
00:11:32it's all pointing somewhere else.
00:11:34The harness is where the wins are.
00:11:37And as solo devs, that's actually great news
00:11:40because building a better harness
00:11:42is something you can do right now today
00:11:45without waiting for the next model release.
00:11:47And if you wanna go deeper into how I actually
00:11:51set all this up, the .md files, the agent workflows,
00:11:56how I wire everything together for my own apps,
00:11:59come check out crafterslab.dev.
00:12:02It's not some tutorial dump or another AI content farm.
00:12:06It's genuinely my home base built for solo devs
00:12:09who treat AI like a real teammate
00:12:11and actually care about what they ship.
00:12:13Inside, you get full walkthroughs,
00:12:15real short video tutorials, a bunch of Claude Code skills
00:12:19you can grab and use right away,
00:12:21and downloadable resources you can drop
00:12:24straight into your projects.
00:12:26Members riff in the comments, ask followups,
00:12:29go back and forth.
00:12:30It's a real conversation, not some one-way content feed.
00:12:34But the real core is the Notion team spaces,
00:12:37my live playbook. You get a front row seat
00:12:40to how I run every single app I'm building,
00:12:42the actual .md files I use on real projects,
00:12:46the prompt library, the docs I'm writing as I go,
00:12:49all the automations running behind the scenes,
00:12:51nothing polished for the camera, just the real process,
00:12:55messy parts and all. And there's Swift Brain,
00:12:58a curated Swift and SwiftUI library
00:13:01I've been building out for years, deep dive keynotes,
00:13:04private talks I spent real money curating,
00:13:07the kind of material that's not floating around
00:13:10in public training data.
00:13:11This is what I actually use to build custom MCPs
00:13:16to set up skills for Claude Code, for Cursor, all of it,
00:13:20always experimenting, always sharing what sticks,
00:13:23and then Ops Lab.
00:13:25That's where all the AI agent instructions live,
00:13:28the Notion templates, the Claude Code skills,
00:13:31the workflows, automations all wired up
00:13:33and ready for you to copy, tear apart,
00:13:36totally break and rebuild your own way.
00:13:38The whole point is keeping the indie stack connected
00:13:41so you're never really building alone,
00:13:44even if you're solo at the keyboard.
00:13:46So yeah, if you wanna get in while the crew's still small
00:13:49and prices are locked, now is kind of the sweet spot.
00:13:52It feels way more like a behind the scenes dev lounge
00:13:55than some giant faceless forum.
00:13:57I'd genuinely love to see you in there.
00:14:00Trade some takes on this harness stuff,
00:14:02maybe learn something from what you're building next.
00:14:05Keep crafting, keep experimenting,
00:14:08and don't let the benchmark noise distract you
00:14:10from what actually matters.
00:14:12Peace.