00:00:00[MUSIC PLAYING]
00:00:02Hi, my name is Mario.
00:00:04I hail from the land of Arnold Schwarzenegger,
00:00:06which you probably haven't noticed yet
00:00:09based on my very good English.
00:00:12I want to preface this with we've
00:00:13been running around with our four-year-old the entire day
00:00:16through London.
00:00:17So we went to dinosaurs, mummies, Nandos, obviously,
00:00:24and stuff I have already forgotten.
00:00:26I'm very, very tired.
00:00:28And if you don't understand anything I say,
00:00:31just raise your hand and say, grandpa, wake up.
00:00:36The reason I'm here is actually another person,
00:00:39who is here in Cockneyville today.
00:00:40Let's call him Shteter Pineburger.
00:00:44Back in 2025, I think somewhere around April,
00:00:53he told me and Armin Ronacher, whom you might also know
00:00:58from Flask fame and Sentry fame, dude, those coding agents,
00:01:02they actually work now.
00:01:04And I was like, oh, shut the fuck up.
00:01:06Sorry, I'm also using swear words.
00:01:09Totally not.
00:01:10And a month later, we teamed up at his flat for 24 hours
00:01:13overnight and just let ourselves get immersed by the clankers,
00:01:19by the vibe code, and by the vibe slop.
00:01:21And since then, none of us
00:01:23has really been sleeping anymore, basically.
00:01:27So we were building stuff, lots of stuff, most of which
00:01:32we actually never used, because that's the new thing in 2025,
00:01:36'26.
00:01:37We build a lot of stuff, but we don't build a lot of stuff
00:01:39we actually use.
00:01:40We wrote a lot of stuff.
00:01:42And eventually, that culminated in me thinking,
00:01:46hey, I hate all the existing coding agents or harnesses.
00:01:50How hard can it be to write one myself?
00:01:53And Peter was like, oh, I just want to do a thing.
00:01:56Nobody's probably going to hear about it.
00:01:58And it's going to be a personal assistant,
00:02:01because that's what I've always wanted to have.
00:02:03Most of you probably know how his story went.
00:02:05So today, I'm going to tell you my much less impressive story.
00:02:08But I hope I can share a couple of learnings,
00:02:11as we say in the industry, that I was able to gather
00:02:16in the past couple of months.
00:02:17So Pi.
00:02:19In the beginning, there was Claude Code.
00:02:21Actually, there was copy and pasting from ChatGPT.
00:02:25We all did that in the beginning, 2023.
00:02:27Then there was-- who remembers the original GitHub Copilot?
00:02:32Yeah, actually, how many of you are engineers?
00:02:35How many of you are using coding agents,
00:02:37like Cursor, Claude Code?
00:02:39OK.
00:02:40Popularity contest, Claude Code?
00:02:43Codex CLI?
00:02:45Cursor?
00:02:48Open--
00:02:48[INAUDIBLE]
00:02:49Yeah.
00:02:50OpenCode?
00:02:50Antigravity.
00:02:51Oh, that's not a lot.
00:02:52Anybody using this?
00:02:55I like you.
00:02:56We're going to have a beer later.
00:02:58Anyway, so this was basically what happened in 2025
00:03:03and before.
00:03:04Started with copy and pasting from ChatGPT.
00:03:06It's all mostly broken.
00:03:07It's mostly single functions, stuff you don't want to write.
00:03:10Then you got GitHub Copilot inside of your Visual Studio
00:03:13Code, where you just tab, tab, tab to happiness,
00:03:15which did work sometimes, mostly didn't.
00:03:17Sometimes it would also just [INAUDIBLE] recite GPL code,
00:03:22like John Carmack's fast inverse square root
00:03:25and stuff like that, which was a lot of fun.
00:03:29And then there was Aider.
00:03:30Anybody remember Aider?
00:03:31Yes.
00:03:32Old people.
00:03:33Hello.
00:03:33Yeah.
00:03:37You have gray hair.
00:03:37You obviously know Aider.
00:03:41There was also AutoGPT.
00:03:43Probably not a lot.
00:03:44Yeah, OK.
00:03:45He knows all the things.
00:03:48And then eventually there was Claude Code.
00:03:51I think they released it in November,
00:03:52actually, as a beta in 2024.
00:03:55But it really only became used more, say again?
00:03:59Only February.
00:04:01Yeah, February, March, something like that, 2025.
00:04:03And I was like, I love it.
00:04:05It's awesome.
00:04:06The Claude Code team is also awesome.
00:04:07They're on socials.
00:04:08And they're all very good people and very talented people.
00:04:13And they basically created the entire genre.
00:04:15I know there were precursors like Aider and AutoGPT,
00:04:18but nothing did this.
00:04:20And this was basically the whole agentic search thing.
00:04:22So instead of, like Cursor, going into your code base,
00:04:25indexing things, constructing ASTs, and indexing that as well,
00:04:29which kind of didn't really work,
00:04:31they just said, eh,
00:04:33we reinforcement-trained our models
00:04:35to just use file tools, bash tools,
00:04:37to explore your code base ad hoc and find the places that it
00:04:41needs to find to understand the code and then modify the code.
00:04:44And this worked so well that, yeah, we
00:04:46stopped sleeping because we all of a sudden
00:04:48could produce so much more code than we could before by hand.
00:04:52Back then, it was simple and predictable
00:04:54and actually fit my workflow perfectly.
00:04:57Fine.
00:04:58But then they fell into the trap to which most of us
00:05:05probably fall.
00:05:06The clankers can write so much code.
00:05:08Why not just let it write all the features you could ever
00:05:11imagine, right?
00:05:11Isn't that great?
00:05:12Let's just add this feature, and that feature,
00:05:14and this feature, and that feature.
00:05:15And eventually, you end up with Homer Simpson's--
00:05:18I don't even know what it's called.
00:05:20I call it a spaceship.
00:05:21And Claude Code is now a spaceship.
00:05:23It does so many things that you probably only ever
00:05:26use like 5% of what it offers.
00:05:28You only know about 10% in total.
00:05:30And the rest, the 90% that's left over,
00:05:33that's kind of like the dark matter of AI and agents.
00:05:36Nobody knows what it's actually doing.
00:05:37And I personally find this not to be very helpful
00:05:40because I still think that you kind of need
00:05:43to know what the agent is doing.
00:05:45This guy might disagree to some degree.
00:05:49And we're here at TESOL, and they also
00:05:51like context management or context engineering,
00:05:54as we've called it.
00:05:55And I eventually found that Claude Code was not
00:05:58a good tool when it comes to observability
00:06:01and actually managing your context.
00:06:04Then there was also this.
00:06:06Who likes this about Claude Code, like the immense amounts
00:06:09of flicker, unexplainable flicker?
00:06:10Well, actually, I know how to explain it and why it happens,
00:06:13but they still haven't fixed it.
00:06:15Here's Thariq.
00:06:16He's really great.
00:06:16I love him.
00:06:17He's their DevRel guy, mostly on Twitter, and he's amazing.
00:06:21But sometimes he also says questionable stuff
00:06:24like, our terminal user interface is now a game engine.
00:06:27Now, you have to know I have a game development background.
00:06:30That's where I come from.
00:06:31And if I read something like this,
00:06:32then it kind of hurts me a little bit
00:06:34because it's a freaking terminal user interface, dude.
00:06:37It's not a game engine.
00:06:38Trust me.
00:06:39The only reason you think it's a game engine
00:06:41is because you're using React in your terminal interface,
00:06:44and it takes like 12 milliseconds
00:06:45to re-layout your entire user interface graph.
00:06:49Just don't do that, man.
00:06:51It's not a game engine, right?
00:06:54So and then Mitchell, who is writing Ghostty,
00:06:56was like, dude, that's offensive, man.
00:06:59Like, don't blame it on Ghostty or any other terminal.
00:07:02Your code is garbage.
00:07:04Terminals can render at like hundreds
00:07:05of frames per second, sub-milliseconds per frame.
00:07:09So don't do that, right?
00:07:12And then they eventually fixed the flicker.
00:07:15But then other stuff happened.
00:07:16So it's like they fully gave in to the vibe coding.
00:07:20And you can feel it every day when you use Claude Code.
00:07:23Now, again, I do not want to diminish their efforts
00:07:27and their results.
00:07:28Claude Code is still the category leader for a good reason.
00:07:30They invented this thing, and they're doing a great job.
00:07:32I personally am just an old person
00:07:34who likes predictable simple tools.
00:07:37And this just didn't fit my workflows and my needs anymore.
00:07:41So yeah.
00:07:42Also, they do a lot of stuff in the background,
00:07:44manipulating your context.
00:07:46I built a bunch of tools in summer 2025
00:07:50that would allow me to intercept requests being made
00:07:52to their back end from Claude Code and find out
00:07:55what kind of little additional text
00:07:58gets injected into your context behind your back.
00:08:00And all of that was very detrimental
00:08:01and also changed all the time.
00:08:04Like every day or second day, there
00:08:06would be a new release where this changed what
00:08:08gets injected at what point, which would basically mess
00:08:11with your existing workflows.
00:08:13It was just not a stable tool.
00:08:14And now I understand it from their perspective.
00:08:16They need to experiment.
00:08:17And they have a huge user base.
00:08:18And it's really hard to experiment
00:08:19when you have a huge user base.
00:08:21But they did not care.
00:08:23So all of us had to suffer.
00:08:25You're working with this new tool.
00:08:27You try to create predictable workflows.
00:08:31And then the tool vendor changes a tiny little thing
00:08:35under the hood that makes the LLM go
00:08:36crazy with your existing workflows.
00:08:38That's just not sustainable.
00:08:39I need control over that.
00:08:40I can't rely on them providing me a stable kind of thing.
00:08:46So I believe, as a consequence of the UI design,
00:08:52they need to reduce the amount of visibility you have.
00:08:54I personally don't like that too much.
00:08:56But that's just a personal preference.
00:08:57I understand that most people will
00:08:58be happy with the amount of information
00:09:00that Claude Code will present you.
00:09:03There is zero model choice, obviously,
00:09:06because it's an Anthropic-native tool, so to speak.
00:09:09That's not a downside, because Claude models are--
00:09:12I like them.
00:09:13They're really good.
00:09:15And there's almost zero angst sensibility.
00:09:17And you might find this kind of funny, because they
00:09:19have this whole hook system and all of that.
00:09:21But if you compare it to what Pi allows you to do,
00:09:25it's not as deeply integrated.
00:09:28It's also basically based on running a process when
00:09:32the hook event starts, which is very expensive if you
00:09:36have to start up that process over and over again.
00:09:40So eventually, I soured on Claude Code,
00:09:42not because it was terrible.
00:09:44It's just it stopped being a fit for me.
00:09:47It became a fit for a lot more people over that period.
00:09:50So obviously, they are doing things right, but not for me,
00:09:54because I'm old.
00:09:56So then I was looking around for options.
00:09:59And there is Codex CLI, which I really didn't like
00:10:01in the beginning, both the user interface as well as the model.
00:10:05That has changed, at least with respect to the model.
00:10:08Codex is really pretty good now.
00:10:10Then there's AMP.
00:10:12The team behind that used to work at Sourcegraph.
00:10:15They spun off of Sourcegraph.
00:10:20And they're super good engineers.
00:10:21They managed to build a commercial coding harness where
00:10:25they take away features instead of adding them.
00:10:28And most of their choices make a lot of sense to me.
00:10:33So yeah, if you're looking for a commercial coding harness,
00:10:36I would definitely recommend AMP to you, because it's really good.
00:10:39Factory Droid, kind of similar spiel, also really good,
00:10:44although they are not as experimental as AMP.
00:10:47And then there's OpenCode, which is the open source
00:10:50coding harness a lot of people use.
00:10:53So I have a history of open source.
00:10:55I've been in open source for, well, 17 years.
00:11:00I've managed big and small open source projects.
00:11:04So that's near and dear to my heart.
00:11:05And so I thought, I give OpenCode a try,
00:11:08because that's close to me.
00:11:12And next to AMP, they have one of the most grounded
00:11:15or pragmatic teams in the space.
00:11:16They don't hype you up with features
00:11:18you probably never use.
00:11:20They try to kind of conserve a happy path that's
00:11:23very stable.
00:11:26And they also have pretty good thoughts
00:11:27on what coding agents mean for us
00:11:29as a profession, which I personally can identify with.
00:11:32The problem with OpenCode is that it's also not very good
00:11:37at managing your context.
00:11:38For example, on each turn, it's calling sessionCompaction.prune,
00:11:44which does the following.
00:11:46It prunes all tool results before the last 40,000 tokens.
00:11:52Now, who here knows what prompt caching is?
00:11:56What does this do to your prompt cache?
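The answer, roughly: provider-side prompt caching matches the exact prefix of your new request against the previous one, so rewriting old tool results changes the prefix and throws the cache away. A minimal sketch of the idea (the message shapes here are illustrative, not OpenCode's actual types):

```typescript
// Sketch: provider-side prompt caching serves the *exact* shared prefix
// of the previous request. Any change early in the message list
// invalidates the cache for everything after it.
type Msg = { role: string; content: string };

// Length of the shared prefix between the previous request and the new
// one; only this part can be served from the provider's prompt cache.
function cachedPrefixLen(prev: Msg[], next: Msg[]): number {
  let n = 0;
  while (
    n < prev.length &&
    n < next.length &&
    prev[n].role === next[n].role &&
    prev[n].content === next[n].content
  ) {
    n++;
  }
  return n;
}

const history: Msg[] = [
  { role: "system", content: "You are a coding agent." },
  { role: "assistant", content: "tool_call: read foo.ts" },
  { role: "tool", content: "<10k tokens of foo.ts>" },
  { role: "assistant", content: "tool_call: edit foo.ts" },
  { role: "tool", content: "ok" },
];

// Appending keeps the whole old history cached...
const appended = [...history, { role: "user", content: "now run tests" }];
// ...but pruning an old tool result rewrites the prefix.
const pruned = history.map((m, i) =>
  i === 2 ? { ...m, content: "[pruned]" } : m
);

console.log(cachedPrefixLen(history, appended)); // 5 -- full cache hit
console.log(cachedPrefixLen(history, pruned)); // 2 -- everything after the prune is re-billed
```

So pruning in the middle of the history on every turn means every subsequent token gets reprocessed at full, uncached price.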
00:11:58So OpenCode and Anthropic had an interesting history.
00:12:05And eventually, Anthropic, in my opinion, rightly so,
00:12:11said, dudes, that's just not going to happen.
00:12:14And there was never a public kind of thing about this.
00:12:17But Thariq explains it here.
00:12:19If you come to a gym and don't behave and abuse
00:12:22the infrastructure, so to speak, you're going to get banned.
00:12:25And I think--
00:12:27I don't have any evidence for that,
00:12:28but I think that's the reason why
00:12:30there is this animosity between Anthropic and OpenCode.
00:12:33And I can totally agree, or at least I
00:12:36think that Anthropic is clearly in the right here.
00:12:39Don't mess with the infrastructure.
00:12:42Then there's also other stuff, like OpenCode
00:12:44comes with LSP, Language Server Protocol support,
00:12:46out of the box.
00:12:48Coming back to context engineering,
00:12:51let's say you give your agent the task
00:12:53of modifying a bunch of files.
00:12:55What does that mean in practice?
00:12:57It will make a bunch of edits, one after the other,
00:13:02to a bunch of files.
00:13:03How probable is it that after the first edit, out of 10 edits,
00:13:09so to speak, the code will compile?
00:13:12What happens if you modify your code line by line?
00:13:15How long does it take for it to stabilize again
00:13:17and it compiles cleanly?
00:13:19It doesn't.
00:13:20It won't compile after the first edit, probably not
00:13:22after the second edit, and so on and so forth.
00:13:24So if you then turn around and say, hey, dear LSP server,
00:13:28I just edited one line in this file.
00:13:30Is it broken?
00:13:31Then the LSP server will say, yes, it's really broken.
00:13:34And what this feature does is it then
00:13:36injects this error directly after the tool
00:13:39call as a kind of feedback to the model.
00:13:43Oh, what you just did is wrong.
00:13:45And the model is like, what the fuck, dude?
00:13:47I'm not done editing things.
00:13:49Why are you telling me this?
00:13:50Obviously, it's not wrong.
00:13:51But if you do this often enough, the model will just give up.
00:13:54And that leads to very bad outcomes.
00:13:58So I'm not a fan of LSP.
00:13:59I think it's a very terrible idea to have that enabled.
00:14:02There's natural synchronization points
00:14:03where you want to have linting and type checking
00:14:06and all of that.
00:14:07And that is when the agent thinks it's done, only then.
00:14:10This has changed recently.
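The natural-sync-point idea can be sketched like this (the names are illustrative, not any harness's real API): apply all edits silently, then run a single check once the agent declares it's done.

```typescript
// Sketch of deferring diagnostics to a natural sync point (hypothetical
// names, not OpenCode's or pi's API). Edits apply silently; the model
// only sees errors once, after it says it's done editing.
type Edit = { file: string; newText: string };

function applyEditsThenCheck(
  files: Map<string, string>,
  edits: Edit[],
  check: (files: Map<string, string>) => string[] // e.g. run `tsc --noEmit`
): string[] {
  for (const e of edits) {
    files.set(e.file, e.newText); // no per-edit LSP feedback here
  }
  return check(files); // single round of diagnostics for the model
}

// Toy check standing in for a compiler: every file must end with a newline.
const diags = applyEditsThenCheck(
  new Map([["a.ts", "old"]]),
  [
    { file: "a.ts", newText: "const a = 1;\n" },
    { file: "b.ts", newText: "const b = a;" }, // "broken" mid-stream is fine
  ],
  (files) =>
    [...files]
      .filter(([, text]) => !text.endsWith("\n"))
      .map(([file]) => `${file}: missing newline`)
);
console.log(diags); // ["b.ts: missing newline"] -- reported once, at the end
```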
00:14:14This is a single session of OpenCode, where every message
00:14:20becomes its own JSON file.
00:14:22Every single message becomes its own JSON file on disk.
00:14:26That indicates to me that there wasn't a lot of thought put
00:14:29into the architecture of the whole thing.
00:14:31And if I lose trust in that, I don't
00:14:33want to use that tool anymore.
00:14:35Again, I think the team is actually really good.
00:14:37I think they iterated super quickly
00:14:39and built something that's super useful to a lot of people,
00:14:42obviously.
00:14:43It's just, again, decisions that I wouldn't have made that
00:14:46made me decide to build my own.
00:14:50Then there was also this.
00:14:51Open code comes with a server by default.
00:14:54So the core architecture is based on a server.
00:14:56And clients connect to it.
00:14:57And the terminal user interface is one of the clients.
00:15:00There's also a desktop interface.
00:15:01And I don't know.
00:15:03That turned out to be a security vulnerability
00:15:05with remote code execution baked in by default.
00:15:09And that's also-- if you are so proud of your server
00:15:12infrastructure or server architecture,
00:15:15then I would assume you're grown-up engineers that
00:15:18thought about security as well.
00:15:20And apparently, that didn't happen.
00:15:21And this was open for a long time.
00:15:23And again, I'm not blaming anyone here.
00:15:25This is stuff that just happens if you're
00:15:27working in an industry that's operating at a breakneck speed
00:15:31that we haven't seen before.
00:15:33It's just I don't want to use that tool if that is a thing.
00:15:36So these were my observations with regards to existing coding
00:15:42harnesses.
00:15:42AMP and Droid would have been something I could have used.
00:15:45But again, no control.
00:15:47In case of AMP, they even decide what models you can use.
00:15:50And it's only a single model for a single type of task.
00:15:53And that's not me.
00:15:55In terms of Droid, I think it's a little bit more open.
00:15:58But at the time when I tried it out,
00:16:00it just didn't--
00:16:02I didn't see a big advantage over Claude Code.
00:16:07And then I looked into benchmarks for entirely different reasons
00:16:10and found terminal bench.
00:16:12Who knows what terminal bench is?
00:16:15OK, basically, it's a coding or an agent evaluation
00:16:20harness, which has a bunch of computer use and programming
00:16:24related--
00:16:24sorry, old and tired because 4-year-old.
00:16:31It has a bunch of computer use and coding related tasks
00:16:35that an agent or the LLM inside an agent harness
00:16:39needs to fulfill.
00:16:40I think it's about 82 or so.
00:16:43And they're very diverse.
00:16:44They're from fix my window setup to code me a Monte Carlo
00:16:48simulation or something like that.
00:16:51And they have a leaderboard.
00:16:52And on that leaderboard, you see the combination
00:16:54of coding agent harness and model.
00:16:57And they have their own coding agent called Terminus.
00:17:03And I think it's brilliant because it's
00:17:06one of the best performing harnesses in the benchmark.
00:17:09We're going to see it later on.
00:17:11What exactly does it do?
00:17:12Well, all the model gets is a tmux session.
00:17:17And all it can do is send keystrokes to it
00:17:19and read back the escape sequences that are emitted.
00:17:23So this is like the smallest, most minimal interface
00:17:27a model can have to your computer.
00:17:31And this performs top of the line of the entire leaderboard.
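That whole interface fits in a few lines. The tmux subcommands below (`send-keys`, `capture-pane`) are real; the session name and the idea of wiring the argv to a model are assumptions for illustration:

```typescript
// Terminus-style minimal interface: the model's only two actions are
// "type keystrokes into a tmux pane" and "read the pane's contents back".
// How you execute the argv (e.g. child_process.execFileSync) is up to
// the harness.
function sendKeys(session: string, keys: string): string[] {
  // trailing `Enter` tells tmux to press Return after the literal keys
  return ["tmux", "send-keys", "-t", session, keys, "Enter"];
}

function readPane(session: string): string[] {
  // `-p` prints the captured pane to stdout
  return ["tmux", "capture-pane", "-t", session, "-p"];
}

console.log(sendKeys("agent", "ls -la").join(" "));
// -> tmux send-keys -t agent ls -la Enter
```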
00:17:36So what does this tell us about existing coding agent harnesses?
00:17:39Do we need all these features for the models
00:17:41to actually perform?
00:17:43For me, personally, this is not just about the model actually
00:17:48being good.
00:17:49It's also about me as the user, the human,
00:17:51having a way to interact with my agent with the model.
00:17:54And Terminus is obviously not the user experience or developer
00:17:58experience that I want.
00:18:00But it tells us that all of these features that all of these
00:18:03coding harnesses have might not be necessary to get
00:18:08good results out of agents.
00:18:10So no file tools, no sub-agents, no web search, no nothing.
00:18:13Two theses is based on all of these findings.
00:18:16We are in the messing around and finding out stage.
00:18:18And nobody has any idea what the perfect coding agent should
00:18:21look like or what the perfect coding harness should look like.
00:18:23We're trying both minimalism and going full spaceship swarms
00:18:27and teams of agents and no control and full autonomy
00:18:30and whatever.
00:18:31I think that's not done yet.
00:18:33We haven't answered the question what this
00:18:35should look like ideally and what will become the industry
00:18:37standard.
00:18:38And the second thing is we need better ways
00:18:40to mess around with coding agents.
00:18:42That is, we need them to be able to self-modify themselves
00:18:47and become malleable.
00:18:48So we can quickly experiment with ideas
00:18:50and see if this is something we can make like an industry
00:18:53standard, a new workflow that we are probably all going to adopt.
00:18:58So the basic idea was--
00:18:59and it's very simple, not rocket science--
00:19:01strip away everything and build a minimal extensible core.
00:19:05There's some creature of comfort.
00:19:06It's not a plank slate.
00:19:09So that's pi.
00:19:10And the general motto is adapt your coding agent
00:19:13to your needs instead of the other way around.
00:19:16It comes with four packages, an AI package, which is basically
00:19:21just a simple abstraction over multiple providers, which
00:19:24all speak different transport protocols.
00:19:27So it's very easy to talk to all the providers
00:19:29and switch between them in the same context or same session.
00:19:34The agent core, which is just a generalized agent
00:19:36loop with tool invocations, verification,
00:19:38and so on and so forth.
00:19:39And a streaming terminal user interface
00:19:42that's like 600 lines of code and works really well,
00:19:47surprisingly, because it wasn't written by a clanker.
00:19:51And the coding agent itself, which is both an SDK
00:19:54that you can use in the headless mode
00:19:57or a full terminal user interface coding agent.
00:20:02This is the entire system prompt.
00:20:05There's nothing more there compared to other coding
00:20:08agents' system prompts.
00:20:10That's in tokens.
00:20:13It turns out frontier models are heavily RL-trained to know
00:20:16what the coding agent is.
00:20:18So why do you keep telling them that they're a coding agent
00:20:21and how they should do coding tasks, right?
00:20:27YOLO by default, why is that?
00:20:30Most coding agent harnesses at the moment have two modes.
00:20:33Either agent can do whatever it wants
00:20:36or agent gets to ask you, do you really
00:20:40want to delete this file?
00:20:41Do you really want to list the files in this directory,
00:20:44and so on and so forth?
00:20:44And there's different shades of gray here.
00:20:47But at the end of the day, it boils down to the user
00:20:49needs to approve an action by the agent.
00:20:52And then we are safe.
00:20:53And I think that's wrong because that leads to fatigue.
00:20:55And people will either turn it off entirely, YOLO mode,
00:20:58or just sit there and type enter without reading anything.
00:21:01So I don't think that's a solution.
00:21:02Containerization is also not a solution
00:21:04if you're worried about exfiltration of data
00:21:06and prompt injections.
00:21:07But I think that's the only thing that you--
00:21:10I think that's the best basis compared to guardrails
00:21:14like approval dialogues.
00:21:17It only has four tools: read a file, write a file,
00:21:19edit a file, and bash.
00:21:21Bash is all you need.
00:21:22What's not in there?
00:21:23No MCP, no subagents, no plan mode, no background
00:21:25bash, no built-in to-dos.
00:21:26Here's what you can do instead.
00:21:28For MCP, use CLI tools plus skills,
00:21:30or build an extension, which we will see in a bit.
00:21:34No subagents, why?
00:21:35Because they're not observable.
00:21:36Instead, use tmux and spawn the agent again.
00:21:41You have full control over the agent's outputs and inputs
00:21:44and can see everything that's happening in the subagent.
00:21:48Interestingly enough, Claude Code's
00:21:50team mode now does exactly this, basically, as well.
00:21:55No plan mode, write a plan.md file.
00:21:57You have a persisted artifact instead
00:21:59of some janky UI that doesn't really
00:22:02fit into your terminal viewport.
00:22:04And you can reuse it across multiple sessions.
00:22:07No background Bash, don't need it, we have tmux.
00:22:09It's the same thing.
00:22:11And no built-in to-dos, write a todo.md.
00:22:13Same thing.
00:22:14Or build all of this yourself the way you like it.
00:22:17And this is what Pi allows you to do, by being super extensible.
00:22:21So you can extend tools, custom.
00:22:22You can give the LLM tools that you define.
00:22:26I think no other coding agent harness
00:22:28currently offers that, unless you fork OpenCode.
00:22:31You don't need to here.
00:22:32You just write a simple TypeScript file,
00:22:34and it gets loaded automatically.
00:22:37You can also write custom UI.
00:22:39Skills are obviously in their prompt templates, themes.
00:22:43And you can bundle all of that up, put it on npm or Git,
00:22:46and install it with a single command, which is very nice.
00:22:49And everything hot reloads.
00:22:51So I developed my own extensions that
00:22:53are project- or task-specific in Pi inside the project.
00:22:59And as the agent modifies the extension, I just reload.
00:23:05And it immediately updates all of the running code,
00:23:10which is very nice.
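As an illustration only, a custom tool extension might look something like this. The `ToolDef`/`defineTool` names and shapes are invented for this sketch, not Pi's real extension API, so treat it as the flavor, not the contract:

```typescript
// Hypothetical extension sketch -- the `ToolDef`/`defineTool` names are
// invented for illustration and are NOT pi's real extension API.
type ToolDef = {
  name: string;
  description: string; // what the LLM sees when deciding to call it
  run: (args: Record<string, string>) => string;
};

function defineTool(def: ToolDef): ToolDef {
  return def; // a real loader would register this with the agent loop
}

// A tiny project-specific tool: count TODO markers in source text.
const todoCounter = defineTool({
  name: "count_todos",
  description: "Count TODO markers in the given source text",
  run: ({ source }) => String((source.match(/TODO/g) ?? []).length),
});

console.log(todoCounter.run({ source: "// TODO x\n// TODO y\n" }));
// -> "2"
```

The point is less the specific API and more that a tool is just a small file the agent itself can write and hot-reload.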
00:23:11And in practice, that means you can do custom compaction.
00:23:14I think that's one of the things that people should experiment
00:23:16more, because all of the compaction implementations
00:23:19currently are not good.
00:23:21Permission gates, you can easily implement them
00:23:23in 50 lines of code, and kind of cover
00:23:24what all the other agent harnesses do if you want that.
00:23:27Custom providers, register proxies of self-hosted models.
00:23:31Don't care.
00:23:32You don't need me to do this for you.
00:23:33You can do this, and actually, your clanker can do it for you.
00:23:37Or overwrite any built-in tool.
00:23:38Modify how read, write, edit, and bash work.
00:23:41Don't care.
00:23:42I have a version of read, write, edit, and bash
00:23:43that works through SSH on a remote machine.
00:23:47For me, that took five minutes to implement, but it works.
00:23:51And you have full TUI access, so you can actually
00:23:54write entirely custom UI in the coding agent.
00:23:58Claude Code shipped this, by the way, and it took five minutes for somebody
00:24:02to replicate that in Pi with more features.
00:24:05PiMessenger, I have no idea what it's doing,
00:24:07but apparently, it's like a chat room for multiple Pi agents
00:24:10that then communicate, which then has custom UI.
00:24:13We can look what they're doing, and yeah, it just works.
00:24:18Or PiMess, if you're bored, just play a game
00:24:23while the agent is running, right?
00:24:24You can do that.
00:24:25Or PiAnnotate, open up the website
00:24:28you're working on currently, and annotate stuff in the front end,
00:24:31and give feedback to the agent directly in line.
00:24:35Feed it back into the context, have it modify the thing.
00:24:39Or something I use is File Switch It.
00:24:42I don't want to switch over to an IDE or editor.
00:24:43I just want to quickly look at the file that's been modified.
00:24:46So all of this is extensions.
00:24:48None of this is built in, and it takes people
00:24:50usually a couple of minutes to an afternoon
00:24:52to build all of this the way they want it to.
00:24:56PiWavic, also, I don't know what it's doing.
00:25:00Pi also comes with a tree structure.
00:25:01I'm not going to explain that.
00:25:03Just look at pi.dev.
00:25:04Your session is a tree, not a linear list of chats.
00:25:07So you can basically tell the agent,
00:25:09read all the files in the directory,
00:25:11summarize this, go back to the root of the conversation,
00:25:14take the summary with me, and do the actual work.
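The branching idea can be sketched with a tiny tree (an illustrative shape, not Pi's actual session format): do the noisy exploration on a child branch, then continue from the root carrying only the summary.

```typescript
// Sketch: a session as a tree, not a flat list (not pi's real data model).
// Noisy exploration lives on a branch; only its summary returns to the root.
type SessionNode = { content: string; children: SessionNode[] };

function branch(parent: SessionNode, content: string): SessionNode {
  const child: SessionNode = { content, children: [] };
  parent.children.push(child);
  return child;
}

const root: SessionNode = { content: "task: refactor auth module", children: [] };

// Branch one: read ten files, burn lots of context...
const exploration = branch(root, "read src/*.ts (50k tokens of file dumps)");
const summary = "summary: auth logic lives in session.ts and token.ts";
branch(exploration, summary);

// ...then continue from the root with only the summary in context.
const working = branch(root, summary + " -- now do the actual work");
console.log(root.children.length); // 2 sibling branches; the file dumps
// never enter the working branch's context
```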
00:25:19Nothing is injected behind your back.
00:25:22Agents, skills, full cost tracking.
00:25:24A lot of harnesses don't do this.
00:25:26OpenCode doesn't do it well.
00:25:29HTML export, JSON format, headless JSON stream, blah, blah.
00:25:33Does it actually work?
00:25:34Well, terminal bench.
00:25:35Let me zoom in here.
00:25:36I can't.
00:25:37This is amazing.
00:25:38Here's Pi right behind Terminus 2 using Claude Opus 4.5.
00:25:45That was back in October, when Pi didn't even have compaction.
00:25:49Demo time-- skipping that. Right, against the clankers,
00:25:51because they're breaking open source.
00:25:54If you're associated with this guy's project,
00:25:56then you will have hundreds of people coming from OpenClaw
00:26:02to your repository and spam you with clanker-filled slop.
00:26:06So I had to invent a couple of measures.
00:26:09I invented OSS vacation.
00:26:11So I just closed issues and PRs for a couple of weeks
00:26:14and work on things on my own.
00:26:16Anything that's important will be reported later on anyways
00:26:20or in the Discord.
00:26:21And then I also implemented a custom access kind of scheme
00:26:26where I have a markdown file in the repository.
00:26:28If somebody opens a PR without their account name
00:26:32being in that markdown file, the PR gets auto-closed.
00:26:34I don't care.
00:26:35First, introduce yourself in a human voice via an issue.
00:26:39Write an issue that's not longer than the display,
00:26:42because everything else is probably clanker slop.
00:26:45And once you did that, I'm happy to "looks good to me" you.
00:26:47So you get into that file and can now submit PRs
00:26:50to the repository.
00:26:51All I'm asking is human verification.
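The scheme is easy to replicate. Here's a sketch of the core check; the markdown format (one `- @username` per line) and the wiring into CI on pull-request events are assumptions, not the talk's exact setup:

```typescript
// Sketch of the allowlist check behind auto-closing PRs. The markdown
// list format and how you hook this into CI are assumptions.
function isAllowed(allowlistMd: string, author: string): boolean {
  return allowlistMd
    .split("\n")
    // strip list markers ("- ", "* ") and a leading "@" from each line
    .map((line) => line.trim().replace(/^[-*]\s*/, "").replace(/^@/, ""))
    .some((name) => name.toLowerCase() === author.toLowerCase());
}

const allowlist = `# Humans I have LGTM'd
- @mitchellh
- @someperson
`;

console.log(isAllowed(allowlist, "mitchellh")); // true -- PR stays open
console.log(isAllowed(allowlist, "clankerbot9000")); // false -- auto-close
```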
00:26:53And Mitchell from Ghostty then took this and built
00:26:57a project called Vouch, which is more easily applicable
00:27:00to your own open source repositories.
00:27:02And that is Pi.
00:27:03Go forth and try it.
00:27:05That's it for me.
00:27:06[APPLAUSE]
00:27:07[MUSIC PLAYING]