00:00:00Opus 4.5 kind of blew the doors off of like,
00:00:03if you didn't see it coming, it's here, sorry.
00:00:06I think people are picking models
00:00:08'cause it's someone that they want to work with
00:00:10rather than I'm picking this model because benchmark score.
00:00:14The harness doesn't matter as much as it used to.
00:00:16You can kind of use whatever harness
00:00:18you want to talk to these things.
00:00:19Like the difference between asking a question through codex
00:00:23and asking open code a question through codex,
00:00:26really small difference.
00:00:28- Hey Ben, thank you so much for joining us.
00:00:31I think we're going to start with the simple question,
00:00:33which is who are you and what do you do?
00:00:36- So I'm a developer relations lead at Warp.
00:00:40So I do a whole combination of building a bunch of tools
00:00:44with AI to help out the team,
00:00:46working with everyone who's using Warp out in the community
00:00:50or out in industry and been working on sort of the future
00:00:55of how you hold agents both locally and on the cloud.
00:00:58I'm sure we'll get into a lot of that stuff today.
00:01:00- Sure, I think we will.
00:01:02But before we get into that,
00:01:03I've noticed you're very good at talking on camera,
00:01:07just explaining things really well
00:01:08and it's not a common thing that people have.
00:01:11So what led you into that direction
00:01:13and how did you develop that skill?
00:01:15- Yeah, I mean, I've been doing content
00:01:19about programming for a while,
00:01:22at least a while to me, as long as I've been like in industry.
00:01:27So I started back in pandemic times
00:01:31when everyone was bored at home looking for things to do.
00:01:35And I was working on a side project at the time
00:01:39that was stitching on a JavaScript bundler
00:01:42to a framework called Eleventy.
00:01:44This is way back in the day, still a great framework.
00:01:47And I wanted some way to communicate what I was building
00:01:53in a way that didn't feel dry.
00:01:56Though normally I would see people post like a link
00:01:59to their changelog or their readme on Twitter
00:02:01to post updates on what they're doing.
00:02:04And I wanted to do something
00:02:05that was more of a video format.
00:02:06I've been doing blogging up until that point.
00:02:08I just wanted to try something different.
00:02:09So I had a whiteboard in my closet and a rock band microphone.
00:02:13That was the only microphone I had, but it had USB.
00:02:16So I was like, okay, I can plug this in to my computer
00:02:19and it works as a microphone.
00:02:20So I propped up the whiteboard on a chair.
00:02:23I wrote up the changelog on the board
00:02:25instead of doing it as like a actual written post.
00:02:29And I just like walked through all the features
00:02:31and then I jumped to a demo and I posted it.
00:02:34And people really liked the format.
00:02:36I mean, the library did all right.
00:02:38I mean, it had some cool ideas in it.
00:02:40That's actually what led me to work on Astro,
00:02:41which is an open source framework
00:02:43that I maintained for a few years.
00:02:45But it kind of showed me, first off,
00:02:48like the developer community is very small
00:02:50and very welcoming to anyone who's working on cool stuff.
00:02:54And also there is room for like intermediate dev content
00:02:59that's very showy and YouTube focused.
00:03:03Like you're on camera and you're talking about things.
00:03:06So that eventually led to me doing a lot of content creation
00:03:11about just general web development concepts,
00:03:13like best practices using HTML,
00:03:15talking about all of the libraries
00:03:16that have been floating around,
00:03:17like Seltkit and Solid and Nuxt and HTMx.
00:03:21And I just did like two short videos a week
00:03:24for like many years or two years, I think,
00:03:28something like that,
00:03:29where I was like doing it really consistently.
00:03:32And yeah, it just builds up a muscle
00:03:34when you do that over and over again.
00:03:36And it also forces you to learn things
00:03:39if you do it in short form, one minute bursts,
00:03:41which is what I've been doing.
00:03:42I haven't really done as much long form.
00:03:44It's been a lot of like short form, one minute,
00:03:46problem solution, let's talk about how this thing works.
00:03:50So just through all of that,
00:03:51you learn skills of like self-editing,
00:03:53you learn what people actually care about
00:03:55and you kind of drill more into those topics.
00:03:58And right now I'm trying to sort of steer that
00:04:01from deeply technical,
00:04:03let's talk about how this library works
00:04:04to now you're using agents
00:04:06at this higher level of abstraction.
00:04:08Let's talk about some strategies there
00:04:10and also where things are heading tooling-wise in that space.
00:04:13- And I was gonna say,
00:04:15I think a lot of people will know you
00:04:16from your whiteboard shorts.
00:04:17I actually think that's the reason
00:04:19my personal website was written in Astro
00:04:21is I think that's how I first discovered--
00:04:22- There you go.
00:04:23- When you were at Astro.
00:04:24So yeah, I'm a big fan of Astro
00:04:26and what you did over there.
00:04:28- Thanks, yeah.
00:04:29I mean, I was honestly surprised
00:04:31'cause my content was so scattershot.
00:04:33Like I was just excited by all the frameworks
00:04:36that were going on and continue to.
00:04:38Like huge Svelte user, Vue's doing interesting stuff,
00:04:41Solid's doing interesting stuff.
00:04:43And since Astro supports every like rendering framework,
00:04:46it's easy to just talk about all of them and say,
00:04:48yeah, this is part of the job.
00:04:49We support everything, so let's talk about everything.
00:04:53But it's cool that the Astro message still kind of landed,
00:04:55even if I'm not explicitly talking about it all the time.
00:04:58I think that's something I've learned also,
00:04:59like we're gonna work right now
00:05:01is just be generally useful in the community
00:05:04and people will notice what you're doing.
00:05:07Like you don't have to be silly to talk about this stuff.
00:05:09In fact, you shouldn't be.
00:05:10You should just like pick up on, hey, this is useful.
00:05:13Let me try to explain it.
00:05:14So yeah, it's been going well.
00:05:19- I think one of the most popular videos
00:05:20you've done on YouTube is the one where you go through
00:05:23the basics of React Server Components,
00:05:25or you build it from scratch.
00:05:26What's your honest view on React Server Components?
00:05:30'Cause it hasn't, I mentioned this to Evan Yu
00:05:32when we interviewed him,
00:05:33but it hasn't been the slam dunk
00:05:35that people thought it was gonna be.
00:05:37And so, yeah, what do you think about it?
00:05:39- Yeah, I haven't opened that box in a little bit
00:05:41'cause I didn't really use, like I,
00:05:44again, it's another case of,
00:05:46there's clearly a lot of confusion in the community
00:05:48about what this thing even does under the hood.
00:05:50So let me just try to understand it.
00:05:53And as a nerd, I just went down the rabbit hole
00:05:55of reading source code and figuring it out.
00:05:57So yeah, that video you mentioned started
00:06:00as like a conference talk that I did at React Summit.
00:06:03This was back, I think two years ago.
00:06:06And then I adapted it to a YouTube video while it was fresh
00:06:09and I could just kind of rattle it off.
00:06:11And yeah, it got a lot of interest
00:06:13because people still didn't know how it worked.
00:06:15And it was kind of an intro to how it does things.
00:06:19And since then, I mean, it seems like Next.js
00:06:23is still like the most used framework.
00:06:25And if you ask an agent to build something for you,
00:06:28it's only gonna accelerate
00:06:30what is most popular in the training set.
00:06:32So that's created a flywheel for Next.js for apps
00:06:35to just kind of proliferate.
00:06:36I agree, like on the tech stack side,
00:06:40it's not like this universal solve.
00:06:42And also because it blends the lines
00:06:44between what is running on the server versus the client.
00:06:48As a human reading it, it's hard to tell
00:06:50unless you're looking at the directives
00:06:52at the top of the file or inside of the server functions.
00:06:55So you can forget where you are.
00:06:56And that was always a problem.
00:06:58Like I remember like isometric JavaScript
00:07:01being like a really big term
00:07:02where you wanna write JavaScript
00:07:03that could run on a server or on a client.
00:07:06And by authoring that way, you forget where it is running
00:07:11and what the implications of each space is.
00:07:13And there are very different implications
00:07:15of this runs on a serverless function
00:07:16that runs once versus this runs on the client
00:07:19and it's stateful as long as the person has a website open.
00:07:22So I do feel like, yeah,
00:07:24they could have drawn clear boundaries
00:07:26on where everything lives.
00:07:27I know the goal is to just make the super abstraction
00:07:30that does everything.
00:07:31And at least working on Astro's APIs,
00:07:34we kind of rejected that.
00:07:36And we said, no, there should be clear lines.
00:07:38Like we actually got to lead like the Astro actions project,
00:07:43which is kind of like server functions,
00:07:45but for Astro's case
00:07:47and also something that could work in any framework.
00:07:49So it was like, instead of tying this
00:07:51to like some React runtime,
00:07:52where you have JSX on the server, JSX on the client,
00:07:55we're gonna have this like actions file,
00:07:58explicitly named as a file on your file system.
00:08:00You can't just put them anywhere.
00:08:02You can export handlers for functions
00:08:05and then you get like a magic import that you can use
00:08:07to grab those as functions you can call on the client.
00:08:10And they're just async functions
00:08:11that you could call from view.
00:08:12You could call it from like a web component.
00:08:14You could call from anywhere.
00:08:16And there are resources out there
00:08:19on like how Astro actions work,
00:08:20if anyone's curious about that.
00:08:22But it was like studying what people liked
00:08:27and didn't in the ecosystem.
00:08:30It felt like, yeah, people want clearer boundaries.
00:08:32We want it to be easy to have TypeScript
00:08:36that works across the wire.
00:08:37'Cause that was kind of the appeal.
00:08:39But we didn't want it to feel like
00:08:40you're forgetting where you are when you're authoring code
00:08:43and leaving it up to the like user
00:08:47to organize all of their code either.
00:08:49We wanted like organization to be clear,
00:08:51boundaries to be clear.
00:08:52We still get the benefits of TypeScript
00:08:54and TypeSafe form data and all that other stuff
00:08:57that server actions was giving you.
00:08:59So it's just a philosophical thing, really.
00:09:02I think it's a super powerful framework.
00:09:04It's just for legibility and reviewability.
00:09:06I feel like drawing lines is a little bit smarter.
00:09:10- You know, I really liked the server island approach
00:09:12that Astro took, and I think they did a sort of great job
00:09:14on making that pretty clear.
00:09:16And yeah, specifically identifying like this component
00:09:19is on its own, it's part of the server.
00:09:21And the rest of the website is still sort of SSR
00:09:23and rendered.
00:09:24It was a, yeah, that's why I use Astro.
00:09:26As I said, I'm a big fan of what Astro did, so.
00:09:29- Yeah, that was something Matt Kane proposed
00:09:32as soon as he joined.
00:09:33He worked on Gatsby for a while.
00:09:34So he has a ton of perspective on static site generation.
00:09:37Yeah.
00:09:38- Yeah, that's funny 'cause my site went from Gatsby
00:09:40to Astro, so that explains why.
00:09:42- A lot of people did.
00:09:44- Yeah.
00:09:45- It was a target.
00:09:46It's what we wanted people to do,
00:09:49especially when Gatsby was like no longer maintained.
00:09:51Even LFI was supporting us in that, you know?
00:09:53Like, hey, let's get some Gatsby users onto Astro
00:09:56'cause it's awesome.
00:09:57But yeah, server islands are a really cool abstraction.
00:10:01And they're cool because it's really simple.
00:10:03Like I know in Next.js you have to do a lot of hoops
00:10:05to, if you wanted something to only render on the client,
00:10:10or you wanted certain things to be like statically rendered
00:10:14in certain parts of the page to be dynamic.
00:10:16Like the classic example is you have a blog post,
00:10:19but just the button in the light counter
00:10:20should actually be a server call.
00:10:22The rest of the page can just be rendered
00:10:23when you build the website.
00:10:25You wanted to have that kind of relationship in Next.js.
00:10:27You had to sort of invent magical runtimes that can do this
00:10:31that really depend on the host whether it'll work.
00:10:34And Astro was like, yeah, server islands are just like
00:10:37a fetch function, but we make the syntax a little nicer.
00:10:39That's literally all it is.
00:10:40So it's like this part of the page,
00:10:42instead of it being like rendered when you build the website,
00:10:46it's just going to be a fetch function under the hood.
00:10:48And it's going to fetch the like button count.
00:10:51And then it's going to render this HTML
00:10:53and put it in there as soon as the website's loaded.
00:10:55Like that's all it does.
00:10:56There's really nothing fancy about it.
00:10:58If you use something like HTMX, it's that,
00:11:00but like a baby version that's admittedly less capable,
00:11:03but in my eyes, easier to understand.
00:11:05And that really caught on because it's just,
00:11:09it's so easy to deploy it anywhere.
00:11:11You don't have to think about it and it's just HTML.
00:11:14So you're not thinking about what bundler am I using?
00:11:17It just kind of works.
00:11:19So I just like how we found simple solutions like that
00:11:22just to make it easier to build stuff.
00:11:24- Yeah, but now that agents are writing most of the code,
00:11:26how much of that really matters?
00:11:28- Yeah, man, I know.
00:11:31So last year was an existential crisis,
00:11:33I think for all of us.
00:11:35Some people are having that existential crisis now.
00:11:38I empathize.
00:11:39Opus 4.5 kind of blew the doors off of like,
00:11:42if you didn't see it coming, it's here.
00:11:45Sorry.
00:11:47I knew what was going on pretty early.
00:11:49As soon as I was trying cursor tab completions
00:11:52and it was doing more and more stuff for me,
00:11:54I was like, there's some inevitability here
00:11:56of where we're heading.
00:11:57It's no longer the copilot tab completions.
00:11:59This is getting serious.
00:12:01So like, yeah, I mean, I joined Warp last year,
00:12:05which is the terminal with,
00:12:08well, it's a really nice to use terminal
00:12:10that has agents built in and also cloud orchestration.
00:12:13There's many chapters to the Warp journey.
00:12:15We help you all the way up the stack,
00:12:17kind of like how Astro helped you
00:12:18all the way from static to server.
00:12:19Warp is kind of like extending up in that way.
00:12:22But I joined Warp as like early as when Sonnet 4,
00:12:26no, Sonnet 3.5 was out and it was barely capable.
00:12:30They gave me one tech demo internally in the interview
00:12:34and the agent just kind of ran in a circle and crashed.
00:12:37And that was a demo.
00:12:38And I was like, okay, we're starting.
00:12:41We're starting with something here.
00:12:43But I could see like the tool calls
00:12:44where it was actually writing files on a system.
00:12:46It was like, oh, that's different.
00:12:47So we're not like opening the file anymore.
00:12:51It's opening the file and doing stuff.
00:12:53And then I review it on the back half.
00:12:55And it wasn't really capable at the time,
00:12:57but then it became very capable later.
00:13:00And all of that definitely had me thinking
00:13:03like how valuable is API design in this new world?
00:13:06Like I spent three years designing APIs at Astro
00:13:10as did the rest of the core team and continues to.
00:13:13But if agents can hold all of these things
00:13:17and understand them, how valuable is the API design?
00:13:20I feel like it's still really valuable for like,
00:13:25can the agent pick up on patterns quickly
00:13:28or is it going to waste a lot of compute
00:13:29trying to look up documentation and running around in circles?
00:13:32I feel like as compute gets cheaper and cheaper,
00:13:34that will be less and less of a problem.
00:13:36I'm not going to pretend like, yeah,
00:13:40perfect API design will always matter, always and forever.
00:13:43Like no, the cost of it will get lower and lower and lower
00:13:46until like you're getting microsecond improvements
00:13:49by improving the API.
00:13:50Right now we're at like,
00:13:52you can cut down a two hour agent job to like 20 minutes
00:13:56or 10 minutes if the API is designed well,
00:13:58which means it's still valuable.
00:14:00And you actually need to think about this stuff.
00:14:01At some point it may change to like,
00:14:03it took a hundred milliseconds,
00:14:05now it takes 20 milliseconds if models get that fast.
00:14:08But at least in this like window we're looking at
00:14:11for next year or two, like good API design still matters.
00:14:14So I do feel like, yeah, the frameworks
00:14:18that we're building these agents on top of,
00:14:21it'll be diminishing returns maybe,
00:14:23but I do think it matters
00:14:26if the agent's able to hold well-made tools
00:14:28versus poorly made tools to get something done.
00:14:31- Yeah, I think it makes sense.
00:14:32If the API is well written enough
00:14:34for the agents to understand,
00:14:35they can navigate through it quickly,
00:14:37make changes quickly, understand it quickly,
00:14:38and therefore be better at helping you build it.
00:14:42- Usually, yeah.
00:14:44- Because any tool essentially that was easier for a human
00:14:46is going to be easier for an agent as well.
00:14:47And I think we'll see.
00:14:49Humans still being in the review process a bit at the moment,
00:14:51it's nice when it's well-written code
00:14:53that you can understand at a quick glance.
00:14:55So tools like that definitely help.
00:14:57- Yeah, totally.
00:14:58I mean, it feels like every tip I see on good prompting
00:15:02is just tips on software engineering.
00:15:03Like it's not even that different.
00:15:05I think there was one recently about like,
00:15:06stop having a big cloud MD that describes your code base,
00:15:09organize your code better.
00:15:10Yeah, that's how we wrote readmes, right?
00:15:14Like it's bad to have a thousand line readme.
00:15:16You should have like a 20 line one
00:15:18where the code mostly explains itself
00:15:20and you document at the touch points
00:15:21and split modules at correct boundaries.
00:15:24Like nothing's changed, really.
00:15:26Like if you have more of a rat's nest,
00:15:28it'll be harder for it to navigate.
00:15:30And right now these agents are still like pretty slow
00:15:33at getting everything done and very compute intensive.
00:15:35So it does matter if you've cleaned up your house
00:15:38before you have visitors,
00:15:39if that makes sense as a metaphor, I don't know.
00:15:41- Yeah, yeah, it makes sense.
00:15:43I was gonna say going back to warp.
00:15:45So I was a heavy warp user for years
00:15:47and I think I was always really impressed
00:15:50at how warp looked and felt.
00:15:52It was a terminal that felt like an IDE
00:15:55'cause you could see tool tips
00:15:57and the menu was easy to navigate through.
00:15:59You didn't have to do a lot to configure it.
00:16:02And I've certainly moved away from warp for various reasons,
00:16:05but I think that the premise of it being a terminal
00:16:09that you can use AI with to do your code
00:16:13and even view the code inside the terminal
00:16:15is really impressive.
00:16:16And I haven't messed up with OZ yet,
00:16:18but I like the direction you guys are going in.
00:16:21Can you tell us a bit about that?
00:16:23- Yeah, and I will picture bringing on why you're doing things
00:16:27that are not working. - Sure, no, happy.
00:16:28- I do wanna know. - Happy to talk about it.
00:16:29- It's everything is a user call.
00:16:32But yeah, like general high level stuff.
00:16:36Yeah, you touched on warp sort of helping you
00:16:39with everything that's around the agent harness.
00:16:42So like you can open a terminal and use cloud code
00:16:44and it works, it can edit code.
00:16:46You can look at the output,
00:16:47you can ask it to run the dev server, all that stuff.
00:16:50But if you wanted to review the code
00:16:52that Claude wrote, for instance,
00:16:53well, you might need to open git desktop or laser git,
00:16:56or even the cloud desktop,
00:16:58like any tool that you would wanna use.
00:17:01And if you wanted to add context on like,
00:17:03here's the file or the directory that I need to put in,
00:17:06you could run some terminal commands
00:17:07to like list out all the stuff inside of the project,
00:17:11find the file name with like the @mention
00:17:14where you say like @file and then you put it in.
00:17:17You can do all of that stuff,
00:17:18but it feels like it's just kind of part of the story.
00:17:20Like cloud code is an access point
00:17:22that you can use to talk to an agent,
00:17:24but there's still all this stuff around it.
00:17:25Like the code review process, the context gathering process,
00:17:30editing markdown is weirdly very important now.
00:17:32So like editing skill files, opening your agents MD,
00:17:35that stuff kind of matters too.
00:17:37And warp is just kind of like, can the app do all of that
00:17:40instead of you jumping around a bunch of different tools
00:17:42or installing a bunch of CLI equivalents,
00:17:45like TUI applications.
00:17:46So that's why we started doing things
00:17:50that aren't really what terminals are supposed to do.
00:17:54Like putting a file explorer on the left
00:17:56and putting a code diff view on the right.
00:17:59Feels a lot like VS code if you have everything open in warp,
00:18:01but it's all progressive disclosure.
00:18:03Like you can hide it and just use a terminal if you want.
00:18:06But most of my day I have like the file tree pulled up
00:18:09and I have the code diff view that I sort of expand out
00:18:12whenever an agent's done.
00:18:13We let you like edit the code diff as well.
00:18:16We added LSP support.
00:18:17So it has like hover hints and stuff like a real editor.
00:18:21We went hard, but you can like, even at the simple level,
00:18:25I'm just, I want to review the code the agent wrote.
00:18:27You can pop open a diff view inside of your terminal.
00:18:29You can leave comments.
00:18:30So you can like hit a little button
00:18:32to leave a comment on a line and say,
00:18:33this doesn't make sense to me.
00:18:34Can you explain this?
00:18:35Send it to the agent and then it'll just kind of pick up
00:18:37on that comment for you.
00:18:39So it's easier to like have a iteration loop with an agent
00:18:42when you tie sort of the environment to the agent itself.
00:18:45And as I said, like all this stuff
00:18:47is inside of a terminal still.
00:18:49So what we're playing with is like, yeah,
00:18:52we have this diff view, this file view,
00:18:53but you don't have to use the warp agent
00:18:56to use all of that stuff.
00:18:57You can, and the warp agent does have like a really nice
00:19:00GUI around it to easily like look at get diffs
00:19:03and things like that.
00:19:04But if you want to use Claude code
00:19:07and then you wanted a nice diff view and a file tree
00:19:09to drag in context, you can do that in warp really easily.
00:19:12And we have like a tool belt that lets you open
00:19:14all those menus, a way to enter voice mode,
00:19:17an image uploader.
00:19:19We're also playing with that code comments feature
00:19:21I just talked about where you can like leave comments
00:19:22in a diff view.
00:19:23We want that to forward to the Claude code CLI
00:19:25or the codecs CLI as well.
00:19:27So we're experimenting with that.
00:19:28So unlike stuff like cursor where it's all owned by cursor,
00:19:34like everything end-to-end is owned by that tool.
00:19:37In warp, we really just own the stuff around the harness.
00:19:40So if you wanted to run any agent like PI or codecs
00:19:44or any of those, you can, but we still have the diff view
00:19:47and the file view and all that stuff to help you like work
00:19:49with the agent.
00:19:51So it's a really unique spot of being like a terminal plus plus
00:19:54that you can run all these agents inside of it.
00:19:56Then like you get some nice helpers on top
00:19:59without you having to install or configure all that yourself.
00:20:02You know, I'll be honest, I was similar to Richard
00:20:04and I used to use warp.
00:20:06I don't think there was any reason I left more than
00:20:08it was just the time where there was so many tools coming out
00:20:10that you sort of just hop between loads of them.
00:20:13And I think obviously one of my original problems
00:20:15was that I still liked VS code when it had like tab complete
00:20:18and all of that at the time.
00:20:21But I'm finding less and less that I am using an IDE now
00:20:23like VS code and cursor.
00:20:25So I definitely need to check out warp again
00:20:27'cause it sounds like you've added a lot to it
00:20:29that helps the modern day sort of development flow.
00:20:33- I'm gonna say I'm an avid warp user and I love using it.
00:20:37Yeah, and my question is with everything
00:20:45that's going on right now and everyone's coming up
00:20:48with their CLI tools and everything,
00:20:50do you think like two ways are the way to go
00:20:53the way of the future and IDEs will disappear
00:20:56and two ways will just take over the entire industry?
00:21:00- I mean, no, I mean, they're fun, they're fun to use.
00:21:05I do think like for a near term solution of like,
00:21:09oh, we were given these coding agents and they live in a CLI.
00:21:12What's the easiest way to build tooling around it?
00:21:15The more CLIs, let's just wrap it with Tmux.
00:21:19Let's wrap it with, I saw Cmux come out recently,
00:21:23which is like a ghosty extension that gives you like
00:21:26vertical tabs and stuff.
00:21:28This is getting us thinking about like,
00:21:29should warp have vertical tabs and all that.
00:21:31I hope we do.
00:21:32So people are doing crazy stuff.
00:21:35And two ways are kind of the quickest interface
00:21:37to just do that without leaving the terminal
00:21:39where you're already living.
00:21:41Warp is the harder path of like,
00:21:43let's actually build a GUI around this.
00:21:45So you need to go deeper into the terminal itself
00:21:48to add all of these tools.
00:21:50So that's the path we're taking.
00:21:52And I think we progressed from bash prompts in the 80s
00:21:57to using clickable interfaces shortly after.
00:22:00I don't really see that being any different now,
00:22:03like the appeal of being able to click and your cursor moves.
00:22:06Yeah, it's pretty useful to be able to move your cursor
00:22:10like that instead of using your keyboard to do it.
00:22:13Of course, keyboard warriors will disagree.
00:22:14But yeah, I think it's intuitive to click around
00:22:17and have like expandable menus and stuff like that.
00:22:21So it's smarter to just use like a GUI
00:22:22instead of using a two ways,
00:22:24like a rendering engine of its own.
00:22:26But I do think that somewhat I learned Vim
00:22:32right at the tail end of like when that was worth learning.
00:22:37I think it might be worth learning today.
00:22:39I learned it a few years ago.
00:22:41And then I got more used to navigating around
00:22:43with the keyboard.
00:22:44I like shortcuts and all that,
00:22:46but like scrolling around a diff view and leaving a comment,
00:22:49I don't want to use a bunch of like one letter shortcuts
00:22:53to do all of that.
00:22:54I would much rather scroll through the diff,
00:22:56click on the line, leave a comment.
00:22:58Like it just kind of makes sense.
00:22:59And I know two ways can be interactive.
00:23:01Like I have seen once they use interactive modes
00:23:04that you can click on things,
00:23:06but you can still deal with like rendering flashes
00:23:08when it's like, 'cause it's already like re-rendering
00:23:10the whole page over and over and over again.
00:23:12That's how two ways work.
00:23:14So there's a fundamental limit to that.
00:23:17There's also a limit to what you can display.
00:23:18Like you can do grading animations and stuff like that.
00:23:21But again, it's not like the best rendering engine
00:23:23to do that sort of thing.
00:23:25It's better if you can just like actually use
00:23:27a rendered native GUI to do that.
00:23:30So I feel like the answer is of course,
00:23:33we're going to go to a GUI.
00:23:35Is it going to be an IDE though?
00:23:37Probably not.
00:23:38We're probably not going to have it with like
00:23:39the full code editor where you have to wait for it to index
00:23:43before you can really use the tool.
00:23:45Like all of that waiting doesn't make much sense
00:23:47'cause I just want to talk to an agent right away.
00:23:49That's like the main interface and then everything else,
00:23:53like a diff view or a file view or supplementary.
00:23:55So to be like agent first, all of that stuff is debugging.
00:23:59And I feel like that's what Warp's doing.
00:24:01Like we're literally walking in that direction
00:24:03of like the agent's the first thing you see,
00:24:05but the file editor and the diff view are like
00:24:07the second thing that you open up
00:24:09after you're debugging what it's doing.
00:24:12- Yeah, I must admit, I'm a GUI type of person.
00:24:14So I'm a big fan of, yeah, GUIs.
00:24:16I love the sort of codecs app recently
00:24:18that OpenAI came out with.
00:24:20I think now that we're getting into multi-agent,
00:24:23it just makes more sense to me to be a GUI
00:24:25that I can click about in.
00:24:26I've never sort of enjoyed the terminal
00:24:28for more than one or two agents.
00:24:30I know I've never been a keyboard warrior, I must admit.
00:24:33So yeah, I do think we'll go back to GUIs at some point,
00:24:36but yeah, as you said, it probably won't be an IDE.
00:24:39- Yeah.
00:24:40How are you liking the codecs app by the way?
00:24:42Are you using it in like your workflow or just experimenting?
00:24:45- I enjoy it a lot for sort of vibe code experiments
00:24:48at the moment.
00:24:48I must admit, I haven't sort of used it too heavily,
00:24:50but it's just been very nice
00:24:52to have sort of multiple agents open at the same time
00:24:54and seeing them all in the sidebar,
00:24:55what they're doing and sort of clicking around.
00:24:58And the codecs model has been sort of very good.
00:25:01It's sort of my favorite coding one at the moment.
00:25:03It just sort of understands what I mean.
00:25:06And it's sort of hard to describe
00:25:08how it's better than Opus 4.6 sometimes,
00:25:10because obviously everyone has their own favorite model,
00:25:13but it's just sort of the way it feels.
00:25:14And I think it's just, as I said,
00:25:15everything's well integrated in the app
00:25:17and I'm having to check code less and less,
00:25:20which is worrying.
00:25:21Obviously, some of the apps I develop,
00:25:22I don't actually need to check the code
00:25:24'cause they're quick demos.
00:25:25So it's, yeah, I don't need to worry
00:25:27about security or anything.
00:25:29I do still think humans need to check code
00:25:31on production apps.
00:25:33- Yeah, we definitely check our code at work as well.
00:25:36Although we did set up an agent to check our code too.
00:25:39And I think that's a pretty common pattern at this point
00:25:41of like have an agent write the code,
00:25:44have a different agent review all the code,
00:25:46either as a GitHub action,
00:25:48which you can set up with like the Oz system
00:25:50that we've been building,
00:25:52or just having the agent review its work locally.
00:25:55Like both of those work pretty well.
00:25:57I've actually done that,
00:25:58where I just start a new conversation in the same directory
00:26:01and I have a saved prompt.
00:26:02I could probably make it a skill at this point.
00:26:04It's like just review the code the other agent wrote
00:26:07and make sure it's PR ready,
00:26:09simplify where it makes sense, yada, dada.
00:26:11It knows how to do a code review.
00:26:13- Actually, I'm curious about this.
00:26:16So do you use the same models for reviewing and writing,
00:26:20or do you find that there's a model
00:26:22that's better at reviewing code versus writing it?
00:26:25- Yeah, so I use the same model.
00:26:28We ask this all the time.
00:26:30And everyone thinks they've cracked the code
00:26:32in our like user groups.
00:26:34And someone's like, I only use Cloud Opus for plans.
00:26:38And then I switched to Codex to execute.
00:26:40The next person says, I only use Codex to write a plan.
00:26:43Well, I would use Opus.
00:26:44And then I use Opus to execute on the plan,
00:26:46'cause that's the best model for execution.
00:26:48It's like, it really just depends on like the type of code
00:26:52that you like reviewing, I guess.
00:26:54'Cause they do write different flavors of code, I've noticed.
00:26:57Like there are differences.
00:26:59- Yeah, but the reason why I'm asking is,
00:27:01like if you use the same model to review the code
00:27:04that the same model wrote,
00:27:06there's kind of like a bias, you know, in the code quality.
00:27:09- Maybe, maybe, I thought like,
00:27:13as long as you start a new conversation,
00:27:15you don't have the bias of past context, which is good.
00:27:19It's kind of like two people who work the same way
00:27:21reviewing each other's work,
00:27:22versus I guess people who work different ways.
00:27:24So it could make sense to like switch your model
00:27:26before you review something.
00:27:28Is that what you do?
00:27:30- Yeah, that's, well, I haven't like established
00:27:32like a workflow for it, but that's what I've tried.
00:27:35And it's an interesting experiment to see, you know,
00:27:37how different train models communicate with each other.
00:27:41- I was gonna say, I've always sort of thought,
00:27:43it's a bad UX though, to rely on the user
00:27:45to know what model is best at everything.
00:27:47And I don't know if this might be a hot take,
00:27:49but I think obviously we're very early days at the moment.
00:27:51So I think it's fine.
00:27:53But I do see why tools like cursor and that lot
00:27:56have the auto mode, because I think to sort of more,
00:28:00less individuals who are online all the time
00:28:02reading about new model updates and everything,
00:28:04like they don't want to be thinking about,
00:28:05oh, I need to use Opus to plan,
00:28:07or I need to go to Codex for the code
00:28:09'cause it's better at that.
00:28:10They just want one thing that does everything for them.
00:28:12So yeah, that was just sort of my random rant.
00:28:14I think it's that UX might go in the future, essentially.
00:28:18- Yeah, it might.
00:28:20And we have an auto mode as well for that sort of reason.
00:28:22People who don't want to pick anything.
00:28:25We do break down the auto model based on what you value.
00:28:28So we have cost efficient, responsive, and genius.
00:28:32So like cost effective is what you expect.
00:28:34It might take longer.
00:28:35Usually it routes to like, I think it's an earlier GPT model
00:28:39that's like safer on tokens,
00:28:41but may take longer to complete.
00:28:42Then it's responsive in the middle and genius,
00:28:44which I believe routes to either Opus or Codex 5.3.
00:28:47It always depends which one's actually going to be
00:28:49like better output.
00:28:50But yeah, I think that makes sense for people
00:28:53who don't want to think about the choice.
00:28:57I mean, all of us here are terminally online.
00:29:00So we're all interested in a model picker
00:29:04that can let you try these things out.
00:29:06It's another reason I really value
00:29:08like general purpose harnesses,
00:29:11which are popping up and becoming more important.
00:29:14Kirk's is one of them, Warp's one of them.
00:29:16Also tools like Copilot and Pi are examples as well.
00:29:20Open code, of course.
00:29:21Because it's useful to be able to try all these things.
00:29:26It's useful to experiment with like who's better
00:29:28at which task for our team.
00:29:30Because again, all these models have like different flavors,
00:29:33but they can still get things done.
00:29:35Like the way I describe it is,
00:29:36Codex is kind of like German engineering.
00:29:40Like it gets everything detail-oriented and exactly right.
00:29:45But as soon as I ask it for like function names
00:29:46and code comments, they're super mechanical
00:29:49and not how I would do things.
00:29:51It also takes longer 'cause it researches forever.
00:29:53And Opus is kind of like the, I don't know,
00:29:56the grad student at Georgia Tech that's up at 2 a.m.,
00:29:59but they're getting things done.
00:30:00They're moving fast, their code's super readable
00:30:03because they actually kind of talk more like a human.
00:30:06That's totally like, people have different opinions on it.
00:30:09But I do think because of how much that's resonated
00:30:12and how many times I've seen that sort of take,
00:30:14I think people are picking models
00:30:16'cause it's someone that they want to work with
00:30:18to anthropomorphize it rather than I'm picking this model
00:30:22because benchmark scores.
00:30:24I think we're kind of past that point.
00:30:25I don't think people are picking models
00:30:27because of benchmarks alone anymore.
00:30:30- You know, I definitely agree with that
00:30:31'cause we cover a lot of model releases on our channel
00:30:33and sort of at a certain point like six months ago,
00:30:36I just flashed the benchmarks up now and I move on
00:30:39because it's like, I don't think you care
00:30:40that it's 1% better than the last model.
00:30:43It's like, have they added any new features?
00:30:45Is it quicker maybe?
00:30:46'Cause I think that's still something people care about
00:30:48is sort of speeding these models up a bit.
00:30:50But yeah, I think benchmarks are a little less obvious,
00:30:54especially now that they clear
00:30:56so many of the easy benchmarks.
00:30:57It's only really difficult benchmarks that seem to matter now
00:31:01and it's sort of hard to explain the benefits
00:31:03of one model over another to people
00:31:05without them just using it for a week
00:31:06and then using another one.
00:31:07It's very hard to pick a favorite model.
00:31:09And as you said, you see so many opinions on Twitter
00:31:11of which one is best for what workflow.
00:31:15So I'm curious how sort of Warp chooses
00:31:17what is best in the auto mode.
00:31:19When a new model comes out,
00:31:20do you have your own sort of suite of benchmarks
00:31:22you run internally to decide
00:31:23if it should be upgraded and things?
00:31:25- Yeah, we do.
00:31:27We have a eval suite.
00:31:28We have a set of benchmarks
00:31:30that are more like industry standard ones
00:31:31like SWE bench pro just to validate.
00:31:35We also do have some auto routing in there.
00:31:37So if you ask a very simple question in your terminal,
00:31:39like can you handle this Git rebase for me,
00:31:44which actually might be kind of complicated.
00:31:46If it's a simpler one, like revert this commit,
00:31:48like I forgot the command.
00:31:48Like people use that in Warp all the time
00:31:50just 'cause you can ask in plain English,
00:31:52revert this commit and then it runs some commands.
00:31:55For that, we route to like a simpler model,
00:31:57either HiTu or Sonnet, I believe.
00:31:59And then if it has planning mode, for example,
00:32:02like if you requested a plan,
00:32:04that should probably go to a smarter model that reasons.
00:32:06If it's like a longer horizon coding task,
00:32:09it'll go to a reasoning model as well.
00:32:12But choosing which reasoning model,
00:32:14yeah, it's tough 'cause it was easy to answer that
00:32:17up until very recently when Codex and Opus
00:32:20became very comparable to each other.
00:32:23So at this point, I do think it's gonna be a combination of,
00:32:27yeah, the benchmarking, but also user feedback.
00:32:29'Cause if you switch out one for another,
00:32:31users will say like, this feels different.
00:32:34It doesn't talk to me the same way.
00:32:35What'd you do?
00:32:36What's in the sauce?
00:32:37This is different.
00:32:38So I think that kind of forces tools
00:32:41to like be a little more consistent.
00:32:43And I have wondered how tools like AMP navigate this
00:32:46'cause I've seen them switch between Gemini
00:32:49and Opus and Codex
00:32:53throughout different eras of their coding harness.
00:32:57And I'm curious how people feel about that
00:32:58'cause it does feel like you're talking
00:32:59to a different person when you make those kinds of switches.
00:33:02For us, we've just kind of kept it consistent
00:33:04because same benchmarks, but keeps the field consistent.
00:33:07Let's use Opus.
00:33:09I believe that's what we've been doing,
00:33:10but there will come a fork where we have to decide like,
00:33:13is the 5% benchmark improvement worth it
00:33:15to have a different voice?
00:33:17I don't know.
00:33:18- I was gonna ask if you can use open source models
00:33:20with warp or is that something you can do or not?
00:33:23- You can't use like your own models.
00:33:26We have bring your own key.
00:33:28If you wanna use like a Tropic, Gemini, OpenAI,
00:33:31Benrock, I believe.
00:33:32We do have GLM.
00:33:34That's the extent of it,
00:33:35but we haven't opened it up to like general
00:33:37or local model support.
00:33:39Not that it's not tracked.
00:33:40I know it's like the top voted feature
00:33:41to have like local model support.
00:33:43So it is definitely on the roadmap.
00:33:46We have a quality team that maintains
00:33:49both the benchmarks I was talking about
00:33:50and also new model releases.
00:33:52So yeah, it is heard that we wanna get that in there.
00:33:56- Do you have an integration with OpenRouter?
00:33:58- We do not.
00:34:00What would that look like?
00:34:01What would you wanna see there?
00:34:03- 'Cause like with OpenRouter,
00:34:06you can choose whatever model you like
00:34:08and the library is just huge.
00:34:12So that would be, I don't know.
00:34:13If you could get that feature in,
00:34:16that would be amazing.
00:34:18- I suppose a problem becomes models are so different
00:34:21at making tool calls.
00:34:22And if they actually work with tool calls
00:34:24that you'd have to nearly verify every single model,
00:34:26which is obviously an impossible task.
00:34:28'Cause I know you will see,
00:34:29people say Gemini is pretty bad
00:34:31at following tool calling rules and things.
00:34:34- You could whitelist some of the models, you know,
00:34:36don't have to use all of them.
00:34:38- Yeah, whitelist, but allow you to bring that key
00:34:41just so you have that flexibility.
00:34:43That totally makes sense.
00:34:44Yeah.
00:34:46- But on what you were saying, yeah.
00:34:48I was just gonna say like codecs, for example,
00:34:51took a while to actually get in there.
00:34:53We were a full like three weeks late, which is an eternity.
00:34:57People are just shouting at the door, where's codecs?
00:34:59And it's because codecs is really specific
00:35:02about the tool calls that it wants.
00:35:04I think 5.3 improved that a bit,
00:35:05but we just plopped codecs in our harness
00:35:08and it did not perform well.
00:35:10It just felt like it wasn't updating us.
00:35:12It searched the web for like five minutes
00:35:15when it definitely shouldn't.
00:35:16So we had to figure out how do you tune this harness
00:35:18to actually get the performance
00:35:19that the codecs team is getting out of their CLI.
00:35:21So we did it, we put in the work
00:35:22and we made it like work in our harness,
00:35:24but it wasn't as simple as we just added the model
00:35:27to the list.
00:35:28That is sometimes is that simple,
00:35:30but for certain flavors like codecs or Gemini, it's not.
00:35:35- When you mentioned about warp, that's why I've left warp,
00:35:38I think everyone's kind of answered
00:35:40or alluded to that slightly.
00:35:41But there was a time when I used open code in warp
00:35:45and I was a fan of open codes.
00:35:47And Kimi came out.
00:35:48I was like, oh, Kimi looks like a really cool model.
00:35:51And I tried it in open code and I was thinking,
00:35:53well, why am I using warp to open open code?
00:35:56Because I was using warp with a subscription
00:35:59and I was using codecs, not codecs,
00:36:01Sonnet and the Claude models with warp.
00:36:04But then when I started to use other models,
00:36:06so like Kimi and like Quen, GLM,
00:36:10they didn't support warp.
00:36:11And so I thought, well, I might as well use
00:36:13a regular terminal if I'm going to be using those models,
00:36:16'cause it's easier for me just to use that
00:36:18than to use open code inside warp.
00:36:22So yeah, I don't know if that makes any sense.
00:36:24- Yeah, that makes sense.
00:36:25Like you didn't want to reach for the warp harness anymore.
00:36:27You wanted to use open code.
00:36:28So you had that access and you could use whatever,
00:36:32like Kimi 2.5, I know that one's like a really cool model
00:36:35that they jumped on.
00:36:36And yeah, you were mentioning like pricing,
00:36:40if you're paying for both, that makes a ton of sense.
00:36:42And we hear that feedback.
00:36:44I do know, like, I mean, you can just, you know,
00:36:48use the warp free version, like the app just kind of works.
00:36:50So if you want to keep using it to run open code,
00:36:53you can just do that.
00:36:54And you still get like the voice mode
00:36:55and the file diff and all that stuff.
00:36:57But if none of it's useful, like you try it on,
00:36:59you're like, I don't reach for this,
00:37:01then it makes total sense.
00:37:03So yeah, I get that.
00:37:05- I think one of the biggest complaints
00:37:07that I've heard you guys get is,
00:37:09why have a terminal with a login page?
00:37:11So I think that's the biggest one I hear.
00:37:13- You don't need to log in.
00:37:15You can use warp without a login,
00:37:17but it's that first impression that sticks, man.
00:37:20Like people are like, when are you gonna get Windows support?
00:37:22And like two years ago, we've had Windows support,
00:37:27but it burst on the scene is like the Mac terminal
00:37:30you log into, so I don't know.
00:37:31But I get it, yeah.
00:37:34Like why log in if you're not gonna use like the warp agent
00:37:37and all that stuff?
00:37:38We do have other things like file storage,
00:37:42if you want to like store commands, store planning documents,
00:37:46I do that just so I don't have to commit the code.
00:37:48You can like put a planning document
00:37:50in what's called the warp drive to save it.
00:37:53These are small things.
00:37:54It really depends on what you want to do.
00:37:57And maybe warp with no login, maybe, I don't know.
00:38:00- I suppose it's sort of to get people to stop thinking
00:38:03that warp is just trying to compete with like ghosty.
00:38:06It's got a load of other things in it.
00:38:08And yeah, it's sort of a new environment
00:38:10for the agentic world of development.
00:38:13But yeah, I see that people probably got stuck on that opinion
00:38:15when warp first came out
00:38:16when it was just sort of AI in the terminal,
00:38:18but it's trying to compete with those.
00:38:20So I guess it's the messaging around then.
00:38:21And yeah, unfortunately, as you said,
00:38:23first impressions do stick, so sorry about that.
00:38:25Hopefully we can change some minds here.
00:38:27- Yeah, and we're honest about like,
00:38:33we don't really compete with ghosty
00:38:35'cause we don't look at them as the same kind of tool.
00:38:37Like ghosty is the very lightweight terminal
00:38:40where you're going to stitch on all of your plugins.
00:38:43You're going to build your own little universe
00:38:44inside of there, use two ways, use things like that.
00:38:47If that's what you value, then you should use ghosty 100%.
00:38:50Like there's no reason.
00:38:51Warp is like, I don't want to stitch those tools together.
00:38:54I kind of like if you could just bring me like a diff view
00:38:56and a file explorer and a voice input button.
00:38:59It's just kind of there and I don't have to configure it.
00:39:02And as long as I just use the free tier,
00:39:04I can use open code if I want to as well.
00:39:06Like if that's your mentality, it's like,
00:39:08I just want the GUI.
00:39:09Like I don't want to stitch together
00:39:10a bunch of TUI applications,
00:39:12then Warp is that option for you.
00:39:15I look at it kind of like NeoVim versus VS Code.
00:39:18It's not as dramatic as that, but it's the same ethos
00:39:22I feel like of where people gravitate
00:39:24and what they end up using.
00:39:26- Are you able to go into what OZ is?
00:39:28Obviously I think that's quite a new release for Warp.
00:39:30And I'm curious sort of what that's doing
00:39:32for cloud agents now.
00:39:34- Yeah, it's interesting.
00:39:36So OZ came out like a couple of weeks back
00:39:39and it is the platform for running agents in the cloud.
00:39:43So we of course have been using agents
00:39:45to build Warp for a while.
00:39:48And some things we ran into were,
00:39:51it's really nice to use agents to author to code locally.
00:39:55But as soon as you get to like repetitive tasks
00:39:58or last mile tasks like code review,
00:40:01the agent doesn't follow you to those places.
00:40:03It stays on your machine and that feels a bit limiting.
00:40:07We're also hitting some parallelization issues
00:40:10of I could actually work on multiple backlog tickets
00:40:14or user feedback requests at the same time.
00:40:17And I don't really need to kick these off on my machine
00:40:20and monitor them.
00:40:21Agents have gotten good enough
00:40:22that I could ramp out a small feature request
00:40:24or a bug report and feel pretty confident at the end
00:40:27just looking at the code diff.
00:40:29So in those cases, like spinning stuff up locally
00:40:32doesn't make a lot of sense.
00:40:33We want this place where you could just run an agent
00:40:37and trigger it from anywhere.
00:40:39So I mentioned like opening up a pull request.
00:40:43Like there should be a way to just trigger an agent
00:40:44from a GitHub action, have it review the code just in time.
00:40:48If you're dealing with user feedback in Slack or linear,
00:40:51there should be a way to just tag an agent
00:40:53and trigger it that way.
00:40:54And then you review the pull request
00:40:56that gets linked on the other side.
00:40:58And then just general purpose stuff.
00:41:01Like we built our own sort of issue triage bot internally
00:41:06that can just go through all the warp issues on GitHub.
00:41:10And we wanted to build something that was like a two-way.
00:41:13So like a full application to go through all of our issues
00:41:15and just trigger agents from there.
00:41:17So we're like building our own mini warp
00:41:19that's focused on GitHub.
00:41:21And for that, you need like an SDK or a REST API.
00:41:24So you're like building an app that triggers an agent
00:41:27to do whatever it needs to do inside of that app
00:41:30and then just get updates and display them to the user.
00:41:33So that meant like having this whole surface
00:41:34of like REST API, SDK, Slack and linear triggers,
00:41:39GitHub action triggers, all that stuff.
00:41:41And the core of it being you have a sandbox
00:41:44for the agent runs, which is called environment.
00:41:47So all of that is rolled into what we're calling Oz,
00:41:50which has all of those things that I mentioned.
00:41:54It helps you set up environments to run agents
00:41:56not on your machine.
00:41:57So if I were to be on my phone, the dream scenario
00:42:01of like I get a message from someone,
00:42:03I want to implement this feature
00:42:04and I don't want to go to my computer.
00:42:06I should be able to just kick off an agent
00:42:09inside of that environment I've set up, have it do its work.
00:42:12And then I get a link to the pull request to look at.
00:42:16And then have another coding agent review that code
00:42:18in the pull request so I can like ship changes to that.
00:42:21Cause why not?
00:42:21So that's kind of what we built towards.
00:42:25So, and the reason we called it Oz,
00:42:27instead of like warp for the cloud,
00:42:30is first like make it its own thing.
00:42:33First impressions are sticky, warp is the terminal.
00:42:35So we need to have a different name for this concept
00:42:38because it is very unique.
00:42:39And also we want to make it really accessible
00:42:42even if you're not using the warp terminal.
00:42:44Like we have some niceties to tap into everything going on
00:42:49in these cloud runners from the warp terminal,
00:42:51but it's just a CLI.
00:42:52Like getting ghosty, I could say like Oz,
00:42:56tell me all of my scheduled jobs
00:42:58and it can run some CLI commands with the coding agent,
00:43:01go look at this cloud environment and then give me an answer.
00:43:04We also have a web UI.
00:43:05So you can go to like oz.warp.dev
00:43:07and you can look at all of your agents that are running again
00:43:10without like opening a terminal at all.
00:43:12So because it was this different thing
00:43:16that didn't really need like the warp terminal to work,
00:43:20we just kind of made it its own entity
00:43:22that the warp team uses internally to build everything.
00:43:25So it's very authentic to us.
00:43:26Like we built this because we had very real needs internally.
00:43:30And we also wanted to make it accessible enough
00:43:33that no matter what you're doing,
00:43:34if you're using like open code inside of anywhere
00:43:38or using the pie harness,
00:43:40you can still tap into this platform
00:43:42and trigger cloud agents, introspect what's going on
00:43:45and all of that stuff.
00:43:48So that's kind of like a high level.
00:43:50I don't know if there's anything that I could like tap into
00:43:52or explain a little bit more clearly.
00:43:54- I was sort of curious when you say
00:43:57I trigger off the agent in the cloud to do something,
00:43:59how can I still use tools like open code
00:44:02and cloud code in that agent?
00:44:04Does it just sort of trigger it
00:44:05on the sandbox environment out there?
00:44:08- Yeah, it's a good question.
00:44:09So the way environments work is very flexible
00:44:14because it's just set up as like a Docker file
00:44:18and whatever code you want to clone inside of there.
00:44:21So I've noticed some other tools like the Codex app
00:44:23and Cursor, may still be true for Cursor,
00:44:25need to check in,
00:44:27is like it taps to a single coding repository
00:44:30and spins up the environment around that.
00:44:32So it's very like one click, like boom,
00:44:34you have this GitHub repository in the cloud now.
00:44:36But it means if you want an agent to work on a task
00:44:39that touches a bunch of repositories,
00:44:41you can't really do that.
00:44:42Like you would need to trigger a different agent
00:44:44in each repository to do each part of the work.
00:44:48So we made it a lot different.
00:44:49Like we have environments internally
00:44:51that have like four or five repositories cloned inside of it,
00:44:54like the database schema and the server
00:44:56and the client and the docs.
00:44:58Like all of those are in the same environment.
00:45:00So if I ask an agent, I need to make this schema change.
00:45:03It could make PRs across all those environments all at once.
00:45:06And it just runs an agent generically across this code base
00:45:10in order to accomplish it.
00:45:11The only thing we're helping you with is triggering it
00:45:13to start working inside of the sandbox
00:45:16and also to get artifacts back out.
00:45:18So if it made a pull request,
00:45:20we can actually detect that
00:45:22and show you a link to the pull request
00:45:23instead of you hunting through the agent logs
00:45:26to figure that out.
00:45:27Now, the question you were saying about,
00:45:29can I run open code with this?
00:45:31The answer is yes, actually,
00:45:33but we want to make it a lot cleaner.
00:45:35So right now, like the turnkey,
00:45:37if you read the docs and you do it,
00:45:39it's going to be using Warps agent.
00:45:41So you're going to like set up some cloud credits.
00:45:43You're going to use the Warp agent.
00:45:44You can pick whatever model you want
00:45:45and all of your permission models.
00:45:48So you could flip it to any of the models
00:45:50that you would expect to support.
00:45:51Like if you want to use Opus versus codecs, you can do that.
00:45:55But if you want to switch out the harness,
00:45:56that involves like cloning the CLI
00:46:01into the Docker environment.
00:46:02So you like add a little installation instruction
00:46:04of install cloud code here or install open code.
00:46:08And then when the environment spun up,
00:46:09now open code's available.
00:46:11And at least today, you could tell the agent,
00:46:13delegate everything to open code
00:46:15and then all the compute runs through that instead.
00:46:17So you can do that.
00:46:19We would like to make it a little bit cleaner though.
00:46:21Or if you just say like my preference is to use open code
00:46:24and I don't even want delegation.
00:46:25Like that's something that we could explore.
00:46:27We've also played with like being able to delegate
00:46:30to multiple harnesses.
00:46:31So like I want the docs to be updated by cloud code
00:46:35and I want the code to be updated by the codecs harness.
00:46:37Like you can actually do that.
00:46:39And then it spins them both up in parallel,
00:46:40watches the results and then reports the artifact back out.
00:46:43So you can kind of stitch these things together
00:46:45in really creative ways.
00:46:47It's not tied to specifically using like warp
00:46:50for the full agentic flow end to end.
00:46:53And we do have like open source resources
00:46:56for this right now.
00:46:57But we are trying to make that like even cleaner.
00:46:59So this could be like a general purpose sandbox.
00:47:02And really we're just giving you the tools
00:47:03to set up those environments and inspect
00:47:06and manage everything the agents are doing when they're done.
00:47:09- Oh yeah, that sounds good.
00:47:10It's answers to my question 'cause I think I'm still
00:47:12in the stage at the moment, as I said,
00:47:14where I bounced between a lot of the tools,
00:47:16testing them out.
00:47:16I mean, mainly 'cause uniquely we make a lot
00:47:19of YouTube videos on these types of things.
00:47:20So I'm installing sort of the newest latest thing
00:47:23that everyone's talking about every week.
00:47:25And I was curious if that would fit into the workflow.
00:47:27- Yeah, totally.
00:47:28And I'll send some stuff along to you
00:47:30because I don't think we've had anyone testing
00:47:32those sorts of flows yet outside of our internal team.
00:47:35So I wanna see that get used.
00:47:38'Cause yeah, there's so many harnesses right now.
00:47:39Like Pi has come on my radar.
00:47:41It's like everyone's talking about this.
00:47:43I'm surprised how quickly that caught on Steam.
00:47:47But isn't that like a general purpose harness as well?
00:47:49Like you could bring whatever model you want to it
00:47:51and it just lets you extend it.
00:47:53- I think so.
00:47:54Didn't OpenClaw use a bit of Pi or something?
00:47:57So obviously OpenClaw got trending massively.
00:47:59And then, yeah, I need to check out Pi as well.
00:48:03I've been meaning to for a while.
00:48:04- Yeah, I know it's the harness that's like the default
00:48:08if you use OpenClaw now.
00:48:09And it's the one that Peter,
00:48:11the creative OpenClaw uses to work on code.
00:48:14But I know it's a codex user, so that has me guessing
00:48:17it's very general purpose and pluggable.
00:48:21But we are entering this age of like,
00:48:23I don't know, the harness doesn't matter as much
00:48:26as it used to.
00:48:27You can kind of use whatever harness you want
00:48:29to talk to these things.
00:48:30Like the difference between asking warp a question
00:48:33through codex and asking open code a question through codex
00:48:37is really small difference.
00:48:39Like we could argue we have a slightly better system prompt
00:48:42and that's why it gets 1% better on benchmarks.
00:48:45But I don't know.
00:48:47I feel like the model matters a lot more than the harness.
00:48:51It's really just come down to like pricing models, honestly,
00:48:55of like why you pick one harness over another.
00:48:57And some of it's the extensibility.
00:48:59I've heard people saying I like Pi
00:49:01'cause I can like change the interface that I'm looking at
00:49:03and like add a diff view here and whatever.
00:49:05So it became this like Lego set for people to build around,
00:49:08which is very cool.
00:49:10- I'm curious if you ever tried OpenClaw
00:49:12when that was trending.
00:49:14- I still haven't set up an OpenClaw box.
00:49:18I still haven't set one up.
00:49:19Oz is kind of like,
00:49:23'cause we were actually playing with internally,
00:49:24could you put OpenClaw in one of these Oz sandboxes
00:49:28and have it run?
00:49:28And the answer is like, yes,
00:49:30but we set these things up to be serverless, quote unquote.
00:49:34Like once it's done with the task,
00:49:35even if it takes hours, it spins down.
00:49:37OpenClaw is supposed to be like this always listening
00:49:40assistant.
00:49:41So it's like, yeah, you could have OpenClaw run
00:49:44for a period of time.
00:49:46But the idea of like having a server
00:49:47I can talk to all the time, I mean, it's cool.
00:49:50Do you use it?
00:49:51And what do you use it for?
00:49:52- I tried it when it was trending on just seeing
00:49:54if I could make it do some of my workflows
00:49:57sort of on its own, like researching on Twitter,
00:49:59scrolling through Twitter and tweets
00:50:02and seeing opinions on things.
00:50:03'Cause yeah, Twitter search API is not great.
00:50:06So I wanted something to replace that
00:50:07and just sort of researching a load of things on the webs,
00:50:10running arbitrary scripts that I'd ask it for.
00:50:13But yeah, I used it when it was a massive security nightmare.
00:50:16So I quickly spun it down because I was like,
00:50:18I don't want to deal with any of this anymore.
00:50:20And I have seen on Twitter, obviously a lot of people say,
00:50:24you get advertised that OpenClaw is like this magic solution,
00:50:27but the people who have really good workflows with OpenClaw
00:50:30have done a lot behind the scenes
00:50:31to actually get those workflows working well for them,
00:50:34whether that's prompting, setting up scripts,
00:50:36automations, everything.
00:50:37So yeah, I like it, but I don't think it was the initial
00:50:40like one click install and then it's this magic box
00:50:43that can do everything.
00:50:45It takes a lot more work and sort of overlooking
00:50:47and handholding before you get it perfect.
00:50:50- It's good to know.
00:50:51I mean, that checks out with how all these agenda tools
00:50:54have really worked for me.
00:50:57Like even just working outside projects,
00:50:58I still have to really think hard
00:51:00about how I'm instructing codecs and reviewing its code.
00:51:02It's not as simple as I ask
00:51:04and the app looks exactly how I want it to look.
00:51:07I do need to try it though, OpenClaw specifically.
00:51:12'Cause we do have some people internally that use it a lot.
00:51:15We have more and more Slack bots that are set up
00:51:17for like competitive analysis and all these things.
00:51:22For jobs like that where it's like do a weekly research
00:51:24report on X, not X.com, but could be.
00:51:28It's really nice to use one of these like cloud agent tools
00:51:33like Oz where it just kind of spins up once a week.
00:51:36It computes for like 10 minutes,
00:51:37does a bunch of web searches and then spins down
00:51:39and then gives you a report.
00:51:41For stuff like that, it's really nice.
00:51:44So I think for like proactive agents
00:51:46that like do things for you,
00:51:48I think these sandbox tools are gonna be a lot more relevant.
00:51:51The ones that are always interactive
00:51:55is gonna be a different story.
00:51:57Like the fact that OpenClaw, I can text it at any moment
00:51:59and it would start doing work is kind of like this next step
00:52:03that I haven't even taken.
00:52:04I don't think a lot of developers have taken either.
00:52:06But I see it coming, whatever it's gonna end up being.
00:52:11Unless we figure out the security.
00:52:13- My main selling points of OpenClaw was the fact
00:52:15that you can sort of run it on your own server
00:52:17and have whatever you wanted on there
00:52:18because I just put in like a Ubuntu install.
00:52:21But yeah, I'm not at the point
00:52:22I'd put it on my personal computer yet.
00:52:23That is a security nightmare for now.
00:52:27- Yeah, I don't think it was ever meant to.
00:52:29People were buying Mac Minis for that one, which was crazy.
00:52:32- Considering it.
00:52:32- I was gonna say, back to your point about harnesses
00:52:37and that they didn't make a difference.
00:52:40I think a while back Anthropic made it so
00:52:42that you had to use Claude's code
00:52:44to use your Pro or Mac subscription.
00:52:46And I think Warp has always offered Sonnet, Opus
00:52:50and everything else that Claude,
00:52:52the Claude models that Anthropic have.
00:52:54How have you been able to like make money from those models
00:52:57even though they're so expensive to run?
00:53:00- Yeah, it's tough with like the what's going on
00:53:03with price subsidization and all of that
00:53:05because it's become very apparent
00:53:10that all these model providers are losing a lot of money
00:53:14just with the amount of compute
00:53:15that they're pouring over developers.
00:53:16It's usually like the power users
00:53:19that create the biggest offset.
00:53:22Like I heard it was like power users using 2000 a month
00:53:25and inference and paying 200 and that's why
00:53:27they were sort of stripping back who can access those plans.
00:53:31I think that makes a ton of sense.
00:53:32And I think people are aware like Warp changed
00:53:36our own pricing model from a deeply subsidized one
00:53:39to one that's sustainable.
00:53:41So at least for us, like we've priced it where
00:53:43there's a bit of advantage to like paying a 20 a month
00:53:47instead of just paying for API keys.
00:53:49But it's not at the level of we lose 10 X on our users.
00:53:53It's at a level of we can break even,
00:53:56maybe make like a little profit.
00:53:58We're not trying to go crazy.
00:53:59It's a balancing of the scales.
00:54:01But we have just tried to put it in a place
00:54:05where we're not going after something
00:54:08that will just run out in like a year or so
00:54:12while other companies have positioned themselves
00:54:14where they don't do that.
00:54:16So it's really just a difference of what's practical
00:54:19and what you can do.
00:54:20And when you're in that position,
00:54:22like that's why we're really open to people using Clog Code
00:54:26and Open Code and other things inside of Warp
00:54:28and us making that a really nice experience
00:54:30and focusing more on how can we add features
00:54:33that are valuable if you're doing that,
00:54:34like a diff view or a file explorer.
00:54:37How can we help you put those things in the cloud
00:54:39with Cloud Runners?
00:54:40So we really just become like a helper
00:54:42for environment setup, compute, management, artifacts,
00:54:44all that stuff.
00:54:45And if you want to use Warp's built-in agent,
00:54:48like as I mentioned, the pricing is like
00:54:50a fair, sustainable thing,
00:54:52but it gives you like multi-model support
00:54:54and just being able to ask a question in your terminal
00:54:56without opening a CLI is convenient.
00:54:59So people use it for like certain classes of tasks,
00:55:02even if they don't use it for everything.
00:55:05So we had that balance going
00:55:06that I feel like makes a lot of sense.
00:55:08It's just leaning into what people are using and listening
00:55:11and just making it a little bit nicer.
00:55:13- How does the billing of Oswald,
00:55:15is it based on how long a task takes in the environment?
00:55:18- Yeah, I need to make sure on that.
00:55:20I know that we're doing it with like cloud credits
00:55:22because the blessed path is using Warp's harness
00:55:26to do everything.
00:55:27The like using other coding agents
00:55:30is like a chapter we're exploring
00:55:32and is possible using a Docker file.
00:55:35So I feel like we're working out what that model would be
00:55:38if that's kind of like the way people start using it.
00:55:42I want to say it's based on compute,
00:55:44but I'll make sure on whether it's like timer,
00:55:47compute based.
00:55:49It's whatever makes the most sense.
00:55:52- What kind of things do you have for Warp
00:55:55coming in the future that you can talk about?
00:55:57- Oz is kind of the next chapter that we're playing with,
00:56:01making it nicer to use.
00:56:02I do think multi harness is kind of the future.
00:56:08I think orchestration is the future.
00:56:10We put out a little teaser from Zach's account or CEO,
00:56:14if you go find it on Twitter,
00:56:16where we're playing with a slash orchestrate command.
00:56:19And it's pretty neat.
00:56:20Like if you do just this magical, it's not really a skill,
00:56:24it's kind of like a prompt.
00:56:26It will ask the agent to work on a task,
00:56:29figure out a delegation plan.
00:56:30So not just like an implementation plan,
00:56:32but actually here's how I would divide it up.
00:56:35Here's who could work on each thing.
00:56:37So it's almost becoming a product manager
00:56:39where you give it a feature it needs to build.
00:56:42It figures out here are the sub-agents I'm going to kick off.
00:56:45And then it creates all those sub-agents
00:56:48and through message passing,
00:56:49it's able to talk to those sub-agents while they work.
00:56:52Kind of like coworkers talking to their manager on Slack
00:56:55as they're doing stuff or even talking to each other.
00:56:57So it's a very early experiment.
00:57:00We want to evaluate like if agents have better success
00:57:05by delegating to sub-agents versus just doing it all itself.
00:57:09Like I think that's a really big divide right now
00:57:12of agent swarms, agent teams, all this stuff.
00:57:14Is that beneficial versus like the quality of output?
00:57:18But this is the first example I've seen in a while
00:57:21that's really like here's a turnkey solution
00:57:24to just have an agent delegate everything.
00:57:27And also here's like standard,
00:57:30well, not really like an open center or anything yet,
00:57:32but like a standard for message passing
00:57:35where like while a sub-agent's working on something,
00:57:37it can actually message out this thing went wrong
00:57:40and I don't understand this.
00:57:41And the main agent could actually pick that up
00:57:43and say, oh, that makes sense.
00:57:44Let me research.
00:57:45Here's an answer.
00:57:46And it can actually like sort of ping back and forth.
00:57:50It can also like when an agent's done,
00:57:53it can tell the main agent I finished my task.
00:57:56Then the main agent can tell all the other agents,
00:57:58hey, this guy's done with this part of the task.
00:58:00If you want to merge that in or whatever, you can do that.
00:58:03So it's this really weird world
00:58:05of now you don't have a human
00:58:06that's managing agent communication.
00:58:09Now agents are managing their own communication strategies.
00:58:13I don't know how I feel about that.
00:58:14Like how much of us are we going to replace at this point?
00:58:17But if it leads to higher quality
00:58:20by dividing things up that way,
00:58:22which in the workplace it does.
00:58:24So I feel like that could scale down to agents.
00:58:26That's kind of the next chapter
00:58:27that we and a bunch of other people are looking at
00:58:30is how do you orchestrate this stuff?
00:58:32- When you said using multi harness,
00:58:34is that like using various harnesses for one task
00:58:39or using different harnesses entirely?
00:58:43- Oh yeah.
00:58:44So the way that we've set up right now is like,
00:58:47and it doesn't have to work this way,
00:58:50but like each agent is working independently on a task
00:58:54and they could be a separate model,
00:58:55but they're all using the warp harness in our testing
00:58:57that we've done so far.
00:58:59There's no reason it has to be though.
00:59:01Like we've played with, especially with like OZ cloud renders
00:59:04like delegate this to the cloud code harness,
00:59:06delegate this to the codex harness
00:59:08and then give it the same, we could in theory,
00:59:10give it the same like message passing
00:59:12to sort of message back from those harnesses instead.
00:59:16But as I kind of mentioned earlier,
00:59:18I don't expect it to be this huge difference in quality
00:59:21'cause the model is like 90% of the difference.
00:59:24And the harness is kind of like the last 10%
00:59:26that you can experiment with and get some gains.
00:59:28So I feel like that'll be the step two of like,
00:59:30oh, if we mix up harnesses as well as models,
00:59:34does that lead to interesting results?
00:59:37So all of this is like experimental phase.
00:59:40Like can we evaluate this and actually benchmark it
00:59:42and figure out what the best deployment strategy is?
00:59:46- Yeah, I was gonna say,
00:59:47I think it'd be interesting to find out your results
00:59:49from doing orchestration or doing multiple sub-agents
00:59:52because I read an article somewhere from Quid Mission
00:59:54last year who said they don't recommend having multi-agents
00:59:59purely because the sub-agents don't have the same context
01:00:02as the main agent.
01:00:03And so it won't produce the results in the same way.
01:00:06And so trying to kind of merge all those results
01:00:09in different agents together might not work as well
01:00:11as if you have the same agent doing all the tasks,
01:00:13but it would be good to know what you guys find out.
01:00:15- Yeah, and I feel like that came out
01:00:17in the previous sub-agent architecture,
01:00:20which was very hands-off.
01:00:23It's kind of like if a product manager
01:00:24set up the linear board,
01:00:26everyone worked on their ticket
01:00:27and never talked to each other ever again
01:00:29until all the PRs emerged.
01:00:31Like, yeah, I would think in a workplace
01:00:33where no one talks to each other,
01:00:35you wouldn't get great results.
01:00:37But because we've added this message-passing ability
01:00:41where people could talk to each other,
01:00:43we've also made sure there's a plan
01:00:45so every agent can see this delegation document
01:00:48that live updates.
01:00:49The main agent can update this document
01:00:51and everyone can go read it.
01:00:53Now we're adding more context passing
01:00:55where they can pass necessary context to each other
01:00:58and also agree on what everyone is working on,
01:01:01which I think is a much different test
01:01:03than the old sub-agent model of delegate and come back
01:01:07with no channel in between.
01:01:09- I was gonna switch the topics a bit
01:01:11and sort of ask how you're just personally coding
01:01:14for side projects.
01:01:15I've seen you've been working on a markdown editor recently,
01:01:17and I wonder sort of what tools do you use?
01:01:20Has it been mostly AI writing that code
01:01:22or sort of you hand-holding it?
01:01:24Or are you going sort of,
01:01:25I know some developers got a manual approach
01:01:27and sort of don't wanna use AI,
01:01:28so they can still do sort of have a bit of fun coding?
01:01:31- So you might be surprised.
01:01:33I use Warp to work on this.
01:01:35I know, I know.
01:01:37So, no, I definitely thought about like,
01:01:41should this project be an escape
01:01:43where this is my safe place to just write code manually
01:01:46and just kind of go that way.
01:01:47But quickly I remembered like the reason I didn't work on this
01:01:50is because the very nitty gritty details
01:01:53of working on text editing and the ProseMirror library,
01:01:56which is notoriously so hard to use.
01:01:59It's really hard.
01:02:01Even though I've used it for a long time,
01:02:03it's still very difficult.
01:02:05I thought, yeah, coding agent will actually push me through
01:02:07and let me focus with like a more balanced approach
01:02:10on design and development.
01:02:12'Cause normally working on side products,
01:02:14it was like 10% in the design space, which I love.
01:02:16I still like opening Figma and designing everything myself.
01:02:20And then 90% was trying to figure out the implementation.
01:02:23Now it's more 50/50 or even less,
01:02:25which is a much nicer balance for me.
01:02:28So I designed a lot of things in Figma still.
01:02:31I know some people just code it all out,
01:02:33never use Figma again.
01:02:34I still like to be able to like draw out the gradients
01:02:37and the shadows and make it feel the way that I want to.
01:02:41But from there, yeah,
01:02:42I delegate everything to Codex right now.
01:02:44I mentioned earlier,
01:02:46like Codex is the more reasoned developer
01:02:49and Opus is the more jump to a solution developer.
01:02:52That's more me.
01:02:54So I want someone that can balance out my crazy.
01:02:56And I feel like Codex is that model of like,
01:02:58this one will actually care about
01:02:59how this thing's architected more than me.
01:03:02So I should probably balance with that.
01:03:05So a lot of it's just Codex tasks.
01:03:06I have like, I don't even use work trees.
01:03:08I have two clones of the repo and I just hop between them.
01:03:11And I can't really do more than two at the moment
01:03:14just because the project's so early
01:03:16that I can't delegate huge swaths of work
01:03:18without agents stepping on each other.
01:03:20So I just have like two agents that work on stuff.
01:03:23And I switched between the dev servers to look at the output.
01:03:26I have used cloud agents a bit just to like
01:03:30do a research task and then pull it down locally.
01:03:33So in Oz, if you like spin up a cloud agent,
01:03:35it'll like clone that, get a repository, do some work.
01:03:38Then there's a fork locally button
01:03:40where I can pull it all down and then resume.
01:03:42So I started doing that a bit for like,
01:03:44I want to research how like the popover API works.
01:03:48So I can create like a nice hover
01:03:51whenever you're over a hyperlink.
01:03:53So I just kicked off a cloud agent, go research that,
01:03:55get the libraries, get an initial implementation.
01:03:57I'll pull it down and get it done.
01:03:59That way I don't have to make another clone
01:04:00or another work tree.
01:04:03It's really what I wouldn't make of it,
01:04:04but that's really been my workflow.
01:04:07And I do dive a lot into the code review,
01:04:10especially because in text editing,
01:04:12codecs isn't, it's not a solved problem.
01:04:15Like codecs still kind of struggles
01:04:16and messes things up in ProseMirror.
01:04:18And I have to ask it questions about like,
01:04:21why did you make this choice?
01:04:22What is the limitation we're working around here?
01:04:25Because this is another prompting tip.
01:04:29Don't tell it, why are you so dumb?
01:04:31I know best.
01:04:32Don't just ask it, why did you make that choice?
01:04:34And then it will tell you,
01:04:35oh, it's because of this library that I read about,
01:04:38or there's this edge case elsewhere in the code base
01:04:40that I had to work around.
01:04:41Like you have to pull information out of a model
01:04:43to actually figure out when it's doing wrong.
01:04:46That's the strategy I use of like staying in the loop,
01:04:48ask it questions if things look weird,
01:04:51otherwise ask it to like review its code
01:04:53and merge it automatically.
01:04:55And I just push everything to the main line when it's done,
01:04:57because it's just purely local for me right now.
01:05:01Though it's not really like a team level strategy.
01:05:05It's more of like a solo strategy.
01:05:07Push things to main, clone the repo a couple of times,
01:05:10use one coding model, maybe use a cloud sandbox
01:05:13if you wanna do something off of your machine,
01:05:15but you only need to, and that's mostly it.
01:05:19So I'm also live streaming this process by the way,
01:05:23on Twitch Tuesday mornings,
01:05:25just to be a little bit in public
01:05:27while I'm working on this stuff.
01:05:29- I'm gonna ask you a question that I ask
01:05:31most guests when I come, but do you have any hot takes?
01:05:34- I feel like I've already said some.
01:05:38I don't know.
01:05:42I do think work trees are a bit overrated,
01:05:44but I think it's because I haven't tried them enough
01:05:46or I had a bad experience with them
01:05:48and I need to try it I guess.
01:05:49Kind of cold take is review your code.
01:05:54These agents aren't good enough to just like merge right away
01:05:57and you will feel the pain later.
01:06:00- What about like code rabbit and reptile?
01:06:03Do you not trust those to review code?
01:06:06- 'Cause it's a funny relationship
01:06:09of like why do we also need a code review on the backend
01:06:13if the coding agent already wrote the code?
01:06:15And I mean, it's the same thing.
01:06:21I hate comparing agents to humans so much,
01:06:25but it is trained on how we work.
01:06:27So it is gonna do some similar stuff.
01:06:30And for me, I only catch issues when I've actually,
01:06:34when I'm about to hit the button
01:06:36of ask this person for review.
01:06:38Like before that point, my code, I'm like, this is fine.
01:06:41Then as soon as I'm about to hit the button
01:06:42of I need to request a review from this senior engineer,
01:06:45I'm like, maybe I should look at it again.
01:06:48Maybe just like once.
01:06:49And then I catch things all the time.
01:06:51'Cause I feel like these agents are modeled
01:06:54to like do the bare minimum to accomplish the task
01:06:57with just enough quality to say mission accomplished.
01:07:01Like that's how they're kind of trained to do things.
01:07:03But that doesn't include,
01:07:05did I simplify all the stacks surrounding it?
01:07:07Did I look for potential for abstractions
01:07:09that are outside of all the code that I researched?
01:07:12Like it doesn't do all of that stuff because if it did,
01:07:16then the iteration loop would be a lot longer.
01:07:18And so I'm sure some of the way these models are trained is,
01:07:22it's going to be a lot more obnoxious to use
01:07:24if we bake in all of this self-reflection
01:07:27and self-correction into it.
01:07:30And so they don't.
01:07:31And we need to have tools like code review on the backend
01:07:34or skills to simplify your code
01:07:36in order to compensate for that.
01:07:38So it feels like, yeah,
01:07:40we need code review on the other side
01:07:41in order to ensure that.
01:07:43Now, what does that mean for,
01:07:47like do humans need to review code in the longterm?
01:07:50I feel like there might come a time when we don't,
01:07:55and I don't know what that's going to look like yet.
01:07:57I don't know if that means like we need different models
01:08:00as we were talking about earlier.
01:08:01Like we need a model that thinks differently
01:08:02to review the code so that we get a better mix of opinions.
01:08:05Maybe that's part of it.
01:08:06Maybe agents are able to self-merse their PR
01:08:10if another agent reviewed and they address the comments.
01:08:13Maybe, I don't know.
01:08:15It feels wrong, obviously,
01:08:17because that's the final stand that a human has right now
01:08:22is at least I'm involved at the gate of letting things in.
01:08:26So I feel like it makes sense why the code review is required
01:08:30because it's not trained to do all this stuff right away.
01:08:33It can be accomplished either through code review
01:08:35or like I saw Cloud Code
01:08:37put out a /simplify command recently.
01:08:40I've been cooking up my own as a skill.
01:08:42It's really nice to have that sort of thing.
01:08:43Both of those are the same way to address the problem.
01:08:46Have it do multiple passes with a lens for get it work
01:08:50and then a lens for get it right.
01:08:51And it's really just a matter of training it in more.
01:08:56Maybe it's like a hook.
01:08:57Maybe it's baked straight into the post-training
01:08:59for certain reasoning models.
01:09:01Like maybe there's extra high reasoning
01:09:03that explicitly does that.
01:09:05But yeah, it is kind of funny
01:09:08'cause you were mentioning like code wrapping and reptile.
01:09:11Like we built our own with Oz as well
01:09:13where it just triggers a cloud agent as a GitHub action
01:09:16and reviews the code.
01:09:17And we open source the skill for that.
01:09:19If you wanna apply it to CodePilot or apply it to Oz,
01:09:21you can do whatever you want.
01:09:23You've kind of hit a point where it's like,
01:09:25yeah, code review is just one of countless places
01:09:28an agent could review the code.
01:09:30I feel like you could do it in GitHub.
01:09:34You could do it locally.
01:09:35You could do it wherever.
01:09:36And it's really just a matter
01:09:38of making a code review process that you like.
01:09:40And I do think code wrapping and reptile like super focus
01:09:44on this problem, which is so cool.
01:09:46But I do also think we're in a world now
01:09:48where you could build your own and it could run locally
01:09:51or it could run in the cloud.
01:09:53And you can build it to exactly your preferences
01:09:55at this point 'cause the models are so good.
01:09:57That's why we're leaning into like,
01:09:58yeah, we have a code review example,
01:10:02but you could also apply this anywhere else in the stack
01:10:05or write your own.
01:10:06Like at this point it doesn't really matter
01:10:08'cause the models are good enough
01:10:09that you could just do that.
01:10:11- Should we give Ben a chance
01:10:13to plug something if he wants?
01:10:15- Oh yeah.
01:10:16So what was you want to plug Ben?
01:10:19- What do I want to plug?
01:10:21Oz.dam, go over there cloud runners, all that stuff.
01:10:24Personal side, we were talking about like content
01:10:28and whiteboard videos, trying to be as open as possible
01:10:32with all the strategies to use agents
01:10:34to write code more effectively
01:10:36or just to be a good software developer in general.
01:10:38So if you're part of that community,
01:10:40I'm around Twitter, Blue Sky, YouTube,
01:10:44and a number of other places that be Holmes Devs.
01:10:47I'm sure there's a link for that.
01:10:48Holmes is in Sherlock Holmes.
01:10:50So if you ever want to come find me,
01:10:52ask follow up questions from this or anything else,
01:10:55I'm around.
01:10:56- Cool.
01:10:56I think you mentioned on the podcast,
01:10:57sorry, I was meant to wrap up,
01:10:59something about Dr. Who jokes, you're a Dr. Who fan?
01:11:02- Did I mention Dr. Who jokes?
01:11:04I mean, I am.
01:11:05Definitely back in high school, that was the big era.
01:11:08David Tennant era, Matt Smith era.
01:11:11That's what I grew up on.
01:11:14So yes.
01:11:16- Sure, but not anymore.
01:11:17- Yeah, I haven't tapped back in recently.
01:11:20I don't know.
01:11:21- Thanks for listening to this episode
01:11:22of the Better Stack Podcast.
01:11:24Find us wherever you listen to your podcast.
01:11:26So Apple Podcast, Spotify, we're there.
01:11:29And from me, it's goodbye.
01:11:31- Goodbye from me.
01:11:32- Goodbye from me.
01:11:33- Goodbye from me.