Why GUIs Will Replace TUIs for AI Developer Tools | Better Stack Podcast Ep. 14

Transcript

00:00:00Opus 4.5 kind of blew the doors off of like,

00:00:03if you didn't see it coming, it's here, sorry.

00:00:06I think people are picking models

00:00:08'cause it's someone that they want to work with

00:00:10rather than I'm picking this model because of its benchmark score.

00:00:14The harness doesn't matter as much as it used to.

00:00:16You can kind of use whatever harness

00:00:18you want to talk to these things.

00:00:19Like the difference between asking a question through Codex

00:00:23and asking OpenCode a question through Codex,

00:00:26really small difference.

00:00:28- Hey Ben, thank you so much for joining us.

00:00:31I think we're going to start with the simple question,

00:00:33which is who are you and what do you do?

00:00:36- So I'm a developer relations lead at Warp.

00:00:40So I do a whole combination of building a bunch of tools

00:00:44with AI to help out the team,

00:00:46working with everyone who's using Warp out in the community

00:00:50or out in industry, and I've been working on sort of the future

00:00:55of how you host agents both locally and in the cloud.

00:00:58I'm sure we'll get into a lot of that stuff today.

00:01:00- Sure, I think we will.

00:01:02But before we get into that,

00:01:03I've noticed you're very good at talking on camera,

00:01:07just explaining things really well

00:01:08and it's not a common thing that people have.

00:01:11So what led you into that direction

00:01:13and how did you develop that skill?

00:01:15- Yeah, I mean, I've been doing content

00:01:19about programming for a while,

00:01:22at least a while to me, as long as I've been like in industry.

00:01:27So I started back in pandemic times

00:01:31when everyone was bored at home looking for things to do.

00:01:35And I was working on a side project at the time

00:01:39that was stitching a JavaScript bundler

00:01:42onto a framework called Eleventy.

00:01:44This is way back in the day, still a great framework.

00:01:47And I wanted some way to communicate what I was building

00:01:53in a way that didn't feel dry.

00:01:56Normally I would see people post like a link

00:01:59to their changelog or their readme on Twitter

00:02:01to post updates on what they're doing.

00:02:04And I wanted to do something

00:02:05that was more of a video format.

00:02:06I'd been blogging up until that point.

00:02:08I just wanted to try something different.

00:02:09So I had a whiteboard in my closet and a rock band microphone.

00:02:13That was the only microphone I had, but it had USB.

00:02:16So I was like, okay, I can plug this into my computer

00:02:19and it works as a microphone.

00:02:20So I propped up the whiteboard on a chair.

00:02:23I wrote up the changelog on the board

00:02:25instead of doing it as like an actual written post.

00:02:29And I just like walked through all the features

00:02:31and then I jumped to a demo and I posted it.

00:02:34And people really liked the format.

00:02:36I mean, the library did all right.

00:02:38I mean, it had some cool ideas in it.

00:02:40That's actually what led me to work on Astro,

00:02:41which is an open source framework

00:02:43that I maintained for a few years.

00:02:45But it kind of showed me, first off,

00:02:48like the developer community is very small

00:02:50and very welcoming to anyone who's working on cool stuff.

00:02:54And also there is room for like intermediate dev content

00:02:59that's very showy and YouTube focused.

00:03:03Like you're on camera and you're talking about things.

00:03:06So that eventually led to me doing a lot of content creation

00:03:11about just general web development concepts,

00:03:13like best practices using HTML,

00:03:15talking about all of the libraries

00:03:16that have been floating around,

00:03:17like SvelteKit and Solid and Nuxt and htmx.

00:03:21And I just did like two short videos a week

00:03:24for like many years or two years, I think,

00:03:28something like that,

00:03:29where I was like doing it really consistently.

00:03:32And yeah, it just builds up a muscle

00:03:34when you do that over and over again.

00:03:36And it also forces you to learn things

00:03:39if you do it in short form, one minute bursts,

00:03:41which is what I've been doing.

00:03:42I haven't really done as much long form.

00:03:44It's been a lot of like short form, one minute,

00:03:46problem solution, let's talk about how this thing works.

00:03:50So just through all of that,

00:03:51you learn skills of like self-editing,

00:03:53you learn what people actually care about

00:03:55and you kind of drill more into those topics.

00:03:58And right now I'm trying to sort of steer that

00:04:01from deeply technical,

00:04:03let's talk about how this library works

00:04:04to now you're using agents

00:04:06at this higher level of abstraction.

00:04:08Let's talk about some strategies there

00:04:10and also where things are heading tooling-wise in that space.

00:04:13- And I was gonna say,

00:04:15I think a lot of people will know you

00:04:16from your whiteboard shorts.

00:04:17I actually think that's the reason

00:04:19my personal website was written in Astro

00:04:21is I think that's how I first discovered--

00:04:22- There you go.

00:04:23- When you were at Astro.

00:04:24So yeah, I'm a big fan of Astro

00:04:26and what you did over there.

00:04:28- Thanks, yeah.

00:04:29I mean, I was honestly surprised

00:04:31'cause my content was so scattershot.

00:04:33Like I was just excited by all the frameworks

00:04:36that were going on and continue to.

00:04:38Like huge Svelte user, Vue's doing interesting stuff,

00:04:41Solid's doing interesting stuff.

00:04:43And since Astro supports every like rendering framework,

00:04:46it's easy to just talk about all of them and say,

00:04:48yeah, this is part of the job.

00:04:49We support everything, so let's talk about everything.

00:04:53But it's cool that the Astro message still kind of landed,

00:04:55even if I'm not explicitly talking about it all the time.

00:04:58I think that's something I've learned also,

00:04:59like at Warp right now,

00:05:01is to just be generally useful in the community

00:05:04and people will notice what you're doing.

00:05:07Like you don't have to be salesy to talk about this stuff.

00:05:09In fact, you shouldn't be.

00:05:10You should just like pick up on, hey, this is useful.

00:05:13Let me try to explain it.

00:05:14So yeah, it's been going well.

00:05:19- I think one of the most popular videos

00:05:20you've done on YouTube is the one where you go through

00:05:23the basics of React Server Components,

00:05:25or you build it from scratch.

00:05:26What's your honest view on React Server Components?

00:05:30'Cause it hasn't, I mentioned this to Evan You

00:05:32when we interviewed him,

00:05:33but it hasn't been the slam dunk

00:05:35that people thought it was gonna be.

00:05:37And so, yeah, what do you think about it?

00:05:39- Yeah, I haven't opened that box in a little bit

00:05:41'cause I didn't really use, like I,

00:05:44again, it's another case of,

00:05:46there's clearly a lot of confusion in the community

00:05:48about what this thing even does under the hood.

00:05:50So let me just try to understand it.

00:05:53And as a nerd, I just went down the rabbit hole

00:05:55of reading source code and figuring it out.

00:05:57So yeah, that video you mentioned started

00:06:00as like a conference talk that I did at React Summit.

00:06:03This was back, I think two years ago.

00:06:06And then I adapted it to a YouTube video while it was fresh

00:06:09and I could just kind of rattle it off.

00:06:11And yeah, it got a lot of interest

00:06:13because people still didn't know how it worked.

00:06:15And it was kind of an intro to how it does things.

00:06:19And since then, I mean, it seems like Next.js

00:06:23is still like the most used framework.

00:06:25And if you ask an agent to build something for you,

00:06:28it's only gonna accelerate

00:06:30what is most popular in the training set.

00:06:32So that's created a flywheel for Next.js for apps

00:06:35to just kind of proliferate.

00:06:36I agree, like on the tech stack side,

00:06:40it's not like this universal solve.

00:06:42And also because it blends the lines

00:06:44between what is running on the server versus the client.

00:06:48As a human reading it, it's hard to tell

00:06:50unless you're looking at the directives

00:06:52at the top of the file or inside of the server functions.

00:06:55So you can forget where you are.

00:06:56And that was always a problem.

00:06:58Like I remember like isomorphic JavaScript

00:07:01being like a really big term

00:07:02where you wanna write JavaScript

00:07:03that could run on a server or on a client.

00:07:06And by authoring that way, you forget where it is running

00:07:11and what the implications of each space are.

00:07:13And there are very different implications

00:07:15of this runs on a serverless function

00:07:16that runs once versus this runs on the client

00:07:19and it's stateful as long as the person has a website open.

00:07:22So I do feel like, yeah,

00:07:24they could have drawn clear boundaries

00:07:26on where everything lives.

00:07:27I know the goal is to just make the super abstraction

00:07:30that does everything.

00:07:31And at least working on Astro's APIs,

00:07:34we kind of rejected that.

00:07:36And we said, no, there should be clear lines.

00:07:38Like we actually got to lead like the Astro actions project,

00:07:43which is kind of like server functions,

00:07:45but for Astro's case

00:07:47and also something that could work in any framework.

00:07:49So it was like, instead of tying this

00:07:51to like some React runtime,

00:07:52where you have JSX on the server, JSX on the client,

00:07:55we're gonna have this like actions file,

00:07:58explicitly named as a file on your file system.

00:08:00You can't just put them anywhere.

00:08:02You can export handlers for functions

00:08:05and then you get like a magic import that you can use

00:08:07to grab those as functions you can call on the client.

00:08:10And they're just async functions

00:08:11that you could call from Vue.

00:08:12You could call it from like a web component.

00:08:14You could call from anywhere.

00:08:16And there are resources out there

00:08:19on like how Astro actions work,

00:08:20if anyone's curious about that.

00:08:22But it was like studying what people liked

00:08:27and didn't in the ecosystem.

00:08:30It felt like, yeah, people want clearer boundaries.

00:08:32We want it to be easy to have TypeScript

00:08:36that works across the wire.

00:08:37'Cause that was kind of the appeal.

00:08:39But we didn't want it to feel like

00:08:40you're forgetting where you are when you're authoring code

00:08:43and leaving it up to the like user

00:08:47to organize all of their code either.

00:08:49We wanted like organization to be clear,

00:08:51boundaries to be clear.

00:08:52We still get the benefits of TypeScript

00:08:54and type-safe form data and all that other stuff

00:08:57that server actions was giving you.

00:08:59So it's just a philosophical thing, really.

00:09:02I think it's a super powerful framework.

00:09:04It's just for legibility and reviewability.

00:09:06I feel like drawing lines is a little bit smarter.
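A minimal sketch of the "explicit actions file" idea, written as plain TypeScript with no framework: every client-callable handler lives in one named module with its own input validator, and the client receives ordinary async functions whose types follow the handlers across the wire. This is not Astro's API (Astro's real version uses `defineAction` from `astro:actions`); `Action`, `createClient`, and `transport` are hypothetical names, and a JSON round trip stands in for the network request.

```typescript
// Hypothetical sketch of an "explicit actions file" (not Astro's actual API).
// Each client-callable action declares an input validator and an async handler.
type Action<I, O> = {
  validate(raw: unknown): I; // throws on bad input
  handler(input: I): Promise<O>;
};
type Input<A> = A extends Action<infer I, any> ? I : never;
type Output<A> = A extends Action<any, infer O> ? O : never;

// ---- server side: every action lives in this one module ----
const likeCounts = new Map<string, number>();

const server = {
  likePost: {
    validate(raw: unknown): { postId: string } {
      const postId = (raw as { postId?: unknown } | null)?.postId;
      if (typeof postId !== "string") throw new Error("postId must be a string");
      return { postId };
    },
    async handler({ postId }: { postId: string }): Promise<{ postId: string; likes: number }> {
      const likes = (likeCounts.get(postId) ?? 0) + 1;
      likeCounts.set(postId, likes);
      return { postId, likes };
    },
  },
};

// Stand-in for the network hop: only JSON strings cross the boundary.
async function transport(name: string, body: string): Promise<string> {
  const action = (server as Record<string, Action<any, any>>)[name];
  if (!action) throw new Error(`unknown action: ${name}`);
  const input = action.validate(JSON.parse(body));
  return JSON.stringify(await action.handler(input));
}

// ---- client side: each action becomes a plain, typed async function ----
type Client<S> = { [K in keyof S]: (input: Input<S[K]>) => Promise<Output<S[K]>> };

function createClient<S>(send: (name: string, body: string) => Promise<string>): Client<S> {
  return new Proxy({}, {
    get: (_target, name) => async (input: unknown) =>
      JSON.parse(await send(String(name), JSON.stringify(input))),
  }) as Client<S>;
}

// Usage: callable from any UI code (Vue, a web component, a plain script).
const actions = createClient<typeof server>(transport);
actions.likePost({ postId: "hello-world" }).then((r) => console.log(r.likes)); // logs 1
```

The boundary stays visible: anything the client can call must appear in `server`, and invalid input is rejected on the server side before the handler runs.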

00:09:10- You know, I really liked the server island approach

00:09:12that Astro took, and I think they did a sort of great job

00:09:14on making that pretty clear.

00:09:16And yeah, specifically identifying like this component

00:09:19is on its own, it's part of the server.

00:09:21And the rest of the website is still sort of SSR

00:09:23and rendered.

00:09:24It was a, yeah, that's why I use Astro.

00:09:26As I said, I'm a big fan of what Astro did, so.

00:09:29- Yeah, that was something Matt Kane proposed

00:09:32as soon as he joined.

00:09:33He worked on Gatsby for a while.

00:09:34So he has a ton of perspective on static site generation.

00:09:37Yeah.

00:09:38- Yeah, that's funny 'cause my site went from Gatsby

00:09:40to Astro, so that explains why.

00:09:42- A lot of people did.

00:09:44- Yeah.

00:09:45- It was a target.

00:09:46It's what we wanted people to do,

00:09:49especially when Gatsby was like no longer maintained.

00:09:51Even Netlify was supporting us in that, you know?

00:09:53Like, hey, let's get some Gatsby users onto Astro

00:09:56'cause it's awesome.

00:09:57But yeah, server islands are a really cool abstraction.

00:10:01And they're cool because it's really simple.

00:10:03Like I know in Next.js you have to jump through a lot of hoops

00:10:05to, if you wanted something to only render on the client,

00:10:10or you wanted certain things to be like statically rendered

00:10:14in certain parts of the page to be dynamic.

00:10:16Like the classic example is you have a blog post,

00:10:19but just the like button and the like counter

00:10:20should actually be a server call.

00:10:22The rest of the page can just be rendered

00:10:23when you build the website.

00:10:25If you wanted to have that kind of relationship in Next.js,

00:10:27you had to sort of invent magical runtimes that can do this

00:10:31that really depend on the host whether it'll work.

00:10:34And Astro was like, yeah, server islands are just like

00:10:37a fetch function, but we make the syntax a little nicer.

00:10:39That's literally all it is.

00:10:40So it's like this part of the page,

00:10:42instead of it being like rendered when you build the website,

00:10:46it's just going to be a fetch function under the hood.

00:10:48And it's going to fetch the like button count.

00:10:51And then it's going to render this HTML

00:10:53and put it in there as soon as the website's loaded.

00:10:55Like that's all it does.

00:10:56There's really nothing fancy about it.

00:10:58If you use something like htmx, it's that,

00:11:00but like a baby version that's admittedly less capable,

00:11:03but in my eyes, easier to understand.

00:11:05And that really caught on because it's just,

00:11:09it's so easy to deploy it anywhere.

00:11:11You don't have to think about it and it's just HTML.

00:11:14So you're not thinking about what bundler am I using?

00:11:17It just kind of works.

00:11:19So I just like how we found simple solutions like that

00:11:22just to make it easier to build stuff.
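Ben's description of server islands as "just a fetch function" can be modeled in a few lines. This is a simplified, DOM-free sketch under stated assumptions, not Astro's implementation: the page is rendered once at build time with a placeholder, and at load time each placeholder is swapped for HTML fetched from the server. The `<!--island:name-->` marker and the functions `renderStatic`, `renderIsland`, and `fillIslands` are hypothetical; in Astro itself you opt a component in with the `server:defer` directive.

```typescript
// Simplified model of server islands (hypothetical names, not Astro's internals).

// Build time: the post is rendered once; the dynamic part is left as a placeholder.
function renderStatic(title: string, body: string): string {
  return `<article><h1>${title}</h1><p>${body}</p><!--island:likes--></article>`;
}

// Request time: a small endpoint renders only the dynamic fragment.
function renderIsland(name: string, likeCount: number): string {
  if (name !== "likes") throw new Error(`unknown island: ${name}`);
  return `<button>Like (${likeCount})</button>`;
}

// Load time: find each placeholder, fetch its HTML, and splice it into the page.
async function fillIslands(
  html: string,
  fetchFragment: (name: string) => Promise<string>,
): Promise<string> {
  const marker = /<!--island:([\w-]+)-->/g;
  const names: string[] = [];
  for (let m = marker.exec(html); m !== null; m = marker.exec(html)) names.push(m[1]);
  // Islands are independent, so their fetches run in parallel.
  const fragments = new Map<string, string>();
  await Promise.all(
    names.map(async (name) => {
      fragments.set(name, await fetchFragment(name));
    }),
  );
  return html.replace(marker, (_match: string, name: string) => fragments.get(name) ?? "");
}

// Usage: an in-memory function stands in for a fetch to a hypothetical island endpoint.
const page = renderStatic("Hello", "Static content.");
fillIslands(page, async (name) => renderIsland(name, 42)).then((html) => console.log(html));
// logs <article><h1>Hello</h1><p>Static content.</p><button>Like (42)</button></article>
```

Everything except the like button is plain prerendered HTML, which is why this style deploys to almost any host.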

00:11:24- Yeah, but now that agents are writing most of the code,

00:11:26how much of that really matters?

00:11:28- Yeah, man, I know.

00:11:31So last year was an existential crisis,

00:11:33I think for all of us.

00:11:35Some people are having that existential crisis now.

00:11:38I empathize.

00:11:39Opus 4.5 kind of blew the doors off of like,

00:11:42if you didn't see it coming, it's here.

00:11:45Sorry.

00:11:47I knew what was going on pretty early.

00:11:49As soon as I was trying Cursor tab completions

00:11:52and it was doing more and more stuff for me,

00:11:54I was like, there's some inevitability here

00:11:56of where we're heading.

00:11:57It's no longer the Copilot tab completions.

00:11:59This is getting serious.

00:12:01So like, yeah, I mean, I joined Warp last year,

00:12:05which is the terminal with,

00:12:08well, it's a really nice to use terminal

00:12:10that has agents built in and also cloud orchestration.

00:12:13There's many chapters to the Warp journey.

00:12:15We help you all the way up the stack,

00:12:17kind of like how Astro helped you

00:12:18all the way from static to server.

00:12:19Warp is kind of like extending up in that way.

00:12:22But I joined Warp as like early as when Sonnet 4,

00:12:26no, Sonnet 3.5 was out and it was barely capable.

00:12:30They gave me one tech demo internally in the interview

00:12:34and the agent just kind of ran in a circle and crashed.

00:12:37And that was a demo.

00:12:38And I was like, okay, we're starting.

00:12:41We're starting with something here.

00:12:43But I could see like the tool calls

00:12:44where it was actually writing files on a system.

00:12:46It was like, oh, that's different.

00:12:47So we're not like opening the file anymore.

00:12:51It's opening the file and doing stuff.

00:12:53And then I review it on the back half.

00:12:55And it wasn't really capable at the time,

00:12:57but then it became very capable later.

00:13:00And all of that definitely had me thinking

00:13:03like how valuable is API design in this new world?

00:13:06Like I spent three years designing APIs at Astro

00:13:10as did the rest of the core team and continues to.

00:13:13But if agents can hold all of these things

00:13:17and understand them, how valuable is the API design?

00:13:20I feel like it's still really valuable for like,

00:13:25can the agent pick up on patterns quickly

00:13:28or is it going to waste a lot of compute

00:13:29trying to look up documentation and running around in circles?

00:13:32I feel like as compute gets cheaper and cheaper,

00:13:34that will be less and less of a problem.

00:13:36I'm not going to pretend like, yeah,

00:13:40perfect API design will always matter, always and forever.

00:13:43Like no, the cost of it will get lower and lower and lower

00:13:46until like you're getting microsecond improvements

00:13:49by improving the API.

00:13:50Right now we're at like,

00:13:52you can cut down a two hour agent job to like 20 minutes

00:13:56or 10 minutes if the API is designed well,

00:13:58which means it's still valuable.

00:14:00And you actually need to think about this stuff.

00:14:01At some point it may change to like,

00:14:03it took a hundred milliseconds,

00:14:05now it takes 20 milliseconds if models get that fast.

00:14:08But at least in this like window we're looking at

00:14:11for next year or two, like good API design still matters.

00:14:14So I do feel like, yeah, the frameworks

00:14:18that we're building these agents on top of,

00:14:21it'll be diminishing returns maybe,

00:14:23but I do think it matters

00:14:26if the agent's able to hold well-made tools

00:14:28versus poorly made tools to get something done.

00:14:31- Yeah, I think it makes sense.

00:14:32If the API is well written enough

00:14:34for the agents to understand,

00:14:35they can navigate through it quickly,

00:14:37make changes quickly, understand it quickly,

00:14:38and therefore be better at helping you build it.

00:14:42- Usually, yeah.

00:14:44- Because any tool essentially that was easier for a human

00:14:46is going to be easier for an agent as well.

00:14:47And I think we'll see.

00:14:49Humans still being in the review process a bit at the moment,

00:14:51it's nice when it's well-written code

00:14:53that you can understand at a quick glance.

00:14:55So tools like that definitely help.

00:14:57- Yeah, totally.

00:14:58I mean, it feels like every tip I see on good prompting

00:15:02is just tips on software engineering.

00:15:03Like it's not even that different.

00:15:05I think there was one recently about like,

00:15:06stop having a big CLAUDE.md that describes your code base,

00:15:09organize your code better.

00:15:10Yeah, that's how we wrote readmes, right?

00:15:14Like it's bad to have a thousand line readme.

00:15:16You should have like a 20 line one

00:15:18where the code mostly explains itself

00:15:20and you document at the touch points

00:15:21and split modules at correct boundaries.

00:15:24Like nothing's changed, really.

00:15:26Like if you have more of a rat's nest,

00:15:28it'll be harder for it to navigate.

00:15:30And right now these agents are still like pretty slow

00:15:33at getting everything done and very compute intensive.

00:15:35So it does matter if you've cleaned up your house

00:15:38before you have visitors,

00:15:39if that makes sense as a metaphor, I don't know.

00:15:41- Yeah, yeah, it makes sense.

00:15:43I was gonna say, going back to Warp.

00:15:45So I was a heavy Warp user for years

00:15:47and I think I was always really impressed

00:15:50at how Warp looked and felt.

00:15:52It was a terminal that felt like an IDE

00:15:55'cause you could see tool tips

00:15:57and the menu was easy to navigate through.

00:15:59You didn't have to do a lot to configure it.

00:16:02And I've certainly moved away from Warp for various reasons,

00:16:05but I think that the premise of it being a terminal

00:16:09that you can use AI with to do your code

00:16:13and even view the code inside the terminal

00:16:15is really impressive.

00:16:16And I haven't messed around with Oz yet,

00:16:18but I like the direction you guys are going in.

00:16:21Can you tell us a bit about that?

00:16:23- Yeah, and I'd be curious to hear why you moved away,

00:16:27if things were not working. - Sure, no, happy to.

00:16:28- I do wanna know. - Happy to talk about it.

00:16:29- It's everything is a user call.

00:16:32But yeah, like general high level stuff.

00:16:36Yeah, you touched on warp sort of helping you

00:16:39with everything that's around the agent harness.

00:16:42So like you can open a terminal and use Claude Code

00:16:44and it works, it can edit code.

00:16:46You can look at the output,

00:16:47you can ask it to run the dev server, all that stuff.

00:16:50But if you wanted to review the code

00:16:52that Claude wrote, for instance,

00:16:53well, you might need to open GitHub Desktop or lazygit,

00:16:56or even the Claude desktop app,

00:16:58like any tool that you would wanna use.

00:17:01And if you wanted to add context on like,

00:17:03here's the file or the directory that I need to put in,

00:17:06you could run some terminal commands

00:17:07to like list out all the stuff inside of the project,

00:17:11find the file name with like the @mention

00:17:14where you say like @file and then you put it in.

00:17:17You can do all of that stuff,

00:17:18but it feels like it's just kind of part of the story.

00:17:20Like Claude Code is an access point

00:17:22that you can use to talk to an agent,

00:17:24but there's still all this stuff around it.

00:17:25Like the code review process, the context gathering process,

00:17:30editing markdown is weirdly very important now.

00:17:32So like editing skill files, opening your AGENTS.md,

00:17:35that stuff kind of matters too.

00:17:37And Warp is just kind of like, can the app do all of that

00:17:40instead of you jumping around a bunch of different tools

00:17:42or installing a bunch of CLI equivalents,

00:17:45like TUI applications.

00:17:46So that's why we started doing things

00:17:50that aren't really what terminals are supposed to do.

00:17:54Like putting a file explorer on the left

00:17:56and putting a code diff view on the right.

00:17:59Feels a lot like VS Code if you have everything open in Warp,

00:18:01but it's all progressive disclosure.

00:18:03Like you can hide it and just use a terminal if you want.

00:18:06But most of my day I have like the file tree pulled up

00:18:09and I have the code diff view that I sort of expand out

00:18:12whenever an agent's done.

00:18:13We let you like edit the code diff as well.

00:18:16We added LSP support.

00:18:17So it has like hover hints and stuff like a real editor.

00:18:21We went hard, but you can like, even at the simple level,

00:18:25I'm just, I want to review the code the agent wrote.

00:18:27You can pop open a diff view inside of your terminal.

00:18:29You can leave comments.

00:18:30So you can like hit a little button

00:18:32to leave a comment on a line and say,

00:18:33this doesn't make sense to me.

00:18:34Can you explain this?

00:18:35Send it to the agent and then it'll just kind of pick up

00:18:37on that comment for you.

00:18:39So it's easier to like have an iteration loop with an agent

00:18:42when you tie sort of the environment to the agent itself.
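The comment-on-a-diff loop can be sketched as turning line-anchored review comments into a single follow-up prompt for the agent. This is a hypothetical illustration, not Warp's implementation or API: the `ReviewComment` shape and `commentsToPrompt` function are assumptions, and how the prompt is delivered (Warp's agent, the Claude Code CLI, and so on) is left out.

```typescript
// Hypothetical shape of a comment left on one line of a diff (not Warp's API).
interface ReviewComment {
  file: string;
  line: number; // line number in the new version of the file
  code: string; // the line the reviewer clicked on
  body: string; // what the reviewer wrote
}

// Turn inline comments into one follow-up prompt, grouped by file (in the order
// files were first commented on) and sorted by line within each file.
function commentsToPrompt(comments: ReviewComment[]): string {
  if (comments.length === 0) return "";
  const byFile = new Map<string, ReviewComment[]>();
  for (const c of comments) {
    const list = byFile.get(c.file) ?? [];
    list.push(c);
    byFile.set(c.file, list);
  }
  const sections: string[] = [];
  byFile.forEach((list, file) => {
    const items = list
      .slice()
      .sort((a, b) => a.line - b.line)
      .map((c) => `- Line ${c.line}: \`${c.code.trim()}\`\n  ${c.body}`);
    sections.push(`${file}\n${items.join("\n")}`);
  });
  return `Please address these review comments on your last change:\n\n${sections.join("\n\n")}`;
}

// Usage: a single comment like the one described above.
console.log(
  commentsToPrompt([
    { file: "src/actions.ts", line: 12, code: "  return data as any;", body: "This doesn't make sense to me. Can you explain this?" },
  ]),
);
```

Quoting the commented line alongside its number gives the agent an anchor even if line numbers shift between edits.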

00:18:45And as I said, like all this stuff

00:18:47is inside of a terminal still.

00:18:49So what we're playing with is like, yeah,

00:18:52we have this diff view, this file view,

00:18:53but you don't have to use the Warp agent

00:18:56to use all of that stuff.

00:18:57You can, and the Warp agent does have like a really nice

00:19:00GUI around it to easily like look at git diffs

00:19:03and things like that.

00:19:04But if you want to use Claude Code

00:19:07and then you wanted a nice diff view and a file tree

00:19:09to drag in context, you can do that in Warp really easily.

00:19:12And we have like a tool belt that lets you open

00:19:14all those menus, a way to enter voice mode,

00:19:17an image uploader.

00:19:19We're also playing with that code comments feature

00:19:21I just talked about where you can like leave comments

00:19:22in a diff view.

00:19:23We want that to forward to the Claude Code CLI

00:19:25or the Codex CLI as well.

00:19:27So we're experimenting with that.

00:19:28So unlike stuff like Cursor where it's all owned by Cursor,

00:19:34like everything end-to-end is owned by that tool.

00:19:37In Warp, we really just own the stuff around the harness.

00:19:40So if you wanted to run any agent like Pi or Codex

00:19:44or any of those, you can, but we still have the diff view

00:19:47and the file view and all that stuff to help you like work

00:19:49with the agent.

00:19:51So it's a really unique spot of being like a terminal plus plus

00:19:54that you can run all these agents inside of it.

00:19:56Then like you get some nice helpers on top

00:19:59without you having to install or configure all that yourself.

00:20:02- You know, I'll be honest, I was similar to Richard

00:20:04and I used to use Warp.

00:20:06I don't think there was any reason I left other than

00:20:08it was a time when there were so many tools coming out

00:20:10that you sort of just hopped between loads of them.

00:20:13And I think obviously one of my original problems

00:20:15was that I still liked VS Code when it had like tab complete

00:20:18and all of that at the time.

00:20:21But I'm finding less and less that I am using an IDE now

00:20:23like VS Code and Cursor.

00:20:25So I definitely need to check out Warp again

00:20:27'cause it sounds like you've added a lot to it

00:20:29that helps the modern day sort of development flow.

00:20:33- I was gonna say, I'm an avid Warp user and I love using it.

00:20:37Yeah, and my question is with everything

00:20:45that's going on right now and everyone's coming up

00:20:48with their CLI tools and everything,

00:20:50do you think TUIs are the way to go,

00:20:53the way of the future, and IDEs will disappear

00:20:56and TUIs will just take over the entire industry?

00:21:00- I mean, no, I mean, they're fun, they're fun to use.

00:21:05I do think like for a near term solution of like,

00:21:09oh, we were given these coding agents and they live in a CLI.

00:21:12What's the easiest way to build tooling around it?

00:21:15More CLIs? Let's just wrap it with tmux.

00:21:19Let's wrap it with, I saw cmux come out recently,

00:21:23which is like a Ghostty extension that gives you like

00:21:26vertical tabs and stuff.

00:21:28This is getting us thinking about like,

00:21:29should Warp have vertical tabs and all that.

00:21:31I hope we do.

00:21:32So people are doing crazy stuff.

00:21:35And TUIs are kind of the quickest interface

00:21:37to just do that without leaving the terminal

00:21:39where you're already living.

00:21:41Warp is the harder path of like,

00:21:43let's actually build a GUI around this.

00:21:45So you need to go deeper into the terminal itself

00:21:48to add all of these tools.

00:21:50So that's the path we're taking.

00:21:52And I think we progressed from bash prompts in the 80s

00:21:57to using clickable interfaces shortly after.

00:22:00I don't really see that being any different now,

00:22:03like the appeal of being able to click and your cursor moves.

00:22:06Yeah, it's pretty useful to be able to move your cursor

00:22:10like that instead of using your keyboard to do it.

00:22:13Of course, keyboard warriors will disagree.

00:22:14But yeah, I think it's intuitive to click around

00:22:17and have like expandable menus and stuff like that.

00:22:21So it's smarter to just use like a GUI

00:22:22instead of using a TUI,

00:22:24like a rendering engine of its own.

00:22:26But I do think that I learned Vim

00:22:32right at the tail end of like when that was worth learning.

00:22:37I think it might still be worth learning today.

00:22:39I learned it a few years ago.

00:22:41And then I got more used to navigating around

00:22:43with the keyboard.

00:22:44I like shortcuts and all that,

00:22:46but like scrolling around a diff view and leaving a comment,

00:22:49I don't want to use a bunch of like one letter shortcuts

00:22:53to do all of that.

00:22:54I would much rather scroll through the diff,

00:22:56click on the line, leave a comment.

00:22:58Like it just kind of makes sense.

00:22:59And I know TUIs can be interactive.

00:23:01Like I have seen ones that use interactive modes

00:23:04where you can click on things,

00:23:06but you can still deal with like rendering flashes

00:23:08when it's like, 'cause it's re-rendering

00:23:10the whole page over and over and over again.

00:23:12That's how TUIs work.

00:23:14So there's a fundamental limit to that.

00:23:17There's also a limit to what you can display.

00:23:18Like you can do gradient animations and stuff like that.

00:23:21But again, it's not like the best rendering engine

00:23:23to do that sort of thing.

00:23:25It's better if you can just like actually use

00:23:27a rendered native GUI to do that.

00:23:30So I feel like the answer is of course,

00:23:33we're going to go to a GUI.

00:23:35Is it going to be an IDE though?

00:23:37Probably not.

00:23:38We're probably not going to have it with like

00:23:39the full code editor where you have to wait for it to index

00:23:43before you can really use the tool.

00:23:45Like all of that waiting doesn't make much sense

00:23:47'cause I just want to talk to an agent right away.

00:23:49That's like the main interface and then everything else,

00:23:53like a diff view or a file view, are supplementary.

00:23:55So to be agent-first, all of that stuff is for debugging.

00:23:59And I feel like that's what Warp's doing.

00:24:01Like we're literally walking in that direction

00:24:03of like the agent's the first thing you see,

00:24:05but the file editor and the diff view are like

00:24:07the second thing that you open up

00:24:09after you're debugging what it's doing.

00:24:12- Yeah, I must admit, I'm a GUI type of person.

00:24:14So I'm a big fan of, yeah, GUIs.

00:24:16I love the sort of Codex app recently

00:24:18that OpenAI came out with.

00:24:20I think now that we're getting into multi-agent,

00:24:23it just makes more sense to me to be a GUI

00:24:25that I can click about in.

00:24:26I've never sort of enjoyed the terminal

00:24:28for more than one or two agents.

00:24:30I know I've never been a keyboard warrior, I must admit.

00:24:33So yeah, I do think we'll go back to GUIs at some point,

00:24:36but yeah, as you said, it probably won't be an IDE.

00:24:39- Yeah.

00:24:40How are you liking the Codex app by the way?

00:24:42Are you using it in like your workflow or just experimenting?

00:24:45- I enjoy it a lot for sort of vibe code experiments

00:24:48at the moment.

00:24:48I must admit, I haven't sort of used it too heavily,

00:24:50but it's just been very nice

00:24:52to have sort of multiple agents open at the same time

00:24:54and seeing them all in the sidebar,

00:24:55what they're doing and sort of clicking around.

00:24:58And the Codex model has been sort of very good.

00:25:01It's sort of my favorite coding one at the moment.

00:25:03It just sort of understands what I mean.

00:25:06And it's sort of hard to describe

00:25:08how it's better than Opus 4.6 sometimes,

00:25:10because obviously everyone has their own favorite model,

00:25:13but it's just sort of the way it feels.

00:25:14And I think it's just, as I said,

00:25:15everything's well integrated in the app

00:25:17and I'm having to check code less and less,

00:25:20which is worrying.

00:25:21Obviously, some of the apps I develop,

00:25:22I don't actually need to check the code

00:25:24'cause they're quick demos.

00:25:25So it's, yeah, I don't need to worry

00:25:27about security or anything.

00:25:29I do still think humans need to check code

00:25:31on production apps.

00:25:33- Yeah, we definitely check our code at work as well.

00:25:36Although we did set up an agent to check our code too.

00:25:39And I think that's a pretty common pattern at this point

00:25:41of like have an agent write the code,

00:25:44have a different agent review all the code,

00:25:46either as a GitHub Action,

00:25:48which you can set up with like the Oz system

00:25:50that we've been building,

00:25:52or just having the agent review its work locally.

00:25:55Like both of those work pretty well.

00:25:57I've actually done that,

00:25:58where I just start a new conversation in the same directory

00:26:01and I have a saved prompt.

00:26:02I could probably make it a skill at this point.

00:26:04It's like just review the code the other agent wrote

00:26:07and make sure it's PR ready,

00:26:09simplify where it makes sense, yada, yada.

00:26:11It knows how to do a code review.

00:26:13- Actually, I'm curious about this.

00:26:16So do you use the same models for reviewing and writing,

00:26:20or do you find that there's a model

00:26:22that's better at reviewing code versus writing it?

00:26:25- Yeah, so I use the same model.

00:26:28We ask this all the time.

00:26:30And everyone thinks they've cracked the code

00:26:32in our like user groups.

00:27:34And someone's like, I only use Claude Opus for plans.

00:26:38And then I switched to Codex to execute.

00:26:40The next person says, I only use Codex to write a plan.

00:26:43Well, I would use Opus.

00:26:44And then I use Opus to execute on the plan,

00:26:46'cause that's the best model for execution.

00:26:48It's like, it really just depends on like the type of code

00:26:52that you like reviewing, I guess.

00:26:54'Cause they do write different flavors of code, I've noticed.

00:26:57Like there are differences.

00:26:59- Yeah, but the reason why I'm asking is,

00:27:01like if you use the same model to review the code

00:27:04that the same model wrote,

00:27:06there's kind of like a bias, you know, in the code quality.

00:27:09- Maybe, maybe, I thought like,

00:27:13as long as you start a new conversation,

00:27:15you don't have the bias of past context, which is good.

00:27:19It's kind of like two people who work the same way

00:27:21reviewing each other's work,

00:27:22versus I guess people who work different ways.

00:27:24So it could make sense to like switch your model

00:27:26before you review something.

00:27:28Is that what you do?

00:27:30- Yeah, that's, well, I haven't like established

00:27:32like a workflow for it, but that's what I've tried.

00:27:35And it's an interesting experiment to see, you know,

00:27:37how differently trained models communicate with each other.
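
A minimal sketch of the review pattern discussed here: the reviewer starts a fresh conversation that carries over only the diff and a saved review prompt, and can optionally run on a different model. The `Conversation` class, the `start_review` function, the prompt wording, and the model names are all hypothetical illustrations. They are not Warp's or Oz's API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical saved prompt, paraphrasing the one described in the conversation.
REVIEW_PROMPT = (
    "Review the code the other agent wrote and make sure it's PR ready. "
    "Simplify where it makes sense."
)


@dataclass
class Conversation:
    """A toy stand-in for an agent conversation: a model name plus messages."""
    model: str
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


def start_review(author: Conversation, diff: str,
                 reviewer_model: Optional[str] = None) -> Conversation:
    """Open a fresh review conversation.

    Only the diff and the saved prompt are carried over, so the reviewer
    does not inherit the author's earlier context. Passing reviewer_model
    switches to a different model for the review; otherwise the author's
    model is reused.
    """
    review = Conversation(model=reviewer_model or author.model)
    review.add("user", f"{REVIEW_PROMPT}\n\n{diff}")
    return review


author = Conversation(model="model-a")  # placeholder model name
author.add("user", "Add a retry flag to the sync command.")
author.add("assistant", "Done, see the diff.")
review = start_review(author, "--- a/sync.py\n+++ b/sync.py", reviewer_model="model-b")
print(review.model, len(review.messages))  # model-b 1
```

The fresh conversation addresses the "bias of past context" point. The optional `reviewer_model` parameter covers the "switch your model before you review" variant.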

00:27:41- I was gonna say, I've always sort of thought,

00:27:43it's a bad UX though, to rely on the user

00:27:45to know what model is best at everything.

00:27:47And I don't know if this might be a hot take,

00:27:49but I think obviously we're very early days at the moment.

00:27:51So I think it's fine.

00:27:53But I do see why tools like Cursor and that lot

00:27:56have the auto mode, because I think for people

00:28:00who aren't online all the time

00:28:02reading about new model updates and everything,

00:28:04like they don't want to be thinking about,

00:28:05oh, I need to use Opus to plan,

00:28:07or I need to go to Codex for the code

00:28:09'cause it's better at that.

00:28:10They just want one thing that does everything for them.

00:28:12So yeah, that was just sort of my random rant.

00:28:14I think that's where the UX might go in the future, essentially.

00:28:18- Yeah, it might.

00:28:20And we have an auto mode as well for that sort of reason.

00:28:22People who don't want to pick anything.

00:28:25We do break down the auto mode based on what you value.

00:28:28So we have cost efficient, responsive, and genius.

00:28:32So like cost effective is what you expect.

00:28:34It might take longer.

00:28:35Usually it routes to like, I think it's an earlier GPT model

00:28:39that's like safer on tokens,

00:28:41but may take longer to complete.

00:28:42Then it's responsive in the middle and genius,

00:28:44which I believe routes to either Opus or Codex 5.3.

00:28:47It always depends which one's actually going to be

00:28:49like better output.

00:28:50But yeah, I think that makes sense for people

00:28:53who don't want to think about the choice.
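
One way to picture the three auto-mode profiles is as a small lookup. Each profile maps to a list of candidate models, and the pick within a profile follows internal eval scores. This is a minimal sketch under that assumption. The model names and scores are placeholders, and it is not how Warp's router is actually implemented.

```python
from enum import Enum


class AutoProfile(Enum):
    COST_EFFICIENT = "cost-efficient"
    RESPONSIVE = "responsive"
    GENIUS = "genius"


# Placeholder candidates per profile (hypothetical names, not real model IDs).
CANDIDATES = {
    AutoProfile.COST_EFFICIENT: ["older-cheaper-model"],
    AutoProfile.RESPONSIVE: ["mid-tier-model"],
    AutoProfile.GENIUS: ["frontier-model-a", "frontier-model-b"],
}


def pick_model(profile: AutoProfile, eval_scores: dict) -> str:
    """Return the candidate with the highest eval score for this profile.

    Models missing from eval_scores count as 0.0; ties keep list order,
    since max() returns the first maximal element.
    """
    return max(CANDIDATES[profile], key=lambda m: eval_scores.get(m, 0.0))


scores = {"frontier-model-a": 0.71, "frontier-model-b": 0.74}
print(pick_model(AutoProfile.GENIUS, scores))          # frontier-model-b
print(pick_model(AutoProfile.COST_EFFICIENT, scores))  # older-cheaper-model
```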

00:28:57I mean, all of us here are terminally online.

00:29:00So we're all interested in a model picker

00:29:04that can let you try these things out.

00:29:06It's another reason I really value

00:29:08like general purpose harnesses,

00:29:11which are popping up and becoming more important.

00:29:14Kirk's is one of them, Warp's one of them.

00:29:16Also tools like Copilot and Pi are examples as well.

00:29:20OpenCode, of course.

00:29:21Because it's useful to be able to try all these things.

00:29:26It's useful to experiment with like who's better

00:29:28at which task for our team.

00:29:30Because again, all these models have like different flavors,

00:29:33but they can still get things done.

00:29:35Like the way I describe it is,

00:29:36Codex is kind of like German engineering.

00:29:40Like it gets everything detail-oriented and exactly right.

00:29:45But as soon as I ask it for like function names

00:29:46and code comments, they're super mechanical

00:29:49and not how I would do things.

00:29:51It also takes longer 'cause it researches forever.

00:29:53And Opus is kind of like the, I don't know,

00:29:56the grad student at Georgia Tech that's up at 2 a.m.,

00:29:59but they're getting things done.

00:30:00They're moving fast, their code's super readable

00:30:03because they actually kind of talk more like a human.

00:30:06That's totally like, people have different opinions on it.

00:30:09But I do think because of how much that's resonated

00:30:12and how many times I've seen that sort of take,

00:30:14I think people are picking models

00:30:16'cause it's someone that they want to work with

00:30:18to anthropomorphize it rather than I'm picking this model

00:30:22because benchmark scores.

00:30:24I think we're kind of past that point.

00:30:25I don't think people are picking models

00:30:27because of benchmarks alone anymore.

00:30:30- You know, I definitely agree with that

00:30:31'cause we cover a lot of model releases on our channel

00:30:33and sort of at a certain point like six months ago,

00:30:36I just flashed the benchmarks up now and I move on

00:30:39because it's like, I don't think you care

00:30:40that it's 1% better than the last model.

00:30:43It's like, have they added any new features?

00:30:45Is it quicker maybe?

00:30:46'Cause I think that's still something people care about

00:30:48is sort of speeding these models up a bit.

00:30:50But yeah, I think benchmarks are a little less obvious,

00:30:54especially now that they clear

00:30:56so many of the easy benchmarks.

00:30:57It's only really difficult benchmarks that seem to matter now

00:31:01and it's sort of hard to explain the benefits

00:31:03of one model over another to people

00:31:05without them just using it for a week

00:31:06and then using another one.

00:31:07It's very hard to pick a favorite model.

00:31:09And as you said, you see so many opinions on Twitter

00:31:11of which one is best for what workflow.

00:31:15So I'm curious how sort of Warp chooses

00:31:17what is best in the auto mode.

00:31:19When a new model comes out,

00:31:20do you have your own sort of suite of benchmarks

00:31:22you run internally to decide

00:31:23if it should be upgraded and things?

00:31:25- Yeah, we do.

00:31:27We have an eval suite.

00:31:28We have a set of benchmarks

00:31:30that are more like industry standard ones

00:31:31like SWE-bench Pro just to validate.

00:31:35We also do have some auto routing in there.

00:31:37So if you ask a very simple question in your terminal,

00:31:39like can you handle this Git rebase for me,

00:31:44which actually might be kind of complicated.

00:31:46If it's a simpler one, like revert this commit,

00:31:48like I forgot the command.

00:31:48Like people use that in Warp all the time

00:31:50just 'cause you can ask in plain English,

00:31:52revert this commit and then it runs some commands.

00:31:55For that, we route to like a simpler model,

00:31:57either Haiku or Sonnet, I believe.

00:31:59And then if it's in planning mode, for example,

00:32:02like if you requested a plan,

00:32:04that should probably go to a smarter model that reasons.

00:32:06If it's like a longer horizon coding task,

00:32:09it'll go to a reasoning model as well.
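
The routing described above can be sketched as a small heuristic. Plan requests and long-horizon tasks go to a reasoning model, and short, command-style requests go to a lighter one. The keyword list and the word-count threshold are assumptions for illustration only. A production router would more likely use a classifier.

```python
# Hypothetical hints that a request is a one-off shell task.
SIMPLE_HINTS = ("revert", "undo", "rename", "delete branch", "what's the command")


def route(request: str, planning: bool = False, long_horizon: bool = False) -> str:
    """Return "light" or "reasoning" for a natural-language request."""
    if planning or long_horizon:
        return "reasoning"
    text = request.lower()
    short = len(text.split()) <= 12  # assumed cut-off for a "simple" ask
    if short and any(hint in text for hint in SIMPLE_HINTS):
        return "light"
    # Anything else (e.g. a tricky rebase) defaults to the stronger model.
    return "reasoning"


print(route("revert this commit"))                     # light
print(route("can you handle this git rebase for me"))  # reasoning
print(route("revert this commit", planning=True))      # reasoning
```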

00:32:12But choosing which reasoning model,

00:32:14yeah, it's tough 'cause it was easy to answer that

00:32:17up until very recently when Codex and Opus

00:32:20became very comparable to each other.

00:32:23So at this point, I do think it's gonna be a combination of,

00:32:27yeah, the benchmarking, but also user feedback.

00:32:29'Cause if you switch out one for another,

00:32:31users will say like, this feels different.

00:32:34It doesn't talk to me the same way.

00:32:35What'd you do?

00:32:36What's in the sauce?

00:32:37This is different.

00:32:38So I think that kind of forces tools

00:32:41to like be a little more consistent.

00:32:43And I have wondered how tools like Amp navigate this

00:32:46'cause I've seen them switch between Gemini

00:32:49and Opus and Codex

00:32:53throughout different eras of their coding harness.

00:32:57And I'm curious how people feel about that

00:32:58'cause it does feel like you're talking

00:32:59to a different person when you make those kinds of switches.

00:33:02For us, we've just kind of kept it consistent

00:33:04because the benchmarks are about the same, and it keeps the feel consistent.

00:33:07Let's use Opus.

00:33:09I believe that's what we've been doing,

00:33:10but there will come a fork where we have to decide like,

00:33:13is the 5% benchmark improvement worth it

00:33:15to have a different voice?

00:33:17I don't know.

00:33:18- I was gonna ask if you can use open source models

00:33:20with Warp, or is that something you can't do?

00:33:23- You can't use like your own models.

00:33:26We have bring your own key.

00:33:28If you wanna use like Anthropic, Gemini, OpenAI,

00:33:31Bedrock, I believe.

00:33:32We do have GLM.

00:33:34That's the extent of it,

00:33:35but we haven't opened it up to like general

00:33:37or local model support.

00:33:39Not that it's not tracked.

00:33:40I know it's like the top voted feature

00:33:41to have like local model support.

00:33:43So it is definitely on the roadmap.

00:33:46We have a quality team that maintains

00:33:49both the benchmarks I was talking about

00:33:50and also new model releases.

00:33:52So yeah, we've heard it, and we wanna get that in there.

00:33:56- Do you have an integration with OpenRouter?

00:33:58- We do not.

00:34:00What would that look like?

00:34:01What would you wanna see there?

00:34:03- 'Cause like with OpenRouter,

00:34:06you can choose whatever model you like

00:34:08and the library is just huge.

00:34:12So that would be, I don't know.

00:34:13If you could get that feature in,

00:34:16that would be amazing.

00:34:18- I suppose the problem becomes that models are so different

00:34:21at making tool calls,

00:34:22and whether they actually work with tool calls,

00:34:24so you'd nearly have to verify every single model,

00:34:26which is obviously an impossible task.

00:34:28'Cause I know you will see,

00:34:29people say Gemini is pretty bad

00:34:31at following tool calling rules and things.

00:34:34- You could whitelist some of the models, you know,

00:34:36don't have to use all of them.

00:34:38- Yeah, whitelist, but allow you to bring that key

00:34:41just so you have that flexibility.

00:34:43That totally makes sense.

00:34:44Yeah.
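
Here is a minimal sketch of the allowlist idea. It starts from a large catalog, such as one an aggregator like OpenRouter might expose, and keeps only the models whose tool calling has been verified. It shows them only when the user has supplied their own key. The `ModelEntry` type and the model IDs are hypothetical, and the code does not call the OpenRouter API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    tool_calls_verified: bool  # assumed to be set after the harness's own tool-call evals


def selectable_models(catalog: list, user_api_key: Optional[str]) -> list:
    """Models a user may pick: verified for tool calls, and only with a key."""
    if not user_api_key:
        return []
    return [m.model_id for m in catalog if m.tool_calls_verified]


catalog = [
    ModelEntry("vendor-a/model-1", tool_calls_verified=True),
    ModelEntry("vendor-b/model-2", tool_calls_verified=False),
    ModelEntry("vendor-c/model-3", tool_calls_verified=True),
]
print(selectable_models(catalog, "sk-example"))  # ['vendor-a/model-1', 'vendor-c/model-3']
print(selectable_models(catalog, None))          # []
```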

00:34:46- But on what you were saying, yeah.

00:34:48I was just gonna say like Codex, for example,

00:34:51took a while to actually get in there.

00:34:53We were a full like three weeks late, which is an eternity.

00:34:57People are just shouting at the door, where's Codex?

00:34:59And it's because Codex is really specific

00:35:02about the tool calls that it wants.

00:35:04I think 5.3 improved that a bit,

00:35:05but we just plopped Codex in our harness

00:35:08and it did not perform well.

00:35:10It just felt like it wasn't updating us.

00:35:12It searched the web for like five minutes

00:35:15when it definitely shouldn't.

00:35:16So we had to figure out how do you tune this harness

00:35:18to actually get the performance

00:35:19that the Codex team is getting out of their CLI.

00:35:21So we did it, we put in the work

00:35:22and we made it like work in our harness,

00:35:24but it wasn't as simple as we just added the model

00:35:27to the list.

00:35:28It sometimes is that simple,

00:35:30but for certain flavors like Codex or Gemini, it's not.

00:35:35- What you mentioned about Warp, that's why I left Warp,

00:35:38I think everyone's kind of answered

00:35:40or alluded to that slightly.

00:35:41But there was a time when I used OpenCode in Warp

00:35:45and I was a fan of OpenCode.

00:35:47And Kimi came out.

00:35:48I was like, oh, Kimi looks like a really cool model.

00:35:51And I tried it in OpenCode and I was thinking,

00:35:53well, why am I using Warp to open OpenCode?

00:35:56Because I was using warp with a subscription

00:35:59and I was using, not Codex,

00:36:01Sonnet and the Claude models with Warp.

00:36:04But then when I started to use other models,

00:36:06so like Kimi and like Qwen, GLM,

00:36:10Warp didn't support them.

00:36:11And so I thought, well, I might as well use

00:36:13a regular terminal if I'm going to be using those models,

00:36:16'cause it's easier for me just to use that

00:36:18than to use OpenCode inside Warp.

00:36:22So yeah, I don't know if that makes any sense.

00:36:24- Yeah, that makes sense.

00:36:25Like you didn't want to reach for the Warp harness anymore.

00:36:27You wanted to use OpenCode.

00:36:28So you had that access and you could use whatever,

00:36:32like Kimi 2.5, I know that one's like a really cool model

00:36:35that they jumped on.

00:36:36And yeah, you were mentioning like pricing,

00:36:40if you're paying for both, that makes a ton of sense.

00:36:42And we hear that feedback.

00:36:44I do know, like, I mean, you can just, you know,

00:36:48use the Warp free version, like the app just kind of works.

00:36:50So if you want to keep using it to run OpenCode,

00:36:53you can just do that.

00:36:54And you still get like the voice mode

00:36:55and the file diff and all that stuff.

00:36:57But if none of it's useful, like you try it out,

00:36:59you're like, I don't reach for this,

00:37:01then it makes total sense.

00:37:03So yeah, I get that.

00:37:05- I think one of the biggest complaints

00:37:07that I've heard you guys get is,

00:37:09why have a terminal with a login page?

00:37:11So I think that's the biggest one I hear.

00:37:13- You don't need to log in.

00:37:15You can use Warp without a login,

00:37:17but it's that first impression that sticks, man.

00:37:20Like people are like, when are you gonna get Windows support?

00:37:22And we've had Windows support since, like, two years ago,

00:37:27but it burst on the scene as, like, the Mac terminal

00:37:30you log into, so I don't know.

00:37:31But I get it, yeah.

00:37:34Like why log in if you're not gonna use like the Warp agent

00:37:37and all that stuff?

00:37:38We do have other things like file storage,

00:37:42if you want to like store commands, store planning documents,

00:37:46I do that just so I don't have to commit the code.

00:37:48You can like put a planning document

00:37:50in what's called Warp Drive to save it.

00:37:53These are small things.

00:37:54It really depends on what you want to do.

00:37:57And maybe Warp with no login, maybe, I don't know.

00:38:00- I suppose it's sort of to get people to stop thinking

00:38:03that Warp is just trying to compete with like Ghostty.

00:38:06It's got a load of other things in it.

00:38:08And yeah, it's sort of a new environment

00:38:10for the agentic world of development.

00:38:13But yeah, I see that people probably got stuck on that opinion

00:38:15when Warp first came out

00:38:16when it was just sort of AI in the terminal,

00:38:18that it's just trying to compete with those.

00:38:20So I guess it's the messaging around that.

00:38:21And yeah, unfortunately, as you said,

00:38:23first impressions do stick, so sorry about that.

00:38:25Hopefully we can change some minds here.

00:38:27- Yeah, and we're honest about like,

00:38:33we don't really compete with Ghostty

00:38:35'cause we don't look at them as the same kind of tool.

00:38:37Like Ghostty is the very lightweight terminal

00:38:40where you're going to stitch on all of your plugins.

00:38:43You're going to build your own little universe

00:38:44inside of there, use TUIs, use things like that.

00:38:47If that's what you value, then you should use Ghostty 100%.

00:38:50Like there's no reason.

00:38:51Warp is like, I don't want to stitch those tools together.

00:38:54I kind of like if you could just bring me like a diff view

00:38:56and a file explorer and a voice input button.

00:38:59It's just kind of there and I don't have to configure it.

00:39:02And as long as I just use the free tier,

00:39:04I can use OpenCode if I want to as well.

00:39:06Like if that's your mentality, it's like,

00:39:08I just want the GUI.

00:39:09Like I don't want to stitch together

00:39:10a bunch of TUI applications,

00:39:12then Warp is that option for you.

00:39:15I look at it kind of like Neovim versus VS Code.

00:39:18It's not as dramatic as that, but it's the same ethos

00:39:22I feel like of where people gravitate

00:39:24and what they end up using.

00:39:26- Are you able to go into what Oz is?

00:39:28Obviously I think that's quite a new release for Warp.

00:39:30And I'm curious sort of what that's doing

00:39:32for cloud agents now.

00:39:34- Yeah, it's interesting.

00:39:36So Oz came out like a couple of weeks back

00:39:39and it is the platform for running agents in the cloud.

00:39:43So we of course have been using agents

00:39:45to build Warp for a while.

00:39:48And some things we ran into were,

00:39:51it's really nice to use agents to author code locally.

00:39:55But as soon as you get to like repetitive tasks

00:39:58or last mile tasks like code review,

00:40:01the agent doesn't follow you to those places.

00:40:03It stays on your machine and that feels a bit limiting.

00:40:07We're also hitting some parallelization issues

00:40:10of I could actually work on multiple backlog tickets

00:40:14or user feedback requests at the same time.

00:40:17And I don't really need to kick these off on my machine

00:40:20and monitor them.

00:40:21Agents have gotten good enough

00:40:22that I could farm out a small feature request

00:40:24or a bug report and feel pretty confident at the end

00:40:27just looking at the code diff.

00:40:29So in those cases, like spinning stuff up locally

00:40:32doesn't make a lot of sense.

00:40:33We want this place where you could just run an agent

00:40:37and trigger it from anywhere.

00:40:39So I mentioned like opening up a pull request.

00:40:43Like there should be a way to just trigger an agent

00:40:44from a GitHub Action, have it review the code just in time.

00:40:48If you're dealing with user feedback in Slack or Linear,

00:40:51there should be a way to just tag an agent

00:40:53and trigger it that way.

00:40:54And then you review the pull request

00:40:56that gets linked on the other side.

00:40:58And then just general purpose stuff.

00:41:01Like we built our own sort of issue triage bot internally

00:41:06that can just go through all the Warp issues on GitHub.

00:41:10And we wanted to build something that was like a TUI.

00:41:13So like a full application to go through all of our issues

00:41:15and just trigger agents from there.

00:41:17So we're like building our own mini Warp

00:41:19that's focused on GitHub.

00:41:21And for that, you need like an SDK or a REST API.

00:41:24So you're like building an app that triggers an agent

00:41:27to do whatever it needs to do inside of that app

00:41:30and then just get updates and display them to the user.

00:41:33So that meant like having this whole surface

00:41:34of like REST API, SDK, Slack and Linear triggers,

00:41:39GitHub Action triggers, all that stuff.

00:41:41And the core of it being you have a sandbox

00:41:44for the agent runs, which is called an environment.

00:41:47So all of that is rolled into what we're calling Oz,

00:41:50which has all of those things that I mentioned.

00:41:54It helps you set up environments to run agents

00:41:56not on your machine.

00:41:57So if I were to be on my phone, the dream scenario

00:42:01of like I get a message from someone,

00:42:03I want to implement this feature

00:42:04and I don't want to go to my computer.

00:42:06I should be able to just kick off an agent

00:42:09inside of that environment I've set up, have it do its work.

00:42:12And then I get a link to the pull request to look at.

00:42:16And then have another coding agent review that code

00:42:18in the pull request so I can like ship changes to that.

00:42:21'Cause why not?

00:42:21So that's kind of what we built towards.

00:42:25So, and the reason we called it Oz,

00:42:27instead of like Warp for the cloud,

00:42:30is, first, to make it its own thing.

00:42:33First impressions are sticky, Warp is the terminal.

00:42:35So we need to have a different name for this concept

00:42:38because it is very unique.

00:42:39And also we want to make it really accessible

00:42:42even if you're not using the Warp terminal.

00:42:44Like we have some niceties to tap into everything going on

00:42:49in these cloud runners from the Warp terminal,

00:42:51but it's just a CLI.

00:42:52Like in Ghostty, I could say like Oz,

00:42:56tell me all of my scheduled jobs

00:42:58and it can run some CLI commands with the coding agent,

00:43:01go look at this cloud environment and then give me an answer.

00:43:04We also have a web UI.

00:43:05So you can go to like oz.warp.dev

00:43:07and you can look at all of your agents that are running again

00:43:10without like opening a terminal at all.

00:43:12So because it was this different thing

00:43:16that didn't really need like the Warp terminal to work,

00:43:20we just kind of made it its own entity

00:43:22that the Warp team uses internally to build everything.

00:43:25So it's very authentic to us.

00:43:26Like we built this because we had very real needs internally.

00:43:30And we also wanted to make it accessible enough

00:43:33that no matter what you're doing,

00:43:34if you're using like OpenCode inside of anything

00:43:38or using the Pi harness,

00:43:40you can still tap into this platform

00:43:42and trigger cloud agents, introspect what's going on

00:43:45and all of that stuff.

00:43:48So that's kind of like a high level.

00:43:50I don't know if there's anything that I could like tap into

00:43:52or explain a little bit more clearly.

00:43:54- I was sort of curious when you say

00:43:57I trigger off the agent in the cloud to do something,

00:43:59how can I still use tools like OpenCode

00:44:02and Claude Code in that agent?

00:44:04Does it just sort of trigger it

00:44:05on the sandbox environment out there?

00:44:08- Yeah, it's a good question.

00:44:09So the way environments work is very flexible

00:44:14because it's just set up as like a Docker file

00:44:18and whatever code you want to clone inside of there.

00:44:21So I've noticed some other tools, like the Codex app

00:44:23and Cursor (that may still be true for Cursor,

00:44:25I need to check in),

00:44:27tap into a single code repository

00:44:30and spin up the environment around that.

00:44:32So it's very like one click, like boom,

00:44:34you have this GitHub repository in the cloud now.

00:44:36But it means if you want an agent to work on a task

00:44:39that touches a bunch of repositories,

00:44:41you can't really do that.

00:44:42Like you would need to trigger a different agent

00:44:44in each repository to do each part of the work.

00:44:48So we made it a lot different.

00:44:49Like we have environments internally

00:44:51that have like four or five repositories cloned inside of it,

00:44:54like the database schema and the server

00:44:56and the client and the docs.

00:44:58Like all of those are in the same environment.

00:45:00So if I ask an agent, I need to make this schema change.

00:45:03It could make PRs across all those repositories all at once.

00:45:06And it just runs an agent generically across this code base

00:45:10in order to accomplish it.

00:45:11The only thing we're helping you with is triggering it

00:45:13to start working inside of the sandbox

00:45:16and also to get artifacts back out.

00:45:18So if it made a pull request,

00:45:20we can actually detect that

00:45:22and show you a link to the pull request

00:45:23instead of you hunting through the agent logs

00:45:26to figure that out.
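
The regex scan below shows roughly how a tool could pull pull request links out of an agent's logs, so the user does not have to hunt for them. It only recognizes github.com pull request URLs. It is an illustrative assumption, not Oz's actual implementation.

```python
import re

# Matches https://github.com/<owner>/<repo>/pull/<number>
PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")


def find_pull_requests(log_text: str) -> list:
    """Return unique PR URLs in the order they first appear in the log."""
    found = []
    for match in PR_URL.finditer(log_text):
        url = match.group(0)
        if url not in found:
            found.append(url)
    return found


log = (
    "pushed branch schema-change\n"
    "opened https://github.com/acme/server/pull/42\n"
    "opened https://github.com/acme/docs/pull/7.\n"
    "reminder: https://github.com/acme/server/pull/42\n"
)
print(find_pull_requests(log))
# ['https://github.com/acme/server/pull/42', 'https://github.com/acme/docs/pull/7']
```

Deduplicating while preserving order fits the multi-repository case above. One environment can open several PRs, and each should be surfaced once.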

00:45:27Now, the question you were saying about,

00:45:29can I run open code with this?

00:45:31The answer is yes, actually,

00:45:33but we want to make it a lot cleaner.

00:45:35So right now, like the turnkey,

00:45:37if you read the docs and you do it,

00:45:39it's going to be using Warp's agent.

00:45:41So you're going to like set up some cloud credits.

00:45:43You're going to use the Warp agent.

00:45:44You can pick whatever model you want

00:45:45and all of your permission models.

00:45:48So you could flip it to any of the models

00:45:50that you would expect us to support.

00:45:51Like if you want to use Opus versus Codex, you can do that.

00:45:55But if you want to switch out the harness,

00:45:56that involves like cloning the CLI

00:46:01into the Docker environment.

00:46:02So you like add a little installation instruction

00:46:04of install Claude Code here or install OpenCode.

00:46:08And then when the environment spins up,

00:46:09now OpenCode's available.

00:46:11And at least today, you could tell the agent,

00:46:13delegate everything to OpenCode

00:46:15and then all the compute runs through that instead.

00:46:17So you can do that.

00:46:19We would like to make it a little bit cleaner though.

00:46:21Or if you just say like my preference is to use OpenCode

00:46:24and I don't even want delegation.

00:46:25Like that's something that we could explore.

00:46:27We've also played with like being able to delegate

00:46:30to multiple harnesses.

00:46:31So like I want the docs to be updated by Claude Code

00:46:35and I want the code to be updated by the Codex harness.

00:46:37Like you can actually do that.

00:46:39And then it spins them both up in parallel,

00:46:40watches the results and then reports the artifact back out.
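
A thread pool gives a rough sketch of delegating parts of a task to different harnesses in parallel and collecting the results. The harness functions here are stubs. In practice they might wrap CLI invocations, for example running Claude Code or the Codex CLI as a subprocess. The harness names, the task strings, and the result format are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_stub_harness(name: str) -> Callable[[str], str]:
    """Return a fake harness; a real one might shell out to a CLI."""
    def run(task: str) -> str:
        return f"{name}: done with '{task}'"
    return run


HARNESSES: Dict[str, Callable[[str], str]] = {
    "docs-harness": make_stub_harness("docs-harness"),
    "code-harness": make_stub_harness("code-harness"),
}


def delegate(assignments: Dict[str, str]) -> Dict[str, str]:
    """Run each task on its assigned harness in parallel; map task -> result."""
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {
            task: pool.submit(HARNESSES[harness], task)
            for task, harness in assignments.items()
        }
        # result() blocks until each harness finishes, like watching the runs.
        return {task: future.result() for task, future in futures.items()}


results = delegate({
    "update the docs": "docs-harness",
    "update the code": "code-harness",
})
for task, outcome in results.items():
    print(outcome)
```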

00:46:43So you can kind of stitch these things together

00:46:45in really creative ways.

00:46:47It's not tied to specifically using like Warp

00:46:50for the full agentic flow end to end.

00:46:53And we do have like open source resources

00:46:56for this right now.

00:46:57But we are trying to make that like even cleaner.

00:46:59So this could be like a general purpose sandbox.

00:47:02And really we're just giving you the tools

00:47:03to set up those environments and inspect

00:47:06and manage everything the agents are doing when they're done.

00:47:09- Oh yeah, that sounds good.

00:47:10That answers my question 'cause I think I'm still

00:47:12in the stage at the moment, as I said,

00:47:14where I bounce between a lot of the tools,

00:47:16testing them out.

00:47:16I mean, mainly 'cause uniquely we make a lot

00:47:19of YouTube videos on these types of things.

00:47:20So I'm installing sort of the newest latest thing

00:47:23that everyone's talking about every week.

00:47:25And I was curious if that would fit into the workflow.

00:47:27- Yeah, totally.

00:47:28And I'll send some stuff along to you

00:47:30because I don't think we've had anyone testing

00:47:32those sorts of flows yet outside of our internal team.

00:47:35So I wanna see that get used.

00:47:38'Cause yeah, there's so many harnesses right now.

00:47:39Like Pi has come on my radar.

00:47:41It's like everyone's talking about this.

00:47:43I'm surprised how quickly that picked up steam.

00:47:47But isn't that like a general purpose harness as well?

00:47:49Like you could bring whatever model you want to it

00:47:51and it just lets you extend it.

00:47:53- I think so.

00:47:54Didn't OpenClaw use a bit of Pi or something?

00:47:57So obviously OpenClaw got trending massively.

00:47:59And then, yeah, I need to check out Pi as well.

00:48:03I've been meaning to for a while.

00:48:04- Yeah, I know it's the harness that's like the default

00:48:08if you use OpenClaw now.

00:48:09And it's the one that Peter,

00:48:11the creator of OpenClaw, uses to work on code.

00:48:14But I know he's a Codex user, so that has me guessing

00:48:17it's very general purpose and pluggable.

00:48:21But we are entering this age of like,

00:48:23I don't know, the harness doesn't matter as much

00:48:26as it used to.

00:48:27You can kind of use whatever harness you want

00:48:29to talk to these things.

00:48:30Like the difference between asking Warp a question

00:48:33through Codex and asking OpenCode a question through Codex

00:48:37is a really small difference.

00:48:39Like we could argue we have a slightly better system prompt

00:48:42and that's why it gets 1% better on benchmarks.

00:48:45But I don't know.

00:48:47I feel like the model matters a lot more than the harness.

00:48:51It really just comes down to like pricing models, honestly,

00:48:55of like why you pick one harness over another.

00:48:57And some of it's the extensibility.

00:48:59I've heard people saying I like Pi

00:49:01'cause I can like change the interface that I'm looking at

00:49:03and like add a diff view here and whatever.

00:49:05So it became this like Lego set for people to build around,

00:49:08which is very cool.

00:49:10- I'm curious if you ever tried OpenClaw

00:49:12when that was trending.

00:49:14- I still haven't set up an OpenClaw box.

00:49:18I still haven't set one up.

00:49:19Oz is kind of like,

00:49:23'cause we were actually playing with internally,

00:49:24could you put OpenClaw in one of these Oz sandboxes

00:49:28and have it run?

00:49:28And the answer is like, yes,

00:49:30but we set these things up to be serverless, quote unquote.

00:49:34Like once it's done with the task,

00:49:35even if it takes hours, it spins down.

00:49:37OpenClaw is supposed to be like this always listening

00:49:40assistant.

00:49:41So it's like, yeah, you could have OpenClaw run

00:49:44for a period of time.

00:49:46But the idea of like having a server

00:49:47I can talk to all the time, I mean, it's cool.

00:49:50Do you use it?

00:49:51And what do you use it for?

00:49:52- I tried it when it was trending, just seeing

00:49:54if I could make it do some of my workflows

00:49:57sort of on its own, like researching on Twitter,

00:49:59scrolling through Twitter and tweets

00:50:02and seeing opinions on things.

00:50:03'Cause yeah, Twitter search API is not great.

00:50:06So I wanted something to replace that

00:50:07and just sort of researching a load of things on the web,

00:50:10running arbitrary scripts that I'd ask it for.

00:50:13But yeah, I used it when it was a massive security nightmare.

00:50:16So I quickly spun it down because I was like,

00:50:18I don't want to deal with any of this anymore.

00:50:20And I have seen on Twitter, obviously a lot of people say,

00:50:24you get advertised that OpenClaw is like this magic solution,

00:50:27but the people who have really good workflows with OpenClaw

00:50:30have done a lot behind the scenes

00:50:31to actually get those workflows working well for them,

00:50:34whether that's prompting, setting up scripts,

00:50:36automations, everything.

00:50:37So yeah, I like it, but I don't think it was the initial

00:50:40like one click install and then it's this magic box

00:50:43that can do everything.

00:50:45It takes a lot more work and sort of overseeing

00:50:47and handholding before you get it perfect.

00:50:50- It's good to know.

00:50:51I mean, that checks out with how all these agentic tools

00:50:54have really worked for me.

00:50:57Like even just working on side projects,

00:50:58I still have to really think hard

00:51:00about how I'm instructing Codex and reviewing its code.

00:51:02It's not as simple as I ask

00:51:04and the app looks exactly how I want it to look.

00:51:07I do need to try it though, OpenClaw specifically.

00:51:12'Cause we do have some people internally that use it a lot.

00:51:15We have more and more Slack bots that are set up

00:51:17for like competitive analysis and all these things.

00:51:22For jobs like that where it's like do a weekly research

00:51:24report on X, not X.com, but could be.

00:51:28It's really nice to use one of these like cloud agent tools

00:51:33like Oz where it just kind of spins up once a week.

00:51:36It computes for like 10 minutes,

00:51:37does a bunch of web searches and then spins down

00:51:39and then gives you a report.

00:51:41For stuff like that, it's really nice.

00:51:44So I think for like proactive agents

00:51:46that like do things for you,

00:51:48I think these sandbox tools are gonna be a lot more relevant.

00:51:51The ones that are always interactive

00:51:55are gonna be a different story.

00:51:57Like the fact that OpenClaw, I can text it at any moment

00:51:59and it would start doing work is kind of like this next step

00:52:03that I haven't even taken.

00:52:04I don't think a lot of developers have taken either.

00:52:06But I see it coming, whatever it's gonna end up being.

00:52:11Unless we figure out the security.

00:52:13- My main selling point of OpenClaw was the fact

00:52:15that you can sort of run it on your own server

00:52:17and have whatever you wanted on there

00:52:18because I just put in like an Ubuntu install.

00:52:21But yeah, I'm not at the point

00:52:22I'd put it on my personal computer yet.

00:52:23That is a security nightmare for now.

00:52:27- Yeah, I don't think it was ever meant to.

00:52:29People were buying Mac Minis for that one, which was crazy.

00:52:32- Considering it.

00:52:32- I was gonna say, back to your point about harnesses

00:52:37and that they didn't make a difference.

00:52:40I think a while back Anthropic made it so

00:52:42that you had to use Claude Code

00:52:44to use your Pro or Max subscription.

00:52:46And I think Warp has always offered Sonnet, Opus

00:52:50and everything else that Claude,

00:52:52the Claude models that Anthropic have.

00:52:54How have you been able to like make money from those models

00:52:57even though they're so expensive to run?

00:53:00- Yeah, it's tough with what's going on

00:53:03with price subsidization and all of that

00:53:05because it's become very apparent

00:53:10that all these model providers are losing a lot of money

00:53:14just with the amount of compute

00:53:15that they're pouring into developers.

00:53:16It's usually like the power users

00:53:19that create the biggest offset.

00:53:22Like I heard it was like power users using $2,000 a month

00:53:25in inference and paying $200, and that's why

00:53:27they were sort of stripping back who can access those plans.

00:53:31I think that makes a ton of sense.

00:53:32And I think people are aware like Warp changed

00:53:36our own pricing model from a deeply subsidized one

00:53:39to one that's sustainable.

00:53:41So at least for us, like we've priced it where

00:53:43there's a bit of advantage to like paying $20 a month

00:53:47instead of just paying for API keys.

00:53:49But it's not at the level of we lose 10x on our users.

00:53:53It's at a level of we can break even,

00:53:56maybe make like a little profit.

00:53:58We're not trying to go crazy.

00:53:59It's a balancing of the scales.

00:54:01But we have just tried to put it in a place

00:54:05where we're not going after something

00:54:08that will just run out in like a year or so

00:54:12while other companies have positioned themselves

00:54:14where they don't do that.

00:54:16So it's really just a difference of what's practical

00:54:19and what you can do.

00:54:20And when you're in that position,

00:54:22like that's why we're really open to people using Claude Code

00:54:26and OpenCode and other things inside of Warp

00:54:28and us making that a really nice experience

00:54:30and focusing more on how can we add features

00:54:33that are valuable if you're doing that,

00:54:34like a diff view or a file explorer.

00:54:37How can we help you put those things in the cloud

00:54:39with Cloud Runners?

00:54:40So we really just become like a helper

00:54:42for environment setup, compute, management, artifacts,

00:54:44all that stuff.

00:54:45And if you want to use Warp's built-in agent,

00:54:48like as I mentioned, the pricing is like

00:54:50a fair, sustainable thing,

00:54:52but it gives you like multi-model support

00:54:54and just being able to ask a question in your terminal

00:54:56without opening a CLI is convenient.

00:54:59So people use it for like certain classes of tasks,

00:55:02even if they don't use it for everything.

00:55:05So we had that balance going

00:55:06that I feel like makes a lot of sense.

00:55:08It's just leaning into what people are using and listening

00:55:11and just making it a little bit nicer.

00:55:13- How does the billing for Oz work,

00:55:15is it based on how long a task takes in the environment?

00:55:18- Yeah, I need to make sure on that.

00:55:20I know that we're doing it with like cloud credits

00:55:22because the blessed path is using Warp's harness

00:55:26to do everything.

00:55:27The like using other coding agents

00:55:30is like a chapter we're exploring

00:55:32and is possible using a Dockerfile.

00:55:35So I feel like we're working out what that model would be

00:55:38if that's kind of like the way people start using it.

00:55:42I want to say it's based on compute,

00:55:44but I'll make sure on whether it's time-based

00:55:47or compute-based.

00:55:49It's whatever makes the most sense.

00:55:52- What kind of things do you have for Warp

00:55:55coming in the future that you can talk about?

00:55:57- Oz is kind of the next chapter that we're playing with,

00:56:01making it nicer to use.

00:56:02I do think multi harness is kind of the future.

00:56:08I think orchestration is the future.

00:56:10We put out a little teaser from Zach's account, our CEO,

00:56:14if you go find it on Twitter,

00:56:16where we're playing with a slash orchestrate command.

00:56:19And it's pretty neat.

00:56:20Like if you do just this magical, it's not really a skill,

00:56:24it's kind of like a prompt.

00:56:26It will ask the agent to work on a task,

00:56:29figure out a delegation plan.

00:56:30So not just like an implementation plan,

00:56:32but actually here's how I would divide it up.

00:56:35Here's who could work on each thing.

00:56:37So it's almost becoming a product manager

00:56:39where you give it a feature it needs to build.

00:56:42It figures out here are the sub-agents I'm going to kick off.

00:56:45And then it creates all those sub-agents

00:56:48and through message passing,

00:56:49it's able to talk to those sub-agents while they work.

00:56:52Kind of like coworkers talking to their manager on Slack

00:56:55as they're doing stuff or even talking to each other.

00:56:57So it's a very early experiment.

00:57:00We want to evaluate like if agents have better success

00:57:05by delegating to sub-agents versus just doing it all itself.

00:57:09Like I think that's a really big divide right now

00:57:12of agent swarms, agent teams, all this stuff.

00:57:14Is that beneficial versus like the quality of output?

00:57:18But this is the first example I've seen in a while

00:57:21that's really like here's a turnkey solution

00:57:24to just have an agent delegate everything.

00:57:27And also here's like standard,

00:57:30well, not really like an open standard or anything yet,

00:57:32but like a standard for message passing

00:57:35where like while a sub-agent's working on something,

00:57:37it can actually message out this thing went wrong

00:57:40and I don't understand this.

00:57:41And the main agent could actually pick that up

00:57:43and say, oh, that makes sense.

00:57:44Let me research.

00:57:45Here's an answer.

00:57:46And it can actually like sort of ping back and forth.

00:57:50It can also like when an agent's done,

00:57:53it can tell the main agent I finished my task.

00:57:56Then the main agent can tell all the other agents,

00:57:58hey, this guy's done with this part of the task.

00:58:00If you want to merge that in or whatever, you can do that.

00:58:03So it's this really weird world

00:58:05of now you don't have a human

00:58:06that's managing agent communication.

00:58:09Now agents are managing their own communication strategies.

00:58:13I don't know how I feel about that.

00:58:14Like how much of us are we going to replace at this point?

00:58:17But if it leads to higher quality

00:58:20by dividing things up that way,

00:58:22which in the workplace it does.

00:58:24So I feel like that could scale down to agents.

00:58:26That's kind of the next chapter

00:58:27that we and a bunch of other people are looking at

00:58:30is how do you orchestrate this stuff?

00:58:32- When you said using multi harness,

00:58:34is that like using various harnesses for one task

00:58:39or using different harnesses entirely?

00:58:43- Oh yeah.

00:58:44So the way that we've set up right now is like,

00:58:47and it doesn't have to work this way,

00:58:50but like each agent is working independently on a task

00:58:54and they could be a separate model,

00:58:55but they're all using the warp harness in our testing

00:58:57that we've done so far.

00:58:59There's no reason it has to be though.

00:59:01Like we've played with, especially with like Oz cloud runners,

00:59:04like delegate this to the Claude Code harness,

00:59:06delegate this to the Codex harness

00:59:08and then give it the same, we could in theory,

00:59:10give it the same like message passing

00:59:12to sort of message back from those harnesses instead.

00:59:16But as I kind of mentioned earlier,

00:59:18I don't expect it to be this huge difference in quality

00:59:21'cause the model is like 90% of the difference.

00:59:24And the harness is kind of like the last 10%

00:59:26that you can experiment with and get some gains.

00:59:28So I feel like that'll be the step two of like,

00:59:30oh, if we mix up harnesses as well as models,

00:59:34does that lead to interesting results?

00:59:37So all of this is like experimental phase.

00:59:40Like can we evaluate this and actually benchmark it

00:59:42and figure out what the best deployment strategy is?

00:59:46- Yeah, I was gonna say,

00:59:47I think it'd be interesting to find out your results

00:59:49from doing orchestration or doing multiple sub-agents

00:59:52because I read an article somewhere from Cognition

00:59:54last year who said they don't recommend having multi-agents

00:59:59purely because the sub-agents don't have the same context

01:00:02as the main agent.

01:00:03And so it won't produce the results in the same way.

01:00:06And so trying to kind of merge all those results

01:00:09in different agents together might not work as well

01:00:11as if you have the same agent doing all the tasks,

01:00:13but it would be good to know what you guys find out.

01:00:15- Yeah, and I feel like that came out

01:00:17in the previous sub-agent architecture,

01:00:20which was very hands-off.

01:00:23It's kind of like if a product manager

01:00:24set up the Linear board,

01:00:26everyone worked on their ticket

01:00:27and never talked to each other ever again

01:00:29until all the PRs were merged.

01:00:31Like, yeah, I would think in a workplace

01:00:33where no one talks to each other,

01:00:35you wouldn't get great results.

01:00:37But because we've added this message-passing ability

01:00:41where people could talk to each other,

01:00:43we've also made sure there's a plan

01:00:45so every agent can see this delegation document

01:00:48that live updates.

01:00:49The main agent can update this document

01:00:51and everyone can go read it.

01:00:53Now we're adding more context passing

01:00:55where they can pass necessary context to each other

01:00:58and also agree on what everyone is working on,

01:01:01which I think is a much different test

01:01:03than the old sub-agent model of delegate and come back

01:01:07with no channel in between.

01:01:09- I was gonna switch the topics a bit

01:01:11and sort of ask how you're just personally coding

01:01:14for side projects.

01:01:15I've seen you've been working on a markdown editor recently,

01:01:17and I wonder sort of what tools do you use?

01:01:20Has it been mostly AI writing that code

01:01:22or sort of you hand-holding it?

01:01:24Or are you going sort of,

01:01:25I know some developers have gone with a manual approach

01:01:27and sort of don't wanna use AI,

01:01:28so they can still have a bit of fun coding?

01:01:31- So you might be surprised.

01:01:33I use Warp to work on this.

01:01:35I know, I know.

01:01:37So, no, I definitely thought about like,

01:01:41should this project be an escape

01:01:43where this is my safe place to just write code manually

01:01:46and just kind of go that way.

01:01:47But quickly I remembered like the reason I didn't work on this

01:01:50is because of the very nitty gritty details

01:01:53of working on text editing and the ProseMirror library,

01:01:56which is notoriously so hard to use.

01:01:59It's really hard.

01:02:01Even though I've used it for a long time,

01:02:03it's still very difficult.

01:02:05I thought, yeah, coding agent will actually push me through

01:02:07and let me focus with like a more balanced approach

01:02:10on design and development.

01:02:12'Cause normally working on side projects,

01:02:14it was like 10% in the design space, which I love.

01:02:16I still like opening Figma and designing everything myself.

01:02:20And then 90% was trying to figure out the implementation.

01:02:23Now it's more 50/50 or even less,

01:02:25which is a much nicer balance for me.

01:02:28So I design a lot of things in Figma still.

01:02:31I know some people just code it all out,

01:02:33never use Figma again.

01:02:34I still like to be able to like draw out the gradients

01:02:37and the shadows and make it feel the way that I want to.

01:02:41But from there, yeah,

01:02:42I delegate everything to Codex right now.

01:02:44I mentioned earlier,

01:02:46like Codex is the more reasoned developer

01:02:49and Opus is the more jump to a solution developer.

01:02:52That's more me.

01:02:54So I want someone that can balance out my crazy.

01:02:56And I feel like Codex is that model of like,

01:02:58this one will actually care about

01:02:59how this thing's architected more than me.

01:03:02So I should probably balance with that.

01:03:05So a lot of it's just Codex tasks.

01:03:06I have like, I don't even use worktrees.

01:03:08I have two clones of the repo and I just hop between them.

01:03:11And I can't really do more than two at the moment

01:03:14just because the project's so early

01:03:16that I can't delegate huge swaths of work

01:03:18without agents stepping on each other.

01:03:20So I just have like two agents that work on stuff.

01:03:23And I switch between the dev servers to look at the output.

01:03:26I have used cloud agents a bit just to like

01:03:30do a research task and then pull it down locally.

01:03:33So in Oz, if you like spin up a cloud agent,

01:03:35it'll like clone that Git repository, do some work.

01:03:38Then there's a fork locally button

01:03:40where I can pull it all down and then resume.

01:03:42So I started doing that a bit for like,

01:03:44I want to research how like the popover API works.

01:03:48So I can create like a nice hover

01:03:51whenever you're over a hyperlink.

01:03:53So I just kicked off a cloud agent, go research that,

01:03:55get the libraries, get an initial implementation.

01:03:57I'll pull it down and get it done.

01:03:59That way I don't have to make another clone

01:04:00or another worktree.

01:04:03It's really what I wouldn't make of it,

01:04:04but that's really been my workflow.

01:04:07And I do dive a lot into the code review,

01:04:10especially because in text editing,

01:04:12Codex isn't, it's not a solved problem.

01:04:15Like Codex still kind of struggles

01:04:16and messes things up in ProseMirror.

01:04:18And I have to ask it questions about like,

01:04:21why did you make this choice?

01:04:22What is the limitation we're working around here?

01:04:25Because this is another prompting tip.

01:04:29Don't tell it, why are you so dumb?

01:04:31I know best.

01:04:32Just ask it, why did you make that choice?

01:04:34And then it will tell you,

01:04:35oh, it's because of this library that I read about,

01:04:38or there's this edge case elsewhere in the code base

01:04:40that I had to work around.

01:04:41Like you have to pull information out of a model

01:04:43to actually figure out what it's doing wrong.

01:04:46That's the strategy I use of like staying in the loop,

01:04:48ask it questions if things look weird,

01:04:51otherwise ask it to like review its code

01:04:53and merge it automatically.

01:04:55And I just push everything to the main line when it's done,

01:04:57because it's just purely local for me right now.

01:05:01Though it's not really like a team level strategy.

01:05:05It's more of like a solo strategy.

01:05:07Push things to main, clone the repo a couple of times,

01:05:10use one coding model, maybe use a cloud sandbox

01:05:13if you wanna do something off of your machine,

01:05:15but only if you need to, and that's mostly it.

01:05:19So I'm also live streaming this process by the way,

01:05:23on Twitch Tuesday mornings,

01:05:25just to be a little bit in public

01:05:27while I'm working on this stuff.

01:05:29- I'm gonna ask you a question that I ask

01:05:31most guests when they come on, but do you have any hot takes?

01:05:34- I feel like I've already said some.

01:05:38I don't know.

01:05:42I do think worktrees are a bit overrated,

01:05:44but I think it's because I haven't tried them enough

01:05:46or I had a bad experience with them

01:05:48and I need to try it I guess.

01:05:49Kind of cold take is review your code.

01:05:54These agents aren't good enough to just like merge right away

01:05:57and you will feel the pain later.

01:06:00- What about like CodeRabbit and Greptile?

01:06:03Do you not trust those to review code?

01:06:06- 'Cause it's a funny relationship

01:06:09of like why do we also need a code review on the backend

01:06:13if the coding agent already wrote the code?

01:06:15And I mean, it's the same thing.

01:06:21I hate comparing agents to humans so much,

01:06:25but it is trained on how we work.

01:06:27So it is gonna do some similar stuff.

01:06:30And for me, I only catch issues when I've actually,

01:06:34when I'm about to hit the button

01:06:36of ask this person for review.

01:06:38Like before that point, my code, I'm like, this is fine.

01:06:41Then as soon as I'm about to hit the button

01:06:42of I need to request a review from this senior engineer,

01:06:45I'm like, maybe I should look at it again.

01:06:48Maybe just like once.

01:06:49And then I catch things all the time.

01:06:51'Cause I feel like these agents are modeled

01:06:54to like do the bare minimum to accomplish the task

01:06:57with just enough quality to say mission accomplished.

01:07:01Like that's how they're kind of trained to do things.

01:07:03But that doesn't include,

01:07:05did I simplify all the stacks surrounding it?

01:07:07Did I look for potential for abstractions

01:07:09that are outside of all the code that I researched?

01:07:12Like it doesn't do all of that stuff because if it did,

01:07:16then the iteration loop would be a lot longer.

01:07:18And so I'm sure some of the way these models are trained is,

01:07:22it's going to be a lot more obnoxious to use

01:07:24if we bake in all of this self-reflection

01:07:27and self-correction into it.

01:07:30And so they don't.

01:07:31And we need to have tools like code review on the backend

01:07:34or skills to simplify your code

01:07:36in order to compensate for that.

01:07:38So it feels like, yeah,

01:07:40we need code review on the other side

01:07:41in order to ensure that.

01:07:43Now, what does that mean for,

01:07:47like do humans need to review code in the longterm?

01:07:50I feel like there might come a time when we don't,

01:07:55and I don't know what that's going to look like yet.

01:07:57I don't know if that means like we need different models

01:08:00as we were talking about earlier.

01:08:01Like we need a model that thinks differently

01:08:02to review the code so that we get a better mix of opinions.

01:08:05Maybe that's part of it.

01:08:06Maybe agents are able to self-merge their PR

01:08:10if another agent reviewed and they address the comments.

01:08:13Maybe, I don't know.

01:08:15It feels wrong, obviously,

01:08:17because that's the final stand that a human has right now

01:08:22is at least I'm involved at the gate of letting things in.

01:08:26So I feel like it makes sense why the code review is required

01:08:30because it's not trained to do all this stuff right away.

01:08:33It can be accomplished either through code review

01:08:35or like I saw Claude Code

01:08:37put out a /simplify command recently.

01:08:40I've been cooking up my own as a skill.

01:08:42It's really nice to have that sort of thing.

01:08:43Both of those are the same way to address the problem.

01:08:46Have it do multiple passes with a lens for get it working

01:08:50and then a lens for get it right.

01:08:51And it's really just a matter of training it in more.

01:08:56Maybe it's like a hook.

01:08:57Maybe it's baked straight into the post-training

01:08:59for certain reasoning models.

01:09:01Like maybe there's extra high reasoning

01:09:03that explicitly does that.

01:09:05But yeah, it is kind of funny

01:09:08'cause you were mentioning like CodeRabbit and Greptile.

01:09:11Like we built our own with Oz as well

01:09:13where it just triggers a cloud agent as a GitHub action

01:09:16and reviews the code.

01:09:17And we open source the skill for that.

01:09:19If you wanna apply it to Copilot or apply it to Oz,

01:09:21you can do whatever you want.

01:09:23You've kind of hit a point where it's like,

01:09:25yeah, code review is just one of countless places

01:09:28an agent could review the code.

01:09:30I feel like you could do it in GitHub.

01:09:34You could do it locally.

01:09:35You could do it wherever.

01:09:36And it's really just a matter

01:09:38of making a code review process that you like.

01:09:40And I do think CodeRabbit and Greptile like super focus

01:09:44on this problem, which is so cool.

01:09:46But I do also think we're in a world now

01:09:48where you could build your own and it could run locally

01:09:51or it could run in the cloud.

01:09:53And you can build it to exactly your preferences

01:09:55at this point 'cause the models are so good.

01:09:57That's why we're leaning into like,

01:09:58yeah, we have a code review example,

01:10:02but you could also apply this anywhere else in the stack

01:10:05or write your own.

01:10:06Like at this point it doesn't really matter

01:10:08'cause the models are good enough

01:10:09that you could just do that.

01:10:11- Should we give Ben a chance

01:10:13to plug something if he wants?

01:10:15- Oh yeah.

01:10:16So what do you want to plug, Ben?

01:10:19- What do I want to plug?

01:10:21Oz.dam, go over there cloud runners, all that stuff.

01:10:24Personal side, we were talking about like content

01:10:28and whiteboard videos, trying to be as open as possible

01:10:32with all the strategies to use agents

01:10:34to write code more effectively

01:10:36or just to be a good software developer in general.

01:10:38So if you're part of that community,

01:10:40I'm around Twitter, Blue Sky, YouTube,

01:10:44and a number of other places at bholmesdev.

01:10:47I'm sure there's a link for that.

01:10:48Holmes as in Sherlock Holmes.

01:10:50So if you ever want to come find me,

01:10:52ask follow up questions from this or anything else,

01:10:55I'm around.

01:10:56- Cool.

01:10:56I think you mentioned on the podcast,

01:10:57sorry, I was meant to wrap up,

01:10:59something about Dr. Who jokes, you're a Dr. Who fan?

01:11:02- Did I mention Dr. Who jokes?

01:11:04I mean, I am.

01:11:05Definitely back in high school, that was the big era.

01:11:08David Tennant era, Matt Smith era.

01:11:11That's what I grew up on.

01:11:14So yes.

01:11:16- Sure, but not anymore.

01:11:17- Yeah, I haven't tapped back in recently.

01:11:20I don't know.

01:11:21- Thanks for listening to this episode

01:11:22of the Better Stack Podcast.

01:11:24Find us wherever you listen to your podcast.

01:11:26So Apple Podcasts, Spotify, we're there.

01:11:29And from me, it's goodbye.

01:11:31- Goodbye from me.

01:11:32- Goodbye from me.

01:11:33- Goodbye from me.

Key Takeaway

As AI agents become the primary authors of code, developer tools are evolving into agent-first GUI environments that prioritize orchestration, cloud-based sandboxing, and intuitive human-in-the-loop review processes.

Highlights

Ben Holmes discusses his transition from Astro to Warp and how his background in whiteboarding shorts shaped his developer relations career.

The debate between GUIs and TUIs in the AI era, with Ben arguing that native GUIs provide better interaction for complex tasks like code reviews.

Warp's new cloud platform, Oz, allows developers to trigger agents in sandboxed environments for parallelization and repetitive tasks like issue triage.

API design remains critical for AI agents because well-structured patterns reduce compute time and help agents navigate codebases more efficiently.

The concept of agent orchestration where a primary agent acts as a project manager to delegate sub-tasks to specialized sub-agents via message passing.

Future developer tools are shifting from IDE-centric workflows to 'agent-first' interfaces where editors and diff views become secondary debugging tools.

Model selection is becoming more about 'personality' and 'working style' rather than just chasing 1% improvements in benchmark scores.

Timeline

Introduction and the Art of Developer Content

The episode begins with Ben Holmes, the Developer Relations Lead at Warp, sharing his unique journey into the industry through pandemic-era content creation. He explains how using a whiteboard and a Rock Band microphone to explain JavaScript bundlers led him to a career maintaining the Astro framework. Ben emphasizes that there is a significant demand for 'intermediate' developer content that is visually engaging and focuses on problem-solving. This early experience taught him self-editing skills and how to identify what the developer community truly cares about in a fast-paced ecosystem. He highlights that being generally useful is the best way to build a reputation and get noticed by industry peers.

React Server Components and the Importance of Clear Boundaries

Ben dives into a technical discussion regarding React Server Components (RSC) and why they haven't been the universal solution many expected. He notes that the blurring of lines between server and client execution can lead to confusion for human developers who forget the implications of each environment. During his time at Astro, he advocated for clear boundaries through projects like Astro Actions and Server Islands, which provide type-safe data fetching without a complex runtime. These 'simple solutions' prioritize legibility and reviewability, making it easier for developers to deploy code anywhere without worrying about specific host dependencies. This section underscores the philosophy that explicit file organization is often superior to magic abstractions that hide the underlying execution logic.

The Impact of AI Agents on API Design and Code Quality

The conversation shifts to the 'existential crisis' triggered by the rapid advancement of AI models like Opus 4.5 and Sonnet 3.5. Ben argues that while agents can now handle more complex coding tasks, good API design is still vital because it reduces the compute cost and time an agent spends 'running in circles.' If an API is intuitive for a human, it is typically easier for an agent to navigate, leading to faster iterations and fewer hallucinations. He compares the importance of a clean codebase to 'cleaning your house before visitors arrive,' as agents struggle with disorganized 'rat's nests' of code. Ultimately, modern prompting strategies are essentially just good software engineering principles applied to a new interface.

Warp: Enhancing the Terminal for an Agentic World

Ben explains the vision behind Warp, describing it as a 'terminal plus plus' that integrates modern features like file explorers and diff views directly into the CLI. Unlike traditional terminals, Warp provides a native GUI that allows for progressive disclosure, meaning users can hide advanced tools until they need to review agent-written code. A key feature discussed is the ability to leave comments on specific lines within a diff view to trigger iterative loops with an AI agent. This environment allows developers to use various agent harnesses, such as Claude Code or Open Code, while benefiting from Warp's specialized UI. Ben notes that Warp focuses on the 'stuff around the harness' to make the transition from static to agent-driven development seamless.

Why GUIs Will Outlast TUIs for Complex AI Workflows

In a bold take on the future of developer tools, Ben suggests that while Text User Interfaces (TUIs) are trendy, native Graphical User Interfaces (GUIs) will eventually dominate for AI-driven tasks. He points out that the ability to click, scroll, and use expandable menus is more intuitive for reviewing deep changes than memorizing complex keyboard shortcuts. While 'keyboard warriors' may disagree, GUIs avoid the rendering flashes and limitations of terminal-based engines when displaying rich animations or complex data. The hosts discuss the new OpenAI Codex app as an example of a GUI that makes multi-agent interaction much more manageable than a standard terminal prompt. Ben believes we are moving toward an 'agent-first' world where the IDE as we know it might disappear in favor of debugging-focused GUIs.

Model Selection: Benchmarks vs. Working Style

The discussion moves to how developers choose between top-tier models like Claude Opus and OpenAI's GPT-based Codex. Ben observes that users are beginning to pick models based on their perceived 'personality' or 'voice' rather than on benchmark scores alone. He characterizes one model as a meticulous 'German engineer' and another as a 'grad student at 2 AM' who moves fast and writes human-readable code. Warp addresses this by offering an 'Auto Mode' with settings like 'Cost Efficient,' 'Responsive,' and 'Genius' for users who don't want to track every model update. This section highlights that benchmarks are reaching a point of diminishing returns, making user experience and integration the new competitive frontiers.

Local vs. Cloud Agents and the Launch of Oz

The speakers discuss the limitations of local agents and why Warp is leaning into cloud-based execution through its new platform, Oz. Many developers find it frustrating to manage multiple API keys or to wait for local machines to process heavy inference tasks, especially with open-source models. Ben addresses the common complaint about Warp's login requirement, explaining that it is necessary for features like 'Warp Drive' command storage and cloud orchestration. He clarifies that Warp doesn't aim to compete with lightweight terminals like Ghostty. Instead, it provides a pre-configured environment for those who don't want to stitch together their own toolchain. The segment positions Warp as a collaborative, cloud-integrated workspace rather than a simple terminal emulator.

Cloud Orchestration and Agent Swarms

Ben provides a deep dive into 'Oz,' Warp's platform for running agents in the cloud to solve the 'last mile' problem of software development. Oz allows tasks to be parallelized, for example by running agents across multiple repositories simultaneously to make sweeping schema changes. A major innovation mentioned is 'message passing,' where a lead agent orchestrates sub-agents, receives updates on their progress, and resolves conflicts between them. This setup mirrors a human workplace where a project manager coordinates a team, and it leads to higher-quality output than a single agent working in isolation. The cloud-first approach also lets developers trigger tasks from their phones and receive a link to the finished pull request later.

Security, Sustainable Pricing, and the Future of AI Coding

The dialogue touches on the security risks of autonomous agents, particularly trending tools like OpenClaw, which can be a 'security nightmare' if not properly sandboxed. Ben explains how Warp moved to a sustainable pricing model to avoid the massive losses that come from subsidizing power users' compute. They also discuss 'slash orchestrate,' a planned command that will let the agent generate its own delegation plan before beginning work. This proactive strategy focuses on 'getting it right' through multiple passes rather than just 'getting it done' in one go. Ben suggests that the 'harness' itself will eventually matter less than the quality of the underlying model and the orchestration strategy employed.

Personal Side Projects and Final Thoughts

In the closing segment, Ben shares insights into his personal workflow, including his work on a markdown editor built on the notoriously difficult ProseMirror library. He splits his work roughly 50/50 between AI-assisted and manual coding, often designing in Figma first and then delegating the implementation to the Codex model. He offers a 'cold take' for listeners: always review your code, because agents are currently trained to do the bare minimum needed to reach 'mission accomplished.' The episode wraps up with Ben's recommendations for following his live streams and his reflections on the importance of self-correction in the development process. He signs off with a brief nod to his love of Doctor Who, ending the podcast on a personal and friendly note.
