00:00:00(upbeat music)
00:00:02- Welcome to the Future of AI Coding panel.
00:00:04Thank you for reading the memo
00:00:05that you have to wear all black.
00:00:07(laughing)
00:00:09Okay, so I do want to cover a little bit of introductions.
00:00:12I know each of you in different ways,
00:00:15but maybe the audience doesn't quite know you yet.
00:00:17Matan, why don't you go first?
00:00:19What is Factory's position
00:00:24in the broader world of AI coding?
00:00:26- Yeah, so at Factory,
00:00:28our mission is to bring autonomy to software engineering.
00:00:32And what that means more concretely,
00:00:34we have built end-to-end software development agents
00:00:37called droids.
00:00:38They don't just focus on the coding itself,
00:00:40but really the entire end-to-end
00:00:42software development lifecycle.
00:00:43So things like documentation, testing, review,
00:00:48kind of all the ugly parts so that you can also do
00:00:51the more fun parts like the coding itself.
00:00:52And for the parts of the coding you don't want to do,
00:00:54you can also have the droids do that.
00:00:56So you build droids.
00:00:58You build droids.
00:00:59And OpenAI obviously needs no introduction,
00:01:02but your role on the Codex team,
00:01:05I saw you pop up on the Codex video.
00:01:08That's how I knew it was you working on it.
00:01:10But how do you think about Codex these days
00:01:13since it's expanded a lot?
00:01:14- Yeah, so earlier this year,
00:01:16we launched our first coding agent.
00:01:19I worked on Codex CLI,
00:01:21bringing the power of our reasoning models
00:01:23into people's computers.
00:01:26Then we released Codex cloud, where you could actually
00:01:28distribute and delegate those tasks to work in the cloud.
00:01:31And over the last several months,
00:01:33we've been unifying these experiences.
00:01:34So they work as seamlessly as possible.
00:01:36So a lot of our focus is around how do we make
00:01:38the fundamentals, the primitives as useful as possible.
00:01:41We just released the Codex SDK at Dev Day.
00:01:43So I think one of the key directions we've been seeing
00:01:46is not just using coding or code executing agents for coding,
00:01:50but also for general purpose tasks.
00:01:52And so whether it was ChatGPT agent,
00:01:54which I worked on earlier this year,
00:01:55that actually executes code in the background
00:01:57to accomplish some tasks,
00:01:59but starting to enable our developers to build on top of
00:02:02not just the reasoning models,
00:02:04but also things like sandboxing
00:02:05and all the other primitives that we built into Codex.
00:02:07- Awesome.
00:02:09v0?
00:02:09- Yeah, the goal of v0 is to enable developers
00:02:14to do preview driven agentic programming.
00:02:16So today when you build web apps,
00:02:19you probably have an agent open,
00:02:21your IDE open, so some kind of code,
00:02:23and then a preview of what you're actually building.
00:02:25Usually you're running dev server.
00:02:26With v0, our goal is to allow you to just have
00:02:28an agent running and directly prompt against your running app.
00:02:32And that's how we think the future of DX is gonna pan out.
00:02:35- Okay, awesome.
00:02:36And everyone has different surface areas
00:02:38in which to access your coding agents.
00:02:40So I think one of the things we kinda wanna kick off with is
00:02:43how important is local versus cloud?
00:02:45You started local and added cloud,
00:02:47you started cloud and added local, you're cloud-only for now.
00:02:50What's the split?
00:02:52Is everyone just gonna merge eventually?
00:02:55- Yeah, so maybe I can start there.
00:02:58So I think at the end of the day,
00:02:59the point of these agents is that
00:03:02they are as helpful as possible
00:03:04and they have a very similar silhouette
00:03:06to that of a human that you might work with.
00:03:08And you don't have local humans and remote humans
00:03:11that are like somehow, you know,
00:03:13this one only works in this environment,
00:03:15this one only works in that environment.
00:03:16Generally, humans can be helpful
00:03:18whether you're in a meeting with them
00:03:19and you come up with an idea
00:03:20or you're sitting like shoulder to shoulder at a computer.
00:03:24So I guess asymptotically, these need to become the same,
00:03:28but I think in the short term,
00:03:29what we're seeing is that remote is typically
00:03:34more useful for smaller tasks that you're more confident
00:03:37that you can delegate reliably.
00:03:39Whereas local is when you wanna be
00:03:41a little bit closer to the agent,
00:03:43it's maybe some larger task or some more complicated task
00:03:46that you're gonna kind of actively be monitoring.
00:03:49And you want it to be local so that if something goes wrong,
00:03:52you don't need to pull that branch back down
00:03:54and then start working on it,
00:03:55but instead you're right there to guide it.
00:03:57- Yeah, maybe I'm just greedy, but I want both.
00:04:00And I think having a modality to Matan's point
00:04:04where I like to think about what are the primary forms
00:04:07of collaboration that I'm used to
00:04:08and I enjoy with my coworkers.
00:04:11Often that starts something like a whiteboarding session
00:04:13and maybe we're just like jamming on something in a room.
00:04:17When we were building, I think a good example
00:04:19was AGENTS.md, which is our custom instructions file,
00:04:23intended to be generic across different coding agents.
00:04:26The way that it started was Romain and I
00:04:28were just in a room coming up with this idea.
00:04:31Then we just started whiteboarding and then took a photo
00:04:33and then kicked it off in Codex CLI locally,
00:04:36just like a workshop Next.js app that we could work on,
00:04:40went to lunch, came back.
00:04:41It had a good amount of the kind of core structure.
00:04:44And then from there, we were able to iterate
00:04:45a little bit more closely.
00:04:46So having that kind of pairing
00:04:48and kind of brainstorm style experience.
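For readers unfamiliar with AGENTS.md: it is a plain Markdown file with no fixed schema, so the sketch below is purely illustrative; the project details and commands are invented for this example, not taken from the panel.

```markdown
# AGENTS.md (illustrative example)

## Project overview
A Next.js app with a TypeScript API layer.

## Commands
- `pnpm install` to set up dependencies
- `pnpm test` to run the test suite before committing

## Conventions
- Prefer server components; avoid client state unless needed
- Run the linter and fix warnings before opening a PR
```

Agents that support the convention read this file at the start of a task and treat its contents as standing instructions, which is what lets a whiteboard photo plus a short prompt produce a reasonable first draft.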
00:04:49And then I think to that second point
00:04:51about what kind of tasks you delegate to,
00:04:54I think historically smaller, narrowly scoped tasks
00:04:57where you're very clear about what the output is,
00:05:00is kind of the right modality
00:05:01if you're doing a fire and forget.
00:05:02But I think what we're starting to see with,
00:05:04we just launched GPT-5-Codex about two months ago now.
00:05:08And I think one of the main differences
00:05:09is that it can actually do these longer running,
00:05:11more complex, more ambiguous tasks,
00:05:14as long as you are clear about what you want by the end.
00:05:16So it can work for hours at a time.
00:05:18I think that shift as models increase in capability
00:05:21will start to enable more kind of use cases.
00:05:24- Yeah.
00:05:24Yeah, I think there are three parts of making an agent work.
00:05:27There's the actual agent loop,
00:05:29there are the tool calls it makes,
00:05:30and then the resources upon which the tool calls need to act.
00:05:33Whether you go cloud or local first
00:05:35is based on where those resources are, right?
00:05:37If you're trying to work on a local file system,
00:05:39those are the resources you need to access.
00:05:41It totally makes sense
00:05:42that your agent loop should run locally, right?
00:05:44If you're accessing resources that typically exist in the cloud
00:05:46you're pulling from GitHub,
00:05:47directly from like third party repo of some kind,
00:05:51then it makes sense for your agent
00:05:52to start off in the cloud, right?
00:05:54Ultimately though, these resources exist in both places, right?
00:05:57Every developer expects an agent to be able to work
00:06:00both on the local file system,
00:06:02as well as on an open PR that might be hosted on GitHub.
00:06:04And so it doesn't really matter where you start, I think,
00:06:07everyone is converging at the same place,
00:06:08which is that your agent loop needs to be able to run anywhere,
00:06:11your tool calls need to be able to be streamed
00:06:13from the cloud locally or from a local backup to the cloud.
00:06:16And then it all depends on where the resources
00:06:18you actually want to act on are located.
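The three-part breakdown above (agent loop, tool calls, resources) can be sketched in code. This is a toy illustration, not any panelist's implementation; all names are invented, and a real loop would let the model choose tools rather than hardcoding a single read.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """A file-like resource; could live locally or in the cloud."""
    name: str
    content: str

class ResourceStore:
    """Abstracts *where* resources live (local FS, GitHub, etc.)."""
    def __init__(self, resources):
        self._resources = {r.name: r for r in resources}

    def read(self, name):
        return self._resources[name].content

    def write(self, name, content):
        self._resources[name] = Resource(name, content)

def run_agent_loop(task, store, max_steps=3):
    """Toy loop: observe via a tool call, log it, stop when done.
    The loop never cares where the store's resources actually live."""
    log = []
    for step in range(max_steps):
        # A real agent would let the model pick the tool; we hardcode one.
        observation = store.read("README.md")
        log.append((step, "read", observation))
        if task in observation:
            break
    return log

local_store = ResourceStore([Resource("README.md", "fix the build")])
trace = run_agent_loop("fix the build", local_store)
```

Swapping in a GitHub-backed `ResourceStore` would leave `run_agent_loop` unchanged, which is the convergence point being described: the loop runs anywhere, and only the resource backend differs.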
00:06:20- Yeah, awesome.
00:06:22Okay, so we were chatting off stage
00:06:24and we were casting around for spicy questions and stuff.
00:06:27So I really liked this one and I think it's very topical.
00:06:31Do you guys generate slop for a living?
00:06:33Like are we in danger of potentially being in a hype bubble
00:06:40where we believe that this is like a sustainable path to AGI?
00:06:44- I mean, I think to start, you could say that one man's slop
00:06:48is another man's treasure, which to some extent might be true.
00:06:52Like, you know, if for example, you have, I don't know,
00:06:56like let's suppose you had a repo
00:06:58that had no documentation whatsoever.
00:07:00You could use, you know, many of the tools
00:07:04that we've been talking about to go and generate
00:07:06documentation for this repo.
00:07:08Now, is it gonna be the most like finely crafted
00:07:12piece of documentation?
00:07:13No, but is it providing alpha?
00:07:16Yes, in my mind, because having to like sift through
00:07:19some super old legacy code base that has no docs
00:07:22is a lot harder than looking through
00:07:23some somewhat sloppified documentation.
00:07:26And so I think the big thing is it's figuring out
00:07:29where you can use these tools for leverage
00:07:32and the degree to which it's slop,
00:07:35I think also kind of depends on how much guidance you provide.
00:07:38So if you just say like, build me an app that does this,
00:07:40like you're probably gonna get some generic slop app
00:07:43that does--
00:07:44- It's purple.
00:07:44- Yeah, blue, purple like fade, yeah.
00:07:48Whereas if instead you're like very methodical
00:07:50about exactly what it is that you want,
00:07:52you provided the tools to actually run tests
00:07:54to verify some of the capabilities that you're requesting.
00:07:58I think that makes it much more structured
00:08:00to a similar extent that if you were to, you know,
00:08:03hire some junior engineer onto your team
00:08:06and you just say, hey, go do this.
00:08:08Like they're probably gonna yield some like median outcome
00:08:11because they have no other specification to go off of.
00:08:14And it's pretty ambiguous like what you actually want done.
00:08:19- I think the key word there is leverage, right?
00:08:21Like what AI coding agents allow you to do
00:08:23is do 10X more than you would be able to do yourself
00:08:25with a pretty high floor, right?
00:08:27So if you plot skill level against how useful an agent is
00:08:30or how likely it is, you know,
00:08:31how useful it actually is in generating non-slop,
00:08:33there's probably a like pretty low floor
00:08:35if you have no skill.
00:08:36You have a pretty high floor still, right?
00:08:38Agents are pretty good just out of the box.
00:08:39If you don't know anything about development,
00:08:41the agent is gonna do much more than you could possibly do.
00:08:44But as you get to higher and higher skill levels,
00:08:46senior and principal and distinguished engineers
00:08:48actually use agents differently.
00:08:50They're using it to level up
00:08:51the things they could already do.
00:08:53You know, a principal engineer might be able to
00:08:55manually write 5,000 lines of code a day.
00:08:57With agents, they can write like 50,000 lines of code a day.
00:09:00And it really operates at the level of quality of the inputs
00:09:03and the knowledge that you put in there.
00:09:04So I think we're, you know, slowly raising the floor
00:09:07over time by, you know, building better agents.
00:09:11But I do think it's a form of leverage.
00:09:14It's a way for you to accelerate
00:09:16the kinds of things you can already do, do them faster.
00:09:18And for folks who don't have skills, you know,
00:09:20that's when you can actually really raise the floor
00:09:22of what they can do.
00:09:23- Absolutely, and just to add on to both of these points,
00:09:26I think they're tools and amplifiers of craft.
00:09:29If you have it, you can do more of it.
00:09:31If you don't, it is just harder,
00:09:32but it does raise the floor.
00:09:34I think that's really worth calling out.
00:09:36I think for folks who are just trying
00:09:39to build their first prototype,
00:09:40they're trying to iterate on an idea,
00:09:42like the example I mentioned earlier.
00:09:44It's not that like I couldn't make a front end
00:09:47that kind of is like a content-driven site,
00:09:50but I just didn't have time.
00:09:51And it was more fun to just draw on a whiteboard,
00:09:53talk, have a conversation, and then kick it off to an agent.
00:09:57But I think one of the interesting examples of this
00:09:58was when we were building much earlier iterations of Codex
00:10:01and well over a year ago.
00:10:03And we were putting in front of two different archetypes,
00:10:05folks who did a lot of product engineering
00:10:08where they're used to using local,
00:10:12in the inner loop style tools
00:10:14where they're used to just chatting and maybe iterating.
00:10:18And then a completely different modality
00:10:20when we talk to folks on the reasoning teams
00:10:23where they would sit for maybe five minutes
00:10:25just defining the task and have an essay length,
00:10:29like word problem for the agent to go off and do,
00:10:32and then it would work for an hour.
00:10:33And that was effectively o1, or earlier versions of it.
00:10:37And I think the interesting part there
00:10:39was just the way that people would approach
00:10:41giving the task to the agent was completely different
00:10:44based on their understanding of what do they think it needs.
00:10:48And so I think really anchoring on specificity,
00:10:52being really clear about what you want the output to be.
00:10:55And I think there's a broader item
00:10:56that is a responsibility on both us as builders of agents
00:11:00and folks training models to really raise that floor
00:11:04and to ensure that the ceiling
00:11:06for people with high craftsmanship, with high taste
00:11:08are able to exercise that in the way that they see fit.
00:11:11- I think actually something that you've mentioned
00:11:13brought this idea to mind that we've started to notice.
00:11:16So our target audience is the enterprise.
00:11:19And something that we've seen occur time and again
00:11:21is that there's a very interesting bimodality
00:11:24in terms of adoption of agent native development.
00:11:28And in particular, normally earlier in career developers
00:11:32are more open-minded to start building
00:11:34in an agent native way,
00:11:36but they don't have the experience
00:11:38of managing engineering teams.
00:11:39So they're maybe not the most familiar with delegation
00:11:42in a way that works very well.
00:11:44Meanwhile, more experienced engineers
00:11:46have a lot of experience delegating.
00:11:47They know that, hey, if I don't specify these exact things,
00:11:50it won't get done.
00:11:51And so they're really good at like writing out that paragraph,
00:11:54but they're pretty stubborn
00:11:56and they actually don't wanna change the way that they build
00:11:59and you're gonna have to pry Emacs
00:12:01out of their cold dead hands.
00:12:03So it's an interesting balance there.
00:12:05- So funny you say that.
00:12:06Similar thing we've seen on the enterprise
00:12:08is senior engineers, higher up folks will write tickets.
00:12:12So they'll actually do the work
00:12:13of writing out all the spec of what needs to be done.
00:12:16They'll hand it off to a junior engineer to actually do.
00:12:18The junior engineer takes that super well-written ticket
00:12:20and gives it to the agent to do, right?
00:12:21So you're just arbitraging the idea
00:12:23that the junior engineer will actually do the agent work
00:12:26because they're more comfortable doing that.
00:12:28But the senior engineer is the person
00:12:29who's actually really good at writing the spec,
00:12:31very good at understanding
00:12:32what are the architectural decisions we should be making
00:12:35and putting that into some kind of ticket.
00:12:37- Yeah, for those who don't know,
00:12:40Matan and Factory in general have been writing
00:12:42and advocating about agent-native development.
00:12:44So you can read more on their website.
00:12:45I think one thing, by the way,
00:12:48I do wanna issue maybe like one terminology thing,
00:12:51which is raise the floor for you is a good thing.
00:12:54I think actually other people say lower the floor
00:12:55also mean the same thing.
00:12:57Basically just like it's about skill level
00:12:59and like what they can do
00:13:00and just giving people more resources for that.
00:13:05I think also the other thing is like,
00:13:07a lot of people are thinking about the model layer, right?
00:13:13Obviously you guys own your own models, the two of you don't.
00:13:18And I think there's a hot topic of conversation
00:13:21in the value right now.
00:13:22Airbnb, Brian Chesky has said that
00:13:25like most of the value was like relies on Quinn apparently.
00:13:28How important is open models to you guys
00:13:30and you can, for what you can chime in as well,
00:13:33but like how important is open models
00:13:35as a strategy for both of you?
00:13:37- I'd be curious to hear from you first.
00:13:38- Yeah.
00:13:38Well, love open models.
00:13:42I think one of the important things about,
00:13:44so just being able to talk about models,
00:13:45I think openness is really key
00:13:48to I think a sustainable development lifecycle
00:13:51where with Codex CLI, we open sourced it out the gate
00:13:54and part of the priority was understanding
00:13:57that an open model was coming down the line.
00:13:58We wanted to make sure that we documented, as best we could,
00:14:01how to use our reasoning models.
00:14:02We saw a lot of kind of confusion about,
00:14:05what kind of tools to give it,
00:14:06what the environment should be, the resources.
00:14:08And so we want to make sure that that was as clear as possible
00:14:10and then also make sure that it worked well with open models.
00:14:12So I think there are definitely a lot of use cases,
00:14:14especially when you get into kind of embedded use cases
00:14:18or where cases where you don't want the data
00:14:22to leave the perimeter.
00:14:23There's a lot of really good reasons
00:14:25for why you would want to do that.
00:14:26And then I think the benefit of kind of cloud-hosted models,
00:14:31and that's what we see with a lot of open models.
00:14:33They end up being, they're not run on device,
00:14:35but they're actually cloud-hosted anyway,
00:14:37maybe for efficiency, maybe for cost,
00:14:39that there's still a lot of value
00:14:42in just the pure intelligence that you get
00:14:44from using a much bigger model.
00:14:46And that's why we see people really gravitate
00:14:48towards models from o3 to GPT-5 to GPT-5-Codex.
00:14:52There's still a lot of value in that.
00:14:53Now we see that that overhang still kind of
00:14:57resolves itself, where every couple of months
00:15:01there's a new, very small, very, very impressive model.
00:15:04And I think that's the magic
00:15:05if we just consider at the beginning of this year,
00:15:06we had o3-mini as kind of the frontier, versus where we are now.
00:15:10And so, yeah, I think that there's a ton of value
00:15:13in open models, but still, I think personally,
00:15:17from a usage perspective,
00:15:18more value in using the kind of cloud-hosted ones.
00:15:21- Yeah, I'll just interject a bit.
00:15:23Ford actually cares a lot about privacy,
00:15:25security, agent robustness.
00:15:27And so if you run into him, talk to him more about that.
00:15:30But for both of you guys, maybe you wanna start off with,
00:15:33actually, what's your ballpark
00:15:35of open model token percentage generated
00:15:38in your respective apps?
00:15:39And is it gonna go up or down?
00:15:42- So I guess, so maybe to start,
00:15:44'cause I think what you said is really interesting.
00:15:47So a couple of weeks ago,
00:15:48when we released our factory CLI tool,
00:15:52people were really interested
00:15:53because we also released with it
00:15:54our score on this benchmark called Terminal Bench.
00:15:57And one of the first asks was,
00:15:59can you guys put open source models to the test?
00:16:01'Cause our droid agent is fully model agnostic.
00:16:04So immediately people were like,
00:16:06throw in the open source models and show us how it does.
00:16:09And I think something that was particularly surprising
00:16:12was that the open source models,
00:16:14and in particular GLM, were really, really good.
00:16:17They were in fact obviously less performant
00:16:19than the frontier models,
00:16:21but not by a huge margin.
00:16:24I think, so one thing that was noteworthy though
00:16:26was when we benchmarked the open source models,
00:16:29of the seven that were at the top,
00:16:32one of them was made in the United States
00:16:34by yours truly over here,
00:16:36which I think is kind of a shame.
00:16:37Like the fact that by far of the frontier models,
00:16:41it's United States across the board.
00:16:43But then when it comes to open source,
00:16:45we're really dropping the ball there.
00:16:47So I think that's one thing that's noteworthy
00:16:49and I think something that, at least when I saw that,
00:16:52I really think there should be like a call to arms there
00:16:54in terms of changing that.
00:16:56Because I think to answer your question,
00:16:59what we found is that since we released support
00:17:02for open source models,
00:17:03the percent of people that are using open source models
00:17:06has dramatically risen.
00:17:08Partially because of cost, and that, you know,
00:17:11it allows you, like,
00:17:12let's say in that documentation example,
00:17:15maybe you want to generate docs,
00:17:16but you don't want it to be like,
00:17:17you know, on super high reasoning, like to the max,
00:17:19like cost you a thousand dollars,
00:17:21but you just want to get like some initial first pass in.
00:17:24And also people like having a little bit more control.
00:17:28And I feel like they get a lot more of that control
00:17:30with some of these open source models,
00:17:33both control and the cost and just like kind of observability
00:17:36into what's actually happening there.
00:17:39So I think the demand has grown to a point
00:17:42that I actually did not expect a year ago.
00:17:43I think a year ago, I was less bullish on open source models
00:17:47than I am now, open-weight, but yeah.
00:17:49- Yeah, I think we use both open source
00:17:51and closed source models in our overall agent pipeline.
00:17:54And I think the way we think about them
00:17:56is there's two different use cases for an LLM call.
00:17:58One is you want state-of-the-art reasoning.
00:18:01It's a very, very open-ended question.
00:18:02You actually don't know what the answer is.
00:18:04The goal is like,
00:18:05the goal function is not super well-defined.
00:18:07In those cases,
00:18:09closed source models are still state-of-the-art
00:18:11when it comes to reasoning and intelligence.
00:18:13We use closed source models pretty much exclusively
00:18:15for those kinds of use cases.
00:18:16There's a second use case where we have a more niche task
00:18:20with a much clearer goal function.
00:18:22In those cases, we almost always try to fine tune
00:18:25an open source model.
00:18:26We're okay taking maybe a 20% hit
00:18:29in terms of reasoning ability
00:18:31so that we can actually fine tune
00:18:33a very, very specific use case.
00:18:35And I think we found that open source models
00:18:37are catching up very, very, very fast.
00:18:39A year and a half ago, it was unthinkable for us
00:18:42to be able to use open source models
00:18:43as part of v0's pipeline.
00:18:45Today, every single part of the pipeline,
00:18:47we're like, okay, can we bring open source models into this?
00:18:49Can we replace what we're doing currently
00:18:52with closed source state-of-the-art frontier models
00:18:55with a fine tune of an open source model?
00:18:57And we've seen a ton of success with Qwen, Kimi K2,
00:19:00other kinds of models like that.
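The split described here (frontier models for open-ended reasoning, fine-tuned open-weight models for narrow, well-defined tasks) amounts to a routing decision per LLM call. A minimal sketch with placeholder model names, not v0's actual pipeline:

```python
# Toy model router for the two use cases described above.
# Model names are placeholders, not a real configuration.

FRONTIER_MODEL = "frontier-large"    # closed, state-of-the-art reasoning
FINE_TUNED_MODEL = "open-weight-ft"  # fine-tuned open-weight model

def route(goal_is_well_defined: bool) -> str:
    """Well-defined goal function: accept a modest capability hit
    in exchange for cost, control, and observability.
    Open-ended goal: pay for frontier reasoning."""
    return FINE_TUNED_MODEL if goal_is_well_defined else FRONTIER_MODEL

# Open-ended architectural question -> frontier model.
open_ended = route(False)
# Niche, well-specified task (e.g. one pipeline stage) -> fine-tune.
niche = route(True)
```

In practice the interesting work is not the branch itself but deciding, per pipeline stage, which side of the branch a call belongs on, and re-evaluating that as open-weight models improve.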
00:19:02- Yeah, I'll call this out as one of the biggest deltas
00:19:05I've seen across everyone,
00:19:07which is at the start of this year,
00:19:08I did a podcast with Ankur from BrainTrust,
00:19:10and he said that open source model usage is roughly 5%
00:19:14across what BrainTrust is seeing, and going down.
00:19:17And now I think reasonably it's gonna go
00:19:19to between the 10 to 20% range for everybody.
00:19:22- I do think it's interesting that even closed source models
00:19:25are investing more heavily into their small class models.
00:19:29The Haikus, GPT-5 minis, Gemini Flashes of the world,
00:19:33which I think also is that model class
00:19:35is what competes with open source the most.
00:19:38It's the small model class competing against a fine tune
00:19:40of an open source model.
00:19:42- And I also think there's some use cases
00:19:43where it's just, it will just be overkill
00:19:46to use a frontier model, and if it is overkill,
00:19:49you are then just gonna obviously be incentivized
00:19:51to use something that's faster and cheaper.
00:19:53And I think part of that, part of I think this delta
00:19:56in terms of percent usage is there is this threshold
00:19:59of when open models cross the threshold of for most tasks,
00:20:04it's actually enough, and then for some niche tasks,
00:20:06you need like the extra firepower.
00:20:10I think we're really getting there
00:20:11with some of these open models,
00:20:12which is why I would suspect
00:20:13we'll see more usage going forward.
00:20:16- Yeah, awesome, that's very encouraging.
00:20:18So we have a bit of time left to prep you guys
00:20:20with the closing question, which is,
00:20:22what's something that your agents cannot do today
00:20:25that you wish they could do,
00:20:26that they'll probably be able to do next year?
00:20:27- Am I going first?
00:20:31Okay.
00:20:32Yeah, I think that what we've seen over the last year,
00:20:34just maybe starting as a reference point with 01,
00:20:38a little over a year ago, or 01 preview,
00:20:40what we've seen from then,
00:20:42when I was using very early checkpoints of that model,
00:21:47it was great relative to 4o,
00:20:49but still had so much left to be desired.
00:20:51I wouldn't put it, I was on the security team at the time,
00:20:55and there was a lot of work and tasks
00:20:57that I just couldn't delegate to that model.
00:21:00And when we compare it to today,
00:21:01where I can take a pretty well-defined task,
00:21:04like maybe it's like two sentences,
00:21:06a few bullet points to your point,
00:21:07like here are the gotchas
00:21:08that I think you'll probably get stuck on,
00:21:10and then come back and 30 minutes later,
00:21:12an hour later, it's done it.
00:21:14We've seen cases where it's running for many hours,
00:21:17maybe even seven to eight hours,
00:21:19effectively a full workday
00:21:20that I spend a lot of my day in meetings,
00:21:22and so don't necessarily have that solid block of time.
00:21:26But that's only half of what engineering is really about.
00:21:30Part of it is coding, part of it is architecting
00:21:32and troubleshooting and debugging.
00:21:34The other half of the problem is writing docs,
00:21:36is understanding the system, convincing people.
00:21:39And so I think what we'll start to see
00:21:41is this super collaborator where what we want to bring,
00:21:45whether it's in codecs or these other interfaces
00:21:48through the codecs model is the ideal collaborator
00:21:53that you want to work with.
00:21:53The person you first go to, that favorite coworker
00:21:56that you want to jam on ideas with,
00:21:58that's really what we want to see, at least with Codex.
00:22:02I think for us, we've seen a bunch of rapid progression
00:22:05on two different fronts.
00:22:07The first is how many steps can you reasonably expect
00:22:10an agent to be able to do and get reasonably good output?
00:22:14Last year, there's probably one, maybe max three, right?
00:22:17If you wanted reliable output with over 90% success,
00:22:20you're probably running one to three agent steps.
00:22:22Today, most tools run five to 20
00:22:24with really great reliability rates, over 90% success.
00:22:29I think next year, we're gonna add in
00:22:30sort of that like 100 plus, 200 plus,
00:22:32let's run tons of steps all at once,
00:22:34have long running tasks for multiple hours
00:22:36and be confident that you'll get an output
00:22:38at the end that will be useful.
00:22:40The second is in terms of what resources can be consumed.
00:22:42A year ago, it was whatever you are putting
00:22:44into the prompt form, like that was pretty much it.
00:22:47Today, you can now configure external connections via MCP
00:22:51or by making API calls directly in your application.
00:22:55You can kind of do that if you're knowledgeable,
00:22:57you have the ability to configure things.
00:22:58And I think in a year from now, those will just happen.
00:23:00Like it will just work.
00:23:02The goal is like, you should not need to know
00:23:03what sources of context you need to give the agent.
00:23:06The agent will actually go and find
00:23:08those sources of context proactively.
00:23:09We're kind of starting to see that already today,
00:23:12but I'm still not really confident
00:23:14that's very reliable and useful today.
00:23:16I think by next year, that'll be the default mode.
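As a concrete example of the "configure external connections via MCP" step that's described as manual today: many coding agents accept MCP server entries in a config file. The fragment below is illustrative only; the server name and package are invented, and the exact file location and schema vary by tool (Codex CLI, for instance, reads a TOML config).

```toml
# Hypothetical MCP server entry; names and values are placeholders.
# Check your agent's documentation for its actual config location.
[mcp_servers.docs-search]
command = "npx"
args = ["-y", "@example/docs-mcp-server"]
env = { DOCS_API_KEY = "..." }
```

The prediction on stage is that this wiring disappears: the agent discovers and attaches context sources like this on its own rather than requiring the user to declare them up front.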
00:23:18- Yeah, I would agree with that.
00:23:19I think agents can do basically everything today,
00:23:23but the degree to which they do so reliably and proactively
00:23:27is I think the slider that is going to change.
00:23:29But that's a slider that's also dependent on the user.
00:23:31Like if you're a user who's like not really like
00:23:33changing your behavior and meeting the agent where it is,
00:23:36then you might get lower reliability and proactivity.
00:23:38Whereas if you kind of set up your harness correctly
00:23:41or set up your environment correctly,
00:23:42it'll be able to do more of that
00:23:44reliably and more proactively.
00:23:45- Yeah, amazing.
00:23:46Well, we're out of time.
00:23:48My contribution is computer vision.
00:23:49Everyone try Atlas.
00:23:51Everyone try like more computer vision use cases,
00:23:53but thank you so much for your time.
00:23:55- Thank you.
00:23:56(audience applauding)
00:23:57(upbeat music)