00:00:00(upbeat music)
00:00:02- Okay, thank you everyone, hello.
00:00:07My name is Luke Sandberg.
00:00:09I'm a software engineer at Vercel, working on Turbopack.
00:00:11So I've been at Vercel for about six months,
00:00:15which has given me just enough time to come up here on stage
00:00:20and tell you about all the great work I did not do.
00:00:23Prior to my time at Vercel, I was at Google,
00:00:27where I got to work on our internal web tool chains
00:00:30and do weird things like build a TSX-to-Java-bytecode compiler
00:00:35and work on the Closure Compiler.
00:00:37So when I arrived at Vercel,
00:00:40it was actually kind of like stepping onto another planet,
00:00:43like everything was different.
00:00:45And I was pretty surprised by all the things we did
00:00:47on the team and the goals we had.
00:00:50So today I'm gonna share a few of the design choices we made
00:00:53in Turbopack and how I think they will let us continue
00:00:57to build on the fantastic performance we already have.
00:01:00So to help motivate that, this is our overall design goal.
00:01:05So from this, you can immediately infer
00:01:11that we probably made some hard choices.
00:01:14So like, what about cold builds?
00:01:17Those are important, but one of our ideas
00:01:20is you shouldn't be experiencing them at all.
00:01:22And that's what this talk is gonna focus on.
00:01:24In the keynote, you heard a little bit
00:01:26about how we leverage incrementality
00:01:29to improve bundling performance.
00:01:31A key idea we have for incrementality is about caching.
00:01:35We wanna make every single thing the bundler does cacheable
00:01:37so that whenever you make a change,
00:01:39we only have to redo work related to that change.
00:01:43Or maybe to put it another way,
00:01:45the cost of your build should really scale
00:01:47with the size or complexity of your change
00:01:50rather than the size or complexity of your application.
00:01:53And this is how we can make sure that Turbopack
00:01:55will continue to give developers good performance
00:01:58no matter how many icon libraries you import.
00:02:01So to help understand and motivate that idea,
00:02:05let's imagine the world's simplest bundler,
00:02:08which maybe looks like this.
00:02:09So here's our baby bundler.
00:02:12And this is maybe a little bit too much code
00:02:14to put on a slide, but it's gonna get worse.
00:02:17So here we parse every entry point.
00:02:19We follow their imports, resolve their references,
00:02:23recursively throughout the application
00:02:25to find everything you depend on.
00:02:28Then at the end, we just simply collect everything
00:02:31each entry point depends on
00:02:33and plop it into an output file.
00:02:35So hooray, we have a baby bundler.
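To make that concrete, the baby bundler on the slide might be sketched roughly like this in JavaScript (all names are hypothetical, and the regex-based "parser" is a stand-in for real SWC parsing and Node-style resolution, not Turbopack code):

```javascript
// A toy, deliberately naive bundler: parse each entry, follow imports
// recursively, then concatenate everything each entry depends on.
function bundle(entryPoints, fs) {
  const outputs = {};
  for (const entry of entryPoints) {
    const modules = new Map();
    const visit = (file) => {
      if (modules.has(file)) return;
      const ast = parse(file, fs);   // re-parses shared files per entry!
      modules.set(file, ast);
      for (const imp of ast.imports) {
        visit(resolve(file, imp));   // re-resolves "react" everywhere
      }
    };
    visit(entry);
    outputs[entry + ".bundle.js"] =
      [...modules.values()].map((m) => m.source).join("\n");
  }
  return outputs;
}

function parse(file, fs) {
  const source = fs[file];
  // Fake "AST": just pull out bare import specifiers with a regex.
  const imports = [...source.matchAll(/import "(.+?)"/g)].map((m) => m[1]);
  return { source, imports };
}

function resolve(fromFile, specifier) {
  return specifier; // toy resolver: specifiers are already file names
}

const fakeFs = {
  "app.js": 'import "lib.js"\nconsole.log("app")',
  "lib.js": 'console.log("lib")',
};
const out = bundle(["app.js"], fakeFs);
```

Every build starts from zero here: nothing is cached between runs, which is exactly the problem the next few slides poke at.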
00:02:38So obviously this is naive,
00:02:40but if we think about it from an incremental perspective,
00:02:43no part of this is incremental.
00:02:45So we'll definitely parse certain files multiple times,
00:02:49maybe depending on how many times you import them.
00:02:51That's terrible.
00:02:53We'll definitely resolve the react import
00:02:55like hundreds or thousands of times.
00:02:57So, you know, ouch.
00:03:01So if we want this to be at least
00:03:03a little bit more incremental,
00:03:03we need to find a way to avoid redundant work.
00:03:08So let's add a cache.
00:03:10So you might imagine this is our parse function.
00:03:14It's pretty simple.
00:03:15And it's probably kind of the workhorse of our bundler.
00:03:19You know, very simple.
00:03:19We read the file contents, hand them off to SWC
00:03:23to give us an AST.
00:03:25So let's add a cache.
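A sketch of that parse-plus-cache version — keyed by file name alone, never invalidated, exactly the version the next few questions pick apart (the fake AST object is a stand-in for what SWC would return):

```javascript
// The parse function, with the simplest possible cache bolted on.
// The cache key is just the file name, and nothing ever invalidates it.
const parseCache = new Map();
let parseCount = 0;

function parse(fs, file) {
  if (parseCache.has(file)) return parseCache.get(file);
  parseCount++;
  const contents = fs[file];  // "read the file contents"
  const ast = { contents };   // stand-in for handing them off to SWC
  parseCache.set(file, ast);
  return ast;
}

const files = { "a.js": "export const x = 1" };
parse(files, "a.js");
parse(files, "a.js");               // cache hit: we only parsed once

files["a.js"] = "export const x = 2";
const stale = parse(files, "a.js"); // problem: still the old contents
```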
00:03:27Okay, so this is clearly a nice simple win.
00:03:31But, you know, I'm sure some of you
00:03:35have written caching code before.
00:03:36Maybe there's some problems here.
00:03:38Like, you know, what if the file changes?
00:03:41This is clearly something we care about.
00:03:46And, you know, what if the file isn't really a file,
00:03:49but it's three symlinks in a trench coat?
00:03:52A lot of package managers will organize
00:03:54dependencies like that.
00:03:55And we're using the file name as a cache key.
00:03:59Is that enough?
00:04:00Like, you know, we're bundling for the client and the server.
00:04:03Same files end up in both.
00:04:04Does that work?
00:04:05We're also storing the AST and returning it.
00:04:08So now we have to worry about mutations.
00:04:11So, you know, and then finally,
00:04:14isn't this a really naive way to parse?
00:04:16I know that everyone has massive configurations
00:04:19for the compiler.
00:04:21Like, some of that has to get in here.
00:04:23So, yeah, these are all great feedback.
00:04:27And this is a very naive approach.
00:04:32And to that, of course, I would say,
00:04:34yeah, this will not work.
00:04:36So what do we do about fixing these problems?
00:04:39Please fix and make no mistakes.
00:04:44So, okay.
00:04:46So maybe this is a little bit better.
00:04:49You know, you can see here that we have some transforms.
00:04:52We need to do customized things to each file,
00:04:55like maybe down-leveling or implement use cache.
00:04:58We also have some configuration.
00:05:00And so, of course, we need to include that
00:05:02in our key for our cache.
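That "fixed" cache key might be sketched like this (all names are hypothetical, and the lurking bug is intentional — it's the fragility the next questions are about):

```javascript
// Transform names plus serialized config get folded into the key.
// It works until a transform has config of its own the key doesn't capture.
const cache = new Map();

function cacheKey(file, transforms, config) {
  // Identifying transforms by name, and config by JSON, is exactly
  // the fragility being questioned here.
  return [
    file,
    transforms.map((t) => t.name).join(","),
    JSON.stringify(config),
  ].join("|");
}

function parseWithTransforms(fs, file, transforms, config) {
  const key = cacheKey(file, transforms, config);
  if (cache.has(key)) return cache.get(key);
  let ast = { source: fs[file] };
  for (const t of transforms) ast = t.apply(ast, config);
  cache.set(key, ast);
  return ast;
}

const downlevel = {
  name: "downlevel",
  target: "es5", // NOT part of the key -- a lurking bug
  apply: (ast) => ({ ...ast, downleveled: true }),
};
const fs = { "a.js": "let x = 1" };
const first = parseWithTransforms(fs, "a.js", [downlevel], { jsx: true });
downlevel.target = "es2020"; // changing this is invisible to the cache
const second = parseWithTransforms(fs, "a.js", [downlevel], { jsx: true });
```

The second call is a stale cache hit even though the transform's own configuration changed — the key never saw it.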
00:05:04But maybe right away you're suspicious.
00:05:08Like, is this correct?
00:05:09Like, is it actually enough to identify a transform
00:05:11based on the name?
00:05:13I don't know, maybe that has some complicated configuration
00:05:15all of its own.
00:05:16And, okay, and like, is this toJSON value
00:05:21gonna actually capture everything we care about?
00:05:24Will the developers maintain it?
00:05:26How big will these cache keys be?
00:05:29How many copies of the config will we have?
00:05:31So I've actually personally seen code exactly like this,
00:05:34and I find it next to impossible to reason about.
00:05:37Okay, we also tried to fix this other problem
00:05:41around invalidations.
00:05:43So we added a callback API to read file.
00:05:46This is great, so if the file changes,
00:05:49we can just nuke it from the cache,
00:05:51so we won't keep serving stale contents.
00:05:54Okay, but this is actually pretty naive,
00:05:56'cause like, sure, we need to nuke our cache,
00:05:59but our caller also needs to know
00:06:00that they need to get a new copy.
00:06:03So, okay, so let's start threading callbacks.
00:06:06Okay, we did it.
00:06:09We threaded callbacks up through the stack.
00:06:12You can see here that we allow our caller
00:06:14to subscribe to changes.
00:06:16We can just rerun the entire bundle if anything changes,
00:06:20and if a file changes, we call it.
00:06:22Great, we have a reactive bundler.
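The threaded-callback version might look like this sketch (names hypothetical): readFile takes an invalidation callback, the parse cache nukes its entry, and the caller just reruns the whole bundle.

```javascript
// readFile takes an invalidation callback, so when a file changes we can
// nuke the cache entry and tell our caller, who reruns everything.
function createWatchedFs(files) {
  const watchers = new Map(); // file name -> invalidation callbacks
  return {
    readFile(file, onInvalidate) {
      if (!watchers.has(file)) watchers.set(file, []);
      watchers.get(file).push(onInvalidate);
      return files[file];
    },
    write(file, contents) {
      files[file] = contents;
      const callbacks = watchers.get(file) ?? [];
      watchers.set(file, []); // clear first so re-reads can re-subscribe
      for (const cb of callbacks) cb();
    },
  };
}

function createBundler(fs, onChange) {
  const parseCache = new Map();
  let runs = 0;
  function parse(file) {
    if (parseCache.has(file)) return parseCache.get(file);
    const contents = fs.readFile(file, () => {
      parseCache.delete(file); // nuke our cache...
      onChange();              // ...and tell the caller to start over
    });
    const ast = { contents };
    parseCache.set(file, ast);
    return ast;
  }
  return {
    bundle: (entries) => {
      runs++;
      return entries.map((f) => parse(f).contents).join("\n");
    },
    get runs() { return runs; },
  };
}

const fs = createWatchedFs({ "a.js": "v1", "lib.js": "lib" });
let latest;
const bundler = createBundler(fs, () => {
  latest = bundler.bundle(["a.js", "lib.js"]);
});
latest = bundler.bundle(["a.js", "lib.js"]); // cold build
fs.write("a.js", "v2");                      // change -> full re-bundle
```

It is reactive — the edit only reparses `a.js` — but the entire bundle function still reruns from the top, which is the "hardly incremental" point made next.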
00:06:25But this is still hardly incremental.
00:06:28So if a file changes, we need to walk all the modules again
00:06:33and produce all the output files.
00:06:37So, you know, we saved a bunch of work
00:06:40by having our parse cache, but this isn't really enough.
00:06:45And then finally, there's all this other redundant work.
00:06:49Like, we definitely wanna cache the imports.
00:06:52We might find a file a bunch of times,
00:06:53and we keep needing its imports,
00:06:55so we wanna put a cache there.
00:06:57And, you know, resolve results
00:07:00are actually pretty complicated,
00:07:01so we should definitely cache that
00:07:03so we can reuse the work we did resolving React.
00:07:08But, okay, now we have another problem.
00:07:11Your resolve results change when you update dependencies
00:07:14or add new files, so we need another callback there.
00:07:18And we definitely also wanna, like,
00:07:21cache the logic to produce outputs
00:07:24because if you think about it, in an HMR session,
00:07:26you're editing one part of the application,
00:07:28so why are we rewriting all the outputs every time?
00:07:31And also, you might, like, delete an output file,
00:07:35so we should probably listen to changes there, too.
00:07:39Okay, so maybe we solved all those things,
00:07:43but we still have this problem,
00:07:44which is every time anything changes, we start from scratch.
00:07:48So, kind of the whole control flow
00:07:50of this function doesn't work
00:07:52because if a single file changes,
00:07:54we'd really kinda wanna jump into the middle
00:07:55of that for loop.
00:07:56And then, finally, our API to our caller
00:08:01is also hopelessly naive.
00:08:03They probably actually wanna know which file has changed,
00:08:05so they can, like, push updates to the client.
00:08:07So, yeah.
00:08:10So, this approach doesn't really work.
00:08:13And even if we somehow did thread all the callbacks
00:08:16in all these places,
00:08:17do you think you could actually maintain this code?
00:08:21Do you think you could, like, add a new feature to it?
00:08:24I don't.
00:08:25I think this would just crash and burn.
00:08:28And, you know, to that, I would say, yeah.
00:08:34So, once again, what should we do?
00:08:36You know, just like when you're chatting with an LLM,
00:08:41you actually first need to know what you want.
00:08:43And then you have to be extremely clear about it.
00:08:48So, what do we even want?
00:08:50So, you know, we considered a lot of different approaches,
00:08:55and many people on the team actually had
00:08:56a lot of experience working on bundlers.
00:08:59So, we came up with these kind of rough requirements.
00:09:02So, we definitely wanna be able
00:09:03to cache every expensive operation in the bundler.
00:09:05And it should be really easy to do this.
00:09:08Like, you shouldn't get 15 comments on your code review
00:09:10every time you add a new cache.
00:09:12And then I don't actually really trust developers
00:09:17to write correct cache keys or track inputs
00:09:21or track dependencies by hand.
00:09:24So, we should handle it.
00:09:26We should definitely make this foolproof.
00:09:30Next, we need to handle changing inputs.
00:09:33This is like a big idea in HMR, but even across sessions.
00:09:36So, mostly this is gonna be files,
00:09:38but this could also be things like config settings.
00:09:40And with the file system cache,
00:09:41it actually ends up being things
00:09:42like environment variables, too.
00:09:45So, we wanna be reactive.
00:09:46We wanna be able to recompute things
00:09:48as soon as anything changes,
00:09:52and we don't wanna thread callbacks everywhere.
00:09:54Finally, we just need to take advantage
00:09:58of modern architectures and be multi-threaded
00:10:01and just generally fast.
00:10:02So, maybe you're looking at this set of requirements,
00:10:07and some of you are thinking,
00:10:09what does this have to do with a bundler?
00:10:12And to that, I would say, of course,
00:10:15my management team is in the room,
00:10:17so we don't really need to talk about that.
00:10:20But really, I'm guessing a lot of you jumped
00:10:22to the much more obvious conclusion.
00:10:24This sounds a lot like signals.
00:10:28And yeah, I am describing a system that sounds like signals.
00:10:31It's a way to compose computations, track dependencies,
00:10:35with some amount of automatic memoization.
00:10:37And I should note that we drew inspiration
00:10:41from all sorts of systems, especially the Rust compiler
00:10:44and a system called Salsa.
00:10:45And there's even academic literature
00:10:48on these concepts, called Adapton, if you're interested.
00:10:51Okay, so let's see what this looks like in practice,
00:10:55and then we're gonna take a very jarring jump
00:10:57from code samples in JavaScript to Rust.
00:11:01So here's an example of the infrastructure we built.
00:11:05A TurboTask function is a cached unit of work
00:11:11in our compiler.
00:11:12So we can, once you annotate a function like this,
00:11:17we can track it, we can construct a cache key
00:11:19out of its parameters, and that allows us to both cache it
00:11:24and re-execute it when we need to.
00:11:28These VC types here, you can think of like signals,
00:11:31this is a reactive value, VC stands for value cell,
00:11:34but signal might be a little bit of a better name.
00:11:39When you declare a parameter like this,
00:11:43you're saying this might change,
00:11:44I wanna re-execute when it changes.
00:11:47And so how do we know that?
00:11:49So we read these values via await.
00:11:52Once you await a reactive value like this,
00:11:56we automatically track the dependency.
00:11:58And then finally, of course,
00:12:01we do the actual computation we wanted to do,
00:12:05and we store it in a cell.
00:12:07So because we've automatically tracked dependencies,
00:12:11we know that this function depends on both
00:12:13the contents of the file and the value of the config.
00:12:17And every time we store a new result into the cell,
00:12:22we can compare it with the previous one,
00:12:24and then if it's changed, we can propagate notifications
00:12:27to everyone who's read that value.
00:12:29So this concept of changing
00:12:31is key to our approach to incrementality.
00:12:33And yeah, again, the simplest case is right here.
00:12:37If the file changes, Turbopack will observe that,
00:12:41invalidate this function execution,
00:12:43and re-execute it immediately.
00:12:45And then if we happen to produce the same AST,
00:12:49we'll just stop right there
00:12:51because we computed the same cell value.
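The cell-and-propagation idea can be sketched as a tiny JavaScript analogue (all names here are hypothetical — the real infrastructure is Rust): a cell tracks whoever reads it, and writing a value that compares equal to the old one propagates nothing.

```javascript
// A cell tracks its readers; an equal write stops propagation (the cut-off).
let currentComputation = null;

function cell(initial) {
  let value = initial;
  const readers = new Set();
  return {
    read() {
      if (currentComputation) readers.add(currentComputation);
      return value;
    },
    write(next) {
      if (JSON.stringify(next) === JSON.stringify(value)) return; // unchanged: stop
      value = next;
      const toNotify = [...readers];
      readers.clear();
      for (const c of toNotify) c.invalidate();
    },
  };
}

// A cached "function task": reruns whenever any cell it read changes,
// and stores its result into its own output cell.
function task(fn) {
  const output = cell(undefined);
  let runs = 0;
  const computation = { invalidate: run };
  function run() {
    runs++;
    const prev = currentComputation;
    currentComputation = computation; // reads inside fn are tracked automatically
    output.write(fn());
    currentComputation = prev;
  }
  run();
  return { output, get runs() { return runs; } };
}

const fileContents = cell('import "react"; let x = 1');
const parsed = task(() => ({
  imports: fileContents.read().includes("react") ? ["react"] : [],
}));
const chunking = task(() => parsed.output.read().imports.length);

fileContents.write('import "react"; let x = 2'); // reparse, same imports...
const chunkingRunsAfterBodyEdit = chunking.runs; // ...so chunking never reran

fileContents.write("let x = 2"); // the import disappears: now it propagates
```

The body edit reruns `parsed` but the equal output stops there; only the edit that actually removes the import reaches `chunking`.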
00:12:53Now, for parsing a file,
00:12:55there's hardly any edit you can make to it
00:12:58that doesn't actually change the AST.
00:13:00But we can leverage the fundamental composability
00:13:04of Turbopack functions to take this further.
00:13:07So here, we see another Turbopack cached function
00:13:12extracting imports from a module.
00:13:15You can imagine this is a very common task
00:13:19we have in the bundler.
00:13:20We need to extract imports just to actually find
00:13:22all the modules in your application.
00:13:25We leverage them to pick the best way
00:13:26to group modules together into chunks.
00:13:29And of course, the import graph
00:13:30is important to basic tasks like tree shaking.
00:13:34And so because there's so many different consumers
00:13:38of the imports data, a cache makes a lot of sense.
00:13:41So this implementation isn't really special.
00:13:44This is like what you would find in any kind of bundler.
00:13:46We walk the AST, collect imports
00:13:49into some special data structure that we like,
00:13:52and then we return them.
00:13:55But the key idea here
00:13:56is that we store them into another cell.
00:13:58So if the module changes,
00:14:01we do need to rerun this function because we read it.
00:14:05But if you think about the kind of changes
00:14:08you make to modules,
00:14:09very few of them actually affect the imports.
00:14:12So you change the module, you update the function body,
00:14:16a string literal, any kind of implementation detail.
00:14:20It'll invalidate this function
00:14:22and then we'll compute the same set of imports.
00:14:25And then we don't invalidate anything that has read this.
00:14:29So if you think about this in like an HMR session,
00:14:32this means that we do need to reparse your file,
00:14:35but we really don't need to think about
00:14:38how to do chunking decisions anymore.
00:14:40We don't need to think about any kind of tree shaking results
00:14:43because we know those didn't change.
00:14:45So we can immediately jump from parsing the file,
00:14:48doing this simple analysis,
00:14:51and then jumping right to producing outputs.
00:14:53And this is one of the ways we have really fast refresh times.
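Standalone, that cut-off boils down to comparing import sets across edits. A sketch (the regex "extraction" is a toy stand-in for a real AST walk):

```javascript
// Toy version of "extract imports, compare with the previous result":
// an edit that only touches a string literal produces the same import
// set, so nothing downstream needs to be invalidated.
function extractImports(source) {
  // stand-in for the real AST walk: grab import specifiers, sorted
  // so the comparison is stable
  return [...source.matchAll(/import (?:.*? from )?"(.+?)"/g)]
    .map((m) => m[1])
    .sort();
}

function importsChanged(before, after) {
  return JSON.stringify(extractImports(before)) !==
         JSON.stringify(extractImports(after));
}

const v1 = 'import React from "react"\nexport const label = "hello"';
const v2 = 'import React from "react"\nexport const label = "hallo"'; // body edit
const v3 = 'import React from "react"\nimport "./styles.css"\n' +
           'export const label = "hello"'; // new import
```

The `v1` → `v2` edit leaves the import set unchanged; only `v1` → `v3` would invalidate consumers of the imports.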
00:14:57So this is pretty imperative.
00:15:02Another way to think about this basic idea
00:15:04is as a graph of nodes.
00:15:06So here on the left, you might imagine a cold build.
00:15:12Initially, we actually do have to read every file,
00:15:14parse them all, analyze all imports.
00:15:16And as a side effect of that, we have collected
00:15:18all the dependency information from your application.
00:15:21And then when something changes,
00:15:25we can leverage that dependency graph we built up
00:15:27to propagate invalidations, back up the stack,
00:15:30and re-execute Turbo Pack functions.
00:15:32And so if they produce a new value, we stop there.
00:15:35Otherwise, we keep propagating the invalidation.
00:15:37So great.
00:15:41You know, this is actually kind of a massive oversimplification
00:15:44of what we're doing in practice, you might imagine.
00:15:47So in Turbopack today,
00:15:49there are around 2,500 different Turbo Tasks functions.
00:15:53And in a typical build, we might have literally
00:15:56millions of different tasks.
00:15:58So it really looks maybe a little bit more like this.
00:16:01Now, I don't really expect you to be able to read this.
00:16:04Couldn't really fit it on the slide.
00:16:06So maybe we should zoom out.
00:16:08Okay, so that is not obviously helpful.
00:16:14In reality, we do have better ways to kind of track
00:16:17and visualize what's happening inside of Turbo Pack.
00:16:21But fundamentally, those works by throwing out
00:16:23the vast majority of dependency information.
00:16:26And now I'm guessing that some of you maybe actually
00:16:29have experience working with signals, maybe bad experiences.
00:16:34You know, I for one actually like stack traces
00:16:38and being able to step into and out of functions
00:16:40in a debugger.
00:16:41So maybe you're like suspicious
00:16:43that this is the complete panacea.
00:16:45Like it obviously comes with trade-offs.
00:16:47And yeah, to that I would of course say,
00:16:53well, you know, what I'd actually say is
00:16:57all of software engineering is about managing trade-offs.
00:17:01We're not always solving problems exactly,
00:17:03but we're really picking new sets of trade-offs
00:17:07to deliver value.
00:17:08So to achieve our design goals around incremental builds
00:17:12in Turbopack, we put kind of all our chips
00:17:15on this incremental reactive programming model.
00:17:19And this of course had some very natural consequences.
00:17:23So, you know, maybe we actually really did solve the problem
00:17:29of hand-rolled caching systems
00:17:31and cumbersome invalidation logic.
00:17:33In exchange, we have to manage
00:17:36some complicated caching infrastructure.
00:17:39And of course, you know, that sounds like
00:17:40a really good trade-off to me.
00:17:41I like complicated caching infrastructure,
00:17:44but we all have to live with the consequences.
00:17:48So the first of course is just
00:17:52the core overheads of this system.
00:17:54You know, so if you think about it
00:17:57in a given build or HMR session,
00:17:59you're not really changing very much.
00:18:04So we track all the dependency information
00:18:06between like every import
00:18:08and every resolve result in your application,
00:18:11but you're only gonna actually like change a few of them.
00:18:13So most of the dependency information we collect
00:18:15is never actually needed.
00:18:16So, you know, to manage this,
00:18:19we've had to focus a lot
00:18:22on improving the performance of this caching layer
00:18:25to drive the overheads down and let our system scale
00:18:28to larger and larger applications.
00:18:30And the next and most obvious is simply memory.
00:18:34You know, caches are always fundamentally a time
00:18:36versus memory trade-off.
00:18:38And ours doesn't really do anything different there.
00:18:41Our simple goal is that the cache size should scale
00:18:45linearly with the size of your application.
00:18:49But again, we have to be careful about overheads.
00:18:51This next one is a little subtle.
00:18:54So we have lots of algorithms in the bundler
00:18:57as you might expect.
00:18:58And some of them kind of require understanding
00:19:00something global about your application.
00:19:02Well, that's a problem
00:19:05because anytime you depend on global information,
00:19:07that means any change might invalidate that operation.
00:19:10So we have to be careful about how we design
00:19:12these algorithms, compose things carefully
00:19:14so that we can preserve incrementality.
00:19:17And finally, this one's maybe a bit of a personal gripe.
00:19:24Everything is async in Turbopack.
00:19:27And so this is great for horizontal scalability,
00:19:29but once again, it harms our fundamental
00:19:30debugging and performance-profiling goals.
00:19:38So I'm sure a lot of you have experience debugging async
00:19:43in like the Chrome dev tools.
00:19:46And this is generally a pretty nice experience.
00:19:48Not always ideal.
00:19:49And I assure you Rust with LLDB is like light years behind.
00:19:53So to manage that, we've had to invest
00:19:57in custom visualization, instrumentation, and tracing tools.
00:20:01And look at that, like another infrastructure project
00:20:04that isn't a bundler.
00:20:07Okay, so let's take a look and see if we made the right bet.
00:20:11So at Vercel, we have a very large production application.
00:20:17We think it's maybe one of the largest in the world,
00:20:19but you know, we don't really know.
00:20:21But it does have around 80,000 modules in it.
00:20:23So let's take a look at how Turbopack does on it.
00:20:26For fast refresh, we really do dominate
00:20:30what webpack is able to deliver.
00:20:32But this is kind of old news.
00:20:33Turbopack for dev has been out for a while,
00:20:35and I really hope everyone is at least
00:20:37using it in development.
00:20:39But you know, the new thing here today, of course,
00:20:41is that builds are stable.
00:20:42So let's look at a build.
00:20:44And here you can see a substantial win over webpack
00:20:48for this application.
00:20:49This particular build is actually running
00:20:50with our new experimental file system caching layer.
00:20:53So about 16 of those 94 seconds
00:20:56is just flushing the cache out at the end.
00:20:59And this is something we're gonna be working on improving
00:21:01as file system caching becomes stable.
00:21:04But of course, the thing about cold builds
00:21:05is that they're cold, nothing's incremental.
00:21:07So let's take a look at an actual warm build.
00:21:10So using the cache from the cold build, we can see this.
00:21:14So this is just a peek at where we are today.
00:21:17Because we have this fine-grained caching system,
00:21:19we can actually just write out the cache to disk,
00:21:21and then on the next build, read it back in,
00:21:24figure out what changed, and finish the build.
00:21:26Okay, so this looks pretty good,
00:21:28but a lot of you are thinking like,
00:21:29well, maybe I personally don't have
00:21:31the largest Next.js application in the world.
00:21:34So let's take a look at a smaller example.
00:21:37The react.dev website is quite a bit smaller.
00:21:41It's also kind of interesting because, unsurprisingly,
00:21:44it's an early adopter of the React Compiler.
00:21:47And the React Compiler is implemented in Babel.
00:21:49And this is kind of a problem for our approach
00:21:51because it means for every file in the application,
00:21:53we need to ask Babel to process it.
00:21:55And fundamentally, I would say
00:21:59I can't make the React Compiler faster.
00:22:01It's not my job.
00:22:02My job is Turbo Pack.
00:22:03But we can figure out exactly when to call it.
00:22:07So looking at fast refresh times,
00:22:11I was actually a little disappointed with this result.
00:22:13And it turns out that about 130 of those 140 milliseconds
00:22:16is the React compiler.
00:22:18And both Turbo Pack and Web Pack are doing that.
00:22:22But with Turbo Pack, we can,
00:22:23after the React compiler has processed this change,
00:22:26we can see, oh, imports didn't change.
00:22:29Chuck it into the output and keep going.
00:22:31Once again, on cold builds,
00:22:34we see this kind of consistent 3x win.
00:22:37And just to be clear, this is on my machine.
00:22:39But again, no incrementality in a cold build.
00:22:44And in a warm build, we see this much better time.
00:22:47So again, with a warm build,
00:22:50we already have the cache on disk.
00:22:52All we need to do is basically, once we start,
00:22:54figure out what files in the application change,
00:22:57re-execute those jobs, and then reuse everything else
00:23:00from the previous build.
00:23:01So the basic question is, are we Turbo yet?
00:23:05Yes.
00:23:06So yeah, this was discussed in the keynote, of course.
00:23:09Turbopack is stable as of Next.js 16.
00:23:12And we're even the default bundler for Next.js.
00:23:14So, you know, mission accomplished, you're welcome.
00:23:17But. (laughs)
00:23:19(audience applauds)
00:23:23And if you noticed that revert thing in the keynote,
00:23:27that was me trying to make Turbopack the default.
00:23:30It only took three tries.
00:23:31But what I really want to leave you with, again, is this.
00:23:35You know, 'cause we're not done.
00:23:37We still have a lot to do on performance,
00:23:39and finishing the swing on the file system caching layer.
00:23:42I suggest you all try it out in dev.
00:23:44And that is it.
00:23:46Thank you so much.
00:23:47Please find me, ask me questions.
00:23:49(audience applauds)
00:23:50(upbeat music)
00:23:54(upbeat music)