Community Session: Vercel Workflow

Vercel

Transcript

00:00:00So I want to stop here and kind of just jump into questions.
00:00:02I think there's a lot that we can go over.
00:00:05I also want to point you to the workflow docs, where there's
00:00:08a lot more on workflows and how each of these different bits
00:00:11work, along with the blog post that I'm
00:00:14happy to go over as it makes sense for questions.
00:00:19But yeah, think of this as the basic just getting started
00:00:23guide and the three basic primitives
00:00:25we think about with workflow, which is you have steps.
00:00:27You have "use workflow" and steps that can suspend and resume.
00:00:30You have hooks that allow you to wait
00:00:31for any external arbitrary event.
00:00:33And you have sleeps that allow you to wait
00:00:36for arbitrary amounts of time, anything
00:00:38from a couple of seconds to a couple of days
00:00:39or even a couple of months.
00:00:42And with that, you can kind of model
00:00:44a ton of background pipelines, agents
00:00:48being a natural extension of this as well.
00:00:51I haven't gone into this, and I'll just
00:00:53point you to this in the docs before we jump into questions.
00:00:56But agents tend to be a very natural representation
00:01:01of basically just a long-running task because you're looping,
00:01:05making an LLM call, and then running one or more tool calls,
00:01:08maybe like writing to memory using a sandbox,
00:01:11using a file system, whatever, and then just doing that
00:01:13over and over again on a loop.
00:01:15So you're streaming tokens, making a tool call,
00:01:17streaming tokens, making a tool call.
00:01:20And doing that with workflows is both extremely powerful
00:01:25and extremely tricky if you're trying to wire this all up
00:01:27yourself.
00:01:27So we have a class called Durable Agent
00:01:29that basically does a lot of this orchestration for you
00:01:32and just works natively inside workflows.
00:01:35I'll stop there for now, but excited to hear
00:01:39what questions we have.
00:01:45Hey.
00:01:47I just want to say we started using workflows for some
00:01:50of the tools that we use to manage community notifications
00:01:54because we have so many places where it's coming in.
00:01:58Agents have made managing those notifications great,
00:02:01and workflows has been great for handling some
00:02:03of those kinds of tasks as well.
00:02:05So it's been really a big shift forward in the way
00:02:11that we build things.
00:02:13But the first question that I have,
00:02:15because I see so many skeptical people who
00:02:18ask every time we release some new open source thing,
00:02:23we've got workflow SDK, which is open source,
00:02:26and then we got Vercel workflows.
00:02:28So how open is it really?
00:02:31With the Vercel workflows, does that
00:02:32mean workflow SDKs only can work on Vercel?
00:02:35Or how easy is it to use other places?
00:02:39Yeah.
00:02:40Yeah, that's a good question.
00:02:41We get it all the time on workflows.
00:02:43And essentially workflow SDK is the open source framework.
00:02:49And we name it as such, too, where
00:02:52we ship this thing called Worlds from day one.
00:02:54It's basically analogous to adapters.
00:02:57And Next.js also launched adapters around the same time.
00:03:00But the idea is that workflows themselves with workflow SDK
00:03:03are just a way you represent long running code.
00:03:06You can write "use workflow" and steps,
00:03:08and you can have sleeps and timers and waits and hooks
00:03:11and whatever.
00:03:12And the best analogy I can think of is something like Docker,
00:03:15where this is a Dockerfile syntax for how
00:03:18you write a long running workflow.
00:03:22By default, we've shipped and maintained
00:03:24three different worlds, which is the local world.
00:03:26So when you run workflows locally,
00:03:28it stores everything in a file system that you can inspect.
00:03:31You get the same observability locally
00:03:33that just inspects the file system.
00:03:36The queue is in memory.
00:03:38And so you have an entire--
00:03:39from day one, we've always had an entire great local dev
00:03:42experience that pretty much mocks
00:03:44what we're doing on Vercel.
00:03:45We also ship the Vercel world, of course.
00:03:47So when you deploy it on Vercel, there's a lot of stuff
00:03:49that you get out of the box.
00:03:50We encrypt everything end-to-end.
00:03:52We've done a lot of work on the compute layer
00:03:54to make this fast, keep networking private.
00:03:58And then we also ship this Postgres adapter
00:04:00from day one, which means that you could already
00:04:02take workflows.
00:04:03And we have customers who do this in production, where
00:04:05they run their workflows, the same code.
00:04:07And they can easily migrate them from Vercel over to Postgres,
00:04:10or likewise.
00:04:12But essentially, you can just run a Postgres backend yourself
00:04:14and then start workflow with an environment variable that points
00:04:17to the Postgres instance.
00:04:19And the world adapter will set it up with all the tables
00:04:25that we need and then use it as the persistence layer
00:04:29and use it reliably.
00:04:31We have more worlds coming as well.
00:04:32So we're working with the community and stuff.
00:04:34We have a Cloudflare world in the works.
00:04:36We have an AWS world in the works.
00:04:39It's been built to be something that can be run anywhere.
00:04:43We've done a little bit of the work,
00:04:44like I said, with those three first-party maintained ones.
00:04:47And there's a lot more coming there, both from our team
00:04:50and from the community right now.
00:04:54We've got some love coming in in the comments, someone
00:04:57who says they are actually using Vercel workflows right now,
00:05:02another who thinks workflows is a great addition to Vercel.
00:05:08Another question that I see come up with this kind of thing
00:05:12is, there's code snippets in the docs,
00:05:15but are there any full apps or other more complex examples,
00:05:19scenarios, that can help people get started with real work
00:05:23and not just the theoretical easy path beginner stuff?
00:05:27Yeah, I'd actually love to share my screen, if that's OK.
00:05:31Oh, yeah, sure.
00:05:32You can share that.
00:05:34I was going to--
00:05:37Chrome always makes you hit four buttons to do this.
00:05:40OK, there we go.
00:05:42So two things I wanted to point to is,
00:05:44one is workflow examples.
00:05:46It's a repo that we maintain.
00:05:48And we use this for a lot of our demos,
00:05:50including the birthday card generator that I just showed.
00:05:54We also have a flight booking app
00:05:55that uses Durable Agent, like I was talking about.
00:05:58We show you how to run this on Postgres.
00:06:00And there's also-- I have a PR to show you how you deploy
00:06:03this off of Vercel as well.
00:06:08So we have a lot of examples here.
00:06:09And the other thing that we're spending a lot of time on right
00:06:12now is the cookbook.
00:06:15So there's a lot more work coming onto this page with not
00:06:19just showing you--
00:06:21currently, we use this to show you a lot of common patterns
00:06:23within workflow.
00:06:24There's some stuff that gets really interesting and exciting
00:06:27for the ways that we even use workflow at Vercel.
00:06:31For example, I just wrote this guide
00:06:32on using a distributed abort controller.
00:06:34So if you look at those primitives of hook and sleep
00:06:37and steps, we think of those as primitives.
00:06:41And then there's all of this fun stuff you can build on top.
00:06:44For example, v0 uses distributed abort controller
00:06:48as their mechanism of having the stop button work.
00:06:51So when you use a v0 chat and you hit a stop button,
00:06:53we basically use a workflow with an abort controller
00:06:56to stop all the ongoing processes in the background.
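The stop-button idea can be sketched in a single process with the standard AbortController. This is only an illustration under stated assumptions: `runTask` and `runWithStopButton` are hypothetical names, and in the distributed version described in the talk the stop signal would presumably travel through a workflow hook rather than shared memory, which is not shown here.

```typescript
// Single-process stand-in for the "stop button": one AbortController fans a
// stop signal out to every in-flight task. Not the SDK's implementation.

type TaskResult = { name: string; status: "done" | "aborted" };

// Hypothetical task: finishes after `ms` unless the signal fires first.
function runTask(name: string, ms: number, signal: AbortSignal): Promise<TaskResult> {
  return new Promise<TaskResult>((resolve) => {
    if (signal.aborted) {
      resolve({ name, status: "aborted" });
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending work
      resolve({ name, status: "aborted" });
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve({ name, status: "done" });
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Starts three tasks and simulates the user pressing "stop" after `stopAfterMs`.
async function runWithStopButton(stopAfterMs: number): Promise<TaskResult[]> {
  const controller = new AbortController();
  const tasks = [
    runTask("fast", 5, controller.signal),
    runTask("slow-a", 500, controller.signal),
    runTask("slow-b", 500, controller.signal),
  ];
  setTimeout(() => controller.abort(), stopAfterMs);
  return Promise.all(tasks);
}
```

Calling `runWithStopButton(50)` lets the fast task finish and reports the two slow tasks as aborted, since one signal reaches every task that is still running.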
00:07:00So we'll be adding even more stuff to this.
00:07:01But I think these are two good places
00:07:03to look at for examples of not just the basic, simple code
00:07:08snippets, but where you go from there.
00:07:13Nice.
00:07:13And it is fairly real examples.
00:07:16It's things that we've done at Vercel and some ideas
00:07:18that we've seen customers working with.
00:07:21So I like that.
00:07:23Yeah.
00:07:26At this point, everyone is pretty much using AI coding agents.
00:07:31So all of that's great for people to explore.
00:07:36But what about supporting coding agents?
00:07:41How easy is it for them to pick up this new workflow
00:07:45way of working?
00:07:47Yeah, I mentioned this earlier.
00:07:51There's been two great things that we've seen.
00:07:54One that agents have picked up how
00:07:58to code the way that humans have been coding for a long time.
00:08:01All of this is trained from the way that we like programming.
00:08:06So we found really early on, because there's
00:08:08a lot of ways that we could have taken the DX with workflows.
00:08:11And we chose to do this thing that's extremely lightweight
00:08:13on the SDK.
00:08:14That's mostly just JavaScript.
00:08:15That's mostly just directives.
00:08:17Agents understand that well.
00:08:19And it's very easy for an agent to model
00:08:22doing things in parallel at scale.
00:08:23That's just Promise.all over all of them.
00:08:24If you want to do a step, but you want to time it out,
00:08:27the pattern in workflow is you basically
00:08:29just do a Promise.race, and you race your step with a sleep.
00:08:34It's a very natural way to think about code and solve problems
00:08:37when you-- if you already know the fundamentals of JavaScript.
00:08:40And agents seem to have captured that really well.
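The two patterns mentioned here, racing a step against a sleep and fanning out with Promise.all, look roughly like this in plain TypeScript. It is a minimal sketch that assumes a local `sleep` helper built on setTimeout and a made-up `fetchReport` step; inside an actual workflow the sleep and the step would be the durable versions provided by the SDK.

```typescript
// Local stand-in for a durable sleep; a workflow runtime would suspend here.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical step: resolves with a label after `ms` milliseconds.
async function fetchReport(id: number, ms: number): Promise<string> {
  await sleep(ms);
  return `report-${id}`;
}

// Race a step against a sleep: whichever settles first wins.
async function withTimeout<T>(step: Promise<T>, ms: number): Promise<T | "timed-out"> {
  const timeout = sleep(ms).then(() => "timed-out" as const);
  return Promise.race([step, timeout]);
}

// Fan out steps in parallel and wait for all of them.
async function fetchAll(ids: number[]): Promise<string[]> {
  return Promise.all(ids.map((id) => fetchReport(id, 5)));
}
```

For example, `withTimeout(fetchReport(1, 5), 200)` resolves to the report, while `withTimeout(fetchReport(2, 300), 20)` resolves to `"timed-out"` because the sleep wins the race.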
00:08:43And because they already know how to do the hard stuff.
00:08:47They know how to write extremely complex framework code.
00:08:50We use it on Next.js.
00:08:53We use LLMs with Next.js all the time.
00:08:56They have a great understanding of infrastructure code--
00:08:59or sorry, of framework code.
00:09:02That transfers really well into workflow
00:09:05when there isn't a lot of--
00:09:08when you aren't adding a lot of weight to the SDK.
00:09:11In comparison, it's also easy to have
00:09:13had an SDK with a ton of options,
00:09:15with a ton of complexity, which makes it not just complicated
00:09:18for a human, because then you have tons of docs to read.
00:09:22And this is kind of a problem that I
00:09:23had with the competition before starting on workflow.
00:09:26It often just felt like the DX of using a workflow system
00:09:29was-- it was great in theory, but really hard
00:09:32to actually run in practice.
00:09:33Or I had to read this giant instruction booklet.
00:09:36Agents also struggle with that, because they either
00:09:38need to have the entire thing in context.
00:09:40Or if they do learn that-- and you do see this
00:09:42in certain other examples--
00:09:45they start to-- it gets really hard
00:09:47to change the SDK in the future.
00:09:49But as you change things--
00:09:50as you change versions, old models
00:09:53are still set in their ways in what they've learned.
00:09:57So yeah, one is just that we found agents to be--
00:10:01so that's a long answer.
00:10:02We found agents to be really good at workflow.
00:10:04And then, of course, we shipped a skill as well for workflow.
00:10:07So you can do npx skills add vercel/workflow.
00:10:11If you look at the skill, which I think is a fun point
00:10:14to prove what I was just saying, the workflow skill basically
00:10:17just tells it to--
00:10:20shows it a couple of very basic code examples
00:10:23of how you can do stuff like race,
00:10:25and then just points it to read the docs, which
00:10:27we ship along with our npm package.
00:10:30So the agents essentially just look at the source code
00:10:34and end up being really good at writing workflows as well.
00:10:37And then that skill with the docs
00:10:38keeps it up-to-date on all the latest, which is nice.
00:10:41Right, exactly.
00:10:42Keeps it up-to-date, keeps it pointing to the right version
00:10:44because you ship the docs along with your package.
00:10:47So if we change the SDK as well, but you're
00:10:49on an older version of the client,
00:10:51the few options that we do have, your LLM just
00:10:54has access to those locally.
00:10:57Nice.
00:10:58We had another question come in the chat.
00:11:00This one is, do we have scale numbers
00:11:03on how many workflows and subworkflows and steps
00:11:07we can have on a single run?
00:11:11Oh, yeah.
00:11:13Yeah, that's a great question.
00:11:16So your concurrency on workflows when deploying--
00:11:22I'm assuming you're talking about Vercel.
00:11:24When you deploy to Vercel, the concurrency
00:11:26on steps and workflows are just limited by your Vercel function
00:11:30concurrency, which I believe is something like 10,000 or 100,000.
00:11:33It's based on your tier.
00:11:36But basically, workflow itself doesn't
00:11:39add a limit in concurrency.
00:11:41That said, the runtime being able to run--
00:11:44the way that suspension resumption basically
00:11:46works with workflow is that we have this event log
00:11:50that we keep tracking all of the inputs and outputs of steps.
00:11:53So as your event log gets really large,
00:11:55as you either have 1,000 steps one after the other,
00:11:59or you have 1,000 steps in parallel, whatever it is,
00:12:01you start to build up this event log.
00:12:03And we have to keep loading the entire event log for replays
00:12:06as the workflow runs for longer.
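A toy model helps show why replay gets slower as the event log grows. The sketch below is not the SDK's runtime: `resume` and `runToCompletion` are hypothetical names, and the model assumes one new step is executed per resume, with every earlier step answered from the log.

```typescript
// Toy model of replay-based resumption: each resume re-runs the workflow from
// the top, and completed steps are answered from the event log instead of
// being executed again.

type StepEvent = { index: number; output: number };

// Replays logged steps, executes at most one new step, then "suspends".
function resume(
  log: StepEvent[],
  totalSteps: number
): { replayed: number; finished: boolean } {
  let replayed = 0;
  for (let i = 0; i < totalSteps; i++) {
    if (i < log.length) {
      replayed++; // output comes from the log, no re-execution
      continue;
    }
    log.push({ index: i, output: i * 2 }); // execute one new step, record it
    return { replayed, finished: i === totalSteps - 1 };
  }
  return { replayed, finished: true };
}

// Drives the workflow to completion and totals the replay work.
function runToCompletion(totalSteps: number): { resumes: number; totalReplayed: number } {
  const log: StepEvent[] = [];
  let resumes = 0;
  let totalReplayed = 0;
  while (true) {
    const r = resume(log, totalSteps);
    resumes++;
    totalReplayed += r.replayed;
    if (r.finished) return { resumes, totalReplayed };
  }
}
```

In this model a run of n sequential steps replays 0 + 1 + ... + (n - 1) logged events in total, so `runToCompletion(1000)` reports 499,500 replayed events. The per-resume cost is tied to the length of the log, which is the effect described above.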
00:12:09If you're coming from a Temporal world,
00:12:10I think this limit is 50,000 events or 50 megabytes
00:12:14in your total storage.
00:12:16We just published this on workflow.
00:12:18We have our own limits as well.
00:12:22I think it's on our--
00:12:26it's on the Vercel docs, which I can
00:12:30send you a link to after.
00:12:33I have to find it.
00:12:34But there is--
00:12:36I think our limits for this are you can run--
00:12:40you can run a lot more workflows in parallel.
00:12:42You can start a lot of child workflows.
00:12:44None of those affect the event log.
00:12:45The thing that does affect the event log
00:12:47is doing a ton of steps in parallel.
00:12:49So when you have something like 1,000 to 10,000 steps
00:12:52in parallel, you could start to hit timeout issues.
00:12:55You could start to have workflows just run for longer.
00:12:59We've seen workflows that are-- we've
00:13:01seen agents that are doing 5,000 steps be just fine.
00:13:04They start to have-- the last couple of steps
00:13:06start to hit 15 to 20 second long latencies between steps.
00:13:11So based off of what your use case is that could be really
00:13:13long, or if it's a background task that
00:13:15needs to run for 5,000 steps for some customers, that's fine.
00:13:19At the same time, this is a big focus
00:13:21on what we're doing on workflow SDK 5.
00:13:25So a lot of what we're doing with SDK 5, which is in beta
00:13:29right now, is now that we have the DX right,
00:13:34we're trying to solve this replay problem entirely.
00:13:37So when you use something like Temporal,
00:13:38for example, you eventually hit this limit.
00:13:40And the solution has always been when you get to 5,000 events,
00:13:44you should just kill your workflow
00:13:46and continue as a new workflow.
00:13:48You can already do that today with workflows.
00:13:50You can just start a new workflow if you wanted to.
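The "continue as new" workaround can be sketched as a run that stops once its own log reaches a cap and hands a cursor to a fresh run. Everything here is hypothetical (`runOnce`, `runAll`, the `maxEvents` cap); it only models the idea of keeping each run's log small, not how any particular engine starts the follow-up run.

```typescript
type RunResult =
  | { kind: "done"; processed: number }
  | { kind: "continue"; nextCursor: number; processed: number };

// One run: processes items from `cursor` and stops when its own log hits the cap.
function runOnce(
  items: string[],
  cursor: number,
  maxEvents: number
): { result: RunResult; logSize: number } {
  const log: string[] = []; // each run starts with an empty log
  let i = cursor;
  while (i < items.length && log.length < maxEvents) {
    log.push(`processed:${items[i]}`);
    i++;
  }
  const processed = i - cursor;
  const result: RunResult =
    i < items.length
      ? { kind: "continue", nextCursor: i, processed }
      : { kind: "done", processed };
  return { result, logSize: log.length };
}

// Chains runs together until every item has been processed.
function runAll(
  items: string[],
  maxEvents: number
): { runs: number; largestLog: number; processed: number } {
  if (maxEvents < 1) throw new Error("maxEvents must be at least 1");
  let cursor = 0;
  let runs = 0;
  let largestLog = 0;
  let processed = 0;
  while (true) {
    const { result, logSize } = runOnce(items, cursor, maxEvents);
    runs++;
    largestLog = Math.max(largestLog, logSize);
    processed += result.processed;
    if (result.kind === "done") return { runs, largestLog, processed };
    cursor = result.nextCursor; // the only state carried into the next run
  }
}
```

With 10 items and a cap of 4 events, the work is split across 3 runs and no single log ever holds more than 4 entries.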
00:13:52But the correct solution, I think,
00:13:54is being able to improve the runtime
00:13:56and have suspension and resumption be O of 1
00:13:59instead of being tied to the length of the event log.
00:14:03There is some work we're doing right now
00:14:04with a WASM and snapshot-based runtime
00:14:06that we're exploring to see if we can completely kill that replay,
00:14:09at which point you could have arbitrarily high concurrency
00:14:12and arbitrarily long workflow runs.
00:14:19All right, we are getting very close to time.
00:14:21I want to be sensitive to that.
00:14:22So one last question.
00:14:25And you kind of started touching on this.
00:14:27What is coming next?
00:14:28What should people be on the lookout for?
00:14:31Yeah.
00:14:33As we went GA with workflow 4, like I mentioned,
00:14:36workflow 5 is currently in beta.
00:14:39No timelines yet on when we want to GA that.
00:14:42But the main focus with workflow 5
00:14:44has been on performance, like limits--
00:14:47like I talked about with limits, trying to make it
00:14:50so that you don't have to worry about continuing
00:14:52as new at some point.
00:14:55Also, native concurrency controls.
00:14:57So an RFC we've had since day one with workflow,
00:15:00which is just like those primitives of step, hook,
00:15:05and time, and wait, and sleep, we're
00:15:08adding a new primitive called lock, which essentially just
00:15:11lets your workflow wait for a lock,
00:15:15wait for concurrency slots.
00:15:16You could have 20 workflows that all run in parallel,
00:15:18but you only have a concurrency of one per minute.
00:15:20They can all await a lock that just gets resumed
00:15:23as those locks open up.
00:15:24It's a really nice, native way to have concurrency at scale.
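The lock primitive is still an RFC and its API is not described in the talk, so the following is only an in-memory analogy: a small semaphore where callers wait for a slot and are resumed as slots open up. The `Semaphore` class and `runWorkflows` helper are assumptions made for illustration.

```typescript
// Minimal in-memory semaphore; a stand-in for the proposed lock primitive.
class Semaphore {
  private waiters: Array<() => void> = [];
  private available: number;

  constructor(slots: number) {
    this.available = slots;
  }

  // Resolves immediately if a slot is free, otherwise when one is released.
  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot directly to the next waiter
    else this.available++;
  }
}

// Starts `count` pretend workflows that all compete for `slots` slots.
async function runWorkflows(
  count: number,
  slots: number
): Promise<{ maxActive: number; completed: number }> {
  const sem = new Semaphore(slots);
  let active = 0;
  let maxActive = 0;
  let completed = 0;
  const one = async (): Promise<void> => {
    await sem.acquire();
    active++;
    maxActive = Math.max(maxActive, active);
    try {
      await new Promise<void>((r) => setTimeout(r, 1)); // pretend to do work
      completed++;
    } finally {
      active--;
      sem.release();
    }
  };
  await Promise.all(Array.from({ length: count }, () => one()));
  return { maxActive, completed };
}
```

Running 20 of these with a single slot completes all 20 while never having more than one active at a time.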
00:15:28And improving the agent streaming experience.
00:15:31That's one of the big use cases with workflows, like I said,
00:15:34is you could either use it for these traditional birthday card
00:15:36style workflows, or you could use it
00:15:39for AI agents, where we have a lot of opinions on that SDK,
00:15:43and how it works with streaming, and how it works with sandboxes.
00:15:47There's a lot more to come there with just improving performance
00:15:49and improving the DX of the agent use case.
00:15:54Awesome.
00:15:55Lots to look forward to.
00:15:56We did get a couple more questions.
00:15:58We will answer those async in the chat.
00:16:01Thank you so much for being here to talk to everyone, Praneet.
00:16:04Thanks so much, Amy.
00:16:05This was fun.
00:16:07All right, and thank you all for watching.
00:16:09We have another session with some folks from next Thursday.
00:16:14So come back Thursday.
00:16:15We'll see you then.

Key Takeaway

Vercel Workflows provides a portable, agent-ready runtime for long-running processes that simplifies state management by using standard JavaScript primitives and adapter-based persistence.

Highlights

  • Vercel Workflows enables long-running background processes using three primitives: steps, hooks, and sleeps.

  • The workflow SDK is open-source and supports portability through adapters for local file systems, PostgreSQL, and upcoming support for Cloudflare and AWS.

  • The Durable Agent class within the SDK automates complex agentic loops, including streaming tokens and executing tool calls.

  • Replay cost grows with the event log: agents running around 5,000 steps work, but the last steps see 15 to 20-second latencies between steps, and 1,000 to 10,000 parallel steps can start to hit timeouts.

  • Version 5 of the workflow SDK focuses on improving runtime performance to achieve O(1) suspension and resumption, eliminating replay limitations.

Timeline

Workflow Core Primitives and Agents

  • Workflows rely on three primitives: steps for units of work that can suspend and resume, hooks for external events, and sleeps for time-based waiting.
  • Durable Agent classes automate orchestration for repetitive tasks like LLM streaming and tool execution.
  • Background pipelines allow tasks to run across durations ranging from seconds to months.

The workflow architecture is designed to model complex background tasks as long-running code. The system manages the complexities of stateful execution, allowing agents to loop through LLM calls and tool usages reliably. This design moves the heavy lifting of orchestration from the developer to the workflow runtime.
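The agent loop described in the session (call the model, run a tool, repeat) can be sketched without any SDK. The model and tool below are fakes invented for this example, and the sketch does not show how Durable Agent is implemented; it only illustrates the loop that such a class would orchestrate, where each awaited call would be a durable step.

```typescript
type ModelTurn =
  | { type: "tool-call"; tool: string; input: string }
  | { type: "final"; text: string };

type Tool = (input: string) => Promise<string>;

// Hypothetical stand-in for an LLM call: asks for one tool call, then answers.
async function fakeModel(history: string[]): Promise<ModelTurn> {
  const toolResult = history.filter((h) => h.indexOf("tool:") === 0)[0];
  if (toolResult === undefined) {
    return { type: "tool-call", tool: "lookupFlight", input: "SFO-JFK" };
  }
  return { type: "final", text: `Booked using ${toolResult.slice(5)}` };
}

// Hypothetical tool registry.
const tools: Record<string, Tool> = {
  lookupFlight: async (route) => `flight for ${route}`,
};

// Loops model call -> tool call until the model gives a final answer.
async function runAgent(
  prompt: string,
  maxTurns = 5
): Promise<{ answer: string; turns: number }> {
  const history: string[] = [`user:${prompt}`];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const step = await fakeModel(history); // in a workflow: a durable step
    if (step.type === "final") return { answer: step.text, turns: turn };
    const tool = tools[step.tool];
    if (!tool) throw new Error(`unknown tool: ${step.tool}`);
    const output = await tool(step.input); // another durable step
    history.push(`tool:${output}`);
  }
  throw new Error("agent did not finish within maxTurns");
}
```

Here `runAgent("book a flight")` takes two turns: one that requests the tool call and one that produces the final answer from the tool's output.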

Portability and Open Source Implementation

  • The workflow SDK is an open-source framework analogous to Docker for long-running code.
  • Existing adapters support local file system inspection, Vercel-hosted encrypted environments, and PostgreSQL persistence.
  • Community and team efforts are expanding support to include Cloudflare and AWS environments.

Users can run the same code across different environments by switching the backend via environment variables. The local development experience mocks production behavior by using the local file system for storage and an in-memory queue. This decoupling of the execution logic from the infrastructure layer ensures production workflows remain portable.
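One way to picture the adapter idea is a small persistence interface chosen from the environment. This is a sketch under loud assumptions: the `World` interface, the `EXAMPLE_WORLD` and `EXAMPLE_DATABASE_URL` variable names, and the in-memory storage are all invented for illustration and are not the SDK's actual contract or configuration.

```typescript
// Hypothetical persistence interface; the SDK's actual World contract may differ.
interface World {
  readonly name: string;
  append(runId: string, event: string): void;
  load(runId: string): string[];
}

// In-memory storage used by both sketch worlds.
function createMemoryWorld(name: string): World {
  const runs = new Map<string, string[]>();
  return {
    name,
    append(runId, event) {
      const events = runs.get(runId) ?? [];
      events.push(event);
      runs.set(runId, events);
    },
    load(runId) {
      return (runs.get(runId) ?? []).slice();
    },
  };
}

type Env = Record<string, string | undefined>;

// Picks a backend from (hypothetical) environment variables.
function selectWorld(env: Env): World {
  const target = env["EXAMPLE_WORLD"] ?? "local";
  if (target === "local") return createMemoryWorld("local");
  if (target === "postgres") {
    if (!env["EXAMPLE_DATABASE_URL"]) {
      throw new Error("EXAMPLE_DATABASE_URL is required for the postgres world");
    }
    // A real adapter would connect and create its tables; this sketch keeps
    // events in memory so it stays runnable.
    return createMemoryWorld("postgres");
  }
  throw new Error(`unknown world: ${target}`);
}

// The workflow logic only sees the World interface, never the backend.
function recordSteps(world: World, runId: string, steps: string[]): string[] {
  for (const step of steps) world.append(runId, `completed:${step}`);
  return world.load(runId);
}
```

Because `recordSteps` depends only on the interface, the same call produces the same log whichever world `selectWorld` returns.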

Agentic Integration and Scaling

  • AI coding agents natively understand the SDK because it uses standard JavaScript patterns.
  • Patterns built on top of the primitives, such as a distributed abort controller, handle process termination; v0 uses one for its stop button.
  • Current event-log based architecture supports high-scale execution but faces latency issues beyond 5,000 steps.
  • Future SDK updates target O(1) suspension and resumption using snapshot-based runtimes.

Agents perform well with the SDK because the code relies on idiomatic JavaScript, such as racing a step against a sleep with Promise.race for timeouts. Per-resume cost currently grows with the size of the event log, and upcoming runtime improvements aim to remove the need for 'continue-as-new' workarounds. More complex examples are available today: the workflow examples repository includes a flight booking app built on Durable Agent, and the cookbook covers the distributed abort controller pattern behind v0's stop button.
