Observability for the AI Cloud

Vercel
AI / Future Tech / Startups / Computers & Software

Transcript

00:00:00(upbeat music)
00:00:02- Hello everyone, thanks so much for coming today.
00:00:12I'm Malavika, and as you heard,
00:00:13I'm a product manager at Vercel.
00:00:15I hope you've had a great day at the conference.
00:00:18I've certainly been blown away
00:00:20by what all of you have been building
00:00:21with our open source tools and infrastructure primitives.
00:00:24It's been so nice to meet many of you in person
00:00:27for the first time and see some new faces
00:00:30as well as familiar faces.
00:00:31So today we've learned a lot about the AI cloud.
00:00:38By now you're probably tired
00:00:40of hearing about the AI cloud.
00:00:43You might've been thinking,
00:00:44gosh, can I just get a drink at happy hour already?
00:00:47But just as a reminder,
00:00:50in case the message just hasn't quite sunk in yet,
00:00:54Vercel is a unified platform for building, deploying,
00:00:58and running intelligent applications
00:01:01and the agents behind them.
00:01:02Vercel's mission has always been to abstract away
00:01:07the complexity of managing infrastructure
00:01:09and let you focus on building amazing user experiences.
00:01:14So when we think about how successful we've been
00:01:19in abstracting away complexity
00:01:21across the software development life cycle,
00:01:24we've done a really good job of making build time a breeze.
00:01:29Framework defined infrastructure means
00:01:31that you don't need to think
00:01:32about the underlying infrastructure primitives.
00:01:35There's no need for complex orchestration
00:01:38or infrastructure as code.
00:01:40We handle provisioning compute, networking, caching,
00:01:43and more so that you can focus on your application logic.
00:01:48But if we reflect on runtime,
00:01:50there's still a lot of work we have to do
00:01:53to make that as easy and breezy as build time.
00:01:56Sadly, managing an application at runtime
00:02:02is still quite a time-consuming effort for development teams.
00:02:06Like, how many in the audience here
00:02:09have dealt with a major incident in the last week?
00:02:12I don't see that many hands up.
00:02:15I'm surprised.
00:02:16I'm like, are you all using GCP or something?
00:02:20Maybe you there in the audience.
00:02:22You're probably feeling really good right now
00:02:25that you're self-hosting Next.js on a VPS in Hetzner.
00:02:29You're like, the cloud's not gonna bring me down.
00:02:33But there's so much manual toil
00:02:35associated with incident response.
00:02:38First, teams need to set up alerts and monitors
00:02:41to identify potential problems.
00:02:44These monitoring tools then in turn cause alert fatigue
00:02:47as teams try to identify high signal issues
00:02:50amongst all the noise.
00:02:52And then once we identify an issue,
00:02:55teams spend hours debugging the issue,
00:02:58trying to identify the root cause, and then apply a fix.
00:03:02So, actually here in the audience,
00:03:06we're gonna make this a little bit interactive.
00:03:08How much time do you think developers spend
00:03:11on debugging an incident?
00:03:13How many of you think it's under 20% of incident time
00:03:16is spent on debugging?
00:03:18- More. - More?
00:03:20I heard 80%.
00:03:22So some think it's almost that high. Who thinks it's more like 40%?
00:03:25Okay, we've got, someone thinks it's 60, 70?
00:03:33Okay, that seems to be maybe consensus.
00:03:3650%. It actually turns out that 50% of incident time
00:03:41is spent on identifying the root cause
00:03:43and figuring out who should solve it.
00:03:46Like that's crazy, in 2025 with AI,
00:03:49we're spending hours of valuable developer time
00:03:52debugging issues.
00:03:54What if we could reduce that to seconds?
00:03:56Well, with Vercel Agent, you can.
00:04:01Our anomaly alerts monitor your application
00:04:04for suspicious activities.
00:04:06Out of the box, no configuration required.
00:04:09As soon as we detect unusual behavior,
00:04:11Vercel Agent investigates the issue,
00:04:14performs a root cause analysis, and diagnoses the problem
00:04:17in a matter of seconds.
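To make the idea of detecting "unusual behavior" concrete, here is a minimal sketch of a statistical anomaly check of the kind alerting systems run over a metric stream. This is an illustrative z-score check under invented sample data, not Vercel's actual detection algorithm; the function name and threshold are our own.

```typescript
// Illustrative z-score anomaly check (not Vercel's implementation):
// flag a metric sample as anomalous when it deviates from the recent
// baseline by more than `threshold` standard deviations.
function isAnomalous(history: number[], sample: number, threshold = 3): boolean {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  const std = Math.sqrt(variance);
  if (std === 0) return sample !== mean; // flat baseline: any change is unusual
  return Math.abs(sample - mean) / std > threshold;
}

// Example: an error rate hovering around 1% suddenly jumps to 9%.
const errorRates = [0.01, 0.012, 0.009, 0.011, 0.01, 0.008, 0.011, 0.009];
console.log(isAnomalous(errorRates, 0.09)); // → true (spike is flagged)
console.log(isAnomalous(errorRates, 0.01)); // → false (normal sample)
```

Real systems layer seasonality and rolling windows on top of a check like this, but the core signal-versus-baseline comparison is the same.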
00:04:20Unlike traditional observability tools
00:04:22or infrastructure providers, we have full context
00:04:25on your app.
00:04:26We built it, we deployed it,
00:04:28and we're running it in production.
00:04:30Even though we don't know your app as well as you do,
00:04:33and certainly not as well as your 10X engineer,
00:04:37we're uniquely positioned to give you an AI-native approach
00:04:41to ensuring reliability, performance, and security
00:04:44at runtime.
00:04:45Vercel Agent investigations build on top
00:04:50of our native observability tooling,
00:04:52which we've thoughtfully designed to give you visibility
00:04:56into runtime behavior with build-time context.
00:05:00Runtime logs give you granular visibility
00:05:05into application behavior.
00:05:07With runtime logs, you can trace through the life
00:05:09of an HTTP request to your application
00:05:12from the point in time that it ingresses
00:05:14into the Vercel network, all the way to the point
00:05:17that a response is returned to the client.
00:05:19We also provide you with opinionated dashboards
00:05:24out of the box, so you can quickly understand the health
00:05:27of your application, pinpoint issues,
00:05:30and optimize performance.
00:05:32With Vercel's recently launched anomaly alerts,
00:05:37now you can actively monitor your application
00:05:40for unusual activity, so that you can quickly
00:05:43identify and resolve issues.
00:05:45And lastly, with our query tool,
00:05:50you can explore the vast amount of metrics we collect
00:05:53and surface on your application.
00:05:55You can author queries to answer a range of questions,
00:05:59like what bots are crawling my application,
00:06:02all the way to quantifying the P90 time to first token
00:06:06of various model providers you're using
00:06:08in your application.
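As a concrete illustration of one of those metrics, here is the standard nearest-rank percentile calculation behind a P90 figure such as time to first token. This is a generic sketch of the math, not Vercel's query tool, and the sample latencies are invented.

```typescript
// Nearest-rank percentile: sort the samples and take the value at the
// rank covering p percent of them. P90 means 90% of samples fall at or
// below the returned value.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical time-to-first-token samples (ms) for one model provider.
const ttftMs = [120, 95, 210, 180, 150, 400, 130, 170, 160, 140];
console.log(percentile(ttftMs, 90)); // → 210
```

Note how the single 400 ms outlier does not drag the P90 up the way it would a mean, which is why tail percentiles are the usual lens for latency.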
00:06:09Vercel Agent investigations build on top
00:06:15of all of these features, reducing the need
00:06:18for manual exploration and bubbling up key insights.
00:06:22As we look to what's next for Vercel Agent,
00:06:28our goal is to ultimately reimagine the way we interact
00:06:31with observability tools.
00:06:34Our vision is that the AI cloud will repair
00:06:37and optimize your application,
00:06:39not just inform you about issues.
00:06:42Fundamentally, we believe the AI cloud
00:06:49shouldn't just present you with problems,
00:06:51it should give you solutions, recommendations,
00:06:55pull requests, and automated actions.
00:06:58And that's the world we're building towards
00:07:00with Vercel Agent.
00:07:03You can get started with Vercel Agent today.
00:07:05As you heard, it's in public beta,
00:07:07and we're giving away $100 in free credits
00:07:10to all Vercel users to try out these new features.
00:07:13We launched the code review skill last month,
00:07:16and that's available to all Vercel users.
00:07:19Agent investigations are available starting today,
00:07:23and they're available to Pro and Enterprise customers
00:07:25who have Observability Plus.
00:07:27And you can visit the agent tab in the Vercel dashboard
00:07:30to get started.
00:07:33So I'm gonna shift gears a little bit.
00:07:35I actually want to spend time on another important topic,
00:07:37which is evals.
00:07:39So how many of you in the audience are using
00:07:41a dedicated AI observability product for evals?
00:07:44Oh, so I see actually very few hands up.
00:07:48Interesting.
00:07:49But for those of you who are,
00:07:51you know AI applications are non-deterministic.
00:07:55So it's very important that we're also monitoring
00:07:58the quality of the output.
00:08:00Agents daisy chain a series of reasoning steps,
00:08:03introducing even more complexity.
00:08:06And that's why we've seen a large ecosystem
00:08:08of agent frameworks that build upon OpenTelemetry
00:08:12to help developers monitor, debug,
00:08:14and optimize their agent workflows.
00:08:17And I guess I saw very few hands up,
00:08:20but how many of you were collecting traces for evals?
00:08:23Okay, I see a few hands up in the audience.
00:08:27I see a few hands up.
00:08:27I'm sorry, I'm like peeking over here.
00:08:29It's quite bright.
00:08:30So if you're collecting traces using OpenTelemetry,
00:08:34you can send those traces to any destination
00:08:38using Vercel Drains.
00:08:40Vercel Drains lets you export all of your Vercel data
00:08:43to a third party observability tool.
00:08:46So if you're using a third party evals tool,
00:08:48you can export your traces there.
00:08:50To make it really easy for developers to run evals
00:08:55and monitor model quality,
00:08:57we've actually partnered with Braintrust
00:08:59through the Vercel marketplace.
00:09:01With this new integration,
00:09:03you can automatically stream traces and evaluation data
00:09:06from Vercel to Braintrust with just a few clicks.
00:09:10Earlier today, you may have heard more
00:09:13about our marketplace integration during a panel
00:09:16with Ankur Goyal, the founder of Braintrust.
00:09:19To make it really easy for you to get started,
00:09:23we've actually got a demo app.
00:09:25Here you can scan this QR code.
00:09:27This demo app you can clone from our template library.
00:09:30It's an AI chat bot built with Next.js,
00:09:33the AI SDK, and AI Gateway
00:09:35with traces being sent to Braintrust.
00:09:38I'll give you all a second.
00:09:39Oh, am I in the way?
00:09:44I'm gonna get out of the way.
00:09:45Awesome.
00:09:47As you can see, our goal is to provide you
00:09:51with flexibility and control over what tools you use.
00:09:55While we're working hard to build really great
00:09:57native observability tooling,
00:09:59you'll always have the freedom
00:10:01to send your data wherever you want.
00:10:03The most important thing is that you're able
00:10:05to build great user experiences
00:10:07and turn unpredictable code into reliable systems.
00:10:11And with that, that's the end of today.
00:10:15If you catch me at the happy hour,
00:10:17I'd love to learn more about what you're building.
00:10:19Thanks so much.
00:10:20(audience applauding)
00:10:22(upbeat music)
00:10:25(upbeat music)

Key Takeaway

Vercel is introducing AI-powered observability and automated incident response capabilities to transform runtime application management, reducing hours of debugging to seconds while providing developers full visibility and control over their AI applications in production.

Highlights

Vercel's AI Cloud platform abstracts away infrastructure complexity to let developers focus on building intelligent applications and agents

Vercel Agent uses anomaly detection and AI-powered root cause analysis to diagnose production issues in seconds instead of hours of manual debugging

Native observability tooling includes runtime logs, opinionated dashboards, anomaly alerts, and query tools that provide full application visibility with build-time context

The vision for the AI Cloud is to move beyond problem identification toward automated solutions, including pull requests and automated remediation actions

Vercel Drains enables flexible data export to third-party observability tools, with a new Braintrust integration for seamless evaluation data streaming

50% of incident response time is spent on identifying root cause and determining ownership, representing a critical efficiency opportunity for AI-driven solutions

Timeline

Introduction and Vercel's AI Cloud Mission

Malavika, a product manager at Vercel, opens by welcoming the conference audience and establishing the context for the discussion. She emphasizes that Vercel is a unified platform for building, deploying, and running intelligent applications and agents, with a core mission to abstract away infrastructure complexity and let developers focus on building amazing user experiences. The speaker acknowledges that the AI Cloud concept has been emphasized throughout the conference and sets up the discussion by noting that while Vercel has succeeded in abstracting complexity at build time, there is still significant work to be done at runtime. This introduction frames the need for better runtime observability and incident management as the central challenge Vercel aims to address.

The Complexity of Runtime Management and Incident Response

The speaker highlights the substantial manual effort required for runtime application management, comparing it unfavorably to the streamlined build-time experience. She describes the incident response workflow: teams must set up alerts and monitors to identify problems, then deal with alert fatigue as they filter signal from noise, and finally spend hours debugging to identify root cause and apply fixes. Through an interactive audience engagement, Malavika reveals that approximately 50% of incident time is spent on identifying the root cause and determining ownership, which she emphasizes is wasteful given current AI capabilities. She poses the question: 'What if we could reduce that to seconds?' to introduce the solution Vercel is building, establishing the urgency and business value of addressing this pain point.

Introducing Vercel Agent Investigations

Vercel Agent is introduced as the company's solution to dramatically reduce debugging time through AI-powered anomaly detection and root cause analysis. The speaker explains that Vercel Agent monitors applications for suspicious activities out-of-the-box with no configuration required, and when unusual behavior is detected, the agent investigates automatically, performs root cause analysis, and diagnoses problems in seconds. A key competitive advantage is that Vercel has full context on applications since they build, deploy, and run them in production, positioning them uniquely to provide AI-native approaches to ensuring reliability, performance, and security. Vercel Agent investigations build on top of native observability tooling including runtime logs, opinionated dashboards, anomaly alerts, and a query tool that allows developers to explore vast amounts of collected metrics and answer detailed questions about their applications.

Native Observability Features and Tools

The speaker details the components of Vercel's native observability tooling designed to provide comprehensive visibility into runtime behavior while maintaining build-time context. Runtime logs offer granular visibility into application behavior and allow developers to trace HTTP requests from ingress into the Vercel network to the response returned to the client. Opinionated dashboards are provided out-of-the-box to help developers quickly understand application health, pinpoint issues, and optimize performance. Anomaly alerts enable active monitoring for unusual activity, allowing teams to identify and resolve issues quickly without waiting for users to report problems. The query tool empowers developers to explore metrics and author custom queries to answer specific questions, ranging from identifying bot traffic to quantifying performance metrics like P90 time to first token for various model providers, giving developers granular control over what they monitor.

The Future Vision: From Insights to Automated Actions

The speaker articulates Vercel's vision for the future of the AI Cloud, emphasizing a fundamental shift from reactive monitoring and information delivery to proactive problem-solving. Rather than simply identifying and informing developers about issues, the goal is for the AI Cloud to actively repair and optimize applications by providing solutions, recommendations, pull requests, and automated actions. This represents a significant evolution beyond traditional observability platforms that only present problems, moving toward an autonomous system that can implement fixes and improvements. The vision acknowledges that the current generation of Vercel Agent represents a starting point, with automatic remediation and pull request generation as features the company is building toward, representing the ultimate goal of reducing developer toil in production incident management.

Availability, Pricing, and Getting Started

The speaker provides practical information about how developers can access Vercel Agent features and what tier of service includes which capabilities. Vercel Agent is in public beta with $100 in free credits offered to all Vercel users to try the new features. The code review skill launched previously and is available to all Vercel users. Agent investigations are available starting immediately and are accessible to Pro and Enterprise customers who have Observability Plus. Developers can access and configure these features by visiting the agent tab in the Vercel dashboard, making it straightforward to begin using the new capabilities without requiring additional setup or configuration.

AI Evaluation Monitoring and Third-Party Integrations

The speaker transitions to discuss the importance of monitoring output quality in non-deterministic AI applications, noting that agents introduce even more complexity through their multi-step reasoning processes. She highlights that a large ecosystem of agent frameworks has emerged using OpenTelemetry to help developers monitor and debug agent workflows. To address this need while maintaining developer flexibility, Vercel has partnered with Braintrust through the Vercel marketplace, allowing developers to automatically stream traces and evaluation data from Vercel to Braintrust with just a few clicks. The company provides a demo app (an AI chatbot built with Next.js, the AI SDK, and AI Gateway with traces sent to Braintrust) that developers can clone from the template library to quickly get started with evaluation monitoring. This approach reflects Vercel's philosophy of providing both native observability tooling and flexible integration with third-party tools through Vercel Drains, ensuring developers have freedom to choose their tools while being able to easily collect and export their data.
