The Biggest Problem Of AI Coding Is Finally Solved



Transcript

00:00:00AI made coding accessible to everyone and people have started shipping code at a much

00:00:04faster pace.

00:00:05But at an even faster pace, security issues inside those apps started piling up.

00:00:09And in the past few months, things have actually gotten worse.

00:00:12There have been many instances when an agent deleted someone's entire project.

00:00:16Another agent deleted an entire production database while the developer was working on

00:00:20something completely unrelated.

00:00:22And there have been many similar issues, like Apple's internal CLAUDE.md being leaked.

00:00:26So tooling that can actually catch these issues matters more now than ever.

00:00:30Seeing this rise in issues, Vercel just released DeepSec, a security harness for detecting breaches in AI-generated

00:00:35applications.

00:00:37Now you might think Claude Code can already do security reviews on its own with its agents.

00:00:42So why would you need DeepSec in the first place?

00:00:44It's because DeepSec is a structured tool that handles reviews far more systematically.

00:00:49Under the hood, it's using coding agents like Claude Code and Codex.

00:00:52The tool is built for scanning large repositories. Its parallel design batches code into multiple groups

00:00:57that are processed at the same time, which speeds up the workflow and makes it well suited to

00:01:01reviewing large code bases.

00:01:03Now this is not built with cost-effectiveness in mind.

00:01:06They are using the most powerful models of Claude Code and Codex, which are Opus 4.7 at

00:01:10max effort and GPT-5.5 at xhigh reasoning, both of which consume a lot of tokens.

00:01:16And with them running in parallel, the token usage piles up quickly, increasing cost.

00:01:20Several known apps have already run this harness on their code bases and reported good results.

00:01:25In their tests, the tool's false positive rate was roughly 10-20%.

00:01:30That is a strong figure given how accurate LLMs usually are.

00:01:33In other words, most of what the agent reports is correct, so its share of true positives is

00:01:37high.
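As a worked illustration, assuming the "10-20% false positive rate" means the share of reported findings that turn out to be wrong (strictly speaking, the false discovery rate, which equals 1 minus precision), the arithmetic looks like this. The counts are hypothetical, not results from DeepSec's tests.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Share of reported findings that are wrong: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of reported findings that are real: TP / (TP + FP), i.e. 1 - FDR."""
    return true_pos / (true_pos + false_pos)


# Hypothetical run: 20 findings reported, 3 of them bogus.
print(false_discovery_rate(17, 3))  # 0.15
print(precision(17, 3))             # 0.85
```

A 10-20% rate in this sense corresponds to 80-90% precision. It says nothing about recall: issues the tool never reports appear in neither number, which matters when comparing it against another reviewer's findings.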

00:01:38The architecture behind this is what makes it different.

00:01:40If you ask Claude Code or any agent for a security review, it will start by directly scanning

00:01:45the code base and then produce a full review report.

00:01:48That not only takes a lot of time, but it also consumes a lot of tokens and the review

00:01:52might still miss things.

00:01:53So the first part of this workflow is scanning, performing a RegEx-only scan of all files for

00:01:58security-sensitive areas that the subsequent steps will focus on.

00:02:01RegEx detection matters here because the tool is designed for large code bases where there

00:02:06can easily be thousands of files.

00:02:08RegEx matching applies a set of code patterns that match areas known to be likely sources of security

00:02:13vulnerabilities, and the matching files are then pulled out of the large pool.
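A minimal sketch of a RegEx-only pre-filter, assuming a small, hypothetical pattern list (DeepSec's real rule set is not shown in the video). It keeps only files whose text matches at least one security-sensitive pattern and records which patterns fired, with no model calls involved.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only; not DeepSec's actual rules.
SENSITIVE_PATTERNS = {
    "raw_sql": re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b.*\+", re.IGNORECASE),
    "eval": re.compile(r"\beval\s*\("),
    "shell_exec": re.compile(r"\b(exec|spawn|system)\s*\("),
    "secret_literal": re.compile(r"(api[_-]?key|secret|password)\s*[:=]\s*['\"]", re.IGNORECASE),
}


def scan_text(text: str) -> list[str]:
    """Return the names of every pattern that matches the given source text."""
    return [name for name, rx in SENSITIVE_PATTERNS.items() if rx.search(text)]


def scan_tree(root: str, suffixes: tuple[str, ...] = (".js", ".ts", ".py")) -> dict[str, list[str]]:
    """Map each matching file to the patterns it triggered; non-matching files are dropped."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            matched = scan_text(path.read_text(errors="ignore"))
            if matched:
                hits[str(path)] = matched
    return hits


print(scan_text('q = "SELECT * FROM users WHERE id=" + uid'))  # ['raw_sql']
```

Patterns like these are deliberately loose: a false hit only costs some agent tokens later, while a miss means the file is never investigated at all.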

00:02:16Once the large pool of files has been filtered, the next step is investigation using the agent.

00:02:21The agent is the expensive part consuming a lot of tokens and typically taking a long

00:02:25time depending on how big your code base actually is.

00:02:28So this tool splits all the files into batches and parallelizes them so they can be processed

00:02:32at the same time.
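A sketch of fanning batches out in parallel using Python's standard `concurrent.futures`. The function `investigate_batch` is a hypothetical stand-in for the expensive agent call (in DeepSec that role is played by Claude or Codex); threads fit here because such a call mostly waits on a remote API.

```python
from concurrent.futures import ThreadPoolExecutor


def investigate_batch(batch: list[str]) -> list[dict]:
    """Hypothetical stand-in for an agent call; emits one placeholder finding per file."""
    return [{"file": path, "issue": "placeholder"} for path in batch]


def investigate_all(batches: list[list[str]], max_workers: int = 4) -> list[dict]:
    """Run all batches concurrently and flatten the results, preserving batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(investigate_batch, batches))
    return [finding for batch_result in results for finding in batch_result]


print(len(investigate_all([["a.ts", "b.ts"], ["c.ts"]])))  # 3
```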

00:02:34Once that process is done, there's a revalidation step in which the investigation results

00:02:37are checked again to catch false positives.

00:02:40In case something was missed, it catches that and ensures the classification has been done

00:02:45correctly.

00:02:46This revalidation is actually optional.

00:02:47After that, the agent uses Git metadata and other sources to identify which people are

00:02:51responsible for which issues.
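One plausible way to attribute a flagged line range to a commit and author is `git blame --porcelain`. This is an assumption: the video only says Git metadata is used, not how. The parser relies on the porcelain format's rule that author details are printed only the first time a commit appears.

```python
import re
import subprocess

# Header of each porcelain block: "<sha> <orig_line> <final_line> [<group_size>]".
HEADER = re.compile(r"^([0-9a-f]{40}|[0-9a-f]{64}) (\d+) (\d+)")


def parse_porcelain(output: str) -> dict[int, tuple[str, str]]:
    """Map each final line number to (commit sha, author name)."""
    authors: dict[str, str] = {}  # remembered from the first block for each commit
    result: dict[int, tuple[str, str]] = {}
    sha, line_no = "", 0
    for line in output.splitlines():
        if line.startswith("\t"):
            # A TAB-prefixed content line closes the block for the current line.
            result[line_no] = (sha, authors.get(sha, "unknown"))
        elif line.startswith("author "):
            authors[sha] = line[len("author "):]
        elif (m := HEADER.match(line)):
            sha, line_no = m.group(1), int(m.group(3))
    return result


def blame_lines(path: str, start: int, end: int) -> dict[int, tuple[str, str]]:
    """Blame a line range of a file inside the current Git repository."""
    out = subprocess.run(
        ["git", "blame", "--porcelain", "-L", f"{start},{end}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_porcelain(out)
```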

00:02:53Once all of that is done, the findings are stored as markdown or JSON so that they can

00:02:57be turned into tickets for humans as well as coding agents.

00:03:01Now as mentioned earlier, the files are grouped into batches with around 5 files processed

00:03:05together per batch.
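The batching step can be as simple as chunking the filtered file list. This sketch assumes plain consecutive chunks of at most five files; the video only says "around 5", so DeepSec's actual grouping heuristic (for example, keeping related files together) may differ.

```python
def make_batches(files: list[str], batch_size: int = 5) -> list[list[str]]:
    """Split the filtered file list into consecutive chunks of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


files = [f"src/file{i}.ts" for i in range(12)]
print([len(b) for b in make_batches(files)])  # [5, 5, 2]
```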

00:03:06For each batch, a fresh prompt is assembled based on the identified framework along with

00:03:11other project information.
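A sketch of assembling a fresh prompt per batch. The template wording and the `build_prompt` helper are hypothetical; the point is only that each batch gets a self-contained prompt combining the detected framework, the project information, and that batch's files, with no shared conversation history.

```python
def build_prompt(framework: str, project_info: str, batch: list[str]) -> str:
    """Assemble a fresh, self-contained prompt for one batch (hypothetical template)."""
    file_list = "\n".join(f"- {path}" for path in batch)
    return (
        f"You are reviewing a {framework} project for security vulnerabilities.\n\n"
        f"Project context:\n{project_info.strip()}\n\n"
        f"Files in this batch (you have read-only access to the repository):\n{file_list}\n\n"
        "For each finding, report the file, line range, severity, confidence, and a recommended fix."
    )


print(build_prompt("Next.js", "Session cookies are set in middleware.ts.", ["app/api/login/route.ts"]))
```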

00:03:12These are then analyzed by the Claude Agent SDK or Codex Agent SDK, whichever you have configured,

00:03:17and they're given tools with read-only access to understand what the code base contains.

00:03:22Once they have the findings, everything is merged into a single file that is deduplicated

00:03:26and normalized.
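A sketch of the merge step, assuming each batch returns findings as dictionaries with hypothetical `file`, `line`, `category`, and `severity` fields. Normalizing first (path separators, casing, severity vocabulary) is what lets the same issue reported by two batches collapse into one entry.

```python
SEVERITIES = ("critical", "high", "medium", "low", "info")  # most to least severe


def normalize(finding: dict) -> dict:
    """Canonicalize fields so the same issue reported by two batches compares equal."""
    severity = str(finding.get("severity", "info")).strip().lower()
    return {
        "file": finding["file"].replace("\\", "/").removeprefix("./"),
        "line": int(finding["line"]),
        "category": finding["category"].strip().lower(),
        "severity": severity if severity in SEVERITIES else "info",
        "message": finding.get("message", "").strip(),
    }


def merge_findings(batch_results: list[list[dict]]) -> list[dict]:
    """Flatten per-batch results, normalize them, and keep one entry per (file, line, category)."""
    merged: dict[tuple[str, int, str], dict] = {}
    for batch in batch_results:
        for raw in batch:
            f = normalize(raw)
            key = (f["file"], f["line"], f["category"])
            # On a duplicate, keep whichever copy carries the higher severity.
            if key not in merged or SEVERITIES.index(f["severity"]) < SEVERITIES.index(merged[key]["severity"]):
                merged[key] = f
    return list(merged.values())
```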

00:03:27At the end, there's a follow-up step to make sure the analysis has actually covered everything.

00:03:31This architecture makes it effective because of its systematic process and structured analysis

00:03:36method, and it helps identify issues far better than an agent could without the harness.

00:03:41So to test this out, we used an open source project that is a web application containing

00:03:45built-in security risks just for practice.

00:03:47We wanted to see if this tool was able to detect all of the issues in this repo on its

00:03:52own.

00:03:53This project contains 10 security issues, with all the details available directly in the code

00:03:56itself, including how to fix them.

00:03:58So to run DeepSec, you first run the deepsec init command, which installs the dependencies

00:04:03and creates a .deepsec folder and then you install the dependencies inside that folder.

00:04:08It also gives you a prompt that you need to paste into whichever coding agent you use.

00:04:12Since we were using Claude Code, we ran that prompt in Claude. It contains the instructions

00:04:16for creating a small info.md file that includes all the project information and is built around

00:04:21a specific template.

00:04:23You do not run this prompt in the project folder itself; you run it in the .deepsec

00:04:27folder, because it instructs the agent to look in the parent directory and read all the

00:04:31information from it.

00:04:32The info.md file contains a general overview of what the code base does and what the authentication

00:04:37flow looks like, as well as the threat models, project specific patterns and all the known

00:04:42false positives inside the code.

00:04:44So once this file has been created, the next task is to run the deepsec scan command.

00:04:48This command is the regex matcher we previously talked about and it finds all the matching

00:04:52endpoints and lists all the filtered files containing potential security issues.

00:04:57This part is fast because it is plain pattern matching, with no model calls involved.

00:05:00The next step is to run the deepsec process command.

00:05:02You can specify an API key for the provider you want to use, whether that is the Vercel AI Gateway,

00:05:07Codex or Claude, inside the .env.local file.

00:05:11If you do not set one, as in our case, it automatically falls back to your Claude Code subscription

00:05:16and uses your existing authentication instead of requiring an API key.

00:05:19It splits the project into batches and calls multiple tools on each one.

00:05:23After each batch, it gives a summary of how many tokens were used and what the estimated

00:05:27cost was.
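The per-batch estimate is simple arithmetic: input and output tokens, each multiplied by its price per token, then summed. The prices below are placeholders, not actual rates for Opus or GPT-5.5, and the token counts are invented; the second number shows how quickly parallel batches add up.

```python
# Placeholder prices in USD per million tokens; NOT real rates for any model.
PRICE_PER_MTOK = {"input": 5.00, "output": 25.00}


def estimate_cost(input_tokens: int, output_tokens: int,
                  prices: dict[str, float] = PRICE_PER_MTOK) -> float:
    """Estimated API cost of one batch in USD."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


per_batch = estimate_cost(400_000, 20_000)  # (400k * 5 + 20k * 25) / 1M
print(per_batch, 40 * per_batch)  # 2.5 100.0
```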

00:05:28Now, if you are using a subscription, nothing is charged beyond the subscription itself,

00:05:32but it still shows an estimate of what the run would cost through the API.

00:05:35Since this is designed for large codebase reviews, it keeps reliability in mind.

00:05:39So in case there are any errors during the review, it does not restart everything from

00:05:43scratch and instead continues from the point where the error occurred.
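A sketch of resume-from-failure using a JSON checkpoint file, assuming results are saved after every batch. The state-file format and `run_batch` callback are hypothetical; DeepSec's actual mechanism is not shown. Keying batches by their file list rather than their position means a rerun still skips finished work if batch order changes.

```python
import json
from pathlib import Path
from typing import Callable


def process_with_resume(
    batches: list[list[str]],
    run_batch: Callable[[list[str]], list[dict]],
    state_file: str,
) -> dict[str, list[dict]]:
    """Process batches, checkpointing after each one so a rerun skips finished batches."""
    path = Path(state_file)
    done: dict[str, list[dict]] = json.loads(path.read_text()) if path.exists() else {}
    for batch in batches:
        key = "|".join(batch)  # identify a batch by its files, not its position
        if key in done:
            continue  # completed in an earlier run
        done[key] = run_batch(batch)  # if this raises, earlier results are already on disk
        path.write_text(json.dumps(done))
    return done
```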

00:05:46Once the scan has been completed, you run the deepsec report command and it generates a report

00:05:50in both JSON and Markdown format containing a general overview of all the findings categorized

00:05:55by severity level.
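A sketch of a report grouped by severity in both formats, assuming hypothetical finding fields (`severity`, `file`, `line`, `title`). The Markdown lists findings most severe first; the JSON carries per-severity counts plus the raw findings.

```python
import json
from collections import Counter

SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def render_report(findings: list[dict]) -> tuple[str, str]:
    """Return (markdown, json) renderings with findings grouped by severity."""
    counts = Counter(f["severity"] for f in findings)
    lines = ["# Security findings", ""]
    for severity in SEVERITY_ORDER:
        group = [f for f in findings if f["severity"] == severity]
        if not group:
            continue
        lines.append(f"## {severity.title()} ({len(group)})")
        lines.extend(f"- {f['file']}:{f['line']} {f['title']}" for f in group)
        lines.append("")
    summary = {"counts": {s: counts.get(s, 0) for s in SEVERITY_ORDER}, "findings": findings}
    return "\n".join(lines), json.dumps(summary, indent=2)
```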

00:05:56Now, once this report has been generated, you can run the revalidation step.

00:06:00This step is entirely optional.

00:06:02You can run it if you want or skip it completely.

00:06:04Once you run it, it validates the findings to check whether the reports are false positives

00:06:08or not.

00:06:09After that has been done, you can export everything using the export command and it will write

00:06:13the findings into the findings folder.

00:06:15The findings folder groups the issues into subfolders named by priority and contains one

00:06:20file per identified issue.

00:06:22It first lists the source of the issue meaning the exact file and the lines causing the issue,

00:06:26how severe the issue is and how confident the model was in identifying it.

00:06:30It also mentions which commit introduced the issue and assigns the user who committed it.

00:06:34It then explains the recommended fix, lists the revalidation results and mentions all

00:06:39the issues that were explicitly addressed.

00:06:41It also includes the steps to reproduce the bugs inside the findings.
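A sketch of that export layout: one Markdown file per finding, inside a folder named for its priority, carrying the source location, severity, confidence, commit, fix, and reproduction steps. The priority labels (`P0-critical` and so on), file naming, and field names are assumptions for illustration; the video does not show DeepSec's exact format.

```python
import re
from pathlib import Path

# Hypothetical priority folder names.
PRIORITY = {"critical": "P0-critical", "high": "P1-high", "medium": "P2-medium", "low": "P3-low"}


def finding_path(finding: dict, index: int, out_dir: str = "findings") -> Path:
    """findings/<priority>/<NNN>-<slug>.md, one file per finding."""
    slug = re.sub(r"[^a-z0-9]+", "-", finding["title"].lower()).strip("-")
    folder = PRIORITY.get(finding["severity"], "P4-other")
    return Path(out_dir) / folder / f"{index:03d}-{slug}.md"


def render_finding(f: dict) -> str:
    """Markdown body for one finding."""
    return (
        f"# {f['title']}\n\n"
        f"- Source: {f['file']} (lines {f['start']}-{f['end']})\n"
        f"- Severity: {f['severity']}\n"
        f"- Confidence: {f['confidence']}\n"
        f"- Introduced in: {f['commit']} by {f['author']}\n\n"
        f"## Recommended fix\n\n{f['fix']}\n\n"
        f"## Steps to reproduce\n\n{f['repro']}\n"
    )


def export_findings(findings: list[dict], out_dir: str = "findings") -> list[Path]:
    """Write every finding to its own file and return the paths written."""
    written = []
    for i, f in enumerate(findings, start=1):
        path = finding_path(f, i, out_dir)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render_finding(f))
        written.append(path)
    return written
```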

00:06:44But this report still did not identify all of the issues, even though the tutorial was

00:06:48actually inside the code itself and it should have been able to identify them.

00:06:52So we iterated with Claude on why the original vulnerability lessons that were bundled into

00:06:56the app by design were not identified.

00:06:59Upon iterating with Claude, we found that the tool reported only 3 findings

00:07:03because of an explicit statement in the info.md file.

00:07:07The info.md described the 10 vulnerabilities as already known, so DeepSec treated them as out of scope

00:07:12and only looked for other issues. In other words, it was deliberately trying to go beyond

00:07:16what was already known and focus on other patterns, so that the scan stays effective

00:07:21and does not waste time and tokens on issues that are already documented.
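The effect of those known-issue notes can be pictured as a filter. This sketch assumes info.md lists known vulnerabilities as bullets under a heading; the `## Known vulnerabilities` heading and bullet format are hypothetical, not DeepSec's template. DeepSec itself most likely applies this through the agent reading info.md rather than through code like this, so treat it as an illustration of the outcome only.

```python
def known_issue_keys(info_md: str, heading: str = "known vulnerabilities") -> set[str]:
    """Collect bullet items listed under a given '## ' heading of an info.md-style file."""
    keys: set[str] = set()
    in_section = False
    for line in info_md.splitlines():
        if line.startswith("## "):
            in_section = line[3:].strip().lower() == heading
        elif in_section and line.lstrip().startswith("- "):
            keys.add(line.lstrip()[2:].strip().lower())
    return keys


def drop_known(findings: list[dict], known: set[str]) -> list[dict]:
    """Keep only findings whose category is not already documented as known."""
    return [f for f in findings if f["category"].strip().lower() not in known]


info = "# App\n## Known vulnerabilities\n- SQL injection\n- XSS\n## Auth flow\n- Session cookie\n"
print(drop_known([{"category": "XSS"}, {"category": "SSRF"}], known_issue_keys(info)))
# [{'category': 'SSRF'}]
```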

00:07:25We then tested another app to see if it did better this time.

00:07:28We ran the same steps, starting from the scan to the processing stage.

00:07:32We did not run the revalidation part, we just created the report and exported it directly.

00:07:36And this time, the info.md file Claude generated only contained details about the app and did not include statements

00:07:42like the previous one.

00:07:43Side by side, we also asked Claude to review the code and write a report.md file with a

00:07:48complete security review so we could compare which one actually performed better.

00:07:52So the report created by DeepSec found multiple bugs with different severity levels.

00:07:56It found 9 issues and created a detailed report along with recommended steps on how to fix

00:08:01them.

00:08:02And these recommended steps are what most other reports miss because this is what helps

00:08:05the agent understand how to fix the issue, which makes debugging much easier.

00:08:09But we noticed that Claude's report was much more detailed and highlighted 39 issues.

00:08:13So we asked Claude to create a diff of the two reports first.

00:08:15The diff showed that Claude's number was larger.

00:08:18But we had already seen this during our testing with Codex.

00:08:20Claude tends to flag additional issues outside the scope along the way.

00:08:24It does not solely focus on the scoped issues that DeepSec was specifically designed for.

00:08:29So once we asked it to focus only on scope, it narrowed the findings down to 13 issues.

00:08:34But there were still a few issues that DeepSec missed which were identified in Claude's report.
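A diff like this boils down to set operations. The sketch keys each finding on `(file, category)`, which is an assumption: two tools rarely report identical line numbers for the same issue, so matching on exact lines would undercount the overlap.

```python
def diff_reports(first: list[dict], second: list[dict]) -> dict[str, set[tuple[str, str]]]:
    """Compare two reports by (file, category): what each found alone, and what both found."""
    def key(f: dict) -> tuple[str, str]:
        return (f["file"], f["category"].strip().lower())

    a = {key(f) for f in first}
    b = {key(f) for f in second}
    return {"only_first": a - b, "only_second": b - a, "both": a & b}
```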

00:08:38DeepSec missed a few findings because it focuses only on issues that the

00:08:43code directly contains and that can be resolved directly within the functions themselves.

00:08:47It does not identify issues that might arise when the app actually runs, like CORS-related

00:08:52problems.

00:08:53It does not really focus on logical patterns and architectural decisions either.

00:08:57As we mentioned previously, it uses RegEx to filter out files first.

00:09:01So it mainly focuses on what is explicitly present in the code and not on issues that

00:09:05may occur dynamically when the application is running.
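To illustrate why a pattern-based first pass can miss runtime behavior: a rule that flags a literal wildcard origin catches the first line below but not the second, where the allowed origin is simply whatever the incoming request sends. The rule is hypothetical and is not one of DeepSec's patterns.

```python
import re

# Hypothetical static rule: flag a literal wildcard CORS origin.
WILDCARD_CORS = re.compile(r"""Access-Control-Allow-Origin["']?\s*[,:]\s*["']\*["']""")

static_case = 'res.setHeader("Access-Control-Allow-Origin", "*")'
# Reflecting the request's Origin is at least as permissive, but the value only
# exists at runtime, so this literal pattern never sees it.
reflected_case = 'res.setHeader("Access-Control-Allow-Origin", req.headers.origin)'

print(bool(WILDCARD_CORS.search(static_case)))     # True
print(bool(WILDCARD_CORS.search(reflected_case)))  # False
```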

00:09:08Also, if you are enjoying our content, consider pressing the hype button because it helps us

00:09:12create more content like this and reach more people.

00:09:15Now instead of running these steps one by one on our own, we've created this DeepSec skill

00:09:20which contains all the instructions on how to use Vercel's security scanner end to end

00:09:24and how it should identify from the user's prompt what is being asked.

00:09:28It then follows the entire step by step process and manages the whole harness on its own.

00:09:32It is also bundled with multiple assets, evals and references for all the issues, along with

00:09:37multiple scripts that might actually help with the working solution and the overall functioning

00:09:42of this repository.

00:09:43So with this in place, you can just run this security scan and specify which model you want

00:09:47to use and it will directly handle everything for you.

00:09:50It will run through all the steps we saw earlier along with addressing the issues that it missed

00:09:54previously and will be able to perform a much better security review by combining DeepSec's

00:09:59abilities while also covering the gaps in its findings.

00:10:02Now this skill along with all resources can be found in AI Labs Pro for this video and

00:10:07for all our previous videos from where you can download and use it for your own projects.

00:10:11If you've found value in what we do and want to support the channel, this is the best way

00:10:15to do it.

00:10:16The link is in the description.

00:10:17That brings us to the end of this video.

00:10:19If you'd like to support the channel and help us keep making videos like this, you can do

00:10:23so by using the super thanks button below.

00:10:25As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Vercel's DeepSec harness tackles the security risks of AI-generated code with a structured multi-stage workflow (RegEx filtering, parallelized high-reasoning agent analysis, and optional revalidation) that produces a systematic review with a reported 10-20% false positive rate.

Highlights

Vercel released DeepSec, a security harness designed to detect breaches and vulnerabilities in AI-generated code bases.
DeepSec's reported false positive rate is 10-20%, which the video presents as strong for an LLM-based security review.
The tool uses a parallel processing architecture with high-reasoning models: Claude Opus 4.7 at max effort and GPT-5.5 at xhigh reasoning.
The initial scanning phase uses RegEx-only matching to filter thousands of files into a manageable pool for expensive AI analysis.
The system generates comprehensive reports in Markdown and JSON, including exact line numbers, commit metadata, and step-by-step reproduction instructions.
DeepSec focuses on static vulnerabilities within functions rather than dynamic runtime issues like CORS or architectural logic errors.

Timeline

The Security Crisis in AI-Generated Code

Rapid shipping of AI-generated code has been accompanied by an even faster rise in security vulnerabilities.
Uncontrolled coding agents have caused catastrophic failures, including the deletion of entire production databases.
Internal sensitive files, such as Apple's internal CLAUDE.md, have leaked.

While AI has lowered the barrier to entry for software development, it lacks the inherent safety checks required for production environments. Recent incidents demonstrate that autonomous agents can unintentionally destroy projects or expose private company data. DeepSec is pitched as a layer that catches security issues in the code before it reaches production.

Architecture and Cost of DeepSec

DeepSec employs a structured, systematic review process that outperforms standard, unstructured agent scans.
The harness runs on the most powerful and token-hungry models: Claude Opus 4.7 at max effort and GPT-5.5 at xhigh reasoning.
A parallel design batches code into groups of around five files to accelerate the review of large repositories.

Standard AI agents often miss vulnerabilities when performing direct, one-pass scans of a code base. DeepSec is built for accuracy over cost-effectiveness, opting for maximum-effort reasoning models that consume high token volumes. This parallelized batching ensures that even repositories with thousands of files can be processed without the linear time delay of a single-agent scan.

The Three-Stage Security Workflow

RegEx-only scanning serves as the first filter to identify security-sensitive patterns without consuming AI tokens.
The investigation phase parallelizes identified files into batches for deep analysis by the Claude or Codex SDKs.
Optional revalidation cross-checks findings to eliminate false positives and ensure classification accuracy.

The workflow begins with a fast RegEx match to narrow the scope of the search to areas likely to house vulnerabilities. Following the AI-led investigation, the system uses Git metadata to assign responsibility for specific issues to the correct developers. The final output is deduplicated and normalized into markdown or JSON tickets for human or AI remediation.

Implementation and Practical Execution

The deepsec init command creates a .deepsec folder and a prompt that has your coding agent write a project-specific info.md file from a template.
The info.md file contains the threat model, authentication flows, and known false positives to guide the agent.
The tool supports automatic resumption from error points to maintain reliability during long scans of large code bases.

Setup involves installing dependencies and creating a project summary that defines the application's boundaries. Users can provide specific API keys in a .env.local file or default to an existing Claude Code subscription. Reports categorize findings by severity and provide the exact source lines, commit history, and recommended fixes for every identified bug.

Performance Analysis and Scope Limitations

DeepSec intentionally ignores known vulnerabilities listed in info.md to focus on new or undocumented risks.
The tool prioritizes issues that can be resolved directly within the function source code.
Dynamic runtime issues and high-level architectural decisions fall outside the current scope of the RegEx-based filter.

In the comparison, a plain Claude review flagged more issues (39, or 13 once restricted to DeepSec's scope), while DeepSec reported 9 focused findings with recommended fixes but missed a few that Claude caught. DeepSec avoids wasting tokens on documented issues by reading the provided project context. However, because it relies on static patterns for the initial filter, it may miss dynamic problems like CORS misconfigurations that only appear during application execution.

Integrated Security Skills and Resources

A specialized DeepSec skill automates the entire end-to-end harness management from a single prompt.
The skill bundles scripts, evaluation tools, and reference assets to fill the gaps in the native DeepSec findings.
Automated workflows address the issues DeepSec might miss by combining its systematic approach with broader LLM reasoning.

To streamline the process, a pre-configured skill handles all command-line steps and model configurations automatically. This integration allows users to run comprehensive reviews that combine the structured benefits of the DeepSec harness with additional logic to catch architectural flaws. These resources are available through the AI Labs Pro repository for direct project implementation.
