Shannon: The Open Source AI Pentester Powered By Claude Code

Better Stack
Computing/Software, Small Business/Startups, Internet Technology

Transcript

00:00:00This is Shannon, an open-source autonomous AI pen tester that performs code analysis and executes
00:00:05live exploits using browser automation to find all kinds of security vulnerabilities from server-side
00:00:11request forgery to cross-site scripting to SQL injection and much much more, giving you a detailed
00:00:17comprehensive security report with zero false positives. But with the announcement of Claude
00:00:22Code Security and also the fact that Shannon is built on the Claude SDK, meaning you can't use
00:00:27your subscription, is there any point in learning a tool that might not be around for that long?
00:00:31Hit subscribe and let's get into it.
00:00:32In one of my previous jobs, we paid thousands of dollars to external pen testers before a
00:00:38major release only to find out there were bugs we needed to fix and then we'd get it tested again,
00:00:43costing us lots of time and of course money. But this is exactly what Shannon exists to fix.
00:00:48You can run Shannon as many times as you want. You can even put it in a CI/CD pipeline and run
00:00:53it automatically. And because it's open source, it's completely free. Well, there is a paid version,
00:00:58which we'll talk about later. But as someone who's not a security expert, I'd rather run my project
00:01:03through Shannon than boot up Kali Linux. In fact, let's see Shannon in action. So Shannon is built
00:01:08using the Anthropic Agent SDK. So you're going to need a Claude API key for it to work. Unfortunately,
00:01:13the subscription won't work either, but I've got it installed on a VPS using a non-root user and I'm
00:01:20going to run it against the OWASP Juice Shop, which is an app designed to have loads of vulnerabilities
00:01:25for testing reasons. Now I've already cloned the Shannon repo, which you'll need if you want to run
00:01:30it. And for this to work, you'll have to have your repository you want to test inside the Shannon
00:01:34repos directory. So I've got Juice Shop inside here. And with the Juice Shop project running,
00:01:39I'm going to run this command, which will connect to the app running locally for browser testing
00:01:44and connect to the repo inside the repo directory to scan through the code. Now, if this is the first
00:01:50time you're running Shannon, because it uses Docker Compose, it will first have to pull a bunch of
00:01:54images from Docker Hub. But because I've already gone through that process, it just jumps straight
00:01:58to here. We get a link to the Temporal workflow and we can view it using the web UI, which looks
00:02:03like this, showing all the steps that need to take place. Or we could run this command to view the logs
00:02:07in real time, which I sometimes prefer since the web UI doesn't always show the most information.
00:02:12But wait, what's Temporal? I thought we were talking about Shannon. Well,
00:02:16Shannon pen tests can take one or many hours depending on the size of the project and Temporal
00:02:21ensures durable execution, no matter the scenario. So if your computer crashes midway through a pen
00:02:26test or you run out of Claude credits and need to top up, you don't lose any progress. Temporal
00:02:32remembers exactly where you left off and restarts Shannon from that checkpoint. Let me know in the
00:02:36comments if you want a dedicated video about Temporal, but it also orchestrates all of Shannon's phases
00:02:42and activities. And even though there are only five phases, a lot of things happen inside them.
00:02:47Let me show you. Starting with the pre-flight phase that makes sure API credentials are valid,
00:02:53Docker containers are ready and the repo actually exists. Then the pre-recon stage, which analyses
00:03:00the code to understand how the app works. So architecture, mapping entry points and security
00:03:05patterns. Next is the actual recon stage, which is very different from the pre-recon because here
00:03:12Playwright is used to navigate through the app. So it will click buttons, fill in forms and use
00:03:18that to observe network requests, take screenshots, look at cookies, basically map out all the
00:03:24functionality of the app. And then phase four does five pipelines in parallel. So here we have
00:03:31injection-related vulnerabilities and exploits, then cross-site scripting vulnerabilities and exploits,
00:03:38then authentication, server-side request forgery. And finally we have authorization. So accessing
00:03:45privileged data or other people's information. And all of this happens in parallel on five different
00:03:52agents for vulnerability and then another five for exploits. And finally we have phase five, which
00:03:59compiles everything into a comprehensive pen test report by combining the last five checks. Speaking
00:04:07of report, let's see how our pen test is coming along. So after almost two and a half hours, the
00:04:12full process is complete and we can see here it started with the pre-flight validation before
00:04:17moving on to the pre-recon and then the recon agent. And then here it runs all the vulnerability
00:04:25checks. So we've got the injection vulnerability agent, cross-site scripting, authorization, SSRF.
00:04:31And you can see for some of these, the green line is not solid. This is because it had to retry
00:04:36because I ran out of Claude credits. So you can see there's a two here and for these ones, there weren't
00:04:40any retries. So it may have been faster than two and a half hours if it wasn't for these retries, but
00:04:46I don't think it would have been less than two hours. Anyway, after it's done the five vulnerability
00:04:51checks, it then moves on to do the five exploit checks. So we can see SSRF here, we've got the
00:04:56auth exploit, we've got the injection and so on. And once it's done all of those, we can see the
00:05:02auth exploit takes the longest. It then wraps everything up using the report agent. Now, of
00:05:07course, if we wanted to, we could expand all of this to see more information about each stage, but
00:05:13I'm no expert on Temporal and I'm sure if we go through the documentation, there'll be a lot more
00:05:17on how to use the platform. But let's now take a look at the final report Shannon has generated.
00:05:22So here in the deliverables directory of our Juice Shop project, we can see the list of all the
00:05:28reports it's generated. And it's a lot more than I thought it would do. So let's first take a look at
00:05:33this report, which is the auth analysis. And you can see it has a summary at the top. And over here,
00:05:37it's noted that there are 11 critical vulnerabilities identified and we can see what they are.
00:05:43So zero out of six authentication endpoints enforced HTTPS, which makes sense because I was
00:05:47running it locally. And then we also have proper cache control, which it was missing.
00:05:52And the authentication endpoints didn't have adequate rate limiting. This is really
00:05:56detailed. I mean, if you scroll down, we can see exactly what the problems were, where they were,
00:06:01and the endpoints that caused them. Now I'm not going to bore you and go through every single
00:06:05report, but let's go through the summary, which is called comprehensive security assessment reports.
00:06:10And inside here, we have details about the model that was used, the scope of the project. And now
00:06:15if we scroll down, we can see that it found four critical auth vulnerabilities that were fully
00:06:21exploited and lists them down here. Wow, this is very thorough, but take a look at this. If we
00:06:26scroll down even further, it gives us a summary of the report. So this is the first IDOR one
00:06:31and scroll down even more. We can see exactly how an attacker could exploit this. So the exact curl
00:06:38command they could run with the details and the type of information they could extract. And this
00:06:43level of detail exists for every single vulnerability, which goes to show how much
00:06:48detail went into the assessment. Now, if you're interested, I'm going to leave a link to all the
00:06:54reports inside this description. But two and a half hours is a really long time for Claude Sonnet to
00:06:59scan through a repo. Is there anything Shannon Pro could have done to help? So it doesn't look like
00:07:04Shannon Pro could help with the speed, but it does do some other things like provide CVSS scoring,
00:07:09which the basic or free version does not contain. It has CI/CD pipeline support, API access. And more
00:07:16importantly, if you're an enterprise user, you get all the things you'd expect, including OWASP
00:07:22compliance reporting, as well as SOC 2 and PCI DSS. So even though two and a half hours is a really
00:07:27long time, I've done some research and found out the first run of Shannon takes the longest and then
00:07:32subsequent runs are much faster. Now I know what you're thinking. Almost two and a half hours running
00:07:37Claude Sonnet 4.6 on a single pen test. How much did that cost in credits? Let's just say a lot.
00:07:43I topped up about $66 and ended up having this much left. So almost $60 in Claude credits was
00:07:50spent running this pen test, which is cheaper than hiring a human tester, but it's still a lot of
00:07:55money. And I would have loved to use my Claude Pro or Max subscription, which would have made the whole
00:08:00thing a lot cheaper, which is hopefully what Claude Code Security will allow you to do when
00:08:05it's properly released, unless the team at Keygraph end up rewriting Shannon in something like the OpenAI
00:08:10Agents SDK or use the Vercel AI SDK, which allows you to use many more models. But overall,
00:08:16if you're a startup and don't want to spend a lot of money on a human pen tester, then Shannon is a
00:08:21good alternative. If you're an indie hacker with even less money, then maybe hold off and just
00:08:26release the product to see if people actually use it. And while we're on the topic of AI and security,
00:08:30if you want to know how to safely install OpenClaw on a VPS,
00:08:34then check out the next video where I go through step-by-step exactly how to do that.

Key Takeaway

Shannon offers a sophisticated, automated alternative to human pentesting by leveraging Claude's AI to conduct deep code analysis and live browser-based exploit testing for a fraction of the traditional cost.

Highlights

Shannon is an open-source autonomous AI pentester capable of code analysis and executing live exploits via browser automation.

The tool identifies a wide range of vulnerabilities including SQL injection, cross-site scripting, and server-side request forgery.

Timeline

Introduction to Shannon and the Problem It Solves

The speaker introduces Shannon as an autonomous AI pentester designed to find vulnerabilities like server-side request forgery and SQL injection. He explains that traditional human pentesting is prohibitively expensive, often costing thousands of dollars per session before a major release. Shannon aims to solve this by being an open-source tool that can be run as many times as needed, even within a CI/CD pipeline. The speaker notes that as a non-expert, using Shannon is more accessible than manually operating complex security suites like Kali Linux. This section establishes the value proposition of using AI to automate high-stakes security audits.

Installation, Requirements, and Temporal Integration

This segment covers the technical setup, highlighting that Shannon is built on the Anthropic Agent SDK and requires a Claude API key rather than a standard subscription. The speaker demonstrates running the tool against the OWASP Juice Shop, a deliberately vulnerable application used for testing security software. A critical component discussed is Temporal, which provides durable execution to ensure that the pentest can resume from a checkpoint if a crash or credit exhaustion occurs. Temporal orchestrates the various phases of the pentest, which can take several hours depending on the project's complexity. The speaker emphasizes that this reliability is essential for long-running autonomous agents.
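
The checkpoint-and-resume behaviour described in this segment can be illustrated with a minimal Python sketch. It does not use Temporal's SDK or any of Shannon's code: the phase names, the `progress.json` file, and the functions `load_completed`, `run_pentest`, and `demo` are all hypothetical, chosen only to show the idea of persisting progress after each step so that a rerun skips work that already finished.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical phase names, loosely following the workflow described in the video.
PHASES = ["preflight", "pre_recon", "recon", "vuln_and_exploit", "report"]


def load_completed(checkpoint: Path) -> List[str]:
    """Return the phases already recorded as finished (empty on a first run)."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())
    return []


def run_pentest(checkpoint: Path, fail_at: Optional[str] = None) -> List[str]:
    """Run the remaining phases, saving progress to disk after each one.

    `fail_at` simulates a crash or exhausted API credits just before that phase.
    Returns the phases executed during this call.
    """
    completed = load_completed(checkpoint)
    executed = []
    for phase in PHASES:
        if phase in completed:
            continue  # finished in an earlier run, so skip it
        if phase == fail_at:
            raise RuntimeError(f"simulated failure before {phase}")
        executed.append(phase)  # stand-in for the real, long-running work
        completed.append(phase)
        checkpoint.write_text(json.dumps(completed))
    return executed


def demo() -> Tuple[List[str], List[str]]:
    """Crash before 'recon', then resume and report what each run covered."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "progress.json"
        try:
            run_pentest(checkpoint, fail_at="recon")
        except RuntimeError:
            pass  # the first run dies partway through
        after_crash = load_completed(checkpoint)
        resumed = run_pentest(checkpoint)
        return after_crash, resumed
```

Calling `demo()` shows the second run picking up at the phase where the first one failed instead of starting over. A real durable-execution engine such as Temporal records far more state than a list of names, so this is only the general shape of the idea.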

The Five Phases of an AI Pentest

The speaker breaks down the Shannon workflow into five logical phases to explain how the AI thinks and acts. Phase one is the pre-flight check for credentials and Docker readiness, followed by pre-recon which maps the application's architecture and entry points. In the third phase, Shannon uses Playwright to navigate the app, filling out forms and observing network requests like a human tester would. The fourth phase involves five parallel pipelines targeting injection, XSS, authentication, SSRF, and authorization vulnerabilities. Finally, phase five compiles all findings into a structured report, merging data from the vulnerability and exploit agents.
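
The fan-out in phase four and the merge in phase five can be sketched in Python with `asyncio`. This is an illustration under stated assumptions, not Shannon's implementation: the category labels, the placeholder agents `analyze` and `exploit`, and the helpers `pipeline`, `run_phase_four`, and `build_report` are hypothetical, and the real agents call a model and a browser rather than returning canned strings.

```python
import asyncio
from typing import Dict, List

# Hypothetical labels for the five categories named in the video.
CATEGORIES = ["injection", "xss", "auth", "ssrf", "authz"]


async def analyze(category: str) -> List[str]:
    """Placeholder vulnerability agent: returns candidate findings."""
    await asyncio.sleep(0)  # yields control, standing in for a long agent run
    return [f"{category}-candidate"]


async def exploit(category: str, candidates: List[str]) -> List[str]:
    """Placeholder exploit agent: marks every candidate as confirmed."""
    await asyncio.sleep(0)
    return [c.replace("candidate", "confirmed") for c in candidates]


async def pipeline(category: str) -> List[str]:
    """One category's pipeline: analysis first, then exploitation."""
    candidates = await analyze(category)
    return await exploit(category, candidates)


async def run_phase_four() -> Dict[str, List[str]]:
    """Run all five pipelines concurrently and key the results by category."""
    results = await asyncio.gather(*(pipeline(c) for c in CATEGORIES))
    return dict(zip(CATEGORIES, results))


def build_report(findings: Dict[str, List[str]]) -> str:
    """Phase-five stand-in: merge every pipeline's output into one text report."""
    lines = [f"{category}: {', '.join(items)}" for category, items in findings.items()]
    return "\n".join(lines)
```

A caller would run `asyncio.run(run_phase_four())` and pass the result to `build_report`. `asyncio.gather` returns results in the order the pipelines were submitted, which is why zipping them back onto `CATEGORIES` is safe.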

Execution Results and Deep Dive into Security Reports

After a two-and-a-half-hour run, the speaker reviews the results, noting that retries occurred due to API credit limits. The generated reports are found in the deliverables directory and provide an incredibly high level of detail, including summaries of critical vulnerabilities. For example, the tool identified missing HTTPS enforcement and inadequate rate limiting on authentication endpoints. Each vulnerability includes a summary, the specific code location, and a concrete curl command demonstrating how an attacker could exploit the flaw. This section showcases the "zero false positive" claim by showing how the AI proves its findings with actionable evidence.

Shannon Pro, Cost Analysis, and Final Verdict

The final section addresses the costs and the differences between the free and Pro versions of Shannon. The speaker reveals that the single pentest cost approximately $60 in Claude API credits, which is significantly cheaper than human labor but still a notable expense for individual developers. Shannon Pro is introduced as a solution for enterprises needing CVSS scoring and compliance reports like SOC 2 or PCI DSS. While the speaker acknowledges the high cost of running Claude Sonnet 4.6, he suggests that future updates or different SDKs might lower the price. Ultimately, he recommends Shannon for startups as a cost-effective alternative to human pentesters while advising hobbyists to weigh the API costs carefully.
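
The cost figures quoted in the video (roughly $60 for a run of about two and a half hours) lend themselves to a quick back-of-the-envelope calculation. The helpers below are hypothetical, and the monthly projection assumes every run costs the same as the first, which the speaker suggests is pessimistic because later runs are reportedly faster.

```python
def cost_per_hour(total_usd: float, hours: float) -> float:
    """Average API spend per hour of pentest run time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_usd / hours


def monthly_cost(cost_per_run_usd: float, runs_per_month: int) -> float:
    """Projected monthly spend, assuming each run costs the same amount."""
    return cost_per_run_usd * runs_per_month


# Figures from the video: about $60 spent over about 2.5 hours.
hourly = cost_per_hour(60.0, 2.5)   # 24.0 dollars per hour
weekly_runs = monthly_cost(60.0, 4)  # 240.0 dollars for four runs a month
```

At these numbers, a team running the tool weekly would spend on the order of a few hundred dollars a month, which is the comparison to weigh against a human tester's fee.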
