Nvidia's New Tool Just Fixed Agent Skills

AI LABS

Transcript

00:00:00 Right now, AI agent skills are everywhere. Every agent runs them, and you trust them without any
00:00:05 checks. But here's the scary part. Researchers studied over 30,000 of these skills, and more than
00:00:10 a quarter of them had a security vulnerability. So NVIDIA built a tool called Skill Spectre that
00:00:15 scans any skill before you install it and tells you exactly how dangerous it is. But here's where
00:00:20 it gets interesting. One type of attack can slip right past it, and the setting that actually
00:00:24 catches it is off by default, so most people never even know it's there. Turning that on normally
00:00:29 costs money, but we found a way around it. And by the end, we didn't just scan skills. We built a
00:00:34 whole workflow that changes how you find and install them for good. Now, before we get into the full
00:00:39 workflow, let's give you a quick tour of the tool and what you need to use it. These are the install
00:00:44 commands in the GitHub repo. You can just copy them and hand them to Claude Code, and it'll basically
00:00:49 install and set up the whole thing for you. Claude Code is going to install all the dependencies you can
00:00:54 see right here. Once all that's done, you can start using Skill Spectre. Inside the GitHub repo,
00:00:59 there's a test folder with some dangerous skills you can run the tool on to
00:01:04 confirm it works. We ran it on these skills, and for every one of them, it tells you not to
00:01:09 install. The higher the score, the more dangerous the skill. And with each test, it doesn't just give
00:01:14 you a number. It shows you the file name and the exact line number of the issue
00:01:19 that pushed the score up. Now, this isn't the only way to use the tool. It's got
00:01:24 another mode. But before you can see why we even need that second mode, you need to know two things: how a skill
00:01:30 attacks you in the first place, and how this tool catches that attack. There are 14 categories,
00:01:34 but to keep it simple, we've grouped them into six broader ones. The first way a skill can attack
00:01:39 you is with hidden instructions. A skill is just a text file full of instructions, and your agent reads
00:01:45 the whole thing and treats it as orders. The problem is that a bad skill can hide extra instructions in there that
00:01:50 you'll never see, but the agent does. They tuck them inside comments, use invisible characters,
00:01:55 or scramble the text into an encoding that looks like nonsense to you, but the AI reads it just fine.
00:02:01 The scanner is built specifically to hunt these hidden instructions down. The second
00:02:06 way is impersonation. Your agent has tools it trusts and reaches for by name. Say there's one just
00:02:12 called "read" that reads a file for it. A malicious skill gives its own tool that exact same name,
00:02:17 and your agent grabs the bad one, thinking it's the safe one it already knows. The way they pull
00:02:22 it off is sneaky. They swap one letter for a lookalike from another alphabet. So they name it "read",
00:02:27 but the "a" is actually a Cyrillic (Russian) letter that looks identical to ours. To you, and to your agent at a
00:02:33 glance, it's the same word, but underneath it's a completely different tool. The scanner catches
00:02:38 this by checking the real identity of every single character, so it spots that one fake letter and
00:02:43 flags it. The third way is when the skill simply lies about what it does: the description says one thing,
00:02:48 the code does another. It calls itself a simple formatter and then quietly reaches out to the
00:02:53 internet in the background. Or it says it only needs permission to read your files, but the code is
00:02:58 actually writing files and running commands too. This one is much harder to catch, and it's where that
00:03:03 second mode comes in, but we'll get to that later. The fourth way is the skill stealing your credentials.
00:03:08 This could be your API keys or your passwords. A skill goes through all the keys saved on your
00:03:13 machine, scoops them up, and sends them off to some server. The fifth way is the skill running
00:03:18 outright malware. This includes things like a reverse shell, which basically hands a stranger
00:03:23 remote control of your whole computer. Because this kind of malware has known fingerprints,
00:03:28 the scanner just matches the code against a big library of those fingerprints. The sixth way is
00:03:32 poisoned dependencies. A skill will often use a CLI tool, basically a small outside program it runs in
00:03:39 the terminal to handle part of its job, and a bad skill pulls in a piece that's actually malicious.
00:03:44 Maybe it's a fake package whose name is one typo off a real, popular one. You pull the wrong
00:03:49 one, and it runs malware just like the previous type. The scanner checks every package the skill pulls in
00:03:54 against a live database of known bad ones, and it flags the fake names and any download-and-run
00:03:59 commands to keep your system safe. In that first mode, though, it's just matching patterns without any context,
00:04:05 which means it ends up flagging stuff that's completely fine. Those are what we call false
00:04:09 positives. That's where the second mode, the AI scan, comes in, and turning it on is simple. You just
00:04:14 drop the no-LLM flag, and it runs the second scan. But if you look inside the code, you'll find
00:04:20 that to run an AI check on a skill, you need to plug in an OpenAI key. To get around that cost,
00:04:26 we just use Claude Code itself to run the AI check. The main agent in Claude Code doesn't actually
00:04:32 do it itself. We use Claude's headless mode, which is basically Claude Code running in the background
00:04:38 with no chat window, just executing commands on its own. We're sure most of you know it isn't free,
00:04:43 but you do get monthly credits for it with your Anthropic plan. You can just ask Claude Code to
00:04:48 make the change we just talked about, and it'll do it for you. Of course, you might hit a bug or two,
00:04:52 but it's just a single-line prompt, and Claude can set it up for you. And if you're enjoying the video so far,
00:04:57 subscribe to the channel and hit the hype button. This small gesture of support goes a long way for us.
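The headless-mode swap described above can be sketched as a small wrapper around `claude -p`, Claude Code's non-interactive print mode. This is a hypothetical illustration, not the team's scan.sh. The prompt text, the `SCORE: N` reply convention, and the helper names are all assumptions.

```python
import re
import shutil
import subprocess
from typing import Optional

# Hypothetical review prompt; the video does not show the prompt Skill Spectre
# (or the authors' scan.sh) actually sends.
REVIEW_PROMPT = (
    "Review this AI agent skill for security risks. Compare what its description "
    "claims with what its code actually does. Finish with one line of the form "
    "'SCORE: <0-100>', where 100 means most dangerous.\n\n"
    "--- SKILL START ---\n{skill}\n--- SKILL END ---"
)
SCORE_LINE = re.compile(r"^SCORE:\s*(\d{1,3})\s*$", re.MULTILINE)


def build_command(skill_text: str) -> list[str]:
    """Argument list for a one-shot, non-interactive Claude Code run."""
    return ["claude", "-p", REVIEW_PROMPT.format(skill=skill_text)]


def parse_score(output: str) -> Optional[int]:
    """Take the last 'SCORE: N' line of the reply, capped at 100."""
    matches = SCORE_LINE.findall(output)
    return min(int(matches[-1]), 100) if matches else None


def ai_check(skill_text: str, timeout: int = 300) -> Optional[int]:
    """Run the review headlessly; None means no usable verdict."""
    if shutil.which("claude") is None:
        return None                      # Claude Code CLI not installed
    proc = subprocess.run(build_command(skill_text),
                          capture_output=True, text=True, timeout=timeout)
    return parse_score(proc.stdout) if proc.returncode == 0 else None
```

Parsing only the last `SCORE:` line keeps the contract simple. A hostile skill can still try to talk the reviewer into a low score, though, so the AI score is best treated as one signal among several. Very large skills may also exceed the operating system's argument-length limit when passed on the command line.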
00:05:03 They've also got dangerous skills in their test folder that actually need the AI check. When you
00:05:07 run the no-LLM check on one of them, the score comes out as zero, which would mean it's perfectly safe.
00:05:12 But the second you run it with the AI check, the score jumps to 100, it tells you not to install,
00:05:17 and it lays out exactly why. But what if, instead of just detecting the problems in a skill,
00:05:22 the scanner also helped you fix them? That's exactly why we turned the scanner into a skill. And
00:05:27 you might be wondering why it's called Discover Skills. Well, it's because we didn't just make one
00:05:31 standalone skill. We made a whole process that helps us discover more skills and make sure they're safe
00:05:36 before we install them. We've been using skills.sh to find new skills for a while now. It's basically a
00:05:42 git repo built specifically for skills, so one big shared library you can pull from. And we think they
00:05:47 recently shipped a CLI update, so now Claude can run search queries straight from the command
00:05:53 line and pull the best skills it needs before installing anything. We wanted our scanner
00:05:57 running on top of that. So in here, we've got scan.sh, which is the script that actually runs
00:06:02 Skill Spectre. Since Skill Spectre is a CLI tool, it has to be run as a command, so we made a whole
00:06:08 script and baked the Claude headless mode fix right into it. By default, it runs the normal
00:06:13 check, but if you want, it'll run the AI check too. If you open up skill.md, you can see the basic
00:06:19 steps laid out. It identifies the target, scans it, and then shows you the findings. Once it knows
00:06:24 what the problems are, it goes ahead and fixes them, then runs the whole loop again afterward to make
00:06:28 sure everything's clean. For example, the folder we're showing you right now is our AI Labs design
00:06:34 folder. It's basically our whole design process compressed into one folder with a bunch of skills
00:06:39 inside. We've got a whole video on this. On top of that, the whole system is available in AI Labs
00:06:44 Pro, which is our community. So if you want to support the channel and grab this whole design system,
00:06:49 go check it out. This discovery skill is going to be uploaded there too, and the link's going to be
00:06:54 in the description. But here, we're building on top of it: we're adding a new make design.md skill,
00:06:59 which lays out the fastest way to pull design tokens out of an app you've already built, basically the
00:07:04 colors, fonts, and spacing rules, and merge them into a design.md file. Here we wanted to create
00:07:10 the design.md file, so we told Claude we wanted to improve the skill and that it should go search for other
00:07:15 tools out there. It used skills.sh, then we loaded the discovery skill, and that pulled back a
00:07:21 handful of skills. These are the skills it brought back, and the first two looked interesting, so we wanted
00:07:26 to dig in. We asked it to install and test both of them. And just as the Discover Skills workflow
00:07:31 says, it won't install any skill without scanning it first. So it installed them, read through them,
00:07:36 and told us straight up that neither one was going to help with the make design.md skill. But from a
00:07:41 safety point of view, the first one got a score of 10, which meant it was safe, and the second got a
00:07:46 100, which meant don't install it. So we told it to run the AI check on that second skill. It ran it again
00:07:52 through Claude's headless mode, and this time the score came back as zero, meaning the skill
00:07:56 was safe to use. And that's the whole point of this system. You're not just grabbing skills blindly off
00:08:01 the internet. You have a whole process that you can kick off just by using a skill. Now let's have a
00:08:06 word from our sponsor, Nimblist. If you use Claude Code or Codex, you know the problem. You've got multiple
00:08:12 sessions running, files changing everywhere, and you're constantly switching between terminal, browser,
00:08:17 and editor just to keep track of what your agents are doing. Nimblist is an open-source visual workspace
00:08:23 that puts everything in one place. I had three agents working on different parts of a project at
00:08:28 the same time, and instead of jumping across windows, I could see all of them on a Kanban board, jump into
00:08:33 any session, review code changes as red and green diffs, and approve or reject them individually. I was
00:08:38 editing markdown docs, UI mockups, and architecture diagrams visually right alongside my agent. When I was
00:08:45 done, I didn't have to clean up commits manually because it generated git commit messages automatically
00:08:50 based on what changed. Tasks stayed connected to the actual sessions, and there's even a mobile app to
00:08:56 continue the session while you're away from your desk. Nimblist is completely free and open source,
00:09:00 and you can check it out using the link in the pinned comment. That brings us to the end of this
00:09:05 video. If you'd like to support the channel and help us keep making videos like this, you can do so by
00:09:10 using the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

Implementing an automated security scanning workflow such as Skill Spectre, which combines pattern matching with AI intent analysis, helps prevent the installation of malicious AI agent skills. Such skills often hide instructions or misrepresent what they do and what permissions they need.

Highlights

  • Security analysis of over 30,000 AI agent skills revealed that more than 25% contain security vulnerabilities.

  • Skill Spectre scans potential AI skills for risks before installation, flagging the file name and line number behind each finding.

  • Malicious skills often employ hidden instructions in comments, character spoofing using lookalike letters from other alphabets, and unauthorized API key exfiltration.

  • The tool utilizes a two-mode scanning system: pattern matching for known malware signatures and an AI-driven check for intent-based deception.

  • Claude Code's headless mode provides an alternative to paid OpenAI keys for performing the AI-driven security validation of skills.

  • A structured 'Discover Skills' workflow automates the search, verification, and installation process for AI agent skills to eliminate manual, blind reliance on third-party code.
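The 'Discover Skills' loop (identify, scan, report, fix, re-scan, and install only when clean) can be sketched as a plain control loop. In this hypothetical Python version, `scan` and `fix` are stand-ins for the Skill Spectre run and the agent's patch step. The `install_threshold` of 30 is an assumed cut-off. The video only shows that a score of 10 was treated as safe and a score of 100 as do-not-install.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScanResult:
    score: int                                   # 0 = clean, 100 = most dangerous
    findings: list[str] = field(default_factory=list)


def vet_skill(skill: str,
              scan: Callable[[str], ScanResult],
              fix: Callable[[str, list[str]], str],
              install_threshold: int = 30,       # assumed cut-off
              max_rounds: int = 3) -> tuple[bool, str, ScanResult]:
    """Scan, fix, and re-scan until the skill is clean or rounds run out.

    Returns (ok_to_install, final_skill_text, last_scan_result).
    """
    result = scan(skill)
    for _ in range(max_rounds):
        if result.score <= install_threshold:
            return True, skill, result
        skill = fix(skill, result.findings)      # e.g. the agent patches each finding
        result = scan(skill)                     # never trust a fix without re-scanning
    return result.score <= install_threshold, skill, result
```

Capping the number of rounds keeps a skill that cannot be cleaned from looping forever. In that case it simply comes back as not OK to install.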

Timeline

Vulnerability landscape of AI agent skills

  • Researchers found that more than 25% of the over 30,000 AI skills they examined contained a security vulnerability.
  • Skill Spectre functions as a scanning tool to evaluate the safety of AI skills prior to local installation.
  • The scanner provides granular feedback, including exact file names, line numbers, and danger scores for detected threats.

AI agents rely on external skills that are often installed without adequate security checks. Skill Spectre mitigates this risk by scanning a skill's files for known attack patterns and reporting a danger score along with the file and line behind each finding. It installs from its GitHub repository with a few command-line steps, so users can check a potentially dangerous skill before installing it.
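A report that ties each finding to a file and line, and rolls the findings into a 0 to 100 score, might look like the following Python sketch. The rule names, regular expressions, weights, and capped-sum scoring are assumptions made for illustration. Skill Spectre's real signatures and scoring formula are not shown in the video.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical rules: (name, pattern, weight). Real scanners ship far larger,
# curated signature sets.
RULES = [
    ("reverse-shell", re.compile(r"bash\s+-i\s*>&\s*/dev/tcp/"), 100),
    ("pipe-to-shell", re.compile(r"curl[^|\n]*\|\s*(?:ba)?sh\b"), 60),
    ("credential-read", re.compile(r"\.ssh/id_[a-z0-9]+|\.aws/credentials"), 50),
    ("env-dump", re.compile(r"\bprintenv\b|os\.environ\b"), 20),
]


@dataclass
class Finding:
    file: str     # path relative to the skill folder
    line: int     # 1-based line number
    rule: str
    weight: int


def scan_skill_dir(root: Path) -> tuple[int, list[Finding]]:
    """Scan every file under a skill folder; return (score 0-100, findings)."""
    findings = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern, weight in RULES:
                if pattern.search(line):
                    rel = path.relative_to(root).as_posix()
                    findings.append(Finding(rel, lineno, name, weight))
    # Capped sum: several medium hits add up, and one hard signature is enough.
    return min(100, sum(f.weight for f in findings)), findings
```

Capping a sum means one reverse-shell hit, or two medium findings together, reaches the top of the range. A lone weak signal such as reading environment variables stays low.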

Threat vectors and detection mechanisms

  • Hidden instructions embedded in comments, invisible characters, or scrambled text allow malicious skills to manipulate AI agent behavior.
  • Impersonation attacks utilize character spoofing, replacing standard characters with lookalikes from other alphabets to deceive agents.
  • Malicious skills can also perform unauthorized actions, such as stealing credentials like API keys, opening reverse shells, or pulling in poisoned dependencies.
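For the poisoned-dependency case, one cheap signal for a package name that is one typo off a popular one is string similarity against a list of popular packages. This Python sketch uses the standard library's `difflib`. The `POPULAR` set and the 0.85 cutoff are hypothetical. A real checker would pair this with a known-bad-package feed rather than rely on similarity alone.

```python
import difflib

# Hypothetical list of popular package names; a real checker would use
# registry download statistics and a known-malicious-package feed.
POPULAR = {"requests", "numpy", "pandas", "urllib3", "colorama", "python-dateutil"}


def typosquat_candidates(name: str, popular: set[str] = POPULAR,
                         cutoff: float = 0.85) -> list[str]:
    """Popular packages that `name` closely resembles without matching exactly."""
    normalized = name.lower().replace("_", "-")   # rough PEP 503-style normalization
    if normalized in popular:
        return []                                 # it *is* the real package
    return difflib.get_close_matches(normalized, popular, n=3, cutoff=cutoff)
```

For example, `typosquat_candidates("reqeusts")` points back at `requests`, while the real name returns an empty list.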

Skills attack users through deceptive methods ranging from hidden code execution to spoofing tool names. The scanner identifies these threats by validating character identity, matching code against known malware fingerprints, and checking external package requests against databases of known malicious sources.
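Validating character identity, as described above, can be approximated with Python's `unicodedata` module. A tool name whose letters come from more than one script (for example Latin plus Cyrillic) is suspicious. This sketch treats the first word of each character's Unicode name as its "script", which is a simplifying assumption. The helper names are also hypothetical. The Unicode confusables data (UTS #39) is the more rigorous basis.

```python
import unicodedata
from typing import Optional


def scripts_in(name: str) -> set[str]:
    """Rough script of each letter: first word of its Unicode name."""
    # unicodedata.name("\u0430") == "CYRILLIC SMALL LETTER A"
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in name if ch.isalpha()}


def check_tool_name(name: str, trusted: set[str]) -> Optional[str]:
    """Return a warning if a tool name mixes scripts or folds into a trusted name."""
    scripts = scripts_in(name)
    if len(scripts) > 1:
        return f"{name!r} mixes scripts: {sorted(scripts)}"
    # NFKC folds compatibility forms such as fullwidth letters, but it does
    # NOT map Cyrillic to Latin, which is why the script check comes first.
    folded = unicodedata.normalize("NFKC", name)
    if folded != name and folded in trusted:
        return f"{name!r} is a disguised form of trusted tool {folded!r}"
    return None
```

A name written entirely in lookalike Cyrillic letters would still pass the mixed-script test. Comparing confusable "skeletons" as defined in UTS #39 would close that gap.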

AI-driven validation and automated workflows

  • Pattern matching alone can produce false positives, requiring a secondary AI-based scan to verify intent.
  • Claude Code's headless mode allows for AI-driven security checks without incurring additional OpenAI API costs.
  • Integrating the scanner into a 'Discover Skills' process ensures that every skill is scanned before it is installed.
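Before a skill goes to the AI check, a cheap static pass can at least surface the case where the description says one thing and the code does another. It works by comparing declared capabilities with what the code appears to do. This Python sketch is a hypothetical heuristic and is not part of Skill Spectre. The capability names and regular expressions are assumptions, and the check cannot judge intent the way the AI scan does.

```python
import re

# Hypothetical, deliberately crude detectors for three capabilities.
CAPABILITY_PATTERNS = {
    "network": re.compile(r"\b(?:requests\.|urllib|http\.client|socket\.|curl|wget)\b"),
    "write": re.compile(r"open\([^)]*['\"][wa]b?['\"]|\.write_text\(|\brm\s+-"),
    "exec": re.compile(r"\bsubprocess\.|\bos\.system\(|\beval\(|\bexec\("),
}


def undeclared_capabilities(declared: set[str], code: str) -> set[str]:
    """Capabilities the code appears to use that the skill never declared."""
    used = {cap for cap, rx in CAPABILITY_PATTERNS.items() if rx.search(code)}
    return used - declared
```

A "formatter" that declares only `{"read"}` but shells out to `curl` through `subprocess` would come back with `{"exec", "network"}`. That mismatch is worth handing to the AI check.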

Pattern-based scanning can flag safe code as malicious, and it can miss threats that only show up in a skill's intent. Adding an AI-driven scan lets the system distinguish between safe functions and genuine threats. The whole process is automated into a workflow where skills are searched for, scanned, fixed where possible, and installed only if they pass the scan.
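The video gives two examples of the AI check overruling the pattern scan: a score of 0 raised to 100, and a score of 100 lowered to 0. Those examples suggest a combining rule like the one below. The policy, the `HARD_SIGNATURES` set, and the threshold of 50 are assumptions for illustration, not Skill Spectre's documented behavior.

```python
from typing import Optional

# Illustrative values, not taken from Skill Spectre.
HARD_SIGNATURES = {"reverse-shell", "known-malware"}
BLOCK_AT = 50


def final_verdict(pattern_score: int, pattern_rules: set[str],
                  ai_score: Optional[int]) -> tuple[int, str]:
    """Combine the pattern score and the (optional) AI score into one decision."""
    if ai_score is None:
        score = pattern_score                 # pattern-only mode
    elif pattern_rules & HARD_SIGNATURES:
        # Don't let a (possibly prompt-injected) reviewer clear a known signature.
        score = max(pattern_score, ai_score)
    else:
        score = ai_score                      # AI judgement replaces noisy patterns
    return score, ("DO NOT INSTALL" if score >= BLOCK_AT else "OK TO INSTALL")
```

Keeping the higher score for hard malware signatures is a deliberate choice here. The AI reviewer reads attacker-controlled text, so it should be able to clear noisy pattern matches but not a known reverse shell.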
