I Updated /grill-me And Solved Claude Code


Transcript

00:00:00Plan mode is not enough. Skills like Matt Pocock's Grill Me or even larger orchestration layers like

00:00:06GSD or superpowers are all trying to solve the same problem. Take that fuzzy idea in your head

00:00:11and turn it into something Claude Code can actually build. But no matter what path you go

00:00:16down or what skill you choose, they all run into the exact same problem. You're relying on a single

00:00:21model not only to plan and build but also to grade its own work.

00:00:26So when you ask Claude, hey, was this the optimal path forward? What's it going to say? Well,

00:00:31it's going to say it was great no matter what you did. And this is a problem because if you do not

00:00:35come from a technical background, you don't actually know if what Claude wrote actually makes sense.

00:00:41But in this video, I'm going to show you how to fix that. We're going to build upon Matt Pocock's

00:00:45Grill Me skill and we are going to bolt on an adversarial code review from Codex. But it's a

00:00:51code review that goes well beyond the Codex plugin you've seen in the past. This code review is

00:00:55iterative. Claude Code and Codex are going to be talking to one another through multiple rounds

00:01:00to get you to a place where both leading AI tools sign off on your plan. So you actually can feel

00:01:07confident that what Claude Code came up with actually makes sense. And with this skill, you're going to be

00:01:12able to start every project with two things. One, a plan that you actually understand. And two,

00:01:18a plan that multiple AI tools have signed off on. So what you're going to get today are two skills from me.

00:01:23And both of those skills are built on the backs of what Matt Pocock gives us here in his GitHub repo.

00:01:28He has two skills, Grill Me and Grill with Docs. The two skills I'm going to give you are Grill Me Codex

00:01:35and Grill with Docs Codex. So what's happening? Well, Grill Me and Grill with Docs are essentially plan

00:01:41mode on steroids. Like GSD and superpowers, they take planning a step further. The questions they ask are

00:01:48deeper. They give you better insight into what you're actually trying to build because whether you

00:01:53want to admit it or not, you probably suck at actually articulating what you want. And if you

00:01:57can't articulate what you want to Claude Code in the beginning, you're going to have a lot of

00:02:01assumptions on the AI side, which give you a mediocre product on the back end. So Grill Me and Grill with

00:02:07Docs give you better outputs by going deeper in the planning phase to make sure you're all on the same

00:02:12page. What my skills are going to give you is a second phase to that, where after you and Claude

00:02:19Code have gotten on the same page, Codex comes in and says, hey, this makes sense, that doesn't. Fix

00:02:24this, fix that. And then Claude Code and Codex go back and forth. And I think this is important because

00:02:28stuff like Grill Me, GSD and superpowers, they identified this gap right here, this gap between you

00:02:34and Claude Code, where you have an idea, you can articulate it, we're going to go back and

00:02:38forth, we're going to get on the same page, right? Grill Me is perfect for this.

00:02:44The problem is, even if you and Claude Code are on the same page, does that mean we are automatically

00:02:51on a journey to this place of optimal code, where this is what should actually be built? Maybe, maybe

00:02:57not, who's to say? You probably can't say. Are you an expert software engineer? You might be,

00:03:03but I would guess most of the audience watching this does not fall into that camp.

00:03:08And stuff like Matt Pocock's thing, as great as it is, I mean, like, it's skills for real engineers.

00:03:13Are you a real engineer? Probably not. Maybe you are. If you're not, you'll run into the problem

00:03:19where you can't even evaluate what Claude Code has written. Even if you're on the same page,

00:03:23it could be trash, it could be amazing, who knows? And the other issue is, you can't judge it,

00:03:28and neither can Claude Code, because Claude Code, and this is something Anthropic themselves have said,

00:03:34is very nice and speaks very highly of the code it has written, right? You ask Claude Code to judge

00:03:40what it's written, it's like, oh yeah, sick, A+. So, is it a reliable narrator and a reliable

00:03:46evaluator in this case? No, it's not. So, if you don't know what's going on, and we can't

00:03:50necessarily trust Claude Code, where does that leave us? Well, we have this gap here then, right?

00:03:56We have this gap between Claude Code and quote-unquote optimal code. And so, the obvious solution is,

00:04:02well, let's bring in a third party, a neutral third party to take a look at our plan. In comes Codex.

00:04:09And this Codex review is what I've added to Pocock's skills, and it's what I'm going to be giving you today.

00:04:16So, the first half is exactly the same as Grill Me. Questions back and forth, we get this plan going

00:04:21together, everything is nice and neat right here. And once we have the plan all set in stone, well,

00:04:27then Codex is going to come in, it's going to see what Claude Code has come up with and say,

00:04:32this looks good, this looks bad, what do you think? Claude Code is going to take a look at it and say,

00:04:36oh, that makes sense, let's fix that, here's what I did, take a look again, Codex. And it's going to

00:04:41go through a cycle that maxes out at five turns (you can easily edit that), so up to five rounds

00:04:48of back and forth, which is a little bit different from the standard adversarial review

00:04:52Codex plugin, because it's more iterative. And the idea being, if they go back and forth enough

00:04:57times, we'll eventually get to a place, hopefully sooner than five turns, where they're both like,

00:05:01hey, thumbs up, it's good to go, push forward. So all that to say, what I'm giving you today

00:05:09is meant to fix this gap right here. This gap between Claude Code and the optimal code that you

00:05:16and I will struggle to identify because we are not expert software engineers and Claude Code can't

00:05:21be fully trusted to do it. So that's what we're covering. And now we're all on the same page.
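The review cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the skill's actual code: `review` and `revise` are hypothetical callables standing in for Codex and Claude Code, the reviewer is handed every earlier review to mimic the shared session that, per the video, gives Codex memory across rounds, and `max_rounds=5` mirrors the default cap mentioned here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    findings: list[str] = field(default_factory=list)


# Hypothetical signatures: the reviewer (Codex's role) sees the plan plus all
# earlier reviews; the reviser (Claude Code's role) returns an updated plan.
Reviewer = Callable[[str, list[Review]], Review]
Reviser = Callable[[str, Review], str]


def adversarial_review(
    plan: str, review: Reviewer, revise: Reviser, max_rounds: int = 5
) -> tuple[str, list[Review], bool]:
    """Alternate review and revision until approval or max_rounds reviews."""
    history: list[Review] = []
    for round_no in range(1, max_rounds + 1):
        result = review(plan, history)
        history.append(result)
        if result.approved:
            return plan, history, True
        if round_no < max_rounds:
            # Only revise when another review round will check the change.
            plan = revise(plan, result)
    return plan, history, False


# Toy stand-ins so the loop runs offline (illustrative only).
def toy_reviewer(plan: str, history: list[Review]) -> Review:
    missing = [item for item in ("rate limit", "dedupe") if item not in plan]
    return Review(approved=not missing, findings=missing)


def toy_reviser(plan: str, review: Review) -> str:
    # Accept one finding per round.
    return plan + "\n- add " + review.findings[0]


final_plan, log, approved = adversarial_review("email gate", toy_reviewer, toy_reviser)
```

With the toy stubs, the loop needs three reviews: two rounds with findings, then a clean pass. Not revising after the final round is a choice of this sketch, so the returned plan is always one a reviewer actually saw.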

00:05:28But before we hop into the demo, a quick word from today's sponsor, me. So as you know,

00:05:33Chase AI Plus is the home of my Claude Code masterclass. And it is the number one way to go

00:05:37from zero to AI dev, especially if you do not come from a technical background. We focus on real use

00:05:42cases. And I recently added the Claude OS masterclass there as well. So if you're like, hey, I also want to

00:05:49learn how to integrate things like Obsidian and create a full command center. This is the place for

00:05:54you. You can find a link to it in the pinned comment. So for today's demo, we're going to add a new page

00:05:59to our website. So this is the website for my AI agency. And the new page is going to give people

00:06:05access to some exclusive skills. And to get access to this page, when they click on it, they're going

00:06:11to have to add their email. So it's somewhat gated, we grab their email, then they have access to the

00:06:16things they can download. Then the email needs to be handled by our database, which already exists.

00:06:22So we're not just creating some feature from thin air; it needs to look at the codebase that

00:06:27already exists and stay coherent with it. So this is the prompt I'm giving Claude Code: "Run Grill Me Codex.

00:06:32I want to add an email capture gate to the site that unlocks the Grill Me Codex Claude Code skill.

00:06:38A visitor lands on a page where the skill download is blurred behind an overlay,

00:06:42enters their email to unlock it, and their email is stored." And then I gave it some additional context.

00:06:49So the first part is going to be the Grill Me skill. It's the exact same Grill Me part as

00:06:56Matt Pocock's, the one we're building off of. So that part is the same.

00:07:00Once we go through all the questions, Codex will come in. So after it looked through

00:07:03the codebase, it's now asking me the first question: how real is this gate

00:07:07when it comes to the blur? Is it a cosmetic thing or is it actually going to be enforced?

00:07:11And just like with Grill Me, anytime it asks you a question and gives you some potential answers,

00:07:16it also gives its recommendation and why. So for this one, it's just going to be cosmetic.

00:07:21It's a free skill. The goal here is just to capture the email. So we're just going to say,

00:07:25cosmetic is fine, the file is free anyway. Next it's asking where the asset is going to live

00:07:30and in what format. And again, for the sake of this demo, I'm just going to go with the recommended

00:07:36option. And I'm not going to show you the rest of these questions because this isn't meant to be a

00:07:40Grill Me video. Just understand that if you haven't seen it before, this is the general cadence.

00:07:44It's going to ask you a series of questions, give you potential answers and a recommendation.

00:07:48Very similar to plan mode, just plan mode on steroids. So you can see here,

00:07:51we ended up going through 10 questions on the Grill Me side, and then we transitioned into the Codex

00:07:56portion. Now the Codex portion is going to create two Markdown files for us. We have the plan.md

00:08:02and then the plan review log. So the plan.md is the source of truth for what we're going to create.

00:08:10This is what our final deliverable is going to be. The plan review log.md is where

00:08:16Claude Code and Codex are going to go at it. Codex is going to look at the original plan.md and

00:08:21at everything Claude Code has created. And it's in the plan review log that Codex is

00:08:28going to say, hey, this sucks, this doesn't work, et cetera. This also gives us a log of their back and

00:08:33forth through all of the cycles. And at the end of this back and forth between Codex and Claude Code,

00:08:38we will have an updated plan.md. So plan.md is the final deliverable. That's what everything will be

00:08:46built off of. The plan review log is the back and forth and where the sausage is actually made. Another

00:08:52note during this adversarial review is that while it is headless, we still give Codex the session ID.

00:08:59So it's not a completely blank slate on Codex's part in iteration one versus iteration

00:09:05two versus iteration three. It always has memory of the entire back and forth with Claude Code. So we

00:09:12can see here in round one that Codex found 11 things that it considered issues. And we can also

00:09:18see that Claude Code went ahead and updated the plan.md based on the findings that it accepted and felt

00:09:25were valid. In round two, it found four additional findings. We've gone from 11 down to four. And again,

00:09:31the plan was updated. And here on round three, we see that the verdict is now approved. It's at this

00:09:35point that Codex and Claude Code are now on the same page. Codex has still flagged a few things,

00:09:40but they're just three low-level nits, so they're non-blockers. And that's reiterated here at the end,

00:09:45where it tells us it's approved in round three of five, what the final plan looks like,

00:09:50and what the two acts bought us, specifically act two, which is round one and round

00:09:56two of Codex and Claude Code going at it. You know, we caught real security and correctness holes.

00:10:01There was an unbounded client skill slug, a case-sensitive dedupe bypass, a relative email link,

00:10:06a raw list-bombing vector and a table-scanning rate limit. And in the second round, it caught the false

00:10:12fixes. So in round one, Codex said, hey, here are the issues. Claude Code tried to fix them. And in the

00:10:18second iteration, Codex is like, those aren't real fixes, right? So it noticed that the double opt-in

00:10:24was claimed but wasn't wired up, the expression-index dedupe that supabase-js can't target,

00:10:30and the await before the response that still blocked the unlock, which was moved to after. So just three rounds,

00:10:38but this is a great time saver versus trying to execute the first plan Claude Code came up with

00:10:44and then going through the whole troubleshooting process. At the end, it also brings up some open

00:10:49items, mainly the SQL migration and all that. But that's also Claude Code being lazy, because it can

00:10:54do that on its own. So back on the website up top, we have the free skill. I click on it. Now it's

00:10:58asking me for my email. And cool. Now I have the skill here that I can download in a .zip file.
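The round-two finding about an await that still blocked the unlock comes down to ordering: send the unlock response first, then finish slow side effects. A minimal asyncio sketch, where `send_confirmation_email` and `handle_unlock` are hypothetical stand-ins (the demo's real stack and handler code are not shown in the video):

```python
import asyncio

sent: list[str] = []
background: set[asyncio.Task] = set()


async def send_confirmation_email(email: str) -> None:
    """Hypothetical slow side effect, e.g. a call to an email provider."""
    await asyncio.sleep(0.05)
    sent.append(email)


async def handle_unlock(email: str) -> dict:
    # Blocking version: `await send_confirmation_email(email)` here would make
    # the visitor wait on the email call before the download unlocks.
    task = asyncio.create_task(send_confirmation_email(email))
    background.add(task)  # keep a reference so the task isn't garbage-collected
    task.add_done_callback(background.discard)
    return {"unlocked": True}


async def main() -> None:
    response = await handle_unlock("ada@example.com")
    assert response == {"unlocked": True}
    assert sent == []  # the unlock response came back before the email went out
    await asyncio.gather(*background)  # demo only: wait for the background work


asyncio.run(main())
```

Depending on the hosting platform, work left running after a response is sent may not be guaranteed to finish, so check how your runtime handles background tasks.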

00:11:08Obviously in reality, what would I actually want to do? Well, I would probably want the text and

00:11:12everything to actually match the rest of the website, but you can see it created what we set out to do.
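Two of the round-one findings, the unbounded client skill slug and the relative email link, have small, standard fixes: check client input against a bounded allowlist, and build links in emails as absolute URLs. A minimal sketch; the allowlist contents, the slug pattern, and the example.com origin are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Hypothetical allowlist; in the demo the only gated skill is Grill Me Codex.
ALLOWED_SKILLS = frozenset({"grill-me-codex"})
# Cheap shape check first: short, lowercase, hyphenated.
SLUG_PATTERN = re.compile(r"[a-z0-9-]{1,64}")


def validate_skill_slug(slug: str) -> str:
    """Reject client-supplied slugs that are oversized or not a known skill."""
    if not SLUG_PATTERN.fullmatch(slug) or slug not in ALLOWED_SKILLS:
        # Truncate before echoing so a huge payload isn't copied into logs.
        raise ValueError(f"unknown skill slug: {slug[:64]!r}")
    return slug


def absolute_link(site_origin: str, path: str) -> str:
    """Build an absolute URL; an email has no page URL to resolve a relative link against."""
    return urljoin(site_origin, path)


print(absolute_link("https://example.com", "/skills/grill-me-codex"))
```

The last line prints `https://example.com/skills/grill-me-codex`; both the origin and the `/skills/` path are placeholders.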

00:11:18The point of this video wasn't the specific demo, but just to show you this skill in action. As for

00:11:23how to get these skills yourself, I'll put them down in the pinned comment to make it easy for you.

00:11:27But besides that, that's pretty much all I got. Obviously, things you need to know for this:

00:11:31we're using Codex, so you are going to need an OpenAI account. You're going to need Codex

00:11:35downloaded, which is relatively simple to do. And there's no reason you'd need anything beyond the

00:11:39$20-a-month OpenAI plan to get a lot out of this. This system we've created is also something

00:11:45you could easily swap out for some sort of local model. So if you're like, hey, I don't want to

00:11:50pay OpenAI $20 a month, I'd rather use something like DeepSeek or whatever, any local or cheaper model

00:11:55you have, it's really easy to do. The bones are there. I would just take the skill I've created,

00:12:00bring it into Claude Code and say, hey, can we swap out Codex for [insert whatever model you're trying to

00:12:07use]? It's really that easy. It's very, very flexible. So there's a lot you can do with

00:12:12it. And I think the bones of it make a lot of sense for those of us who don't consider ourselves

00:12:16expert coders, the kind who can quickly and efficiently look at what Claude Code has done and say,

00:12:22this makes sense, this doesn't. That's just not in a lot of people's wheelhouses,

00:12:26nor does it need to be. Frankly, we have tools that can do this for us. So as always,

00:12:32let me know what you thought. Make sure to check out Chase AI Plus if you want to get your hands on

00:12:35the Claude Code masterclass,

00:12:37and I'll see you around.

Key Takeaway

By adding an iterative adversarial review layer using Codex to Matt Pocock's 'Grill Me' planning skills for Claude Code, non-technical users can have a second AI model identify and resolve logic and security flaws before implementation.

Highlights

Claude Code can settle for suboptimal plans because it grades its own work without independent oversight.
Grill Me Codex adds an iterative adversarial review layer to Claude Code, requiring two AI models to reach consensus before finalizing a plan.
The adversarial process involves up to five rounds of back-and-forth communication between Claude Code and Codex to identify security and logic flaws.
In the demonstrated project, this secondary review process identified real security and correctness issues, including an unbounded client skill slug, a case-sensitive dedupe bypass, and a table-scanning rate limit.
The Codex review architecture uses a 'plan.md' file as the source of truth and a 'plan review log.md' to record the iterative critique process.
This system remains flexible: users can swap Codex for a local or lower-cost model by asking Claude Code to modify the skill.
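The two-file layout can be sketched as two helpers: plan.md is overwritten so it only ever holds the current plan, while the review log is append-only so every round survives. A minimal sketch; the filename plan-review-log.md (the video only calls it the plan review log) and the entry format are assumptions, not the skill's actual output.

```python
import tempfile
from pathlib import Path


def update_plan(plan_path: Path, new_plan: str) -> None:
    """Overwrite plan.md: it only ever holds the current source of truth."""
    plan_path.write_text(new_plan, encoding="utf-8")


def append_round(log_path: Path, round_no: int, verdict: str, findings: list[str]) -> None:
    """Append one review round to the log; earlier rounds are never rewritten."""
    lines = [f"## Round {round_no}: {verdict}", ""]
    lines += [f"- {finding}" for finding in findings] or ["- (no findings)"]
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")


# Illustrative run in a throwaway directory.
workdir = Path(tempfile.mkdtemp())
plan_md = workdir / "plan.md"
log_md = workdir / "plan-review-log.md"  # assumed filename

update_plan(plan_md, "# Email gate\n\nDraft plan.\n")
append_round(log_md, 1, "changes requested", ["case-sensitive dedupe bypass"])
update_plan(plan_md, "# Email gate\n\nDraft plan, dedupe on normalized email.\n")
append_round(log_md, 2, "approved", [])
```

Keeping the critique out of plan.md means whatever builds from the plan never has to filter out the back and forth.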

Timeline

The Problem with Single-Model AI Coding

Relying on a single AI model to both generate and grade its own code creates an inherent quality gap.
Non-technical users often lack the expertise to identify if AI-generated code is logically sound or optimal.

Planning tools like Grill Me, GSD, or Superpowers improve the initial articulation of a project, but they stop short of guaranteeing execution quality. Because Claude tends to be agreeable, it often gives overly positive assessments of its own work. This creates a risk that users build flawed software without realizing it.

Implementing Iterative Adversarial Review

Grill Me Codex and Grill with Docs Codex extend Matt Pocock's original skills with a secondary adversarial phase.
The system forces Claude Code and Codex to converse through multiple rounds until both AI tools sign off on the plan.

The workflow adds a neutral third-party review stage using Codex. After the initial planning phase, Codex audits the plan, critiques it, and engages in a back-and-forth cycle with Claude Code. This ensures that the final deliverable is not just what the initial model imagined, but a version that has withstood critical review.

Demo: Building an Email Capture Gate

The system uses two files: 'plan.md' for the project requirements and 'plan review log.md' for the adversarial critique history.
Three rounds of iteration identified security risks, including a raw list-bombing vector and a case-sensitive dedupe bypass; the SQL migration was left as an open item.
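List bombing here presumably means someone submitting many addresses, or one victim's address over and over, to flood inboxes with confirmation mail; the table-scanning rate limit finding suggests the first plan counted recent rows on every request. A per-key sliding-window limiter is one way to cap submissions without scanning the subscribers table. This in-memory version is a swapped-in technique for illustration, not the plan's actual fix, and it only works within a single process; several instances would need a shared store or an indexed query instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.events: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        recent = self.events[key]
        # Drop timestamps that have aged out. Only this key's entries are
        # touched, so the cost doesn't grow with the total number of signups.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


# Fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window_s=60.0, clock=lambda: fake_now[0])
burst = [limiter.allow("victim@example.com") for _ in range(4)]
fake_now[0] = 61.0
after_window = limiter.allow("victim@example.com")
```

Here `burst` is `[True, True, True, False]` and `after_window` is `True`. Keying on the target address protects one inbox; keying on the client IP as well would slow down one client submitting many different addresses.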

When adding an email capture gate, Codex raised 11 findings in the first round and four in the second. By the third round, the models reached an approved state, after Codex caught fixes that were claimed but not real, including an unwired double opt-in and an expression-index dedupe that supabase-js can't target. This process demonstrated how automated critique catches issues that a single-model generation would have missed.
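The deduplication issue can be illustrated with a normalized column. The video says an early fix relied on an expression index that supabase-js can't target, presumably because an upsert's conflict target names columns rather than expressions; one common workaround is to store a normalized copy of the address in its own column under a unique constraint. This sketch uses Python's built-in sqlite3 as a stand-in for the real Postgres database, and the table and column names are hypothetical.

```python
import sqlite3


def normalize_email(raw: str) -> str:
    # Trim and lowercase so "Ada@Example.com " and "ada@example.com" collide.
    # Lowercasing the local part is a common simplification; strictly, the
    # part before "@" may be case-sensitive on some mail servers.
    return raw.strip().lower()


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: the UNIQUE constraint sits on a plain column,
    # not on an expression such as lower(email).
    conn.execute(
        "CREATE TABLE subscribers ("
        " email_normalized TEXT NOT NULL UNIQUE,"
        " email_as_entered TEXT NOT NULL)"
    )


def capture_email(conn: sqlite3.Connection, raw: str) -> bool:
    """Store the address; return True if it was new, False if a duplicate."""
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO subscribers (email_normalized, email_as_entered)"
        " VALUES (?, ?)",
        (normalize_email(raw), raw),
    )
    conn.commit()
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
create_schema(conn)
first = capture_email(conn, "Ada@Example.com ")
second = capture_email(conn, "ada@example.com")
```

The second call is ignored by the unique constraint, so only one row exists for both spellings of the address.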

Flexibility and Customization

The review framework allows for the substitution of Codex with cheaper or local LLMs.
Automated adversarial review reduces the need for users to be expert software engineers to evaluate code quality.

Because the review system is modular, users can easily adapt it to other models like DeepSeek or local alternatives. The approach lowers the barrier to entry for development by shifting the burden of code validation from the human user to a secondary AI agent.
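Because the reviewer is just one step in the loop, swapping models amounts to putting a different implementation behind the same interface. A minimal sketch: `PlanReviewer`, `CommandReviewer`, `KeywordReviewer`, and the "APPROVED" verdict convention are hypothetical, and the toy command below is a one-line Python script rather than a real model. The actual command line for Codex or a local model runner is deliberately not shown; check that tool's own documentation for how to pass a prompt non-interactively.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Protocol


class PlanReviewer(Protocol):
    def review(self, plan: str) -> str:
        """Return a verdict that starts with "APPROVED", or the findings."""
        ...


@dataclass
class CommandReviewer:
    """Any CLI that reads the plan on stdin and prints a verdict on stdout."""

    argv: list[str]

    def review(self, plan: str) -> str:
        result = subprocess.run(
            self.argv, input=plan, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


class KeywordReviewer:
    """Offline stand-in, handy for exercising the loop without any model."""

    def review(self, plan: str) -> str:
        return "APPROVED" if "rate limit" in plan else "FINDING: no rate limit"


def is_approved(reviewer: PlanReviewer, plan: str) -> bool:
    return reviewer.review(plan).startswith("APPROVED")


# Toy "model": a one-line Python script that approves plans mentioning dedupe.
toy_cli = CommandReviewer([
    sys.executable, "-c",
    "import sys; print('APPROVED' if 'dedupe' in sys.stdin.read() else 'FINDING: no dedupe')",
])
cli_verdict = is_approved(toy_cli, "email gate with dedupe")
keyword_verdict = is_approved(KeywordReviewer(), "email gate with dedupe")
```

Here `cli_verdict` is `True` and `keyword_verdict` is `False`: the same plan, judged by two interchangeable reviewers.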
