00:00:00 So we can now use Codex inside of Claude Code. OpenAI has made it so the number one competitor to Opus 4.6 is now something you can use inside of the Anthropic ecosystem. And this is great news for all Claude Code enjoyers, especially if you're someone who has been struggling with usage limits, because frankly, Codex gives you way better bang for your buck in terms of dollars to credits/tokens. So in this video, I'm going to show you how to set it up, and we're going to go through what Codex can actually do with the Claude Code harness on top of it. More importantly, what can we do using Claude Code with Opus 4.6 and Codex together? How can we play these two models off one another to get a sum that is greater than its parts?
00:00:46 Now, before we do the install, let's do a quick overview of what the Claude Code plugin brings us, because there are a few things. The two most important, I would argue, are the code reviews: the ability to essentially have Codex take a look at something Opus has written. And that comes in stages. First, we have the standard Codex review, which is a neutral, read-only pass. The second is the adversarial review, which I love. This is essentially telling Codex, "Hey, take a look at what Opus (or any coding agent) has built, but with a very discerning eye. Assume they screwed up, and figure out what we can do to make it better." This is an awesome way to really improve our outputs, because one of the issues with Opus, and really a lot of AI models in general, is that they tend to do a bad job of evaluating their own code. This is something Anthropic talked about in the engineering blog post they released last week. So something like adversarial review? Perfect, love this. Beyond that, we can also use Codex Rescue, which lets Codex build something all on its own, just like you would do with Opus inside of Claude Code. And then there's some status tooling, like checking where Codex is in a particular job. So let's dive in and take a look at the install.
00:02:01 Now, installing this is pretty simple. You're just going to run a command to add the plugin marketplace (I'll have all these commands down in the description), and then run the plugin command to install it: codex@openai-codex. As usual, it asks where you want to install it; I'm going to use user scope. Then we just need to reload the plugins to get it up and working, and lastly, run codex:setup. In case you didn't realize, there's also a GitHub repo for this which goes over all of the install commands, so I'll link that in the description as well. Note that the usage rates are tied to your ChatGPT account, even if you're on the free tier, apparently, so just understand it's going to pull from your Codex usage. It will ask if you want to install Codex (yes); from there you log in, which sends you to the browser to run through the authentication process.
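To recap, the install steps above look roughly like this inside Claude Code. Treat this as a sketch: the exact marketplace repo name is elided in the video (it's in the description and the plugin's GitHub repo), so it's shown as a placeholder here.

```
/plugin marketplace add <codex-plugin-marketplace-repo>
/plugin install codex@openai-codex    # choose user scope when prompted
# reload plugins, then:
/codex:setup                          # sends you to the browser to authenticate with ChatGPT
```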
00:02:44 Now, there are really two obvious use cases for this Codex tool inside of Claude Code. The first is dealing with the usage limits in Claude Code. Normally, if you're on the Pro plan with Anthropic, or the 5x Max plan, you can hit those limits very quickly, especially with some of the CLI bugs we've been seeing in the last week. If that's the case, what you might want to do is use Opus 4.6 to plan and Codex to execute. To do that, again, very simple: you just run codex rescue and give it the prompt. You can also specify a whole bunch of things; you see all the flags here, including the effort level and so on. And remember, Codex the model is very solid, and again, the usage cost isn't even close to what Anthropic charges. But I think the more interesting use case is what I talked about earlier: the adversarial review. So let's put that to the test.
00:03:32 So I'm going to have it take a look at my Twitter engagement/research bot. This is the web app I had Claude Code build. Essentially, it scans tweets in the AI space every 30 to 45 minutes. It has a quality filter and scoring signals based on a number of different parameters. It's connected to Supabase to make sure tweets don't get repeated. It has a scoring system and integrates softmax, PIX. Everything gets pushed to Telegram, and I also have AI built in there to help with responses. So there's a fair amount going on. On top of that, it also tracks all of my responses, so we can have a feedback loop. It's not super complicated, but this isn't a landing page we're looking at.
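As a side note, the dedup idea the bot relies on is simple to sketch. This is a hypothetical illustration, not the bot's actual code: the real version checks tweet IDs against Supabase, while here an in-memory set stands in for the database.

```python
# Hypothetical sketch of the bot's dedup logic. In the real bot,
# seen IDs live in Supabase; a plain set stands in for it here.
seen_ids: set[str] = set()

def is_new_tweet(tweet_id: str) -> bool:
    """Return True the first time an ID is seen, False on repeats."""
    if tweet_id in seen_ids:
        return False
    seen_ids.add(tweet_id)
    return True
```

The point is just that every scanned tweet passes through a "have we seen this ID before?" gate before it can be scored and pushed to Telegram.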
00:04:13 So we're going to see what Codex comes back with when we do an adversarial review on the code for this. Let's see how it does. We'll keep it pretty open to interpretation: we're telling Codex, take a look at the codebase and let me know what you think. The first thing it does is tell us, "Hey, we're going to estimate the review size to determine the best mode." Then it asks whether you want to run it in the background or just wait for the results; we're going to wait. And it tells us the review scope includes the full codebase plus nine working-tree changes: one modified file and eight untracked files. So it knows there's a lot it needs to look at.
00:04:46 While that's working, let's talk about how the adversarial review actually works. We just saw the first four parts: it parsed the arguments (we didn't pass any flags, so it's going off its default settings), estimated the review size, resolved the target, and collected some context. That was all that text about the untracked changes and how this was going to take a while. After those first four steps, it builds the adversarial prompt, and there are seven attack surfaces it pays special attention to: authentication, data loss, rollbacks, race conditions, degraded dependencies, version skew, and observability gaps. Seven things that sit somewhat under the surface, and that could really screw us if we push this to production without a handle on them. From there, it sends all that information to the OpenAI server so Codex can take a look, and then it gives us a structured JSON output that should look something like this: a severity for each finding (critical, high, medium, or low), as well as recommendations and next steps. All you have to do is sit there inside of Claude Code and wait for the response.
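To make the flow concrete, here's a rough sketch of the two pieces just described: building an adversarial prompt around the seven attack surfaces, and ordering the structured JSON findings by severity. The function names and prompt wording are my own illustration, not the plugin's actual implementation.

```python
# Illustrative sketch only; not the plugin's real prompt or schema.
ATTACK_SURFACES = [
    "authentication", "data loss", "rollbacks", "race conditions",
    "degraded dependencies", "version skew", "observability gaps",
]

def build_adversarial_prompt(diff_summary: str) -> str:
    """Wrap the collected context in a 'assume they screwed up' framing."""
    surfaces = "\n".join(f"- {s}" for s in ATTACK_SURFACES)
    return (
        "Assume the author made mistakes. Review the changes below with a "
        "discerning eye, paying special attention to these attack surfaces:\n"
        f"{surfaces}\n\nChanges:\n{diff_summary}"
    )

# The structured output ranks findings critical > high > medium > low.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def sort_findings(findings: list[dict]) -> list[dict]:
    """Order findings from most to least severe."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
```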
00:05:52 So Codex came back with four issues in our codebase, all with a severity of high. I pasted the output over to Excalidraw so it's a little easier for us to go through. For each finding, it gives us the severity, the area, the actual issue, the files, and the actual lines of code we need to look at, and then, importantly, the actual impact as well as the fix. Number one, it says we had an issue with our dedup logic. Number two was how we were handling Telegram polling. Third was our schema drift. And lastly was our actual dashboard build. So this is relatively important stuff, and luckily it doesn't look like the fixes would be too difficult to implement.
00:06:31 But what I'm interested in is: okay, this is what Codex gave us. What would Claude give us if we asked for a similar adversarial review of its own codebase? I think that would be enlightening: seeing them head to head, and what Codex really does differently. Because for all we know, they're exactly the same and this whole video was pointless. So I'm now having Opus run the same sort of adversarial code review. I had Codex come up with a particular prompt; essentially, it says, "Hey, I want you to challenge the implementation and the design choices. Here are some things I want you to evaluate, and here's the output format." So let's see what it comes back with.
00:07:09 Here are the results, broken down. First of all, they did have one shared finding: they both agreed that the Telegram issue was a problem. This was the one issue they both found and rated either high or critical; Codex said high, while Opus said critical. Opus found seven additional issues ranked high or critical that Codex didn't. Now, we're not saying that just by finding more issues, Opus was necessarily better than Codex; just pointing out it found seven things we might want to look at that Codex didn't. And on the flip side, Codex found three issues that Opus missed.
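The head-to-head tally is really just set arithmetic over the two finding lists. A minimal sketch, where the finding labels (beyond the four Codex reported) are made up for illustration:

```python
def compare_reviews(codex_findings, opus_findings):
    """Split two reviewers' findings into shared and unique sets."""
    codex, opus = set(codex_findings), set(opus_findings)
    return {
        "shared": codex & opus,      # both flagged it
        "codex_only": codex - opus,  # Codex caught, Opus missed
        "opus_only": opus - codex,   # Opus caught, Codex missed
    }

# Mirroring the video's numbers: 1 shared, 3 Codex-only, 7 Opus-only.
codex = {"telegram polling", "dedup logic", "schema drift", "dashboard build"}
opus = {"telegram polling"} | {f"opus finding {i}" for i in range(7)}
result = compare_reviews(codex, opus)
```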
00:07:48 So what does this mean if we look at it in totality? Does it mean Opus is better than Codex because it found more, or that Codex is better than Opus because it narrowed down to four and didn't take us down a weird path? I think what you draw from this is whatever you want to draw from it, and that's probably that there is real value in having two systems look at the code: a second pair of eyes, versus having Opus grade Opus all the time. There is some sort of fundamental flaw, I think, in having the same AI system do the planning, the generating, and the evaluating. And if we can very easily bring in Codex, especially at its price point, to do even just things like an adversarial review, that's one of the great AI-coding-on-the-margin plays. If you're already paying the 20 bucks a month for ChatGPT and can now bring in Codex to take a look at anything, what's the downside, really? I don't think a quick test like this gives us any definitive answer like "Codex is better than Opus," and I think that whole conversation sort of misses the point. This is just one more tool in our toolbox, and now we can use it. So I think this is great.
00:08:56 Now, we can get way more specific with the adversarial review as well, because our prompt was pretty open-ended and Codex could interpret it in a lot of different ways; based on the GitHub examples, you can get pretty specific about what you want Codex to look at. So overall, I think this is a great addition to the Claude Code ecosystem. The more tools, the better, especially if you're someone who either (a) is already paying for ChatGPT, or (b) is on the Anthropic Pro plan and finds that a hundred bucks a month might be a little much, and 200 bucks certainly too much. This almost gives us a middle ground between the $20 sub and the $100 sub, because Codex really is a great value play. So definitely check it out; it's a super easy setup. Let me know what you thought, and as always, I'll see you around.