00:00:00 Claude Code was made generally available on May 22nd of last year, along with the release of Claude 4.
00:00:06 But there was also a research preview before this, and so I've been using the tool
00:00:11 for a bit over a year now, and I actually did the math. If you count all the time
00:00:15 it took for me to prompt Claude, review the code, and monitor it, I have used the tool for over 2,000
00:00:21 hours now. So yeah, I have a thing or two to teach you. That's what I want to do in this video.
00:00:27 So right now, I want to share with you all of my battle-tested strategies that will take you
00:00:31 from a basic Claude Code user all the way to a power user. I've bundled everything up into what
00:00:37 I call the WISC framework. And here's the thing: these strategies are legit. I am not one of those
00:00:43 AI content creators who has just jumped on the Claude Code bandwagon in the past few months. I've been
00:00:48 using this tool, like I said, daily for over a year now. And so these strategies are going to work on
00:00:54 any code base, even massive ones, even projects that span multiple code bases. I've seen all of this
00:01:00 applied at an enterprise level, so no matter what you're working on, this is for you. This also
00:01:05 really works for any AI coding assistant; I'm just focused on Claude Code because it is the best right
00:01:10 now. And so I am assuming here that you have at least a basic understanding of Claude Code, and now
00:01:15 you want to take things to the next level. If you want the basics of building a system for AI coding,
00:01:21 I'll have a video that I'll link to right here. All of these strategies are for when we want to
00:01:25 work on real code bases that get messy, because we have a bunch of strategies here around context
00:01:32 management. This is important because context rot is the biggest problem with AI coding assistants
00:01:38 right now. It doesn't matter that we have the new one-million-token limit for Claude Code; we still
00:01:43 need to treat our context as the most precious resource, one that has to be engineered very carefully
00:01:49 with our AI coding assistants. And so the W, I, S, and C of the framework: all of these strategies
00:01:56 apply to that, and these are all things that you can take and apply to your projects immediately.
00:02:00 So I'm going to break it down nice and simple for you here. Now, the question you might be asking
00:02:05 yourself is, "Cole, why are we focusing so much on context management? Over 2,000 hours of using
00:02:11 Claude Code, and this is what you want to focus on?" And my answer is yes. I know this is very specific,
00:02:17 but we need to lean right now into context rot and how to avoid it. I would go so far as to say
00:02:23 that about 80% of the time when your coding agent messes up in your code base, it's because you
00:02:28 aren't managing your context well enough. And so I want to start with the problem of context rot,
00:02:33 and then we'll very quickly get practical, diving into every part of the WISC framework. But I want
00:02:38 to start with context rot as a precursor so you can really see why, once you apply the WISC framework,
00:02:45 you're going to immediately see jumps in reliability with your AI coding, even on messier
00:02:50 code bases. And I keep emphasizing larger, messier code bases because that's where we see context rot
00:02:56 becoming more and more of a problem. Now, there has been a lot of research in the industry on context
00:03:02 rot, but my favorite, and the most practical and probably most popular as well, is the Chroma
00:03:07 Technical Report covering how increasing input tokens impacts LLM performance. And the main idea
00:03:13 here is that just because you can fit a certain number of tokens into an LLM's context window doesn't mean
00:03:18 that you should. And yes, this applies to Claude Code with the new one-million-token limit as well.
00:03:24 That's because large language models get overwhelmed with information just like people do. This is the
00:03:30 needle-in-the-haystack problem. When you have a very specific piece of information, or, with coding
00:03:35 agents, a specific file that it has read that you need it to recall, it will do a good job recalling
00:03:41 that information from its short-term memory, but only if you don't have an overstuffed context window.
00:03:47 When you start to have a massive amount of context loaded, you start to get what are called
00:03:52 distractors. These are pieces of information that are close or similar to what you need the LLM
00:03:58 to recall, but not quite right. And we see this a lot with AI coding, especially with larger code
00:04:04 bases. We follow the same patterns for things throughout our code base; we have a lot of
00:04:09 similarity in how different parts of our code base are implemented. And so large language models will
00:04:14 pull the wrong information and be very confident about their fix or implementation. I'm sure you've
00:04:19 seen this all of the time. We have this needle-in-the-haystack problem applying constantly
00:04:24 to AI coding. This is the idea of context rot: the larger our window gets, the harder a time the large
00:04:30 language model has pulling out exactly what we need for the current turn with our coding
00:04:36 agent. So going back to the diagram, let me get super specific for you. What we're addressing with
00:04:42 all of these strategies is the question: how do we keep our context window as lean as possible
00:04:48 while still giving the coding agent all of the context it needs? That is the context engineering
00:04:53 that we are doing here. And so I'm going to go through every single strategy. And I even have
00:04:57 an example for each of them that I'll go through live with you on a complicated code base, and all
00:05:02 of the commands and rules and docs that I use as examples are in a folder that I'll link
00:05:06 to in the description. So you can use all of these strategies conceptually, but also with these
00:05:12 commands as examples, which I have in the .claude folder right here. All right, so let's get into the
00:05:17 individual strategies now. W stands for write, I for isolate, S for select, and C for compress. And
00:05:24 of course we will start with the W here, which is writing: externalizing our agent's memory.
00:05:30 As much as possible, we want to capture key decisions and what the agent has been working on
00:05:34 so that in future sessions we can catch our agent up to speed a lot faster and spend fewer
00:05:40 tokens upfront getting the agent to understand what we really need it to do. And so the first strategy
00:05:46 here is to use the git log as long-term memory. And I absolutely love this one, because there are so
00:05:52 many people who love to over-engineer and build super complicated memory frameworks for their
00:05:56 coding agents, but really, everyone's already using git and GitHub for version control. And so we can
00:06:01 take advantage of a tool that we're already using to provide long-term memory to our agent. Let's go
00:06:07 into our code base and I'll show you what I mean. So the code base I'm going to be using for all the
00:06:12 examples here is the new Archon. I've been working my butt off on this the last few months
00:06:18 behind the scenes. This is your AI command center, where you can create, manage, and execute longer-
00:06:23 running AI coding workflows. And we're even working on a workflow builder; it's going to be like the
00:06:28 n8n for AI coding. So we can kick off workflows, we can view the logs and monitor them in our
00:06:33 mission control, and we can look at past runs to see exactly what happened. Like, this is a very long
00:06:39 workflow that I have to validate entire pull requests in my code base. So yeah, you can tell
00:06:44 from looking at this, and there's a lot more in Archon coming soon, by the way, but you can tell from
00:06:47 looking at this that there are a lot of moving parts. This is a very complicated code base, so
00:06:51 it makes for a good example for everything I'm going to cover with you here, all of the strategies.
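To ground the git-log idea before the demo, here's a minimal sketch in plain shell. The repo, file names, and commit messages are hypothetical, purely to show the shape of a standardized log that an agent can read back as memory:

```shell
# Throwaway repo with hypothetical commit messages, just to show the shape
# of a standardized git log that doubles as long-term memory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Standardized, detailed messages are what make the log readable to an agent.
touch cli.py
git add . && git commit -qm "feat(cli): add run command with --verbose flag"
touch test_cli.py
git add . && git commit -qm "test(cli): cover error paths for run command"

# The one-liner view an agent (or a prime command) can read to catch up:
git log --oneline -n 5
```

The prefixes here (feat, fix, test) are just one common convention; whatever scheme you pick, the point is that a standardized, detailed log is cheap to generate and cheap for a future session to read.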
00:06:57 And so, on using the git log as long-term memory, I'll show you an example right here of a one-liner
00:07:03 for all of my recent commit messages. And what I want to point out here is that we have a very
00:07:09 standard way of creating these commit messages. So we have our merges, but we also have all these
00:07:13 feature implementations and fixes. And I keep things very standard because that way I can rely
00:07:19 on the commit messages to tell my coding agent what I've worked on recently, because a lot of
00:07:24 the time that will guide us in what we want to work on next. And the reason I have this so
00:07:29 standardized is because there is a commit command that I run. Now, running a git commit is very easy,
00:07:36 but if we want to standardize the message and have the coding agent help us with that,
00:07:40 having a specific command is very powerful. So I have this full implementation that I did here
00:07:46 in a single context window with the coding agent. I'm at the end now, where I am ready to run my
00:07:51 commit. And so I just run slash commit; that's all I have to do. It's running this command that
00:07:55 has the standardization for how I document any work that I did, and also anything I did to improve
00:08:01 my rules or commands. So it's a two-part command: here's what we built, and here's how we improved the
00:08:06 AI layer. And so it's going to make this commit and I'll show you what it looks like after.
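For reference, a custom slash command in Claude Code is just a markdown file under `.claude/commands`. Here's a hypothetical sketch of what a standardized commit command could look like; the numbered instructions are illustrative, not the exact command from the repository:

```shell
# Hypothetical sketch of a standardized /commit command for Claude Code.
# Custom slash commands live as markdown files under .claude/commands;
# the instruction text below is illustrative only.
dir=$(mktemp -d)
mkdir -p "$dir/.claude/commands"
cat > "$dir/.claude/commands/commit.md" <<'EOF'
Create a git commit for the work in this session.

1. Review the diff and summarize what was built.
2. Write the subject as <type>(<scope>): <summary>, where type is one of
   feat, fix, test, refactor, or docs.
3. In the body, add a "What we built" section with details, plus an
   "AI layer" section noting any improvements made to rules or commands.
4. Make the commit and show the final message.
EOF
cat "$dir/.claude/commands/commit.md"
```

With a file like this in place, typing `/commit` in a session has the agent follow those steps instead of improvising a message format each time.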
00:08:10 All right. So now, looking at our commit message, we can see that we made some test improvements
00:08:14 to the CLI. So a really nice prefix, then getting into the details. And then, so the coding
00:08:19 agent knows how its own rules and commands are evolving over time, we include that in the commit
00:08:23 message whenever we find an opportunity to improve, let's say, our plan command, for example. And of
00:08:29 course, this commit command is one of the resources that I have for you in the repository if you want
00:08:33 to use it as a starting point, but I also encourage you to customize what your commit
00:08:37 messages look like. The important thing here is that we standardize the messages and make them very
00:08:41 detailed so we can use them as long-term memory. All right. So the second write strategy is to
00:08:47 always start a brand-new context window whenever you are writing any code. No matter what I'm working
00:08:53 on, my workflow is always: I have one conversation to plan with the coding agent, I'll create some
00:08:57 kind of markdown file that has my structured plan, and then I'll send that in as the only context to a new
00:09:03 session going into the implementation. And so it's very important here that your spec has all of the
00:09:08 context the agent needs to write the code and do the validation. So for example, in this conversation,
00:09:14 I am just doing planning. So I run my prime command to start (more on this in a little bit),
00:09:18 I load in context, and then I create my plan with this command. It's another one that I have as
00:09:24 a resource for you. This essentially walks the coding agent through the exact structure
00:09:28 that we want for our single markdown document, going from our short-term memory
00:09:33 into a single document. And then we end the session here, we go to a brand-new context window,
00:09:38 and we proceed with our implementation. So I have my execute command, and this is where I can
00:09:42 specify the path to my structured plan. No other context, because the plan should have everything
00:09:48 the agent needs. This is very important because it keeps our coding agent extremely focused on the task at
00:09:53 hand. There can be a lot of research and other things that just muddle the context window
00:09:57 if we implement in the same place that we plan. So the last W strategy that I have for externalizing
00:10:03 agent memory is progress files and decision logs. You'll see this all the time with more elaborate
00:10:08 AI coding frameworks, where you have something like a handoff.md or a todo.md communicating between
00:10:13 different sub-agents or agent teams, or even just between different agent sessions. When you're
00:10:17 running low on context, a lot of times you want to create this summary of what was just done so
00:10:22 you can go to a fresh session, because you're starting to see that context rot, with the agent
00:10:27 hallucinating as you have these longer conversations. Now, obviously it's ideal to just avoid these longer
00:10:33 conversations, but sometimes you need to have them. For example, something I do with Archon a lot is
00:10:38 I'll have it use the Vercel agent browser CLI to perform end-to-end testing within the browser. And
00:10:44 so I have it go through a bunch of different user journeys and test edge cases. It takes a lot of
00:10:49 context. You can see at the bottom here, I ran a slash context and we're already at 200,000 out of
00:10:56 the new 1 million limit. This fills up so quickly. And once you start to have a few hundred thousand
00:11:01 tokens in the context window, that's when you see the performance start to degrade for the agent. So
00:11:05 I can simply run a slash handoff. This command is going to create a summary that it can pass into
00:11:11 another session so that agent can continue the work, but now without hundreds of thousands of
00:11:16 tokens of tool calls and things like that sitting in its window. And this handoff command is really
00:11:21 just walking through a process of exactly what we want to put in this document so the next
00:11:25 agent has what it needs. All right. So that wraps up our W, and each one of these strategies is very
00:11:31 important because we are logging key decisions for future agent sessions to quickly pick up on.
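As a concrete anchor for the progress-file idea, here's a hypothetical handoff template. The headings are my illustration of the kind of structure a handoff command would fill in from the current conversation, not the exact file from the video:

```shell
# Hypothetical handoff template; a /handoff command would fill these
# sections in from the current conversation before a fresh session starts.
dir=$(mktemp -d)
cat > "$dir/handoff.md" <<'EOF'
# Handoff

## What was just done
- completed work, with the file paths that were touched

## Key decisions and why
- decisions the next agent should not re-litigate

## Current state
- passing/failing tests, known issues

## Next steps
- the single next task, stated concretely
EOF
cat "$dir/handoff.md"
```

The fixed structure is the point: the next session gets decisions and state in a few hundred tokens instead of inheriting hundreds of thousands of tokens of tool calls.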
00:11:36 And I know I'm going quick here, so let me know in the comments if there's any one of these
00:11:40 strategies that you want me to make an entire video on, because I definitely could for each of these.
00:11:45 And so now we get into the I, for isolate, using sub-agents. I love using sub-agents for all things
00:11:52 research, using them pretty much every single session. The important thing here is keeping
00:11:56 your main context clean. We can use sub-agents to perform tens or even hundreds of thousands of
00:12:03 tokens of research across our code base or the web, and then give just the needed summary to our main
00:12:10 Claude Code context window. So instead of loading tens of thousands of tokens of research into our
00:12:16 main context window, it is now only something like 500 tokens. So we still get the core information
00:12:21 that we need, and we see something like the 90.2% improvement that Anthropic reported in its research on using sub-
00:12:28 agents to load in context upfront for research, instead of having our main agent take care of
00:12:33 everything. So let me give you an example of this really quick. It's always at the start of the
00:12:38 conversation, before that structured plan I covered earlier, while I'm in the planning process, that
00:12:43 I use sub-agents very heavily. Watch this: "I want to build a workflow builder into Archon.
00:12:50 So I want you to spin up two sub-agents: one to do extensive research in the code base to see how we
00:12:55 would build in a workflow builder and what that means for Archon, and then spin up another sub-agent
00:13:01 to do web research on best practices for the tech stack. Like, if I want to use React, what library
00:13:06 should we use? And generally, how do we build workflow builders like Dify or n8n?" So I'm just
00:13:12 using my text-to-speech tool here. Send off the prompt. There we go. And so not only do we get
00:13:16 the benefit of isolation, but also speed, because it's going to use these sub-agents in parallel,
00:13:21 come back with summaries, and then my main agent will synthesize all of that and give me the final say.
00:13:26 So there we go. Both of the sub-agents are running in parallel behind the scenes. We can go and view
00:13:31 the logs for each of them as well, and then it'll come back at the end, once they're done, with the
00:13:36 final report. All right, our sub-agents finished. And instead of using hundreds of thousands of tokens
00:13:41 in our main context window (which is how much the sub-agents used on their research),
00:13:46 we only used 44,000 tokens, only 4% of our window so far. That is the power of sub-agents. I don't
00:13:53 recommend them for implementation, because there you usually want all the context of what you did. But for
00:13:57 research, they are very powerful. So yeah, isolation and sub-agents are very important for your planning
00:14:04 process. The other way that we can use sub-agents is with what I like to call the scout pattern. You
00:14:09 want to send scouts ahead before you commit your main context. There might be parts of your code
00:14:14 base or documentation that you want sub-agents to explore to see if they are relevant to load into your
00:14:21 main Claude Code session, so it can make the decision ahead of time: like, yes, we should
00:14:25 bring this in for our larger planning, or no, we should skip it, it isn't relevant. For example,
00:14:30 with Archon, I have a few markdown documents that are very deep dives into certain parts of the code
00:14:36 base, not the kind of context we want in our rules, because we don't need it all the time. But sometimes
00:14:41 you might want to load this, and you can imagine it being something in Confluence or Google Drive,
00:14:45 wherever you store your context. And so, going back to this main conversation,
00:14:48 I can just say: "Spin up a sub-agent to research everything in my .claude/docs. Are there
00:14:54 any pieces of documentation here that we would care about loading into our main context for planning?"
00:14:59 And I can send this in; it'll make the decision and then load in what I care about. So right here,
00:15:04 we kicked off an explore sub-agent. It found all of our documentation and recommended loading one.
00:15:09 And then I said, yep, go ahead and load it; this is really important for what we're planning here.
00:15:13 So instead of just using sub-agents for research, sometimes we have entire pieces of documentation
00:15:18 that we think are crucial for our main context window. That's when we want to use the scouting
00:15:23 pattern. So that is everything for isolation. Remember to use sub-agents for your research
00:15:28 and planning very extensively. And now that brings us into the S, for select: load your context just in
00:15:34 time, not just in case. And what I mean by that is, if you're not 100% confident that a piece of
00:15:40 information is important to your coding agent right now, then you shouldn't bother loading it. And we
00:15:46 have a layered approach to help with this. We start with our global rules. These are the
00:15:51 constraints and conventions that we always want our coding agent to be aware of. And so you want this
00:15:57 file to be pretty concise; usually between 500 and 700 lines long is what I go for. A lot of people
00:16:02 advocate for even less, but you have things like your architecture, the commands to run, and things
00:16:08 like your testing and logging strategy. This is my example from Archon, but these are the things that
00:16:12 you want your coding agent to be aware of all of the time. And then we have our layer two, our
00:16:18 on-demand context, as I call it. These are rules that apply only to specific parts of the code base.
00:16:23 Like, if we're working on the front end (which you aren't always, but if you are), here are the global
00:16:28 rules for the front end, or here are the global rules for building API endpoints. So we add this
00:16:33 onto our global rules for specific task types, because we aren't always going to be working on
00:16:38 the front end, for example. To show you one example of this, we have the workflow YAML reference that
00:16:43 I pulled just a little bit ago with the explore sub-agent. So when we are working on the workflows,
00:16:48 then we care about this, but we don't want it in our global rules, because most of the time
00:16:52 when we're working on Archon, we're not actually working on this specific part of the code base. And
00:16:57 so it's on-demand context. Then the third layer that we have here is skills. This is very popular
00:17:05 with Claude Code and beyond right now. We have the different stages here, where the agent is going to
00:17:10 explore the instructions and capabilities in the skill as it decides that it actually needs it. So
00:17:15 we start with the description. This is a very small number of tokens loaded in upfront with our global
00:17:20 rules. If the agent decides it wants to use this skill, then it'll load the full SKILL.md,
00:17:25 which can also point to other scripts or reference documents that we'd want to load if we're going
00:17:29 even deeper into the skill. And so, as an example of that, I have my agent browser skill. This is
00:17:35 what I use for my browser automation, for all the end-to-end testing I was showing earlier. I use
00:17:40 this every single day. And so whenever I am doing my end-to-end testing, then I want to load this
00:17:46 instruction set so the agent understands how to use the agent browser. And then finally, for the fourth
00:17:52 layer here, I have prime commands. Everything else I've covered here is static documentation
00:17:57 that we're going to update every once in a while. But sometimes we need our agent to do exploration
00:18:02 of our live code base. We need to make sure that all of its information is completely up to date,
00:18:07 and we're willing to spend some tokens with sub-agents upfront to make that happen. That's
00:18:11 what the prime command does: we explore our code base at the start of our planning process
00:18:16 so the agent understands the code base going into whatever we want to build next. And as you can see in my
00:18:22 commands folder, I have many different prime commands, because there are different parts of the code base
00:18:27 I want the agent to understand depending on what I want to build. And so my generic prime command is
00:18:32 this one we're looking at right here. I just tell it to get an understanding of the Archon code base
00:18:36 at a high level. And so, step by step, here is what I want it to read through, including the git log,
00:18:41 because that is important for using our git log as long-term memory. I also have a specialized one,
00:18:47 prime workflows, for when I know that I'm working on the workflow engine in Archon. So it's a very similar
00:18:53 command, just more specialized. I use this at the start of the conversation so that my agent can
00:18:58 quickly load everything it needs. I can confirm it understands my code base, then I get into the
00:19:03 planning process that I was showing you earlier. So, as a super quick summary: global rules are always
00:19:09 loaded. On-demand context is for when you know you're about to work on a part of the code base that
00:19:13 is documented separately. Skills are for when you need different capabilities, like, okay, it's time to do
00:19:18 end-to-end testing, let's load the skill for the agent browser. And then prime commands I will
00:19:22 usually run at the very start of a conversation to set the stage for my planning. So that is
00:19:28 everything for select. Now we'll go to compress, and this is actually the fastest section to cover,
00:19:34 because you shouldn't need to compress often if you're doing write, isolate, and select
00:19:39 well. If we are doing all the other strategies to keep our context lean, we are avoiding this, and
00:19:46 that is good, because you want to avoid compressing as much as possible. If you must compress, then
00:19:52 there are a couple of strategies to cover here, and those two strategies are the handoff and a
00:19:56 focused compaction. So let's get into Claude Code and take a look at this. The handoff we already
00:20:02 covered; it's one of our write strategies. We summarize everything that we just did to hand
00:20:06 off to another agent, or to the same agent after memory compaction. And then we have the built-in
00:20:12 compact command in Claude Code. This is going to summarize our conversation, then wipe the
00:20:18 conversation and put the summary at the top of our context window. Now, the handoff is really
00:20:23 powerful because that's where we get to define our own workflow for how we remember information. But
00:20:28 the slash compact is very useful as well, especially because we can optionally provide summarization
00:20:34 instructions. When I absolutely have to compact, I will use this every single time. For example: "focus
00:20:41 on the edge cases that we just tested," right? So now, when it creates that summary, it's going to pay
00:20:48 more attention to that part of its short-term memory. I didn't spell it right, but that's totally
00:20:53 fine. It'll run the compaction here. And so the handoff and slash compact are kind of either-or.
00:20:58 But I definitely find times where I want to use both. The handoff, especially: when you run into a
00:21:03 compaction more than twice, usually that conversation is getting way too bloated, so you want to start a
00:21:09 fresh session with a handoff. But if it's just once, a lot of times I am okay running a
00:21:14 slash compact. Even then, after a compact I will still ask the agent to summarize what it
00:21:19 remembers so I can make sure that it truly understands. Like, "What do you remember
00:21:24 here?", something like that. And so, yeah, it really isn't ideal. Avoid compaction as much as possible.
00:21:30 The best compression strategy is not needing compression. All right, so that is the WISC
00:21:36 framework. I know it was a lot, so I hope that you found this helpful, and let me know if there's any
00:21:41 one strategy that you want me to dive into deeper, because I could make an entire video on any one of
00:21:46 these strategies. But this is the WISC framework. I hope that you can use it to take yourself to the
00:21:52 next level of Claude Code, or really any AI coding assistant. And so if you found this video helpful
00:21:59 and you're looking forward to more content on AI coding and being able to apply these kinds of
00:22:04 frameworks in practice, I would really appreciate a like and a subscribe. And with that, I will see you
00:22:09 in the next video. Psst! I've got one last thing for you really quick that you don't want to miss.
00:22:14 On April 2nd, I am hosting a free AI transformation workshop live on my YouTube channel, along with
00:22:20 Lior Weinstein, the founder of CTOX, and this is a big deal. Lior is going to teach us how to
00:22:27 restructure our entire organization for AI, and then I'll teach you how to master the AI coding
00:22:32 methodology that I use to build reliable and repeatable systems for my coding agents. And so
00:22:38 I'll have a link in the description to this page. It's going to be live on my YouTube channel, so you
00:22:42 can enable notifications for it by clicking on this button right here. I will see you there!