70% Context Is The Most Important Thing For AI Coding

AAI LABS

Transcript

00:00:00 You already know about AI coding frameworks like BMAD, Spec Kit and others, but these are not the only ones. There are hundreds of people experimenting and launching their own workflows, but when you try them out, you'll notice that they often fail to deliver on their promise. It's not because their methods are bad; it's because they don't fit your specific use case. When we build apps, most of the time we create our own workflows instead of relying on pre-made ones. This is because workflows should be built around your specific use case, and they only work if they align with the project you're trying to build. So how do you build a workflow for your own process? For that, you need to know certain principles: the principles that every framework uses in one way or another.
00:00:38 Before discussing the main principles, it is essential to know what's inside the context window of these AI tools, because managing context is basically what these frameworks do. The context window is the amount of information the model can attend to at once. Anything that falls outside the context window leaves the model's working memory, and it has no way to recall it. Models have a limited context window: Anthropic models have a 200k-token context window, for example, and Gemini models have 1 million. These might look like big numbers in terms of the messages you send, but they are not that huge, because in these AI tools the context window does not consist only of your system prompt and user messages; it also includes your past messages, memory files, tool definitions, MCP calls and so on. You need to learn to make the most of this limited working space, so that when you build your workflows, the model does exactly what you want it to do. I will be using Claude Code as my primary coding tool throughout the video, but you can build your workflow on any platform, as they all have the tools needed for these principles.
00:01:39 The most important principle, and the key to any workflow design, is progressive disclosure: revealing to the LLM only what matters, and keeping the model's attention focused on what is actually needed right now, rather than filling the context window with everything it might need in the future. More advanced models like Sonnet 4.5 have a context editing feature built right in, where they can recognize what's noise and try to filter it out on their own, and they use grep commands to narrow down what you want. But that alone is not enough. When we give vague instructions, even these newer models load a lot of things that are not needed and pollute the window. Instead of asking Claude to fix the error in your backend, it is better to ask it to check the endpoints one by one rather than fix everything at once. The skills feature in Claude is now open source, and other tools can use it as well. Skills are pretty much the embodiment of progressive disclosure: their description provides just enough information for your AI coding platform to know when each skill should be used, without loading everything into the context. A huge mistake people make is using MCPs for everything. You should only use MCPs when external data is required, and use skills for everything else.
00:02:46 The second, equally important principle is that information not needed right now does not belong in the context window. To achieve this, the tools use structured note-taking, and we can use this to our advantage by giving the AI tool external files it can use to document decisions, issues or technical debt. This approach lets your agent maintain critical context that might otherwise be lost when building something really complex. These tools also have a compaction feature to manage the context window, and when the context resets, you don't have to rely solely on the compaction summary: your agent can use these notes to regain context on what has already been done and what still needs to be done. This approach is particularly helpful for long-horizon tasks, which are inherently complex. You might be familiar with AGENTS.md, a standard context file that agents read before starting the session. Some agents don't follow this standard and have their own, such as Claude's CLAUDE.md, and I use these files to tell the agent how the external files are structured and what to write in each one of them.
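A minimal sketch of what that guidance can look like inside a CLAUDE.md. The file names under `notes/` are arbitrary conventions of this example, not anything the tools require:

```markdown
# Project memory (external notes)

- `notes/DECISIONS.md` - one bullet per architectural decision, with date and reason
- `notes/ISSUES.md`    - open bugs and technical debt, one heading per item
- `notes/PROGRESS.md`  - done / in progress / next, updated after every task

Before starting a task, read `notes/PROGRESS.md` first.
After finishing a task, append to the relevant file instead of rewriting it.
```

With this in place, a fresh or compacted session can catch up from the notes instead of depending on whatever survived the summary.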
00:03:44 Sometimes these agents randomly pause in the middle of a long-running task. A lot of the time this happens because the context has gone above 70% of its limit. This is where the concept of an attention budget comes in. The context window is what the model pays attention to while generating output. When it goes over 70% full, the model's attention is stretched thinner and there's a higher chance of hallucinations. For AI agents, this stops them from using their tools effectively, and oftentimes they just choose to ignore them. To manage this, there are several built-in tools you can use. As you already know, compaction lets the model start afresh with a proper summary of what has happened as the starting prompt and a reduced context window. So instead of letting it fill up to 90% and triggering the auto-compact feature, keep an eye on the context window and compact it yourself. If you're experimenting, use Claude's built-in rewind so that you can delete the unnecessary parts instead of continuing them and asking Claude for changes. You should also clear the context, or start a new one, for any new task so that the previous context doesn't slow the model down.
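In Claude Code these actions map onto built-in slash commands. The names below match recent versions, but command sets change between releases, so confirm with /help in your own install:

```text
/context   # show how full the context window currently is
/compact   # summarize the session yourself instead of waiting for auto-compact
/rewind    # step back to an earlier checkpoint, dropping dead-end turns
/clear     # wipe the context entirely before starting an unrelated task
```

The habit worth building is checking /context periodically and compacting or clearing well before the window crosses that 70% line.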
00:04:42 Another thing that stems from the principle of progressive disclosure is the ability of these agents to run tasks in the background without polluting the main context window. Sub-agents work in their own isolated context window and only report the output back to the main agent. This is particularly helpful when working on tasks that are isolated from each other, because your main context window is protected from being bloated with the tool calls and searches that the sub-agent makes, ensuring the information remains in its dedicated working zone. Since these agents run in the background, you can continue interacting with your main agent and let it work on something that actually requires your attention. Whenever I want something researched, such as the rules of a new framework I'm working with, I just use these sub-agents. This way, their tool calls and searches are isolated and they just return the answer to the main agent.
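In Claude Code, a custom sub-agent is a markdown file under `.claude/agents/` with frontmatter naming it, describing when to use it, and restricting its tools. The research agent below is a hypothetical sketch of the use case just described:

```markdown
---
name: framework-researcher
description: Researches documentation and conventions for a framework
  and returns a concise summary. Use proactively for research questions.
tools: WebSearch, WebFetch, Read
---

You are a research assistant. Gather what the main agent asked for,
keep all searching and reading inside your own context, and reply
with a short, structured summary only.
```

Everything the sub-agent fetches stays in its own window; only the final summary lands in the main agent's context.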
00:05:28 If you understand the principle of note-taking, you should also know which file format to use for which task. These formats differ in token count, and hence in the efficiency of your workflow. YAML is the most token-efficient, so I mainly use it for database schemas, security configs and API details; its indentation helps models structure information properly. Markdown is better for documentation like your CLAUDE.md, because the heading levels make it easy for the model to navigate between sections. XML is specifically well suited to Claude models: Anthropic states that their models are fine-tuned to recognize these tags as containers and separators, which is useful when you have distinct sections like constraints, summaries or visual details. Other models generally prefer Markdown and YAML over XML. And lastly, JSON. It's the least token-efficient because of all the extra braces and quotes, so I only use it for small things like task states and don't really recommend it for the most part.
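A quick way to see the overhead is to write the same record in both formats and count characters (character count is only a rough proxy for tokens, and the file names here are arbitrary):

```shell
# same record, pretty-printed JSON vs YAML
cat > record.json <<'EOF'
{
  "service": "auth-api",
  "port": 8443,
  "scopes": ["read", "write"]
}
EOF

cat > record.yaml <<'EOF'
service: auth-api
port: 8443
scopes: [read, write]
EOF

wc -c record.json record.yaml   # the YAML copy is noticeably smaller
```

The braces, quotes and indentation in JSON add up fast once a file holds hundreds of records, which is why it's best kept for small state blobs.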
00:06:23 Git is one of the most basic things you're taught when starting programming. We've seen another trend with these context workflows in which people use the git commit history as a reminder to the model of the progress that's been made, whether across the whole project or on a single task. Even if you don't want to use it to store progress, you should generally run these context engineering workflows in a git-initialized repository. Having a context engineering workflow means that you don't let the model do everything at once, but instead have it act on planned steps one by one. If at any stage you encounter a problem, git lets you control which version to revert to and helps you evaluate which change is causing problems. People have also implemented parallelism with git worktrees, and I've shown plenty of workflows where sub-agents work in dedicated worktrees for parallel work.
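The worktree setup is plain git. A minimal sketch, with throwaway repo and branch names, looks like this: each parallel task gets its own branch checked out in its own directory, so agents never edit the same files at once.

```shell
# one worktree per parallel task
git init demo && cd demo
git config user.email "agent@example.com"   # placeholder identity for the demo
git config user.name "Agent"
git commit --allow-empty -m "init"

# each sub-agent gets its own branch in its own directory
git worktree add ../demo-feature-x -b feature-x
git worktree add ../demo-feature-y -b feature-y

git worktree list   # the main checkout plus the two task worktrees
```

When a task finishes, its branch merges back normally and `git worktree remove` cleans up the directory.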
00:07:07 Whatever workflow you end up making, there are always going to be cases where you end up repeating instructions for common procedures. A good example is how you ask the AI tools to do git commits or update your documentation. Almost all of these AI tools give you ways to reuse your most repeated prompts. I often use custom slash commands in my own projects, because they basically give Claude a reusable guide. For instance, I use a /catchup command that contains instructions on how I structure memory outside the context window, so Claude knows how to catch up with the project instead of reading every file. They are also good at enforcing structure. To keep my commits and documentation in a defined format, I use a /commit command that spells out how commit messages should be written and what pre-commit checks to run before committing. This way the slash commands keep everything standardized, and I don't have to instruct Claude again and again to perform tasks the way I prefer.
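In Claude Code, a custom command is just a markdown file in `.claude/commands/`, and the file name becomes the command name. The checks and message format below are illustrative, not the video's exact command:

```markdown
<!-- .claude/commands/commit.md, invoked as /commit -->
Before committing:
1. Run the linter and the test suite; stop and report if either fails.
2. Stage only the files related to the current task.

Commit message format:
- First line: type(scope): summary, at most 72 characters
- Body: explain why the change was made, not what changed
```

Because the file lives in the repository, everyone on the project gets the same /commit behavior for free.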
00:07:58 As you know, MCPs should be used whenever external data is required. Jira is the most widely used team management software; if you want to pull information from tickets, you can use the Jira MCP so Claude can access tickets directly and start implementing changes. Similarly, I use the Figma MCP to provide Claude Code with the app's style guide, which it then uses to construct the design. For tasks where the model's built-in capabilities fall short, MCPs are essential for interacting with external sources efficiently. You can include these MCPs directly in your slash commands so that they become part of your whole workflow.
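Project-scoped MCP servers can be declared in a `.mcp.json` file at the repository root so the whole team shares them. The server package names below are placeholders; substitute the ones your Jira and Figma providers actually publish:

```json
{
  "mcpServers": {
    "jira": {
      "command": "npx",
      "args": ["-y", "your-jira-mcp-server"]
    },
    "figma": {
      "command": "npx",
      "args": ["-y", "your-figma-mcp-server"]
    }
  }
}
```

Keeping this list short, only the servers that fetch genuinely external data, is what protects the context window from MCP tool-definition bloat.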
00:08:31 That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

Building effective AI coding workflows requires understanding context management principles like progressive disclosure, structured note-taking, and attention budgeting rather than relying on pre-made frameworks.

Highlights

Pre-made AI coding frameworks often fail because workflows must be customized to specific use cases rather than relying on generic solutions

Progressive disclosure is the key principle - reveal only what the AI needs right now to keep context focused and efficient

Use skills for internal workflows and reserve MCPs only for external data access to prevent context pollution

Structured note-taking in external files helps manage complex long-horizon tasks and preserves context across sessions

Monitor context window usage and compact before reaching 70% to prevent hallucinations and tool-usage failures

Sub-agents run in isolated context windows for parallel work without bloating the main agent's attention budget

Choose file formats strategically: YAML for efficiency, Markdown for documentation, XML for Claude models, avoid JSON for large data

Timeline

Introduction: Why Pre-Made Frameworks Fail

The video introduces the problem with popular AI coding frameworks like BMAD and Spec Kit - they often fail to deliver on promises not because their methods are bad, but because they don't fit specific use cases. The creator explains that when building apps, most developers create custom workflows instead of using pre-made ones because workflows must align with the specific project being built. To build effective custom workflows, developers need to understand certain fundamental principles that every framework uses. This sets up the video's focus on teaching these underlying principles rather than promoting specific tools.

Understanding the Context Window

This section explains the critical concept of the context window - the amount of information an AI model can remember at once. Anything outside this window is lost from working memory with no way to recall it. The creator provides specific examples: Anthropic models have a 200k token context window while Gemini models have 1 million tokens. Despite these seemingly large numbers, the context window fills quickly because it contains not just system prompts and user messages, but also past messages, memory files, tools, MCP calls, and more. Understanding how to maximize this limited working space is essential for building workflows that make the model do exactly what you want.

Progressive Disclosure: The Core Principle

Progressive disclosure is introduced as the most important principle in workflow design - revealing to the LLM only what matters right now and keeping attention focused on current needs rather than filling context with everything that might be needed in the future. While advanced models like Sonnet 4.5 have built-in context editing features and can filter noise using grep commands, this alone isn't enough. The creator gives a practical example: instead of asking Claude to fix all backend errors at once, it's better to ask it to check endpoints one by one. The skills feature in Claude Code exemplifies progressive disclosure, providing just enough information in descriptions for the AI to know when to use each skill without loading everything into context.

Structured Note-Taking and External Memory

The second major principle states that information not needed immediately should not occupy the context window. Tools achieve this through structured note-taking using external files to document decisions, issues, and technical debt. This approach maintains critical context that would otherwise be lost when building complex systems. When context resets through compaction features, agents can use these notes to understand what's been done and what remains, rather than relying solely on compaction summaries. This is particularly valuable for long-horizon tasks which are inherently complex. The creator mentions the standard AGENTS.md file that agents read before sessions, though some agents like Claude use their own formats (CLAUDE.md) to guide how external files are structured.

Managing Attention Budget

The concept of attention budget addresses why agents randomly pause during long-running tasks - typically when context exceeds 70% of its limit. When the context window goes over 70%, the model must focus harder and hallucination chances increase significantly. For AI agents specifically, this degrades tool usage effectiveness and they often simply ignore available tools. The creator provides several solutions: use compaction proactively before reaching 90% rather than waiting for auto-compact, utilize Claude's built-in rewind feature to delete unnecessary parts during experimentation, and start new context windows for new tasks to prevent previous context from slowing the model. Active monitoring and management of context window usage is essential for maintaining agent performance.

Sub-Agents and Isolated Context

Sub-agents extend the progressive disclosure principle by running tasks in the background within their own isolated context windows, reporting only output back to the main agent. This is particularly valuable for isolated tasks because the main context window remains protected from being bloated with the sub-agent's tool calls and searches. The information stays in dedicated working zones, ensuring efficiency. Since these agents run in the background, users can continue interacting with the main agent on tasks requiring active attention. The creator shares a practical use case: whenever researching new frameworks or rules, they use sub-agents so tool calls and searches remain isolated while only answers return to the main agent.

File Format Selection for Token Efficiency

Understanding which file format to use for different tasks impacts token count and workflow efficiency significantly. YAML is the most token-efficient format, making it ideal for database schemas, security configs, and API details, with indentation helping models structure information properly. Markdown works better for documentation like CLAUDE.md because heading levels enable easy navigation between sections. XML is specifically suited to Claude models - Anthropic states their models are fine-tuned to recognize XML tags as containers and separators, useful for distinct sections like constraints, summaries, or visual details. Other models generally prefer Markdown and YAML over XML. JSON is the least token-efficient due to extra braces and quotes, so the creator only uses it for small items like task states and doesn't recommend it otherwise.

Git Integration and Version Control

Git serves multiple purposes in context workflows beyond basic version control. A growing trend uses git commit history as a reminder to models of progress made across entire projects or individual tasks. Even without using it for progress tracking, context engineering workflows should operate in git-initialized repositories. The creator explains that context engineering workflows prevent models from doing everything at once, instead acting on planned steps sequentially. When problems occur at any stage, git provides control over which version to revert to and helps evaluate which changes cause issues. Advanced implementations use git worktrees for parallelism, with sub-agents working in dedicated worktrees for parallel work, as demonstrated in various workflows the creator has shown.

Reusable Prompts and Custom Commands

Custom commands solve the problem of repeating instructions for common procedures like git commits or documentation updates. Most AI tools provide ways to reuse frequently used prompts. The creator uses custom commands in projects to give Claude reusable guides, such as a catchup command containing instructions on memory structure outside the context window so Claude knows how to catch up with projects without reading every file. Custom commands also enforce structure - the creator uses a commit command that follows specific formats for commit messages and defines pre-commit checks before committing. This standardization means commands keep everything consistent without needing to repeatedly instruct Claude on preferred task execution methods.

MCPs for External Data and Workflow Integration

MCP (Model Context Protocol) servers should be reserved specifically for accessing external data sources. The creator provides concrete examples: the Jira MCP enables accessing team management tickets directly so the AI can start implementing changes based on ticket information, while the Figma MCP provides Claude Code with app style guides for constructing designs. MCPs are essential for tasks where the model's built-in capabilities fall short and external source interaction is required. Importantly, MCPs can be included directly in custom commands to integrate them into complete workflows. This strategic use of MCPs - only for external data rather than everything - prevents unnecessary context pollution while enabling powerful integrations with external tools and platforms that enhance the AI's capabilities where needed.
