Anthropic Finally Fixed The 1M Context Window Problem

AAI LABS

Transcript

00:00:00The 1 million context window sounds like a huge upgrade, but in reality it's way worse than most people realize.
00:00:05And this is exactly why Tarik, an engineer working on Claude Code, wrote the article.
00:00:09If you think Claude Code only starts getting worse at 1 million tokens, or that 1 million is so much you don't have to worry about it, you are actually wrong here.
00:00:17The degradation actually starts way earlier than halfway through the window.
00:00:21And the fix most people reach for, which is compaction, usually makes it worse.
00:00:24By the end of this video, you'll know exactly how to stop Claude Code from getting dumber, the same way the team at Anthropic does it.
00:00:31Claude Code feels degraded even though the models themselves are actually powerful.
00:00:35You might have noticed that it hallucinates more, has to be reminded again and again of instructions you gave earlier, and forgets those instructions in the long run.
00:00:44We noticed this as well when we were running longer tasks, and Claude's performance felt downgraded.
00:00:48But there is a concrete reason behind it.
00:00:50Now models after Opus 4.5 all ship with a 1 million token context window instead of the previous 200,000 token one.
00:00:56While this upgrade sounds like most of the issues we used to have will be gone with a 1 million context window, it only sounds good in theory.
00:01:03Because now you are able to fit more at once in the context window than before, and ground it with more documents and information so that Claude doesn't stray from the task it needs to do.
00:01:12A million context window also opens the door to long running tasks without worrying too much about the context issues we used to face.
00:01:19But the thing is, all of this is not entirely solved.
00:01:22The million context window is actually a double-edged sword.
00:01:26While it does let Claude go longer and hold more information at once, it all comes at a cost.
00:01:30It opens the door to context rot.
00:01:32Context rot means the model's performance degrades with more information in its context window, because with a bloated context window, it has more things to pay attention to and cannot stay focused.
00:01:42And with a million context window, your context gets much more stuffed, which means there is way more information available to interfere with Claude's reasoning than there was with the 200,000 context window.
00:01:53Context rot is not something that occurs only once the context is highly bloated, either.
00:01:57According to the creator of Claude Code, context rot actually starts happening around 300,000 to 400,000 tokens, which is much less than a million, at around just 30 to 40% usage.
00:02:07So no matter the context window size, we need to do things to prevent context rot.
00:02:11And knowing this will actually change how you work with the one million context window.
00:02:15Now a quick recap.
00:02:16The context window is everything the model sees at once, which includes the conversation so far, the Claude.md file, the system prompt, files read into the session, and every tool call output.
00:02:26Each prompt adds more, and once the window fills up, you summarize to continue with a fresher window, which is compaction.
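The recap above can be sketched as a rough token budget. This is only a back-of-the-envelope estimate using the common ~4 characters per token heuristic, not Claude's real tokenizer, and all the content strings and thresholds below are stand-ins for illustration:

```python
# Rough sketch of how a context window fills up, using the common
# ~4 characters-per-token heuristic (NOT Claude's actual tokenizer).
# All the "content" strings here are illustrative stand-ins.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return len(text) // 4

# Pieces that all share one context window.
system_prompt = "You are a coding agent..." * 40          # stand-in system prompt
claude_md = "# Project conventions\n..." * 200            # stand-in CLAUDE.md
conversation = "user: fix the bug\nassistant: ..." * 500  # chat so far
tool_outputs = "$ pytest -q\n1 failed, 42 passed\n" * 300 # tool call results

window_limit = 1_000_000   # the 1M-token window
rot_threshold = 300_000    # where degradation reportedly begins

used = sum(estimate_tokens(t) for t in
           (system_prompt, claude_md, conversation, tool_outputs))

print(f"estimated tokens used: {used}")
if used >= rot_threshold:
    print("past the ~300k mark: consider /compact or /clear")
else:
    print(f"headroom before rot threshold: {rot_threshold - used}")
```

The point of the sketch is that every prompt, file read, and tool output draws from the same budget, so the window fills long before you have sent a million tokens of your own text.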
00:02:32If you don't manage context properly, there are four ways in which your agent can fail.
00:02:37This becomes even more evident and problematic in long-running agents.
00:02:40Context pollution is the first one, which we already discussed, along with why it occurs.
00:02:45Goal drift is the second.
00:02:46This happens when your agent strays away from what it needs to do because it has too many things to focus on at the moment, or in simpler terms, it has forgotten the goals it was supposed to work toward.
00:02:55This might have happened often if you are working with Claude code, where you want your UI to look a certain way and have already specified it, but it doesn't follow that and you have to remind it of the actual goal.
00:03:05Memory corruption is the third, and it occurs when, during execution, the agent's internal state or stored facts become incorrect, and it continues acting based on that faulty state.
00:03:14It is often hard to pinpoint the exact cause: when agents run for long periods, it becomes unclear where the mistake originated.
00:03:21For example, memory corruption can look like a file being written one way by the agent itself and then modified by a sub-agent that is not in the current context.
00:03:29The agent refers back to its own outdated memory and continues operating as if the file still exists in the same form it originally created.
00:03:37Decision inaccuracy is the last one.
00:03:39It occurs when an agent makes contradictory choices in nearly identical situations, such as using one error handling pattern in one place and a different one elsewhere.
00:03:48All of these issues occur when context is not managed properly and they impact the long-term performance of agents.
00:03:53These are exactly the factors that most agent harnesses try to optimize for.
00:03:57So once you have asked Claude to do something and it has finished, there are actually five possible options for what happens next in terms of your next instruction.
00:04:06Each one depends on what your next prompt is.
00:04:08If you use each one properly, the way you work with Claude can improve a lot.
00:04:12Though the most natural choice is to just continue, the other options actually help you manage your context more effectively.
00:04:18So you need to decide carefully whether you actually want to continue in the same flow or start a new session.
00:04:24Once the context gets bloated, you have two ways to shed the context and the first choice is compaction, which we already explained as a summarization of the existing content.
00:04:32But you need to be clear about when you actually want to summarize because the summary is lossy and a lot of details that might look important to you but not important to Claude can get dropped.
00:04:41As a result, important context may no longer exist in the context window.
00:04:44It is better to control compaction yourself instead of letting Claude hit auto-compact because when it triggers mid-task, the compaction becomes even messier.
00:04:52It tends to keep what it thinks is important and removes everything it does not think will be needed, so Claude is actually least reliable during compaction.
00:05:00At that point, Claude's focus is purely on summarization and it is stripped of supporting context like the system prompt and other elements that normally make it more capable.
00:05:08It then relies heavily on its own assumptions about what is important, which can often lead to poor compaction decisions.
00:05:14Bad compaction usually happens when the model cannot clearly determine the direction of your work.
00:05:19For example, if you are in a long debugging session and a warning was encountered earlier, then after auto-compaction, if you ask it to fix that specific warning, it won't know which warning you are talking about.
00:05:29This happens because the session was focused on debugging as a whole, so only a general summary of debugging activity was retained and the specific warning was treated as noise and dropped.
00:05:39Recency bias makes it worse.
00:05:41When compaction is triggered, the prompt prioritizes preserving recent details of what was being worked on.
00:05:46So older but still important information may be ignored or left out.
00:05:50If something was done incorrectly earlier, the model may no longer be aware of it after compaction.
00:05:54It only has access to the transcript-level summary, not the full state of the project, because tool call history is not fully preserved during compaction.
00:06:01You can set flags to control when auto-compaction happens, but this is something you should actively manage more often.
00:06:07Trigger compaction around the 300,000 to 400,000 range mentioned by the creator, because that is typically where context rot begins to appear, and always provide a compaction instruction yourself, because Claude responds more carefully when explicit instructions are included.
00:06:22Tell it which decisions, constraints, and discovered issues to carry forward so it knows what to prioritize.
00:06:27So you should hit compact when you actually want context from the previous task flow to carry into the new window, not when you want a fresh start.
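One way to act on this advice is to pass explicit instructions along with the compact command instead of letting auto-compact decide on its own. The specifics below (the auth bug, the wrapper decision) are made up for illustration:

```
/compact Carry forward: the agreed constraints from CLAUDE.md, the root cause
of the auth bug, the decision to keep the legacy API wrapper, and every warning
we have not yet fixed. Drop exploratory file reads and dead-end attempts.
```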
00:06:34But before we move forward, let's hear a word from our sponsor.
00:06:37Verdant, an AI-powered platform that helps builders turn ideas into shipped products.
00:06:41You're mid-build, finally in the zone, and your credits run out.
00:06:45Your AI stops dead, momentum gone.
00:06:47Every AI coding tool does this to you, but Verdant doesn't.
00:06:50When your credits hit zero, just switch to eco mode, a zero-cost mode that keeps your AI running without spending another dollar.
00:06:56No interruption, no top-up, no lost momentum.
00:06:59You just keep building.
00:07:00And when you do have credits, you're not stuck picking between Claude, GPT, or Gemini.
00:07:04Verdant's multi-plan mode runs all three together like a decision committee, giving you better plans without the model anxiety.
00:07:10Want even more flexibility?
00:07:11BYOK lets you plug your own API key directly into Verdant.
00:07:15Use your company's Claude or GPT credits, no platform charges.
00:07:18You just pay for what you actually use.
00:07:20You get 100 credits and 7 days to test it out.
00:07:23Click the link in the pinned comment and try Verdant for free.
00:07:26The second choice is to use the clear command, which removes all context and starts a new session with an empty context.
00:07:32Unlike compaction, nothing is carried forward, and only what you provide, again, remains in the context window.
00:07:37Just like compaction, clear is not something you should save only for when you run out of context.
00:07:41If you are switching to an unrelated task, it is straightforward to clear the session and start fresh, so the previous task does not interfere with the new one.
00:07:49For example, if you ask the agent to write test cases for an application you are working on and then move on to debugging, you may not want it to retain details about how those test cases were generated.
00:07:57Instead of continuing into debugging within the same context, you can start a fresh session.
00:08:01This way Claude can work on debugging your application more effectively, without being influenced by how it previously generated the test cases.
00:08:08Now there is another approach you can use which is combining both clear and compaction.
00:08:12This allows you to retain only what you want and discard everything else.
00:08:16The idea is to use a structured JSON format that captures the information you want to preserve.
00:08:21You can create a custom command so that you can reuse it frequently.
00:08:24In that command, you can include a JSON structure that contains the full task, current state, constraints, discovered issues and any other relevant details you want Claude to retain and then instruct it to save this to a file.
00:08:35This approach lets you get the best of both methods.
00:08:38Once you run the command, it will analyze the entire conversation and the current state of the application, something that a normal compaction does not reliably preserve, and save everything into the file as specified.
00:08:48A schema is much stricter than prose, so when Claude follows a defined structure, it can represent what is important more consistently and accurately.
00:08:56After the information has been saved to the file, you can safely use the clear command to remove everything from the context window.
00:09:02Then you can start a new session and instruct Claude to refer back to that document to gather context and implement the next task from there.
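A minimal sketch of that handoff flow, assuming a JSON structure along the lines described (full task, current state, constraints, discovered issues). The field names, file path, and task details are illustrative, not a Claude Code convention:

```python
import json
from pathlib import Path

# Illustrative handoff state. The field names are not a Claude Code
# standard, just one way to structure what should survive a /clear.
state = {
    "task": "Migrate the payments service to the new billing API",
    "current_state": {
        "done": ["mapped old endpoints to new ones", "updated client wrapper"],
        "in_progress": "rewriting retry logic",
    },
    "constraints": [
        "do not change the public interface of PaymentsClient",
        "all new code must have tests",
    ],
    "discovered_issues": [
        "legacy webhook handler silently swallows timeouts",
    ],
}

# Save the state before clearing the session.
handoff = Path("handoff.json")
handoff.write_text(json.dumps(state, indent=2))

# After /clear, a new session can be pointed at this file to restore context.
restored = json.loads(handoff.read_text())
print(restored["task"])
```

Because the schema is fixed, the same command can be reused at every handoff, and the new session only ever reads the distilled state rather than the full transcript.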
00:09:14As mentioned earlier, as context grows, the agent's focus can drift because there is simply more information competing for attention, and this is even more noticeable with the million token context window.
00:09:23This practice helps address both the goal drift problem and the decision inconsistency issues we discussed earlier.
00:09:29Instead of continuously pushing forward in a long running task, it is useful to pause periodically and ask the agent to recap what it has done so far, along with the constraints and other important factors.
00:09:39When you do this, it reinforces the original goals and brings key details back into the more recent part of the context window, rather than leaving them buried in older sections.
00:09:48This helps ensure that important information stays fresh in the agent's working context and is less likely to be lost during compaction or diluted over time,
00:09:56so the agent remains more aligned with the task it is supposed to perform and maintains better consistency in its decisions.
00:10:02Also, if you are enjoying our content, consider pressing the hype button, because it helps us create more content like this and reach out to more people.
00:10:09Sub-agents might not look like much, but they are actually a very important way of managing context.
00:10:14Each sub-agent is its own independent instance, with a dedicated context window, full tool access, and the permissions it needs to complete its task.
00:10:22They execute the assigned work in that separate context provided by the parent agent and then return only the final output back to the main context.
00:10:30So all the tool calls it made, files it read, web searches it performed, and intermediate reasoning stay within the sub-agent's own context and do not pollute the main agent's context window.
00:10:40This is an effective way to reduce context rot. Research tasks are the clearest example.
00:10:45The agent goes through multiple websites, pages, and sources, and you do not want all of that raw information continuously added into the main context window.
00:10:53In such cases, a sub-agent can handle the work independently and return only the final synthesis.
00:10:58The key question you should ask yourself before using a sub-agent is whether you will need access to the intermediate steps again, or whether you only care about the final output.
00:11:07Claude Code also manages sub-agent orchestration on its own and can spawn agents to handle tasks automatically.
00:11:13But sometimes you need to explicitly specify in your prompt that you want the work delegated to a sub-agent so it is handled in isolation.
00:11:20So if you are working on research tasks, refactoring tasks, summarization, or document generation, you should consider separating them using sub-agents instead of your main agent.
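In practice, the explicit delegation can be as simple as stating it in the prompt. The task below is a made-up example of phrasing that keeps the raw research out of the main window:

```
Use a subagent to research how other projects structure their retry logic.
Read whatever sources you need inside the subagent, but return only a short
synthesis of the recommended pattern. Do not bring the raw pages back into
this conversation.
```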
00:11:30Last but not least, rewinding is really important compared to simply correcting, because it removes irrelevant or incorrect parts from the context window while keeping only the correct state.
00:11:40Whenever Claude runs into a mistake, people often try to re-prompt it to take another approach.
00:11:44But a better option is to rewind instead and then provide the correct direction in the new prompt.
00:11:49You can use the rewind command or press the escape key twice to do this.
00:11:53After rewinding, you can also summarize from that point so the conversation up to that stage is preserved as useful context while removing the parts that led to the issue.
00:12:01Rewinding has multiple benefits.
00:12:03First, it cleans the context window by removing the part where things went wrong, which results in a cleaner compaction summary that preserves only correct implementations.
00:12:12Even if you pin important information, you avoid carrying forward sections where the agent deviated from the goal, which helps reduce both decision inconsistency and goal drift.
00:12:21If you are using sub-agents, rewinding ensures they receive a cleaner and more accurate context when tasks are handed off, so incorrect approaches are not included in their working state.
00:12:30Similarly, if you use a handoff command, it captures the correct state of the application instead of a corrupted or outdated one.
00:12:37So build the habit of rewinding instead of repeatedly correcting forward so the agent consistently works from a clean and accurate state through the whole session.
00:12:45That brings us to the end of this video.
00:12:47If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below.
00:12:54As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Prevent context rot by manually managing tokens between 300k and 400k and using structural techniques like sub-agents, structured JSON state-saving, and rewinding instead of relying on default auto-compaction.

Highlights

Context rot begins appearing between 300,000 and 400,000 tokens, significantly earlier than the theoretical 1 million token limit.

Compaction via auto-summarization often drops critical details, leading to hallucination and loss of task direction.

Delegating complex or multi-step tasks to sub-agents keeps intermediate tool calls and file reads out of the main context window.

Manual management using structured JSON schemas to save state proves more reliable than allowing the model to perform automated summarization.

Rewinding—using the rewind command or pressing escape twice—cleans the context window of errors more effectively than attempting to correct mistakes via new prompts.

Frequent periodic recaps help re-orient the agent toward primary goals, preventing drift in long-running sessions.

Timeline

The Reality of Context Rot

  • A 1 million token context window introduces context rot, where model performance degrades due to information overload.
  • Degradation begins at approximately 300,000 to 400,000 tokens, far before the theoretical 1 million limit.
  • Bloated context windows cause the model to lose focus on instructions and goals.

While larger context windows allow for more information, they function as a double-edged sword. With more data to process, models struggle to maintain focus, leading to hallucinations and instruction forgetting. This phenomenon, known as context rot, occurs well before the window is fully utilized.

Failure Modes in Long-Running Agents

  • Four primary failure modes include context pollution, goal drift, memory corruption, and decision inaccuracy.
  • Goal drift occurs when an agent forgets its objective while processing excessive information.
  • Memory corruption happens when an agent acts on outdated or faulty internal state.

Poor context management leads to systematic failures in long-running agents. Context pollution happens when irrelevant information crowds the window, while decision inaccuracy arises from inconsistent behavior across similar tasks. These issues fundamentally undermine the agent's long-term performance.

Effective Context Management Strategies

  • Compaction via automated summarization is lossy and often detrimental to task precision.
  • Manual triggering of compaction around the 300,000-400,000 token range is more reliable than auto-compaction.
  • The clear command provides a fresh session start for unrelated tasks to prevent cross-task interference.

Default auto-compaction often drops context that the model deems unimportant but that is necessary for specific tasks. Managing compaction manually allows for explicit instructions on what constraints or details to preserve. Alternatively, clearing the context entirely is preferable when transitioning between unrelated tasks.

Advanced Techniques for Reliability

  • Saving state in a structured JSON schema preserves essential data that summarization typically loses.
  • Sub-agents isolate intermediate work, preventing raw data from polluting the main agent's context.
  • Rewinding to a previous state before correcting an error maintains a cleaner, more accurate context than simply prompting for corrections.

Using structured formats like JSON ensures critical state information is carried over accurately between sessions. Sub-agents further improve reliability by keeping messy intermediate data separate from the primary logic. Rewinding acts as an essential cleanup mechanism, ensuring the agent always operates from a known, correct state.
