The Claude Code Limits Problem Is Finally Solved

AAI LABS

Transcript

00:00:00Claude Code has not been great recently.
00:00:02Our team uses it every day, and over the past few weeks we've been running out of limits
00:00:06way faster than we should be.
00:00:07The 1 million token context window was supposed to make things better, but it's actually made
00:00:12it worse.
00:00:13That's why we went and researched every optimization we could find to make Claude Code last longer.
00:00:18Before we get into how to make the most of the limits, let's first
00:00:22discuss how Claude's plans and limits system actually works.
00:00:26This section is just an explainer for those who aren't familiar with how the limits
00:00:30work.
00:00:31Claude has two paid plans: Pro and Max.
00:00:34Max is the more expensive one, while Pro costs just $20 monthly.
00:00:38Both plans include features that aren't available on the free plan, such as
00:00:43Claude Code, Cowork and others.
00:00:45But they all follow the same rule.
00:00:46No matter which plan it is, each gives you a limited number of messages you can send within
00:00:51a 5 hour window and once that window ends, your message count resets.
00:00:55The number of messages you get differs by plan.
00:00:57The 5 hour window starts when you send your first message, whether it's on Claude desktop,
00:01:01web or any Claude interface.
00:01:03After the window starts, each message you send counts against your plan's limit.
00:01:08Now you might expect that the window only runs while you are actively using it.
00:01:11But even if you go idle in between and then use it heavily in the 5th hour, the window
00:01:15is still running, and you would have to wait until the full 5 hours pass before your limit
00:01:20resets.
00:01:21The 5 hour window is also not tied to a single device.
00:01:23So if you are using more than one device with the same account, all usage counts
00:01:27against the same limit.
00:01:28Now on the Pro plan, you get around 45 messages per 5 hour window.
00:01:32The $100 Max plan gives you around 225, and the Max 20x plan, which is more expensive, gives
00:01:37you around 900 messages in the same window.
00:01:41These numbers can vary depending on the model you use: you get more messages with Sonnet
00:01:46and fewer with Opus.
00:01:47Now you might think that this number of messages sounds more than enough for your use case.
00:01:51But this is just a rough count and there are other factors that affect it.
00:01:54The first one is the model you are using.
00:01:56Opus models consume around three times more tokens than Sonnet for the same request because they
00:02:01are far more powerful and compute intensive.
00:02:03So if you are using Opus all the time, you won't get 45 messages in your 5 hour window,
00:02:08and your limit will run out much faster.
00:02:10The pro plan has a lower limit overall.
00:02:12As for the Max plan, while a single person might manage on it, Max is usually purchased
00:02:16by organizations and shared across team members, so it won't hold up with multiple
00:02:20people on board.
00:02:21We do the same at AI Labs: we've purchased a Max plan and distributed it across our team.
00:02:26Even with that, we still hit the limit frequently, which led us to research ways to
00:02:30make it last longer.
00:02:31The second factor is the type of task you are performing.
00:02:34Compute intensive tasks, or tasks that require multiple tools, consume a lot of tokens.
00:02:38So the window will run out much faster than usual, and you might not even make it to 45
00:02:43messages on the Pro plan.
00:02:44And on top of all that, Anthropic has recently started tightening session limits during peak
00:02:48working hours, when many people are using the service heavily at once.
00:02:52So your Claude plan will run out even faster, before you can get any actual work done.
00:02:56This is why now is the right time to learn how to make the most out of your window and
00:03:00use Claude effectively all day.
00:03:02But before we move forward, let's hear a word from our sponsor, Twin.
00:03:05If you've tried automating with tools like Zapier or N8N, you know the deal.
00:03:09Rigid workflows, constant breakdowns and hours wasted connecting apps.
00:03:13And local agents like Claudebot are security nightmares and way too expensive.
00:03:17Twin changes that.
00:03:18It's a no-code AI agent that actually does the work for you while you sleep.
00:03:21It connects to tools via APIs when they exist and when they don't, it builds integrations
00:03:26on the fly, giving you an infinite integration library.
00:03:29And if there's no API, Twin can just browse and interact like a human.
00:03:33On top of that, you get built-in access to tools like Perplexity, Gamma, Veo 3 and Nano Banana.
00:03:38They've just launched the Twin API.
00:03:40So you can trigger agents from anywhere and plug them into your existing workflows.
00:03:44And the best part?
00:03:45These agents learn.
00:03:46They fix themselves when something breaks, improve over time and run 24/7.
00:03:50Stop babysitting broken automations.
00:03:52Click the link in the pinned comment and check out Twin.
00:03:55Now you might already know that the Claude Code source code was leaked.
00:03:58And a lot of people identified issues inside it that can make limits
00:04:02run out faster than intended.
00:04:04One of these is truncated responses staying in the context.
00:04:07If you hit an error, like a rate limit being reached, it can create a partial
00:04:12response.
00:04:13Claude Code then retries while keeping the previous context, along with the partial,
00:04:17error-filled message.
00:04:18This bloats the context with unnecessary information and wastes tokens.
00:04:22Skill listings are also injected, mainly for faster access, even though they don't provide
00:04:27much value because faster handling through the skill tool already exists.
00:04:31Similar to that, there are some other issues as well.
00:04:33Because of all this, a lot of people are complaining about Claude limits being hit faster than expected.
00:04:38So to counteract both the official limits and these hidden token drains, you have to take
00:04:43certain measures to make Claude Code last longer when you're building your products.
00:04:47We share everything we find on building products with AI on this channel.
00:04:51So if you want more videos on that, subscribe and keep an eye out for future videos.
00:04:55We'll start with the tips you might have already heard from us if you've watched our previous
00:04:59videos.
00:05:00The first one is the clear command.
00:05:01Use this whenever you've completed a task and don't need the previous context anymore.
00:05:05For example, when you are done implementing the app and want to move to the testing phase,
00:05:09you don't need the earlier context.
00:05:11So it's better to reset it and start the next task with a fresh context window.
00:05:15But sometimes you do want to retain some of that context.
00:05:18In that case, you can run the compact command instead.
00:05:21It summarizes the whole interaction and frees up space, leaving only a summary in the context.
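In a Claude Code session these are slash commands, and compact also accepts optional instructions to steer what the summary keeps. A sketch of usage (the focus text is a made-up example):

```
> /clear
> /compact Keep only the API design decisions and open bugs
```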
00:05:25The reason we want you to use these is that every time you send a message, the request includes
00:05:29the entire conversation so far, along with the system prompt and your
00:05:34tool definitions.
00:05:35With each new message, this keeps growing, resulting in a bloated context window and higher
00:05:40token usage per message.
00:05:41Now even with compacting, if you ask side questions in the main window, you're still bloating it
00:05:46with unrelated content.
00:05:47So you can use the by the way command to ask a quick side question.
00:05:50It responds in a separate session context window.
00:05:53That side question won't be included with the next message you send, meaning fewer tokens per
00:05:57request.
00:05:58Now even though planning might sound like a token intensive task, you need to start your
00:06:02projects with it.
00:06:03This is because if you don't spend time planning, you will have to course correct Claude later
00:06:07when its implementation is not aligned with what you need.
00:06:10Spending tokens upfront on planning saves you from wasting far more tokens on corrections
00:06:14down the line.
00:06:15Sometimes Claude doesn't follow your instructions the way you want it to.
00:06:18In those cases, we often prompt it again with the correct way to implement things.
00:06:22But instead of re-prompting, you can run the rewind command to restore the conversation
00:06:26and code to a point before the message where Claude went off track, and make the changes
00:06:31directly in the prompt.
00:06:32You can also double press the escape key to do the same thing.
00:06:35This removes the incorrect implementation from the context window and the wrong outputs don't
00:06:39get sent to the model.
00:06:41Now all of these commands help you save tokens during a session.
00:06:44But the bigger impact comes from how your project is structured in the first place.
00:06:47You might have already structured your projects using frameworks like BMAD, SpecKit
00:06:52or others.
00:06:53But the majority of these frameworks are actually token intensive.
00:06:56So if you use them in your own app, expect your token limit to be reached faster.
00:07:00While these frameworks might be sustainable on Max plans, they definitely won't be on Pro.
00:07:04Now even if you're not using frameworks, you might have set up your own structure.
00:07:07To create a CLAUDE.md file, you probably used the init command, which goes through your codebase
00:07:12and creates a CLAUDE.md file for you.
00:07:14It does create one, but the result has a lot of issues.
00:07:17This file is supposed to provide guidance to the AI agent, but it lists things
00:07:20the AI already knows on its own.
00:07:22For example, it includes the commands used to run the dev server, and Claude already
00:07:27knows how to do that.
00:07:28Unless you run the server with a nonstandard flag, there's no need to add
00:07:31those in.
00:07:32As for the architecture, Claude can read file names and deduce what each file is about based
00:07:37on the name, because it understands file systems and uses them to navigate around.
00:07:41So there's no real need for these kinds of instructions unless there are specific cases
00:07:45where additional guidance is required.
00:07:47If you're going to write your own CLAUDE.md, it should ideally be less than 300 lines.
00:07:52The shorter the file, the better it will perform and the more focused Claude will be on what
00:07:56actually matters.
00:07:57It should act as a guiding file, not a detailed manual explaining how to do everything.
00:08:01Whatever you include should be generically applicable across the project, not specific
00:08:05details of each part all packed into one file.
00:08:08In the CLAUDE.md, include only what Claude shouldn't do, your development practices, and
00:08:13similar instructions that Claude doesn't know by default.
00:08:16You need to configure this file properly because it gets loaded into the context once
00:08:20per session and stays there.
00:08:22So unnecessary information in the context window means you're wasting tokens on every turn
00:08:27for content that isn't even needed upfront.
00:08:28For specific aspects of the project, like the database schema or other areas where different rules
00:08:33apply, split them into separate documents and link them in the CLAUDE.md file.
00:08:37This allows Claude to progressively pull in only the docs it actually needs.
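As a rough illustration of that structure, a trimmed-down CLAUDE.md might look like the sketch below. The file names, flags and rules are hypothetical examples, not from the video:

```markdown
# CLAUDE.md — keep this well under 300 lines

## Project conventions Claude can't infer
- Never edit files under `generated/` by hand; rerun the codegen script instead.
- Every new endpoint needs a matching integration test before merge.

## Dev server (nonstandard flag, so worth documenting)
- `npm run dev -- --tenant=local`

## Area-specific rules (linked, pulled in only when needed)
- Database schema and migration rules: see `docs/db-rules.md`
- Payment-flow edge cases: see `docs/payments-rules.md`
```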
00:08:41We also mentioned this in our previous video: creating project rules that are specific to
00:08:45certain paths helps Claude stay focused.
00:08:48This way, Claude only has relevant information in context and avoids unnecessary token usage.
00:08:53So you should also keep separate rules files for area-specific logic, so that Claude can load
00:08:57only what's required.
00:08:58You also need to make use of skills for repetitive workflows, and add scripts and references so
00:09:03Claude can perform tasks more accurately.
00:09:05Skills help by progressively loading only the required parts, which keeps Claude focused
00:09:10on the relevant aspect of the task.
00:09:12Bundling scripts helps by not wasting tokens on deterministic tasks that can
00:09:16be handled programmatically.
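A skill, as Anthropic documents them, is a folder containing a SKILL.md whose frontmatter tells Claude when to load it, optionally bundled with scripts. The skill name, steps and script path below are made-up examples:

```markdown
---
name: changelog-writer
description: Drafts a CHANGELOG entry from recent commits. Use when the user asks to update the changelog.
---

# Changelog writer

1. Run `scripts/recent_commits.sh` to get the commit list (deterministic, so no tokens are spent reasoning about git).
2. Group the commits into Added / Changed / Fixed.
3. Append the entry to CHANGELOG.md, newest first.
```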
00:09:17The reason for separating files is simple.
00:09:19If Claude is working on one part, it doesn't need information about unrelated areas.
00:09:24But if everything is placed in the same Claude.md file, all of it will be loaded every time,
00:09:29leading to unnecessary token usage.
00:09:30You can also use the append system prompt flag to add specific instructions directly to the
00:09:35system prompt.
00:09:36The session then starts with those instructions, instead of you putting everything into the
00:09:40CLAUDE.md file.
00:09:41These instructions are temporary and are removed once the session ends.
00:09:44Now this might sound like it's adding to the context, but it's actually more efficient than
00:09:48putting a one-time instruction in CLAUDE.md.
00:09:51If you add it there, Claude keeps it in the context permanently, wasting tokens unnecessarily.
00:09:56With appending, you provide the instructions exactly when you need them.
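A sketch of what that looks like on the command line, assuming Claude Code's --append-system-prompt flag; the instruction text is an invented example:

```shell
# One-off instruction for this session only, instead of bloating CLAUDE.md
claude --append-system-prompt "For this session: we are migrating to TypeScript strict mode; flag any implicit any."
```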
00:09:59Also, if you are enjoying our content, consider pressing the hype button, because it helps us
00:10:03create more content like this and reach more people.
00:10:06You also need to set the effort level of the model you are using.
00:10:10If you are not working on a task that requires much thinking, set it to low since the low
00:10:14setting saves tokens.
00:10:15By default, it's set to auto, which means the model decides how much effort to
00:10:20use, but you can change it manually.
00:10:21If your task isn't very complex, there's no need to use a high effort setting.
00:10:25Now as we mentioned earlier, Opus is the most token-consuming model.
00:10:28So if you are working on straightforward tasks, switch to Haiku.
00:10:31If your task requires a reasonable level of thinking, use Sonnet.
00:10:34It might not be as powerful as Opus, but it is still efficient and saves more tokens.
00:10:39If you've configured multiple MCPs for a project and don't need a particular one, just disable
00:10:43it so it doesn't waste tokens by injecting unnecessary information into the context window.
00:10:48Another important step is creating hooks that filter out content that shouldn't belong
00:10:52in Claude's context window.
00:10:54For example, I've configured test cases for my project.
00:10:57When we run them, they report both passed and failed tests and all of that gets loaded
00:11:01into the context.
00:11:02But Claude's main concern is the failed tests since those are what need fixing.
00:11:05So you can create a hook that uses a script to stop the passed test cases from entering
00:11:10the context window, so only the failed ones get included.
00:11:13This saves a significant amount of tokens compared to injecting all test reports.
00:11:17You can configure hooks for many other tasks the same way to optimize token usage.
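A minimal sketch of such a filter script. The report format here mimics pytest's verbose output; how you wire the script into a hook, and the exact format of your test runner's output, will depend on your setup:

```python
def filter_test_output(report: str) -> str:
    """Drop lines for passing tests so only failures and the summary reach Claude's context."""
    kept = []
    for line in report.splitlines():
        if "PASSED" in line:
            continue  # passing tests need no fixing, so they only waste tokens
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "test_auth.py::test_login PASSED\n"
        "test_auth.py::test_reset FAILED - AssertionError: expected 200, got 500\n"
        "1 passed, 1 failed in 0.42s"
    )
    # Only the FAILED line and the summary survive the filter
    print(filter_test_output(sample))
```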
00:11:21Now aside from all of that, there are certain configurations you can make in your .claude
00:11:25folder to improve performance.
00:11:27The first one is making sure disable prompt caching is set to false.
00:11:30This makes Claude cache your most commonly used prefixes, which reduces token usage.
00:11:34Anthropic charges far less for the parts that are sent repeatedly, so you mainly pay for the new
00:11:38content.
00:11:39You can also disable auto memory to prevent it from adding content to your context and
00:11:43increasing token usage.
00:11:44Auto memory is a background process that analyzes your conversations and consolidates useful
00:11:49information into memory files for your specific project.
00:11:52Disabling it means it won't track your habits, but it saves tokens by not running in the
00:11:56background.
00:11:57There's another flag called disable background task, which stops background processes from
00:12:00consuming tokens continuously.
00:12:02These include dream, memory refactoring and cleaning, and background indexing.
00:12:06Turning this off helps save tokens because even if you're not actively chatting, these
00:12:10processes would still be working on your conversation.
00:12:13You should also disable thinking when it's not needed because thinking consumes a lot
00:12:16of context and wastes tokens extensively on tasks that don't even need it.
00:12:20Now this is different from the effort setting we discussed earlier.
00:12:23The effort setting controls how much reasoning Claude does within a response, so lower effort
00:12:28means less thinking, but it still thinks.
00:12:30Disabling thinking completely turns off the internal reasoning step and Claude just generates
00:12:34the response directly.
00:12:35So if your task doesn't need deep reasoning, disable thinking entirely.
00:12:39If it needs some reasoning but not a lot, lower the effort level instead.
00:12:43Finally, configure max output tokens to a set number.
00:12:46There's no default, but limiting this controls how much the model generates.
00:12:50Set it lower if you want to save tokens aggressively or increase it if your task requires longer
00:12:55outputs.
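Putting the flags from this section together, a .claude/settings.json along these lines is one way to wire them up. The env map is how Claude Code settings pass environment variables; DISABLE_PROMPT_CACHING, MAX_THINKING_TOKENS and CLAUDE_CODE_MAX_OUTPUT_TOKENS are documented variables (a low MAX_THINKING_TOKENS is one lever for the "disable thinking" advice above), while the auto-memory and background-task switches go by the names the video uses, so check the exact keys against your version's docs:

```json
{
  "env": {
    "DISABLE_PROMPT_CACHING": "0",
    "MAX_THINKING_TOKENS": "0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "8192"
  }
}
```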
00:12:56Now the CLAUDE.md template and other resources for this video and all our previous videos
00:13:00are available in AI Labs Pro, where you can download and use them for your own projects.
00:13:05If you've found value in what we do and want to support the channel, this is the best way
00:13:09to do it.
00:13:10The link's in the description.
00:13:11That brings us to the end of this video.
00:13:13If you'd like to support the channel and help us keep making videos like this, you can do
00:13:17so by using the super thanks button below.
00:13:19As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Extending Claude Code limits requires a combination of session management commands like 'compact' and 'rewind', project-specific rule sharding, and disabling background memory and thinking processes in the .claude configuration.

Highlights

Claude Pro provides 45 messages per 5-hour window, while Max offers 225 and the Max 20x plan provides 900.

Opus models consume approximately 3x more tokens for the same request compared to Sonnet models.

Truncated responses and error messages remain in the context during retries, leading to unnecessary token bloat.

Maintaining a CLAUDE.md file under 300 lines prevents the AI from losing focus and wasting tokens on every message turn.

The 'rewind' command or double-pressing the escape key removes incorrect implementations from the context window entirely.

Setting 'disable_prompt_caching' to false in the .claude folder allows Anthropic to cache repeated prefixes and reduce costs.

Disabling 'background_task' stops automated processes like dream, memory refactoring, and indexing from consuming tokens continuously.

Timeline

Core Mechanics of Claude Plans and Windows

  • The 1 million token context window often accelerates limit exhaustion rather than improving usability.
  • A fixed 5-hour reset window triggers upon the first message regardless of active usage or idle time.
  • Usage across multiple devices with a single account aggregates against the same session limit.

Subscription tiers dictate specific message caps, with the Pro plan offering roughly 45 messages and the Max plan providing 225. These limits are not static and decrease when using compute-intensive models like Opus, which uses three times the tokens of Sonnet. The reset timer is strictly chronological, meaning a user who sends one message and waits four hours only has one hour of access remaining before the window resets.

Hidden Token Drains and External Factors

  • Peak hour usage triggers faster session limit reductions by Anthropic to manage server load.
  • Partial or error-filled responses remain in the context during retries, creating cumulative token waste.
  • Automated skill listings are often injected unnecessarily, duplicating functionality that already exists in the tools.

Internal architecture flaws in Claude Code contribute to rapid limit depletion. When a rate limit error occurs, the system often retries while retaining the failed, truncated response in the active context. This behavior, combined with external throttles applied during high-traffic periods, makes efficient token management a necessity for developers.

Session Management and Context Commands

  • The 'clear' command resets the context window entirely for new phases like moving from implementation to testing.
  • The 'compact' command replaces detailed interaction history with a concise summary to reclaim space.
  • The 'by the way' command isolates side questions into a separate session to prevent main window bloat.

Every message sent to Claude includes the entire conversation history, system prompts, and tool definitions. Strategic use of session commands prevents this payload from growing exponentially. Planning upfront is also a token-saving strategy; while it costs tokens initially, it prevents the heavy token expenditure required for complex course corrections later in the development cycle.

Optimizing Claude.md and Project Structure

  • Effective CLAUDE.md files act as high-level guides and should remain under 300 lines of text.
  • Path-specific rule files allow Claude to load only the logic relevant to the current directory.
  • The 'append system prompt' flag provides temporary instructions that disappear after the session ends.

Default initialization commands often produce bloated documentation containing information the AI already understands, such as standard dev server flags. Efficient projects split database schemas and specific logic into separate documents linked within the main CLAUDE.md file. This structure allows the agent to pull in documentation progressively rather than loading the entire codebase architecture into every single message turn.

Advanced Configuration and Resource Management

  • Custom hooks can filter test reports to only include failed cases in the context window.
  • Disabling the 'thinking' feature removes internal reasoning steps to save tokens on simple tasks.
  • Manual effort settings provide granular control over how much reasoning the model performs.

Technical optimizations in the .claude folder provide the most significant token savings. Setting 'disable_prompt_caching' to false and 'disable_background_task' to true stops the AI from running automated indexing and memory refactoring in the background. Users can also switch between Haiku for simple tasks, Sonnet for standard work, and Opus only for high-complexity problems to maximize their 5-hour message allotment.
