00:00:00 Claude Code has not been great recently.
00:00:02Our team uses it every day and over the past few weeks we've been running out of limits
00:00:06way faster than we should be.
00:00:07 The 1 million token context window was supposed to make things better, but it's actually made it worse.
00:00:13 That's why we went and researched every optimization we could find to make Claude Code last longer.
00:00:18 Before we move on to how to make the most of the limits, let's first discuss how Claude's plans and limits system actually works.
00:00:26 This section is just an explainer for those who aren't familiar with how the limits work.
00:00:31 Claude has two paid plans: Pro and Max.
00:00:34 Max is the more expensive one, while Pro costs just $20 a month.
00:00:38 Both plans have access to features that aren't available on the free plan, including Claude Code, Cowork, and others.
00:00:45But they all follow the same rule.
00:00:46No matter which plan it is, each gives you a limited number of messages you can send within
00:00:51a 5 hour window and once that window ends, your message count resets.
00:00:55The number of messages you get differs by plan.
00:00:57 The 5 hour window starts when you send your first message, whether it's on Claude desktop, web, or any other Claude interface.
00:01:03After the window starts, each message you send is counted against the set limit of your plan.
00:01:08 Now you might expect that the window only counts down while you are actively using it.
00:01:11 But even if you go idle in between and then use Claude heavily in the fifth hour, the window is still running, and you have to wait until the full 5 hours pass before your limit resets.
00:01:21The 5 hour window is also not dependent on your device.
00:01:23So if you are using more than one device with the same account, all usage will be counted
00:01:27within the same limit.
00:01:28Now for the pro plan, you get around 45 messages per 5 hour window.
00:01:32 The $100 Max plan gives you around 225, and the Max 20x plan, which is more expensive, gives you around 900 messages in the same window.
00:01:41These numbers can vary depending on the model you use as you get more messages with Sonnet
00:01:46and fewer with Opus.
00:01:47 Now you might think this number of messages sounds like more than enough for your use case.
00:01:51But this is just a rough count and there are other factors that affect it.
00:01:54The first one is the model you are using.
00:01:56Opus models consume around 3 times more tokens for the same request than Sonnet because they
00:02:01are far more powerful and compute intensive.
00:02:03So if you are using Opus all the time, you won't get 45 messages in your 5 hour window
00:02:08and your limit will run out much faster.
00:02:10The pro plan has a lower limit overall.
00:02:12As for the max plan, while a single person might manage on it, max is usually purchased
00:02:16by organizations and distributed across team members, so it won't hold up with multiple
00:02:20people on board.
00:02:21 We do the same at AI Labs: we've purchased a Max plan and distributed it across our team.
00:02:26Even with that, we still run out of the limit frequently which led us to research ways to
00:02:30make it last longer.
00:02:31The second factor is the type of task you are performing.
00:02:34Compute intensive tasks or tasks that require multiple tools consume a lot of tokens.
00:02:38So the window will run out much faster than usual and you might not even make it to 45
00:02:43messages on the pro plan.
00:02:44 And on top of all that, Anthropic has recently been tightening session limits during peak working hours, when many people are using the service heavily at once.
00:02:52 So your Claude plan runs out even faster, before you can get any actual work done.
00:02:56This is why now is the right time to learn how to make the most out of your window and
00:03:00use Claude effectively all day.
00:03:02 But before we move forward, let's have a word from our sponsor, Twin.
00:03:05If you've tried automating with tools like Zapier or N8N, you know the deal.
00:03:09Rigid workflows, constant breakdowns and hours wasted connecting apps.
00:03:13And local agents like Claudebot are security nightmares and way too expensive.
00:03:17Twin changes that.
00:03:18It's a no-code AI agent that actually does the work for you while you sleep.
00:03:21It connects to tools via APIs when they exist and when they don't, it builds integrations
00:03:26on the fly, giving you an infinite integration library.
00:03:29And if there's no API, Twin can just browse and interact like a human.
00:03:33 On top of that, you get built-in access to tools like Perplexity, Gamma, Veo 3, and Nano Banana.
00:03:38They've just launched the Twin API.
00:03:40So you can trigger agents from anywhere and plug them into your existing workflows.
00:03:44And the best part?
00:03:45These agents learn.
00:03:46They fix themselves when something breaks, improve over time and run 24/7.
00:03:50Stop babysitting broken automations.
00:03:52Click the link in the pinned comment and check out Twin.
00:03:55 Now, you might already know that the Claude Code source code was leaked.
00:03:58And a lot of people identified that there are many issues inside it that can make limits
00:04:02run out faster than intended.
00:04:04 One of these is truncated responses staying in the context.
00:04:07 If a request fails partway, for example when a rate limit is hit, it can leave a partial response behind.
00:04:13 Claude Code then retries while keeping the previous context along with that partial, error-filled message.
00:04:18 This bloats the context with unnecessary information and wastes tokens.
00:04:22 Skill listings are also injected into the context, mainly for faster access, even though they add little value since the skill tool already handles loading skills on demand.
00:04:31Similar to that, there are some other issues as well.
00:04:33Because of all this, a lot of people are complaining about Claude limits being hit faster than expected.
00:04:38 So to counteract both the official limits and these hidden token drains, you have to take certain measures to make Claude Code last longer when you're building your products.
00:04:47We share everything we find on building products with AI on this channel.
00:04:51So if you want more videos on that, subscribe and keep an eye out for future videos.
00:04:55We'll start with the tips you might have already heard from us if you've watched our previous
00:04:59videos.
00:05:00 The first one is the /clear command.
00:05:01Use this whenever you've completed a task and don't need the previous context anymore.
00:05:05For example, when you are done implementing the app and want to move to the testing phase,
00:05:09you don't need the earlier context.
00:05:11So it's better to reset it and start the next task with a fresh context window.
00:05:15But sometimes you do want to retain some of that context.
00:05:18 In that case, you can run the /compact command instead.
00:05:21 It summarizes the whole interaction and replaces it with that summary, freeing up space in the context.
00:05:25The reason we want you to use these is because every time Claude sends a message, it includes
00:05:29the entire conversation so far, along with system prompts, your tools, and all previous
00:05:34conversation history.
00:05:35With each new message, this keeps growing, resulting in a bloated context window and higher
00:05:40token usage per message.
00:05:41Now even with compacting, if you ask side questions in the main window, you're still bloating it
00:05:46with unrelated content.
00:05:47 So you can use the "by the way" command to ask a quick side question.
00:05:50 It responds in a separate session context window.
00:05:53 That side question won't be included with the next message you send, which means fewer tokens per request.
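The commands above can be sketched as a typical session flow. This is an illustration, not a verbatim session; the task prompts are made up, and /clear and /compact are the real in-session commands discussed here:

```text
> implement the login page           # build task, context grows with each turn
> /compact                           # keep a summary, free the rest
> now write tests for the login flow # continues with the compacted summary
> /clear                             # done with this feature, full reset
> implement the dashboard page       # starts from a fresh context window
```

A reasonable rule of thumb: /clear between unrelated tasks, /compact between related phases of the same task.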
00:05:58Now even though planning might sound like a token intensive task, you need to start your
00:06:02projects with it.
00:06:03This is because if you don't spend time planning, you will have to course correct Claude later
00:06:07when its implementation is not aligned with what you need.
00:06:10Spending tokens upfront on planning saves you from wasting far more tokens on corrections
00:06:14down the line.
00:06:15 Sometimes Claude doesn't follow your instructions the way you want.
00:06:18 In those cases, we often prompt it again with the correct implementation.
00:06:22 But instead of re-prompting, you can run the /rewind command to restore the conversation and code to a point before the message where Claude went off track, then make the changes directly in the prompt.
00:06:32 You can also double-press the escape key to do the same thing.
00:06:35This removes the incorrect implementation from the context window and the wrong outputs don't
00:06:39get sent to the model.
00:06:41Now all of these commands help you save tokens during a session.
00:06:44But the bigger impact comes from how your project is structured in the first place.
00:06:47 You might have already structured your projects using frameworks like BMAD, SpecKit, or others.
00:06:53 But the majority of these frameworks are actually token intensive.
00:06:56 So if you use them in your own app, expect your token limit to be reached faster.
00:07:00 While these frameworks might be sustainable on Max plans, they definitely won't be on Pro.
00:07:04 Even if you're not using a framework, you've probably set up a structure of your own.
00:07:07 To create a Claude.md file, you most likely used the /init command, which goes through your codebase and generates one for you.
00:07:14 It does create one, but it comes with a lot of issues.
00:07:17This file is supposed to provide guidance to the AI agent, but it lists certain things that
00:07:20the AI already knows on its own.
00:07:22 For example, the commands it lists are the standard ones for running a dev server, and Claude already knows how to do that.
00:07:28 Unless you run the server with a non-standard flag, there's no need to add those in.
00:07:32 The same goes for the architecture section: Claude can read file names and deduce what each file is about, because it understands file systems and uses them to navigate.
00:07:41So there's no real need for these kinds of instructions unless there are specific cases
00:07:45where additional guidance is required.
00:07:47If you're going to write your own Claude.md, it should ideally be less than 300 lines.
00:07:52The shorter the file, the better it will perform and the more focused Claude will be on what
00:07:56actually matters.
00:07:57It should act as a guiding file, not a detailed manual explaining how to do everything.
00:08:01Whatever you include should be generically applicable across the project, not specific
00:08:05details of each part all packed into one file.
00:08:08 In the Claude.md itself, include only things Claude doesn't know by default: what it shouldn't do, your development practices, and similar instructions.
00:08:16 You need to configure this file properly because it gets loaded into the context once per session and stays there.
00:08:22 So unnecessary information in it means you're wasting tokens on every single turn.
00:08:28 For specific aspects of the project, like the database schema or other areas where different rules apply, split them into separate documents and link them from the Claude.md file.
00:08:37This allows Claude to progressively pull in only the docs it actually needs.
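A minimal sketch of the kind of lean Claude.md this suggests. Every file path, rule, and doc name below is a made-up placeholder, not a recommended template:

```markdown
# Project guidance

## Don't
- Never hand-edit files under src/generated/ (placeholder path).
- Don't add new dependencies without asking first.

## Practices
- All API handlers validate input with our shared validator (placeholder rule).

## Area-specific docs (linked, loaded only when needed)
- Database schema rules: docs/db-rules.md
- Frontend conventions: docs/frontend-rules.md
```

The point is the shape: a short list of non-obvious rules, plus links that let Claude pull in area-specific docs progressively instead of loading everything every session.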
00:08:41 We also mentioned this in our previous video: creating project rules that are scoped to specific paths helps Claude stay focused.
00:08:48This way, Claude only has relevant information in context and avoids unnecessary token usage.
00:08:53 You should also keep separate rules files for area-specific logic, so that Claude loads only what's required.
00:08:58 You should also make use of skills for repetitive workflows, and add scripts and references so Claude can perform tasks more accurately.
00:09:05 Skills help through progressive loading: only the required part is loaded, which keeps Claude focused on the relevant aspect of the task.
00:09:12 Bundled scripts help by not wasting tokens on deterministic steps that can be handled programmatically.
00:09:17The reason for separating files is simple.
00:09:19If Claude is working on one part, it doesn't need information about unrelated areas.
00:09:24But if everything is placed in the same Claude.md file, all of it will be loaded every time,
00:09:29leading to unnecessary token usage.
00:09:30 You can also use the --append-system-prompt flag to add specific instructions directly to the system prompt.
00:09:36The session starts with those instructions instead of putting everything into the Claude.md
00:09:40file.
00:09:41These instructions are temporary and will be removed once the session ends.
00:09:44Now this might sound like it's adding to the context, but it's actually more efficient than
00:09:48putting a one-time instruction in Claude.md.
00:09:51If you add it there, Claude keeps it in the context permanently, wasting tokens unnecessarily.
00:09:56With appending, you provide the instructions exactly when you need them.
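In practice this looks like the sketch below. --append-system-prompt is a real Claude Code CLI flag; the instruction text itself is just an example:

```shell
# Session-scoped instruction: lives only for this run,
# instead of sitting in Claude.md permanently.
claude --append-system-prompt "For this session, write all new tests with Vitest."
```

When the session ends, the instruction is gone, so one-off guidance never accumulates in your persistent context.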
00:09:59 Also, if you're enjoying our content, consider pressing the hype button, because it helps us create more content like this and reach more people.
00:10:06You also need to set the effort level of the model you are using.
00:10:10If you are not working on a task that requires much thinking, set it to low since the low
00:10:14setting saves tokens.
00:10:15 By default, it's set to auto, which means the model decides how much effort to use, but you can change it manually.
00:10:21If your task isn't very complex, there's no need to use a high effort setting.
00:10:25Now as we mentioned earlier, Opus is the most token consuming model.
00:10:28So if you are working on straightforward tasks, switch to Haiku.
00:10:31If your task requires a reasonable level of thinking, use Sonnet.
00:10:34It might not be as powerful as Opus, but it is still efficient and saves more tokens.
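The model-switching advice can be sketched as a session flow. /model is a real in-session command; the exact picker UI varies by version, and the task prompts here are invented:

```text
> /model haiku          # straightforward, mechanical work
> rename this config key across the project
> /model sonnet         # tasks needing a reasonable amount of thinking
> refactor the settings page into smaller components
> /model opus           # save the heaviest model for the hardest problems
> debug this intermittent race condition in the job queue
```

Matching the model to the task, rather than leaving Opus on all day, is one of the simplest ways to stretch a 5 hour window.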
00:10:39If you've configured multiple MCPs for a project and don't need a particular one, just disable
00:10:43it so it doesn't waste tokens by injecting unnecessary information into the context window.
00:10:48Another important step is creating hooks that filter out content that shouldn't belong
00:10:52in Claude's context window.
00:10:54For example, I've configured test cases for my project.
00:10:57When we run them, they report both passed and failed tests and all of that gets loaded
00:11:01into the context.
00:11:02But Claude's main concern is the failed tests since those are what need fixing.
00:11:05So you can create a hook that uses a script to prevent the passed test cases from entering
00:11:10the context window and only the failed ones get included.
00:11:13This saves a significant amount of tokens compared to injecting all test reports.
00:11:17You can configure hooks for many other tasks the same way to optimize token usage.
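The test-filtering idea can be sketched as a one-line filter. The PASS/FAIL output format is a hypothetical stand-in for your real test runner, and wiring the filter into a hook depends on your Claude Code version's hook configuration, so treat this as a sketch of the principle only:

```shell
# Sample test output stands in for a real test run (hypothetical format).
# grep keeps only the failing lines, so a hook command built around this
# pipeline would feed Claude just the failures instead of the full report.
printf 'PASS test_login\nFAIL test_checkout\nPASS test_search\nFAIL test_refund\n' \
  | grep '^FAIL'
# prints only the two FAIL lines
```

The same pattern, run the tool, filter its output, pass through only what Claude needs to act on, applies to linters, build logs, and any other verbose tool output.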
00:11:21 Now aside from all of that, there are certain configurations you can make in your .claude folder to improve performance.
00:11:27The first one is setting disable prompt caching to false.
00:11:30This makes Claude cache your most commonly used prefixes, which reduces token usage.
00:11:34 Anthropic charges far less for cached parts that are sent repeatedly, so you mainly pay for the new content.
00:11:39You can also disable auto memory to prevent it from adding content to your context and
00:11:43increasing token usage.
00:11:44Auto memory is a background process that analyzes your conversations and consolidates useful
00:11:49information into memory files for your specific project.
00:11:52Disabling it means it won't track your habits but it will save tokens by not running in the
00:11:56background.
00:11:57 There's another flag called disable background tasks, which stops background processes from continuously consuming tokens.
00:12:02 These include "dream", memory refactoring and cleaning, and background indexing.
00:12:06Turning this off helps save tokens because even if you're not actively chatting, these
00:12:10processes would still be working on your conversation.
00:12:13You should also disable thinking when it's not needed because thinking consumes a lot
00:12:16of context and wastes tokens extensively on tasks that don't even need it.
00:12:20Now this is different from the effort setting we discussed earlier.
00:12:23The effort setting controls how much reasoning Claude does within a response, so lower effort
00:12:28means less thinking, but it still thinks.
00:12:30Disabling thinking completely turns off the internal reasoning step and Claude just generates
00:12:34the response directly.
00:12:35So if your task doesn't need deep reasoning, disable thinking entirely.
00:12:39If it needs some reasoning but not a lot, lower the effort level instead.
00:12:43Finally configure max output tokens to a set number.
00:12:46There's no default, but limiting this controls how much the model generates.
00:12:50Set it lower if you want to save tokens aggressively or increase it if your task requires longer
00:12:55outputs.
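Pulled together, the settings discussed in this section can be sketched as environment variables. DISABLE_PROMPT_CACHING and CLAUDE_CODE_MAX_OUTPUT_TOKENS are documented Claude Code settings; the specific values are examples, and any background-task or auto-memory toggles described above come from the video's account of leaked internals, so verify the names against your version before relying on them:

```shell
export DISABLE_PROMPT_CACHING=false         # keep prompt caching enabled (the default)
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=4096   # cap how much each response can generate
```

Lower the output cap if you want to save tokens aggressively; raise it if your tasks genuinely need long outputs.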
00:12:56 Now the Claude.md template and other resources for this video, and for all our previous videos, are available in AI Labs Pro, where you can download and use them in your own projects.
00:13:05If you've found value in what we do and want to support the channel, this is the best way
00:13:09to do it.
00:13:10The link's in the description.
00:13:11That brings us to the end of this video.
00:13:13If you'd like to support the channel and help us keep making videos like this, you can do
00:13:17so by using the super thanks button below.
00:13:19As always, thank you for watching and I'll see you in the next one.