This Claude Skill Cuts Your Token Costs In HALF

Better Stack
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00If you like saving money or just hate the way LLMs talk, this one might be for you.
00:00:03It's a new trending skill called Caveman and it promises to cut up to 75% of output
00:00:07tokens while keeping full technical accuracy.
00:00:10All thanks to the wise words of Kevin.
00:00:12Why waste time?
00:00:13Say lot word when few word do trick.
00:00:16This works on Claude, Codex and elsewhere, and it takes your outputs from filler-heavy,
00:00:20too-long-didn't-read responses to a nice TL;DR with the same technical accuracy, and it's even
00:00:24customizable and has extras like wenyan mode, terse commits, one line code reviews and an
00:00:29input compression tool.
00:00:30It may seem a little crazy at first but there's even some science behind this so let's jump
00:00:34in and take a look.
00:00:40So I was testing this out in Claude code earlier with a demo Next.js app I have that actually
00:00:44has a fake authentication system and I was simply asking can you explain how auth is implemented
00:00:48in this app.
00:00:49Now this is normal Claude code without the skill installed, you see right away it gets
00:00:53into filler words saying this is a simulated authentication system.
00:00:56We have our em dash that says no backend, no passwords, no real security, exists to demonstrate
00:01:00Better Stack RUM user tracking.
00:01:03After this it then goes on to explain the core files and how it works and everything's just
00:01:06sort of in readable English.
00:01:08If we then ask the same question but this time use the caveman skill, you see it just gets
00:01:11straight to the point and is a lot more concise.
00:01:13The first sentence is demo only, client side auth, no real security, built for Better Stack
00:01:17RUM tracking demos.
00:01:18It doesn't have any of those filler words, the em dashes or anything like that.
00:01:21It doesn't need to make a proper sentence, it can just tell you the technical information
00:01:25straight away.
00:01:26The same thing goes for the how it works section, the flow and the integration points.
00:01:29You can see here instead of saying how this works sort of in a plain English sentence,
00:01:33just says app load and then has an arrow to check local storage for the saved user.
00:01:36So it's just way more concise and that's what I care about to be honest.
00:01:39I don't really care about it being in plain English, I just wanted the technical information
00:01:43from it.
00:01:44That conciseness is actually the main reason I like this skill but its other selling point
00:01:47is that this means it should reduce output tokens and therefore theoretically you can
00:01:51get more out of your Claude code subscription or even save money on your API tokens.
00:01:55But I do think there's a small catch here.
00:01:57This is the result of a comparison test I was running earlier where I was comparing the baseline
00:02:00Claude code response vs a terse one, which is where I literally say to Claude code "be
00:02:04concise", vs using our caveman skill.
00:02:07This was on 10 prompts and it's things as simple as how does git rebase differ from a git merge.
00:02:11Now you can see the results are very positive.
00:02:14When we use the caveman skill vs the baseline we actually have a 45% reduction in our output
00:02:18tokens and a 39% one against just saying "be concise" to Claude code.
00:02:22That's obviously going to relate to cost as well, there's going to be a 45% saving there
00:02:26in the output tokens, so the baseline costs around 8 cents and caveman costs around
00:02:314 cents.
00:02:32So everything looks quite good initially.
00:02:34Where things start to get a little more interesting though is when we factor in the cost of input
00:02:37tokens.
00:02:38Obviously now that we're using the caveman skill we're loading in a markdown file which
00:02:41has a lot more text in it than our single sentence prompts so for the baseline where we're just
00:02:45sending that sentence it's fractions of a cent but when we use our skill you can see it's
00:02:49now around 4 cents.
00:02:50If we then combine our input and output token costs you can see that on average caveman
00:02:54is actually 10% more expensive than the baseline because the savings that we made on those output
00:02:58tokens have been lost to our input tokens.
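The single-prompt arithmetic described here can be sketched roughly as follows. The per-token prices (Sonnet-class rates) and token counts are illustrative assumptions chosen to mirror the video's figures, not measured values:

```python
# Illustrative sketch of the single-prompt cost comparison.
# Prices are assumed Sonnet-class rates; token counts are rough guesses.
IN_PRICE = 3.00 / 1_000_000    # $ per input token (assumption)
OUT_PRICE = 15.00 / 1_000_000  # $ per output token (assumption)

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * IN_PRICE + output_tokens * OUT_PRICE

# Baseline: a one-sentence prompt, verbose answer.
baseline = cost(input_tokens=50, output_tokens=5_300)

# Caveman: same prompt plus a large skill file, 45% fewer output tokens.
caveman = cost(input_tokens=50 + 14_000, output_tokens=int(5_300 * 0.55))

print(f"baseline ${baseline:.3f}, caveman ${caveman:.3f}")
# The output-token saving is real, but the skill file's input cost
# can push the single-prompt total above the baseline.
```

With numbers in this range, the output saving alone halves the output cost, yet the total for one isolated prompt still comes out higher than the baseline, which is the "small catch" the video is pointing at.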
00:03:01But this doesn't mean it's a loss for caveman and that's because this is only true in very
00:03:04specific scenarios.
00:03:05It's only true if we're sending a single small prompt and we're not asking any follow up questions.
00:03:10If you start to ask follow up questions you can hit the prompt cache pricing and when we
00:03:14do that you can see things swing back in favour of caveman and we're actually making a 39%
00:03:19cost saving.
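The caching effect can be sketched in the same illustrative terms, assuming Anthropic-style prompt caching where cached input re-reads cost roughly 10% of the base input rate (the cache-write premium is ignored here for simplicity; all token counts are assumptions):

```python
# Illustrative sketch of why follow-up turns flip the economics.
# Assumes cached input re-reads at ~10% of the base input rate.
IN_PRICE = 3.00 / 1_000_000        # $ per input token (assumption)
CACHED_IN_PRICE = 0.30 / 1_000_000 # $ per cached input token (assumption)
OUT_PRICE = 15.00 / 1_000_000      # $ per output token (assumption)
SKILL_TOKENS = 14_000              # assumed size of the loaded skill file

def conversation_cost(turns: int, skill_tokens: int, out_per_turn: int) -> float:
    # First turn pays full input price for the skill file; later turns
    # re-read it from the cache at the discounted rate.
    total = skill_tokens * IN_PRICE + out_per_turn * OUT_PRICE
    total += (turns - 1) * (skill_tokens * CACHED_IN_PRICE + out_per_turn * OUT_PRICE)
    return total

baseline = conversation_cost(5, skill_tokens=0, out_per_turn=5_300)
caveman = conversation_cost(5, skill_tokens=SKILL_TOKENS, out_per_turn=int(5_300 * 0.55))
print(f"5 turns: baseline ${baseline:.2f}, caveman ${caveman:.2f}")
```

Because the skill file is only paid for in full once and every turn keeps the 45% output saving, the total swings back in caveman's favour as the conversation grows.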
00:03:20We've gone down a bit of a rabbit hole there but it does prove there is some logic to using
00:03:23caveman and that's before we've even factored in another possible advantage which is that
00:03:27a study this year showed that constraining large models to brief responses improved accuracy
00:03:31by 26% on certain benchmarks.
00:03:34So maybe Kevin was the smart one after all and you'd be smart for subscribing.
00:03:38You can try out this skill for yourself by using the Vercel skill package and running a
00:03:41command like this and in here we can also see what it's asking the agent to do.
00:03:45We have some rules like drop articles like a, an and the, drop any filler words, drop pleasantries,
00:03:49drop hedging.
00:03:50Then we also have use short synonyms so use big instead of extensive and say fix instead
00:03:54of implement a solution for and we also have what we want to keep which is technical terms,
00:03:58code blocks and errors.
00:04:00After this we then have the pattern of how it should be structured so we should have
00:04:03a thing, an action, a reason and then a next step.
00:04:05So nice and concise.
00:04:07There's even intensity modes in here to change just how caveman it gets.
00:04:10You can see it ranges all the way from light up to ultra.
00:04:12I was using full since that is the default but you can see in ultra it abbreviates everything,
00:04:17it strips conjunctions, it uses arrows for causality and it uses one word when one word
00:04:21enough.
00:04:22There's also a wenyan mode which is using classical Chinese characters because they're actually
00:04:26the most token efficient.
00:04:27Unfortunately I can't read them so it's not much use to me.
00:04:30That's not even all that caveman has to offer and there's actually a few more skills for
00:04:33specific scenarios.
00:04:34We have caveman commit to write terse and exact messages in a conventional commits format.
00:04:38We have caveman review to write code review comments that are one concise line per finding
00:04:42and we also have a compress skill to take your natural language files and cavemanify them
00:04:46so you can reuse them with slightly less input tokens.
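A naive sketch of what such a compress pass might do is just stripping articles and common filler words before a file is reused as a prompt. The word list and function name below are illustrative, not the skill's actual rules:

```python
import re

# Naive "cavemanify" pass: drop articles and common filler words.
# The FILLER set is an illustrative assumption, not the skill's real list.
FILLER = {"a", "an", "the", "basically", "simply", "just",
          "really", "very", "quite", "actually", "please"}

def cavemanify(text: str) -> str:
    words = re.findall(r"\S+", text)
    # Compare each word case-insensitively, ignoring trailing punctuation.
    kept = [w for w in words if w.lower().strip(".,!?") not in FILLER]
    return " ".join(kept)

print(cavemanify("Basically, the app just checks the local storage for a saved user."))
# → app checks local storage for saved user.
```

The real skill presumably does something far more careful with code blocks and technical terms, but even this crude filter shows how input token counts shrink when the prose scaffolding goes.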
00:04:49Let me know in the comments if you like the sound of any of these and while you're down
00:04:52there subscribe and as always see you in the next one.

Key Takeaway

The Caveman skill for Claude cut output token costs by 45% in testing by stripping filler words and pleasantries to focus strictly on technical data; separate research suggests constraining models to brief responses can also improve accuracy.

Highlights

The Caveman skill reduces output tokens by 45% compared to baseline Claude responses while maintaining full technical accuracy.

Restricting large language models to brief responses improved accuracy by 26% on certain benchmarks, according to recent research.

The skill utilizes intensity modes ranging from light to ultra, with ultra mode stripping all conjunctions and using arrows for causality.

Wenyan mode employs classical Chinese characters to maximize token efficiency, as they are among the most token-dense characters available.

Output token savings translate to a roughly 45% cost reduction, dropping baseline output costs from around 8 cents to around 4 cents in specific tests.

Caveman review generates one-line code review comments per finding to minimize response length.

Timeline

Introduction to Caveman Skill

  • The Caveman skill promises to reduce output tokens by up to 75% while preserving technical accuracy.
  • This tool functions across multiple platforms including Claude and Codex.
  • Available features include Wenyan mode, terse commits, and one-line code reviews.

Output is transformed from long, readable English into concise, data-driven summaries. This approach removes the filler words that typically inflate token counts in standard LLM responses. Users can access specialized sub-skills like input compression to further optimize their workflows.

Comparison of Standard and Caveman Responses

  • Standard Claude responses include unnecessary descriptive phrases such as 'this is a simulated authentication system'.
  • Caveman mode replaces complete sentences with direct technical facts like 'demo only, client side auth'.
  • Logical flow is represented through arrows rather than explanatory conjunctions.

Testing on a Next.js app shows that standard responses rely heavily on em dashes and pleasantries. Caveman mode focuses purely on the technical implementation, such as local storage checks and app loading states. The primary goal is extracting technical information without the overhead of plain English grammar.

Token Cost Analysis and Savings

  • The Caveman skill achieves a 45% reduction in output tokens compared to baseline Claude code.
  • Using the skill is 39% more efficient than simply telling Claude to 'be concise'.
  • Direct cost savings reduce an 8-cent response to approximately 4 cents.

A comparison test involving 10 prompts, such as Git rebase versus merge, demonstrates the efficiency of the skill. While manual prompts for conciseness help, the structured Caveman skill consistently outperforms them. These savings scale significantly when dealing with high-volume API usage.

Input Token Trade-offs and Caching

  • Loading the Caveman markdown file as a prompt adds an upfront cost of approximately 4 cents.
  • Single-prompt scenarios may result in Caveman being 10% more expensive due to input overhead.
  • Prompt caching makes Caveman 39% cheaper when asking follow-up questions.

The initial cost of the skill's instructions can outweigh output savings on isolated, short queries. However, the economic advantage shifts back to Caveman when users engage in longer conversations where the instructions are cached. Beyond cost, a recent study suggests that forcing brevity actually increases model accuracy by 26% on certain benchmarks.

Customization and Advanced Modes

  • Instructional rules include dropping articles like 'a', 'an', and 'the' along with all pleasantries.
  • Ultra mode strips all conjunctions and uses one-word abbreviations wherever possible.
  • Specific sub-skills include 'caveman commit' for terse Git messages and 'caveman review' for code audits.

The skill can be installed via the vacel skill package and offers granular control over intensity. It prioritizes keeping code blocks, errors, and technical terms while replacing words like 'extensive' with 'big'. Additional tools like the compress skill allow users to minify natural language files for reuse with lower input token requirements.
