00:00:00If you like saving money or just hate the way LLMs talk, this one might be for you.
00:00:03It's a new trending skill called Caveman and it promises to cut up to 75% of output
00:00:07tokens while keeping full technical accuracy.
00:00:10All thanks to the wise words of Kevin.
00:00:12Why waste time?
00:00:13Say lot word when few word do trick.
00:00:16This works in Claude, Codex and wherever else, and it takes your outputs from filler-heavy,
00:00:20too-long-didn't-read responses to a nice TL;DR with the same technical accuracy, and it's even
00:00:24customizable, with extras like Wenyan mode, terse commits, one-line code reviews and an
00:00:29input compression tool.
00:00:30It may seem a little crazy at first but there's even some science behind this so let's jump
00:00:34in and take a look.
00:00:40So I was testing this out in Claude Code earlier with a demo Next.js app I have that actually
00:00:44has a fake authentication system, and I was simply asking "can you explain how auth is implemented
00:00:48in this app?"
00:00:49Now this is normal Claude Code without the skill installed, and you see right away it gets
00:00:53into filler words, saying this is a simulated authentication system.
00:00:56We have our em dash that says no backend, no passwords, no real security, exists to demonstrate
00:01:00Better Stack RUM user tracking.
00:01:03After this it then goes on to explain the core files and how it works and everything's just
00:01:06sort of in readable English.
00:01:08If we then ask the same question but this time use the caveman skill, you see it just gets
00:01:11straight to the point and is a lot more concise.
00:01:13The first sentence is demo only, client side auth, no real security, built for Better Stack
00:01:17RUM tracking demos.
00:01:18It doesn't have any of those filler words, the em dashes or anything like that.
00:01:21It doesn't need to make a proper sentence, it can just tell you the technical information
00:01:25straight away.
00:01:26The same thing goes for the how it works section, the flow and the integration points.
00:01:29You can see here instead of saying how this works sort of in a plain English sentence,
00:01:33just says app load and then has an arrow to check local storage for the saved user.
00:01:36So it's just way more concise and that's what I care about to be honest.
00:01:39I don't really care about it being in plain English, I just wanted the technical information
00:01:43from it.
00:01:44That conciseness is actually the main reason I like this skill, but its other selling point
00:01:47is that it should reduce output tokens, and therefore theoretically you can
00:01:51get more out of your Claude Code subscription or even save money on your API tokens.
00:01:55But I do think there's a small catch here.
00:01:57This is the result of a comparison test I was running earlier where I was comparing the baseline
00:02:00Claude Code response vs a terse one, which is where I literally say to Claude Code "be
00:02:04concise", vs using our caveman skill.
00:02:07This was on 10 prompts, with things as simple as "how does git rebase differ from a git merge?"
00:02:11Now you can see the results are very positive.
00:02:14When we use the caveman skill vs the baseline we actually have a 45% reduction in our output
00:02:18tokens, and a 39% one against just saying "be concise" to Claude Code.
00:02:22That's obviously going to relate to cost as well, there's going to be a 45% saving there
00:02:26in the output tokens so the baseline costs around 8 cents for them and caveman costs around
00:02:314 cents.
00:02:32So everything looks quite good initially.
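That output-side arithmetic is easy to sanity-check. Here's a rough sketch; the token counts and the per-token price are illustrative placeholders chosen to land near the video's figures, not the actual test data:

```python
# Assumed output price in the Claude Sonnet range (~$15 per million output tokens).
OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000  # dollars

# Hypothetical totals across the 10 test prompts.
baseline_output_tokens = 5_300  # ~8 cents of output
caveman_output_tokens = 2_900   # ~45% fewer tokens

baseline_cost = baseline_output_tokens * OUTPUT_PRICE_PER_TOKEN
caveman_cost = caveman_output_tokens * OUTPUT_PRICE_PER_TOKEN
saving = 1 - caveman_cost / baseline_cost

print(f"baseline: ${baseline_cost:.3f}, caveman: ${caveman_cost:.3f}, saving: {saving:.0%}")
```

With these placeholder numbers the baseline lands around 8 cents, caveman around 4 cents, matching the shape of the result in the video.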
00:02:34Where things start to get a little more interesting though is when we factor in the cost of input
00:02:37tokens.
00:02:38Obviously now that we're using the caveman skill we're loading in a markdown file which
00:02:41has a lot more text in it than our single sentence prompts so for the baseline where we're just
00:02:45sending that sentence it's fractions of a cent but when we use our skill you can see it's
00:02:49now around 4 cents.
00:02:50If we then combine our input and output token costs you can see that on average caveman
00:02:54is actually 10% more expensive than the baseline because the savings that we made on those output
00:02:58tokens have been lost to our input tokens.
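Sketching the combined cost the same way (again with made-up token counts, plus an assumed input price of $3 per million tokens) shows how a large skill file loaded on every prompt can more than cancel the output saving:

```python
# Assumed Claude-range prices; all token counts below are illustrative.
INPUT_PRICE = 3 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 15 / 1_000_000

# Totals across 10 single-sentence prompts.
baseline_in, baseline_out = 300, 5_300    # just the short prompts
caveman_in, caveman_out = 15_000, 2_900   # prompts + skill markdown loaded every time

baseline_total = baseline_in * INPUT_PRICE + baseline_out * OUTPUT_PRICE
caveman_total = caveman_in * INPUT_PRICE + caveman_out * OUTPUT_PRICE

print(f"baseline ${baseline_total:.4f} vs caveman ${caveman_total:.4f}")
print(f"caveman is {caveman_total / baseline_total - 1:.0%} more expensive")
```

The output saving is real, but in this one-shot scenario it gets eaten by the extra input tokens.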
00:03:01But this doesn't mean it's a loss for caveman and that's because this is only true in very
00:03:04specific scenarios.
00:03:05It's only true if we're sending a single small prompt and we're not asking any follow up questions.
00:03:10If you start to ask follow up questions you can hit the prompt cache pricing, and when we
00:03:14do that you can see things swing back in favour of caveman and we're actually making a 39%
00:03:19cost saving.
00:03:20We've gone down a bit of a rabbit hole there, but it does prove there is some logic to using
00:03:23caveman and that's before we've even factored in another possible advantage which is that
00:03:27a study this year showed that constraining large models to brief responses improved accuracy
00:03:31by 26% on certain benchmarks.
00:03:34So maybe Kevin was the smart one after all and you'd be smart for subscribing.
00:03:38You can try out this skill for yourself by using the Vercel skills package and running a
00:03:41command like this, and in here we can also see what it's asking the agent to do.
00:03:45We have some rules like drop articles like a, an and the, drop any filler words, drop pleasantries,
00:03:49drop hedging.
00:03:50Then we also have use short synonyms so use big instead of extensive and say fix instead
00:03:54of implement a solution for and we also have what we want to keep which is technical terms,
00:03:58code blocks and errors.
00:04:00After this we then have the pattern of how it should be structured so we should have
00:04:03a thing, an action, a reason and then a next step.
00:04:05So nice and concise.
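As a toy illustration of those rules, here's a hypothetical cavemanify filter. The real skill is a prompt the model follows, not code, and these word lists are my own guesses at what it targets:

```python
import re

# Hypothetical word lists inspired by the skill's rules; not its actual data.
ARTICLES = {"a", "an", "the"}
FILLER = {"basically", "actually", "just", "really", "very", "certainly"}
SHORT_SYNONYMS = {"extensive": "big", "utilize": "use", "implement": "fix"}

def cavemanify(text: str) -> str:
    """Drop articles and filler words, swap in short synonyms."""
    words = []
    for word in re.findall(r"\S+", text):
        bare = word.lower().strip(".,!?")
        if bare in ARTICLES or bare in FILLER:
            continue  # drop articles and filler words
        words.append(SHORT_SYNONYMS.get(bare, word))  # prefer short synonyms
    return " ".join(words)

print(cavemanify("Basically we just need to utilize the cache here"))
```

It's crude (it loses punctuation on swapped words, and it can't keep technical terms or code blocks intact the way the model-driven skill does), but it shows the shape of the transformation: thing, action, no decoration.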
00:04:07There's even intensity modes in here to change just how caveman it gets.
00:04:10You can see it ranges all the way from light up to ultra.
00:04:12I was using full since that is the default but you can see in ultra it abbreviates everything,
00:04:17it strips conjunctions, it uses arrows for causality and it uses one word when one word
00:04:21enough.
00:04:22There's also a Wenyan mode which uses classical Chinese characters because they're actually
00:04:26the most token efficient.
00:04:27Unfortunately I can't read them so it's not much use to me.
00:04:30That's not even all that caveman has to offer and there's actually a few more skills for
00:04:33specific scenarios.
00:04:34We have caveman commit to write terse and exact messages in a conventional commits format.
00:04:38We have caveman review to write code review comments that are one concise line per finding
00:04:42and we also have a compress skill to take your natural language files and cavemanify them
00:04:46so you can reuse them with slightly less input tokens.
00:04:49Let me know in the comments if you like the sound of any of these and while you're down
00:04:52there subscribe and as always see you in the next one.