Log in to leave a comment
No posts yet
Claude 3.5 Sonnet is programmed to be helpful and polite. If you simply ask it to be brief, it will still waste tokens on pleasantries out of habit. Models focus most on the beginning and the end of a prompt. Leverage this characteristic by assigning a "Caveman Engineer" persona at the very top of the system message, and explicitly forbidding greetings and summaries at the very bottom. Simply reinforcing instructions at the end can immediately save 30% in token costs per API call.
Reducing output doesn't mean you have to lower the model's intelligence. When writing code with complex logic, utilize the <thinking> tag. Force the detailed reasoning process to occur within internal tags, and apply the "Caveman" style only to the final <answer> tag that contains the actual result. As of 2026, Claude 4.6 Sonnet shows high pass rates at only 30% of the cost of Opus models. By handling the thought process with cheaper cached tokens and focusing expensive output tokens only on core code, you capture both accuracy and economy.
When told to speak like a caveman, the model occasionally breaks JSON syntax or omits necessary import statements. For solo developers, these parsing errors incur the cost of manual correction. Force the use of delimiters like ---BEGIN JSON--- in the system prompt, and implement a post-processing script using Python's re module to strip away Markdown code fences. This single guardrail blocks over 90% of manual interventions in automated pipelines.
As of 2026, the price for Claude 3.5 Sonnet output tokens is $15.00 per million tokens—five times more expensive than input. A developer making 100 coding requests daily can lower their monthly cost from approximately $54 to $31 by applying Caveman mode. Adjust the intensity based on the task: use "Lite" for simple edits and "Ultra" templates for bulk data transformation. Investing just 15 minutes in refining your prompts can save $276 annually. An efficient engineer doesn't have long conversations with AI; they extract the exact information density they need.