Hidden Features To 10x Your Hermes Agent Setup



Transcript

00:00:00ever since we started using hermes we've set up a lot of our workflows on it as we showed you in

00:00:04the previous videos it's been monitoring our apps coordinating the team on slack and more but the

00:00:09more we used it the more we ran into the same problems and it started to feel like our setup

00:00:13wasn't enough like we always do we started looking for ways to solve the issues but that's when we

00:00:18realized we didn't need to add anything else because everything we needed was already in hermes

00:00:23itself we just weren't using it to its full potential now if you're new to the channel then

00:00:27welcome we're a software company and this is ai labs where we show you how to optimize a business

00:00:32with ai using proven methods from our own team and in this video we're going through all the settings

00:00:37we changed to improve our workflow so the first category is all about context and output limits

00:00:43hermes uses the dot hermes folder which holds all the configs and info that run the agent and all of

00:00:49that lives in one single file called config.yaml it's a really long file and it contains every config

00:00:55tied to the agent setup so if you're managing multiple profiles like we are each one gets its

00:01:00own separate folder and every profile has its own config.yaml file so the first one we'll change is

00:01:06max bytes by default this is set to 50 000 which means it pulls 50 000 characters from any tool output

00:01:13into the context window at once and the rest get cut off that became a problem when we were using it

00:01:17to monitor test runs because it wouldn't properly see the issues when they were long so we needed more of

00:01:23that output in the context window for that you can either set max bytes directly in the config.yaml file

00:01:29or change it to the number you need using the hermes config command once that's done it pulls that many

00:01:34characters into the context window from all tool outputs but you'll need to make sure the right

00:01:39profile is selected because the changes you make with the hermes config command show up in your active

00:01:44profile's config file another problem shows up when the agent reads a file with a lot of lines this happened to

00:01:50us when we connected hermes to our company's knowledge base where we have these large policy documents that

00:01:55are easily more than 2000 lines so when it pulled them in by breaking them into chunks it kept missing

00:02:01important details so we set it to 5000 and let the agent read more of the file at once there's another

00:02:07limit that becomes a problem when you have a lot of large markdown files if your document has long

00:02:12paragraphs stored as a single long line and that line is more than 2000 characters it won't be fully read so if you

00:02:18want to increase that you can change it with the hermes config command and set the character count you

00:02:23need that way the agent can read more than 2000 characters in a single line the first three settings

00:02:28mostly matter if you work with large files but this next one's important for everyone and that's the

00:02:33compression threshold by default the compression threshold is set to 50% which means once 50% of

00:02:39the context window is filled it compresses everything in there but a lot of other agents like codex and

00:02:45claude code have this set to around 75% we ran into this ourselves while running hermes since we'd set

00:02:51it up with a smaller model on 200 000 context it compressed too early which isn't ideal when you

00:02:56actually want to get things done now models like opus or the gemini ones with a million token window

00:03:02would be fine here because compression only happens at 500 000 tokens for them but for models with 200 000

00:03:08context it happens at 100 000 tokens which causes issues on a long run so we set the compression threshold

00:03:15to 0.75 that way we can at least use 75% of the context window before it hits compression another

00:03:22setting is called target ratio which is set to 20% by default when hermes hits compression it doesn't

00:03:28compress the entire chat instead it leaves 20% of the conversation uncompressed and starts the new

00:03:34conversation with that uncompressed part along with the summary so that uncompressed 20% becomes your tail

00:03:40once the new compressed conversation starts now how much is left uncompressed depends on how big

00:03:45your context window is for a 1 million token context window 100 000 tokens get added and for a 200 000

00:03:52token context window only 20 000 tokens get added and this tail gives the agent more context on the

00:03:58previous conversation so it can pick up easily so 20% works for us on a 200 000 context window but if

00:04:04you're on a larger model you can use the config command to set it higher the ideal range is between

00:04:09 10 and 80 percent the higher the number the more tokens stay in your context window but you'll also

00:04:15have less free room to work with as we talked about in the previous video the memory.md and user.md

00:04:21files that hermes keeps have a hard limit on how many characters they can hold after that hermes

00:04:26starts dropping information the agent thinks it no longer needs you can change these limits too either

00:04:32directly in your config.yaml file or through the hermes desktop app from the settings pane from there

00:04:37you can also change most of the settings we just talked about and if you're enjoying the video so far

00:04:42subscribe to the channel and hit the hype button this small gesture of support goes a long way for us

00:04:47the second category is sub-agents on hermes you're limited to spawning three sub-agents at once and when we

00:04:53were working on our projects we hit this limit and things ended up taking longer than they needed to

00:04:58in the config this limit comes from the max concurrent children value which is set to three by default

00:05:03since we were running into issues we used the config command and changed this value to five from that

00:05:08point on whenever it spins up sub-agents it can run up to five of them together but this is token

00:05:13heavy so if you're working with a lot of sub-agents cost is something you need to watch out for now in

00:05:18claude code each sub-agent can create its own sub-agents and that's helpful when you're working

00:05:23with a large folder where one agent can branch out into more agents to explore nested repos but hermes

00:05:28blocks this with the max spawn depth flag which is set to one by default and that stops any sub-agent

00:05:34from creating more so you can push the max spawn depth above one after that your sub-agents can create

00:05:39their own sub-agents too there's another sub-agent feature called auto-approve which is set to false by

00:05:44default this means the sub-agents you spawn only inherit the parent's permissions and they might

00:05:50still get blocked by permission prompts so if you want to change this you can set it to true directly

00:05:55here once you've done that your sub-agents can run in auto-approve mode and won't get blocked by any

00:06:00permission prompts sub-agents handle simple tasks like web searches that don't need the heavy lifting of

00:06:06your main model but running them on that powerful model burns a lot of cost for work like this so you

00:06:11can change the model used for any sub-agent and switch it to a smaller one which saves you tokens

00:06:16and if that smaller model is from a different provider you can add it using the hermes auth command

00:06:21which lets you pull in models from whichever provider you want but before we move towards

00:06:25the settings that save us costs let's have a word from our sponsor helix every week there's a new ai

00:06:31tool that helps you build apps websites and products faster than ever but nobody talks about what happens

00:06:37before you start building most people jump straight into coding with a half-baked idea and end up

00:06:42rebuilding the same thing three times helix is an ai guided product planning platform that takes a rough

00:06:48idea and turns it into a structured plan you can actually hand off to a developer or a stakeholder

00:06:53you describe your idea in one sentence and five ai specialist agents go to work covering validation

00:06:59market research product development business modeling and growth strategy it pulls live market data in real

00:07:05time connects to over 20 tools you already use like notion jira and airtable and the canvas adapts to

00:07:11your actual product needs instead of forcing you into a generic template when you're done you export an

00:07:16investor ready pdf blueprint that's actually built on real research not guesswork click the link in

00:07:21the description and try helix for free the third category is cost these are basically the settings that save you

00:07:27tokens when you first set up hermes you give it the models for different purposes but you can set up

00:07:32auxiliary models as well auxiliary models are basically the cheaper faster ones that hermes uses for

00:07:38background subtasks that way the expensive main model you've set up isn't wasted on small tasks that

00:07:43aren't that complicated by default when you leave the auxiliary models empty hermes falls back to the

00:07:48lowest cost model in your config since we were using open router it was set to gemini flash so these

00:07:54cheaper models could handle tasks behind the scenes so if you want to save costs you can set up cheaper

00:07:59models manually they can save you a lot of money on tasks like web searches or compression if your main

00:08:05model is something like opus you probably don't want to waste it on trivial tasks on saving costs another

00:08:11thing you can configure is the effort level of the model you're using effort is basically how much

00:08:15thinking the model puts into a task if the effort is higher the output will be better but the

00:08:20tokens consumed will also be higher you can set it to low or minimum so the agent doesn't waste tokens

00:08:26you can also turn off thinking completely if you don't want to use effort levels the fourth category is

00:08:32workflow and it covers a bunch of other features that make hermes so much better to use the first one is

00:08:37quick commands if you've been using claude code you might know slash commands where you add custom reusable

00:08:43instructions they do a similar job but hermes handles them differently because it doesn't use prompt

00:08:48instructions the way claude code and other agents do quick commands come in two types the first one is

00:08:54exec which runs a terminal command and drops its output into the context window this is helpful for

00:08:59creating scripts that run a whole series of commands from just a single one for example for git operations

00:09:04you can set up a custom exec command and run it whenever you want the agent to use those commands the

00:09:09other type is alias this is less of a custom command and more of a way to rename existing ones for

00:09:15example if you want a quicker way to run compress you can set an alias to just a single letter and run it

00:09:20fast there's no direct way to set this up so you actually have to do it in config.yaml or you can just

00:09:25ask claude code or hermes to do it for you and it'll make the changes itself aside from that hermes has

00:09:31a checkpointing mechanism too a checkpoint is basically a saved state of your files at a certain point in

00:09:36time you can roll back to it if an experiment breaks something it's turned off by default so you'll have to

00:09:41set it to true once checkpointing is on you can use the rollback command to go back to a previous

00:09:46checkpoint another thing you can change is background process notifications if you set this to all you'll

00:09:52get a notification for everything hermes is doing in the background you can change it if you don't want

00:09:56those there's also a flag called hermes ephemeral system prompt which lets you add content into the

00:10:01system prompt of the agent this is an environment variable and the instruction you add as its value

00:10:06becomes part of the system prompt so you can add whatever instructions you want this way but this

00:10:11prompt only applies to the session you open in that terminal and it doesn't stick around long term

00:10:16so it's mainly useful for one-time use cases you can also run hermes in yolo mode which is the same

00:10:22as the dangerously skip permissions mode in claude this stops the agent from sitting there waiting for

00:10:27you to approve every action you can turn it on with the yolo command or by launching hermes with the

00:10:32yolo flag in the terminal at one point we ran into an error and weren't sure if it was coming from hermes

00:10:37itself or from some config we'd set up that's when we came across the ignore user config mode it strips

00:10:43the agent of all the configs in your dot hermes folder and runs it in isolation so you can figure

00:10:48out what's actually causing the error and fix it you can also switch between the multiple personalities

00:10:53that come with it and have fun with the different voice styles already in the configs using the personality

00:10:59command since a lot of people have been asking about it we've put together a starter pack with all the

00:11:03guides and resources you'll need it's available inside our community ai labs pro so if you'd like

00:11:09to support the channel and get access to this resource pack be sure to check it out the link is in the

00:11:14description that brings us to the end of this video if you'd like to support the channel and help us keep

00:11:19making videos like this you can do so by using the super thanks button below as always thank you for

00:11:24watching and i'll see you in the next one

Key Takeaway

Optimizing Hermes configuration files—specifically adjusting context limits, compression thresholds, and sub-agent concurrency—significantly improves agent performance and efficiency for complex workflows.

Highlights

Increase 'max bytes' in config.yaml to pull more than the default 50,000 characters from tool outputs into the context window.
Setting the compression threshold to 0.75 instead of the default 0.50 allows the agent to utilize 75% of the context window before triggering compression.
Raise the 'max concurrent children' limit from 3 to 5 to run more sub-agents simultaneously, though this increases token consumption.
Enable 'auto-approve' for sub-agents so they no longer stall on permission prompts; with the default (false) they only inherit the parent's permissions and can still be blocked.
Configure auxiliary models in the config to offload trivial background tasks to cheaper, faster models like Gemini Flash.
Use 'checkpointing' and the 'rollback' command to revert file states to previous saved versions if an experiment fails.
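
As a rough illustration of the first highlight, the output and file-read limits can be pictured as simple clipping. This is a minimal sketch, not Hermes code: the function names and parameters (`truncate_tool_output`, `clip_file`, `max_chars`, `max_lines`, `max_line_chars`) are hypothetical, and only the numbers 50,000, 2,000 and 5,000 come from the video.

```python
def truncate_tool_output(text: str, max_chars: int = 50_000) -> str:
    """Keep at most max_chars characters of a tool's output; the rest is dropped."""
    return text if len(text) <= max_chars else text[:max_chars]


def clip_file(text: str, max_lines: int = 2_000, max_line_chars: int = 2_000) -> str:
    """Keep the first max_lines lines and clip each kept line to max_line_chars."""
    lines = text.splitlines()
    kept = [line[:max_line_chars] for line in lines[:max_lines]]
    return "\n".join(kept)


# An 80,000-character test log loses its last 30,000 characters at the default.
log = "x" * 80_000
print(len(truncate_tool_output(log)))           # default limit
print(len(truncate_tool_output(log, 100_000)))  # raised limit keeps everything

# A 3,000-line policy document fits in one read only after raising the line limit.
policy = "\n".join(f"clause {i}" for i in range(3_000))
print(len(clip_file(policy).splitlines()))
print(len(clip_file(policy, max_lines=5_000).splitlines()))
```

With the limits raised, long test output and long documents reach the agent in one piece, at the price of more context used per read.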

Timeline

Context and Output Management

Default context limits often truncate important tool output and file data.
Raising the compression threshold from 0.50 to 0.75 delays compression from 100,000 to 150,000 tokens on a model with a 200,000-token window.
Target ratio settings determine how much conversation context remains uncompressed after reaching the threshold.

Managing the config.yaml file is critical for handling large files and long-running tasks. Raising 'max bytes' above the default 50,000 characters lets more of each tool output reach the context window before it is cut off. Raising the file-read limits (for example to 5,000 lines per read, and beyond 2,000 characters per line) helps the agent take in large documents without missing details. Setting the compression threshold to 0.75 keeps smaller-context models from compressing too early, and the target ratio balances the retained context tail against free working space.
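
The arithmetic behind these two settings can be checked with a short sketch. The function names are hypothetical, and the tail formula is an assumption: it treats the target ratio as a fraction of the token count at which compression triggers, which is the reading that reproduces the video's figures (100,000 tokens for a 1,000,000-token window and 20,000 for a 200,000-token window at the default 0.50 threshold).

```python
def compression_trigger(context_window: int, threshold: float) -> int:
    """Token count at which compression starts."""
    return round(context_window * threshold)


def tail_tokens(context_window: int, threshold: float, target_ratio: float) -> int:
    """Tokens kept uncompressed, ASSUMING the ratio applies to the trigger point."""
    return round(compression_trigger(context_window, threshold) * target_ratio)


for window in (200_000, 1_000_000):
    for threshold in (0.50, 0.75):
        trigger = compression_trigger(window, threshold)
        tail = tail_tokens(window, threshold, 0.20)
        print(f"window={window:>9,} threshold={threshold:.2f} "
              f"trigger={trigger:>7,} tail={tail:>7,}")
```

Under that assumption, moving a 200,000-token model from 0.50 to 0.75 gives it 50,000 more tokens of working room before compression and a somewhat larger retained tail.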

Sub-Agent Configuration

The default limit of three concurrent sub-agents can create bottlenecks.
Max spawn depth determines whether sub-agents can create their own recursive sub-agents.
Auto-approve mode removes recurring permission prompts for sub-tasks.

Sub-agents allow for parallelized workflows, but default constraints limit their utility. Raising 'max concurrent children' enables up to five parallel tasks. Changing 'max spawn depth' from the default of 1 allows for deeper, nested task execution. Enabling 'auto-approve' prevents the agent from stalling on permission prompts for routine operations.
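
What a concurrency cap and a spawn-depth limit mean in practice can be sketched with a semaphore. This is an illustration only, not how Hermes implements sub-agents: `run_children`, `can_spawn`, and the depth convention (main agent at depth 0, its children at depth 1) are assumptions made for the example.

```python
import asyncio


async def run_children(tasks, max_concurrent_children=3):
    """Run all tasks, but never more than max_concurrent_children at once."""
    gate = asyncio.Semaphore(max_concurrent_children)
    running = 0
    peak = 0

    async def child(name):
        nonlocal running, peak
        async with gate:  # waits here while the cap is reached
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real sub-agent work
            running -= 1
            return f"{name}: done"

    results = await asyncio.gather(*(child(t) for t in tasks))
    return results, peak


def can_spawn(depth: int, max_spawn_depth: int = 1) -> bool:
    """An agent at `depth` may spawn children only while below the limit."""
    return depth < max_spawn_depth


if __name__ == "__main__":
    jobs = [f"task-{i}" for i in range(8)]
    for cap in (3, 5):
        _, peak = asyncio.run(run_children(jobs, cap))
        print(f"cap={cap} peak concurrent children={peak}")
    print("child may spawn at depth limit 1:", can_spawn(1, 1))
    print("child may spawn at depth limit 2:", can_spawn(1, 2))
```

Raising the cap shortens wall-clock time for batches of independent tasks, but every extra concurrent child is another model call consuming tokens at the same time.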

Cost Optimization and Workflow Features

Auxiliary models handle trivial background tasks to save costs on primary expensive models.
Effort levels and thinking toggles directly impact token consumption.
Quick commands, checkpointing, and ephemeral system prompts customize behavior for specific sessions.

Cost efficiency is managed by delegating background tasks to cheaper models, avoiding the use of expensive primary models like Opus for trivial queries. Model 'effort' levels provide control over reasoning depth versus token usage. Workflow tools like custom quick commands, persistent file checkpointing, and Yolo mode for non-interactive execution further tailor the agent to specific operational requirements.
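
The difference between the two quick-command types can be sketched as a small dispatcher. Everything here is hypothetical: the `QUICK_COMMANDS` table, its field names, and `resolve` are made up for illustration and do not reflect the actual config.yaml schema, which the video does not show in detail.

```python
import subprocess
import sys

# Hypothetical table: one "exec" entry and one "alias" entry.
QUICK_COMMANDS = {
    "hello": {
        "type": "exec",
        # Uses the current Python interpreter so the example runs anywhere.
        "command": [sys.executable, "-c", "print('hello from exec')"],
    },
    "c": {"type": "alias", "target": "compress"},
}


def resolve(name: str):
    """Return what a quick command contributes to the session."""
    entry = QUICK_COMMANDS[name]
    if entry["type"] == "exec":
        # exec: run a terminal command and hand its output to the context.
        done = subprocess.run(entry["command"], capture_output=True,
                              text=True, check=True)
        return ("context", done.stdout)
    if entry["type"] == "alias":
        # alias: just another name for an existing command.
        return ("command", entry["target"])
    raise ValueError(f"unknown quick command type: {entry['type']}")


if __name__ == "__main__":
    print(resolve("hello"))
    print(resolve("c"))
```

The exec type is the one that changes what the agent sees, since the command output lands in the context window; the alias type only saves typing.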
